Joint analysis of 5mC and 5hmC from single cells

cxzhu/SIMPLE-seq


SIMPLE-seq

Processing of SIMPLE-seq datasets.

SIMPLE-seq is a scalable method for joint analysis of 5mC and 5hmC from single cells. This repository provides scripts for decoding the cellular barcodes of SIMPLE-seq datasets (modified from the ligation-based combinatorial barcoding of SPLiT-seq) and for identifying 5mC and 5hmC sites in individual cells.

Please have the following software installed first:

Analysis of SIMPLE-seq datasets includes the following steps:

1. Pre-processing

Extract the cellular barcode from Read2, map the extracted barcode reads to the reference cell IDs, and convert the mapped cell-ID SAM files to usable fastq files.

*** Please modify the paths to the reference files according to the annotations in the script file.

Use sh shellscrips/01.pre_process_simple_seq_fastq.sh [sample_prefix].

The output of this step includes:

  1. Sample_combined.fq.gz This file is a combined fastq file including Read1 sequences/qualities and barcode sequences extracted from Read2.

  2. Sample_BC.sam This is a temporary file used to assign the extracted barcode sequences to cellular barcodes. Please delete this file once you have successfully obtained Sample_BC_cov.fq.gz.

  3. Sample_BC_cov.fq.gz This is the fastq file with Read1 sequences and qualities; the Cellular Barcode and UMI from Read2 are now in the ReadName section of the fastq file (and of subsequent alignment files).
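
The cleanup of the temporary SAM file can be guarded so that it only happens after the final fastq exists. The following is a minimal sketch, not part of the repository: the function name cleanup_bc_sam is hypothetical, and it assumes that "Sample" in the file names above stands for the sample prefix passed to the pre-processing script.

```shell
# Hypothetical helper (not shipped with SIMPLE-seq).
# Deletes <prefix>_BC.sam only if <prefix>_BC_cov.fq.gz exists and is non-empty.
# Assumption: output files are named <prefix>_BC.sam and <prefix>_BC_cov.fq.gz.
cleanup_bc_sam() {
    prefix="$1"
    final="${prefix}_BC_cov.fq.gz"
    tmp="${prefix}_BC.sam"
    if [ -s "$final" ]; then
        # Final fastq is present: the temporary SAM is no longer needed.
        rm -f -- "$tmp"
    else
        # Keep the SAM file so the conversion step can be repeated.
        echo "kept $tmp: $final is missing or empty" >&2
        return 1
    fi
}

# Example call (run in the directory holding the outputs):
#   cleanup_bc_sam Sample
```

The function returns a non-zero status when the final fastq is missing, so it can be chained after the pre-processing script without risking the loss of the intermediate file.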

2. Mapping to the genome

As SIMPLE-seq only introduces "C-to-T" mutations at 5mC and 5hmC sites, we used bowtie2 (instead of a dedicated methylation aligner) for mapping.

Use sh shellscrips/02.proc_mapping.sh [sample_prefix].

3. Split the alignment files to 5mC and 5hmC

This step splits 5mC and 5hmC reads into separate alignment files (bam files) based on the indicator sequences.

Use perl perlscripts/02.split_modality.pl [sample_sorted.bam].

Three files will be generated: [sample_sorted.bam_5mC.bam], [sample_sorted.bam_5hmC.bam] and [sample_sorted.bam_other.bam]. Reads that cannot be perfectly assigned to 5mC or 5hmC will be written to "XXX_other.bam".
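
Before moving on, it can help to confirm that all three split files were written. The sketch below is an illustration only: check_split_outputs is a hypothetical helper, and the file names are derived from the naming pattern described above (input name plus _5mC.bam, _5hmC.bam or _other.bam).

```shell
# Hypothetical helper (not shipped with SIMPLE-seq).
# Reports whether each expected split output exists and is non-empty.
# Returns 0 if all three are present, 1 otherwise.
check_split_outputs() {
    bam="$1"
    missing=0
    for tag in 5mC 5hmC other; do
        out="${bam}_${tag}.bam"
        if [ -s "$out" ]; then
            echo "ok       $out"
        else
            echo "missing  $out"
            missing=1
        fi
    done
    return "$missing"
}

# Example call:
#   check_split_outputs sample_sorted.bam
```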

4. Generate cell-to-modification abundance matrices

This step will convert the bam files to an intermediate modification information file and then generate cell-to-modification abundance matrices.

Step.1 perl perlscripts/03.bam2srf.pl [sample_sorted.bam_5mC/5hmC.bam].

Step.2 perl perlscripts/04.srf2mtx.pl [input.rsf] [binsize].

The resulting matrix can be used for downstream single-cell analysis.
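
Step.1 has to be run once per modality. A minimal sketch of a wrapper that does this is shown below; run_bam2srf and the DRY_RUN variable are hypothetical names introduced here, and the wrapper assumes it is called from the repository root so that perlscripts/03.bam2srf.pl resolves. By default it only prints the commands.

```shell
# Hypothetical wrapper (not shipped with SIMPLE-seq).
# Prints, or with DRY_RUN=0 executes, Step.1 for the 5mC and 5hmC BAM files
# produced by the split step.
run_bam2srf() {
    bam="$1"
    for tag in 5mC 5hmC; do
        in="${bam}_${tag}.bam"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command without executing it.
            echo "perl perlscripts/03.bam2srf.pl $in"
        else
            perl perlscripts/03.bam2srf.pl "$in" || return 1
        fi
    done
}

# Example calls:
#   run_bam2srf sample_sorted.bam            # print the two commands
#   DRY_RUN=0 run_bam2srf sample_sorted.bam  # actually run them
```

Step.2 is left out of the wrapper on purpose, since it needs the name of the intermediate file written by Step.1 and a bin size chosen for the analysis.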
