SIMPLE-seq

Processing of SIMPLE-seq datasets.

SIMPLE-seq is a scalable method for joint analysis of 5mC and 5hmC from single cells. This repository provides the scripts for decoding the cellular barcodes of SIMPLE-seq datasets (adapted from the ligation-based combinatorial barcoding of SPLiT-seq) and for identifying 5mC and 5hmC sites in individual cells.

Please install the following software first:

bowtie, http://bowtie-bio.sourceforge.net/index.shtml
bowtie2, http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
samtools (version >= 1.3.1 is required), http://www.htslib.org/
Trim_galore, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
Optional: FastQC, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Compile the "simpleconv" tool:

cd simpleconv

sh make.sh
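
Before running the pipeline, it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch and is not part of this repository: the helper names missing_tools and version_at_least are hypothetical, the executable names (bowtie, bowtie2, samtools, trim_galore) are the usual ones for these packages, and the version comparison assumes a sort that supports -V (for example GNU coreutils).

```shell
# Print the name of every given executable that is not found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed when version $1 is at least version $2.
# ASSUMPTION: `sort -V` (version sort) is available.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

missing=$(missing_tools bowtie bowtie2 samtools trim_galore)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on one line.
    echo "Missing tools:" $missing >&2
fi

if command -v samtools >/dev/null 2>&1; then
    # The first line of `samtools --version` has the form "samtools <version>".
    v=$(samtools --version | head -n 1 | awk '{print $2}')
    version_at_least "$v" 1.3.1 || echo "samtools $v is older than 1.3.1" >&2
fi
```

FastQC is optional, so it is left out of the check; add fastqc to the argument list if you plan to use it.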

Analysis of SIMPLE-seq datasets includes the following steps:

1. Pre-processing

Extract the cellular barcode from Read2, map the barcode reads to the cell ID reference, and convert the mapped cell ID SAM files to usable FASTQ files.

*** Please modify the paths to the reference files according to the annotations in the script file.

Use sh shellscripts/01.pre_process_simple_seq_fastq.sh [sample_prefix].

The output of this step includes:

Sample_combined.fq.gz: A combined FASTQ file containing the Read1 sequences and qualities together with the barcode sequences extracted from Read2.
Sample_BC.sam: A temporary file used to assign the extracted barcode sequences to cellular barcodes. Please delete this file once you have successfully obtained Sample_BC_cov.fq.gz.
Sample_BC_cov.fq.gz: The FASTQ file with the Read1 sequences and qualities. The cellular barcode and UMI from Read2 are now stored in the read name field of the FASTQ file (and of the subsequent alignment files).
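
The cleanup advice above can be made a little safer with a guard that only removes the temporary SAM file when the final FASTQ looks intact. This is a sketch under assumptions: the function name cleanup_bc_sam is hypothetical, and it assumes that "Sample" in the file names above stands for the sample prefix, so the files are [prefix]_BC.sam and [prefix]_BC_cov.fq.gz.

```shell
# Remove <prefix>_BC.sam only when <prefix>_BC_cov.fq.gz exists, is non-empty,
# and passes a gzip integrity test. Returns non-zero and keeps the SAM otherwise.
# ASSUMPTION: output files are named <prefix>_BC.sam and <prefix>_BC_cov.fq.gz.
cleanup_bc_sam() {
    prefix=$1
    cov="${prefix}_BC_cov.fq.gz"
    sam="${prefix}_BC.sam"
    if [ -s "$cov" ] && gzip -t "$cov" 2>/dev/null; then
        rm -f "$sam"
    else
        echo "Keeping $sam: $cov is missing, empty, or corrupt" >&2
        return 1
    fi
}

# Usage (after step 1 has finished):
#   cleanup_bc_sam my_sample
```

A passing gzip test only shows that the archive is not truncated; it does not verify that barcode assignment itself worked, so a quick look at the read names is still worthwhile.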

2. Mapping to the genome

Because SIMPLE-seq only introduces "C-to-T" mutations at 5mC and 5hmC sites, we use bowtie2 (instead of a dedicated methylation aligner) for mapping.

Use sh shellscripts/02.proc_mapping.sh [sample_prefix].

3. Split the alignment files to 5mC and 5hmC

This step splits the 5mC and 5hmC reads into separate alignment files (BAM files) based on the indicator sequences.

Use perl perlscripts/02.split_modality.pl [sample_sorted.bam].

Three files will be generated: [sample_sorted.bam_5mC.bam], [sample_sorted.bam_5hmC.bam] and [sample_sorted.bam_other.bam]. Reads that cannot be perfectly assigned to either 5mC or 5hmC are written to "XXX_other.bam".
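
As a quick sanity check after splitting, one can count the reads that ended up in each of the three files. The sketch below is illustrative only: split_outputs and report_split_counts are hypothetical helper names, the output names follow the pattern given above (input name plus _5mC.bam, _5hmC.bam, _other.bam), and the counting relies on samtools view -c.

```shell
# Print the three file names that the split step is described as producing
# for a given input BAM, one per line.
split_outputs() {
    for tag in 5mC 5hmC other; do
        printf '%s_%s.bam\n' "$1" "$tag"
    done
}

# Print "<file><TAB><read count>" for each split file, or "missing" when
# the file does not exist. Requires samtools for files that are present.
report_split_counts() {
    split_outputs "$1" | while IFS= read -r f; do
        if [ -f "$f" ]; then
            printf '%s\t%s\n' "$f" "$(samtools view -c "$f")"
        else
            printf '%s\tmissing\n' "$f"
        fi
    done
}

# Usage:
#   report_split_counts sample_sorted.bam
```

An unexpectedly large share of reads in the "other" file may be worth investigating before moving on to step 4.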

4. Generate cell-to-modification abundance matrices

This step will convert the bam files to an intermediate modification information file and then generate cell-to-modification abundance matrices.

Step 1: perl perlscripts/03.bam2srf.pl [sample_sorted.bam_5mC/5hmC.bam]

Step 2: perl perlscripts/04.srf2mtx.pl [input.srf] [binsize]
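
To illustrate what the [binsize] argument means in a cell-by-bin abundance matrix, here is a small stand-alone sketch. It is not what 04.srf2mtx.pl does internally and it does not read the intermediate file from Step 1, whose layout is not described here. Instead it assumes a hypothetical tab-separated input with three columns (cell barcode, chromosome, 1-based position) and an assumed bin convention of 0-based, half-open intervals.

```shell
# Count sites per (cell, genomic bin) and print them in long format:
#   <cell><TAB><chrom>:<bin_start>-<bin_end><TAB><count>
# ASSUMPTION: stdin is a hypothetical TSV of cell, chrom, 1-based position.
bin_counts() {
    binsize=$1
    awk -F '\t' -v OFS='\t' -v b="$binsize" '
        {
            # 0-based start of the bin that contains the 1-based position $3
            start = int(($3 - 1) / b) * b
            n[$1 OFS $2 ":" start "-" (start + b)]++
        }
        END { for (k in n) print k, n[k] }
    ' | LC_ALL=C sort
}

# Usage with a 5 kb bin size:
#   bin_counts 5000 < sites.tsv
```

A larger bin size gives a smaller, denser matrix, while a smaller one keeps more positional detail at the cost of sparsity.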

The resulting matrix can be used for downstream single-cell analysis.
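
The steps above can be chained in a small wrapper. The following is a sketch rather than a script shipped with this repository: run and simple_seq_steps are hypothetical names, the DRY_RUN variable is an invented convenience for previewing commands, and the sorted BAM produced by step 2 is passed in explicitly because its exact name is not derived here. The final matrix step is left out since the name of the file written by 03.bam2srf.pl is not stated above.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run pre-processing, mapping, splitting, and the first part of step 4.
#   $1: sample prefix passed to the two shell scripts
#   $2: sorted BAM file produced by the mapping step
simple_seq_steps() {
    prefix=$1
    bam=$2
    run sh shellscripts/01.pre_process_simple_seq_fastq.sh "$prefix" || return 1
    run sh shellscripts/02.proc_mapping.sh "$prefix" || return 1
    run perl perlscripts/02.split_modality.pl "$bam" || return 1
    for tag in 5mC 5hmC; do
        run perl perlscripts/03.bam2srf.pl "${bam}_${tag}.bam" || return 1
    done
}

# Preview the commands without executing anything:
#   DRY_RUN=1; simple_seq_steps my_sample my_sample_sorted.bam
```

Run it from the repository root so that the relative shellscripts/ and perlscripts/ paths resolve, and remember to edit the reference paths inside the shell scripts first.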
