This pipeline was created as part of the GOLD project.
conda env create -n gold -f env.yaml
git clone git@github.com:dridk/pacbio_rna_seq.git
FASTQ: Path to the FASTQ file generated by PacBio sequencing
BARCODE: Path to the FASTA file describing the barcodes used by lima for demultiplexing (see the example in the repository)
PRIMERS: Path to the FASTA file describing the primers used for PacBio amplicon sequencing (see the example in the repository)
REFERENCE: Path to the FASTA reference file used by minimap2 for alignment (e.g. hg19.fa)
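A minimal sketch of a pre-flight check for these four keys, which can help catch an incomplete config.yaml before launching the run. The helper name `missing_config_keys` and the example file names are hypothetical, not part of the pipeline; the sketch assumes the configuration has already been loaded into a plain dictionary.

```python
# Hypothetical helper, not part of the pipeline itself.
# The key names are the four configuration entries described above.
REQUIRED_KEYS = ("FASTQ", "BARCODE", "PRIMERS", "REFERENCE")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values for illustration only.
example = {
    "FASTQ": "your_file.fastq",
    "BARCODE": "barcodes.fa",
    "PRIMERS": "primers.fa",
}

print(missing_config_keys(example))  # ['REFERENCE']
```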
Put your_file.fastq generated by PacBio in the same folder as config.yaml and run the following command.
You can set the number of threads to use with the --cores option.
snakemake -Fp --cores 10 --configfile config.yaml
The pipeline will generate one file per barcode and amplicon. For instance, HBB.bc1022.bam contains the aligned reads from the HBB amplicon and the bc1022 barcode identifier.
debarcoding.{barcode}--{barcode}.fastq: Demultiplexed reads
{amplicon}.{barcode}.fastq: Transcript reads
{amplicon}.{barcode}.bam: Aligned transcript reads
{amplicon}.{barcode}.bed: Transcript structures as a bed file
{amplicon}.{barcode}.hash.bed: Transcript structures as a bed file with a unique ID to identify each transcript
{amplicon}.{barcode}.hash.png: Distribution plot of transcripts
cluster.{amplicon}.png: Transcript abundance heatmap
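The naming patterns above can be expanded for a given amplicon and barcode, for example to check that a run produced everything it should. This is only a sketch: the function `expected_outputs` is hypothetical and simply fills in the `{amplicon}` and `{barcode}` placeholders listed above.

```python
# Hypothetical helper: expands the output patterns listed above.
def expected_outputs(amplicon, barcode):
    """Return the output file names expected for one amplicon/barcode pair."""
    suffixes = ("fastq", "bam", "bed", "hash.bed", "hash.png")
    per_sample = [f"{amplicon}.{barcode}.{suffix}" for suffix in suffixes]
    return [
        f"debarcoding.{barcode}--{barcode}.fastq",  # demultiplexed reads
        *per_sample,                                # per amplicon and barcode
        f"cluster.{amplicon}.png",                  # one heatmap per amplicon
    ]


for name in expected_outputs("HBB", "bc1022"):
    print(name)
```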
For instance, the following heatmap shows transcript abundances for each barcode. Each transcript is identified by a hash number generated from the transcript structure bed file. This makes it possible to identify the same transcript across different samples.
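To illustrate why a structure-derived hash lets the same transcript be matched across samples, here is a minimal sketch. It is an assumption, not the pipeline's actual hashing code: it assumes BED12 input (with blockSizes and blockStarts columns), uses SHA-1 truncated to 8 hexadecimal characters, and the function name `structure_id` is hypothetical.

```python
import hashlib


# Hypothetical sketch: the pipeline's real hashing scheme may differ.
def structure_id(bed12_line, length=8):
    """Derive a short ID from the exon structure of one BED12 record."""
    fields = bed12_line.rstrip("\n").split("\t")
    chrom, start, strand = fields[0], int(fields[1]), fields[5]
    # BED12 stores block sizes and offsets as comma-separated lists,
    # often with a trailing comma, hence the `if x` filter.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    exons = [(start + off, start + off + size) for off, size in zip(offsets, sizes)]
    # Only coordinates and strand enter the key; read name and score do not.
    key = f"{chrom}:{strand}:" + ",".join(f"{s}-{e}" for s, e in exons)
    return hashlib.sha1(key.encode()).hexdigest()[:length]


# Two made-up reads with the same structure but different names and scores.
read_a = "chr11\t100\t500\tread1\t0\t-\t100\t500\t0\t2\t50,100,\t0,300,"
read_b = "chr11\t100\t500\tread2\t60\t-\t100\t500\t0\t2\t50,100,\t0,300,"

print(structure_id(read_a) == structure_id(read_b))  # True
```

Because the read name and score are excluded from the key, reads from different barcodes that share the same exon coordinates receive the same ID. Exact coordinate matching is a simplification: reads whose ends differ by a few bases would get different IDs under this sketch.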