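This pipeline was created as part of the GOLD project. It demultiplexes PacBio amplicon RNA sequencing reads by barcode, aligns them to a reference genome and identifies transcript structures for each amplicon and sample.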

Installation

Dependencies

- python >= 3.9
    - seaborn
    - pandas
    - matplotlib
- snakemake
- seqkit
- lima
- fastqc
- minimap2
- samtools
- bedtools

Install the environment with conda

conda env create -n gold -f env.yaml
conda activate gold

Usage

Clone the repository

git clone git@github.com:dridk/pacbio_rna_seq.git

Edit config.yaml

- FASTQ: path to the FASTQ file generated by PacBio sequencing
- BARCODE: path to the FASTA file describing the barcodes used by lima for demultiplexing (see barcode.fa in the repository for an example)
- PRIMERS: path to the FASTA file describing the primers used for PacBio amplicon sequencing (see primers.fa in the repository for an example)
- REFERENCE: path to the FASTA reference genome used by minimap2 for alignment (e.g. hg19.fa)
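Before launching the pipeline, it can help to check that config.yaml points to files that actually exist. The sketch below is not part of the repository: check_config is a hypothetical helper that only checks the four keys listed above, and it expects the YAML file to have already been loaded into a dictionary (for example with PyYAML, which is assumed to be installed).

```python
import os

# Keys documented for config.yaml.
REQUIRED_KEYS = ("FASTQ", "BARCODE", "PRIMERS", "REFERENCE")


def check_config(config):
    """Return a list of problems found in an already-loaded config mapping.

    An empty list means every documented key is present and points to an
    existing file. Hypothetical helper, not part of the pipeline.
    """
    problems = []
    for key in REQUIRED_KEYS:
        path = config.get(key)
        if not path:
            problems.append(f"{key}: missing")
        elif not os.path.isfile(path):
            problems.append(f"{key}: file not found: {path}")
    return problems


# Usage (assumes PyYAML is installed):
#   import yaml
#   with open("config.yaml") as fh:
#       for problem in check_config(yaml.safe_load(fh)):
#           print(problem)
```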

Run the pipeline

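Place the FASTQ file generated by PacBio sequencing (e.g. your_file.fastq) in the same folder as config.yaml and run the following command. The --cores option sets how many CPU cores (threads) Snakemake may use; -F forces every rule to be re-run and -p prints the shell commands as they are executed.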

snakemake -Fp --cores 10 --configfile config.yaml
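The same command can also be launched from a Python script, for instance to do a dry run before the real one. This is a minimal sketch: snakemake_command and run_pipeline are hypothetical helpers, and apart from the flags used above it only relies on Snakemake's standard -n (dry-run) flag.

```python
import subprocess


def snakemake_command(cores=10, configfile="config.yaml", dry_run=False):
    """Build the snakemake invocation above as an argument list."""
    # -F and -p are the separate forms of the combined -Fp flag.
    cmd = ["snakemake", "-F", "-p", "--cores", str(cores), "--configfile", configfile]
    if dry_run:
        # -n lists the jobs that would run without executing them.
        cmd.append("-n")
    return cmd


def run_pipeline(cores=10, configfile="config.yaml", dry_run=False):
    """Run snakemake; check=True raises CalledProcessError on failure."""
    subprocess.run(snakemake_command(cores, configfile, dry_run), check=True)
```

For example, run_pipeline(dry_run=True) followed by run_pipeline(cores=10) first shows the planned jobs, then executes them.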

Output

The pipeline generates a set of files for each combination of barcode and amplicon. For instance, HBB.bc1022.bam contains the aligned reads from the HBB amplicon for barcode bc1022.
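Because per-sample outputs follow the {amplicon}.{barcode} naming pattern, downstream scripts can recover both parts from a file name. A minimal sketch, assuming that neither amplicon nor barcode names contain a dot; parse_output_name is a hypothetical helper, not part of the pipeline, and the suffixes are taken from the output files described in this section.

```python
# Suffixes of the per-amplicon, per-barcode outputs. Longer suffixes come
# first so that "x.hash.bed" is not mistaken for a plain ".bed" file.
SUFFIXES = (".hash.bed", ".hash.png", ".fastq", ".bam", ".bed")


def parse_output_name(filename):
    """Split '{amplicon}.{barcode}{suffix}' into (amplicon, barcode, suffix).

    Returns None for names that do not follow this pattern, such as the
    demultiplexed 'debarcoding.*' files or the 'cluster.*' heatmaps.
    """
    if filename.startswith("debarcoding."):
        return None
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            parts = filename[: -len(suffix)].split(".")
            if len(parts) == 2 and all(parts):
                return parts[0], parts[1], suffix
            return None
    return None
```

For example, grouping BAM files by amplicon only requires collecting the first element returned for every name ending in .bam.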

- debarcoding.{barcode}--{barcode}.fastq: demultiplexed reads
- {amplicon}.{barcode}.fastq: transcript reads
- {amplicon}.{barcode}.bam: aligned transcript reads
- {amplicon}.{barcode}.bed: transcript structures as a BED file
- {amplicon}.{barcode}.hash.bed: transcript structures as a BED file, with a unique ID identifying each transcript
- {amplicon}.{barcode}.hash.png: distribution plot of transcripts
- cluster.{amplicon}.png: transcript abundance heatmap
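The abundance heatmap can be thought of as a table of transcript IDs by barcode. The sketch below builds such a table from .hash.bed files using only the standard library. It assumes one BED record per read, with the transcript ID in the name column (column 4). This is a guess about the hash.bed layout, and the sketch is not necessarily how clustering.py works.

```python
from collections import Counter


def count_hashes(bed_lines, name_column=3):
    """Count how many BED records carry each transcript ID.

    Assumes one record per read and the ID in the BED name column
    (0-based index 3); adjust name_column if the file differs.
    """
    counts = Counter()
    for line in bed_lines:
        # Skip blank lines and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[name_column]] += 1
    return counts


def abundance_matrix(counts_by_barcode):
    """Turn {barcode: Counter} into {transcript_id: {barcode: count}}.

    Missing combinations are filled with 0, giving the rectangular table
    that a clustered heatmap needs.
    """
    barcodes = sorted(counts_by_barcode)
    ids = sorted({tid for counts in counts_by_barcode.values() for tid in counts})
    return {tid: {bc: counts_by_barcode[bc][tid] for bc in barcodes} for tid in ids}
```

With pandas and seaborn from the environment, the result could then be drawn with seaborn.clustermap(pandas.DataFrame.from_dict(matrix, orient="index")), giving one row per transcript and one column per barcode.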

The cluster.{amplicon}.png heatmap shows transcript abundances for each barcode (an example, cluster.ACKR1.png, is included in the repository). Each transcript is identified by a hash generated from its structure in the BED file, so identical transcripts receive the same identifier in every sample. This makes it possible to match transcripts across different samples.
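The idea of a structure-derived identifier can be sketched as follows: hash the chromosome and the absolute exon coordinates of a BED12 record, ignoring the read name, score and strand. This illustration rests on several assumptions (BED12 input, SHA-256 truncated to 12 hexadecimal characters, strand ignored). It is not the method used by hash_bed.py.

```python
import hashlib


def exon_blocks(bed12_line):
    """Return (chrom, [(start, end), ...]) from a BED12 record.

    blockStarts are relative to chromStart, so they are shifted to
    absolute genome coordinates.
    """
    f = bed12_line.rstrip("\n").split("\t")
    chrom, chrom_start = f[0], int(f[1])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    exons = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return chrom, exons


def structure_hash(bed12_line, length=12):
    """Hash only the exon structure, so reads of the same transcript match."""
    chrom, exons = exon_blocks(bed12_line)
    key = chrom + ":" + ",".join(f"{s}-{e}" for s, e in exons)
    return hashlib.sha256(key.encode()).hexdigest()[:length]
```

Because exact coordinates are hashed, two reads whose splice junctions differ by a single base receive different IDs; a real implementation may need to tolerate such small differences.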
