dridk/pacbio_rna_seq

This pipeline was created as part of the GOLD project.

Installation

Dependencies

Create the environment with conda

conda env create -n gold -f env.yaml

Usage

Clone the repository

git clone git@github.com:dridk/pacbio_rna_seq.git

Edit config.yaml

  • FASTQ: path to the FASTQ file generated by PacBio sequencing
  • BARCODE: path to the FASTA file describing the barcodes used by lima for demultiplexing (see the example in the repository)
  • PRIMERS: path to the FASTA file describing the primers used for PacBio amplicon sequencing (see the example in the repository)
  • REFERENCE: path to the FASTA reference file used by minimap2 for alignment (e.g. hg19.fa)
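Before launching a run, it can help to confirm that every entry in config.yaml points to a real file. The sketch below is not part of the pipeline: `check_config` and the example paths are hypothetical, and the config is given as a plain dictionary carrying the four keys listed above.

```python
from pathlib import Path

# Keys documented for config.yaml in this README.
REQUIRED_KEYS = ("FASTQ", "BARCODE", "PRIMERS", "REFERENCE")


def check_config(config, base_dir="."):
    """Return a list of human-readable problems found in a config mapping.

    Hypothetical helper, not part of the pipeline. Relative paths are
    resolved against base_dir (the folder containing config.yaml).
    """
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value:
            problems.append(f"{key}: not set")
        elif not (Path(base_dir) / value).is_file():
            problems.append(f"{key}: file not found ({value})")
    return problems


# Hypothetical values; replace them with your own paths.
example_config = {
    "FASTQ": "your_file.fastq",
    "BARCODE": "barcodes.fa",
    "PRIMERS": "primers.fa",
    "REFERENCE": "hg19.fa",
}

for problem in check_config(example_config):
    print(problem)
```

An empty result means all four inputs were found. Any printed line names the key to fix in config.yaml.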

Run the pipeline

Put your_file.fastq generated by PacBio in the same folder as config.yaml and run the following command. Use the --cores option to set the number of threads.

snakemake -Fp --cores 10 --configfile config.yaml 

Output

The pipeline generates output files per barcode and amplicon. For instance, HBB.bc1022.bam contains the aligned reads for the HBB amplicon and the bc1022 barcode identifier.

  • debarcoding.{barcode}--{barcode}.fastq : Demultiplexed reads
  • {amplicon}.{barcode}.fastq : Transcript reads
  • {amplicon}.{barcode}.bam : Aligned transcript reads
  • {amplicon}.{barcode}.bed : Transcript structures as a BED file
  • {amplicon}.{barcode}.hash.bed : Transcript structures as a BED file, with a unique ID identifying each transcript
  • {amplicon}.{barcode}.hash.png : Distribution plot of transcripts
  • cluster.{amplicon}.png : Transcript abundance heatmap
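The naming patterns above can be expanded for a given amplicon and barcode, for example to check that a run produced everything expected. This is only an illustrative sketch; `expected_outputs` is a hypothetical helper and the pipeline itself does not provide it.

```python
def expected_outputs(amplicon, barcode):
    """List the output file names documented above for one amplicon/barcode pair.

    Hypothetical helper: it only formats the patterns from this README.
    """
    return [
        f"debarcoding.{barcode}--{barcode}.fastq",  # demultiplexed reads
        f"{amplicon}.{barcode}.fastq",              # transcript reads
        f"{amplicon}.{barcode}.bam",                # aligned transcript reads
        f"{amplicon}.{barcode}.bed",                # transcript structures
        f"{amplicon}.{barcode}.hash.bed",           # structures with unique IDs
        f"{amplicon}.{barcode}.hash.png",           # distribution plot
        f"cluster.{amplicon}.png",                  # abundance heatmap (per amplicon)
    ]


# Amplicon and barcode taken from the example in this README.
for name in expected_outputs("HBB", "bc1022"):
    print(name)
```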

For instance, the cluster heatmap shows transcript abundances for each barcode. Each transcript is identified by a hash generated from its structure in the BED file. This makes it possible to identify the same transcript across different samples.
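The idea of a structure-derived identifier can be sketched as follows. This README does not describe how the pipeline actually computes its hash, so everything here is an assumption made for illustration: the `structure_hash` function, the choice of SHA-1, the 8-character truncation, and the coordinates.

```python
import hashlib


def structure_hash(chrom, strand, exons, length=8):
    """Derive a short, deterministic ID from a transcript structure.

    Assumed scheme for illustration only (not the pipeline's own):
    exons is a list of (start, end) pairs, and the ID depends solely
    on chromosome, strand, and exon coordinates.
    """
    blocks = ",".join(f"{start}-{end}" for start, end in sorted(exons))
    key = f"{chrom}:{strand}:{blocks}"
    return hashlib.sha1(key.encode("ascii")).hexdigest()[:length]


# Hypothetical coordinates: the same structure seen in two samples.
sample_a = structure_hash("chr11", "-", [(100, 200), (300, 450)])
sample_b = structure_hash("chr11", "-", [(300, 450), (100, 200)])
# A transcript with a different exon boundary.
variant = structure_hash("chr11", "-", [(100, 200), (320, 450)])

print(sample_a == sample_b)  # identical structures share one ID
print(sample_a == variant)   # a different structure gets another ID
```

Because the ID depends only on the structure, the same transcript observed under different barcodes falls into the same heatmap row.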
