
Tutorial: Installing and Running Iso Seq 3 using Conda

Elizabeth Tseng edited this page Apr 8, 2021 · 42 revisions

Last Updated: 4/8/2021

Latest BioConda IsoSeq version: To Be Updated

This tutorial describes how to use the Linux developer's version of IsoSeq and related downstream analysis tools in an Anaconda environment. Please refer to our official pbbioconda page for further information on Support, License, Copyright, and Disclaimer. Please report any issues using PBBioconda Issues.

Who is this tutorial for?

  • Existing Iso-Seq users who are comfortable using the command line in a Linux environment
  • Advanced users with previous experience using conda packages

Who is this tutorial NOT for?

Users who need the officially supported release: Iso-Seq is officially available through SMRT Analysis.

Installing IsoSeq using Anaconda

(1) Download the latest version of Anaconda. (2) Install Anaconda according to the tutorial.

bash ~/Downloads/
export PATH=$HOME/anaconda5.2/bin:$PATH

Add the export PATH=$HOME/anaconda5.2/bin:$PATH line to the .bashrc or .bash_profile file in your home directory; otherwise you will need to type it every time you log in.
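If you prefer to script this, the following is a minimal sketch that appends the line only if it is not already present, so running it twice does not duplicate the entry. The helper name add_line_once and the RC_FILE variable are hypothetical (not part of Anaconda), and the anaconda5.2 install location is the one assumed above.

```shell
# add_line_once LINE FILE: append LINE to FILE unless an identical line exists.
# (Hypothetical helper for illustration; not part of Anaconda.)
add_line_once() {
  line=$1
  file=$2
  touch "$file"
  # -x: match the whole line; -F: treat LINE as a fixed string, not a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $HOME and $PATH unexpanded, so they are evaluated at login.
# RC_FILE is an assumed variable; set it to ~/.bash_profile if you use that file.
add_line_once 'export PATH=$HOME/anaconda5.2/bin:$PATH' "${RC_FILE:-$HOME/.bashrc}"
```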

(3) Confirm that conda is installed and update conda:

conda -V
conda update conda

(4) Create a virtual environment (tutorial). I will call it anaCogent5.2. Type y to agree to the interactive questions.

conda create -n anaCogent5.2 python=3.7 anaconda
source activate anaCogent5.2

Once you have activated the environment, your prompt should change to something like this:

(anaCogent5.2) $

(5) Install additional required libraries:

conda install -n anaCogent5.2 biopython
conda install -n anaCogent5.2 -c bioconda bx-python

(6) Install Iso-Seq 3 using bioconda. This also installs lima, PacBio's demultiplexing tool, as a dependency. Note that IsoSeq works only on Linux (macOS is not supported).

conda install -n anaCogent5.2 -c bioconda isoseq3
conda install -n anaCogent5.2 -c bioconda pbccs

The packages below are optional:

conda install -n anaCogent5.2 -c bioconda pbcoretools # for manipulating PacBio datasets
conda install -n anaCogent5.2 -c bioconda bamtools    # for converting BAM to fasta
conda install -n anaCogent5.2 -c bioconda pysam       # for making CSV reports

Check the installed versions:

$ isoseq3 --version
isoseq3 3.3.x (commit v3.3.x)
$ ccs --version
ccs 4.2.x (commit v4.2.x)
$ lima --version
lima 1.11.0 (commit v1.11.0)
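To confirm in one go that the required tools are on your PATH (for example, after opening a new shell), a small POSIX shell helper can be used. check_tools is a hypothetical name for illustration; it only uses command -v and does not check versions.

```shell
# check_tools NAME...: report which commands are (not) on PATH.
# Returns non-zero if any tool is missing. (Hypothetical helper.)
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Required tools from step (6); append optional ones such as bamtools if installed.
check_tools isoseq3 ccs lima || echo "Some tools are missing; is the anaCogent5.2 environment active?"
```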

Running IsoSeq


Please follow the IsoSeq tutorial. Here we list each step as described in the tutorial and explain the output.

0. Generate CCS

If you do not already have CCS reads, run:

ccs [movie].subreads.bam [movie].ccs.bam --min-rq 0.9

Note that starting with IsoSeq version 3.2, CCS is run with polishing (Polish).

1. Classify full-length reads:


lima --isoseq --dump-clips --no-pbi --peek-guess -j 24 ccs.bam primers.fasta demux.bam       

lima identifies and removes the 5' and 3' cDNA primers. If the sample is barcoded, include the barcode as part of the primer. See IsoSeq: Primer removal and demultiplexing.

Use --peek-guess to remove spurious matches (only applicable if you supply multiple primer pairs).

The dumped clips (via --dump-clips) show the clipped primers: bq is the barcode score and bc is the primer index. Here, bc:0 is the Clontech 5' primer including the ATGGG overhang and bc:1 is the Clontech 3' primer. Note that the clips can be in either orientation, but lima orients each output FL read 5' -> 3'.

>m54254_171121_005529/73335088/0_30 bq:100 bc:0
>m54254_171121_005529/73335088/1953_1978 bq:100 bc:1
>m54254_171121_005529/73335094/0_24 bq:88 bc:1
>m54254_171121_005529/73335094/3386_3415 bq:80 bc:0
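For a quick overview of how often each primer was found and how confident the matches were, a small awk sketch can tally the clip headers by primer index and average their bq scores. It assumes the header layout shown above. The name of the file lima writes with --dump-clips (for example demux.lima.clips) is an assumption here, so point the command at whatever file your run produced; below it runs on a sample file holding the four headers above.

```shell
# Sample input: the four clip headers shown above.
cat > clips_sample.txt <<'EOF'
>m54254_171121_005529/73335088/0_30 bq:100 bc:0
>m54254_171121_005529/73335088/1953_1978 bq:100 bc:1
>m54254_171121_005529/73335094/0_24 bq:88 bc:1
>m54254_171121_005529/73335094/3386_3415 bq:80 bc:0
EOF

# Only header lines (starting with >) are read; sequence lines are skipped.
# For each primer index (bc:N), count clips and average the score (bq:N).
awk '/^>/ {
  score = 0; idx = ""
  for (i = 2; i <= NF; i++) {
    split($i, kv, ":")
    if (kv[1] == "bq") score = kv[2]
    if (kv[1] == "bc") idx = kv[2]
  }
  n[idx]++
  sum[idx] += score
}
END {
  for (k in n) printf "bc:%s\tclips=%d\tmean_bq=%.1f\n", k, n[k], sum[k] / n[k]
}' clips_sample.txt | sort > clip_summary.tsv

cat clip_summary.tsv
```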

If multiple 5'/3' primer pairs are given, lima outputs one <prefix>.<5p>--<3p>.bam file for each pair. If you want to analyze all the demultiplexed FL reads together to increase transcript recovery (for example, different tissues from the same species), you must create a combined dataset:

dataset create --type ConsensusReadSet combined_demux.consensusreadset.xml \
    prefix.5p--barcode1_3p.bam \
    prefix.5p--barcode2_3p.bam \
    prefix.5p--barcode3_3p.bam ...

To remove polyA tails and artificial concatemers, run isoseq3 refine next.

isoseq3 refine --require-polya combined_demux.consensusreadset.xml primers.fasta flnc.bam

Use --require-polya if your transcripts have a polyA tail.

An intermediate flnc.bam file is produced, which contains the FLNC reads. To convert it to FASTA format, run:

bamtools convert -format fasta -in flnc.bam > flnc.fasta
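To sanity-check the converted FLNC reads, a short awk function can report the read count and mean read length of a FASTA file, handling sequences wrapped over several lines. fasta_stats and example.fasta are hypothetical names used for illustration; on real data, point it at your FLNC FASTA (for example flnc.fasta).

```shell
# fasta_stats FILE: print the read count and mean read length of a FASTA file.
# Sequence lines are summed per file, so wrapped sequences are handled.
fasta_stats() {
  awk '
    /^>/ { n++; next }          # header line: one more read
    { len += length($0) }       # sequence line: add its length
    END {
      if (n == 0) { print "reads=0"; exit }
      printf "reads=%d\tmean_len=%.1f\n", n, len / n
    }' "$1"
}

# Tiny made-up example: r1 is wrapped over two lines (4 + 2 bases), r2 has 8 bases.
printf '>r1\nACGT\nAC\n>r2\nACGTACGT\n' > example.fasta
fasta_stats example.fasta
```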

Special: What to do for TeloPrime primers

IsoSeq supports variable polyA length. For TeloPrime, we recommend running lima without the As, then running isoseq3 refine with a smaller-than-default minimum polyA length (--min-polya-length).

lima --isoseq --dump-clips ccs.bam primers.fasta output.bam

isoseq3 refine --require-polya --min-polya-length 12 output.5p--3p.bam primers.fasta flnc.bam

where primers.fasta is


2. Cluster FLNC reads:


isoseq3 refine --require-polya demux.5p--3p.bam primers.fasta flnc.bam
isoseq3 cluster flnc.bam polished.bam --verbose --use-qvs

Note: Because CCS was run with polishing, the isoseq3 cluster output is already polished! No additional polishing step is required.

After completion, you will see the following files:


NOTE: QVs will not be available for the polished HQ fasta coming out of isoseq3 cluster.

3. Understanding the isoseq3 refine report and polished.cluster_report.csv

The isoseq3 refine report is a CSV file showing which barcode/primers each FLNC read belongs to. It is the output of the isoseq3 refine step.

polished.cluster_report.csv is a CSV file showing which clusters each FLNC read belongs to. It is the output of the isoseq3 cluster step.

Having these two CSV files enables you to run Cupcake scripts (for example, collapsing redundant transcripts, getting FL counts for each transcript, and demultiplexing) the same way you did for Iso-Seq 1 and 2 output.
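As a rough illustration of a per-transcript FL count, the awk sketch below tallies FLNC reads per cluster. It is not the Cupcake implementation; use the Cupcake scripts for real analyses. It assumes polished.cluster_report.csv has a header line followed by cluster_id,read_id,read_type columns; check the header of your own file, since this layout is an assumption here. The sample rows are made up.

```shell
# Made-up sample in the assumed cluster_id,read_id,read_type layout.
cat > cluster_report_sample.csv <<'EOF'
cluster_id,read_id,read_type
transcript/0,m54254_171121_005529/73335088/ccs,FL
transcript/0,m54254_171121_005529/73335094/ccs,FL
transcript/1,m54254_171121_005529/73335101/ccs,FL
EOF

# Skip the header (NR > 1), keep FL rows, and count reads per cluster_id.
awk -F',' 'NR > 1 && $3 == "FL" { fl[$1]++ }
  END { for (c in fl) print c "," fl[c] }' cluster_report_sample.csv | sort > fl_counts.csv

cat fl_counts.csv
```

On real data, replace cluster_report_sample.csv with polished.cluster_report.csv.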

An example of the isoseq3 refine report (previously named classify_report.csv) is below:


And for polished.cluster_report.csv:


4. Which parts of IsoSeq can be parallelized for speed-up?

The following parts can be done in parallel:

  • ccs
  • lima
  • isoseq3 refine

The following step cannot be done in parallel:

  • isoseq3 cluster

For example, if you have three movies, you can run CCS on them in parallel to get three outputs: movie1.ccs.bam, movie2.ccs.bam, and movie3.ccs.bam.

Now you can run lima and isoseq3 refine on each separately:

lima --isoseq --dump-clips -j 24 movie1.ccs.bam primers.fasta demux1.bam
lima --isoseq --dump-clips -j 24 movie2.ccs.bam primers.fasta demux2.bam 
lima --isoseq --dump-clips -j 24 movie3.ccs.bam primers.fasta demux3.bam 
isoseq3 refine --require-polya demux1.5p--3p.bam primers.fasta flnc1.bam
isoseq3 refine --require-polya demux2.5p--3p.bam primers.fasta flnc2.bam
isoseq3 refine --require-polya demux3.5p--3p.bam primers.fasta flnc3.bam
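Instead of typing each pair of commands by hand, one option is to generate a job list with one line per movie and hand it to a scheduler or GNU parallel. This sketch assumes the movie<N>.ccs.bam and demux<N>.5p--3p.bam naming used above; jobs.txt is an arbitrary file name.

```shell
# Write one line per movie: lima, then isoseq3 refine on lima's output.
# The && ensures refine only runs if lima succeeded for that movie.
: > jobs.txt
for i in 1 2 3; do
  printf 'lima --isoseq --dump-clips -j 24 movie%s.ccs.bam primers.fasta demux%s.bam && isoseq3 refine --require-polya demux%s.5p--3p.bam primers.fasta flnc%s.bam\n' \
    "$i" "$i" "$i" "$i" >> jobs.txt
done

cat jobs.txt
```

Each line can then be run concurrently, for example with GNU parallel (parallel -j 3 < jobs.txt) or by submitting each line as a separate cluster job. Each lima job above already requests 24 threads (-j 24), so adjust -j to what the machine can handle.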

Now you can create a dataset XML that references the three outputs:

dataset create --type ConsensusReadSet combined.flnc.xml \
  flnc1.bam flnc2.bam flnc3.bam

Then run the cluster step using as many cores as possible:

isoseq3 cluster combined.flnc.xml polished.bam --verbose --use-qvs

If the cluster output is split into chunks, each chunk can be processed in parallel again and the results combined later.

What to do after IsoSeq?

If you have a reference genome, you can follow this tutorial to map the transcripts back to the genome, remove redundancy, and generate GFF output.

If you do not have a reference genome, you may be interested in using Cogent to create gene families and reconstruct the coding portions of the genome.