# NanoCount command line usage

### Activate virtual environment

```bash
conda activate nanocount
```

### Running NanoCount

```bash
NanoCount --help
```

```text
usage: NanoCount [-h] [--version] -i ALIGNMENT_FILE [-o COUNT_FILE]
                 [-b FILTER_BAM_OUT] [-l MIN_READ_LENGTH]
                 [-f MIN_QUERY_FRACTION_ALIGNED] [-t EQUIVALENT_THRESHOLD]
                 [-s SCORING_VALUE] [-c CONVERGENCE_TARGET] [-e MAX_EM_ROUNDS]
                 [-x] [-p PRIMARY_SCORE] [-a] [-d MAX_DIST_3_PRIME]
                 [-u MAX_DIST_5_PRIME] [-v] [-q]

NanoCount estimates transcripts abundance from Oxford Nanopore *direct-RNA
sequencing* datasets, using an expectation-maximization approach like RSEM,
Kallisto, salmon, etc to handle the uncertainty of multi-mapping reads

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

Input/Output options:
  -i ALIGNMENT_FILE, --alignment_file ALIGNMENT_FILE
                        BAM or SAM file containing aligned ONT dRNA-Seq reads
                        including secondary and supplementary alignment
```

#### Basic command

```bash
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --max_dist_3_prime 10 --max_dist_5_prime 10
head ./output/tx_counts.tsv
```

```text
## Checking options and input files ##
## Initialise Nanocount ##
	Parse Bam file and filter low quality alignments
	Summary of alignments parsed in input bam file
		Valid alignments: 73,488
		Discarded alignment with invalid 5 prime end: 44,969
		Discarded alignment with invalid 3 prime end: 38,527
		Discarded unmapped alignments: 9,545
		Discarded negative strand alignments: 4,515
	Summary of reads filtered
		Reads with valid best alignment: 46,006
		Invalid secondary alignments: 25,727
		Reads with low query fraction aligned: 618
		Valid secondary alignments: 606
		Reads too short: 359
	Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
	Progress: 2.00 rounds [00:00, 13.0 rounds/s]
	Exit EM loop after 2 rounds
	Convergence value: 0.0006951205037166979
## Summariz
```

```bash
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv -3 50
head ./output/tx_counts.tsv
```

```text
usage: NanoCount [-h] [--version] -i ALIGNMENT_FILE [-o COUNT_FILE]
                 [-b FILTER_BAM_OUT] [-l MIN_READ_LENGTH]
                 [-f MIN_QUERY_FRACTION_ALIGNED] [-t EQUIVALENT_THRESHOLD]
                 [-s SCORING_VALUE] [-c CONVERGENCE_TARGET] [-e MAX_EM_ROUNDS]
                 [-x] [-p PRIMARY_SCORE] [-a] [-d MAX_DIST_3_PRIME]
                 [-u MAX_DIST_5_PRIME] [-v] [-q]
NanoCount: error: unrecognized arguments: -3 50
transcript_name	raw	est_count	tpm
YHR174W_mRNA	0.6345910956591089	29194.997946892963	634591.0956591088
YGR192C_mRNA	0.020279963483023952	933.0	20279.963483023952
YLR110C_mRNA	0.011520236490892493	530.0	11520.236490892494
YOL086C_mRNA	0.008259792201017259	380.0	8259.792201017259
YKL152C_mRNA	0.005455810111724558	251.0	5455.810111724558
YKL060C_mRNA	0.005412337521192888	249.0	5412.337521192888
YDL081C_mRNA	0.005151501978002869	237.0	5151.501978002869
YOR369C_mRNA	0.004477676824761988	206.0	4477.6768247619875
YDL130W_mRNA	0.0041733686910402
```

`-3` is not a NanoCount option, so this command stops with a usage error before any counting is done. The usage message lists `-d` (long form `--max_dist_3_prime`) as the option that sets the maximum distance to the 3 prime end. Because no new file is written, `head` displays the count file left by the previous command.
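The columns of the count file shown above appear to be related in a simple way: in these rows `tpm` equals `raw` multiplied by one million, and `est_count` divided by `raw` gives 46,006, the number of "Reads with valid best alignment" reported by the run that wrote the file. The following Python sketch checks this on the first rows of the table. It is an observation drawn from this particular output, not a documented guarantee, and `read_counts` is a hypothetical helper, not part of NanoCount.

```python
import csv
import io
import math

# First rows of the count file shown above (tab-separated).
SAMPLE = (
    "transcript_name\traw\test_count\ttpm\n"
    "YHR174W_mRNA\t0.6345910956591089\t29194.997946892963\t634591.0956591088\n"
    "YGR192C_mRNA\t0.020279963483023952\t933.0\t20279.963483023952\n"
    "YLR110C_mRNA\t0.011520236490892493\t530.0\t11520.236490892494\n"
)


def read_counts(handle):
    """Parse a tab-separated count table into a list of dicts (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append(
            {
                "transcript_name": row["transcript_name"],
                "raw": float(row["raw"]),
                "est_count": float(row["est_count"]),
                "tpm": float(row["tpm"]),
            }
        )
    return rows


rows = read_counts(io.StringIO(SAMPLE))
for r in rows:
    # tpm is the raw abundance scaled to one million
    assert math.isclose(r["tpm"], r["raw"] * 1e6, rel_tol=1e-9)
    # est_count / raw recovers the number of reads used for the estimate
    print(r["transcript_name"], round(r["est_count"] / r["raw"]))
```

To run the same check on a real file, pass an open file handle instead of the `io.StringIO` sample, for example `read_counts(open("./output/tx_counts.tsv"))`.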

#### Adding extra transcript information

The `--extra_tx_info` option adds a column with the transcript lengths and also includes all zero-coverage transcripts in the results.

```bash
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --extra_tx_info
head ./output/tx_counts.tsv
```

```text
## Checking options and input files ##
## Initialise Nanocount ##
	Parse Bam file and filter low quality alignments
	Summary of alignments parsed in input bam file
		Valid alignments: 150,779
		Discarded unmapped alignments: 9,545
		Discarded alignment with invalid 3 prime end: 6,205
		Discarded negative strand alignments: 4,515
	Summary of reads filtered
		Reads with valid best alignment: 85,200
		Invalid secondary alignments: 60,168
		Valid secondary alignments: 2,626
		Reads with low query fraction aligned: 1,544
		Reads too short: 817
	Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
	Progress: 2.00 rounds [00:00, 9.10 rounds/s]
	Exit EM loop after 2 rounds
	Convergence value: 0.002000099856041238
## Summarize data ##
	Convert results to dataframe
	Co
```
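The "Summary of alignments" block of this run (default end-distance filters) can be compared with the one from the basic command, which used `--max_dist_3_prime 10 --max_dist_5_prime 10`. A short Python sketch, assuming the log has been captured as text, parses the `label: count` lines and shows that both runs account for the same total number of alignments; only their distribution over the categories changes. `parse_counts` and the two log excerpts are illustrative names, and the regular expression also removes ANSI colour codes in case the captured log still contains them.

```python
import re

# Alignment summaries copied from the two runs in this document.
LOG_DEFAULT = """\
Valid alignments: 150,779
Discarded unmapped alignments: 9,545
Discarded alignment with invalid 3 prime end: 6,205
Discarded negative strand alignments: 4,515
"""

LOG_STRICT = """\
Valid alignments: 73,488
Discarded alignment with invalid 5 prime end: 44,969
Discarded alignment with invalid 3 prime end: 38,527
Discarded unmapped alignments: 9,545
Discarded negative strand alignments: 4,515
"""

# Matches ANSI colour sequences such as ESC[32m and ESC[0m.
ANSI = re.compile(r"\x1b\[[0-9;]*m")


def parse_counts(log_text):
    """Return {label: count} for every 'label: 1,234' line (hypothetical helper)."""
    counts = {}
    for line in log_text.splitlines():
        line = ANSI.sub("", line).strip()
        match = re.fullmatch(r"(.+?):\s*([\d,]+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2).replace(",", ""))
    return counts


default_counts = parse_counts(LOG_DEFAULT)
strict_counts = parse_counts(LOG_STRICT)
print(sum(default_counts.values()), sum(strict_counts.values()))
```

Both totals come to 171,044, so the stricter distance limits reclassify alignments as having an invalid 5 prime or 3 prime end rather than changing how many alignments are read from the BAM file.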

#### Write selected alignments to a BAM file

```bash
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv -b ./output/aligned_reads_selected.bam --extra_tx_info
head ./output/tx_counts.tsv
```

#### Relaxing the equivalence threshold

The default value is 0.9 (90% of the alignment score of the primary alignment), but it can be lowered to allow more secondary alignments to be included in the uncertainty calculation.
Lowering the value below 0.75 might not be relevant and will considerably increase the computation time.
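The effect of the threshold can be illustrated with a minimal Python sketch. It assumes that a secondary alignment counts as equivalent when its score is at least `equivalent_threshold` times the score of the primary alignment; whether NanoCount uses an inclusive comparison is an assumption, and `select_equivalent` and the example scores are made up for illustration rather than taken from NanoCount.

```python
def select_equivalent(primary_score, secondary_scores, equivalent_threshold=0.9):
    """Keep the secondary scores that reach the given fraction of the primary score.

    Illustrative only: the inclusive comparison (>=) is an assumption.
    """
    cutoff = equivalent_threshold * primary_score
    return [score for score in secondary_scores if score >= cutoff]


# Hypothetical alignment scores for one read with a primary score of 1000.
scores = [980, 910, 850, 700]

print(select_equivalent(1000, scores))       # default threshold of 0.9
print(select_equivalent(1000, scores, 0.8))  # relaxed threshold
```

With the default threshold only the two scores of 900 or more are kept; relaxing it to 0.8 also admits the alignment scoring 850, which is how a lower threshold brings more secondary alignments into the estimate.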

```bash
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --equivalent_threshold 0.8
head ./output/tx_counts.tsv
```

```text
## Checking options and input files ##
## Initialise Nanocount ##
	Parse Bam file and filter low quality alignments
	Summary of alignments parsed in input bam file
		Valid alignments: 150,779
		Discarded unmapped alignments: 9,545
		Discarded alignment with invalid 3 prime end: 6,205
		Discarded negative strand alignments: 4,515
	Summary of reads filtered
		Reads with valid best alignment: 85,200
		Valid secondary alignments: 49,096
		Invalid secondary alignments: 13,698
		Reads with low query fraction aligned: 1,544
		Reads too short: 817
	Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
	Progress: 17.0 rounds [00:02, 7.31 rounds/s]
	Exit EM loop after 17 rounds
	Convergence value: 0.004896640500345573
## Summarize data ##
	Convert results to dataframe
```

#### Verbose mode

The `--verbose` option prints additional information for QC and debugging.

```bash
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --equivalent_threshold 0.8 --verbose
```

```text
## Checking options and input files ##
	[DEBUG]: Options summary
	[DEBUG]: 	Package name: NanoCount
	[DEBUG]: 	Package version: 0.2.6
	[DEBUG]: 	Timestamp: 2021-08-17 10:05:04.429901
	[DEBUG]: 	alignment_file: ./data/aligned_reads_sorted.bam
	[DEBUG]: 	count_file: ./output/tx_counts.tsv
	[DEBUG]: 	filter_bam_out: 
	[DEBUG]: 	min_read_length: 50
	[DEBUG]: 	discard_suplementary: False
	[DEBUG]: 	min_query_fraction_aligned: 0.5
	[DEBUG]: 	equivalent_threshold: 0.8
	[DEBUG]: 	scoring_value: alignment_score
	[DEBUG]: 	convergence_target: 0.005
	[DEBUG]: 	max_em_rounds: 100
	[DEBUG]: 	extra_tx_info: False
	[DEBUG]: 	primary_score: primary
	[DEBUG]: 	max_dist_3_prime: 50
	[DEBUG]: 	max_dist_5_prime: -1
	[DEBUG]: 	verbose: True
	[DEBUG]: 	quiet: False
## Initialise Nanocount ##
	Parse Bam
```