# pycoQC CLI Usage

The pycoQC CLI can generate an HTML report containing interactive D3.js plots. It can also write summary information to a JSON-formatted file, which makes the results easy to parse with third-party tools.

The report is dynamically generated depending on the information available in the summary file.
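Because the report depends on what the summary file contains, it can help to check which columns are present before running pycoQC. The following is a minimal sketch, assuming the summary file is tab-separated with a single header line and is optionally gzip-compressed; `summary_columns` is a hypothetical helper, not part of pycoQC.

```python
import csv
import gzip
from pathlib import Path


def summary_columns(path):
    """Return the header fields of a tab-separated summary file (plain or .gz).

    Hypothetical helper for illustration; it only reads the first line.
    """
    path = Path(path)
    # Pick the opener from the file extension; both give text-mode handles.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        # An empty file yields an empty list instead of raising StopIteration.
        return next(csv.reader(handle, delimiter="\t"), [])
```

Calling `summary_columns("sequencing_summary.txt.gz")` returns the column names as a list, which can be compared against fields such as `read_id`, `run_id`, or `channel`.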

## CLI Usage

### Activate virtual environment

# Using virtualenvwrapper here but can also be done with Conda 
workon pycoQC


### Getting help

pycoQC -h

usage: pycoQC [-h] [--version]
              [--summary_file SUMMARY_FILE [SUMMARY_FILE ...]]
              [--barcode_file BARCODE_FILE [BARCODE_FILE ...]]
              [--html_outfile HTML_OUTFILE] [--json_outfile JSON_OUTFILE]
              [--min_pass_qual MIN_PASS_QUAL] [--filter_calibration]
              [--min_barcode_percent MIN_BARCODE_PERCENT] [--title TITLE]
              [--template_file TEMPLATE_FILE] [--config CONFIG]
              [--default_config] [--list_plots] [--verbose_level {2,1,0}]

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore technologies basecallers

* Minimal usage
    pycoQC -f sequencing_summary.txt -o pycoQC_output.html
* Including Guppy barcoding file and json output
    pycoQC -f sequencing_summary.txt -b barcoding_sequencing.txt -o pycoQC_output.html -j pycoQC_output.json

optional arguments:
  -h, --help            show this help message and exit
  --version             show


### Usage examples

#### Basic usage 

pycoQC \
    -f data/summary/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o data/output/Albacore-1.2.1_basecall-1D-DNA.html

PARSING DATA FILES
Importing raw data from sequencing summary files
Verifying fields and discarding unused columns
Droping lines containing NA values
Sorting run IDs by decreasing throughput
Reordering runids
Reindexing dataframe by read_ids
GENERATING HTML REPORT
	Parsing html config file
	Running method summary
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
	Running method reads_len_1D
	Running method reads_qual_1D
	Running method reads_len_qual_2D
	Running method output_over_time
	Running method len_over_time
	Running method qual_over_time
	Running method barcode_counts
		No barcode information available
	Running method channels_activity
	Loading HTML template
	Rendering plots in d3js
	Writing to HTML file

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore-1.2.1_basecall-1D-DNA.html)
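When pycoQC is called from a script rather than typed by hand, the same invocation can be assembled as an argument list. This is a sketch only: `build_pycoqc_command` is a hypothetical helper, and it relies solely on the short flags shown in the help text (`-f`, `-o`, `-b`, `-j`).

```python
import shlex


def build_pycoqc_command(summary_files, html_outfile,
                         json_outfile=None, barcode_files=None):
    """Assemble a pycoQC argument list (hypothetical helper, not pycoQC API)."""
    command = ["pycoQC", "-f", *summary_files, "-o", html_outfile]
    if barcode_files:
        command += ["-b", *barcode_files]
    if json_outfile:
        command += ["-j", json_outfile]
    return command


command = build_pycoqc_command(
    ["data/summary/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz"],
    "data/output/Albacore-1.2.1_basecall-1D-DNA.html",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

To actually run it, the list can be passed to `subprocess.run(command, check=True)` in an environment where pycoQC is installed.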

#### JSON data output in addition to the HTML report

pycoQC \
    -f data/summary/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz \
    -o data/output/Guppy-2.1.3_basecall-1D_RNA.html \
    -j data/output/Guppy-2.1.3_basecall-1D_RNA.json

PARSING DATA FILES
Importing raw data from sequencing summary files
Verifying fields and discarding unused columns
Droping lines containing NA values
Sorting run IDs by decreasing throughput
Reordering runids
Reindexing dataframe by read_ids
GENERATING HTML REPORT
	Parsing html config file
	Running method summary
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
	Running method reads_len_1D
	Running method reads_qual_1D
	Running method reads_len_qual_2D
	Running method output_over_time
	Running method len_over_time
	Running method qual_over_time
	Running method barcode_counts
		No barcode information available
	Running method channels_activity
	Loading HTML template
	Rendering plots in d3js
	Writing to HTML file
GENERATING JSON REPORT
	Running summary_stats_dict method
	Writing to JSON file

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Guppy-2.1.3_basecall-1D_RNA.html)

[JSON OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Guppy-2.1.3_basecall-1D_RNA.json)
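The JSON file can be read with any JSON parser. The sketch below uses only the Python standard library and makes no assumption about the key names pycoQC writes: it simply lists whatever keys are present, down to a chosen depth. `load_report` and `describe` are hypothetical helper names.

```python
import json


def load_report(path):
    """Load a JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(node, prefix="", max_depth=2):
    """Yield 'key.path: type' lines for nested dicts, down to max_depth levels."""
    if not isinstance(node, dict) or max_depth == 0:
        return
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        yield f"{path}: {type(value).__name__}"
        # Recurse into nested dicts; other value types stop the descent.
        yield from describe(value, path, max_depth - 1)
```

For example, `print("\n".join(describe(load_report("data/output/Guppy-2.1.3_basecall-1D_RNA.json"))))` gives a quick overview of the report layout before writing any downstream parsing code.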

#### Including Guppy barcoding information

pycoQC \
    -f data/summary/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz \
    -b data/summary/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz \
    -o data/output/Guppy-2.1.3_basecall-1D_DNA_barcode.html

PARSING DATA FILES
Importing raw data from sequencing summary files
Importing barcode information from barcode summary files
Verifying fields and discarding unused columns
Droping lines containing NA values
Sorting run IDs by decreasing throughput
Reordering runids
Cleaning up low frequency barcodes
Reindexing dataframe by read_ids
GENERATING HTML REPORT
	Parsing html config file
	Running method summary
	Running method barcode_summary
	Running method run_id_summary
	Running method reads_len_1D
	Running method reads_qual_1D
	Running method reads_len_qual_2D
	Running method output_over_time
	Running method len_over_time
	Running method qual_over_time
	Running method barcode_counts
	Running method channels_activity
	Loading HTML template
	Rendering plots in d3js
	Writing to HTML file

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Guppy-2.1.3_basecall-1D_DNA_barcode.html)

#### Matching multiple files with a wildcard pattern and adding a title to the report

pycoQC \
    -f data/summary/Albacore*RNA* \
    -o data/output/Albacore_all_RNA.html \
    --title "All RNA runs"

PARSING DATA FILES
Importing raw data from sequencing summary files
Verifying fields and discarding unused columns
Droping lines containing NA values
Filtering out zero length reads
Sorting run IDs by decreasing throughput
Reordering runids
Reindexing dataframe by read_ids
GENERATING HTML REPORT
	Parsing html config file
	Running method summary
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
	Running method reads_len_1D
	Running method reads_qual_1D
	Running method reads_len_qual_2D
	Running method output_over_time
	Running method len_over_time
	Running method qual_over_time
	Running method barcode_counts
		No barcode information available
	Running method channels_activity
	Loading HTML template
	Rendering plots in d3js
	Writing to HTML file

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore_all_RNA.html)
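The pattern `Albacore*RNA*` uses shell-style wildcards, where `*` matches any run of characters. To preview which file names such a pattern would select, the matching rule can be sketched with Python's `fnmatch` module. The file names in the example are hypothetical, and `match_summary_files` is an illustrative helper, not part of pycoQC.

```python
import fnmatch


def match_summary_files(filenames, pattern):
    """Return the names matching a shell-style wildcard pattern, sorted."""
    # fnmatchcase is case-sensitive on every platform.
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))


# Hypothetical file names, for illustration only.
candidates = [
    "Albacore-A_basecall-1D-RNA_sequencing_summary.txt.gz",
    "Albacore-A_basecall-1D-DNA_sequencing_summary.txt.gz",
    "Guppy-B_basecall-1D-RNA_sequencing_summary.txt.gz",
]
for name in match_summary_files(candidates, "Albacore*RNA*"):
    print(name)
```

Only the first candidate contains both `Albacore` at the start and `RNA` later in the name, so it is the only one selected.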

#### Tweak filtering parameters

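* Flag reads with a quality above 8 as pass (`--min_pass_qual 8`)
* Discard reads aligned on the calibration standard (`--filter_calibration`)
* Unset the value of any barcode found in fewer than 10% of the reads (`--min_barcode_percent 10`)
* Increase the verbosity (`--verbose_level 2`)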

pycoQC \
    -f data/summary/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o data/output/Albacore-2.1.10_basecall-1D-DNA.html \
    --min_pass_qual 8 \
    --filter_calibration \
    --min_barcode_percent 10 \
    --verbose_level 2

PARSING DATA FILES
Importing raw data from sequencing summary files
	Sequencing summary files found: ['data/summary/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz']
	10,000 reads found in initial file
Verifying fields and discarding unused columns
	1D Run type
	Columns found: ['read_id', 'run_id', 'channel', 'start_time', 'sequence_length_template', 'mean_qscore_template', 'calibration_strand_genome_template']
Droping lines containing NA values
	0 reads discarded
Filtering out zero length reads
	40 reads discarded
Filtering out calibration strand reads
	0 reads discarded
Sorting run IDs by decreasing throughput
	Run-id order ['aae4df85078f7fe690547aeb688f2640644f323c', 'f54aa9064eb703797b98c83804bd65541b1ffc1b']
Reordering runids
	Processing reads with Run_ID aae4df85078f7fe690547aeb688f2640644f323c / time offset: 0
	Processing reads with Run_ID f54aa9064eb703797b98c83804bd65541b1ffc1b / time offset: 72.23357
Reindexing dataframe by read_ids
[pycoQC]
Runtime info
	package_na


[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore-2.1.10_basecall-1D-DNA.html)
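To make the effect of the quality and barcode thresholds concrete, here is a rough pure-Python sketch of the two rules. It is not pycoQC's implementation: the function names, the `"unclassified"` placeholder label, and the choice to treat a read at exactly the threshold as passing are all assumptions made for illustration.

```python
from collections import Counter


def flag_pass(mean_qscores, min_pass_qual=8.0):
    """Mark each read as pass/fail from its mean quality score.

    Assumption: a score equal to the threshold counts as pass.
    """
    return [score >= min_pass_qual for score in mean_qscores]


def unset_rare_barcodes(barcodes, min_percent=10.0, unset_label="unclassified"):
    """Replace barcodes seen in fewer than min_percent of reads with unset_label.

    Assumption: 'unclassified' is only a placeholder label for this sketch.
    """
    counts = Counter(barcodes)
    total = len(barcodes)
    # Keep barcodes whose share of all reads reaches the threshold.
    keep = {bc for bc, n in counts.items() if 100.0 * n / total >= min_percent}
    return [bc if bc in keep else unset_label for bc in barcodes]
```

With ten reads split 6/3/1 across three barcodes, the rarest barcode sits at exactly 10%: a threshold of 10 keeps it, while a higher threshold such as 20 resets it to the placeholder label.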

#### Advanced configuration with a custom JSON file

Although we recommend sticking to the default parameters, a JSON-formatted configuration file can be provided to tweak the plots. A default configuration file can be generated using:

pycoQC --default_config > data/pycoQC_config.json


Edit the file in your favorite text editor (in this example, the channels_activity and barcode_counts plots were removed). For more information, refer to the API documentation.
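The same edit can also be scripted. The sketch below assumes that each plot appears as a top-level key of the configuration file named after its method (for example `channels_activity`); that layout is an assumption, so check the generated file before relying on it. `drop_plots` and `edit_config_file` are hypothetical helpers.

```python
import json


def drop_plots(config, plot_names):
    """Return a copy of config without the given top-level entries.

    Assumption: plots are top-level keys named after their method.
    """
    unwanted = set(plot_names)
    return {key: value for key, value in config.items() if key not in unwanted}


def edit_config_file(path, plot_names):
    """Rewrite a JSON config file in place without the given entries."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(drop_plots(config, plot_names), handle, indent=4)
```

For instance, `edit_config_file("data/pycoQC_config.json", ["channels_activity", "barcode_counts"])` would remove those two entries if they exist and leave everything else untouched.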

Run pycoQC with the `--config` option:

pycoQC \
    -f data/summary/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o data/output/Albacore-1.7.0_basecall-1D-DNA.html \
    --config data/pycoQC_config.json

PARSING DATA FILES
Importing raw data from sequencing summary files
Verifying fields and discarding unused columns
Droping lines containing NA values
Filtering out zero length reads
Sorting run IDs by decreasing throughput
Reordering runids
Cleaning up low frequency barcodes
Reindexing dataframe by read_ids
GENERATING HTML REPORT
	Parsing html config file
	Running method summary
	Running method barcode_summary
	Running method run_id_summary
	Running method reads_len_1D
	Running method reads_qual_1D
	Running method reads_len_qual_2D
	Running method output_over_time
	Running method len_over_time
	Running method qual_over_time
	Loading HTML template
	Rendering plots in d3js
	Writing to HTML file

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore-1.7.0_basecall-1D-DNA.html)