# pycoQC CLI Usage

The pycoQC CLI generates an HTML report containing interactive D3.js plots. In addition, the CLI can write summary information to a JSON-formatted file that is easy to parse with third-party tools.

The report is dynamically generated depending on the information available in the summary file.

## CLI Usage

### Activate virtual environment

# Using virtualenvwrapper here but can also be done with Conda 
workon pycoQC

### Getting help

pycoQC -h

usage: pycoQC [-h] [--version]
              [--summary_file [SUMMARY_FILE [SUMMARY_FILE ...]]]
              [--barcode_file [BARCODE_FILE [BARCODE_FILE ...]]]
              [--bam_file [BAM_FILE [BAM_FILE ...]]]
              [--html_outfile HTML_OUTFILE] [--json_outfile JSON_OUTFILE]
              [--min_pass_qual MIN_PASS_QUAL] [--filter_calibration]
              [--filter_duplicated]
              [--min_barcode_percent MIN_BARCODE_PERCENT]
              [--report_title REPORT_TITLE] [--template_file TEMPLATE_FILE]
              [--config_file CONFIG_FILE] [--sample SAMPLE] [--default_config]
              [-v | -q]

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore technologies basecallers

* Minimal usage
    pycoQC -f sequencing_summary.txt -o pycoQC_output.html
* Including Guppy barcoding file + html output + json output + log output
    pycoQC -f sequencing_summary.txt -b barcoding_sequencing.txt -o pyc

### Usage examples

#### Basic usage (quiet mode)

pycoQC \
    -f ./data/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/pycoQC/Albacore-1.2.1_basecall-1D-DNA.html \
    --quiet

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Albacore-1.2.1_basecall-1D-DNA.html)
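
When pycoQC is launched from a pipeline script rather than typed in a shell, the same call can be assembled as an argument list. The sketch below is only an illustration: the helper name `build_pycoqc_command` is hypothetical, and it uses only options that appear on this page (`-f`, `-o`, `-j`, `--quiet`).

```python
import shlex


def build_pycoqc_command(summary_file, html_outfile, json_outfile=None, quiet=False):
    """Assemble a pycoQC argument list (hypothetical helper, not part of pycoQC)."""
    command = ["pycoQC", "-f", str(summary_file), "-o", str(html_outfile)]
    if json_outfile is not None:
        # Optional JSON report in addition to the HTML report
        command += ["-j", str(json_outfile)]
    if quiet:
        command.append("--quiet")
    return command


if __name__ == "__main__":
    command = build_pycoqc_command(
        "./data/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz",
        "./results/pycoQC/Albacore-1.2.1_basecall-1D-DNA.html",
        quiet=True,
    )
    # Print the command in a form that can be pasted into a shell
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once pycoQC is installed in the active environment.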

#### JSON data output in addition to the HTML report

A JSON report can be generated in addition to (or instead of) the HTML report. It contains a summarized version of the data collected by pycoQC in a structured, easy-to-parse format.

pycoQC \
    -f ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz \
    -o ./results/pycoQC/Guppy-2.1.3_basecall-1D_RNA.html \
    -j ./results/pycoQC/Guppy-2.1.3_basecall-1D_RNA.json

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['9835d20f1d205bdbd1fb4d464ae778de95beab24']
	Reordering runids
		Processing reads with Run_ID 9835d20f1d205bdbd1fb4d464ae778de95beab24 / time offset: 0
	Cast value to appropriate type
	Reindexing dataframe by read_ids
		10,000 Final valid reads
Loading plotting interface
Generating HTML report
	Parsing html config file
	Running method summary
		Computing plot
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
		There is only one run_id
	Running method read_len_1D
		Computing plot
	Running method align_len_1D
		No Alignment information available
	Running method read_qual_1D
		Computing plot
	Running method align_score_1D
		No align score information available
	Running method read_len_

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Guppy-2.1.3_basecall-1D_RNA.html)

[JSON OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Guppy-2.1.3_basecall-1D_RNA.json)
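
The JSON report can be loaded with any JSON parser. Its internal layout is not described on this page, so the sketch below makes only one assumption (that the top-level value is a JSON object) and simply lists the top-level sections. The helper name `top_level_sections` is hypothetical.

```python
import json
from pathlib import Path


def top_level_sections(report_path):
    """Return the sorted top-level keys of a JSON report file."""
    with Path(report_path).open() as handle:
        report = json.load(handle)
    # Assumption: the report is a JSON object at the top level
    if not isinstance(report, dict):
        raise ValueError("Expected a JSON object at the top level of the report")
    return sorted(report)


if __name__ == "__main__":
    path = Path("./results/pycoQC/Guppy-2.1.3_basecall-1D_RNA.json")
    # Only inspect the report if the command above has been run
    if path.exists():
        print(top_level_sections(path))
```

Listing the keys first is a cheap way to discover what a given report contains before writing code that depends on specific fields.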

#### Including Guppy barcoding information

pycoQC \
    -f ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz \
    -b ./data/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz \
    -o ./results/pycoQC/Guppy-2.1.3_basecall-1D_DNA_barcode.html

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['c4981b897c2bb47fed99916c19c9bd1bd43267a2']
	Reordering runids
		Processing reads with Run_ID c4981b897c2bb47fed99916c19c9bd1bd43267a2 / time offset: 0
	Cleaning up low frequency barcodes
		0 reads with low frequency barcode unset
	Cast value to appropriate type
	Reindexing dataframe by read_ids
		10,000 Final valid reads
Loading plotting interface
Generating HTML report
	Parsing html config file
	Running method summary
		Computing plot
	Running method barcode_summary
		Computing plot
	Running method run_id_summary
		There is only one run_id
	Running method read_len_1D
		Computing plot
	Running method align_len_1D
		No Alignment information available
	Running method read_qual_1D
		Computing plot
	Running method align_score_1D
		N

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Guppy-2.1.3_basecall-1D_DNA_barcode.html)

#### Matching multiple files with a glob pattern and adding a title to the report

pycoQC \
    -f ./data/Albacore*RNA* \
    -o ./results/pycoQC/Albacore_all_RNA.html \
    --report_title "All RNA runs"

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		813 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['7ae4f0a6d2b7ba3e0248496b7de9cd5d1c028415', '5074e0cd71f372314c30ca5158aab2172d915023', 'c675730269f2f96f300f1cfa613fe89c53b344c3', '2b9163100702bba6ac29d37dbc96ccad740aa05d', 'd0054681152930b21276405d948b115e46968ca6', '71055637dd56eca9416305332eba1ed37bbfffe1', '9835d20f1d205bdbd1fb4d464ae778de95beab24', 'db5916f2fe7957afac1d0aaccdec883342c4bc31', '93fa1ad3ebc8a6e505d991bcb052c2b8ceb278b5', '17b317b994031430f350cda1dc13a72f66572ece']
	Reordering runids
		Processing reads with Run_ID 7ae4f0a6d2b7ba3e0248496b7de9cd5d1c028415 / time offset: 0
		Processing reads with Run_ID 5074e0cd71f372314c30ca5158aab2172d915023 / time offset: 5309.74734
		Processing reads with Run_ID c675730269f2f96f300f1cfa613fe89c53b344c3 / time offset: 1591

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Albacore_all_RNA.html)
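
To preview which file names a pattern such as `Albacore*RNA*` selects, the matching can be reproduced in Python. This is a rough sketch: `fnmatchcase` approximates shell wildcard matching for plain file names, the helper `matching_files` is hypothetical, and the third file name in the list is invented for illustration.

```python
from fnmatch import fnmatchcase


def matching_files(file_names, pattern):
    """Return the file names matching a shell-style wildcard pattern."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


file_names = [
    "Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz",
    "Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz",
    # Hypothetical file name, used only to show a positive match
    "Albacore-0.0.0_example-RNA_sequencing_summary.txt.gz",
]

for name in matching_files(file_names, "Albacore*RNA*"):
    print(name)
```

Only names that start with `Albacore` and contain `RNA` later in the name are selected, so DNA runs and Guppy runs are left out.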

#### Tweak filtering parameters

* Flag reads with a quality above 8 as pass
* Discard reads aligned on the calibration standard
* Unset value of any barcode found in less than 10% of the reads

pycoQC \
    -f ./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/pycoQC/Albacore-2.1.10_basecall-1D-DNA.html \
    --min_pass_qual 8 \
    --filter_calibration \
    --min_barcode_percent 10

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		40 reads discarded
	Filtering out calibration strand reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['aae4df85078f7fe690547aeb688f2640644f323c', 'f54aa9064eb703797b98c83804bd65541b1ffc1b']
	Reordering runids
		Processing reads with Run_ID aae4df85078f7fe690547aeb688f2640644f323c / time offset: 0
		Processing reads with Run_ID f54aa9064eb703797b98c83804bd65541b1ffc1b / time offset: 72.23357
	Cast value to appropriate type
	Reindexing dataframe by read_ids
		9,960 Final valid reads
Loading plotting interface
Generating HTML report
	Parsing html config file
	Running method summary
		Computing plot
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
		Computing plot
	Running method read_len_1D
		Computing plot
	Running method a

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Albacore-2.1.10_basecall-1D-DNA.html)
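
The idea behind `--min_barcode_percent` can be illustrated with a few lines of Python. This is a sketch of the concept, not pycoQC's code: the function name, the `unclassified` placeholder label, and the rule that a barcode at exactly the threshold is kept are all assumptions.

```python
from collections import Counter


def unset_low_frequency_barcodes(barcodes, min_percent=10, unset_label="unclassified"):
    """Replace barcodes seen in fewer than min_percent of reads by unset_label."""
    total = len(barcodes)
    counts = Counter(barcodes)
    # Keep a barcode when count / total >= min_percent / 100 (assumed rule)
    kept = {bc for bc, n in counts.items() if n * 100 >= min_percent * total}
    return [bc if bc in kept else unset_label for bc in barcodes]


# 20 reads: barcode03 is found in 5% of them and falls below the 10% threshold
reads = ["barcode01"] * 12 + ["barcode02"] * 7 + ["barcode03"]
cleaned = unset_low_frequency_barcodes(reads, min_percent=10)
print(Counter(cleaned))
```

Rare barcodes are typically misassignments, so folding them into a single unset category keeps the per-barcode plots readable.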

#### Including alignment information from a BAM file

pycoQC \
    -f ./large_data/sample_1_sequencing_summary.txt \
    -a ./large_data/sample_1.bam \
    -o ./results/pycoQC/Guppy-2.3_basecall-1D_alignment-DNA.html \
    -j ./results/pycoQC/Guppy-2.3_basecall-1D_alignment-DNA.json

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['ede7e01619570f500a43fd5d33ff8ab25d1b589b']
	Reordering runids
		Processing reads with Run_ID ede7e01619570f500a43fd5d33ff8ab25d1b589b / time offset: 0
	Cast value to appropriate type
	Reindexing dataframe by read_ids
		12,000 Final valid reads
Loading plotting interface
Generating HTML report
	Parsing html config file
	Running method summary
		Computing plot
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
		There is only one run_id
	Running method read_len_1D
		Computing plot
	Running method align_len_1D
		Computing plot
	Running method read_qual_1D
		Computing plot
	Running method align_score_1D
		Computing plot
	Running method read_len_read_qual_2D
		Computing plot
	Running met

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Guppy-2.3_basecall-1D_alignment-DNA.html)

[JSON OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Guppy-2.3_basecall-1D_alignment-DNA.json)

#### Advanced configuration with custom JSON file

Although we recommend sticking to the default parameters, a JSON-formatted configuration file can be provided to tweak the plots. A default configuration file can be generated using:

pycoQC --default_config

{
  "summary": {
    "plot_title": "Run summary"
  },
  "barcode_summary": {
    "plot_title": "Run summary by barcode"
  },
  "run_id_summary": {
    "plot_title": "Run summary by Run ID"
  },
  "read_len_1D": {
    "plot_title": "Basecalled reads length",
    "color": "lightsteelblue",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "align_len_1D": {
    "plot_title": "Reads alignment length",
    "color": "mediumseagreen",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "read_qual_1D": {
    "plot_title": "Reads PHRED quality",
    "color": "salmon",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "align_score_1D": {
    "plot_title": "Reads alignment score",
    "color": "sandybrown",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "read_len_read_qual_2D": {
    "plot_title": "Basecalled reads length vs reads PHRED quality",
    "x_nbins": 200,
    "y_nbins": 100,
    "smooth_sigma": 2
  },
  "read_len_align_len_2D": {
    "plot_title": "Basecalled reads length vs alignments length",
 

To save and edit it, redirect the standard output to a file and make your changes with your favorite text editor.

To remove a plot from the report, remove its entry from the configuration file (or comment it out).

The configuration file accepts all the arguments of the target plotting functions. For more information, refer to the API documentation.

pycoQC --default_config > data/pycoQC_config.json
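
Editing the configuration can also be scripted. The sketch below works on a small excerpt of the default configuration shown above, removes two plot entries, and changes one option. The helper `drop_plots` is hypothetical, and the new `nbins` value is an arbitrary example.

```python
import copy
import json


def drop_plots(config, plot_names):
    """Return a deep copy of the configuration without the named plot entries."""
    to_drop = set(plot_names)
    return {
        name: copy.deepcopy(options)
        for name, options in config.items()
        if name not in to_drop
    }


# Excerpt of the default configuration printed by `pycoQC --default_config`
example_config = {
    "summary": {"plot_title": "Run summary"},
    "barcode_summary": {"plot_title": "Run summary by barcode"},
    "run_id_summary": {"plot_title": "Run summary by Run ID"},
    "read_len_1D": {
        "plot_title": "Basecalled reads length",
        "color": "lightsteelblue",
        "nbins": 200,
        "smooth_sigma": 2,
    },
}

trimmed = drop_plots(example_config, ["barcode_summary", "run_id_summary"])
# Arbitrary example tweak: use fewer histogram bins for the read length plot
trimmed["read_len_1D"]["nbins"] = 100
print(json.dumps(trimmed, indent=2))
```

To apply the same changes to the saved file, load `data/pycoQC_config.json` with `json.load`, pass the result through `drop_plots`, and write it back with `json.dump`.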

Run pycoQC with the `--config_file` option:

pycoQC \
    -f ./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/pycoQC/Albacore-1.7.0_basecall-1D-DNA.html \
    --config_file ./data/pycoQC_config.json

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		438 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['db5916f2fe7957afac1d0aaccdec883342c4bc31']
	Reordering runids
		Processing reads with Run_ID db5916f2fe7957afac1d0aaccdec883342c4bc31 / time offset: 0
	Cleaning up low frequency barcodes
		0 reads with low frequency barcode unset
	Cast value to appropriate type
	Reindexing dataframe by read_ids
		9,562 Final valid reads
Loading plotting interface
Generating HTML report
	Parsing html config file
	Running method summary
		Computing plot
	Running method read_len_1D
		Computing plot
	Running method read_qual_1D
		Computing plot
	Running method read_len_read_qual_2D
		Computing plot
	Running method output_over_time
		Computing plot
	Running method len_over_time
		Computing plot
	Running method qual_over_time
		Computing plot
	Loadi

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/results/pycoQC/Albacore-1.7.0_basecall-1D-DNA.html)