# pycoQC CLI Usage

The pycoQC CLI generates an HTML report containing interactive D3.js plots. It can also write summary information to a JSON-formatted file, which is easy to parse with third-party tools.

The report is dynamically generated depending on the information available in the summary file.

## CLI Usage

### Activate virtual environment

In [None]:
# Using virtualenvwrapper here but can also be done with Conda 
workon pycoQC

### Getting help

In [6]:
pycoQC -h

usage: pycoQC [-h] [--version]
              [--summary_file [SUMMARY_FILE [SUMMARY_FILE ...]]]
              [--barcode_file [BARCODE_FILE [BARCODE_FILE ...]]]
              [--bam_file [BAM_FILE [BAM_FILE ...]]]
              [--html_outfile HTML_OUTFILE] [--json_outfile JSON_OUTFILE]
              [--min_pass_qual MIN_PASS_QUAL] [--filter_calibration]
              [--min_barcode_percent MIN_BARCODE_PERCENT]
              [--report_title REPORT_TITLE] [--template_file TEMPLATE_FILE]
              [--config_file CONFIG_FILE] [--sample SAMPLE] [--default_config]
              [-v | -q]

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore technologies basecallers

* Minimal usage
    pycoQC -f sequencing_summary.txt -o pycoQC_output.html
* Including Guppy barcoding file + html output + json output + log output
    pycoQC -f sequencing_summary.txt -b barcoding_sequencing.txt -o pycoQC_output.html -j pycoQC_output.jso


### Usage examples

#### Basic usage (quiet mode)

In [9]:
pycoQC \
    -f data/summary/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o data/output/Albacore-1.2.1_basecall-1D-DNA.html \
    --quiet

Checking arguments values
Initialising parser
Parsing input files
Loading plotting interface

[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore-1.2.1_basecall-1D-DNA.html)
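
When pycoQC is called from a pipeline script rather than typed by hand, the same command can be assembled as an argument list. The sketch below is illustrative only: `build_pycoqc_command` is a hypothetical helper, not part of pycoQC, and it relies solely on the `-f`, `-o`, `-j` and `--quiet` flags used on this page.

```python
from typing import List, Optional


def build_pycoqc_command(
    summary_file: str,
    html_outfile: str,
    json_outfile: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble a pycoQC argument list (hypothetical helper, not part of pycoQC)."""
    cmd = ["pycoQC", "-f", summary_file, "-o", html_outfile]
    if json_outfile is not None:
        # Optional JSON report written alongside the HTML report
        cmd += ["-j", json_outfile]
    if quiet:
        cmd.append("--quiet")
    return cmd


cmd = build_pycoqc_command(
    "data/summary/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz",
    "data/output/Albacore-1.2.1_basecall-1D-DNA.html",
    quiet=True,
)
print(" ".join(cmd))

# To actually launch the command (requires pycoQC to be installed and on the PATH):
#     import subprocess
#     subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when file names contain spaces.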

#### JSON data output in addition to the HTML report

In [3]:
pycoQC \
    -f data/summary/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz \
    -o data/output/Guppy-2.1.3_basecall-1D_RNA.html \
    -j data/output/Guppy-2.1.3_basecall-1D_RNA.json

Checking arguments values
Initialising parser
Parsing input files
	Importing sequencing information from sequencing summary files
	Verifying fields and discarding unused columns
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['9835d20f1d205bdbd1fb4d464ae778de95beab24']
	Reordering runids
		Processing reads with Run_ID 9835d20f1d205bdbd1fb4d464ae778de95beab24 / time offset: 0
	Reindexing dataframe by read_ids
		10,000 Final valid reads
Loading plotting interface
	Parsing html config file
	Running method summary
	Plotting overall reads summary
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
	Plotting reads summary by run_id
	Running method reads_len_1D
	Plotting read length distribution
	Running method reads_qual_1D
	Plotting read quality distribution
	Running method reads_len_qual_2D
	Plotting read length vs read qual


[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Guppy-2.1.3_basecall-1D_RNA.html)

[JSON OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Guppy-2.1.3_basecall-1D_RNA.json)
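
The JSON report can be read with any JSON parser. The layout of the file is not described on this page, so the sketch below makes only one assumption, that the top level is a JSON object, and simply lists the top-level keys with the type of each value. The function names are illustrative, not part of pycoQC.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_report(json_path: str) -> Dict[str, Any]:
    """Load a pycoQC JSON report into a dictionary."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def describe_top_level(report: Dict[str, Any]) -> Dict[str, str]:
    """Map each top-level key to the type name of its value."""
    return {key: type(value).__name__ for key, value in report.items()}


# Path taken from the command above; nothing is printed if the file is absent
json_path = Path("data/output/Guppy-2.1.3_basecall-1D_RNA.json")
if json_path.exists():
    for key, kind in describe_top_level(load_report(str(json_path))).items():
        print(f"{key}: {kind}")
```

Inspecting the keys first is a cautious way to explore the report before writing code that depends on specific fields.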

#### Including Guppy barcoding information

In [4]:
pycoQC \
    -f data/summary/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz \
    -b data/summary/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz \
    -o data/output/Guppy-2.1.3_basecall-1D_DNA_barcode.html

Checking arguments values
Initialising parser
Parsing input files
	Importing sequencing information from sequencing summary files
	Verifying fields and discarding unused columns
	Importing barcode information from barcode summary files
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['c4981b897c2bb47fed99916c19c9bd1bd43267a2']
	Reordering runids
		Processing reads with Run_ID c4981b897c2bb47fed99916c19c9bd1bd43267a2 / time offset: 0
	Cleaning up low frequency barcodes
		0 reads with low frequency barcode unset
	Reindexing dataframe by read_ids
		10,000 Final valid reads
Loading plotting interface
	Parsing html config file
	Running method summary
	Plotting overall reads summary
	Running method barcode_summary
	Plotting reads summary by barcode
	Running method run_id_summary
	Plotting reads summary by run_id
	Running method reads_len_1D
	Plotting read length distributi


[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Guppy-2.1.3_basecall-1D_DNA_barcode.html)

#### Matching multiple files with a wildcard pattern and adding a title to the report

In [7]:
pycoQC \
    -f data/summary/Albacore*RNA* \
    -o data/output/Albacore_all_RNA.html \
    --report_title "All RNA runs"

Checking arguments values
Initialising parser
Parsing input files
	Importing sequencing information from sequencing summary files
	Verifying fields and discarding unused columns
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		813 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['7ae4f0a6d2b7ba3e0248496b7de9cd5d1c028415', '5074e0cd71f372314c30ca5158aab2172d915023', 'c675730269f2f96f300f1cfa613fe89c53b344c3', '2b9163100702bba6ac29d37dbc96ccad740aa05d', 'd0054681152930b21276405d948b115e46968ca6', '71055637dd56eca9416305332eba1ed37bbfffe1', '9835d20f1d205bdbd1fb4d464ae778de95beab24', 'db5916f2fe7957afac1d0aaccdec883342c4bc31', '93fa1ad3ebc8a6e505d991bcb052c2b8ceb278b5', '17b317b994031430f350cda1dc13a72f66572ece']
	Reordering runids
		Processing reads with Run_ID 7ae4f0a6d2b7ba3e0248496b7de9cd5d1c028415 / time offset: 0
		Processing reads with Run_ID 5074e0cd71f372314c30ca5158aab2172d915023 / time offset: 5309.74734
		Pr


[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore_all_RNA.html)
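
Before combining many runs into one report, it can help to check which files a pattern such as `Albacore*RNA*` selects. The sketch below applies the same shell-style wildcard rules to a list of file names with Python's `fnmatch` module. The first two names come from this page; the third is a hypothetical placeholder added for illustration.

```python
import fnmatch
from typing import List


def matching_names(names: List[str], pattern: str) -> List[str]:
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


names = [
    "Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz",
    "Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz",
    # Hypothetical file name, used only to show a positive match
    "Albacore-X.Y.Z_basecall-1D-RNA_sequencing_summary.txt.gz",
]

matched = matching_names(names, "Albacore*RNA*")
for name in matched:
    print(name)

# On a real directory, glob.glob("data/summary/Albacore*RNA*") lists the
# matching paths directly.
```

Only names that start with `Albacore` and contain `RNA` later in the name are kept, so the DNA run and the Guppy run are excluded.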

#### Tweak filtering parameters

* Flag reads with a quality above 8 as pass
* Discard reads aligned on the calibration standard
* Unset the value of any barcode found in fewer than 10% of the reads

In [12]:
pycoQC \
    -f data/summary/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o data/output/Albacore-2.1.10_basecall-1D-DNA.html \
    --min_pass_qual 8 \
    --filter_calibration \
    --min_barcode_percent 10

Checking arguments values
Initialising parser
Parsing input files
	Importing sequencing information from sequencing summary files
	Verifying fields and discarding unused columns
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		40 reads discarded
	Filtering out calibration strand reads
		0 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['aae4df85078f7fe690547aeb688f2640644f323c', 'f54aa9064eb703797b98c83804bd65541b1ffc1b']
	Reordering runids
		Processing reads with Run_ID aae4df85078f7fe690547aeb688f2640644f323c / time offset: 0
		Processing reads with Run_ID f54aa9064eb703797b98c83804bd65541b1ffc1b / time offset: 72.23357
	Reindexing dataframe by read_ids
		9,960 Final valid reads
Loading plotting interface
	Parsing html config file
	Running method summary
	Plotting overall reads summary
	Running method barcode_summary
		No barcode information available
	Running method run_id_summary
	Plotting reads summary by run_i


[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore-2.1.10_basecall-1D-DNA.html)
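
The effect of the three filtering options used in this example can be sketched on a toy list of reads. This is not pycoQC's implementation: the field names (`mean_qscore`, `barcode`, `calibration`) are hypothetical, and treating the quality threshold as inclusive is an assumption.

```python
from collections import Counter
from typing import Any, Dict, List

Read = Dict[str, Any]


def apply_filters(
    reads: List[Read],
    min_pass_qual: float = 8.0,
    filter_calibration: bool = True,
    min_barcode_percent: float = 10.0,
) -> List[Read]:
    """Toy version of the three filters; field names are hypothetical."""
    # 1. Discard reads aligned on the calibration standard (copy the rest)
    kept = [dict(r) for r in reads if not (filter_calibration and r.get("calibration"))]

    # 2. Flag reads as pass/fail (inclusive threshold is an assumption)
    for read in kept:
        read["passed"] = read["mean_qscore"] >= min_pass_qual

    # 3. Unset barcodes found in too small a fraction of the remaining reads
    counts = Counter(r["barcode"] for r in kept if r.get("barcode") is not None)
    total = len(kept)
    for read in kept:
        barcode = read.get("barcode")
        if barcode is not None and 100.0 * counts[barcode] / total < min_barcode_percent:
            read["barcode"] = None
    return kept


# Ten BC01 reads, one low-quality BC02 read, one calibration read
reads = (
    [{"mean_qscore": 9.0, "barcode": "BC01", "calibration": False} for _ in range(10)]
    + [{"mean_qscore": 7.5, "barcode": "BC02", "calibration": False}]
    + [{"mean_qscore": 12.0, "barcode": "BC01", "calibration": True}]
)
filtered = apply_filters(reads)
print(len(filtered), "reads kept")
print(sum(r["passed"] for r in filtered), "reads flagged as pass")
print(sum(r["barcode"] is None for r in filtered), "reads with barcode unset")
```

In this toy data set the calibration read is dropped first, so the barcode percentages are computed over the 11 remaining reads and the single BC02 read falls below the 10% cutoff.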

#### Advanced configuration with a custom JSON file

Although we recommend sticking to the default parameters, a JSON-formatted configuration file can be provided to tweak the plots. A default configuration file can be generated using:

In [16]:
pycoQC --default_config

{
  "summary": {
    "plot_title": "Run summary"
  },
  "barcode_summary": {
    "plot_title": "Run summary by barcode"
  },
  "run_id_summary": {
    "plot_title": "Run summary by Run ID"
  },
  "reads_len_1D": {
    "plot_title": "Distribution of read length",
    "color": "lightsteelblue",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "reads_qual_1D": {
    "plot_title": "Distribution of read quality",
    "color": "salmon",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "reads_len_qual_2D": {
    "plot_title": "Mean read quality per sequence length",
    "colorscale": [
      [
        0.0,
        "rgba(255,255,255,0)"
      ],
      [
        0.1,
        "rgba(255,150,0,0)"
      ],
      [
        0.25,
        "rgb(255,100,0)"
      ],
      [
        0.5,
        "rgb(200,0,0)"
      ],
      [
        0.75,
        "rgb(120,0,0)"
      ],
      [
        1.0,
        "rgb(70,0,0)"
      ]
    ],
    "len_nbins": 200,
    "qual_nbins": 75,
    "smooth_sigma": 2
  },
  "outpu


To save and edit the configuration, redirect the standard output to a file and make your changes in a text editor. For more information, refer to the API documentation.

In [17]:
pycoQC --default_config > data/pycoQC_config.json
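
The saved file can also be edited from a script instead of a text editor. The sketch below changes one option in a configuration dictionary and refuses keys that are not already present, which guards against typos. `set_plot_option` is a hypothetical helper, not part of pycoQC, and the dictionary is an excerpt of the default configuration printed above.

```python
import copy
import json
from typing import Any, Dict

Config = Dict[str, Dict[str, Any]]


def set_plot_option(config: Config, method: str, option: str, value: Any) -> Config:
    """Return a copy of config with one existing option changed."""
    if method not in config or option not in config[method]:
        raise KeyError(f"Unknown option {method!r} / {option!r}")
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    updated[method][option] = value
    return updated


# Excerpt of the default configuration shown above
default_config = {
    "reads_len_1D": {
        "plot_title": "Distribution of read length",
        "color": "lightsteelblue",
        "nbins": 200,
        "smooth_sigma": 2,
    }
}

custom_config = set_plot_option(default_config, "reads_len_1D", "nbins", 100)
print(json.dumps(custom_config, indent=2))

# To edit the real file, load it with json.load(), apply set_plot_option(),
# and write the result back with json.dump(..., indent=2).
```

Rejecting unknown keys is a design choice made for this sketch; how pycoQC itself handles unexpected entries in the configuration file is not stated on this page.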


Run pycoQC with the `--config_file` option (abbreviated to `--config` in the command below):

In [18]:
pycoQC \
    -f data/summary/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o data/output/Albacore-1.7.0_basecall-1D-DNA.html \
    --config data/pycoQC_config.json

Checking arguments values
Initialising parser
Parsing input files
	Importing sequencing information from sequencing summary files
	Verifying fields and discarding unused columns
	Discarding lines containing NA values
		0 reads discarded
	Filtering out zero length reads
		438 reads discarded
	Sorting run IDs by decreasing throughput
		Run-id order ['db5916f2fe7957afac1d0aaccdec883342c4bc31']
	Reordering runids
		Processing reads with Run_ID db5916f2fe7957afac1d0aaccdec883342c4bc31 / time offset: 0
	Cleaning up low frequency barcodes
		0 reads with low frequency barcode unset
	Reindexing dataframe by read_ids
		9,562 Final valid reads
Loading plotting interface
	Parsing html config file
	Running method summary
	Plotting overall reads summary
	Running method barcode_summary
	Plotting reads summary by barcode
	Running method run_id_summary
	Plotting reads summary by run_id
	Running method reads_len_1D
	Plotting read length distribution
	Running method reads_qual_1D
	Plotting read quality d


[HTML OUTPUT](https://a-slide.github.io/pycoQC/demo/data/output/Albacore-1.7.0_basecall-1D-DNA.html)