auto-illumina-run-qc-check

Automated run-level QC check for illumina sequencing runs.

Installation

Clone this repo:

git clone git@github.com:BCCDC-PHL/auto-illumina-run-qc-check.git
cd auto-illumina-run-qc-check

Create a conda env, including python (>3.5,<3.11) and pip:

conda env create -f environment.yml -p ~/.conda/envs/auto-illumina-run-qc-check

...or with mamba:

mamba env create -f environment.yml -p ~/.conda/envs/auto-illumina-run-qc-check

Activate the conda env

conda activate auto-illumina-run-qc-check

Use pip to install:

pip install .

If you're developing/editing the source code then you can install in 'editable' mode:

pip install -e .

This will allow any changes to the codebase to be made immediately reflected in the tool.

Usage

Start the tool as follows:

auto-illumina-run-qc-check --config config.json

See the Configuration section of this document for details on preparing a configuration file.

More detailed logs can be produced by controlling the log level using the --log-level flag:

auto-illumina-run-qc-check --config config.json --log-level debug

Configuration

This tool takes a single config file, in JSON format, with the following structure:

{
    "excluded_runs_list": "excluded_runs.csv",
    "scan_interval_seconds": 10,
    "run_parent_dirs": [
	"/path/to/M00123/23",
	"/path/to/VH00123/23"
    ],
    "qc_thresholds": [
	{
	    "metric": "ErrorRate",
	    "threshold": 0.3,
	    "pass_above_or_below": "below",
	    "instrument_type": "MiSeq"
	},
	{
	    "metric": "ErrorRate",
	    "threshold": 0.3,
	    "pass_above_or_below": "below",
	    "instrument_type": "NextSeq"
	},
	{
	    "metric": "PercentGtQ30",
	    "threshold": 0.75,
	    "pass_above_or_below": "above"
	},
	{
	    "metric": "PercentAligned",
	    "threshold": 0.0,
	    "pass_above_or_below": "above"
	},
	{
	    "metric": "SumSampleFastqFileSizesMb",
	    "threshold": 100.0,
	    "pass_above_or_below": "above"
	}
    ]
}

Note that the keys instrument_type and flowcell_version are optional for each threshold. If included, they will only be applied to runs matching those values. The only supported instrument types are MiSeq and NextSeq. If those keys are not included, the threshold will be applied to all runs regardless of instrument type or flowcell version.

Outputs

This tool will write a file named qc_check_complete.json, with the following format:

{
  "checked_metrics": [
    {
      "metric": "ErrorRate",
      "value": 0.34,
      "threshold": 3.0,
      "pass_above_or_below": "below",
      "pass_fail": "PASS"
    },
    {
      "metric": "PercentGtQ30",
      "value": 89.62,
      "threshold": 70.0,
      "pass_above_or_below": "above",
      "pass_fail": "PASS"
    },
    {
      "metric": "PercentPf",
      "value": 77.4,
      "threshold": 60.0,
      "pass_above_or_below": "above",
      "pass_fail": "PASS"
    },
    {
      "metric": "SumSampleFastqFileSizesMb",
      "value": 236267.27,
      "threshold": 100.0,
      "pass_above_or_below": "above",
      "pass_fail": "PASS"
    }
  ],
  "overall_pass_fail": "PASS",
  "sequencing_run_id": "240101_VH00123_100_AAG4WXGB5",
  "instrument_type": "nextseq",
  "run_parameters": {
    "flowcell_version": "2"
  },
  "timestamp_qc_check_started": "2024-01-10T15:45:28.074190",
  "timestamp_qc_check_completed": "2024-01-10T15:45:28.152267"
}

It will also write a more detailed file named <RUN_ID>_qc_metrics.json.

The metrics at the top-level of this file are available to use for making QC pass/fail decisions.

{
  "ErrorRateR1": 0.24,
  "PercentGtQ30R1": 91.76,
  "ErrorRateR2": 0.44,
  "PercentGtQ30R2": 87.43,
  "NonIndexedErrorRate": 0.34,
  "NonIndexedIntensityCycle1": 126.0,
  "NonIndexedPercentAligned": 37.75,
  "NonIndexedPercentGtQ30": 89.59,
  "NonIndexedProjectedTotalYield": 37.54,
  "NonIndexedYieldTotal": 37.54,
  "ErrorRate": 0.34,
  "IntensityCycle1": 130.0,
  "PercentAligned": 37.75,
  "PercentGtQ30": 89.62,
  "ProjectedTotalYield": 39.8,
  "YieldTotal": 39.8,
  "ClusterDensity": 4974000,
  "PercentPf": 77.4,
  "Reads": [
    {
      "ReadNumber": 1,
      "IsIndexed": false,
      "YieldTotal": 18.77,
      "ProjectedTotalYield": 18.77,
      "PercentAligned": 38.08,
      "ErrorRate": 0.24,
      "IntensityCycle1": 130.0,
      "PercentGtQ30": 91.76
    },
    {
      "ReadNumber": 2,
      "IsIndexed": true,
      "YieldTotal": 1.13,
      "ProjectedTotalYield": 1.13,
      "PercentAligned": 0.0,
      "ErrorRate": 0,
      "IntensityCycle1": 133.0,
      "PercentGtQ30": 91.73
    },
    {
      "ReadNumber": 3,
      "IsIndexed": true,
      "YieldTotal": 1.13,
      "ProjectedTotalYield": 1.13,
      "PercentAligned": 0.0,
      "ErrorRate": 0,
      "IntensityCycle1": 133.0,
      "PercentGtQ30": 88.41
    },
    {
      "ReadNumber": 4,
      "IsIndexed": false,
      "YieldTotal": 18.77,
      "ProjectedTotalYield": 18.77,
      "PercentAligned": 37.43,
      "ErrorRate": 0.44,
      "IntensityCycle1": 123.0,
      "PercentGtQ30": 87.43
    }
  ],
  "LanesByRead": [
    {
      "ReadNumber": 1,
      "LaneNumber": 1,
      "TileCount": 32,
      "Density": 4974000,
      "DensityDeviation": 0.0,
      "PercentPf": 77.4,
      "PercentPfDeviation": 1.31,
      "Reads": 161440000,
      "ReadsPf": 124940000,
      "PercentGtQ30": 91.76,
      "Yield": 18.77,
      "CyclesError": "150",
      "PercentAligned": 38.08,
      "PercentAlignedDeviation": 0.22,
      "ErrorRate": 0.24,
      "ErrorRateDeviation": 0.05,
      "ErrorRate35": 0.08,
      "ErrorRate35Deviation": 0.02,
      "ErrorRate75": 0.12,
      "ErrorRate75Deviation": 0.03,
      "ErrorRate100": 0.15,
      "ErrorRate100Deviation": 0.03,
      "IntensityCycle1": 130.0,
      "IntensityCycle1Deviation": 10.0,
      "PhasingSlope": 0.092,
      "PhasingOffset": 1.326,
      "PrePhasingSlope": 0.122,
      "PrePhasingOffset": 0.592,
      "ClusterDensity": 4974000,
      "Occupancy": 96.48
    },
    {
      "ReadNumber": 2,
      "LaneNumber": 1,
      "TileCount": 32,
      "Density": 4974000,
      "DensityDeviation": 0.0,
      "PercentPf": 77.4,
      "PercentPfDeviation": 1.31,
      "Reads": 161440000,
      "ReadsPf": 124940000,
      "PercentGtQ30": 91.73,
      "Yield": 1.13,
      "CyclesError": "0",
      "PercentAligned": 0,
      "PercentAlignedDeviation": 0,
      "ErrorRate": 0,
      "ErrorRateDeviation": 0,
      "ErrorRate35": 0,
      "ErrorRate35Deviation": 0,
      "ErrorRate75": 0,
      "ErrorRate75Deviation": 0,
      "ErrorRate100": 0,
      "ErrorRate100Deviation": 0,
      "IntensityCycle1": 133.0,
      "IntensityCycle1Deviation": 8.0,
      "PhasingSlope": 0,
      "PhasingOffset": 0,
      "PrePhasingSlope": 0,
      "PrePhasingOffset": 0,
      "ClusterDensity": 4974000,
      "Occupancy": 96.48
    },
    {
      "ReadNumber": 3,
      "LaneNumber": 1,
      "TileCount": 32,
      "Density": 4974000,
      "DensityDeviation": 0.0,
      "PercentPf": 77.4,
      "PercentPfDeviation": 1.31,
      "Reads": 161440000,
      "ReadsPf": 124940000,
      "PercentGtQ30": 88.41,
      "Yield": 1.13,
      "CyclesError": "0",
      "PercentAligned": 0,
      "PercentAlignedDeviation": 0,
      "ErrorRate": 0,
      "ErrorRateDeviation": 0,
      "ErrorRate35": 0,
      "ErrorRate35Deviation": 0,
      "ErrorRate75": 0,
      "ErrorRate75Deviation": 0,
      "ErrorRate100": 0,
      "ErrorRate100Deviation": 0,
      "IntensityCycle1": 133.0,
      "IntensityCycle1Deviation": 7.0,
      "PhasingSlope": 0,
      "PhasingOffset": 0,
      "PrePhasingSlope": 0,
      "PrePhasingOffset": 0,
      "ClusterDensity": 4974000,
      "Occupancy": 96.48
    },
    {
      "ReadNumber": 4,
      "LaneNumber": 1,
      "TileCount": 32,
      "Density": 4974000,
      "DensityDeviation": 0.0,
      "PercentPf": 77.4,
      "PercentPfDeviation": 1.31,
      "Reads": 161440000,
      "ReadsPf": 124940000,
      "PercentGtQ30": 87.43,
      "Yield": 18.77,
      "CyclesError": "150",
      "PercentAligned": 37.43,
      "PercentAlignedDeviation": 0.36,
      "ErrorRate": 0.44,
      "ErrorRateDeviation": 0.12,
      "ErrorRate35": 0.15,
      "ErrorRate35Deviation": 0.04,
      "ErrorRate75": 0.24,
      "ErrorRate75Deviation": 0.05,
      "ErrorRate100": 0.29,
      "ErrorRate100Deviation": 0.06,
      "IntensityCycle1": 123.0,
      "IntensityCycle1Deviation": 8.0,
      "PhasingSlope": 0.093,
      "PhasingOffset": 2.133,
      "PrePhasingSlope": 0.117,
      "PrePhasingOffset": 1.237,
      "ClusterDensity": 4974000,
      "Occupancy": 96.48
    }
  ],
  "SumSampleFastqFileSizesMb": 236267.27,
}

Logging

This tool outputs structured logs in JSON Lines format:

Every log line should include the fields:

timestamp
level
module
function_name
line_num
message

...and the contents of the message key will be a JSON object that includes at event_type. The remaining keys inside the message will vary by event type.

{"timestamp": "2022-09-22T11:32:52.287", "level": "INFO", "module", "core", "function_name": "scan", "line_num", 56, "message": {"event_type": "scan_start"}}

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
auto_illumina_run_qc_check		auto_illumina_run_qc_check
docs		docs
.gitignore		.gitignore
README.md		README.md
config_template.json		config_template.json
environment.yml		environment.yml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

auto-illumina-run-qc-check

Installation

Usage

Configuration

Outputs

Logging

About

Releases 6

Packages

Contributors 2

Languages

BCCDC-PHL/auto-illumina-run-qc-check

Folders and files

Latest commit

History

Repository files navigation

auto-illumina-run-qc-check

Installation

Usage

Configuration

Outputs

Logging

About

Topics

Resources

Stars

Watchers

Forks

Releases 6

Packages 0

Contributors 2

Languages

Packages