- Overview
- News
- Example report
- Annotation resources
- CPSR Documentation
- Getting started
- Related work
- Contact
The Cancer Predisposition Sequencing Reporter (CPSR) is a computational workflow that interprets germline variants identified from next-generation sequencing in the context of cancer predisposition. The workflow is integrated with the framework that underlies the Personal Cancer Genome Reporter (PCGR), utilizing the Docker environment for encapsulation of code and software dependencies. While PCGR is intended for reporting and analysis of somatic variants detected in a tumor, CPSR is intended for reporting and ranking of germline variants in protein-coding genes that are implicated in cancer predisposition and inherited cancer syndromes.
CPSR accepts a query file with raw germline variant calls encoded in the VCF format (i.e. analyzing SNVs/InDels). Furthermore, through the use of several different virtual cancer predisposition gene panels harvested from the Genomics England PanelApp, the user can flexibly restrict which genes and findings are displayed in the cancer predisposition report.
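To get a feel for what a query VCF contains before running the workflow, the sketch below counts SNVs and InDels in VCF data lines. It is a minimal illustration and not part of CPSR: the function names and the two example records (including their coordinates) are made up, and the classification follows the plain REF/ALT length convention of the VCF format.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a single REF/ALT pair as SNV, InDel, or other."""
    # Symbolic alleles (<DEL>), spanning deletions (*) and breakends are not SNVs/InDels.
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    if len(ref) != len(alt):
        return "InDel"
    return "other"  # e.g. multi-nucleotide substitutions


def summarize_vcf_lines(lines):
    """Count variant types across VCF data lines (header lines start with '#')."""
    counts = {"SNV": 0, "InDel": 0, "other": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):       # multi-allelic sites list several ALTs
            counts[classify_variant(ref, alt)] += 1
    return counts


# Illustrative records only; the positions are not taken from the CPSR example VCF.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "17\t41245466\t.\tG\tA\t.\tPASS\t.",
    "13\t32914437\t.\tGT\tG\t.\tPASS\t.",
]
print(summarize_vcf_lines(example))
```

For a bgzipped file such as `example.vcf.gz`, the same function can be fed lines from `gzip.open(path, "rt")`.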
Snapshots of sections in the cancer predisposition genome report:
The software performs extensive variant annotation on the selected geneset and produces an interactive HTML report, in which the user can investigate:
- ClinVar variants - pre-classified variants according to a five-level tier scheme in ClinVar (Pathogenic to Benign)
- Non-ClinVar variants - classified by CPSR through ACMG criteria (variant frequency levels and functional effects) into a five-level tier scheme (Pathogenic to Benign)
- Variant biomarkers - cancer predisposition variants with reported implications for prognosis, diagnosis or therapeutic regimens
- Secondary findings (optional) - pathogenic ClinVar variants in the ACMG recommended list for reporting of secondary findings
- GWAS hits (optional) - variants overlapping with previously identified hits in genome-wide association studies (GWAS) of cancer phenotypes (i.e. low to moderate risk conferring alleles), using the NHGRI-EBI Catalog of published genome-wide association studies as the underlying source
The variant sets can be explored interactively and narrowed down with several types of filters (phenotypes, genes, variant consequences, population MAF etc.). Importantly, the unclassified (i.e. non-ClinVar) variants are assigned a pathogenicity score by aggregating scores from previously established ACMG criteria. The ACMG criteria include cancer-specific criteria, as outlined and specified in several previous studies (Huang et al., Cell, 2018; Nykamp et al., Genet Med., 2017; Maxwell et al., Am J Hum Genet., 2016; Amendola et al., Am J Hum Genet., 2016). See also Related work below.
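The idea of aggregating criterion scores into a five-level classification can be sketched as follows. This is only an illustration of the general approach: the criterion codes are the standard ACMG/AMP labels, but the numeric weights and the tier cut-offs below are hypothetical and are not the values used by CPSR.

```python
# Hypothetical weights: positive values support pathogenicity, negative values
# support a benign interpretation. These are NOT the scores used by CPSR.
CRITERION_SCORES = {
    "PVS1": 8.0,   # null variant in a gene where loss of function is a known mechanism
    "PM2": 1.0,    # absent from, or extremely rare in, population databases
    "PP3": 0.5,    # computational evidence supports a deleterious effect
    "BA1": -8.0,   # allele frequency is too high for the disorder
    "BS1": -3.0,   # allele frequency is greater than expected for the disorder
    "BP4": -0.5,   # computational evidence suggests no impact
}

# Hypothetical cut-offs, checked from the highest score downwards.
TIER_THRESHOLDS = [
    (5.0, "Pathogenic"),
    (2.5, "Likely Pathogenic"),
    (-1.0, "Uncertain significance"),
    (-4.5, "Likely Benign"),
]


def pathogenicity_score(criteria):
    """Sum the scores of all criteria that were met for a variant."""
    return sum(CRITERION_SCORES[code] for code in criteria)


def assign_tier(score):
    """Map an aggregated score onto a five-level scheme (Pathogenic to Benign)."""
    for cutoff, label in TIER_THRESHOLDS:
        if score >= cutoff:
            return label
    return "Benign"


for met in (["PVS1", "PM2"], ["PP3"], ["BA1"]):
    score = pathogenicity_score(met)
    print(met, score, assign_tier(score))
```

A purely additive scheme keeps each piece of evidence independent and easy to audit, which is why it is used in this sketch; the actual criteria and calibration are described in the studies cited above.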
The cancer predisposition report can show variants found in a number of well-known cancer predisposition genes, and the specific set of genes can be customized by the user by choosing any of the following virtual gene panels (0 - 42):
- Panel 0 is a comprehensive, research-based superpanel assembled from known sources on cancer predisposition:
- A list of 152 genes that were curated and established within TCGA’s pan-cancer study (Huang et al., Cell, 2018)
- A list of 107 protein-coding genes that have been manually curated in COSMIC’s Cancer Gene Census v91
- Genes from all [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk/) panels for inherited cancers and tumor syndromes (detailed below)
- Additional genes deemed relevant for cancer predisposition (contributed by the CPSR user community)
The combination of the above sources resulted in a non-redundant set of n = 433 genes of relevance for cancer predisposition.
Are we missing genes of relevance for cancer predisposition/inherited tumor syndromes? Please forward a list of gene identifiers, preferably with mode of inheritance and literature support, to sigven AT ifi.uio.no, so we can include them in Panel 0.
- Panels 1 - 42 are panels for inherited cancers and tumor syndromes assembled within the Genomics England PanelApp:
- 1 = Adult solid tumours cancer susceptibility
- 2 = Adult solid tumours for rare disease
- 3 = Bladder cancer pertinent cancer susceptibility
- 4 = Brain cancer pertinent cancer susceptibility
- 5 = Breast cancer pertinent cancer susceptibility
- 6 = Childhood solid tumours cancer susceptibility
- 7 = Colorectal cancer pertinent cancer susceptibility
- 8 = Endometrial cancer pertinent cancer susceptibility
- 9 = Familial Tumours Syndromes of the central & peripheral Nervous system
- 10 = Familial breast cancer
- 11 = Familial melanoma
- 12 = Familial prostate cancer
- 13 = Familial rhabdomyosarcoma
- 14 = GI tract tumours
- 15 = Genodermatoses with malignancies
- 16 = Haematological malignancies cancer susceptibility
- 17 = Haematological malignancies for rare disease
- 18 = Head and neck cancer pertinent cancer susceptibility
- 19 = Inherited MMR deficiency (Lynch syndrome)
- 20 = Inherited non-medullary thyroid cancer
- 21 = Inherited ovarian cancer (without breast cancer)
- 22 = Inherited pancreatic cancer
- 23 = Inherited polyposis
- 24 = Inherited predisposition to acute myeloid leukaemia (AML)
- 25 = Inherited predisposition to GIST
- 26 = Inherited renal cancer
- 27 = Inherited phaeochromocytoma and paraganglioma
- 28 = Melanoma pertinent cancer susceptibility
- 29 = Multiple endocrine tumours
- 30 = Multiple monogenic benign skin tumours
- 31 = Neuroendocrine cancer pertinent cancer susceptibility
- 32 = Neurofibromatosis Type 1
- 33 = Ovarian cancer pertinent cancer susceptibility
- 34 = Parathyroid Cancer
- 35 = Prostate cancer pertinent cancer susceptibility
- 36 = Renal cancer pertinent cancer susceptibility
- 37 = Rhabdoid tumour predisposition
- 38 = Sarcoma cancer susceptibility
- 39 = Sarcoma susceptibility
- 40 = Thyroid cancer pertinent cancer susceptibility
- 41 = Tumour predisposition - childhood onset
- 42 = Upper gastrointestinal cancer pertinent cancer susceptibility
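Since the `--panel_id` option takes a comma-separated string of identifiers in the range 0 - 42, a small helper can validate a selection before it is passed on the command line. The helper below is a hypothetical convenience function, not part of CPSR.

```python
def build_panel_id(panel_ids):
    """Return a comma-separated --panel_id value after validating each identifier."""
    for pid in panel_ids:
        if not isinstance(pid, int) or not 0 <= pid <= 42:
            raise ValueError(f"panel identifier must be an integer from 0 to 42, got {pid!r}")
    # Drop duplicates and sort so the same selection always yields the same string.
    return ",".join(str(pid) for pid in sorted(set(panel_ids)))


# Breast- and ovarian-related panels from the list above (5, 10 and 21).
print(build_panel_id([5, 10, 21]))
```

The resulting string can be used directly as the argument value, e.g. `--panel_id 5,10,21`.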
- June 30th 2021: 0.6.2 release
- Updated bundle (ClinVar, CancerMine, UniprotKB, PanelApp, CIViC, GWAS catalog)
- Software upgrade (VEP, R/BioConductor)
- CHANGELOG
- November 30th 2020: 0.6.1 release
- Updated bundle (ClinVar, CancerMine, UniprotKB, CIViC, GWAS catalog)
- CHANGELOG
- VEP - Variant Effect Predictor v104 (GENCODE v38/v19 as the gene reference dataset), includes gnomAD r2.1, dbSNP build 154, 1000 Genomes Project - phase3
- ClinVar - Database of variants with clinical significance (June 2021)
- CIViC - clinical interpretations of variants in cancer (June 15th 2021)
- Cancer Hotspots - Resource for statistically significant mutations in cancer (v2 - 2017)
- dbNSFP - Database of non-synonymous functional predictions (v4.2, March 2021)
- UniProt/SwissProt KnowledgeBase - Resource on protein sequence and functional information (2021_03, June 2021)
- Pfam - Database of protein families and domains (v34.0, March 2021)
- CancerMine - Literature-derived database of tumor suppressor genes/proto-oncogenes (v36, June 2021)
- Genomics England PanelApp - panels as of June 25th 2021
- NHGRI-EBI GWAS catalog - GWAS catalog for cancer phenotypes (June 8th 2021)
IMPORTANT: If you use CPSR, please cite the following publication:
Sigve Nakken, Vladislav Saveliev, Oliver Hofmann, Pål Møller, Ola Myklebost, and Eivind Hovig. Cancer Predisposition Sequencing Reporter: a flexible variant report engine for high-throughput germline screening in cancer (2021). Int J Cancer. doi:10.1002/ijc.33749
An installation of Python (version 3.6) is required to run CPSR. Check that Python is installed by typing `python --version` in your terminal window.
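The same check can be done from within Python. The sketch below assumes that version 3.6 or any later 3.x release is acceptable, which is an assumption on our part; the helper name is hypothetical.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_minimum(sys.version_info):
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.6 is required, found:", sys.version.split()[0])
```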
IMPORTANT NOTE: STEP 1 & 2 below outline installation guidelines for running CPSR with Docker. If you want to install and run CPSR without the use of Docker (i.e. through Conda), follow these instructions
Make sure you have a working installation of PCGR (version 0.9.2) and the accompanying data bundle(s) (walk through steps 1-2).
Download the 0.6.2 release of cpsr (run script)
usage: cpsr.py -h [options] --input_vcf <INPUT_VCF> --pcgr_dir <PCGR_DIR> --output_dir <OUTPUT_DIR> --genome_assembly <GENOME_ASSEMBLY> --sample_id <SAMPLE_ID>
Cancer Predisposition Sequencing Reporter - report of clinically significant cancer-predisposing germline variants
Required arguments:
--input_vcf INPUT_VCF
VCF input file with germline query variants (SNVs/InDels).
--pcgr_dir PCGR_DIR Directory that contains the PCGR data bundle directory, e.g. ~/pcgr-0.9.2
--output_dir OUTPUT_DIR
Output directory
--genome_assembly {grch37,grch38}
Genome assembly build: grch37 or grch38
--sample_id SAMPLE_ID
Sample identifier - prefix for output files
Panel options:
--panel_id VIRTUAL_PANEL_ID
Comma-separated string with identifier(s) of predefined virtual cancer predisposition gene panels,
choose any combination of the following identifiers:
0 = CPSR exploratory cancer predisposition panel
(n = 335, Genomics England PanelApp / TCGA Germline Study / Cancer Gene Census / Other)
1 = Adult solid tumours cancer susceptibility (Genomics England PanelApp)
2 = Adult solid tumours for rare disease (Genomics England PanelApp)
3 = Bladder cancer pertinent cancer susceptibility (Genomics England PanelApp)
4 = Brain cancer pertinent cancer susceptibility (Genomics England PanelApp)
5 = Breast cancer pertinent cancer susceptibility (Genomics England PanelApp)
6 = Childhood solid tumours cancer susceptibility (Genomics England PanelApp)
7 = Colorectal cancer pertinent cancer susceptibility (Genomics England PanelApp)
8 = Endometrial cancer pertinent cancer susceptibility (Genomics England PanelApp)
9 = Familial Tumours Syndromes of the central & peripheral Nervous system (Genomics England PanelApp)
10 = Familial breast cancer (Genomics England PanelApp)
11 = Familial melanoma (Genomics England PanelApp)
12 = Familial prostate cancer (Genomics England PanelApp)
13 = Familial rhabdomyosarcoma (Genomics England PanelApp)
14 = GI tract tumours (Genomics England PanelApp)
15 = Genodermatoses with malignancies (Genomics England PanelApp)
16 = Haematological malignancies cancer susceptibility (Genomics England PanelApp)
17 = Haematological malignancies for rare disease (Genomics England PanelApp)
18 = Head and neck cancer pertinent cancer susceptibility (Genomics England PanelApp)
19 = Inherited MMR deficiency (Lynch syndrome) - Genomics England PanelApp
20 = Inherited non-medullary thyroid cancer (Genomics England PanelApp)
21 = Inherited ovarian cancer (without breast cancer) (Genomics England PanelApp)
22 = Inherited pancreatic cancer (Genomics England PanelApp)
23 = Inherited polyposis (Genomics England PanelApp)
24 = Inherited predisposition to acute myeloid leukaemia (AML) - Genomics England PanelApp
25 = Inherited predisposition to GIST (Genomics England PanelApp)
26 = Inherited renal cancer (Genomics England PanelApp)
27 = Inherited phaeochromocytoma and paraganglioma (Genomics England PanelApp)
28 = Melanoma pertinent cancer susceptibility (Genomics England PanelApp)
29 = Multiple endocrine tumours (Genomics England PanelApp)
30 = Multiple monogenic benign skin tumours (Genomics England PanelApp)
31 = Neuroendocrine cancer pertinent cancer susceptibility (Genomics England PanelApp)
32 = Neurofibromatosis Type 1 (Genomics England PanelApp)
33 = Ovarian cancer pertinent cancer susceptibility (Genomics England PanelApp)
34 = Parathyroid Cancer (Genomics England PanelApp)
35 = Prostate cancer pertinent cancer susceptibility (Genomics England PanelApp)
36 = Renal cancer pertinent cancer susceptibility (Genomics England PanelApp)
37 = Rhabdoid tumour predisposition (Genomics England PanelApp)
38 = Sarcoma cancer susceptibility (Genomics England PanelApp)
39 = Sarcoma susceptibility (Genomics England PanelApp)
40 = Thyroid cancer pertinent cancer susceptibility (Genomics England PanelApp)
41 = Tumour predisposition - childhood onset (Genomics England PanelApp)
42 = Upper gastrointestinal cancer pertinent cancer susceptibility (Genomics England PanelApp)
--custom_list CUSTOM_LIST
Provide custom list of genes from virtual panel 0 (single-column txt file with Ensembl gene identifiers),
alternative to predefined panels provided with --panel_id)
--custom_list_name CUSTOM_LIST_NAME
Set name for custom made panel/list (single word - no whitespace), will be displayed in the report
--diagnostic_grade_only
For panel_id's 1-42 (Genomics England PanelApp) - consider genes with a GREEN status only, default: False
VEP options:
--vep_n_forks VEP_N_FORKS
Number of forks (option '--fork' in VEP), default: 4
--vep_buffer_size VEP_BUFFER_SIZE
Variant buffer size (variants read into memory simultaneously, option '--buffer_size' in VEP)
- set lower to reduce memory usage, default: 500
--vep_pick_order VEP_PICK_ORDER
Comma-separated string of ordered transcript properties for primary variant pick
( option '--pick_order' in VEP), default: canonical,appris,biotype,ccds,rank,tsl,length,mane
--vep_no_intergenic Skip intergenic variants during processing (option '--no_intergenic' in VEP), default: False
vcfanno options:
--vcfanno_n_proc VCFANNO_N_PROC
Number of vcfanno processes (option '-p' in vcfanno), default: 4
Other options:
--force_overwrite By default, the script will fail with an error if any output file already exists.
You can force the overwrite of existing result files by using this flag, default: False
--version show program's version number and exit
--basic Run functional variant annotation on VCF through VEP/vcfanno, omit Tier assignment/report generation (STEP 4), default: False
--no_vcf_validate Skip validation of input VCF with Ensembl's vcf-validator, default: False
--docker_uid DOCKER_USER_ID
Docker user ID. Default is the host system user ID. If you are experiencing permission errors,
try setting this up to root (`--docker_uid root`), default: None
--no_docker Run the CPSR workflow in a non-Docker mode, default: False
--preserved_info_tags PRESERVED_INFO_TAGS
Comma-separated string of VCF INFO tags from query VCF that should be kept in CPSR output TSV
--report_theme {default,cerulean,journal,flatly,readable,spacelab,united,cosmo,lumen,paper,sandstone,simplex,yeti}
Visual report theme (rmarkdown), default: default
--report_nonfloating_toc
Do not float the table of contents (TOC) in output HTML report, default: False
--report_table_display {full,light}
Set the level of detail/comprehensiveness in interactive datatables of HTML report, very comprehensive (option 'full') or slim/focused ('light')
--ignore_noncoding Do not list non-coding variants in HTML report, default: False
--secondary_findings Include variants found in ACMG-recommended list for secondary findings (v3.0), default: False
--gwas_findings Report overlap with low to moderate cancer risk variants (tag SNPs) identified from genome-wide association studies, default: False
--gwas_p_value GWAS_P_VALUE
Required p-value for variants listed as hits from genome-wide association studies, default: 5e-06
--pop_gnomad {afr,amr,eas,sas,asj,nfe,fin,global}
Population source in gnomAD used for variant frequency assessment (ACMG classification), default: nfe
--maf_upper_threshold MAF_UPPER_THRESHOLD
Upper MAF limit (gnomAD global population frequency) for variants to be included in the report, default: 0.9
--classify_all Provide CPSR variant classifications (TIER 1-5) also for variants with existing ClinVar classifications in output TSV, default: False
--clinvar_ignore_noncancer
Ignore (exclude from report) ClinVar-classified variants reported only for phenotypes/conditions NOT related to cancer, default: False
--debug Print full docker commands to log, default: False
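As described for `--custom_list`, a custom panel is a single-column text file with Ensembl gene identifiers from virtual panel 0. The sketch below writes such a file. The helper name and output location are hypothetical, the three identifiers (BRCA1, BRCA2, TP53) are only examples, and whether CPSR tolerates header lines or versioned identifiers is not covered here.

```python
import re
import tempfile
from pathlib import Path

# Human Ensembl gene identifiers: 'ENSG' followed by 11 digits.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def write_custom_list(gene_ids, path):
    """Write one Ensembl gene identifier per line and return the number written."""
    cleaned = []
    for gene_id in gene_ids:
        gene_id = gene_id.strip()
        if not ENSEMBL_GENE_ID.fullmatch(gene_id):
            raise ValueError(f"not an Ensembl gene identifier: {gene_id!r}")
        if gene_id not in cleaned:  # keep the first occurrence, drop repeats
            cleaned.append(gene_id)
    Path(path).write_text("\n".join(cleaned) + "\n")
    return len(cleaned)


# BRCA1, BRCA2 and TP53; written to a temporary location for illustration.
genes = ["ENSG00000012048", "ENSG00000139618", "ENSG00000141510"]
custom_list_path = Path(tempfile.gettempdir()) / "cpsr_custom_genes.txt"
print(write_custom_list(genes, custom_list_path), "genes written to", custom_list_path)
```

The file can then be supplied with `--custom_list`, together with a single-word name via `--custom_list_name`.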
The cpsr software bundle contains an example VCF file.
Report generation with the example VCF, using the Adult solid tumours cancer susceptibility virtual gene panel, can be performed through the following command:
python ~/cpsr-0.6.2/cpsr.py \
  --input_vcf ~/cpsr-0.6.2/example.vcf.gz \
  --pcgr_dir ~/pcgr-0.9.2 \
  --output_dir ~/cpsr-0.6.2 \
  --genome_assembly grch37 \
  --panel_id 1 \
  --sample_id example \
  --secondary_findings \
  --classify_all \
  --maf_upper_threshold 0.2 \
  --no_vcf_validate
Note that the example command also refers to the PCGR directory (pcgr-0.9.2), which contains the data bundle that is necessary for both PCGR and CPSR.
This command will run the Docker-based cpsr workflow and produce the following output files in the output folder:
- example.cpsr.grch37.vcf.gz (.tbi) - Bgzipped VCF file with relevant annotations appended by CPSR
- example.cpsr.grch37.pass.vcf.gz (.tbi) - Bgzipped VCF file with relevant annotations appended by CPSR (PASS variants only)
- example.cpsr_config.rds - CPSR configuration object (RDS format), mostly for debugging purposes
- example.cpsr.grch37.pass.tsv.gz - Compressed TSV file (generated with vcf2tsv) of VCF content with relevant annotations appended by CPSR
- example.cpsr.grch37.html - Interactive HTML report with clinically relevant variants in cancer predisposition genes organized into tiers
- example.cpsr.grch37.json.gz - Compressed JSON dump of HTML report content
- example.cpsr.grch37.snvs_indels.tiers.tsv - TSV file with key annotations of tier-structured SNVs/InDels
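When CPSR is run from a script, it can help to assemble the command as an argument list and to check afterwards that the files listed above were produced. The sketch below does both without executing anything. The helper names are hypothetical, the file name pattern is generalized from the example output names (an assumption), and the `.tbi` indexes are assumed to follow the usual `<file>.vcf.gz.tbi` naming.

```python
import os
from pathlib import Path


def build_cpsr_command(cpsr_dir, pcgr_dir, input_vcf, output_dir, sample_id,
                       genome_assembly="grch37", panel_id="1", extra_flags=()):
    """Assemble the argument list for cpsr.py without running it."""
    if genome_assembly not in ("grch37", "grch38"):
        raise ValueError("genome_assembly must be grch37 or grch38")

    def expand(path):
        # Argument lists are not processed by a shell, so expand '~' explicitly.
        return os.path.expanduser(str(path))

    return [
        "python", expand(Path(cpsr_dir) / "cpsr.py"),
        "--input_vcf", expand(input_vcf),
        "--pcgr_dir", expand(pcgr_dir),
        "--output_dir", expand(output_dir),
        "--genome_assembly", genome_assembly,
        "--panel_id", str(panel_id),
        "--sample_id", sample_id,
    ] + list(extra_flags)


def expected_outputs(output_dir, sample_id, genome_assembly="grch37"):
    """List the output paths expected for a sample, following the example names."""
    prefix = f"{sample_id}.cpsr.{genome_assembly}"
    names = [
        f"{prefix}.vcf.gz", f"{prefix}.vcf.gz.tbi",
        f"{prefix}.pass.vcf.gz", f"{prefix}.pass.vcf.gz.tbi",
        f"{sample_id}.cpsr_config.rds",
        f"{prefix}.pass.tsv.gz",
        f"{prefix}.html",
        f"{prefix}.json.gz",
        f"{prefix}.snvs_indels.tiers.tsv",
    ]
    base = Path(os.path.expanduser(str(output_dir)))
    return [base / name for name in names]


command = build_cpsr_command(
    "~/cpsr-0.6.2", "~/pcgr-0.9.2", "~/cpsr-0.6.2/example.vcf.gz", "~/cpsr-0.6.2",
    "example", extra_flags=["--secondary_findings", "--classify_all",
                            "--maf_upper_threshold", "0.2", "--no_vcf_validate"],
)
print(" ".join(command))
missing = [p.name for p in expected_outputs("~/cpsr-0.6.2", "example") if not p.exists()]
print("missing outputs:", missing)
```

The argument list can be handed to `subprocess.run(command, check=True)` once the paths point at a real installation.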
- CharGer - Characterization of Germline variants
- PathoMan - Pathogenicity of Mutation Analyzer (Beta)
- SherLoc - Variant classification
sigven AT ifi.uio.no