Last Updated on 2026-02-11
- Supplementary Materials
- Supplementary Data S1: NSCLC Patient Information
- Supplementary Data S2: NSCLC Protein-Affecting Variants
- Supplementary Data S3: NSCLC Missense Variants
- Supplementary Data S4: NSCLC HLA Loss of Heterozygosity
- Supplementary Data S5: NSCLC Shared Protein Lists
- Supplementary Data S6: NSCLC pVACseq Class I Predictions
- Supplementary Data S7: NSCLC pVACseq Class II Predictions
- Supplementary Data S8: NSCLC pVACseq Peptidome Combined Predictions
- Supplementary Data S9: NSCLC Tested Neoantigens
- References
This repository contains the supplementary data files for:
Immunopeptidomics-guided identification of functional neoantigens in non-small cell lung cancer [1]. This repository
is associated with
For Supplementary Figures S1-S11 and Supplementary Tables S1-S2, please refer to supplement-2026-02-11.pdf.
Supplementary Data S1 to S9 are csv files. The column names and contents of these files are described below.
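
Because all nine data files are csv files that use NA for missing values, they can be read with standard tooling. The following is a minimal sketch using only the Python standard library; the function name `read_supplementary_rows` and the two example rows are illustrative assumptions, not part of the repository.

```python
import csv
import io


def read_supplementary_rows(handle):
    """Read one Supplementary Data csv into a list of dicts, mapping "NA" to None."""
    reader = csv.DictReader(handle)
    return [
        {column: (None if value == "NA" else value) for column, value in row.items()}
        for row in reader
    ]


# Illustrative two-row extract; the identifiers and values are made up.
example = io.StringIO("accel_id,til_status\nP1,High\nP2,NA\n")
rows = read_supplementary_rows(example)
print(rows)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, where `path` is wherever the csv file was downloaded to.
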
Supplementary Data S1 is a csv file containing patient information, with 24 rows and 21 column variables. Each row represents observations for a single patient.

| Column Variable | Description |
|---|---|
| accel_id | CRUK Accelerator patient identifier |
| target_lung_id | Targeted Lung Health Check patient identifier |
| tissue | NSCLC type: Adenocarcinoma or Squamous cell carcinoma |
| n_somatic_variants | Total number of somatic variants identified by whole exome sequencing |
| mut_burden_per_mb | Mutational burden: mutations per million bases of DNA. Exome target size was 35.7 Mb |
| obs_class_I | Number of observed HLA I peptides by mass spec. immunopeptidomics |
| obs_class_II | Number of observed HLA II peptides by mass spec. immunopeptidomics |
| HLA | Class I and II HLA allotypes identified by genomic sequencing |
| wet_weight | Wet weight of tumour tissue |
| tumour_purity | Tumour purity as calculated from WES by ASCAT |
| tumour_ploidy | Tumour ploidy as calculated from WES by ASCAT |
| til_status | Tumour infiltrating T-cell status by immunohistochemistry: Low, Moderate, High or NA |
| weeks_post_surgery | Number of weeks since surgery |
| status_as_of_2021_01_19 | Status as of 2021-01-19: Alive, Deceased or NA |
| date_of_diagnosis | Date of diagnosis |
| smoking_status | Smoking status |
| notes_2 | Notes about smoking history |
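
The `mut_burden_per_mb` description implies a simple ratio of variant count to exome target size. A minimal sketch of that arithmetic follows, assuming the column is `n_somatic_variants` divided by the stated 35.7 Mb target; the function name and the example count are hypothetical.

```python
EXOME_TARGET_SIZE_MB = 35.7  # exome target size stated in the table above


def mutational_burden_per_mb(n_somatic_variants, target_size_mb=EXOME_TARGET_SIZE_MB):
    """Somatic variants per megabase of exome target (assumed definition)."""
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    return n_somatic_variants / target_size_mb


# Hypothetical patient with 357 somatic variants.
burden = mutational_burden_per_mb(357)
print(round(burden, 2))
```
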
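
Supplementary Data S2 is a csv file containing NSCLC protein-affecting variants. Its column variables are described below.

| Column Variable | Description |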
|---|---|
| accel_id | CRUK Accelerator patient identifier |
| vid | Unique variant identifier |
| chrom | Chromosome |
| pos | Genomic coordinate |
| ref | Reference base |
| alt | Variant base |
| type | Variant type: snv (single nucleotide variant), ins (insertion), del (deletion) or complex (complex variant) |
| gene_name | HGNC gene name |
| ensg | Ensembl gene identifier |
| ensp | Ensembl protein identifier |
| consequence | Variant consequence annotation |
| biotype | Gene biotype filter: protein_coding |
| canonical | Canonical transcript flag |
| vaf | Variant allele frequency |
| depth | Sequencing depth |
| filter | Variant filter status |
| info | Information field from VCF file |
| format | Format of VCF variable columns |
| sample_1 | Reference sample VCF variable values corresponding with format |
| sample_2 | Tumour sample VCF variable values corresponding with format |
| tissue | Lung tumour tissue type: Squamous or Adenocarcinoma |
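
To illustrate how the `type` categories relate to the `ref` and `alt` columns, here is a hedged sketch that assigns a category from the two alleles. The rule below is an assumption made for illustration only; the variant caller or annotation step that produced the `type` column may use different criteria.

```python
def classify_variant(ref, alt):
    """Assign snv, ins, del or complex from the alleles (illustrative rule only)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    # Insertion: alt extends ref, e.g. ref "A", alt "AT".
    if len(alt) > len(ref) and alt.startswith(ref):
        return "ins"
    # Deletion: ref extends alt, e.g. ref "AT", alt "A".
    if len(ref) > len(alt) and ref.startswith(alt):
        return "del"
    # Anything else (e.g. multi-base substitutions) is treated as complex here.
    return "complex"


for ref, alt in [("A", "T"), ("A", "AT"), ("AT", "A"), ("AT", "GC")]:
    print(ref, alt, classify_variant(ref, alt))
```
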
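
Supplementary Data S3 is a csv file containing NSCLC missense variants. Its column variables are described below.

| Column Variable | Description |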
|---|---|
| accel_id | CRUK Accelerator patient identifier |
| vid | Unique variant identifier |
| chrom | Chromosome |
| pos | Genomic coordinate |
| ref | Reference base |
| alt | Variant base |
| type | Variant type: snv (single nucleotide variant), ins (insertion), del (deletion) or complex (complex variant) |
| gene_name | HGNC gene name |
| ensg | Ensembl gene identifier |
| ensp | Ensembl protein identifier |
| vaf | Variant allele frequency |
| depth | Sequencing depth |
| filter | Variant filter status |
| info | Information field from VCF file |
| format | Format of VCF variable columns |
| sample_1 | Reference sample VCF variable values corresponding with format |
| sample_2 | Tumour sample VCF variable values corresponding with format |
| tissue | Lung tumour tissue type: Squamous or Adenocarcinoma |
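
The `format`, `sample_1` and `sample_2` columns follow the VCF convention of colon-separated keys and values. A minimal sketch of pairing them up is shown below; the function name is hypothetical and the example values are made up rather than taken from the data.

```python
def parse_vcf_sample(format_field, sample_field):
    """Pair colon-separated VCF FORMAT keys with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip stops at the shorter list, so a sample with fewer values than
    # keys simply yields fewer entries.
    return dict(zip(keys, values))


# Made-up tumour sample entry using the standard GT, AD and DP keys.
tumour = parse_vcf_sample("GT:AD:DP", "0/1:12,8:20")
print(tumour)
```
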
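
Supplementary Data S4 is a csv file containing NSCLC HLA loss of heterozygosity results. Its column variables are described below.

| Column Variable | Description |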
|---|---|
| accel_id | CRUK Accelerator patient identifier |
| message | HLA LOH detection message |
| HLA_A_type1 | HLA allele type 1: A, B or C |
| HLA_A_type2 | HLA allele type 2: A, B or C |
| LossAllele | HLA allele that was lost |
| KeptAllele | HLA allele that was retained |
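
Supplementary Data S5 is a csv file containing NSCLC shared protein lists. Its column variables are described below.

| Column Variable | Description |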
|---|---|
| hla_class | HLA class (I or II) |
| intersection | Set intersection identifier |
| n_proteins | Number of patient observations of the protein in the set intersection |
| uniprot_id | UniProt protein identifier |
| gene_name | HGNC gene name |
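
Supplementary Data S6 is a csv file containing NSCLC pVACseq [2] class I predictions. Its column variables are described below.

| Column Variable | Description |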
|---|---|
| sample | CRUK Accelerator patient identifier |
| chromosome | The chromosome of this variant |
| start | The start position of this variant in the zero-based, half-open coordinate system |
| stop | The stop position of this variant in the zero-based, half-open coordinate system |
| reference | The reference allele |
| variant | The alt allele |
| transcript | The Ensembl ID of the affected transcript |
| transcript_support_level | The transcript support level (TSL) of the affected transcript. NA if the VCF entry doesn't contain TSL information |
| ensembl_gene_id | The Ensembl ID of the affected gene |
| variant_type | The type of variant. missense for missense mutations, inframe_ins for inframe insertions, inframe_del for inframe deletions, and FS for frameshift variants |
| mutation | The amino acid change of this mutation |
| protein_position | The protein position of the mutation |
| gene_name | The Ensembl gene name of the affected gene |
| hgv_sc | The HGVS coding sequence variant name |
| hgv_sp | The HGVS protein sequence variant name |
| hla_allele | The HLA allele for this prediction |
| peptide_length | The peptide length of the epitope |
| sub_peptide_position | The one-based position of the epitope within the protein sequence used to make the prediction |
| mutation_position | The one-based position of the start of the mutation within the epitope sequence. 0 if the start of the mutation is before the epitope |
| mt_epitope_seq | The mutant epitope sequence |
| wt_epitope_seq | The wildtype (reference) epitope sequence at the same position in the full protein sequence. NA if there is no wildtype sequence at this position or if more than half of the amino acids of the mutant epitope are mutated |
| best_mt_score_method | Prediction algorithm with the lowest mutant ic50 binding affinity for this epitope |
| best_mt_score | Lowest ic50 binding affinity of all prediction algorithms used |
| corresponding_wt_score | ic50 binding affinity of the wildtype epitope. NA if there is no WT Epitope Seq |
| corresponding_fold_change | Corresponding WT Score / Best MT Score. NA if there is no WT Epitope Seq |
| tumor_dna_depth | Tumor DNA depth at this position. NA if VCF entry does not contain tumor DNA readcount annotation |
| tumor_dna_vaf | Tumor DNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain tumor DNA readcount annotation |
| tumor_rna_depth | Tumor RNA depth at this position. NA if VCF entry does not contain tumor RNA readcount annotation |
| tumor_rna_vaf | Tumor RNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain tumor RNA readcount annotation |
| normal_depth | Normal DNA depth at this position. NA if VCF entry does not contain normal DNA readcount annotation |
| normal_vaf | Normal DNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain normal DNA readcount annotation |
| gene_expression | Gene expression value for the annotated gene containing the variant. NA if VCF entry does not contain gene expression annotation |
| transcript_expression | Transcript expression value for the annotated transcript containing the variant. NA if VCF entry does not contain transcript expression annotation |
| median_mt_score | Median ic50 binding affinity of the mutant epitope across all prediction algorithms used |
| median_wt_score | Median ic50 binding affinity of the wildtype epitope across all prediction algorithms used. NA if there is no WT Epitope Seq |
| median_fold_change | Median WT Score / Median MT Score. NA if there is no WT Epitope Seq |
| mh_cflurry_wt_score | MHCflurry ic50 binding affinity for the wildtype epitope |
| mh_cflurry_mt_score | MHCflurry ic50 binding affinity for the mutant epitope |
| mh_cnuggets_i_wt_score | MHCnuggets-I ic50 binding affinity for the wildtype epitope |
| mh_cnuggets_i_mt_score | MHCnuggets-I ic50 binding affinity for the mutant epitope |
| net_mhc_wt_score | NetMHC ic50 binding affinity for the wildtype epitope |
| net_mhc_mt_score | NetMHC ic50 binding affinity for the mutant epitope |
| pick_pocket_wt_score | PickPocket ic50 binding affinity for the wildtype epitope |
| pick_pocket_mt_score | PickPocket ic50 binding affinity for the mutant epitope |
| cterm_7mer_gravy_score | Mean hydropathy of last 7 residues on the C-terminus of the peptide |
| max_7mer_gravy_score | Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence |
| difficult_n_terminal_residue | Is N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine? (T/F) |
| c_terminal_cysteine | Is the C-terminal amino acid a Cysteine? (T/F) |
| c_terminal_proline | Is the C-terminal amino acid a Proline? (T/F) |
| cysteine_count | Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide |
| n_terminal_asparagine | Is the N-terminal amino acid an Asparagine? (T/F) |
| asparagine_proline_bond_count | Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide |
| b_rank | Rank of binding score: 1/median neoantigen binding affinity. Lower is better |
| f_rank | Rank of fold change: the ratio of median wildtype to median neoantigen binding affinity (agretopicity). Higher is better |
| m_rank | Rank of mutant allele expression: the product of gene_expression and tumor_rna_vaf. Higher is better |
| d_rank | Rank of the tumor_dna_vaf. Higher is better |
| score | A score is calculated from the above ranks with the following formula: b_rank + f_rank + (m_rank * 2) + (d_rank/2). Higher is better |
| rank_score | The score converted to a rank, with the best being 1, splitting ties by first. Lower is better |
| rank_percent | The percentage rank score. Lower is better |
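
The `score` and `rank_score` descriptions above can be sketched in a few lines. This is an illustration of the stated formula and tie-breaking rule, not pVACseq's own code; the function names and example ranks are hypothetical.

```python
def pvacseq_score(b_rank, f_rank, m_rank, d_rank):
    """Combine the four ranks using the formula given for the score column."""
    return b_rank + f_rank + (m_rank * 2) + (d_rank / 2)


def rank_scores(scores):
    """Convert scores to ranks: highest score gets rank 1, ties keep input order."""
    # sorted() is stable, so equal scores stay in their original order,
    # which gives the "ties by first" behaviour.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Made-up ranks for three candidate epitopes: (b_rank, f_rank, m_rank, d_rank).
candidates = [(3, 2, 4, 2), (5, 5, 4, 4), (2, 3, 4, 2)]
scores = [pvacseq_score(*ranks) for ranks in candidates]
print(scores, rank_scores(scores))
```

The first and third candidates have the same score, so the one that appears first receives the better rank.
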
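
Supplementary Data S7 is a csv file containing NSCLC pVACseq [2] class II predictions. Its column variables are described below.

| Column Variable | Description |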
|---|---|
| sample | CRUK Accelerator patient identifier |
| chromosome | The chromosome of this variant |
| start | The start position of this variant in the zero-based, half-open coordinate system |
| stop | The stop position of this variant in the zero-based, half-open coordinate system |
| reference | The reference allele |
| variant | The alt allele |
| transcript | The Ensembl ID of the affected transcript |
| transcript_support_level | The transcript support level (TSL) of the affected transcript. NA if the VCF entry doesn't contain TSL information |
| ensembl_gene_id | The Ensembl ID of the affected gene |
| variant_type | The type of variant. missense for missense mutations, inframe_ins for inframe insertions, inframe_del for inframe deletions, and FS for frameshift variants |
| mutation | The amino acid change of this mutation |
| protein_position | The protein position of the mutation |
| gene_name | The Ensembl gene name of the affected gene |
| hgv_sc | The HGVS coding sequence variant name |
| hgv_sp | The HGVS protein sequence variant name |
| hla_allele | The HLA allele for this prediction |
| peptide_length | The peptide length of the epitope |
| sub_peptide_position | The one-based position of the epitope within the protein sequence used to make the prediction |
| mutation_position | The one-based position of the start of the mutation within the epitope sequence. 0 if the start of the mutation is before the epitope |
| mt_epitope_seq | The mutant epitope sequence |
| wt_epitope_seq | The wildtype (reference) epitope sequence at the same position in the full protein sequence. NA if there is no wildtype sequence at this position or if more than half of the amino acids of the mutant epitope are mutated |
| best_mt_score_method | Prediction algorithm with the lowest mutant ic50 binding affinity for this epitope |
| best_mt_score | Lowest ic50 binding affinity of all prediction algorithms used |
| corresponding_wt_score | ic50 binding affinity of the wildtype epitope. NA if there is no WT Epitope Seq |
| corresponding_fold_change | Corresponding WT Score / Best MT Score. NA if there is no WT Epitope Seq |
| tumor_dna_depth | Tumor DNA depth at this position. NA if VCF entry does not contain tumor DNA readcount annotation |
| tumor_dna_vaf | Tumor DNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain tumor DNA readcount annotation |
| tumor_rna_depth | Tumor RNA depth at this position. NA if VCF entry does not contain tumor RNA readcount annotation |
| tumor_rna_vaf | Tumor RNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain tumor RNA readcount annotation |
| normal_depth | Normal DNA depth at this position. NA if VCF entry does not contain normal DNA readcount annotation |
| normal_vaf | Normal DNA variant allele frequency (VAF) at this position. NA if VCF entry does not contain normal DNA readcount annotation |
| gene_expression | Gene expression value for the annotated gene containing the variant. NA if VCF entry does not contain gene expression annotation |
| transcript_expression | Transcript expression value for the annotated transcript containing the variant. NA if VCF entry does not contain transcript expression annotation |
| median_mt_score | Median ic50 binding affinity of the mutant epitope across all prediction algorithms used |
| median_wt_score | Median ic50 binding affinity of the wildtype epitope across all prediction algorithms used. NA if there is no WT Epitope Seq |
| median_fold_change | Median WT Score / Median MT Score. NA if there is no WT Epitope Seq |
| mh_cnuggets_ii_wt_score | MHCnuggets-II ic50 binding affinity for the wildtype epitope |
| mh_cnuggets_ii_mt_score | MHCnuggets-II ic50 binding affinity for the mutant epitope |
| cterm_7mer_gravy_score | Mean hydropathy of last 7 residues on the C-terminus of the peptide |
| max_7mer_gravy_score | Max GRAVY score of any kmer in the amino acid sequence. Used to determine if there are any extremely hydrophobic regions within a longer amino acid sequence |
| difficult_n_terminal_residue | Is N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine? (T/F) |
| c_terminal_cysteine | Is the C-terminal amino acid a Cysteine? (T/F) |
| c_terminal_proline | Is the C-terminal amino acid a Proline? (T/F) |
| cysteine_count | Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across distant parts of the peptide |
| n_terminal_asparagine | Is the N-terminal amino acid an Asparagine? (T/F) |
| asparagine_proline_bond_count | Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide |
| net_mhci_ipan_wt_score | NetMHCIIpan ic50 binding affinity for the wildtype epitope |
| net_mhci_ipan_mt_score | NetMHCIIpan ic50 binding affinity for the mutant epitope |
| n_nalign_wt_score | NNalign ic50 binding affinity for the wildtype epitope |
| n_nalign_mt_score | NNalign ic50 binding affinity for the mutant epitope |
| sm_malign_wt_score | SMMalign ic50 binding affinity for the wildtype epitope |
| sm_malign_mt_score | SMMalign ic50 binding affinity for the mutant epitope |
| b_rank | Rank of binding score: 1/median neoantigen binding affinity. Lower is better |
| f_rank | Rank of fold change: the ratio of median wildtype to median neoantigen binding affinity (agretopicity). Higher is better |
| m_rank | Rank of mutant allele expression: the product of gene_expression and tumor_rna_vaf. Higher is better |
| d_rank | Rank of the tumor_dna_vaf. Higher is better |
| score | A score is calculated from the above ranks with the following formula: b_rank + f_rank + (m_rank * 2) + (d_rank/2). Higher is better |
| rank_score | The score converted to a rank, with the best being 1, splitting ties by first. Lower is better |
| rank_percent | The percentage rank score. Lower is better |
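
The peptide manufacturability columns described above are simple functions of the epitope sequence. The sketch below recomputes them for a single peptide, assuming the Kyte-Doolittle hydropathy scale for GRAVY and a 7-residue window for both GRAVY columns (inferred from the column names); the function names and the example peptide are hypothetical, and pVACseq's own implementation may differ in detail.

```python
# Kyte-Doolittle hydropathy values per amino acid (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean hydropathy of a sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def manufacturability(peptide, window=7):
    """Recompute the manufacturability columns for one peptide sequence."""
    # Peptides shorter than the window are scored as a single k-mer.
    n_windows = max(1, len(peptide) - window + 1)
    return {
        "cterm_7mer_gravy_score": gravy(peptide[-window:]),
        "max_7mer_gravy_score": max(
            gravy(peptide[i:i + window]) for i in range(n_windows)
        ),
        "difficult_n_terminal_residue": peptide[0] in "QEC",
        "c_terminal_cysteine": peptide[-1] == "C",
        "c_terminal_proline": peptide[-1] == "P",
        "cysteine_count": peptide.count("C"),
        "n_terminal_asparagine": peptide[0] == "N",
        "asparagine_proline_bond_count": peptide.count("NP"),
    }


# Made-up peptide chosen to trigger several of the flags.
metrics = manufacturability("QNPCAAAAAAA")
print(metrics)
```
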
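
Supplementary Data S8 is a csv file containing NSCLC pVACseq [2] peptidome combined predictions. Its column variables are described below.

| Column Variable | Description |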
|---|---|
| sample_id | CRUK Accelerator patient identifier |
| hla_class | HLA class (I or II) |
| table_name | Identifier in the form accel_id/predicted_hla_allotype/peptide_length |
| length | Peptide length |
| hla_length_pref1 | HLA allotype preferred peptide length 1 |
| hla_length_pref2 | HLA allotype preferred peptide length 2 |
| gene_name | HGNC gene name |
| mt_epitope_seq | The mutant epitope sequence |
| median_mt_score | Median ic50 binding affinity of the mutant epitope across all prediction algorithms used |
| corresponding_fold_change | Corresponding WT Score / Best MT Score. NA if there is no WT Epitope Seq |
| gene_expression | Gene expression value for the annotated gene containing the variant |
| tumor_rna_vaf | Tumor RNA variant allele frequency (VAF) at this position |
| tumor_dna_vaf | Tumor DNA variant allele frequency (VAF) at this position |
| rank_percent | The percentage rank score. Lower is better |
| rank_score | The score converted to a rank, with the best being 1, splitting ties by first. Lower is better |
| Obs_I | The number of peptides from the source protein observed by mass spectrometry in HLA-I immunopeptidome |
| Obs_II | The number of peptides from the source protein observed by mass spectrometry in HLA-II immunopeptidome |
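
The `table_name` identifier packs three fields into one string, so it can be split back into its parts. A minimal sketch follows, assuming the fields are always separated by `/` with the patient identifier first and the peptide length last; the `TableName` type and `parse_table_name` function are hypothetical names.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    accel_id: str
    hla_allotype: str
    peptide_length: int


def parse_table_name(table_name):
    """Split accel_id/predicted_hla_allotype/peptide_length into its parts."""
    # Split once from each end so the allotype is whatever lies in between.
    accel_id, rest = table_name.split("/", 1)
    hla_allotype, length = rest.rsplit("/", 1)
    return TableName(accel_id, hla_allotype, int(length))


parsed = parse_table_name("A119/DRB1*04:04/15")
print(parsed.accel_id, parsed.hla_allotype, parsed.peptide_length)
```
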
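
Supplementary Data S9 is a csv file containing NSCLC tested neoantigens. Its column variables are described below.

| Column Variable | Description |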
|---|---|
| accel_id | CRUK Accelerator patient identifier |
| gene_name | Gene |
| mt_epitope_seq | Mutated (neoantigen) peptide sequence |
| wt_epitope_seq | Wildtype peptide sequence |
| peptide_length | Peptide length |
| table_name | Identifier in the form accel_id/predicted_hla_allotype/peptide_length e.g. A119/DRB1*04:04/15 |
| mutation | The mutation From/To |
| protein_position | Location of the mutation in the source protein, using UniProt sequence numbering |
| Obs_I | The number of peptides from the source protein observed by mass spectrometry in HLA-I immunopeptidome |
| Obs_II | The number of peptides from the source protein observed by mass spectrometry in HLA-II immunopeptidome |
| median_mt_score | The median pVACseq predicted binding affinity of the neoantigen peptide |
| median_wt_score | The median pVACseq predicted binding affinity of the wildtype peptide |
| median_fold_change | The ratio of the median wildtype peptide affinity to the median neoantigen affinity (median_wt_score / median_mt_score) |
| rank_percent | The overall rank percentage for the neoantigen from pVACseq for the peptide of that length and HLA allotype |
| mean_sfc_mt | Mean IFN-γ ELISPOT spot forming cells per million cells for the neoantigen peptide |
| mean_sfc_wt | Mean IFN-γ ELISPOT spot forming cells per million cells for the wildtype peptide |
| elispot_response | ELISPOT response category: Strong, Weak or None |

References

1. Nicholas B, Bailey A, McCann KJ, Wood O, Currall E, Johnson P, et al. Proteogenomics guided identification of functional neoantigens in non-small cell lung cancer. 2024. Available: http://dx.doi.org/10.1101/2024.05.30.596609
2. Hundal J, Carreno BM, Petti AA, Linette GP, Griffith OL, Mardis ER, et al. pVAC-seq: A genome-guided in silico approach to identifying tumor neoantigens. Genome Medicine. 2016;8: 11. doi:10.1186/s13073-016-0264-5