# VA Value Sets

In [1]:
from civicpy import civic
from collections import Counter

In [2]:
evidence = civic.get_all_evidence()



In [3]:
c = Counter([x.variant_origin for x in evidence])
c

Counter({'Somatic': 4798,
         'Rare Germline': 1621,
         'N/A': 391,
         'Common Germline': 62,
         None: 39,
         'Unknown': 78})

`Somatic` maps directly to a GENO term, as does `Unknown`. The other values do not map directly.

In [4]:
na_variants = set([e.variant.name for e in evidence if e.variant_origin == 'N/A'])
na_variants

{'AKAP9-BRAF',
 'ALTERNATIVE TRANSCRIPT (ATI)',
 'AMPLIFICATION',
 'C609Y',
 'CYTOPLASMIC EXPRESSION',
 'D835Y',
 'DELETION',
 'E70* (c.208G>T)',
 'ETV6-NTRK3',
 'ETV6-NTRK3 FUSION',
 'EXON 18 OVEREXPRESSION',
 'EXPRESSION',
 'FGFR2-BICC1',
 'FUSION',
 'H1112L',
 'ISOFORM EXPRESSION',
 'KIF5B-RET',
 'LMNA-NTRK1',
 'LOSS',
 'LOSS-OF-FUNCTION',
 'MET-ATXN7L1 fusion',
 'MUTATION',
 'NON-AMPLIFICATION',
 'NRG1 FUSIONS',
 'NTRK1 FUSIONS',
 'NTRK2-STRN fusion',
 'NUCLEAR EXPRESSION',
 'NUCLEAR TRANSLOCATION',
 'OVEREXPRESSION',
 'P655R',
 'PDE4DIP-NTRK1',
 'PHOSPHORYLATION',
 'PROMOTER METHYLATION',
 'Q61H',
 'R130*',
 'R173C',
 'R82_V84del (c.244_252del)',
 'SERUM LEVELS',
 'SPLICE VARIANT 7',
 'SQSTM1-NTRK1',
 'T172 PHOSPHORYLATION',
 'TMP3-NTRK1',
 'TMPRSS2-ERG',
 'TRIM24-BRAF',
 'UNDEREXPRESSION',
 'WILD TYPE',
 'Y1092 PHOSPHORYLATION',
 'p16 EXPRESSION'}

The list above shows that `N/A` is mostly used, appropriately, for non-sequence forms of molecular variation (e.g. abundance or epigenetic modifications), although a few sequence variants such as `D835Y` and `Q61H` also appear. These and the `None`-valued records may be safely ignored.
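One way to check how often `N/A` lands on a sequence variant is to flag names that look like protein changes. The sketch below is a rough heuristic, not an HGVS parser: the regular expression and the helper `looks_like_sequence_variant` are illustrative assumptions, and the sample holds only a few names copied from the output above.

```python
import re

# Heuristic pattern for protein-level changes such as 'C609Y', 'R130*',
# or 'R82_V84del'. Illustrative assumption only, not a CIViC or HGVS validator.
PROTEIN_CHANGE = re.compile(
    r"^[A-Z]\d+(?:[A-Z*](?![A-Za-z])|_[A-Z]\d+(?:del|ins|dup))"
)

def looks_like_sequence_variant(name):
    """Return True when a variant name resembles a protein sequence change."""
    return PROTEIN_CHANGE.match(name) is not None

# A few names copied from the na_variants output above
sample = [
    'AMPLIFICATION',
    'C609Y',
    'E70* (c.208G>T)',
    'PROMOTER METHYLATION',
    'R82_V84del (c.244_252del)',
    'T172 PHOSPHORYLATION',
    'p16 EXPRESSION',
]
sequence_like = [n for n in sample if looks_like_sequence_variant(n)]
```

Against the live data, the same filter could be applied to `na_variants` to list candidates for manual review.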

This leaves only `Rare Germline` and `Common Germline`. This is clearly a two-axis problem: we could use the `GENO` term for `Germline` and add a second field for rarity that leverages the `polymorphic` (http://purl.obolibrary.org/obo/GENO_0000477) and `mutant` (http://purl.obolibrary.org/obo/GENO_0000480) terms from `GENO`.
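A minimal sketch of the two-axis idea follows. The names `OriginMapping`, `ORIGIN_MAP`, and `map_variant_origin` are hypothetical, the origin axis uses plain labels rather than GENO IRIs (which are not given here), and pairing "Rare" with `mutant` and "Common" with `polymorphic` is an assumed reading of the proposal.

```python
from typing import NamedTuple, Optional

# IRIs quoted in the text above
GENO_POLYMORPHIC = "http://purl.obolibrary.org/obo/GENO_0000477"
GENO_MUTANT = "http://purl.obolibrary.org/obo/GENO_0000480"

class OriginMapping(NamedTuple):
    origin: str            # plain label; resolving it to a GENO IRI is left open
    rarity: Optional[str]  # GENO IRI, or None when rarity is not stated

# Assumed reading: 'Rare' -> mutant, 'Common' -> polymorphic
ORIGIN_MAP = {
    'Somatic': OriginMapping('somatic', None),
    'Unknown': OriginMapping('unknown', None),
    'Rare Germline': OriginMapping('germline', GENO_MUTANT),
    'Common Germline': OriginMapping('germline', GENO_POLYMORPHIC),
}

def map_variant_origin(value):
    """Return an OriginMapping, or None for 'N/A' and None (ignored values)."""
    return ORIGIN_MAP.get(value)
```

Keeping rarity as a separate field means `Somatic` and `Unknown` records need no special handling: they simply carry no rarity value.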

In [8]:
predictive_evidence = [e for e in evidence if e.evidence_type == 'Predictive']

In [10]:
e = predictive_evidence[0]

In [16]:
c = Counter([e.clinical_significance for e in predictive_evidence])

In [17]:
c

Counter({'Resistance': 1614,
         'Sensitivity/Response': 2605,
         'N/A': 20,
         None: 17,
         'Adverse Response': 15,
         'Reduced Sensitivity': 20})

In [19]:
sum(c.values())

4291

In [20]:
pe_na = [e for e in predictive_evidence if e.clinical_significance == 'N/A']

In [24]:
for e in pe_na:
    print(e.site_link)

https://civicdb.org/links/evidence/83
https://civicdb.org/links/evidence/1673
https://civicdb.org/links/evidence/1973
https://civicdb.org/links/evidence/2990
https://civicdb.org/links/evidence/7219
https://civicdb.org/links/evidence/7608
https://civicdb.org/links/evidence/8064
https://civicdb.org/links/evidence/123
https://civicdb.org/links/evidence/997
https://civicdb.org/links/evidence/8039
https://civicdb.org/links/evidence/998
https://civicdb.org/links/evidence/1087
https://civicdb.org/links/evidence/6087
https://civicdb.org/links/evidence/501
https://civicdb.org/links/evidence/7234
https://civicdb.org/links/evidence/502
https://civicdb.org/links/evidence/1674
https://civicdb.org/links/evidence/412
https://civicdb.org/links/evidence/1671
https://civicdb.org/links/evidence/6438


The records above suggest that "N/A" is used only for evidence with a "Does not support" direction, or for evidence that has not yet been reviewed or approved. The only outlier is https://civicdb.org/events/genes/3870/summary/variants/659/summary/evidence/1671/summary#evidence, which I believe was entered in error.
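Rather than opening each link by hand, the same observation could be tabulated. The sketch below assumes evidence records expose `evidence_direction` and `status` attributes. The helper name is hypothetical, and the demo records are stand-ins with made-up values, not CIViC data.

```python
from collections import Counter
from types import SimpleNamespace

def summarize_direction_and_status(items):
    """Count (evidence_direction, status) pairs; missing attributes become None."""
    return Counter(
        (getattr(e, 'evidence_direction', None), getattr(e, 'status', None))
        for e in items
    )

# Hypothetical stand-in records, not real CIViC data
demo = [
    SimpleNamespace(evidence_direction='Does Not Support', status='accepted'),
    SimpleNamespace(evidence_direction='Does Not Support', status='accepted'),
    SimpleNamespace(evidence_direction='Supports', status='submitted'),
]
summary = summarize_direction_and_status(demo)
```

With the live data, calling `summarize_direction_and_status(pe_na)` would give the breakdown for the 20 records listed above.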