# GraphKB Variant Matching Tutorial

This tutorial is an interactive notebook that can be run using [Google Colab](https://colab.research.google.com/github/bcgsc/pori_graphkb_python/blob/master/docs/pori_graphkb_python_tutorial.ipynb) or a local Jupyter server (**recommended** if matching patient data). It covers basic matching of variants using the Python GraphKB adaptor against an instance of the GraphKB API.

Users must first have login credentials for an instance of the GraphKB API (or use the demo server). Note that the demo server holds only limited data; more complete annotations would be expected from a production instance of GraphKB.

For the purposes of this tutorial we will match the known KRAS variant `p.G12D` against the demo instance of GraphKB. You can point the notebook at a different API instance by changing the setup variables below.

To run this locally, download this file and start the server from the command line as follows:

```
jupyter notebook notebook.ipynb
```

You should now be able to see the notebook by opening `http://localhost:8888` in your browser.

In [12]:
!pip3 install graphkb



In [1]:
from graphkb import GraphKBConnection

GKB_API_URL = 'https://pori-demo.bcgsc.ca/graphkb-api/api'
GKB_USER = 'colab_demo'
GKB_PASSWORD = 'colab_demo'

graphkb_conn = GraphKBConnection(GKB_API_URL, use_global_cache=False)

graphkb_conn.login(GKB_USER, GKB_PASSWORD)
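
When working against a non-demo instance, especially with patient data, you may prefer not to hardcode credentials in the notebook. The following is a minimal sketch of one way to do that. It assumes the settings are supplied as environment variables named after this tutorial's setup variables; those environment variable names and the `load_graphkb_settings` helper are hypothetical and are not part of the `graphkb` package. The demo values from the cell above are used as fallbacks.

```python
import os

# Demo values from this tutorial, used when no override is supplied.
DEMO_SETTINGS = {
    'GKB_API_URL': 'https://pori-demo.bcgsc.ca/graphkb-api/api',
    'GKB_USER': 'colab_demo',
    'GKB_PASSWORD': 'colab_demo',
}


def load_graphkb_settings(env=None):
    """Return connection settings, preferring values found in `env`.

    `env` defaults to os.environ. The key names are an assumption made
    for this sketch; they are not defined by the graphkb package.
    """
    env = os.environ if env is None else env
    return {key: env.get(key) or default for key, default in DEMO_SETTINGS.items()}


# A plain dict stands in for the environment so the example is reproducible.
settings = load_graphkb_settings({'GKB_USER': 'my_user'})
print(settings['GKB_USER'], settings['GKB_API_URL'])
```

The resulting values can then be passed to `GraphKBConnection(...)` and `graphkb_conn.login(...)` exactly as in the cell above.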

## Matching Variants

Now you are ready to match variants.

In [2]:
from graphkb.match import match_positional_variant

variant_name = 'KRAS:p.G12D'
variant_matches = match_positional_variant(graphkb_conn, variant_name)

print(f'{variant_name} matched {len(variant_matches)} other variant representations')
print()

for match in variant_matches:
    print(variant_name, 'will match', match['displayName'])


KRAS:p.G12D matched 7 other variant representations

KRAS:p.G12D will match KRAS:p.(G12_G13)mut
KRAS:p.G12D will match KRAS:p.G12mut
KRAS:p.G12D will match KRAS:p.G12D
KRAS:p.G12D will match chr12:g.25398284C>T
KRAS:p.G12D will match KRAS:p.G12
KRAS:p.G12D will match KRAS:p.?12mut
KRAS:p.G12D will match KRAS mutation


We can see above that the KRAS protein variant has been matched to a number of less specific mentions (e.g. KRAS:p.G12mut) as well as genomic equivalents (chr12:g.25398284C>T). Note that the results depend on the instance of GraphKB you are accessing.
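
To make the mix of representations easier to see, the matches can be grouped by notation. The sketch below works offline on the display names copied from the output above rather than on live query results. The `notation_type` helper and its labels are illustrative assumptions: it only inspects the coordinate prefix (`p.`, `g.`, `c.`) in the display name and treats names without one as category-style variants.

```python
import re
from collections import defaultdict

# Display names copied from the demo output above.
MATCHED_NAMES = [
    'KRAS:p.(G12_G13)mut',
    'KRAS:p.G12mut',
    'KRAS:p.G12D',
    'chr12:g.25398284C>T',
    'KRAS:p.G12',
    'KRAS:p.?12mut',
    'KRAS mutation',
]

# Labels chosen for this sketch only.
COORDINATE_LABELS = {'p': 'protein', 'g': 'genomic', 'c': 'coding sequence'}


def notation_type(display_name):
    """Classify a display name by its '<reference>:<prefix>.' coordinate prefix."""
    found = re.match(r'^[^:]+:([a-z])\.', display_name)
    if not found:
        return 'category'  # no positional notation, e.g. 'KRAS mutation'
    return COORDINATE_LABELS.get(found.group(1), 'other')


groups = defaultdict(list)
for name in MATCHED_NAMES:
    groups[notation_type(name)].append(name)

for label, names in groups.items():
    print(f'{label}: {names}')
```

With live results, the same grouping could be applied to `[m['displayName'] for m in variant_matches]`.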

## Annotating Variants

Now that we have matched the variant, we will fetch the related statements to annotate it with its possible relevance.

In [3]:
from graphkb.constants import BASE_RETURN_PROPERTIES, GENERIC_RETURN_PROPERTIES
from graphkb.util import convert_to_rid_list

# return properties should be customized to the user's needs
return_props = (
    BASE_RETURN_PROPERTIES
    + ['sourceId', 'source.name', 'source.displayName']
    + [f'conditions.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'subject.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'evidence.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'relevance.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'evidenceLevel.{p}' for p in GENERIC_RETURN_PROPERTIES]
)

statements = graphkb_conn.query(
    {
        'target': 'Statement',
        'filters': {'conditions': convert_to_rid_list(variant_matches), 'operator': 'CONTAINSANY'},
        'returnProperties': return_props,
    }
)
print(f'annotated {len(variant_matches)} variant matches with {len(statements)} statements')
print()

for statement in statements[:5]:
    print(
        [c['displayName'] for c in statement['conditions'] if c['@class'].endswith('Variant')],
        statement['relevance']['displayName'],
        statement['subject']['displayName'],
        statement['source']['displayName'] if statement['source'] else '',
        [c['displayName'] for c in statement['evidence']],
    )

annotated 7 variant matches with 96 statements

['KRAS:p.(G12_G13)mut'] resistance Gefitinib [c1855] CIViC ['pmid:15696205']
['KRAS:p.(G12_G13)mut'] resistance Panitumumab [c1857] CIViC ['pmid:19223544']
['KRAS:p.(G12_G13)mut'] resistance Cetuximab [c1723] CIViC ['pmid:19223544']
['KRAS:p.(G12_G13)mut'] resistance Cetuximab [c1723] CIViC ['pmid:19603024']
['KRAS:p.(G12_G13)mut'] resistance Panitumumab [c1857] CIViC ['pmid:18316791']
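
With 96 statements returned, a quick tally is often more useful than reading rows one at a time. The sketch below counts statements per relevance and subject pair. To stay runnable without a connection, it rebuilds records by hand from the five rows printed above, keeping only the nested fields the print loop uses; real query results carry more properties.

```python
from collections import Counter

# (relevance, subject, evidence) copied from the five rows printed above.
ROWS = [
    ('resistance', 'Gefitinib [c1855]', 'pmid:15696205'),
    ('resistance', 'Panitumumab [c1857]', 'pmid:19223544'),
    ('resistance', 'Cetuximab [c1723]', 'pmid:19223544'),
    ('resistance', 'Cetuximab [c1723]', 'pmid:19603024'),
    ('resistance', 'Panitumumab [c1857]', 'pmid:18316791'),
]

# Minimal stand-ins for statement records, shaped like the fields accessed above.
statements_sample = [
    {
        'relevance': {'displayName': relevance},
        'subject': {'displayName': subject},
        'evidence': [{'displayName': evidence}],
    }
    for relevance, subject, evidence in ROWS
]

# Count how many statements support each (relevance, subject) pair.
summary = Counter(
    (s['relevance']['displayName'], s['subject']['displayName']) for s in statements_sample
)

for (relevance, subject), count in summary.most_common():
    print(f'{relevance}\t{subject}\t{count}')
```

Replacing `statements_sample` with the `statements` list from the query above would give the same tally over the full result set.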


## Categorizing Statements

We often want to know whether a statement is therapeutic, prognostic, and so on. The
naive approach is to base this on a list of known terms or a regex pattern. In GraphKB we can
leverage the ontology structure instead.

In this example we will look for all terms that would indicate a therapeutically relevant statement.

To do this we pick our 'base' terms. These are the terms we consider to be the highest level
of the ontology tree, the most general term for that category.

In [4]:
from graphkb.vocab import get_term_tree


BASE_THERAPEUTIC_TERMS = 'therapeutic efficacy'

therapeutic_terms = get_term_tree(graphkb_conn, BASE_THERAPEUTIC_TERMS, include_superclasses=False)

print(f'Found {len(therapeutic_terms)} equivalent terms')

for term in therapeutic_terms:
    print('-', term['name'])


Found 13 equivalent terms
- therapeutic efficacy
- targetable
- response
- sensitivity
- likely sensitivity
- no sensitivity
- no response
- resistance
- reduced sensitivity
- likely resistance
- innate resistance
- acquired resistance
- no resistance
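
Conceptually, expanding a base term means starting at that term and collecting everything beneath it in the ontology. The toy sketch below illustrates the idea with a breadth-first traversal over an in-memory dictionary. It does not reproduce how `get_term_tree` is implemented, and the parent-child edges are HYPOTHETICAL: the output above lists the terms but not how they are linked, so the edges here are chosen only for illustration.

```python
from collections import deque

# HYPOTHETICAL subclass edges for illustration only. The term names come
# from the list above, but the actual hierarchy in GraphKB is not shown there.
SUBCLASSES = {
    'therapeutic efficacy': ['sensitivity', 'resistance'],
    'sensitivity': ['likely sensitivity'],
    'resistance': ['likely resistance', 'acquired resistance'],
}


def collect_term_tree(base_term, subclasses):
    """Return the base term followed by all of its descendants (breadth-first)."""
    collected = [base_term]
    queue = deque([base_term])
    while queue:
        current = queue.popleft()
        for child in subclasses.get(current, []):
            if child not in collected:  # guard against repeated terms
                collected.append(child)
                queue.append(child)
    return collected


terms = collect_term_tree('therapeutic efficacy', SUBCLASSES)
print(f'Found {len(terms)} terms')
for term in terms:
    print('-', term)
```

Choosing a more general base term widens the collected set, while starting lower in the tree (for example at `'resistance'` in this toy hierarchy) narrows it.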


We can filter the statements we have already retrieved, or we can add this to our original query
and filter before we retrieve from the API.
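
The first option, filtering statements that have already been retrieved, might be sketched as follows. This assumes each term record and each statement's nested `relevance` record carries an `@rid` field. The `filter_by_relevance` helper, the sample records, and their record IDs are hypothetical and exist only to make the example runnable offline.

```python
def filter_by_relevance(statements, terms):
    """Keep statements whose relevance record ID is among the given terms."""
    term_rids = {term['@rid'] for term in terms}
    return [s for s in statements if s['relevance']['@rid'] in term_rids]


# HYPOTHETICAL records: the '@rid' values and the non-therapeutic example
# are made up for this sketch.
sample_terms = [
    {'@rid': '#1:0', 'name': 'sensitivity'},
    {'@rid': '#1:1', 'name': 'resistance'},
]
sample_statements = [
    {
        'relevance': {'@rid': '#1:0', 'displayName': 'sensitivity'},
        'subject': {'displayName': 'Trametinib [c77908]'},
    },
    {
        'relevance': {'@rid': '#1:9', 'displayName': 'unfavourable prognosis'},
        'subject': {'displayName': 'patient'},
    },
]

therapeutic_only = filter_by_relevance(sample_statements, sample_terms)
for statement in therapeutic_only:
    print(statement['relevance']['displayName'], statement['subject']['displayName'])
```

With live data the call would be `filter_by_relevance(statements, therapeutic_terms)`. Filtering in the query itself, as in the next cell, avoids transferring statements that would be discarded anyway.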

In [6]:
statements = graphkb_conn.query(
    {
        'target': 'Statement',
        'filters': {
            'AND': [
                {'conditions': convert_to_rid_list(variant_matches), 'operator': 'CONTAINSANY'},
                {'relevance': convert_to_rid_list(therapeutic_terms), 'operator': 'IN'},
            ]
        },
        'returnProperties': return_props,
    }
)

for statement in statements:
    print(
        [c['displayName'] for c in statement['conditions'] if c['@class'].endswith('Variant')],
        statement['relevance']['displayName'],
        statement['subject']['displayName'],
        statement['source']['displayName'] if statement['source'] else '',
        [c['displayName'] for c in statement['evidence']],
    )

['KRAS:p.G12mut'] response mek inhibitor [c69145] CGI ['pmid:18701506']
['KRAS:p.G12D'] sensitivity dactolisib + selumetinib CIViC ['pmid:19029981']
['KRAS mutation'] sensitivity Decitabine [c981] CIViC ['pmid:25968887']
['KRAS mutation'] sensitivity Trametinib [c77908] CIViC ['pmid:22169769']
['KRAS:p.G12D'] sensitivity Akt Inhibitor MK2206 [c90581] CIViC ['pmid:22025163']
['KRAS mutation'] sensitivity cetuximab + dasatinib CIViC ['pmid:20956938']
['KRAS mutation'] sensitivity b-raf/vegfr-2 inhibitor raf265 + selumetinib CIViC ['pmid:25199829']
['KRAS mutation'] sensitivity b-raf/vegfr-2 inhibitor raf265 + selumetinib CIViC ['pmid:25199829']
['KRAS mutation'] sensitivity afatinib + trametinib CIViC ['pmid:24685132']
['KRAS mutation'] sensitivity afatinib + trametinib CIViC ['pmid:24685132']
['KRAS mutation'] sensitivity docetaxel + selumetinib CIViC ['pmid:23200175']
['KRAS mutation'] sensitivity selumetinib + teprotumumab CIViC ['pmid:21985784']
['KRAS mutation'] sensitivity MEK Inhi