<a href="https://colab.research.google.com/github/bcgsc/pori_graphkb_python/blob/feature%2Fjupyter/docs/pori_graphkb_python_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GraphKB Variant Matching Tutorial

This tutorial is an interactive notebook which can be run using [Google Colab](https://colab.research.google.com/drive/1JnbK3Tm9f9XhuwUqvmbXReLGk_2GuaI4?usp=sharing) or a local Jupyter server (**recommended** if matching patient data). It covers basic matching of variants using the GraphKB Python adaptor against an instance of the GraphKB API.

Users must first have login credentials for an instance of the GraphKB API (or use the demo server). If you are using the demo server and credentials, note that its data is limited; a production instance of GraphKB would be expected to provide more complete annotations.

For the purposes of this tutorial we will match the known KRAS variant `p.G12D` against the demo instance of GraphKB. You can point the tutorial at a different API instance by changing the setup variables below.
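
The variant name `KRAS:p.G12D` packs several pieces of information into one string: the gene (`KRAS`), a protein-level change (`p.`), the reference amino acid (`G`, glycine), its position (`12`) and the alternate amino acid (`D`, aspartic acid). The sketch below is a hypothetical helper that only illustrates this structure for simple single-letter substitutions; it is not how the `graphkb` package parses variants, and real variant notation is far richer.

```python
import re
from typing import NamedTuple


class ProteinSubstitution(NamedTuple):
    gene: str
    ref: str
    position: int
    alt: str


# Hypothetical pattern covering only simple protein substitutions such as KRAS:p.G12D
# (single-letter amino acid codes); anything else is rejected.
_PROTEIN_SUB = re.compile(r'^(?P<gene>[A-Za-z0-9-]+):p\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$')


def parse_protein_substitution(name: str) -> ProteinSubstitution:
    """Split a simple protein substitution into gene, reference residue, position and alternate residue."""
    match = _PROTEIN_SUB.match(name)
    if match is None:
        raise ValueError(f'not a simple protein substitution: {name!r}')
    return ProteinSubstitution(
        gene=match['gene'],
        ref=match['ref'],
        position=int(match['pos']),
        alt=match['alt'],
    )


print(parse_protein_substitution('KRAS:p.G12D'))
# ProteinSubstitution(gene='KRAS', ref='G', position=12, alt='D')
```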

To run this locally, download this file and start the server from the command line as follows:

```bash
jupyter notebook notebook.ipynb
```

You should now be able to see the notebook by opening `http://localhost:8888` in your browser.

```python
!pip3 install graphkb
```
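```python
from getpass import getpass

from graphkb import GraphKBConnection

GKB_API_URL = 'https://pori-demo.bcgsc.ca/graphkb-api/api'
GKB_USER = 'colab_demo'
GKB_PASSWORD = 'colab_demo'
# For a production instance, avoid hard-coding the password and prompt for it instead:
# GKB_PASSWORD = getpass('GraphKB password: ')

# disable the shared query cache so results always come from this connection's own requests
graphkb_conn = GraphKBConnection(GKB_API_URL, use_global_cache=False)
graphkb_conn.login(GKB_USER, GKB_PASSWORD)
```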
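
Hard-coding credentials is fine for the public demo account, but for a production instance you may prefer to keep them out of the notebook. A minimal sketch, assuming hypothetical environment variable names `GKB_API_URL`, `GKB_USER` and `GKB_PASSWORD`, with the demo values as fallbacks:

```python
import os
from typing import Mapping, Tuple

DEMO_URL = 'https://pori-demo.bcgsc.ca/graphkb-api/api'


def load_graphkb_settings(environ: Mapping[str, str] = os.environ) -> Tuple[str, str, str]:
    """Read the API URL, username and password from (hypothetical) environment variables.

    Any variable that is unset falls back to the demo server settings used in this tutorial.
    """
    url = environ.get('GKB_API_URL', DEMO_URL)
    user = environ.get('GKB_USER', 'colab_demo')
    password = environ.get('GKB_PASSWORD', 'colab_demo')
    return url, user, password


GKB_API_URL, GKB_USER, GKB_PASSWORD = load_graphkb_settings()
```

The resulting values can then be passed to `GraphKBConnection` and `login` exactly as in the setup cell.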

## Matching Variants

Now you are ready to match variants.

```python
from graphkb.match import match_positional_variant

variant_name = 'KRAS:p.G12D'
variant_matches = match_positional_variant(graphkb_conn, variant_name)

print(f'{variant_name} matched {len(variant_matches)} other variant representations')
print()

for match in variant_matches:
    print(variant_name, 'will match', match['displayName'])
```

Running the cell above should show that the KRAS protein variant matches a number of other, less specific mentions (e.g. KRAS:p.G12mut) as well as genomic equivalents (e.g. chr12:g.25398284C>T). Note that the results will depend on the instance of GraphKB you are accessing.
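
The matching done by `match_positional_variant` relies on the records and relationships stored in GraphKB itself. As a rough intuition for "same or less specific" only, the toy below (a hypothetical sketch, not the library's algorithm) keeps candidate protein notations at the same gene, reference residue and position whose alternate residue is either identical or a wildcard (`X` or `mut`). Genomic notations are simply skipped here, whereas the real matcher can return them.

```python
import re
from typing import List, Optional, Tuple

# Toy notation: GENE:p.<ref><position><alt>, where alt is one residue letter or
# 'X' / 'mut' (treated here as "any change at this position").
_NAME = re.compile(r'^(?P<gene>[A-Za-z0-9-]+):p\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z]|mut)$')
_WILDCARDS = {'X', 'mut'}


def _parse(name: str) -> Optional[Tuple[str, str, int, str]]:
    m = _NAME.match(name)
    if m is None:
        return None
    return m['gene'], m['ref'], int(m['pos']), m['alt']


def toy_match(query: str, candidates: List[str]) -> List[str]:
    """Return candidates describing the query variant at the same or a lower specificity."""
    parsed_query = _parse(query)
    if parsed_query is None:
        raise ValueError(f'unsupported notation: {query!r}')
    gene, ref, pos, alt = parsed_query
    matches = []
    for name in candidates:
        parsed = _parse(name)
        if parsed is None:
            continue  # genomic or other notations are out of scope for this toy
        c_gene, c_ref, c_pos, c_alt = parsed
        if (c_gene, c_ref, c_pos) != (gene, ref, pos):
            continue
        if c_alt == alt or c_alt in _WILDCARDS:
            matches.append(name)
    return matches


# Example candidate strings chosen for illustration, not results from a GraphKB server.
print(toy_match('KRAS:p.G12D', ['KRAS:p.G12D', 'KRAS:p.G12mut', 'KRAS:p.G12V', 'KRAS:p.G13D']))
# ['KRAS:p.G12D', 'KRAS:p.G12mut']
```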

## Annotating Variants

Now that we have matched the variant, we will fetch the related statements to annotate this variant with its possible relevance.
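
The annotation query below selects every `Statement` whose `conditions` contain at least one of the matched variants' record IDs (the `CONTAINSANY` operator). That filtering happens on the GraphKB server; the sketch here only mimics its semantics on made-up records. `rid_list` is a hypothetical stand-in for `convert_to_rid_list`, and conditions are represented as plain RID strings for simplicity.

```python
from typing import Dict, Iterable, List


def rid_list(records: Iterable[Dict]) -> List[str]:
    """Collect each record's '@rid' (hypothetical stand-in for graphkb.util.convert_to_rid_list)."""
    return [record['@rid'] for record in records]


def contains_any(statement: Dict, rids: List[str]) -> bool:
    """Client-side mimic of the CONTAINSANY filter: True if any condition RID is in rids."""
    wanted = set(rids)
    return any(condition in wanted for condition in statement['conditions'])


# Made-up record IDs, used only to illustrate the filter.
toy_matches = [{'@rid': '#1:0'}, {'@rid': '#1:1'}]
toy_statements = [
    {'@rid': '#2:0', 'conditions': ['#1:0', '#5:3']},  # shares a matched variant, so it is kept
    {'@rid': '#2:1', 'conditions': ['#9:9']},  # no overlap, so it is filtered out
]

toy_rids = rid_list(toy_matches)
print([s['@rid'] for s in toy_statements if contains_any(s, toy_rids)])
# ['#2:0']
```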

```python
from graphkb.constants import BASE_RETURN_PROPERTIES, GENERIC_RETURN_PROPERTIES
from graphkb.util import convert_to_rid_list

# return properties should be customized to the user's needs
return_props = (
    BASE_RETURN_PROPERTIES
    + ['sourceId', 'source.name', 'source.displayName']
    + [f'conditions.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'subject.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'evidence.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'relevance.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'evidenceLevel.{p}' for p in GENERIC_RETURN_PROPERTIES]
)

# fetch every statement with at least one of the matched variants among its conditions
statements = graphkb_conn.query(
    {
        'target': 'Statement',
        'filters': {'conditions': convert_to_rid_list(variant_matches), 'operator': 'CONTAINSANY'},
        'returnProperties': return_props,
    }
)
print(f'annotated {len(variant_matches)} variant matches with {len(statements)} statements')
print()

# show the first few statements: variant conditions, relevance, subject, source and evidence
for statement in statements[:5]:
    print(
        [c['displayName'] for c in statement['conditions'] if c['@class'].endswith('Variant')],
        statement['relevance']['displayName'],
        statement['subject']['displayName'],
        statement['source']['displayName'] if statement['source'] else '',
        [c['displayName'] for c in statement['evidence']],
    )
```

```text
annotated 7 variant matches with 96 statements

['KRAS:p.G12D'] favours diagnosis lung cancer CIViC ['pmid:23014527']
['KRAS:p.G12D'] likely pathogenic acute myeloid leukemia DoCM ['pmid:3122217']
['KRAS:p.G12D'] pathogenic non-small cell lung carcinoma DoCM ['pmid:18794081']
['KRAS:p.G12D'] pathogenic colorectal cancer DoCM ['pmid:21228335']
['KRAS:p.G12D'] likely pathogenic acute myeloid leukemia DoCM ['pmid:2278970']
```
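
With close to a hundred statements, it can help to summarize them rather than print each one. A minimal sketch, assuming each statement carries `relevance.displayName` and `subject.displayName` (both requested in `return_props` above): it groups subjects under each relevance term. The sample input reuses only the first five statements printed above, so it is not a summary of the full result.

```python
from collections import defaultdict
from typing import Dict, List


def subjects_by_relevance(statements: List[Dict]) -> Dict[str, List[str]]:
    """Group statement subjects (e.g. diseases) under each relevance term, de-duplicated and sorted."""
    grouped = defaultdict(set)
    for statement in statements:
        grouped[statement['relevance']['displayName']].add(statement['subject']['displayName'])
    return {relevance: sorted(subjects) for relevance, subjects in grouped.items()}


# Minimal stand-ins for the five statements printed above (only the fields used here).
printed = [
    ('favours diagnosis', 'lung cancer'),
    ('likely pathogenic', 'acute myeloid leukemia'),
    ('pathogenic', 'non-small cell lung carcinoma'),
    ('pathogenic', 'colorectal cancer'),
    ('likely pathogenic', 'acute myeloid leukemia'),
]
sample = [{'relevance': {'displayName': r}, 'subject': {'displayName': s}} for r, s in printed]

for relevance, subjects in subjects_by_relevance(sample).items():
    print(relevance, '->', ', '.join(subjects))
# favours diagnosis -> lung cancer
# likely pathogenic -> acute myeloid leukemia
# pathogenic -> colorectal cancer, non-small cell lung carcinoma
```

In the notebook, `subjects_by_relevance(statements)` applies the same grouping to the full query result.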
