## Investigate MetaKB Dataset: Therapeutics & Combination Therapies
To better understand the MetaKB aggregate dataset, this notebook runs graph-directed lookups in Neo4j to surface insights and discussion points that were not previously accessible.
  
This notebook explores the Therapeutics & Combination Therapies data across variants, studies, and gene contexts. Initial ideas are to look at FDA approval status for therapies, the diseases in which they are used, and off-label use cases on a variant-by-variant or gene-by-gene basis.
  
**Current Data Version**: 5.20.0

### Grab Therapeutics Data

In [16]:
from neo4j import GraphDatabase

# Function to create a connection to the Neo4j database
def create_db_connection(uri, user, password):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    return driver

# Function to execute a Cypher query
def execute_query(driver, query):
    with driver.session() as session:
        result = session.run(query)
        return list(result)

# Connect to the Neo4j database
uri = "bolt://localhost:7687"
user = "neo4j"
password = "password"  # Placeholder; replace with your actual Neo4j password
driver = create_db_connection(uri, user, password)

# Strict: only returns therapeutics that have combination components
# (kept for reference; query_therapies is the query executed below)
query = """MATCH (s:Study)-[:HAS_VARIANT]->(variant),
      (s)-[:HAS_THERAPEUTIC]->(therapeutic),
      (therapeutic)-[:HAS_COMPONENTS]->(combination)
RETURN properties(s) AS Study, 
       properties(variant) AS VariantProperties,
       properties(therapeutic) AS TherapeuticProperties,
       COUNT(combination) AS NumberOfComponents,
       COLLECT(properties(combination)) AS CombinationTherapyProperties
"""

query_therapies = """
MATCH (s:Study)-[:HAS_VARIANT]->(variant),
      (s)-[:HAS_THERAPEUTIC]->(therapeutic)
OPTIONAL MATCH (therapeutic)-[:HAS_COMPONENTS]->(combination)
RETURN properties(s) AS Study, 
       properties(variant) AS VariantProperties,
       properties(therapeutic) AS TherapeuticProperties,
       COUNT(combination) AS NumberOfComponents,
       COLLECT(properties(combination)) AS CombinationTherapyProperties
"""

# Execute the query
result = execute_query(driver, query_therapies)

# Close the connection
driver.close()
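
The two query strings above differ only in whether the `HAS_COMPONENTS` pattern is required. As a rough analogy (not a description of how Neo4j executes the query), the strict `MATCH` behaves like an inner join and `OPTIONAL MATCH` like a left join, with `COUNT` skipping the nulls that the optional pattern produces. The sketch below illustrates this with pandas; the identifiers `t1`, `t2`, `c1`, and `c2` are hypothetical toy values, not MetaKB data.

```python
import pandas as pd

# Hypothetical toy data: two therapeutics, only t2 has components.
therapeutics = pd.DataFrame({"therapeutic_id": ["t1", "t2"]})
components = pd.DataFrame({
    "therapeutic_id": ["t2", "t2"],
    "component_id": ["c1", "c2"],
})

# Strict MATCH analogue: therapeutics without components drop out.
strict = therapeutics.merge(components, on="therapeutic_id", how="inner")

# OPTIONAL MATCH analogue: every therapeutic is kept, with NaN where
# no component exists.
optional = therapeutics.merge(components, on="therapeutic_id", how="left")

# Like Cypher's COUNT(combination), count() ignores missing values,
# so single-agent therapeutics get a count of 0.
counts = optional.groupby("therapeutic_id")["component_id"].count()

print(sorted(strict["therapeutic_id"].unique()))
print(counts.to_dict())
```

This is consistent with the `0` and `[]` values seen for single-agent rows in the resulting table.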




In [17]:
import pandas as pd

data = []
for record in result:
    row = {
        'study_allele_origin': record['Study'].get('alleleOrigin', None),
        'study_id': record['Study']['id'],
        'study_direction': record['Study']['direction'],
        'study_predicate': record['Study']['predicate'],
        'study_type': record['Study']['type'],
        'variant_mp_score': record['VariantProperties'].get('civic_molecular_profile_score', None),
        'variant_id': record['VariantProperties']['id'],
        'variant_label': record['VariantProperties']['label'],
        'variant_type': record['VariantProperties'].get('variant_types', None),
        'therapeutic_type': record['TherapeuticProperties']['type'],
        'therapeutic_civic_type': record['TherapeuticProperties'].get('civic_therapy_interaction_type', None),
        'therapeutic_id': record['TherapeuticProperties']['id'],
        'number_of_components': record.get('NumberOfComponents', None),
        'combination_therapy_components': record.get('CombinationTherapyProperties', None)
    }

    data.append(row)

df = pd.DataFrame(data)

df

,study_allele_origin,study_id,study_direction,study_predicate,study_type,variant_mp_score,variant_id,variant_label,variant_type,therapeutic_type,therapeutic_civic_type,therapeutic_id,number_of_components,combination_therapy_components
0,somatic,civic.eid:238,supports,predictsResistanceTo,VariantTherapeuticResponseStudy,406.25,civic.mpid:34,EGFR T790M,"[{""label"": ""missense_variant"", ""system"": ""http...",TherapeuticAgent,,civic.tid:15,0,[]
1,somatic,civic.eid:1409,supports,predictsSensitivityTo,VariantTherapeuticResponseStudy,1378.50,civic.mpid:12,BRAF V600E,"[{""label"": ""missense_variant"", ""system"": ""http...",TherapeuticAgent,,civic.tid:4,0,[]
2,somatic,civic.eid:1592,supports,predictsSensitivityTo,VariantTherapeuticResponseStudy,406.25,civic.mpid:34,EGFR T790M,"[{""label"": ""missense_variant"", ""system"": ""http...",TherapeuticAgent,,civic.tid:187,0,[]
3,somatic,civic.eid:1867,supports,predictsSensitivityTo,VariantTherapeuticResponseStudy,406.25,civic.mpid:34,EGFR T790M,"[{""label"": ""missense_variant"", ""system"": ""http...",TherapeuticAgent,,civic.tid:187,0,[]
4,somatic,civic.eid:2994,supports,predictsSensitivityTo,VariantTherapeuticResponseStudy,379.00,civic.mpid:33,EGFR L858R,"[{""label"": ""missense_variant"", ""system"": ""http...",TherapeuticAgent,,civic.tid:15,0,[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1037,somatic,moa.assertion:961,none,predictsSensitivityTo,VariantTherapeuticResponseStudy,,moa.variant:66,ABL1 p.T315I (Missense),,TherapeuticAgent,,moa.normalize.therapy.rxcui:1364347,0,[]
1038,somatic,moa.assertion:963,none,predictsSensitivityTo,VariantTherapeuticResponseStudy,,moa.variant:146,BRAF p.V600K (Missense),,TherapeuticAgent,,moa.normalize.therapy.ncit:C106254,0,[]
1039,somatic,moa.assertion:967,none,predictsSensitivityTo,VariantTherapeuticResponseStudy,,moa.variant:254,EGFR p.L858R (Missense),,TherapeuticAgent,,moa.normalize.therapy.rxcui:337525,0,[]
1040,somatic,moa.assertion:969,none,predictsSensitivityTo,VariantTherapeuticResponseStudy,,moa.variant:254,EGFR p.L858R (Missense),,TherapeuticAgent,,moa.normalize.therapy.rxcui:328134,0,[]


### Inspect

In [18]:
df['variant_mp_score'].describe()

count     815.000000
mean      176.373620
std       382.819215
min         0.500000
25%         7.500000
50%        20.000000
75%        96.500000
max      1378.500000
Name: variant_mp_score, dtype: float64
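
Only 815 rows carry a `variant_mp_score`, and the MOA rows visible in the table above show it blank, which suggests (but does not prove) that the missing scores come from MOA records. One way to check is to derive a source from the `study_id` prefix and compute the missing fraction per source. The sketch below uses four rows copied from the displayed table; the `source` column and the `sample` frame are illustrative names, and on the real data the same two lines would be applied to `df`.

```python
import pandas as pd

# Four rows copied from the table shown earlier (two CIViC, two MOA).
sample = pd.DataFrame({
    "study_id": [
        "civic.eid:238",
        "civic.eid:1409",
        "moa.assertion:961",
        "moa.assertion:963",
    ],
    "variant_mp_score": [406.25, 1378.50, None, None],
})

# Assumption: the text before the first "." in study_id names the source.
sample["source"] = sample["study_id"].str.split(".").str[0]

# Fraction of rows per source that lack a molecular profile score.
missing_by_source = (
    sample["variant_mp_score"].isna().groupby(sample["source"]).mean()
)
print(missing_by_source.to_dict())
```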

In [20]:
df['variant_label'].value_counts()

variant_label
BRAF V600E                 70
EGFR L858R                 36
BRAF p.V600E (Missense)    31
EGFR T790M                 27
PIK3CA H1047R              22
                           ..
KDR R961W                   1
ERBB2 G778_S779insLPS       1
ERBB2 G776delinsLC          1
ERBB2 G776delinsCV          1
IDH1 p.R132L (Missense)     1
Name: count, Length: 474, dtype: int64
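
The counts above list `BRAF V600E` and `BRAF p.V600E (Missense)` separately, so what appears to be the same variant is split across two label styles. A minimal sketch of a label normalizer follows, assuming that the only differences are the `p.` prefix and the trailing parenthetical; `normalize_label` is a hypothetical helper, and matching on normalized labels is only a heuristic, since identical strings do not guarantee identical variant concepts.

```python
import re


def normalize_label(label: str) -> str:
    """Hypothetical helper: strip a trailing "(...)" and the "p." prefix."""
    label = re.sub(r"\s*\([^)]*\)\s*$", "", label)  # drop e.g. "(Missense)"
    label = re.sub(r"\bp\.", "", label)             # drop protein-level prefix
    return label.strip()


# Labels taken from the output above.
labels = [
    "BRAF V600E",
    "BRAF p.V600E (Missense)",
    "EGFR L858R",
    "EGFR p.L858R (Missense)",
    "ERBB2 G778_S779insLPS",
]
normalized = [normalize_label(label) for label in labels]
print(sorted(set(normalized)))
```

On the full DataFrame, something like `df['variant_label'].map(normalize_label).value_counts()` would give merged counts under the same assumptions.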

In [21]:
df['number_of_components'].value_counts()

number_of_components
0    922
2    101
3     19
Name: count, dtype: int64
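
To put these counts in proportion, the share of rows that involve a combination therapy can be computed directly. The sketch below rebuilds the counts reported above as a Series rather than querying the database; `component_counts` is an illustrative name. Note that each row is a study/variant/therapeutic combination, so the figure describes rows, not distinct therapies.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
component_counts = pd.Series({0: 922, 2: 101, 3: 19}, name="count")

total_rows = int(component_counts.sum())
# Rows whose therapeutic has at least one component.
combo_rows = int(component_counts[component_counts.index > 0].sum())
combo_share = combo_rows / total_rows

print(f"{combo_rows} of {total_rows} rows ({combo_share:.1%}) "
      "involve a combination therapy")
# prints: 120 of 1042 rows (11.5%) involve a combination therapy
```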