# Rapid proteomic analysis for solid tumors reveals LSD1 as a drug target in an end‐stage cancer patient
#### Doll et al. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6068348/

### Abstract
Recent advances in mass spectrometry (MS)‐based technologies are now set to transform translational cancer proteomics from an idea to a practice. Here, we present a robust proteomic workflow for the analysis of clinically relevant human cancer tissues that allows quantitation of thousands of tumor proteins in several hours of measuring time and a total turnaround of a few days. We applied it to a chemorefractory metastatic case of the extremely rare urachal carcinoma. Quantitative comparison of lung metastases and surrounding tissue revealed several significantly upregulated proteins, among them lysine‐specific histone demethylase 1 (LSD1/KDM1A). LSD1 is an epigenetic regulator and the target of active development efforts in oncology. Thus, clinical cancer proteomics can rapidly and efficiently identify actionable therapeutic options. While currently described for a single case study, we envision that it can be applied broadly to other patients in a similar condition.

### Workflow

![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6068348/bin/MOL2-12-1296-g002.jpg)

**Figure 2** Proteomics workflow for the case study. (A) Timeline of the project. (B) Experimental design, including source of material, inStageTip sample preparation, and depiction of the analytical workflow


### Results

![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6068348/bin/MOL2-12-1296-g003.jpg)

**Figure 3** Proteins differentially expressed in the urachal carcinoma lung metastases. (A) Volcano plot of the p‐values (y‐axis) vs. the log2 protein abundance differences (x‐axis) between metastases and control, with significance lines in black and gray corresponding to a 5% and 1% FDR, respectively. (B) Mechanisms of action of LSD1/KDM1A and inhibitory drug treatment proposed: **JATROSOME. TRANYLCYPROMIN**


### Workflow with the Clinical Knowledge Graph

1. Generate Analysis Report: Proteomics data
2. Identify Candidate Drug Treatments
3. Rank Candidates According to Toxicity

![Clinical_Knowledge_Graph](banner.jpg)

## Generate Analysis Report: Proteomics Data

### Report Manager

In [268]:
from report_manager import project

##### We load the specific configuration for this project.

In [269]:
configuration_files = {"proteomics":"/Users/albertosantos/Development/Clinical_Proteomics_Department/ClinicalKnowledgeGraph(CKG)/code/src/report_manager/config/proteomics_P0000002.yml"}

##### We create a new project object that we can use to run the analyses

In [270]:
study_case_project = project.Project(identifier="P0000002", configuration_files=configuration_files, datasets={}, knowledge=None, report={})

##### We need to first build the project. This step will collect the project datasets from CKG and process them

In [271]:
study_case_project.build_project(force=False)

Loading project


##### We can now generate the report following the specified configuration

In [272]:
study_case_project.generate_report()

##### Ready! All the analyses are done and we can now access all the results for each data type

In [273]:
study_case_project.list_datasets()

dict_keys(['multiomics', 'wes', 'proteomics'])

##### We will use the results from the proteomics analyses. We access the dataset 'proteomics' for further analysis.

In [274]:
proteomics_dataset = study_case_project.get_dataset(dataset='proteomics')

##### The available analyses for this dataset are:

In [275]:
proteomics_dataset.list_dataframes()

['go annotation',
 'number of modified proteins',
 'number of peptides',
 'number of proteins',
 'original',
 'pathway annotation',
 'processed',
 'protein biomarkers',
 'regulated',
 'regulation table']

##### In this case, we use the regulation table to extract proteins upregulated in the metastatic tissue compared to non-cancerous tissue.
***Note: the automated analysis from CKG returns the comparison CONTROL vs CANCER, so proteins upregulated in the metastases have negative FC and log2FC values***

In [276]:
regulation_table = proteomics_dataset.get_dataframe(dataset_name='regulation table')

In [277]:
regulation_table.head()

Unnamed: 0,-log10 pvalue,FC,Method,Note,correction,group1,group2,identifier,log2FC,mean(group1),mean(group2),padj,pvalue,rejected,s0,t-statistics
0,0.270536,1.27761,SAMR Two class paired,Maximum number of permutations: 4.0. Corrected...,FDR correction BH,CANCER,CONTROL,A1BG~P04217,0.353447,31.709831,32.063278,0.714817,0.536369,False,2,0.153337
1,2.399501,-3.612029,SAMR Two class paired,Maximum number of permutations: 4.0. Corrected...,FDR correction BH,CANCER,CONTROL,A1CF~Q9NQ94,-1.85281,26.563041,24.710232,0.055096,0.003986,False,2,-0.924007
2,1.053637,1.926331,SAMR Two class paired,Maximum number of permutations: 4.0. Corrected...,FDR correction BH,CANCER,CONTROL,A2M~P01023,0.945855,34.287108,35.232963,0.272086,0.088382,False,2,0.463202
3,1.485687,-2.80125,SAMR Two class paired,Maximum number of permutations: 4.0. Corrected...,FDR correction BH,CANCER,CONTROL,AAAS~Q9NRG9,-1.486071,26.316579,24.830508,0.150779,0.032682,False,2,-0.594606
4,0.707536,1.786161,SAMR Two class paired,Maximum number of permutations: 4.0. Corrected...,FDR correction BH,CANCER,CONTROL,AACS~Q86V21,0.836862,26.349952,27.186814,0.424504,0.196094,False,2,0.346611


In [278]:
regulation_table[regulation_table["identifier"].str.startswith('KDM1A')]

Unnamed: 0,-log10 pvalue,FC,Method,Note,correction,group1,group2,identifier,log2FC,mean(group1),mean(group2),padj,pvalue,rejected,s0,t-statistics
2225,2.700531,-8.022234,SAMR Two class paired,Maximum number of permutations: 4.0. Corrected...,FDR correction BH,CANCER,CONTROL,KDM1A~O60341,-3.004004,28.867004,25.863,0.039683,0.001993,True,2,-1.007213


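##### As in the article, we use significantly regulated proteins with a fold change higher than two. Because the comparison is CONTROL vs CANCER, proteins upregulated in the metastases have a negative fold change, so we filter for FC < -2.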

In [279]:
up_regulated_proteins = regulation_table.loc[(regulation_table.rejected) & (regulation_table.FC < -2), ['identifier']]

In [280]:
up_regulated_proteins.shape

(187, 1)
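
##### Sign convention check

The filter `FC < -2` is easier to trust once the sign convention is explicit. The sketch below is not part of CKG. It assumes, based only on the rows printed above, that `log2FC = mean(group2) - mean(group1)` (CONTROL minus CANCER) and that `FC` is a signed fold change. The helper `signed_fold_change` and the list-of-dicts layout are hypothetical names used for illustration.

```python
def signed_fold_change(log2fc: float) -> float:
    """Convert a log2 difference into a signed fold change (assumed convention)."""
    magnitude = 2 ** abs(log2fc)
    return magnitude if log2fc >= 0 else -magnitude

# Group means and test outcome copied from the regulation table output above.
rows = [
    {"identifier": "A1BG~P04217", "mean_cancer": 31.709831, "mean_control": 32.063278, "rejected": False},
    {"identifier": "A1CF~Q9NQ94", "mean_cancer": 26.563041, "mean_control": 24.710232, "rejected": False},
    {"identifier": "KDM1A~O60341", "mean_cancer": 28.867004, "mean_control": 25.863, "rejected": True},
]

for row in rows:
    # CONTROL minus CANCER: negative values mean higher abundance in the metastases.
    row["log2FC"] = row["mean_control"] - row["mean_cancer"]
    row["FC"] = signed_fold_change(row["log2FC"])

# Same logic as the notebook filter: significant AND more than two-fold up in cancer.
up_in_cancer = [r["identifier"] for r in rows if r["rejected"] and r["FC"] < -2]
print(up_in_cancer)
```

Under these assumptions the recomputed values match the printed `FC` and `log2FC` columns, and only KDM1A passes the filter: A1CF is more than two-fold higher in the metastases but is not significant.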

### Graph Database Connector

In [281]:
from graphdb_connector import query_utils, connector

##### We connect to the CKG database using the default configuration

In [282]:
driver = connector.getGraphDatabaseConnectionConfiguration()

##### We load the existing database queries that we can use to extract knowledge from CKG

In [283]:
queries = query_utils.read_knowledge_queries()

### 1) Filter for Regulated Proteins Associated to Lung Cancer:

##### We want to check whether we can identify known connections between the upregulated proteins in metastases and the disease
##### We check if there are queries for these node types: Protein, Disease

In [284]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Protein", "Disease"], print_pretty=True)

Query id: Disease
Query Name:  associated diseases in at least two of the proteins specified
Description:  get relationships to diseases from a list of proteins. Limit the result to diseases associated to the disease studied and with a score higher than 3 (DISEASES).
Involves nodes: Protein,Disease
Involves relationships: Protein,Disease
Query:
 MATCH (project:Project)-[:STUDIES_DISEASE]-(d:Disease)-[:HAS_PARENT]->(parent_disease:Disease) WHERE project.id="PROJECTID" WITH COLLECT(parent_disease) + COLLECT(d) AS parent_diseases MATCH (protein:Protein)-[r]-(disease:Disease)-[:HAS_PARENT]->(parents:Disease) WHERE ((protein.name+"~"+protein.id) IN [PROTEINIDS]) AND toFloat(r.score)>1.0 AND parents IN parent_diseases RETURN (protein.name+"~"+protein.id) AS node1, disease.name AS node2, r.score AS weight, type(r) AS type

Query id: association_disease_score
Query Name:  specific disease
Description:  Return the list of proteins associated to a specific disease with a specific score.
Involves

##### The query named 'specific disease' can help us in this case

In [285]:
disease_query = selected_queries["association_disease_score"]["query"]
proteins = ['"{}"'.format(p) for p in up_regulated_proteins["identifier"].tolist()]
diseases = ['DOID:1324']
diseases = ['"{}"'.format(d) for d in diseases]
disease_query = disease_query.format(",".join(proteins),",".join(diseases), 1)

In [286]:
proteins_associated_lung_cancer = connector.getCursorData(driver=driver, query=disease_query, parameters={})

In [287]:
proteins_associated_lung_cancer.head()

Unnamed: 0,node1,node2,source,type,weight
0,CPQ~Q9Y646,lung cancer,DISEASES,ASSOCIATED_WITH,1.034
1,CEACAM6~P40199,lung cancer,DISEASES,ASSOCIATED_WITH,1.35
2,AGR2~O95994,lung cancer,DISEASES,ASSOCIATED_WITH,1.133
3,THBS1~P07996,lung cancer,DISEASES,ASSOCIATED_WITH,1.263
4,KRT20~P35900,lung cancer,DISEASES,ASSOCIATED_WITH,1.67


In [288]:
proteins_associated_lung_cancer.shape

(21, 5)
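
##### How the query template is filled

The cells above build the Cypher query by quoting each identifier and substituting the joined lists into positional `{}` placeholders. A minimal sketch of that step follows. The template text is hypothetical, because the real `association_disease_score` query is truncated in the output above; only the quoting and `str.format` mechanics mirror the notebook. `quote_ids` is an illustrative helper, not a CKG function.

```python
def quote_ids(ids):
    """Wrap each identifier in double quotes and join with commas for a Cypher IN [...] list."""
    return ",".join('"{}"'.format(i) for i in ids)

# Hypothetical template with three positional placeholders:
# protein identifiers, disease identifiers, minimum score.
template = (
    'MATCH (protein:Protein)-[r:ASSOCIATED_WITH]-(disease:Disease) '
    'WHERE (protein.name+"~"+protein.id) IN [{}] '
    'AND disease.id IN [{}] AND toFloat(r.score) > {} '
    'RETURN (protein.name+"~"+protein.id) AS node1, disease.name AS node2, r.score AS weight'
)

proteins = ["KDM1A~O60341", "AGR2~O95994"]
diseases = ["DOID:1324"]  # lung cancer, as used in the notebook

query = template.format(quote_ids(proteins), quote_ids(diseases), 1)
print(query)
```

Plain string substitution works here because the identifiers come from the project's own tables. A template containing literal Cypher braces (for example a map such as `{id: ...}`) would need them doubled for `str.format` to accept it.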

### 2) Identify Inhibitory Drugs for those Proteins

##### We again use the function 'find_queries_involving_nodes' to find queries involving nodes: Protein, Drug

In [289]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Protein", "Drug"], print_pretty=True)

Query id: Drug
Query Name:  associated drugs in at least two of the proteins specified
Description:  get relationships to drugs. Limit the result to drugs associated to at least two proteins with a score higher than 0.9 (STITCH).
Involves nodes: Protein,Drug
Involves relationships: Protein,Drug
Query:
 MATCH (protein:Protein)-[r:ACTS_ON]-(drug:Drug) WHERE ((protein.name+"~"+protein.id) IN [PROTEINIDS]) AND toFloat(r.score)>0.9 WITH drug, count(r) AS r_count WHERE r_count>1 MATCH (protein:Protein)-[r:ACTS_ON]-(drug) WHERE ((protein.name+"~"+protein.id) IN [PROTEINIDS]) AND toFloat(r.score)>0.9 RETURN (protein.name+"~"+protein.id) AS node1, drug.name AS node2, r.score AS weight, r.action AS type

Query id: association_drug_intervention_proteins
Query Name:  drug intervention- protein association
Description:  Return associations between a list of proteins and the drug intervention in the project
Involves nodes: Project,Protein,Clinical_variable,Drug
Involves relationships: Project,Protei

In [290]:
proteins = ['"{}"'.format(p) for p in proteins_associated_lung_cancer['node1'].tolist()]
drug_query = queries["association_drug_interaction_score"]["query"].format(",".join(proteins), 'inhibition', 0.8)

##### We search the CKG database for known inhibitory drugs for these proteins

In [291]:
drugs_proposed = connector.getCursorData(driver=driver, query=drug_query, parameters={})

In [292]:
drugs_proposed.head()

Unnamed: 0,Drug_desc,action,drug_id,node1,node2,source,type,weight
0,A major primary bile acid produced in the live...,inhibition,DB02659,CDH1~P12830,Cholic Acid,STITCH,ACTS_ON,0.957
1,A synthetic nonsteroidal estrogen used in the ...,inhibition,DB00255,CDH1~P12830,Diethylstilbestrol,STITCH,ACTS_ON,0.8
2,Ketamine is an NMDA receptor antagonist with a...,inhibition,DB01221,CDH1~P12830,Ketamine,STITCH,ACTS_ON,0.8
3,"Sorafenib (rINN), marketed as Nexavar by Bayer...",inhibition,DB00398,CDH1~P12830,Sorafenib,STITCH,ACTS_ON,0.8
4,Nicotine is highly toxic alkaloid. It is the p...,inhibition,DB00184,CDH1~P12830,Nicotine,STITCH,ACTS_ON,0.8


In [293]:
drugs_proposed.shape

(36, 8)

##### This list of inhibitory drugs could in principle be used to identify alternative treatments
##### We can already see that CKG found the same inhibitory drug that was identified in the published case study. However, many other options are proposed and could be further ranked using other criteria.

In [294]:
from analytics_core import utils
from analytics_core.viz import viz

In [295]:
net = viz.get_network(data=drugs_proposed, identifier="inhibition_drugs", args={"source":"node1", "target":"node2", "values":"weight", "node_size":"degree","title":"Proposed drugs", "color_weight":False})

In [296]:
viz.visualize_notebook_network(net["notebook"], notebook_type='jupyter', layout={'width':'100%', 'height':'700px'})

Cytoscape(data={'elements': [{'data': {'degree': 7, 'betweenness': 0.021212121212121213, 'eigenvector': -2.594…

### 3) Identify Drug Known Side Effects

##### In the case study, toxicity was in part the reason why the treatment regimens did not work. We could use the list of side effects to prioritize these drugs.

##### Let's find database queries to obtain these associations: Phenotype (side effect), Drug.

In [263]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Phenotype", "Drug"], print_pretty=True)

Query id: association_drug_sideeffects
Query Name:  drug side effect association
Description:  Return the list of side effects linked to drugs
Involves nodes: Phenotype,Drug
Involves relationships: Phenotype,Drug
Query:
 MATCH (sideeffect:Phenotype)-[r]-(drug:Drug) WHERE (drug.id IN [{}]) RETURN drug.name AS node1, sideeffect.name AS node2, type(r) AS type, r.source AS source

How to use it:
 drugs = ['DB00439', 'DB06196']
drug_side_effect_associations = queries["association_drug_sideeffects"]["query"].format(drugs)


In [264]:
drugs = drugs_proposed["drug_id"].unique()
drugs = ['"{}"'.format(d) for d in drugs]
sideeffects_query = queries["association_drug_sideeffects"]["query"].format(",".join(drugs))

In [265]:
side_effects = connector.getCursorData(driver=driver, query=sideeffects_query, parameters={})

In [266]:
side_effects.head()

Unnamed: 0,node1,node2,source,type
0,Vorinostat,Hyperkinesis,SIDER,HAS_SIDE_EFFECT
1,Vorinostat,Poor appetite,SIDER,HAS_SIDE_EFFECT
2,Vorinostat,Xerostomia,SIDER,HAS_SIDE_EFFECT
3,Vorinostat,Squamous cell carcinoma,SIDER,HAS_SIDE_EFFECT
4,Vorinostat,Headache,SIDER,HAS_SIDE_EFFECT


##### The treatment regimens are also available in CKG and their side effects can be used to rank the proposed drugs. We can prioritize drugs with side effects dissimilar to the ones that caused an adverse reaction in the patient.

##### We will in this case define a new query to obtain the treatment intervention

In [306]:
# Treatments the patient received in this project
treatment_intervention = '''MATCH (project:Project)-[:HAS_ENROLLED_SUBJECT]-
                        (subject:Subject)-[:HAD_INTERVENTION]-(treatment:Clinical_variable)
                        WHERE project.id="P0000002"
                        RETURN treatment.name AS treatment'''
# Jaccard similarity between the side effects of the treatment drugs (d1) and of each proposed drug (d2)
side_effects_similarity = '''MATCH (d1:Drug)-[:HAS_SIDE_EFFECT]->(phenotype1)
                            WHERE d1.name IN [{}]
                            WITH d1, collect(DISTINCT id(phenotype1)) AS d1sideeffects
                            MATCH (d2:Drug)-[:HAS_SIDE_EFFECT]->(phenotype2)
                            WHERE d2.id IN [{}]
                            WITH d1, d1sideeffects, d2, collect(DISTINCT id(phenotype2)) AS d2sideeffects
                            RETURN d1.name AS from,
                                   d2.name AS to,
                                   algo.similarity.jaccard(d1sideeffects, d2sideeffects) AS similarity'''
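
##### Sketch: ranking by side-effect dissimilarity

The idea of prioritizing drugs whose side effects differ from those of the failed regimens can be illustrated without the database. The sketch below computes the Jaccard similarity (shared side effects divided by all side effects of either drug) in plain Python. All drug names and side-effect sets here are made-up placeholders, not data from CKG or SIDER, and the "highest similarity to any previous treatment" scoring rule is one possible choice rather than the notebook's method.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder data: side effects of a previous treatment and of two candidate drugs.
treatment_side_effects = {
    "treatment_A": {"Nausea", "Neutropenia", "Fatigue"},
}
candidate_side_effects = {
    "candidate_X": {"Nausea", "Neutropenia", "Alopecia"},
    "candidate_Y": {"Headache", "Xerostomia"},
}

# Score each candidate by its highest similarity to any previous treatment.
scores = {
    candidate: max(jaccard(effects, t) for t in treatment_side_effects.values())
    for candidate, effects in candidate_side_effects.items()
}

# Lowest similarity first: these candidates overlap least with the earlier regimen.
ranking = sorted(scores.items(), key=lambda item: item[1])
print(ranking)
```

With these placeholders, `candidate_Y` shares no side effects with the previous treatment and ranks first, while `candidate_X` shares two of four distinct effects (similarity 0.5).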

In [304]:
net = viz.get_network(data=side_effects, identifier="side_effects", args={"source":"node1", "target":"node2", "node_size":"degree","title":"Side effects", "color_weight":False})

In [305]:
viz.visualize_notebook_network(net["notebook"], notebook_type='jupyter', layout={'width':'100%', 'height':'700px'})

Cytoscape(data={'elements': [{'data': {'degree': 1, 'radius': 1, 'color': '#1acf66', 'cluster': 0, 'id': 'Cuta…

In [249]:
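# Convert the side-effect network to networkx so node degrees can be ranked
net_json = net['net_json']
graph_nx = utils.json_network_to_networkx(net_json)
# Ascending degree: drugs with the fewest known side effects come first
sorted_degrees = sorted(graph_nx.degree, key=lambda x: x[1], reverse=False)
# Keep at most 30 drug nodes (the variable name says 20, but the cutoff used here is 30)
count = 30
top_20_lowest_degree = []
for n, degree in sorted_degrees:
    # Skip side-effect nodes; only drug nodes appear in the 'node1' column
    if n in side_effects["node1"].unique():
        top_20_lowest_degree.append((n,degree))
        count -=1
    if count==0:
        break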

In [250]:
top_20_lowest_degree

[('Acetic acid', 5),
 ('Cholic Acid', 7),
 ('Lactic Acid', 9),
 ('D-Lactic acid', 9),
 ('Ketamine', 21),
 ('Bumetanide', 40),
 ('Lapatinib', 41),
 ('Vorinostat', 42),
 ('Dinoprostone', 51),
 ('Tranylcypromine', 54),
 ('(1R,2S)-2-Phenylcyclopropanaminium', 54),
 ('Calcitriol', 59),
 ('Gefitinib', 66),
 ('Nicotine', 86),
 ('Minocycline', 98),
 ('Fluvastatin', 99),
 ('Sorafenib', 104),
 ('Dasatinib', 144),
 ('Atorvastatin', 152),
 ('Paclitaxel', 211),
 ('Doxorubicin', 282),
 ('Epirubicin', 282)]
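
##### Sketch: degree as a side-effect count

In a drug/side-effect network, the degree of a drug node is the number of distinct side effects linked to it, which is what the ranking above relies on. The sketch below reproduces that count without networkx. The Vorinostat pairs are the five rows shown in `side_effects.head()`, so its count here is far below its degree in the full network; `drug_B` and its effects are hypothetical placeholders.

```python
from collections import defaultdict

# (drug, side effect) pairs, mirroring the node1/node2 columns of `side_effects`.
edges = [
    ("Vorinostat", "Hyperkinesis"),
    ("Vorinostat", "Poor appetite"),
    ("Vorinostat", "Xerostomia"),
    ("Vorinostat", "Squamous cell carcinoma"),
    ("Vorinostat", "Headache"),
    ("drug_B", "Headache"),   # hypothetical drug and effects
    ("drug_B", "Nausea"),
    ("drug_B", "Nausea"),     # duplicate rows should not inflate the count
]

# Collect distinct side effects per drug; the set size equals the node degree.
effects_by_drug = defaultdict(set)
for drug, effect in edges:
    effects_by_drug[drug].add(effect)

# Ascending order: drugs with the fewest listed side effects first.
ranked = sorted(
    ((drug, len(effects)) for drug, effects in effects_by_drug.items()),
    key=lambda item: item[1],
)
print(ranked)
```

A low count can reflect sparse annotation as much as good tolerability, so this ranking is best read as a first filter rather than a safety assessment.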

In [251]:
set(top_20_highest_degree).intersection(top_20_lowest_degree)

set()