# DODGEKB Chemical Similarity Appyter
DODGEKB (Defective Organ Development Genetic Effects Knowledge Base) is an initiative that will produce a knowledge graph that connects phenotypes, tissues and organs, cell types and cell lines, drugs, and genes, based on knowledge and annotations captured across Common Fund DCCs.

This Appyter provides knowledge from the chemical standpoint: given a query small molecule (e.g., a teratogen), it retrieves similar small molecules based on Tanimoto structural similarity and on similarity in the L1000 gene expression space. In this way, potentially teratogenic small molecules can be prioritized by their structural and gene expression-based similarity to known teratogens.

In [None]:
#%%appyter init
from appyter import magic
magic.init(lambda _=globals: _())

In [None]:
import pandas as pd
import numpy as np

# Display / graphing
from IPython.display import display, HTML

# API access
import requests
import json

In [None]:
# Notebook display util functions
def make_clickable(link):
    return f'<a target="_blank" href="{link}">{link}</a>'

table_number = 0
figure_number = 0
def figure_header(label,title):
    global table_number
    global figure_number
    if label == 'Table':
        table_number += 1
        label = f'Table {table_number}'
    elif label == 'Figure':
        figure_number += 1
        label = f'Figure {figure_number}'
    display(HTML(f"<div style='font-size:1.25rem; padding:1rem 0;'><b>{label}</b>: {title}</div>"))
    
def figure_legend(label,title,content=''):
    # Reuses the counter last set by figure_header; reading a module-level
    # variable does not require a `global` declaration.
    if label == 'Table':
        label = f'Table {table_number}'
    elif label == 'Figure':
        label = f'Figure {figure_number}'
    display(HTML(f'<style>div.caption {{text-align: center;}}</style><div class=caption><b>{label}</b>: <i>{title}</i>. {content} </div>'))

In [None]:
L1000_similarity_scores = requests.get('https://appyters.maayanlab.cloud/storage/DODGE-Chemical-Similarity/L1000_signature_similarity_scores.json').json()
ECFP4_similartiy_scores = requests.get('https://appyters.maayanlab.cloud/storage/DODGE-Chemical-Similarity/ECFP4_similarity_scores.json').json()
ECFP6_similarity_scores = requests.get('https://appyters.maayanlab.cloud/storage/DODGE-Chemical-Similarity/ECFP6_similarity_scores.json').json()

In [None]:
%%appyter hide_code

{% do SectionField(name='method_selection',
                   title='Input a small molecule of interest',
                   subtitle='Type a small molecule name of interest into the autocomplete field below to find\
                   related small molecules based on Tanimoto similarity and L1000 gene expression signature\
                   similarity.',
                   img='drug.png'
)%}

{% set drug = AutocompleteField(name = 'drug',
                                label = 'Small molecule name',
                                default = 'valproic-acid',
                                description = 'Enter the small molecule name of interest',
                                file_path = 'https://appyters.maayanlab.cloud/storage/DODGE-Chemical-Similarity/lincs_drugs.json',
                                section = 'method_selection'
)%}

In [None]:
%%appyter markdown
### Top 20 LINCS small molecules most similar to {{drug.value}} based on ECFP4 Tanimoto Similarity
The canonical SMILES strings of LINCS small molecules were converted into Extended Connectivity
Fingerprints (radius=4) using RDKit. Tanimoto similarity between all unique small molecules was computed.
The top 20 most similar small molecules, ranked by Tanimoto similarity, are displayed in the
table below along with a downloadable version with the top 100 most similar small molecules.
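
As a rough illustration of the ranking described above, the sketch below computes Tanimoto similarity on toy fingerprints represented as sets of on-bit indices and sorts the results. It does not use RDKit, and the Appyter itself loads precomputed scores rather than computing them at run time; the `tanimoto` helper, the molecule names, and the bit indices are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical toy fingerprints; real ECFP bit vectors are much longer.
fingerprints = {
    "query": {1, 4, 7, 9},
    "mol_a": {1, 4, 7, 8},
    "mol_b": {2, 3, 5},
    "mol_c": {1, 4, 7, 9, 11},
}

query = fingerprints["query"]
scores = {
    name: tanimoto(query, fp)
    for name, fp in fingerprints.items()
    if name != "query"
}

# Rank from most to least similar, as in the table of top hits.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A score of 1 means the two fingerprints set exactly the same bits, and 0 means they share none.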

In [None]:
%%appyter code_exec
ecfp4 = pd.DataFrame.from_dict(ECFP4_similartiy_scores[{{drug}}],
                       orient='index',
                       columns = ['Tanimoto Similarity Score'])
filename = f"{{drug.value}}_ECFP4_Tanimoto_Similarity.csv"
ecfp4.to_csv(filename)
figure_header('Table', 'Top Predicted Compounds From ECFP4 Tanimoto Similarity<br>({})'.format(make_clickable(filename)))
display(ecfp4.head(20))

In [None]:
%%appyter markdown
### Top 20 LINCS small molecules most similar to {{drug.value}} based on ECFP6 Tanimoto Similarity
The canonical SMILES strings of LINCS small molecules were converted into Extended Connectivity
Fingerprints (radius=6) using RDKit. Tanimoto similarity between all unique small molecules was computed.
The top 20 most similar small molecules, ranked by Tanimoto similarity, are displayed in the
table below along with a downloadable version with the top 100 most similar small molecules.

In [None]:
%%appyter code_exec
ecfp6 = pd.DataFrame.from_dict(ECFP6_similarity_scores[{{drug}}],
                       orient='index',
                       columns = ['Tanimoto Similarity Score'])
filename = f"{{drug.value}}_ECFP6_Tanimoto_Similarity.csv"
ecfp6.to_csv(filename)
figure_header('Table', 'Top Predicted Compounds From ECFP6 Tanimoto Similarity<br>({})'.format(make_clickable(filename)))
display(ecfp6.head(20))

In [None]:
%%appyter markdown
### Top 20 LINCS small molecules most similar to {{drug.value}} based on L1000 gene expression similarity
Consensus signatures were computed for each unique LINCS small molecule and pairwise cosine similarity of the
gene expression vectors between all unique small molecules was computed. The top 20 most similar small molecules,
ranked by L1000 gene expression signature cosine similarity, are displayed in the table below along with a
downloadable version with the top 100 most similar small molecules.
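
To make the similarity measure concrete, here is a minimal sketch of cosine similarity between expression vectors. The vectors, drug names, and the `cosine_similarity` helper are hypothetical; the sketch does not reproduce how the consensus signatures were built, and the Appyter loads precomputed scores.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy signatures over four genes.
signatures = {
    "drug_x": [1.0, -2.0, 0.5, 0.0],
    "drug_y": [2.0, -4.0, 1.0, 0.0],    # same direction as drug_x
    "drug_z": [-1.0, 2.0, -0.5, 0.0],   # opposite direction
    "drug_w": [0.0, 0.0, 0.0, 3.0],     # orthogonal to drug_x
}

query = signatures["drug_x"]
similarities = {
    name: cosine_similarity(query, vec)
    for name, vec in signatures.items()
    if name != "drug_x"
}
for name, sim in sorted(similarities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {sim:+.2f}")
```

Cosine similarity depends only on the direction of the vectors, not their magnitude, so `drug_y` scores as identical to `drug_x` even though its values are twice as large.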

In [None]:
%%appyter code_exec
l1000_ge = pd.DataFrame.from_dict(L1000_similarity_scores[{{drug}}],
                       orient='index',
                       columns = ['Cosine similarity of L1000 signatures'])
filename = f"{{drug.value}}_L1000_Signature_Similarity.csv"
l1000_ge.to_csv(filename)
figure_header('Table', 'Top Predicted Compounds From L1000 Gene Expression Signature Similarity<br>({})'.format(make_clickable(filename)))
display(l1000_ge.head(20))