# Fragnet search example

Example of using Fragnet Search to fetch related molecules from the Fragment Network on the OpenRiskNet infrastructure.
**NOTE:** Details are subject to change.

Fragnet Search is an API around data from the Fragment Network. The Fragment Network was conceived by Astex and is used in their
fragment based drug design processes. It is described in this publication: https://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00809

Astex did not make the code available, so Anthony Bradley, while working at Diamond Light Source, reimplemented it in
Python (see https://github.com/xchem/fragalysis). Informatics Matters have been working with Diamond to further develop this code
for use in Diamond's fragment screening program ([XChem](https://www.diamond.ac.uk/Instruments/Mx/Fragment-Screening.html)) and
are working on finding other uses for this powerful technology. One of these is the [Fragnet Search](https://fragnet.informaticsmatters.com/)
web application. More details can be found at: https://www.informaticsmatters.com/pages/fragment_network.html

The REST search API of Fragnet Search has been deployed to the OpenRiskNet reference site, allowing applications on the site to use this
API. The typical use is to expand a molecule using the fragment network and fetch a set of related molecules.

The key benefit of the fragment network over traditional chemical fingerprint based similarity searches is that the results are more
'chemically intuitive'. This is especially true for small molecules such as fragments and building blocks.

In [1]:
import requests
import json
import urllib.parse
import pandas as pd

# The requests_toolbelt module is used to handle multipart responses.
# It is installed on demand below (equivalent to `pip install requests-toolbelt` from a terminal).
# This might need doing each time the Notebook pod starts.
try:
    from requests_toolbelt.multipart import decoder
except ImportError:
    %pip install requests_toolbelt
    from requests_toolbelt.multipart import decoder

Collecting requests_toolbelt
  Using cached https://files.pythonhosted.org/packages/60/ef/7681134338fc097acef8d9b2f8abe0458e4d87559c689a8c306d0957ece5/requests_toolbelt-0.9.1-py2.py3-none-any.whl
Installing collected packages: requests-toolbelt
Successfully installed requests-toolbelt-0.9.1
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Define some URLs and params
base_url = 'http://fragnet-search.fragnet-search.svc:8080/fragnet-search/rest'
expansion_url = base_url + '/v2/search/expand/'
keycloak_url = 'https://sso.prod.openrisknet.org/auth/realms/openrisknet/protocol/openid-connect/token'

# set to False if self signed certificates are being used
tls_verify=True

In [3]:
# Test the PING service. Should give a 200 response and return 'OK'.
# If not then nothing else is going to work.
# This endpoint is not authenticated

url = base_url + '/ping'

print("Requesting GET " + url)
resp = requests.get(url, verify=tls_verify)
print('Response Code: ' + str(resp.status_code))
print(resp.text)

Requesting GET http://fragnet-search.fragnet-search.svc:8080/fragnet-search/rest/ping
Response Code: 200
OK


## Authentication

You need to fetch an access token from the OpenRiskNet SSO environment (Keycloak) and pass that token in with your requests to the fragnet search API.
Details of this may well change soon.
Contact tdudgeon@informaticsmatters.com for details about how to log in.

In [4]:
# Need to specify your Keycloak SSO username and password so that we can get a token

import getpass
username = input('Username')
password = getpass.getpass('Password')

Username thomasexner
Password ·················


In [5]:
# Get token from Keycloak. This will have a finite lifetime.
# If your requests are getting a 401 error your token has probably expired.

data = {'grant_type': 'password', 'client_id': 'fragnet-search', 'username': username, 'password': password}
kresp = requests.post(keycloak_url, data=data, verify=tls_verify)
print('Response code: ' + str(kresp.status_code))
kresp.raise_for_status()  # stop here on a failed login rather than hitting a KeyError below

j = kresp.json()
token = j['access_token']
print("Token length: " + str(len(token)))
#token

Response code: 200
Token length: 1409
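
The comment above notes that the token has a finite lifetime. A minimal sketch of how the remaining lifetime could be checked, assuming the access token is a standard JWT whose payload carries an `exp` claim (Keycloak access tokens normally are, but this is not confirmed by the notebook). The function name `token_seconds_remaining` is hypothetical, and the signature is not verified here.

```python
import base64
import json as json_module  # aliased because a later cell rebinds the name `json`
import time


def token_seconds_remaining(access_token, now=None):
    """Return the seconds left before the JWT's `exp` claim (signature NOT verified)."""
    # A JWT has three dot-separated segments: header.payload.signature
    payload_b64 = access_token.split('.')[1]
    # Segments are base64url encoded with the padding stripped, so restore it
    payload_b64 += '=' * (-len(payload_b64) % 4)
    claims = json_module.loads(base64.urlsafe_b64decode(payload_b64))
    if now is None:
        now = time.time()
    return claims['exp'] - now
```

Calling `token_seconds_remaining(token)` before a long series of requests gives an early warning that a fresh token is needed, instead of waiting for a 401 response.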


## Run the expansion search
The parameters are:
* query_smiles - the molecule to search for as SMILES
* hops - the number of edges to traverse in the fragment network. Must be 1 or 2
* hac - the change in heavy atom count between the query and the result molecules
* rac - the change in ring atom count between the query and the result molecules

POST operations using Molfile format are also supported.

The result is JSON.
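
The parameters listed above can be assembled into a request URL as sketched below. The helper `build_expansion_url` and its default values are illustrative assumptions; the base URL and the `/v2/search/expand/` path are the ones used in this notebook. The `hops` check mirrors the stated constraint that it must be 1 or 2.

```python
import urllib.parse

BASE_URL = 'http://fragnet-search.fragnet-search.svc:8080/fragnet-search/rest'


def build_expansion_url(query_smiles, hops=1, hac=3, rac=1, base_url=BASE_URL):
    """Build the GET URL for an expansion search (hypothetical helper)."""
    if hops not in (1, 2):
        raise ValueError('hops must be 1 or 2')
    # urlencode takes care of escaping and joins the pairs with '&'
    query = urllib.parse.urlencode({'hac': hac, 'rac': rac, 'hops': hops})
    # The SMILES is part of the path, so it is percent-encoded separately
    return base_url + '/v2/search/expand/' + urllib.parse.quote(query_smiles) + '?' + query


print(build_expansion_url('NC1CCCNC1'))
```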

In [6]:
query_smiles = 'NC1CCCNC1'
hops = 1
hac = 3
rac = 1
url =  expansion_url  + urllib.parse.quote(query_smiles) + '?hac=' + str(hac) + '&rac=' + str(rac) + '&hops=' + str(hops)
print("Requesting GET " + url)
jobs_resp = requests.get(url, headers={'Authorization':  'bearer ' + token}, verify=tls_verify)
print('Response Code: ' + str(jobs_resp.status_code))
json = jobs_resp.json()

Requesting GET http://fragnet-search.fragnet-search.svc:8080/fragnet-search/rest/v2/search/expand/NC1CCCNC1?hac=3&rac=1&hops=1
Response Code: 200
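
A minimal sketch of pulling the SMILES strings out of the parsed response. The shape assumed here (a top-level `members` list whose entries carry a `smiles` field) is inferred from how this notebook reads the result; the sample dictionary is hand-written for illustration and is not real API output. The name `extract_smiles` is hypothetical.

```python
def extract_smiles(expansion_result):
    """Return the unique member SMILES from an expansion result, in first-seen order."""
    smiles = [member['smiles'] for member in expansion_result.get('members', [])]
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(smiles))


# Hand-written sample with the assumed structure (not real API output)
sample = {'members': [{'smiles': 'C1CCNCC1'},
                      {'smiles': 'NC1CNCC(N)C1'},
                      {'smiles': 'C1CCNCC1'}]}
print(extract_smiles(sample))
```

Using `.get('members', [])` means a response without any members yields an empty list rather than raising a `KeyError`.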


## Extract the SMILES strings and get names and InChIs using chemidconvert

In [12]:
from IPython.display import HTML

# Each helper sends one SMILES string to the chemidconvert service and returns the converted value
def getNameFromSMILES(smiles):
    request = requests.get('https://chemidconvert.prod.openrisknet.org/v1/smiles/to/names', params={'smiles': smiles})
    return request.json()['names']
def getInChIFromSMILES(smiles):
    request = requests.get('https://chemidconvert.prod.openrisknet.org/v1/smiles/to/inchi', params={'smiles': smiles})
    return request.json()['inchi']
def getInChIKeyFromSMILES(smiles):
    request = requests.get('https://chemidconvert.prod.openrisknet.org/v1/smiles/to/inchikey', params={'smiles': smiles})
    return request.json()['inchikey']
# Renders a SMILES string as an inline SVG image for the HTML table
def smiles_to_image_html(smiles):
    return '<img style="width:600px" src="http://chemidconvert.cloud.douglasconnect.com/v1/asSvg?smiles={}"/>'.format(urllib.parse.quote(smiles))

str(len(json['members']))
mols = []
for member in json['members']:
    #print(member['smiles'])
    mols.append(member['smiles'])

compounds = pd.DataFrame(mols, columns=['SMILES'])
#compounds = compounds.head()
compounds['Name'] = compounds.SMILES.apply(getNameFromSMILES)
compounds['InChI'] = compounds.SMILES.apply(getInChIFromSMILES)
compounds['InChIKey'] = compounds.SMILES.apply(getInChIKeyFromSMILES)

HTML(compounds.to_html(escape=False ,formatters=dict(SMILES=smiles_to_image_html)))


index,SMILES,Name,InChI,InChIKey
0,,"[Piperidine, 110-89-4, 571261_SIAL, 643602_ALDRICH, Piperidine on Rasta Resin, W290807_ALDRICH, AI3-24114, CCRIS 967, Cyclopentimine, Cypentil, EINECS 203-813-0, FEMA No. 2908, HSDB 114, Hexazane, Pentamethyleneimine, Pentamethylenimine, Perhydropyridine, Piperidin [German], Piperidine [UN2401] [Corrosive], Pyridine, hexahydro-, UN2401, 80645_FLUKA, Azacyclohexane, C01746, Hexahydropyridine, Piperidine, ST5213814, InChI=1/C5H11N/c1-2-4-6-5-3-1/h6H,1-5H, PIP, Piperidine solution, NCIOpen2_007828, NCIMech_000312, CHEBI:18049, 104094_SIAL, LS-3053, 411027_ALDRICH, 33537_RIEDEL, 80640_FLUKA]","InChI=1S/C5H11N/c1-2-4-6-5-3-1/h6H,1-5H2",NQRYJNQNLNOLGT-UHFFFAOYSA-N
1,,"[1-(2-aminoethyl)piperidin-3-amine, 1-(2-aminoethyl)-3-piperidinamine, [1-(2-aminoethyl)-3-piperidyl]amine]","InChI=1S/C7H17N3/c8-3-5-10-4-1-2-7(9)6-10/h7H,1-6,8-9H2",NQCQWIFBJYNOQM-UHFFFAOYSA-N
2,,,"InChI=1S/C7H17N3/c8-3-5-10-4-1-2-7(9)6-10/h7H,1-6,8-9H2/t7-/m1/s1",NQCQWIFBJYNOQM-SSDOTTSWSA-N
3,,,"InChI=1S/C6H15N3/c7-4-6(8)2-1-3-9-5-6/h9H,1-5,7-8H2",GQNLOVFVDPNDBK-UHFFFAOYSA-N
4,,,"InChI=1S/C5H12N2O/c6-4-3-7-2-1-5(4)8/h4-5,7-8H,1-3,6H2",PSSWASGEGXCINO-UHFFFAOYSA-N
5,,,"InChI=1S/C5H12N2O/c6-4-3-7-2-1-5(4)8/h4-5,7-8H,1-3,6H2/t4-,5-/m1/s1",PSSWASGEGXCINO-RFZPGFLSSA-N
6,,,"InChI=1S/C5H11FN2/c6-4-1-2-8-3-5(4)7/h4-5,8H,1-3,7H2",TUPWXEMZUIDTEF-UHFFFAOYSA-N
7,,,"InChI=1S/C6H12N2O2/c7-5-3-8-2-1-4(5)6(9)10/h4-5,8H,1-3,7H2,(H,9,10)",ZTQJXSLKWQLXRB-UHFFFAOYSA-N
8,,,"InChI=1S/C5H12N2O/c6-4-1-5(8)3-7-2-4/h4-5,7-8H,1-3,6H2",WJROWXQXVGJHHA-UHFFFAOYSA-N
9,,"[piperidine-3,5-diamine, (5-amino-3-piperidyl)amine]","InChI=1S/C5H13N3/c6-4-1-5(7)3-8-2-4/h4-5,8H,1-3,6-7H2",VMMXQCOKHQCHHD-UHFFFAOYSA-N


## Check if we find something in ToxCast

In [14]:
try:
    from edelweiss_data import API, QueryExpression as Q
except ImportError:
    %pip install edelweiss_data
    from edelweiss_data import API, QueryExpression as Q

edelweiss_api_url = 'https://api.develop.edelweiss.douglasconnect.com'
api = API(edelweiss_api_url)
api.authenticate()

In [24]:
columns = [
#    ("Endpoint", "$.assay.component.endpoint"),
    ("Endpoint name", "$.assay.component.endpoint.assay_component_endpoint_name.value"),
    ("Biological target", "$.assay.component.endpoint.target.biological_process_target.value"),
    ("Entrez gene ID for the molecular target", "$.assay.component.endpoint.target.intended.intended_target_gene.intended_target_entrez_gene_id.value"),
    ("Symbol", "$.assay.component.endpoint.target.intended.intended_target_gene.intended_target_official_symbol.value"),
    ("Gene name", "$.assay.component.endpoint.target.intended.intended_target_gene.intended_target_gene_name.value"),
    ("Compounds", "$.compound"),
]

condition = Q.search_anywhere("EPA-ToxCast") & Q.search_anywhere("Tox21") & Q.search_anywhere("summary")

cquery = None
for compound in compounds['InChIKey'].values:
    if cquery is None:
        cquery = Q.fuzzy_search(Q.column('Compounds'), compound)
    else:
        cquery = cquery | Q.fuzzy_search(Q.column('Compounds'), compound)
condition =  condition & cquery

ToxCast = api.get_published_datasets(limit=1, columns=columns, condition=condition)
ToxCast

id,version,dataset,Endpoint name,Biological target,Entrez gene ID for the molecular target,Symbol,Gene name,Compounds
1163d790-c68d-4249-ad24-cd4114df121a,1,<PublishedDataset '1163d790-c68d-4249-ad24-cd4...,TOX21_AhR_LUC_Agonist,regulation of transcription factor activity,196,AHR,aryl hydrocarbon receptor,"[{'CAS': {'value': None}, 'InChI': {'value': N..."
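
The loop that combines one fuzzy search per InChIKey with `|` is a fold over the list of keys, which can be written with `functools.reduce`. The sketch below uses a stand-in `Term` class so that it runs without `edelweiss_data`; both `Term` and the helper `any_of` are hypothetical names, and only the `|` operator is modelled.

```python
from functools import reduce
import operator


class Term:
    """Stand-in for a query expression; only the `|` operator is modelled."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        return Term('(' + self.text + ' OR ' + other.text + ')')


def any_of(terms):
    """Combine the terms with `|`, left to right."""
    terms = list(terms)
    if not terms:
        raise ValueError('at least one term is required')
    return reduce(operator.or_, terms)


keys = ['NQRYJNQNLNOLGT-UHFFFAOYSA-N', 'VMMXQCOKHQCHHD-UHFFFAOYSA-N']
combined = any_of(Term(key) for key in keys)
print(combined.text)
```

In the notebook the same helper could be called as `any_of(Q.fuzzy_search(Q.column('Compounds'), key) for key in compounds['InChIKey'].values)`, assuming the query expressions support `|` in the way the loop above already relies on.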


In [26]:
ToxCastData = pd.DataFrame()
for index, row in ToxCast.iterrows():
    cquery = None
    for compound in compounds['InChIKey'].values:
        if cquery is None:
            cquery = Q.fuzzy_search(Q.column('InChI key'), compound)
        else:
            cquery = cquery | Q.fuzzy_search(Q.column('InChI key'), compound)

    tmpdata = row['dataset'].get_data(condition=cquery)

    # Keep exact InChIKey matches only; copy so the column assignment below does not write to a view
    tmpdata = tmpdata[tmpdata['InChI key'].isin(compounds['InChIKey'].values)].copy()
    tmpdata['Assay'] = row['Endpoint name']
    tmpdata = tmpdata[['Assay', 'DTXSID', 'Substance name', 'InChI key', 'CAS', 'IC50']]
    ToxCastData = pd.concat([ToxCastData, tmpdata])
    
ToxCastData.sort_values(by=['InChI key','Assay'])

index,Assay,DTXSID,Substance name,InChI key,CAS,IC50
5016,TOX21_AhR_LUC_Agonist,DTXSID6021165,Piperidine,NQRYJNQNLNOLGT-UHFFFAOYSA-N,110-89-4,
