# STRING DB API Access for Protein-Protein Interaction Enrichment

_Author:_ Natalia García Sánchez
_Date: 26/03/2023_
_Description:_ API access using the STRING identifiers obtained for the HDAC dataset with the custom parsing script `stringdb_mapping.sh`

In [2]:
import requests
import pandas as pd

Get the current version of the STRING REST API

In [42]:
request_url =  "https://string-db.org/api/tsv/version"
res = requests.post(request_url, {'caller_identity':'https://github.com/Natpod/Bnapus_ME'})
" - ".join(res.text.split("\t")).split("\n")[:-1]

['string_version - stable_address',
 '11.5 - https://version-11-5.string-db.org']
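The stable address returned above can be used to pin later requests to one STRING release. The following is a minimal sketch that parses a version response with the same two columns shown in the output and derives an API base URL from it. The helper name `parse_version` is hypothetical, and the example string simply reproduces the output above instead of calling the server.

```python
def parse_version(tsv_text):
    """Map the header fields of the version response to their values."""
    header, values = [ln.split("\t") for ln in tsv_text.strip().split("\n")[:2]]
    return dict(zip(header, values))

# Example text reproducing the output shown above (no network call is made)
example = "string_version\tstable_address\n11.5\thttps://version-11-5.string-db.org\n"

info = parse_version(example)
# Build a version-pinned API base URL from the stable address
api_base = info["stable_address"].rstrip("/") + "/api"
print(info["string_version"], api_base)
```

Pinning the base URL this way keeps results reproducible if the default STRING release changes.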

Load the gene mappings

In [3]:
sdb_df = pd.read_csv('./Filtered_DEGs_mappings_string.csv')

In [43]:
sdb_df.head()

Unnamed: 0,gene_id,stringdb_id
0,GSBRNA2T00000015001,3708.A0A078F5D6
1,GSBRNA2T00000029001,3708.A0A078F5E8
2,GSBRNA2T00000065001,3708.A0A078F9A0
3,GSBRNA2T00000073001,3708.A0A078F1M2
4,GSBRNA2T00000106001,3708.A0A078F4M7


In [None]:
DEGs_df = pd.read_csv('./Filtered_DEG_LFC1_padj05_results_total.csv')
# merge with mappings
sdb_df = sdb_df.merge(DEGs_df, on='gene_id', how='inner')

This would be our gene list if it contained fewer than 2000 genes. To query the STRING API for information extraction, we need a list of fewer than 2000 genes.

In [44]:
my_genes=sdb_df['stringdb_id']

One solution is to keep only the genes with large changes in expression, |LFC| > 4

In [None]:
my_genes=sdb_df[abs(sdb_df['log2FoldChange'])>4]['stringdb_id']
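A fixed |LFC| cutoff does not by itself guarantee that the list stays under the size limit. As an alternative, the sketch below ranks genes by absolute log2 fold change and keeps at most a given number of unique identifiers. The function name `top_genes_by_abs_lfc`, the default of 1999 and the toy data are assumptions made for illustration; the column names follow the merged `sdb_df` used above.

```python
import pandas as pd

def top_genes_by_abs_lfc(df, max_genes=1999,
                         lfc_col="log2FoldChange", id_col="stringdb_id"):
    """Return up to max_genes unique identifiers ranked by descending |LFC|."""
    order = df[lfc_col].abs().sort_values(ascending=False).index
    ranked = df.reindex(order)
    ids = ranked[id_col].dropna().drop_duplicates()
    return ids.head(max_genes).tolist()

# Toy data for illustration only (identifiers and values are made up)
toy = pd.DataFrame({
    "stringdb_id": ["3708.P1", "3708.P2", "3708.P3", "3708.P2"],
    "log2FoldChange": [1.5, -6.0, 4.2, -6.0],
})
selected = top_genes_by_abs_lfc(toy, max_genes=2)
print(selected)
```

With the real data, `my_genes = top_genes_by_abs_lfc(sdb_df)` would give a list that fits the limit regardless of how many genes pass a given threshold.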

Get the network interactions (TSV output) and keep those with a high experimental score

In [46]:
##
## Construct URL
##

string_api_url = "https://version-11-5.string-db.org/api"
method = "network"

request_url = "/".join([string_api_url, 'tsv', method])

##
## Set parameters
##


params = {

    "identifiers" : "%0d".join(my_genes), # your protein
    "species" : 3708, # species NCBI identifier 
    "caller_identity" : "https://github.com/Natpod/Bnapus_ME" # your app name

}

##
## Call STRING
##

response = requests.post(request_url, data=params)

for line in response.text.strip().split("\n"):

    l = line.strip().split("\t")

    ## header and error rows have no numeric experimental score: print them as-is
    try:
        p1, p2 = l[2], l[3]
        experimental_score = float(l[10])
    except (IndexError, ValueError):
        print(l)
        continue

    ## filter the interaction according to experimental score
    if experimental_score > 0.8:
        print("\t".join([p1, p2, "experimentally confirmed (prob. %.3f)" % experimental_score]))

['Error', 'ErrorMessage']
['input too large', "STRING website does not support networks larger than 2000 nodes. In order to visualize large STRING networks please use our <br/><br/> <a class='error_linkout' target='_blank' href='https://apps.cytoscape.org/apps/stringapp'>Cytoscape stringApp</a>. <br/><br/>The stringApp, among other features, gives you access to more network customization options, lets you augument the network with your data and performs the functional enrichment analysis."]
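The output above shows that STRING reports failures as a small TSV table whose header is `Error` and `ErrorMessage`. A minimal sketch for detecting that case before parsing a response is given below. The helper name `parse_string_error` is hypothetical, and the example text is a shortened copy of the message above.

```python
def parse_string_error(tsv_text):
    """Return (error, message) if the text is a STRING error table, else None."""
    lines = tsv_text.strip().split("\n")
    header = lines[0].strip().split("\t")
    if header[:2] != ["Error", "ErrorMessage"]:
        return None
    fields = lines[1].split("\t") if len(lines) > 1 else []
    error = fields[0] if fields else ""
    message = fields[1] if len(fields) > 1 else ""
    return error, message

# Shortened copy of the error response shown above
example = ("Error\tErrorMessage\n"
           "input too large\tSTRING website does not support networks "
           "larger than 2000 nodes.")
result = parse_string_error(example)
print(result)
```

Calling `parse_string_error(response.text)` right after a request and stopping when it returns a value keeps the parsing loop from running on an error table.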


This cell returns the interaction partners of each protein in the DEG list (at most 5 per protein)

In [None]:
string_api_url = "https://version-11-5.string-db.org/api"
output_format = "tsv-no-header"  # no header row, so every line can be parsed as data
method = "interaction_partners"
request_url = "/".join([string_api_url, output_format, method])


## Set parameters
##

params = {

    "identifiers" : "%0d".join(my_genes), # your protein
    "species" : 3708, # species NCBI identifier 
    "limit" : 5,
    "caller_identity" : "https://github.com/Natpod/Bnapus_ME" # your app name

}

##
## Call STRING
##

response = requests.post(request_url, data=params)

##
## Read and parse the results
##

for line in response.text.strip().split("\n"):

    l = line.strip().split("\t")
    query_ensp = l[0]
    query_name = l[2]
    partner_ensp = l[1]
    partner_name = l[3]
    combined_score = l[5]

    ## print

    print("\t".join([query_ensp, query_name, partner_name, combined_score]))
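Instead of printing each row, the partners can be collected per query protein for later use. The sketch below assumes the same column positions as the loop above (query identifier in column 0, partner identifier in column 1, combined score in column 5). The function name `partners_by_query` and the toy rows are made up for illustration.

```python
from collections import defaultdict

def partners_by_query(tsv_text):
    """Group (partner_id, combined_score) pairs by query identifier."""
    partners = defaultdict(list)
    for line in tsv_text.strip().split("\n"):
        fields = line.strip().split("\t")
        if len(fields) < 6:
            continue  # too short to be an interaction row
        try:
            score = float(fields[5])
        except ValueError:
            continue  # header row or malformed score
        partners[fields[0]].append((fields[1], score))
    return dict(partners)

# Toy rows for illustration only (identifiers, names and scores are made up)
toy = ("3708.Q1\t3708.P1\tq1\tp1\t3708\t0.95\n"
       "3708.Q1\t3708.P2\tq1\tp2\t3708\t0.70\n")
grouped = partners_by_query(toy)
print(grouped)
```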

This cell returns a DataFrame with the nodes and edges of the network for the gene list, which can then be plotted with igraph


In [None]:
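## Construct the request
##
from io import StringIO

string_api_url = "https://string-db.org/api"
request_url = "/".join([string_api_url, "tsv", "network"])
add_nodes = 0  # assumption: value not defined in the original; 0 adds no extra interactors
params = {
    "identifiers": "\r".join(my_genes),  # your protein list
    "species": 3708,  # species NCBI identifier
    "caller_identity": "https://github.com/Natpod/Bnapus_ME",
    "required_score": 900, # high confidence PPI interactions
    "add_nodes": add_nodes
}
results = requests.post(request_url, data=params)
## parse the TSV response into a DataFrame (stands in for the undefined handle_results helper)
df = pd.read_csv(StringIO(results.text), sep="\t")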
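To go from the network table to something a graph library can consume, the rows can be reduced to a node list and a weighted edge list. The sketch below is one way to do this with the standard library only. It assumes the TSV header uses the column names `preferredName_A`, `preferredName_B` and `score`, which should be checked against the actual response, and the toy table is made up.

```python
import csv
import io

def edges_from_network_tsv(tsv_text, a_col="preferredName_A",
                           b_col="preferredName_B", score_col="score"):
    """Return (nodes, edges) with undirected, de-duplicated weighted edges."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = {}
    for row in reader:
        # Sort the pair so A-B and B-A count as the same undirected edge
        key = tuple(sorted((row[a_col], row[b_col])))
        edges[key] = float(row[score_col])
    nodes = sorted({name for pair in edges for name in pair})
    edge_list = [(a, b, s) for (a, b), s in sorted(edges.items())]
    return nodes, edge_list

# Toy table for illustration only (names and scores are made up)
toy = ("preferredName_A\tpreferredName_B\tscore\n"
       "geneA\tgeneB\t0.95\n"
       "geneB\tgeneA\t0.95\n"
       "geneB\tgeneC\t0.91\n")
nodes, edge_list = edges_from_network_tsv(toy)
print(nodes)
print(edge_list)
```

An edge list in this shape could then be handed to igraph, for example through `igraph.Graph.TupleList(edge_list, weights=True)`, assuming python-igraph is installed.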

In [None]:
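## PPI enrichment: test whether the gene list has more interactions than expected
from io import StringIO

string_api_url = "https://version-11-5.string-db.org/api"
request_url = "/".join([string_api_url, 'tsv', 'ppi_enrichment'])

params = {
    "identifiers": "\r".join(my_genes),  # your protein list
    "species": 3708,  # species NCBI identifier
    "required_score": 900,
    "caller_identity": "https://github.com/Natpod/Bnapus_ME"
}
results = requests.post(request_url, data=params)
## parse the TSV response into a DataFrame (stands in for the undefined handle_results helper)
df = pd.read_csv(StringIO(results.text), sep="\t")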
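The enrichment response is expected to be a header line followed by a single row of summary statistics. A minimal sketch for reading such a row and checking the p-value is shown below. The column names in the toy text (`number_of_nodes`, `number_of_edges`, `expected_number_of_edges`, `p_value`) are assumptions to verify against the real output, the numbers are made up, and the 0.05 cutoff is only an illustrative choice.

```python
def parse_single_row_tsv(tsv_text):
    """Map the header fields of a two-line TSV to the values on the second line."""
    lines = tsv_text.strip().split("\n")
    if len(lines) < 2:
        raise ValueError("expected a header line and one data line")
    header = lines[0].split("\t")
    values = lines[1].split("\t")
    return dict(zip(header, values))

# Toy response: column names are assumed, numbers are made up for illustration
toy = ("number_of_nodes\tnumber_of_edges\texpected_number_of_edges\tp_value\n"
       "10\t12\t3\t0.0005\n")
summary = parse_single_row_tsv(toy)
# Illustrative significance check on the reported p-value
enriched = float(summary["p_value"]) < 0.05
print(summary["number_of_edges"], summary["expected_number_of_edges"], enriched)
```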