<a href="https://colab.research.google.com/github/VibrantStarling/Useful-code-chunks/blob/main/Gene_search_api_usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Search for gene IDs on VEuPathDB with Python

This notebook walks you through running a remote ID search on VEuPathDB, or any of its component sites, with Python.

**You will need:**
- a list of IDs that you want to search.
- Python packages: `requests` and `pandas` (`json` is part of the standard library)

You can run the search here, or you can take this code and make your own script.

---

## The Code

1. Load the packages

```python
import requests
import json
import pandas as pd
```

2. Load your IDs into a list called `list_of_ids`

```python
list_of_ids = ['A1CLC0', 'B8N0R8', 'B8N0V6', 'A0A3M7JTW8', 'B8N1G3', 'B8N3Q0', 'B8N3R1', 'B8N3S2']
```
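If your IDs live in a file (for example, a column copied out of a spreadsheet), you can read them in rather than typing them out. This is a minimal sketch: `parse_ids`, `read_ids` and the file name `my_ids.txt` are hypothetical, and it assumes the IDs are separated by newlines, spaces or commas.

```python
import re
from pathlib import Path


def parse_ids(text):
    """Split text on commas and/or whitespace, dropping empty entries."""
    return [token for token in re.split(r"[\s,]+", text) if token]


def read_ids(path):
    """Read IDs from a plain-text file (one per line, or comma-separated)."""
    return parse_ids(Path(path).read_text())


# Hypothetical file name: point this at your own file of IDs.
# list_of_ids = read_ids("my_ids.txt")
```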

3. Define a function that searches for your IDs and returns the results as a pandas DataFrame.

**Some things you can change:**

- Both URLs can be changed to point at one of the component sites instead of VEuPathDB. Change them together: step 2 reuses the session cookie from step 1, so it is safest to keep both requests on the same site.
- In step 2, when making `x`, you can change the contents of the `attributes` list to any attribute listed in [this document](https://veupathdb.org/veupathdb/service/record-types/transcript). By default it returns the same columns you would see on your search results page.
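Both tweaks come down to how the two requests are built. The sketch below pulls the URLs and JSON bodies out of the function into hypothetical helpers (`service_urls`, `dataset_payload`, `report_payload`) so you can inspect them without contacting the server. It assumes a component site exposes the same `/service/...` paths under its own base URL; copy that base (host plus web-app path) from the site itself rather than guessing it. The dataset ID `222113653` is just the example value from the function's comments.

```python
# Hypothetical helpers: they only build the URLs and JSON bodies that
# get_veupath_IDs sends, so nothing here talks to the server.

def service_urls(base="https://veupathdb.org/veupathdb"):
    """Return (dataset_url, report_url) for a site's base URL (assumed layout)."""
    dataset_url = f"{base}/service/users/current/datasets"
    report_url = (f"{base}/service/record-types/transcript/searches/"
                  "GeneByLocusTag/reports/standard")
    return dataset_url, report_url


def dataset_payload(ids):
    """Body for STEP 1: upload the IDs as a temporary dataset."""
    return {"sourceType": "idList", "sourceContent": {"ids": list(ids)}}


def report_payload(dataset_id, attributes):
    """Body for STEP 2: run GeneByLocusTag on the dataset, asking for `attributes`."""
    return {
        "searchConfig": {"parameters": {"ds_gene_ids": str(dataset_id)}},
        "reportConfig": {"attributes": list(attributes), "tables": []},
    }


dataset_url, report_url = service_urls()
body = report_payload(222113653, ["primary_key", "organism", "gene_product"])
```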

```python
def get_veupath_IDs(list_of_ids):
    # STEP 1: upload your IDs as a temporary dataset and get back its ID,
    # which will look something like {"id": 222113653}
    url = "https://veupathdb.org/veupathdb/service/users/current/datasets"
    payload = {"sourceType": "idList", "sourceContent": {"ids": list_of_ids}}
    r = requests.post(url, json=payload)
    r.raise_for_status()  # stop with a clear error if the upload failed
    location_ID = r.json()

    # STEP 2: run the GeneByLocusTag search against that dataset.
    # The session cookie from STEP 1 is sent along so the server can find the
    # dataset; the JSON results are then converted into a pandas DataFrame.
    url2 = "https://veupathdb.org/veupathdb/service/record-types/transcript/searches/GeneByLocusTag/reports/standard"
    attributes = ["primary_key", "transcript_link", "organism", "project_id", "input_id",
                  "gene_location_text", "gene_product", "gene_type"]
    x = requests.post(
        url2,
        cookies=r.cookies,
        json={"searchConfig": {"parameters": {"ds_gene_ids": str(location_ID["id"])}},
              "reportConfig": {"attributes": attributes, "tables": []}},
    )
    x.raise_for_status()
    results = x.json()

    # Build the DataFrame in one go (DataFrame.append was removed in pandas 2.0),
    # keeping the key columns first.
    lead = ["primary_key", "project_id", "organism", "input_id"]
    records = [item["attributes"] for item in results["records"]]
    if not records:
        return pd.DataFrame(columns=lead)
    matches = pd.DataFrame(records)
    matches = matches.reindex(columns=lead + [c for c in matches.columns if c not in lead])

    # Organism names come back wrapped in HTML <i>...</i> tags; strip them.
    matches["organism"] = (matches["organism"]
                           .str.replace("<i>", "", regex=False)
                           .str.replace("</i>", "", regex=False))
    return matches

matches = get_veupath_IDs(list_of_ids)
matches
```

|   | primary_key | project_id | organism | input_id | gene_location_text | gene_type | transcript_link | gene_product |
|---|---|---|---|---|---|---|---|---|
| 0 | ACLA_041650 | FungiDB | Aspergillus clavatus NRRL 1 | A1CLC0 | DS027056:2,485,855..2,487,985(-) | protein coding gene | ACLA_041650-t26_1 | potassium ion channel Yvc1, putative |
| 1 | AFLA_000447 | FungiDB | Aspergillus flavus NRRL3357 | B8N0R8 | AAIH03000226:815,987..817,074(-) | protein coding gene | AFLA_000447_t1 | Mediator of RNA polymerase II transcription su... |
| 2 | AFLA_000490 | FungiDB | Aspergillus flavus NRRL3357 | B8N0V6 | AAIH03000226:946,952..948,400(+) | protein coding gene | AFLA_000490_t1 | C2H2 finger domain protein, putative [Source:U... |
| 3 | AFLA_000792 | FungiDB | Aspergillus flavus NRRL3357 | A0A3M7JTW8 | AAIH03000226:1,751,501..1,753,639(+) | protein coding gene | AFLA_000792_t1 | Transcriptional regulator Ngg1, putative [Sour... |
| 4 | AFLA_000894 | FungiDB | Aspergillus flavus NRRL3357 | B8N1G3 | AAIH03000226:2,035,542..2,037,529(+) | protein coding gene | AFLA_000894_t1 | unspecified product |
| 5 | AFLA_001005 | FungiDB | Aspergillus flavus NRRL3357 | B8N3Q0 | AAIH03000226:2,299,933..2,301,962(+) | protein coding gene | AFLA_001005_t1 | BZIP domain-containing protein [Source:UniProt... |
| 6 | AFLA_001018 | FungiDB | Aspergillus flavus NRRL3357 | B8N3R1 | AAIH03000226:2,334,830..2,337,399(-) | protein coding gene | AFLA_001018_t1 | Grh/CP2 DB domain-containing protein [Source:U... |
| 7 | AFLA_001026 | FungiDB | Aspergillus flavus NRRL3357 | B8N3S2 | AAIH03000226:2,368,747..2,373,687(-) | protein coding gene | AFLA_001026_t1 | Protein kinase domain-containing protein [Sour... |
| 8 | F9C07_2309 | FungiDB | Aspergillus flavus NRRL3357 2020 | A0A3M7JTW8 | CP044622:4,555,451..4,557,589(-) | protein coding gene | F9C07_2309_t1 | putative transcriptional regulator Ngg1 |
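In the results, `A0A3M7JTW8` appears twice: it matches both AFLA_000792 and F9C07_2309. With longer ID lists it helps to check for this programmatically. The helper below (`summarise_matches`, hypothetical) is a sketch that lists input IDs with no row in the results and input IDs with more than one row. It assumes the `input_id` column echoes your IDs exactly as submitted. The demo frame reuses the `primary_key`/`input_id` pairs from the table, and `NOT_A_REAL_ID` is a made-up ID added to show an unmatched case.

```python
import pandas as pd


def summarise_matches(matches, list_of_ids):
    """Return (unmatched, multiple): input IDs with no row, and with more than one row."""
    counts = matches["input_id"].value_counts()
    unmatched = [i for i in list_of_ids if i not in counts.index]
    multiple = counts[counts > 1].index.tolist()
    return unmatched, multiple


# primary_key -> input_id pairs copied from the results table
demo = pd.DataFrame({
    "primary_key": ["ACLA_041650", "AFLA_000447", "AFLA_000490", "AFLA_000792",
                    "AFLA_000894", "AFLA_001005", "AFLA_001018", "AFLA_001026",
                    "F9C07_2309"],
    "input_id": ["A1CLC0", "B8N0R8", "B8N0V6", "A0A3M7JTW8", "B8N1G3",
                 "B8N3Q0", "B8N3R1", "B8N3S2", "A0A3M7JTW8"],
})
ids = ['A1CLC0', 'B8N0R8', 'B8N0V6', 'A0A3M7JTW8', 'B8N1G3', 'B8N3Q0',
       'B8N3R1', 'B8N3S2', 'NOT_A_REAL_ID']  # last ID is hypothetical
unmatched, multiple = summarise_matches(demo, ids)
print("No match:", unmatched)        # No match: ['NOT_A_REAL_ID']
print("Several matches:", multiple)  # Several matches: ['A0A3M7JTW8']
```

On real output, call it as `summarise_matches(matches, list_of_ids)`.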
