# IEDB Query API (IQ-API) - Use Case 1H
**Goal**: Search for linear T cell epitopes arising from human proteins, restricted by HLA-A*02:01, that were tested in healthy individuals.

For more information on the expressive syntax of PostgREST, refer to [this document](https://postgrest.org/en/stable/api.html#). For more details on the tables that are part of the API, refer to [the swagger documentation](http://query-api.iedb.org/docs/swagger/).

---

First, let's import the required modules, set some globals, and define a function to print the corresponding curl command for each request. I've tried to include that curl command for each example so that you can copy/paste it into your terminal. You may want to pipe the output to a tool like 'jq' to have it render neatly.

In [1]:
import requests
import json
import time
import pandas as pd
from io import StringIO

base_uri='https://query-api.iedb.org'

# function to print the curl command for a given request
def print_curl_cmd(req):
    url = req.url
    print("curl -X 'GET' '" + url + "'")



This may or may not have resulted in a warning about lzma compression.  That can be safely ignored...

## Querying the epitope table

First, let's try querying for all epitopes matching this search (457 as of Oct 15, 2021) by querying only the epitope_search table.

In [2]:
search_params={ 'structure_type': 'eq.Linear peptide',
                'tcell_ids': 'not.is.null',
                'mhc_allele_names': 'cs.{HLA-A*02:01}',
                'host_organism_iris': 'cs.{NCBITaxon:9606}',
                'source_organism_iris': 'cs.{NCBITaxon:9606}',
                'disease_names': 'cs.{healthy}',
                'order': 'structure_iri',
              }
table_name='epitope_search'
full_url=base_uri + '/' + table_name
result = requests.get(full_url, params=search_params)
print_curl_cmd(result)

curl -X 'GET' 'https://query-api.iedb.org/epitope_search?structure_type=eq.Linear+peptide&tcell_ids=not.is.null&mhc_allele_names=cs.%7BHLA-A%2A02%3A01%7D&host_organism_iris=cs.%7BNCBITaxon%3A9606%7D&source_organism_iris=cs.%7BNCBITaxon%3A9606%7D&disease_names=cs.%7Bhealthy%7D&order=structure_iri'
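To see how the filters map onto that query string, here is a minimal offline sketch. It assumes the usual PostgREST operator meanings noted in the comments and uses the standard library's `urlencode`, which should yield the same percent-encoding that `requests` produced in the curl command above. Only three of the filters are shown, and nothing is sent to the server.

```python
from urllib.parse import urlencode

# A subset of the filters used above, with the assumed PostgREST operator meanings.
params = {
    'structure_type': 'eq.Linear peptide',   # eq.         -> column equals the value
    'tcell_ids': 'not.is.null',              # not.is.null -> column is not NULL
    'mhc_allele_names': 'cs.{HLA-A*02:01}',  # cs.{...}    -> array column contains the value
}

# Build the URL by hand so the encoding can be inspected without a network call.
query_string = urlencode(params)
url = 'https://query-api.iedb.org/epitope_search?' + query_string
print(url)
```

Spaces become `+` and characters such as `{`, `*`, and `:` are percent-encoded, which is why the curl command looks noisier than the Python dictionary.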


OK we have the result...now let's have a look.  **Note**: We are only printing the first record here to get a sense of what is returned.

In [3]:
print(json.dumps(result.json()[:1], indent=4))

[
    {
        "structure_id": 100725,
        "structure_iri": "IEDB_EPITOPE:100725",
        "structure_descriptions": [
            "ALWMRLLPL"
        ],
        "curated_source_antigens": [
            {
                "accession": "114318995",
                "name": "insulin [Homo sapiens]",
                "iri": "GENPEPT:114318995",
                "starting_position": 2,
                "ending_position": 10,
                "source_organism_name": "Homo sapiens (human)",
                "source_organism_iri": "NCBITaxon:9606"
            },
            {
                "accession": "ABI63346.1",
                "name": "insulin",
                "iri": "GENPEPT:ABI63346.1",
                "starting_position": 2,
                "ending_position": 10,
                "source_organism_name": "Homo sapiens (human)",
                "source_organism_iri": "NCBITaxon:9606"
            },
            {
                "accession": "NP_000198.1",
                "name": "proins

We can also load the output into a data frame.

In [4]:
df = pd.json_normalize(result.json())
df

Unnamed: 0,structure_id,structure_iri,structure_descriptions,curated_source_antigens,structure_type,linear_sequence,e_modification,linear_sequence_length,iedb_assay_ids,iedb_assay_iris,...,bcell_ids,bcell_iris,elution_ids,elution_iris,journal_names,reference_types,pubmed_ids,reference_titles,reference_authors,reference_dates
0,100725,IEDB_EPITOPE:100725,[ALWMRLLPL],"[{'accession': '114318995', 'name': 'insulin [...",Linear peptide,ALWMRLLPL,,9,"[1605715, 1606246, 1617672, 1627631, 1627642, ...","[IEDB_ASSAY:1605715, IEDB_ASSAY:1606246, IEDB_...",...,,,"[1605715, 1643711, 1643743, 1954138, 3829683, ...","[IEDB_ASSAY:1605715, IEDB_ASSAY:1643711, IEDB_...","[BMC Immunol, Cell Metab, Clin Exp Immunol, Cl...",[Literature],"[14617048, 17065344, 17327428, 18305140, 19837...",[CD8 T cell autoreactivity to preproinsulin ep...,[Emanuela Martinuzzi; Giulia Novelli; Matthieu...,"[2003, 2006, 2007, 2008, 2010, 2012, 2015, 2018]"
1,100843,IEDB_EPITOPE:100843,[FLFAVGFYL],"[{'accession': 'NP_066999.1', 'name': 'islet-s...",Linear peptide,FLFAVGFYL,,9,"[1605718, 1606254, 1617680, 1643063, 1643064, ...","[IEDB_ASSAY:1605718, IEDB_ASSAY:1606254, IEDB_...",...,,,"[1605718, 1643065, 1681498, 3829720]","[IEDB_ASSAY:1605718, IEDB_ASSAY:1643065, IEDB_...","[BMC Immunol, Clin Immunol, Diabetes, Diabetol...",[Literature],"[17065343, 17065344, 18358785, 28887632, 29562...",[Identification of Novel HLA-A*0201-restricted...,[Irene Jarchum; Lynn Nichol; Massimo Trucco; P...,"[2006, 2008, 2017, 2018]"
2,100844,IEDB_EPITOPE:100844,[FLIVLSVAL],"[{'accession': 'CAA39504.1', 'name': 'IAPP [Ho...",Linear peptide,FLIVLSVAL,,9,"[1605720, 1606253, 1617682, 1681501, 1681515, ...","[IEDB_ASSAY:1605720, IEDB_ASSAY:1606253, IEDB_...",...,,,"[1605720, 1681501, 3829711]","[IEDB_ASSAY:1605720, IEDB_ASSAY:1681501, IEDB_...","[BMC Immunol, Diabetes]",[Literature],"[17065343, 17065344, 29562882]",[Identification of Novel HLA-A*0201-restricted...,[John Sidney; Jose Luis Vela; Dave Friedrich; ...,"[2006, 2018]"
3,100845,IEDB_EPITOPE:100845,[FLWSVFMLI],"[{'accession': 'SRC244406', 'name': 'islet-spe...",Linear peptide,FLWSVFMLI,,9,"[1605719, 1606255, 1617681, 3829713, 5564277]","[IEDB_ASSAY:1605719, IEDB_ASSAY:1606255, IEDB_...",...,,,"[1605719, 3829713]","[IEDB_ASSAY:1605719, IEDB_ASSAY:3829713]","[BMC Immunol, Diabetes, Nat Biotechnol]",[Literature],"[17065344, 29562882, 30418433]",[High-throughput determination of the antigen ...,[John Sidney; Jose Luis Vela; Dave Friedrich; ...,"[2006, 2018]"
4,100882,IEDB_EPITOPE:100882,[GIVEQCCTSI],"[{'accession': 'ABI63346.1', 'name': 'insulin'...",Linear peptide,GIVEQCCTSI,,10,"[1617478, 1617485, 1617490, 1619777, 1621610, ...","[IEDB_ASSAY:1617478, IEDB_ASSAY:1617485, IEDB_...",...,,,"[1617478, 1617490, 1639832, 1643772, 1762197, ...","[IEDB_ASSAY:1617478, IEDB_ASSAY:1617490, IEDB_...","[Ann N Y Acad Sci, BMC Immunol, Clin Exp Immun...",[Literature],"[14617048, 15983206, 17130554, 17327428, 18390...",[CD8 T cell autoreactivity to preproinsulin ep...,[Emmanuelle Enée; Emanuela Martinuzzi; Philipp...,"[1996, 2003, 2005, 2006, 2007, 2008, 2010, 201..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
675,952989,IEDB_EPITOPE:952989,[YQMDIQQEL],"[{'accession': 'P04114.2', 'name': 'Apolipopro...",Linear peptide,YQMDIQQEL,,9,"[6352650, 6352814, 6352815, 6352824, 8571805]","[IEDB_ASSAY:6352650, IEDB_ASSAY:6352814, IEDB_...",...,,,"[6352650, 8571805]","[IEDB_ASSAY:6352650, IEDB_ASSAY:8571805]","[Acta Neuropathol, Sci Rep]",[Literature],"[29557506, 31757993]",[Induction of HLA-A2 restricted CD8 T cell res...,[Frank H Schaftenaar; Jacob Amersfoort; Hidde ...,"[2018, 2019]"
676,956443,IEDB_EPITOPE:956443,[RLARLALVL],"[{'accession': 'Q13641.1', 'name': 'Trophoblas...",Linear peptide,RLARLALVL,,9,"[6352830, 6352831, 6352832, 6352833, 6352834, ...","[IEDB_ASSAY:6352830, IEDB_ASSAY:6352831, IEDB_...",...,,,[6352837],[IEDB_ASSAY:6352837],[Cancer Immunol Immunother],[Literature],[31686124],[Preclinical development of T-cell receptor-en...,[Yuexin Xu; Alicia J Morales; Michael J Cargil...,[2019]
677,95972,IEDB_EPITOPE:95972,[VMAPRTVLL],"[{'accession': '1465390624', 'name': 'MHC clas...",Linear peptide,VMAPRTVLL,,9,"[1588321, 1597906, 1647392, 1647393, 1770471, ...","[IEDB_ASSAY:11103344, IEDB_ASSAY:11560295, IED...",...,,,"[1588321, 1597906, 1647392, 1647393, 1775259, ...","[IEDB_ASSAY:11103344, IEDB_ASSAY:11560295, IED...","[Acta Neuropathol, Anal Chem, Cancer Immunol I...","[Dual, Literature, Live, Submission]","[11564797, 11920559, 12411439, 12461076, 20877...",[A large fraction of HLA class I ligands are p...,[Ana Marcu; Leon Bichmann; Leon Kuchenbecker; ...,"[1997, 1998, 2001, 2002, 2003, 2010, 2012, 201..."
678,989936,IEDB_EPITOPE:989936,[KLQCVDLHV],"[{'accession': 'AAA60193.1', 'name': 'prostate...",Linear peptide,KLQCVDLHV,,9,"[6376241, 8592283]","[IEDB_ASSAY:6376241, IEDB_ASSAY:8592283]",...,,,[8592283],[IEDB_ASSAY:8592283],[Acta Neuropathol],"[Literature, Submission]",[29557506],[CD8+ T cells of Healthy Donors 1-4: Single Ce...,"[10x Genomics, Marian Christoph Neidert; Danie...","[2018, 2020]"


Alright, 680 records were returned, which is far more than the 457 we were expecting from performing this search through the web interface. This is because some of these peptides were tested under all of the conditions that we specified, but not necessarily all in the same assay. So, in order to do this correctly, we must first search the tcell_search table to pull epitopes tested in a single assay that meets all of these conditions.
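The difference between the two kinds of matching can be illustrated with a toy example. The records below are made up for illustration and only mimic the idea that epitope-level fields pool values from every assay; they are not real IEDB data or the real table schema.

```python
# Hypothetical assay records: for EPITOPE:A, no single assay meets both conditions.
assays = [
    {'structure_iri': 'EPITOPE:A', 'mhc_allele_name': 'HLA-A*02:01', 'disease': 'type 1 diabetes'},
    {'structure_iri': 'EPITOPE:A', 'mhc_allele_name': 'HLA-B*07:02', 'disease': 'healthy'},
    {'structure_iri': 'EPITOPE:B', 'mhc_allele_name': 'HLA-A*02:01', 'disease': 'healthy'},
]

# Epitope-level view: pool each field across all of an epitope's assays.
epitope_level = {}
for assay in assays:
    pooled = epitope_level.setdefault(
        assay['structure_iri'], {'alleles': set(), 'diseases': set()}
    )
    pooled['alleles'].add(assay['mhc_allele_name'])
    pooled['diseases'].add(assay['disease'])

# Matching on pooled values: each condition may be satisfied by a different assay.
epitope_hits = sorted(
    iri for iri, pooled in epitope_level.items()
    if 'HLA-A*02:01' in pooled['alleles'] and 'healthy' in pooled['diseases']
)

# Matching per assay: both conditions must hold in the same record.
assay_hits = sorted({
    assay['structure_iri'] for assay in assays
    if assay['mhc_allele_name'] == 'HLA-A*02:01' and assay['disease'] == 'healthy'
})

print(epitope_hits)  # ['EPITOPE:A', 'EPITOPE:B']
print(assay_hits)    # ['EPITOPE:B']
```

The pooled match picks up `EPITOPE:A` even though no single assay satisfies both conditions, which is the kind of over-counting seen in the epitope-level result.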

## Querying the tcell_search table

Let's search for records that match all of these conditions in a single T cell assay:

In [5]:
search_params={ 'structure_type': 'eq.Linear peptide',
                'mhc_allele_name': 'eq.HLA-A*02:01',
                'host_organism_iri': 'eq.NCBITaxon:9606',
                'source_organism_iri': 'eq.NCBITaxon:9606',
                'disease_names': 'cs.{healthy}',
                'order': 'structure_iri',
              }
table_name='tcell_search'
full_url=base_uri + '/' + table_name
result = requests.get(full_url, params=search_params)
print_curl_cmd(result)

curl -X 'GET' 'https://query-api.iedb.org/tcell_search?structure_type=eq.Linear+peptide&mhc_allele_name=eq.HLA-A%2A02%3A01&host_organism_iri=eq.NCBITaxon%3A9606&source_organism_iri=eq.NCBITaxon%3A9606&disease_names=cs.%7Bhealthy%7D&order=structure_iri'


Let's take a peek at this result.

In [6]:
df = pd.json_normalize(result.json())
df

Unnamed: 0,tcell_id,tcell_iri,structure_id,structure_iri,linear_sequence,structure_type,structure_description,reference_id,reference_iri,reference_type,...,r_object_source_organism_iri,r_object_source_organism_name,e_related_object_type,curated_source_antigen.accession,curated_source_antigen.name,curated_source_antigen.iri,curated_source_antigen.starting_position,curated_source_antigen.ending_position,curated_source_antigen.source_organism_name,curated_source_antigen.source_organism_iri
0,2103329,IEDB_ASSAY:2103329,100725,IEDB_EPITOPE:100725,ALWMRLLPL,Linear peptide,ALWMRLLPL,1027381,IEDB_REFERENCE:1027381,Literature,...,,,,NP_000198.1,proinsulin precursor,GENPEPT:NP_000198.1,2.0,10.0,Homo sapiens (human),NCBITaxon:9606
1,1617672,IEDB_ASSAY:1617672,100725,IEDB_EPITOPE:100725,ALWMRLLPL,Linear peptide,ALWMRLLPL,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,P01308.1,Insulin precursor,UNIPROT:P01308.1,2.0,10.0,Homo sapiens (human),NCBITaxon:9606
2,1617680,IEDB_ASSAY:1617680,100843,IEDB_EPITOPE:100843,FLFAVGFYL,Linear peptide,FLFAVGFYL,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,NP_066999.1,islet-specific glucose-6-phosphatase-related p...,GENPEPT:NP_066999.1,215.0,223.0,Homo sapiens (human),NCBITaxon:9606
3,1617682,IEDB_ASSAY:1617682,100844,IEDB_EPITOPE:100844,FLIVLSVAL,Linear peptide,FLIVLSVAL,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,P10997.1,Islet amyloid polypeptide precursor,GENPEPT:P10997.1,9.0,17.0,Homo sapiens (human),NCBITaxon:9606
4,1617681,IEDB_ASSAY:1617681,100845,IEDB_EPITOPE:100845,FLWSVFMLI,Linear peptide,FLWSVFMLI,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,SRC244406,islet-specific glucose-6-phosphatase-related p...,ONTIE:0002097,,,Homo sapiens (human),NCBITaxon:9606
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,6354238,IEDB_ASSAY:6354238,956443,IEDB_EPITOPE:956443,RLARLALVL,Linear peptide,RLARLALVL,1036010,IEDB_REFERENCE:1036010,Literature,...,,,,Q13641.1,Trophoblast glycoprotein,UNIPROT:Q13641.1,17.0,25.0,Homo sapiens (human),NCBITaxon:9606
727,6352834,IEDB_ASSAY:6352834,956443,IEDB_EPITOPE:956443,RLARLALVL,Linear peptide,RLARLALVL,1036010,IEDB_REFERENCE:1036010,Literature,...,,,,Q13641.1,Trophoblast glycoprotein,UNIPROT:Q13641.1,17.0,25.0,Homo sapiens (human),NCBITaxon:9606
728,6352843,IEDB_ASSAY:6352843,956443,IEDB_EPITOPE:956443,RLARLALVL,Linear peptide,RLARLALVL,1036010,IEDB_REFERENCE:1036010,Literature,...,,,,Q13641.1,Trophoblast glycoprotein,UNIPROT:Q13641.1,17.0,25.0,Homo sapiens (human),NCBITaxon:9606
729,6352847,IEDB_ASSAY:6352847,956443,IEDB_EPITOPE:956443,RLARLALVL,Linear peptide,RLARLALVL,1036010,IEDB_REFERENCE:1036010,Literature,...,,,,Q13641.1,Trophoblast glycoprotein,UNIPROT:Q13641.1,17.0,25.0,Homo sapiens (human),NCBITaxon:9606


OK perfect, that matches the 731 assay records on the IEDB as of Oct 15, 2021. Now we just need to make this list unique by epitope.


In [7]:
df.drop_duplicates(subset = ['structure_iri'])

Unnamed: 0,tcell_id,tcell_iri,structure_id,structure_iri,linear_sequence,structure_type,structure_description,reference_id,reference_iri,reference_type,...,r_object_source_organism_iri,r_object_source_organism_name,e_related_object_type,curated_source_antigen.accession,curated_source_antigen.name,curated_source_antigen.iri,curated_source_antigen.starting_position,curated_source_antigen.ending_position,curated_source_antigen.source_organism_name,curated_source_antigen.source_organism_iri
0,2103329,IEDB_ASSAY:2103329,100725,IEDB_EPITOPE:100725,ALWMRLLPL,Linear peptide,ALWMRLLPL,1027381,IEDB_REFERENCE:1027381,Literature,...,,,,NP_000198.1,proinsulin precursor,GENPEPT:NP_000198.1,2.0,10.0,Homo sapiens (human),NCBITaxon:9606
2,1617680,IEDB_ASSAY:1617680,100843,IEDB_EPITOPE:100843,FLFAVGFYL,Linear peptide,FLFAVGFYL,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,NP_066999.1,islet-specific glucose-6-phosphatase-related p...,GENPEPT:NP_066999.1,215.0,223.0,Homo sapiens (human),NCBITaxon:9606
3,1617682,IEDB_ASSAY:1617682,100844,IEDB_EPITOPE:100844,FLIVLSVAL,Linear peptide,FLIVLSVAL,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,P10997.1,Islet amyloid polypeptide precursor,GENPEPT:P10997.1,9.0,17.0,Homo sapiens (human),NCBITaxon:9606
4,1617681,IEDB_ASSAY:1617681,100845,IEDB_EPITOPE:100845,FLWSVFMLI,Linear peptide,FLWSVFMLI,1007450,IEDB_REFERENCE:1007450,Literature,...,,,,SRC244406,islet-specific glucose-6-phosphatase-related p...,ONTIE:0002097,,,Homo sapiens (human),NCBITaxon:9606
5,2103326,IEDB_ASSAY:2103326,100882,IEDB_EPITOPE:100882,GIVEQCCTSI,Linear peptide,GIVEQCCTSI,1027381,IEDB_REFERENCE:1027381,Literature,...,,,,ABI63346.1,insulin,GENPEPT:ABI63346.1,78.0,87.0,Homo sapiens (human),NCBITaxon:9606
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
721,5635546,IEDB_ASSAY:5635546,952616,IEDB_EPITOPE:952616,YLCNQDVAFL,Linear peptide,YLCNQDVAFL,1035280,IEDB_REFERENCE:1035280,Literature,...,,,,550544195,pyridoxal-dependent decarboxylase domain-conta...,GENPEPT:550544195,189.0,198.0,Homo sapiens (human),NCBITaxon:9606
722,6338160,IEDB_ASSAY:6338160,952620,IEDB_EPITOPE:952620,YLQGQRLDNV,Linear peptide,YLQGQRLDNV,1035280,IEDB_REFERENCE:1035280,Literature,...,,,,13529158,Secretogranin V (7B2 protein) [Homo sapiens],GENPEPT:13529158,186.0,195.0,Homo sapiens (human),NCBITaxon:9606
723,5635521,IEDB_ASSAY:5635521,952623,IEDB_EPITOPE:952623,YMCTHRLLL,Linear peptide,YMCTHRLLL,1035280,IEDB_REFERENCE:1035280,Literature,...,,,,3297877,"GNAS1, partial [Homo sapiens]",GENPEPT:3297877,379.0,387.0,Homo sapiens (human),NCBITaxon:9606
724,5635523,IEDB_ASSAY:5635523,952624,IEDB_EPITOPE:952624,YMCTHRLLLL,Linear peptide,YMCTHRLLLL,1035280,IEDB_REFERENCE:1035280,Literature,...,,,,3297877,"GNAS1, partial [Homo sapiens]",GENPEPT:3297877,379.0,388.0,Homo sapiens (human),NCBITaxon:9606


And there we go...the 457 unique epitope IDs we were after.  Note that we've lost some of the assay information by collapsing the data this way as the 'drop_duplicates' command will simply make the list unique according to the 'structure_iri'.  So if multiple assays have the same structure_iri, only the first one will be kept.  Depending on what we want to do with the data downstream, maybe this is OK, maybe we would want to collapse the data in another way, maybe we want to take those structure_iris and pull the corresponding records from the epitope_search table, or maybe something else entirely.
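As one example of collapsing the data another way, the sketch below groups by `structure_iri` and keeps every assay IRI as a list instead of discarding all but the first row. It runs on a small made-up frame that borrows three column names from the result above; the values are placeholders, and the output column names `tcell_iris` and `n_assays` are chosen here for illustration.

```python
import pandas as pd

# Toy stand-in for the tcell_search result; the values are invented.
toy = pd.DataFrame({
    'structure_iri': ['IEDB_EPITOPE:1', 'IEDB_EPITOPE:1', 'IEDB_EPITOPE:2'],
    'linear_sequence': ['AAAAAAAAA', 'AAAAAAAAA', 'CCCCCCCCC'],
    'tcell_iri': ['IEDB_ASSAY:10', 'IEDB_ASSAY:11', 'IEDB_ASSAY:12'],
})

# One row per epitope, with the assay IRIs collected rather than dropped.
per_epitope = (
    toy.groupby('structure_iri', as_index=False)
       .agg(
           linear_sequence=('linear_sequence', 'first'),
           tcell_iris=('tcell_iri', list),
           n_assays=('tcell_iri', 'count'),
       )
)
print(per_epitope)
```

Applied to the real `df`, the same pattern should give one row per epitope while retaining the list of assays that matched.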

Below we try to pull the corresponding structures from the epitope table, but we are running into technical issues that need to be worked out. For now, the recommendation is to run the initial query as above and then work with the dataset in your code.

### Pulling the corresponding structures from the epitope table - (not yet working)

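Let's try pulling the structure IRIs and recovering the corresponding epitopes from the epitope table as an example. First, we collapse all the structure IRIs into a list and create the search string. Note that `df` still holds every assay row here, because the `drop_duplicates` result above was displayed but not assigned, so the list contains repeated IRIs: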

In [8]:
all_structure_iris = ','.join(map(str,df['structure_iri'].to_list()))
all_structure_iris_search =  'in.(' + all_structure_iris + ')'


Now we can search:

In [9]:
search_params={ 
                'structure_iri': all_structure_iris_search,
                #'structure_iri': 'in.(' + all_structure_iris + ')',
                'order': 'structure_iri',
              }
table_name='epitope_search'
full_url=base_uri + '/' + table_name
result = requests.get(full_url, params=search_params)
print_curl_cmd(result)

curl -X 'GET' 'https://query-api.iedb.org/epitope_search?structure_iri=in.%28IEDB_EPITOPE%3A100725%2CIEDB_EPITOPE%3A100725%2CIEDB_EPITOPE%3A100843%2CIEDB_EPITOPE%3A100844%2CIEDB_EPITOPE%3A100845%2CIEDB_EPITOPE%3A100882%2CIEDB_EPITOPE%3A100888%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100920%2CIEDB_EPITOPE%3A100981%2CIEDB_EPITOPE%3A100981%2CIEDB_EPITOPE%3A100981%2CIEDB_EPITOPE%3A100981%2CIEDB_EPITOPE%3A101025%2CIEDB_EPITOPE%3A101026%2CIEDB_EPITOPE%3A101199%2CIEDB_EPITOPE%3A101199%2CIEDB_EPITOPE%3A101237%2CIEDB_EPITOPE%3A101245%2CIEDB_EPITOPE%3A101248%2CIEDB_EPITOPE%3A101248%2CIEDB_EPITOPE%3A101248%2CIEDB_EPITOPE%3A102306%2CIEDB_EPITOPE%3A102433%2CIEDB_EPITOPE%3A102515%2CIEDB_EPITOPE%3A102620%2CIEDB_EPITOPE%3A102769%2CIEDB_EPITOPE%3A102811%2CIEDB_EPITOPE%3A102844%2CIEDB_EPITOP

In [10]:
print(json.dumps(result.json()[:1], indent=4))
#df = pd.json_normalize(result.json())
#df

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hmmm...OK, it looks like the string of IDs is too long and we're getting gateway errors. I'm not sure whether this issue is at the web server or the PostgREST level, but we will investigate. It should also be noted that we could be doing better error/exception handling when we call the API. We'll add a tutorial on that in the near future.
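In the meantime, one possible workaround is to deduplicate the IRIs and send them in smaller batches, checking the HTTP status before parsing the JSON. This is an untested sketch, not a confirmed fix: the helper names `chunked` and `fetch_epitopes` are hypothetical, and the batch size of 50 and the 60 second timeout are arbitrary assumptions, since the actual URL length limit is not known.

```python
import requests

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_epitopes(structure_iris, batch_size=50,
                   base_uri='https://query-api.iedb.org'):
    """Fetch epitope_search rows for the given IRIs in several smaller requests."""
    unique_iris = sorted(set(structure_iris))  # drop repeated IRIs first
    rows = []
    for batch in chunked(unique_iris, batch_size):
        params = {
            'structure_iri': 'in.(' + ','.join(batch) + ')',
            'order': 'structure_iri',
        }
        response = requests.get(base_uri + '/epitope_search',
                                params=params, timeout=60)
        # Raise on HTTP errors (such as a gateway error) instead of
        # failing later with a JSONDecodeError.
        response.raise_for_status()
        rows.extend(response.json())
    return rows
```

It could then be called as `fetch_epitopes(df['structure_iri'].to_list())` and the returned rows passed to `pd.json_normalize` as before.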