# Programmatic access to UniProt using Python

All code here is based on the EMBL-EBI webinar [Programmatic access to UniProt using Python](https://www.youtube.com/watch?v=-uoKtReLGVs), presented by Aurélien Luciani and recorded on 17/11/2021.

Link to the [Google Colab notebook](https://colab.research.google.com/drive/1SU3j4VmXHYrYxOlrdK_NCBhajXb0iGes#scrollTo=zvDpOjPwgQoP)

Description of the webinar:

> UniProt is a comprehensive, expert-led, publicly available database of protein sequence, function and variation information.
>
> This webinar will give an overview of programmatic access to the UniProt database using Python and cover key aspects of protein entry searches, data filtering, batch downloads and give examples of further processing of downloaded target data.
>
> Following a brief introduction to UniProt services, where to find relevant documentation and help features, the webinar will focus on worked examples. These will include how to programmatically search and retrieve protein entries and sequences, within the results. We will then show how to align orthologous sequences and filter for features of interest, such as disease variant information.
>
> The webinar will also cover programmatic examples of the UniProt Retrieve/ID mapping service, batch downloads, processing, and filtering data by annotation type.

## Ways to get UniProt data

* FTP: large one-off download; post-processing needed
* API: medium-size download, customisable; one-off or integrated into workflows, scripts, etc.
* Website download: small one-off download, customisable


## UniProt API

### Method 01: Simple

In [4]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

query = "P33527"

# Helper function to download data
def get_url(url, **kwargs):
    response = requests.get(url, **kwargs)
    if not response.ok:
        # Show the API's error message, then raise requests.HTTPError
        print(response.text)
        response.raise_for_status()
    return response

In [8]:
r = get_url(f"{web_api}/uniprotkb/search?query={query}")

data = r.json()

n_results = len(data["results"])
print(f"Number of results: {n_results}\n")

Number of results: 1
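The search URL above is assembled by plain string interpolation, which is fine for a bare accession but gets fragile once a query contains spaces, parentheses or `&`. A minimal sketch of a URL builder follows. It assumes only the `query`, `fields`, `format` and `size` parameters that appear elsewhere in this notebook, and `build_search_url` is a hypothetical helper name, not part of any UniProt client.

```python
from urllib.parse import urlencode

WEB_API = "https://rest.uniprot.org"


def build_search_url(query, fields=None, fmt=None, size=None, endpoint="search"):
    """Build a UniProtKB URL with percent-encoded query parameters.

    Hypothetical helper: only parameters already used in this notebook
    (query, fields, format, size) are supported.
    """
    params = {"query": query}
    if fields:
        params["fields"] = ",".join(fields)
    if fmt:
        params["format"] = fmt
    if size is not None:
        params["size"] = size
    return f"{WEB_API}/uniprotkb/{endpoint}?{urlencode(params)}"


url = build_search_url(
    "parkin AND (taxonomy_id:9606)",
    fields=["accession", "length"],
    fmt="tsv",
)
print(url)
```

The resulting string can be passed straight to `get_url`; switching `endpoint` to `"stream"` would target the streaming endpoint used later on.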



In [9]:
for (key, value) in r.headers.items():
    print(f"{key}: {value}")

Server: nginx/1.17.7
Vary: Accept-Encoding, Accept, Accept-Encoding, X-UniProt-Release, X-API-Deployment-Date, User-Agent
Cache-Control: public, max-age=86400
x-cache: hit cached
Content-Type: application/json
Access-Control-Allow-Credentials: true
Content-Encoding: gzip
Access-Control-Expose-Headers: Link, X-Total-Results, X-UniProt-Release, X-UniProt-Release-Date, X-API-Deployment-Date
X-API-Deployment-Date: 01-September-2022
Strict-Transport-Security: max-age=31536000; includeSubDomains
Date: Thu, 08 Sep 2022 08:38:41 GMT
X-UniProt-Release: 2022_03
X-Total-Results: 1
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
Connection: keep-alive
Access-Control-Allow-Methods: GET, PUT, POST, DELETE, PATCH, OPTIONS
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization
X-UniProt-Release-Date: 03-August-2022
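Among the headers printed above, `X-Total-Results`, `X-UniProt-Release` and `X-UniProt-Release-Date` are the ones most useful for recording what a download contained. The sketch below pulls them out of any header mapping. `summarize_headers` is an illustrative name, and the sample dictionary simply copies three values from the output above.

```python
def summarize_headers(headers):
    """Return result count and release info from a mapping of response headers.

    Lookup is case-insensitive; a missing count is reported as None
    rather than raising.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    total = lowered.get("x-total-results")
    return {
        "total_results": int(total) if total is not None else None,
        "release": lowered.get("x-uniprot-release"),
        "release_date": lowered.get("x-uniprot-release-date"),
    }


# Values copied from the header dump above
sample_headers = {
    "X-Total-Results": "1",
    "X-UniProt-Release": "2022_03",
    "X-UniProt-Release-Date": "03-August-2022",
}
summary = summarize_headers(sample_headers)
print(summary)
```

With a live response this would be called as `summarize_headers(r.headers)`.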


### Method 02: Complex with pages

In [16]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

query="parkin AND (taxonomy_id:9606)"

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

In [19]:
r = get_url(f"{web_api}/uniprotkb/search?query={query}")
data = r.json()

# The total hit count is sent in the X-Total-Results header
# (header lookup in requests is case-insensitive)
total = r.headers.get("x-total-results")
page_total = len(data["results"])
print(f"total: {total}; page total: {page_total}")

# requests parses the Link header into r.links; follow "next" until it is absent
while r.links.get("next", {}).get("url"):
  r = get_url(r.links["next"]["url"])

  data = r.json()

  total = r.headers.get("x-total-results")
  page_total = len(data["results"])
  print(f"total: {total}; page total: {page_total}")

total: 136; page total: 25
total: 136; page total: 25
total: 136; page total: 25
total: 136; page total: 25
total: 136; page total: 25
total: 136; page total: 11
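The pagination loop relies on `r.links`, which `requests` derives from the `Link` response header. To make that step less opaque, here is a rough stand-alone sketch of the parsing. The header value is illustrative only (the cursor token is made up), and the regular expression handles just the simple `<url>; rel="name"` form, not every variant the header syntax allows.

```python
import re

# Matches entries of the form: <url>; rel="name"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(value):
    """Return {rel: url} for each entry found in a Link header value."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


# Illustrative header value; the cursor token is invented for this example
example_link = (
    "<https://rest.uniprot.org/uniprotkb/search"
    "?query=parkin&cursor=abc123&size=25>; rel=\"next\""
)
links = parse_link_header(example_link)
print(links.get("next"))
```

On the last page no `next` entry is present, which is what ends the `while` loop above.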


### Method 03: Complex with stream

In [47]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

q_name="parkin AND (taxonomy_id:9606)"

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

In [48]:
# stream good for simplicity (no pagination), but...
#  - harder to follow progress
#  - harder to resume on failure
#  - not sorted by score
r = get_url(f"{web_api}/uniprotkb/stream?query={q_name}")

data = r.json()

total = len(data["results"])
print(f"total: {total}")

total: 136


### Method 04: Complex with other formats

In [78]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

q_name  = "parkin AND (taxonomy_id:9606)"


# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

In [79]:
# r = get_url(f"{web_api}/uniprotkb/search?query={q_name}&size=1")
# r = get_url(f"{web_api}/uniprotkb/search?query={q_name}&size=1&format=xml")
# r = get_url(f"{web_api}/uniprotkb/search?query={q_name}&size=1", headers={"Accept": "application/xml"})
# r = get_url(f"{web_api}/uniprotkb/search?query={q_name}", headers={"Accept": "text/plain; format=list"})
# r = get_url(f"{web_api}/uniprotkb/search?query={q_name}", headers={"Accept": "text/plain; format=fasta"})
r = get_url(f"{web_api}/uniprotkb/search?query={q_name}", headers={"Accept": "text/plain; format=tsv"})
print(r.text)

Entry	Entry Name	Reviewed	Protein names	Gene Names	Organism	Length
O60260	PRKN_HUMAN	reviewed	E3 ubiquitin-protein ligase parkin (Parkin) (EC 2.3.2.31) (Parkin RBR E3 ubiquitin-protein ligase) (Parkinson juvenile disease protein 2) (Parkinson disease protein 2)	PRKN PARK2	Homo sapiens (Human)	465
Q6NUN9	ZN746_HUMAN	reviewed	Zinc finger protein 746 (Parkin-interacting substrate) (PARIS)	ZNF746 PARIS	Homo sapiens (Human)	644
Q96M98	PACRG_HUMAN	reviewed	Parkin coregulated gene protein (Molecular chaperone/chaperonin-binding protein) (PARK2 coregulated gene protein)	PACRG GLUP	Homo sapiens (Human)	296
Q8IWT3	CUL9_HUMAN	reviewed	Cullin-9 (CUL-9) (UbcH7-associated protein 1) (p53-associated parkin-like cytoplasmic protein)	CUL9 H7AP1 KIAA0708 PARC	Homo sapiens (Human)	2517
O15354	GPR37_HUMAN	reviewed	Prosaposin receptor GPR37 (Endothelin B receptor-like protein 1) (ETBR-LP-1) (G-protein coupled receptor 37) (Parkin-associated endothelin receptor-like receptor) (PAELR)	GPR37	Homo sapiens (Hum
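Printing the TSV is enough for a quick look, but further processing usually needs the rows as records. A small sketch using only the standard library follows. The sample text is a trimmed three-column excerpt of the table above, and `parse_tsv` is an illustrative name.

```python
import csv
import io


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # keep double quotes inside annotation text literal
    )
    return list(reader)


# Trimmed excerpt of the output above (three of the seven columns)
sample_tsv = (
    "Entry\tEntry Name\tLength\n"
    "O60260\tPRKN_HUMAN\t465\n"
    "Q6NUN9\tZN746_HUMAN\t644\n"
)
rows = parse_tsv(sample_tsv)
lengths = {row["Entry"]: int(row["Length"]) for row in rows}
print(lengths)
```

With a live response, `parse_tsv(r.text)` would take the place of the sample string.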

### Method 05: Complex with custom columns

In [64]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

q_name  = "parkin AND (taxonomy_id:9606)"
q_field = "&fields=id,accession,length,ft_site"

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

In [65]:
# Customise column choice
r = get_url(f"{web_api}/uniprotkb/search?query={q_name}{q_field}", headers={"Accept": "text/plain; format=tsv"})
print(r.text)

Entry Name	Entry	Length	Site
PRKN_HUMAN	O60260	465	
ZN746_HUMAN	Q6NUN9	644	
PACRG_HUMAN	Q96M98	296	
CUL9_HUMAN	Q8IWT3	2517	
GPR37_HUMAN	O15354	613	
X5DR79_HUMAN	X5DR79	465	
UBP30_HUMAN	Q70CQ3	517	
PINK1_HUMAN	Q9BXM7	581	
MUL1_HUMAN	Q969V5	352	
UBP15_HUMAN	Q9Y4E8	981	
PACRL_HUMAN	Q8N7B6	248	
UBB_HUMAN	P0CG47	229	SITE 54; /note="Interacts with activating enzyme"; SITE 68; /note="Essential for function"; SITE 72; /note="Interacts with activating enzyme"
RNF41_HUMAN	Q9H4P4	317	
RS27A_HUMAN	P62979	156	SITE 54; /note="Interacts with activating enzyme"; SITE 68; /note="Essential for function"; SITE 72; /note="Interacts with activating enzyme"
RL40_HUMAN	P62987	128	SITE 54; /note="Interacts with activating enzyme"; SITE 68; /note="Essential for function"; SITE 72; /note="Interacts with activating enzyme"
PHB2_HUMAN	Q99623	299	
ARI1_HUMAN	Q9Y4X5	557	
PARL_HUMAN	Q9H300	379	
PDCD2_HUMAN	Q16342	344	
BAG5_HUMAN	Q9UL15	447	
TDRKH_HUMAN	Q9Y2W6	561	
MPPB_HUMAN	O75439	489	SITE 191; /note="Required for 
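The `Site` column packs several features into one cell, so filtering on it needs a little parsing. The sketch below splits such a cell into positions and notes. The sample string is copied from the `UBB_HUMAN` row above. Support for a `start..end` range is an assumption about how multi-residue sites are written, and `parse_sites` is an illustrative name.

```python
import re

# One feature looks like: SITE 54; /note="Interacts with activating enzyme"
# The optional "..end" part is an assumption for multi-residue sites.
_SITE_RE = re.compile(r'SITE (\d+)(?:\.\.(\d+))?; /note="([^"]*)"')


def parse_sites(cell):
    """Split one Site cell of the TSV output into a list of feature dicts."""
    return [
        {"start": int(start), "end": int(end or start), "note": note}
        for start, end, note in _SITE_RE.findall(cell)
    ]


# Copied from the UBB_HUMAN row above
ubb_cell = (
    'SITE 54; /note="Interacts with activating enzyme"; '
    'SITE 68; /note="Essential for function"; '
    'SITE 72; /note="Interacts with activating enzyme"'
)
sites = parse_sites(ubb_cell)
print([site["start"] for site in sites])
```

Rows with an empty `Site` cell simply yield an empty list.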

### Method 06: Search for Single Entry

In [66]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

q_name="O60260?fields=cc_function"

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

In [54]:
# all of the entry
# r = get_url(f"{web_api}/uniprotkb/O60260")
# only the function comments (q_name holds the accession plus its ?fields= parameter)
r = get_url(f"{web_api}/uniprotkb/{q_name}")
print(json.dumps(r.json(), indent=2))

{
  "primaryAccession": "O60260",
  "comments": [
    {
      "texts": [
        {
          "evidences": [
            {
              "evidenceCode": "ECO:0000269",
              "source": "PubMed",
              "id": "10888878"
            },
            {
              "evidenceCode": "ECO:0000269",
              "source": "PubMed",
              "id": "10973942"
            },
            {
              "evidenceCode": "ECO:0000269",
              "source": "PubMed",
              "id": "11431533"
            },
            {
              "evidenceCode": "ECO:0000269",
              "source": "PubMed",
              "id": "11439185"
            },
            {
              "evidenceCode": "ECO:0000269",
              "source": "PubMed",
              "id": "11590439"
            },
            {
              "evidenceCode": "ECO:0000269",
              "source": "PubMed",
              "id": "12150907"
            },
            {
              "evidenceCode": "ECO:0000269",

### Method 07: Search for All Isoforms of an Entry

In [67]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

q_name   = "P17861&includeIsoform=true"
q_field  = "&fields=accession,cc_function,cc_subcellular_location,cc_ptm,sequence"
q_format = "&format=tsv"

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

In [68]:
# isoform info for PRKN_HUMAN (but not interesting to see)
# r = get_url(f"{web_api}/uniprotkb/search?query=O60260&includeIsoform=true&fields=accession,cc_function,cc_subcellular_location,cc_ptm,sequence&format=tsv")
# isoform info for XBP1_HUMAN
r = get_url(f"{web_api}/uniprotkb/search?query={q_name}{q_field}{q_format}")

print(r.text)

Entry	Function [CC]	Subcellular location [CC]	Post-translational modification	Sequence
P17861	FUNCTION: Functions as a transcription factor during endoplasmic reticulum (ER) stress by regulating the unfolded protein response (UPR). Required for cardiac myogenesis and hepatogenesis during embryonic development, and the development of secretory tissues such as exocrine pancreas and salivary gland (By similarity). Involved in terminal differentiation of B lymphocytes to plasma cells and production of immunoglobulins (PubMed:11460154). Modulates the cellular response to ER stress in a PIK3R-dependent manner (PubMed:20348923). Binds to the cis-acting X box present in the promoter regions of major histocompatibility complex class II genes (PubMed:8349596). Involved in VEGF-induced endothelial cell (EC) proliferation and retinal blood vessel formation during embryonic development but also for angiogenesis in adult tissues under ischemic conditions. Functions also as a major regulator of the U

### Method 08: Multiple Entries Through Endpoint, Get Natural Variants & Compare Sequences

In [114]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

# manually selected ones
accessions = ["O60260", "Q7KTX7", "Q9WVS6", "Q9JK66"]
q_name     = ",".join(accessions)
q_format   = "&format=fasta"
print(q_name)

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

O60260,Q7KTX7,Q9WVS6,Q9JK66


In [70]:
# get the natural variants information
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={q_name}&fields=ft_variant,organism_name")
# print(json.dumps(r.json(), indent=2))

# get FASTA of these entries
r = get_url(f"{web_api}/uniprotkb/accessions?accessions={q_name}{q_format}")
fasta = r.text
print(fasta)

>sp|O60260|PRKN_HUMAN E3 ubiquitin-protein ligase parkin OS=Homo sapiens OX=9606 GN=PRKN PE=1 SV=2
MIVFVRFNSSHGFPVEVDSDTSIFQLKEVVAKRQGVPADQLRVIFAGKELRNDWTVQNCD
LDQQSIVHIVQRPWRKGQEMNATGGDDPRNAAGGCEREPQSLTRVDLSSSVLPGDSVGLA
VILHTDSRKDSPPAGSPAGRSIYNSFYVYCKGPCQRVQPGKLRVQCSTCRQATLTLTQGP
SCWDDVLIPNRMSGECQSPHCPGTSAEFFFKCGAHPTSDKETSVALHLIATNSRNITCIT
CTDVRSPVLVFQCNSRHVICLDCFHLYCVTRLNDRQFVHDPQLGYSLPCVAGCPNSLIKE
LHHFRILGEEQYNRYQQYGAEECVLQMGGVLCPRPGCGAGLLPEPDQRKVTCEGGNGLGC
GFAFCRECKEAYHEGECSAVFEASGTTTQAYRVDERAAEQARWEAASKETIKKTTKPCPR
CHVPVEKNGGCMHMKCPQPQCRLEWCWNCGCEWNRVCMGDHWFDV
>sp|Q7KTX7|PRKN_DROME E3 ubiquitin-protein ligase parkin OS=Drosophila melanogaster OX=7227 GN=park PE=1 SV=1
MSFIFKFIATFVRKMLELLQFGGKTLTHTLSIYVKTNTGKTLTVNLEPQWDIKNVKELVA
PQLGLQPDDLKIIFAGKELSDATTIEQCDLGQQSVLHAIRLRPPVQRQKIQSATLEEEEP
SLSDEASKPLNETLLDLQLESEERLNITDEERVRAKAHFFVHCSQCDKLCNGKLRVRCAL
CKGGAFTVHRDPECWDDVLKSRRIPGHCESLEVACVDNAAGDPPFAEFFFKCAEHVSGGE
KDFAAPLNLIKNNIKNVPCLACTDVSDTVLVFPCASQHVTCIDCFRHYCRSRLGERQFMP
HPDFGYTLPCPAG
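The FASTA text is passed to the alignment service as-is, but comparing sequences locally needs them split per entry. A minimal parser sketch follows. It assumes UniProt-style headers of the form `>db|ACCESSION|NAME ...` and falls back to the first word of the header otherwise. The two records in the sample are toy data, not real entries.

```python
def parse_fasta(text):
    """Return {accession: sequence} from FASTA text.

    Assumes UniProt-style headers (">db|ACCESSION|NAME description");
    other headers are keyed by their first whitespace-delimited word.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts = header.split("|")
            current = parts[1] if len(parts) >= 3 else header.split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    # Join the wrapped sequence lines of each record
    return {acc: "".join(chunks) for acc, chunks in records.items()}


# Toy records for illustration only (not real UniProt entries)
toy_fasta = """>sp|TOY001|TOY1_EXAMPLE Toy protein one
MKVLAA
GHT
>sp|TOY002|TOY2_EXAMPLE Toy protein two
MST
"""
sequences = parse_fasta(toy_fasta)
print({acc: len(seq) for acc, seq in sequences.items()})
```

Applied to the `fasta` variable above, this would give one sequence per accession, ready for length or identity comparisons.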

## Clustalo

In [57]:
# submit align job using clustalo
r = requests.post("https://www.ebi.ac.uk/Tools/services/rest/clustalo/run", data={
    "email": "example@example.com",
    "iterations": 0,
    "outfmt": "clustal_num",
    "order": "aligned",
    "sequence": fasta
})

job_id = r.text
print(job_id)

# get job status
r = get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}")
print(r.text)

clustalo-R20220908-110612-0888-61259205-p1m
FINISHED


In [58]:
# Update Job status
r = get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}")
print(r.text)

FINISHED
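The status check above is re-run by hand until the job reports `FINISHED`. In a script this is usually wrapped in a polling loop, sketched below. The status fetch is passed in as a callable so the loop can be exercised without network access. The failure status names (`ERROR`, `FAILURE`, `NOT_FOUND`) are assumptions about what the service may return, and `wait_for_job` is an illustrative name.

```python
import time


def wait_for_job(get_status, interval=3.0, timeout=300.0, sleep=time.sleep):
    """Poll get_status() until it returns FINISHED.

    Raises RuntimeError on a failure status and TimeoutError when the
    job is still unfinished after roughly `timeout` seconds of waiting.
    The failure status names are assumptions, not taken from the notebook.
    """
    waited = 0.0
    while True:
        status = get_status().strip()
        if status == "FINISHED":
            return status
        if status in {"ERROR", "FAILURE", "NOT_FOUND"}:
            raise RuntimeError(f"job ended with status {status}")
        if waited >= timeout:
            raise TimeoutError(f"job still {status} after {timeout} s")
        sleep(interval)
        waited += interval


# Offline demonstration with a fake status source and no real sleeping
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
result = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)
```

Against the live service, the callable could be `lambda: get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}").text`.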


In [59]:
# Result of clustal alignment
r = get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/aln-clustal_num")
print(r.text)

CLUSTAL O(1.2.4) multiple sequence alignment


sp|Q7KTX7|PRKN_DROME      MSFIFKFIATFVRKMLELLQFGGKTLTHTLSIYVKTNTGKTLTVNLEPQWDIKNVKELVA	60
sp|O60260|PRKN_HUMAN      -----------------------------MIVFVRFNSSHGFPVEVDSDTSIFQLKEVVA	31
sp|Q9WVS6|PRKN_MOUSE      -----------------------------MIVFVRFNSSYGFPVEVDSDTSILQLKEVVA	31
sp|Q9JK66|PRKN_RAT        -----------------------------MIVFVRFNSSYGFPVEVDSDTSIFQLKEVVA	31
                                                       : ::*: *:.  : *::: : .* ::**:**

sp|Q7KTX7|PRKN_DROME      PQLGLQPDDLKIIFAGKELSDATTIEQCDLGQQSVLHAIRLRPPVQRQKIQSATLEEEEP	120
sp|O60260|PRKN_HUMAN      KRQGVPADQLRVIFAGKELRNDWTVQNCDLDQQSIVHIVQRP-WRKGQEMNAT--GGDDP	88
sp|Q9WVS6|PRKN_MOUSE      KRQGVPADQLRVIFAGKELPNHLTVQNCDLEQQSIVHIVQRP-RRRSHETNAS--GGDEP	88
sp|Q9JK66|PRKN_RAT        KRQGVPADQLRVIFAGKELQNHLTVQNCDLEQQSIVHIVQRP-QRKSHETNAS--GGDKP	88
                           : *:  *:*::******* :  *:::*** ***::* ::     : :: :::    :.*

sp|Q7KTX7|PRKN_DROME      SLSDEA--SKPL--------------NETL

In [60]:
# ID mapping flow

# Search through all mammalia
# The list format is requested with "text/plain; format=list"; "text/list" is rejected with a 400 error
r = get_url(f"{web_api}/uniprotkb/stream?query=parkin AND (taxonomy_id:40674)", headers={"Accept": "text/plain; format=list"})
# One accession per line; join them without leaving a trailing comma
accessions = ",".join(r.text.split())

print("accessions:", accessions)

# Send job to ID mapping endpoint
r = requests.post(f"{web_api}/idmapping/run", data={"from": "UniProtKB_AC-ID", "to": "ChEMBL", "ids": accessions})
job_id = r.json()["jobId"]

print("job ID:", job_id)

r = get_url(f"{web_api}/idmapping/status/{job_id}")
print(json.dumps(r.json(), indent=2))

## ID Mapping flow

In [None]:
# Search through all mammalia
# r = get_url(f"{web_api}/uniprotkb/stream?query=parkin AND (taxonomy_id:40674)", headers={"Accept": "text/plain; format=list"})
# Map a single accession instead
query = "P33527"
r = get_url(f"{web_api}/uniprotkb/stream?query={query}", headers={"Accept": "text/plain; format=list"})
# One accession per line; join them without leaving a trailing comma
accessions = ",".join(r.text.split())

print("accessions:", accessions)

# Send job to ID mapping endpoint
r = requests.post(f"{web_api}/idmapping/run", data={"from": "UniProtKB_AC-ID", "to": "ChEMBL", "ids": accessions})
job_id = r.json()["jobId"]

print("job ID:", job_id)

r = get_url(f"{web_api}/idmapping/status/{job_id}")
print(json.dumps(r.json(), indent=2))

## Visual Exercise

In [None]:
query  = "parkin"
fields = "&fields=mass,reviewed,length,cc_function,annotation_score"

r = get_url(f"{web_api}/uniprotkb/stream?query={query}{fields}")
data = r.json()

print(len(data["results"]), data["results"][0])

175212 {'entryType': 'UniProtKB reviewed (Swiss-Prot)', 'primaryAccession': 'A0A078CGE6', 'annotationScore': 5.0, 'comments': [{'texts': [{'evidences': [{'evidenceCode': 'ECO:0000250', 'source': 'UniProtKB', 'id': 'Q9LJD8'}, {'evidenceCode': 'ECO:0000269', 'source': 'PubMed', 'id': '11489177'}], 'value': 'Serine/threonine-protein kinase involved in the spatial and temporal control system organizing cortical activities in mitotic and postmitotic cells (PubMed:11489177). Required for the normal functioning of the plasma membrane in developing pollen. Involved in the regulation of cell expansion and embryo development (By similarity)'}], 'commentType': 'FUNCTION'}], 'sequence': {'length': 1299, 'molWeight': 143609}}
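Each entry in the printed result nests its function text and supporting evidence under `comments` → `texts` → `evidences`. The sketch below walks that structure to collect the function text and the PubMed IDs that back it. The sample dictionary is a shortened copy of the entry printed above (the function text is abbreviated), and `function_summary` is an illustrative name.

```python
def function_summary(entry):
    """Return (function texts, PubMed IDs) from one UniProtKB JSON entry."""
    texts, pubmed_ids = [], []
    for comment in entry.get("comments", []):
        if comment.get("commentType") != "FUNCTION":
            continue
        for text in comment.get("texts", []):
            texts.append(text.get("value", ""))
            for evidence in text.get("evidences", []):
                # Keep only literature evidence, without duplicates
                if evidence.get("source") == "PubMed" and evidence.get("id") not in pubmed_ids:
                    pubmed_ids.append(evidence.get("id"))
    return texts, pubmed_ids


# Shortened copy of the entry printed above; the function text is abbreviated
sample_entry = {
    "primaryAccession": "A0A078CGE6",
    "comments": [
        {
            "commentType": "FUNCTION",
            "texts": [
                {
                    "evidences": [
                        {"evidenceCode": "ECO:0000250", "source": "UniProtKB", "id": "Q9LJD8"},
                        {"evidenceCode": "ECO:0000269", "source": "PubMed", "id": "11489177"},
                    ],
                    "value": "Serine/threonine-protein kinase involved in ...",
                }
            ],
        }
    ],
    "sequence": {"length": 1299, "molWeight": 143609},
}
texts, pubmed_ids = function_summary(sample_entry)
print(pubmed_ids)
```

Entries without any `comments` key return two empty lists, which matches the `entry.get("comments", [])` pattern used when plotting.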


In [None]:
import matplotlib.pyplot as plt

reviewed = ["grey" if "unreviewed" in entry["entryType"] else "gold" for entry in data["results"]]

mass = [entry["sequence"]["molWeight"] for entry in data["results"]]
plt.hist(mass, bins=100)
plt.show()

length = [entry["sequence"]["length"] for entry in data["results"]]
plt.hist(length, bins=100)
plt.show()

n_function = [len(entry.get("comments", [])) for entry in data["results"]]
plt.hist(n_function, bins=100)
plt.show()

score = [entry["annotationScore"] for entry in data["results"]]
plt.hist(score, bins=100)
plt.show()

ModuleNotFoundError: No module named 'matplotlib'

In [None]:
plt.scatter(length, mass, alpha=0.3)
plt.xlabel("length (aa)")
plt.ylabel("mass (Da)")
plt.show()

plt.scatter(length, n_function, c=reviewed, alpha=0.3)
plt.xlabel("length (aa)")
plt.ylabel("# function comments")
plt.show()

plt.scatter(score, n_function, c=reviewed, alpha=0.3)
plt.xlabel("annotation score")
plt.ylabel("# function comments")
plt.show()

# Test Example

Retrieve the ID and name for a list of proteins.

In [150]:
import requests, sys, json

web_api = "https://rest.uniprot.org"

accessions = ["P33527", "P12337", "P14174", "P25024", "P25025"]
a_name_1 = ",".join(accessions)
print(a_name_1)
a_name_2 = "P12337"
# Optional column selection, e.g. "&fields=id,accession,length,protein_name,organism_name"
a_field = ""

# Helper function to download data
def get_url(url, **kwargs):
  response = requests.get(url, **kwargs)

  if not response.ok:
    print(response.text)
    response.raise_for_status()  # raises requests.HTTPError, so no explicit exit is needed

  return response

P33527,P12337,P14174,P25024,P25025
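Accession lists assembled by hand can pick up duplicates or stray text before they are joined into the `accessions=` parameter. A small clean-up sketch follows. The regular expression is the UniProtKB accession pattern as published in the UniProt help pages, reproduced here from memory and so worth checking against the current documentation. `clean_accessions` is an illustrative name, and isoform identifiers such as `P17861-2` are deliberately not accepted.

```python
import re

# UniProtKB accession pattern (as given in the UniProt help; verify before relying on it)
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def clean_accessions(accessions):
    """Strip, upper-case and de-duplicate accessions, keeping first-seen order.

    Raises ValueError if any item does not look like a UniProtKB accession.
    """
    unique = list(dict.fromkeys(acc.strip().upper() for acc in accessions))
    invalid = [acc for acc in unique if not ACCESSION_RE.fullmatch(acc)]
    if invalid:
        raise ValueError(f"not UniProtKB accessions: {invalid}")
    return unique


cleaned = clean_accessions(["P14174", "P33527", "P12337", "P14174", "P25024", "P25025"])
print(",".join(cleaned))
```

The joined string can then be used directly in the `/uniprotkb/accessions` requests in this section.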


In [151]:
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_1}{a_field}&size=1")
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_1}{a_field}&size=1&format=xml")
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_1}{a_field}&size=1", headers={"Accept": "application/xml"})
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_1}{a_field}", headers={"Accept": "text/plain; format=list"})
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_1}{a_field}", headers={"Accept": "text/plain; format=fasta"})
r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_1}{a_field}", headers={"Accept": "text/plain; format=tsv"})
print(r.text)

Entry	Entry Name	Reviewed	Protein names	Gene Names	Organism	Length
P33527	MRP1_HUMAN	reviewed	Multidrug resistance-associated protein 1 (EC 7.6.2.2) (ATP-binding cassette sub-family C member 1) (Glutathione-S-conjugate-translocating ATPase ABCC1) (EC 7.6.2.3) (Leukotriene C(4) transporter) (LTC4 transporter)	ABCC1 MRP MRP1	Homo sapiens (Human)	1531
P12337	EST1_RABIT	reviewed	Liver carboxylesterase 1 (EC 3.1.1.1) (Acyl-coenzyme A:cholesterol acyltransferase)		Oryctolagus cuniculus (Rabbit)	565
P14174	MIF_HUMAN	reviewed	Macrophage migration inhibitory factor (MIF) (EC 5.3.2.1) (Glycosylation-inhibiting factor) (GIF) (L-dopachrome isomerase) (L-dopachrome tautomerase) (EC 5.3.3.12) (Phenylpyruvate tautomerase)	MIF GLIF MMIF	Homo sapiens (Human)	115
P25024	CXCR1_HUMAN	reviewed	C-X-C chemokine receptor type 1 (CXC-R1) (CXCR-1) (CDw128a) (High affinity interleukin-8 receptor A) (IL-8R A) (IL-8 receptor type 1) (CD antigen CD181)	CXCR1 CMKAR1 IL8RA	Homo sapiens (Human)	350
P25025	CXCR2_HUMAN	

In [148]:
r = get_url(f"{web_api}/uniprotkb/accessions?accessions={a_name_2}{a_field}", headers={"Accept": "text/plain; format=tsv"})
print(r.text)

Entry	Entry Name	Reviewed	Protein names	Gene Names	Organism	Length
P12337	EST1_RABIT	reviewed	Liver carboxylesterase 1 (EC 3.1.1.1) (Acyl-coenzyme A:cholesterol acyltransferase)		Oryctolagus cuniculus (Rabbit)	565



In [149]:
# manually selected ones
accessions = ["P33527", "P12337", "P14174", "P25024", "P25025"]
joined = ",".join(accessions)

# get the natural variants information
# r = get_url(f"{web_api}/uniprotkb/accessions?accessions={joined}&fields=ft_variant,organism_name")
# print(json.dumps(r.json(), indent=2))

# get the full entries; JSON is the default format (append &format=fasta for FASTA)
r = get_url(f"{web_api}/uniprotkb/accessions?accessions={joined}")
entries_json = r.text
print(entries_json)

{"results":[{"entryType":"UniProtKB reviewed (Swiss-Prot)","primaryAccession":"P33527","secondaryAccessions":["A3RJX2","C9JPJ4","O14819","O43333","P78419","Q59GI9","Q9UQ97","Q9UQ99","Q9UQA0"],"uniProtkbId":"MRP1_HUMAN","entryAudit":{"firstPublicDate":"1994-02-01","lastAnnotationUpdateDate":"2022-08-03","lastSequenceUpdateDate":"2010-05-18","entryVersion":218,"sequenceVersion":3},"annotationScore":5.0,"organism":{"scientificName":"Homo sapiens","commonName":"Human","taxonId":9606,"lineage":["Eukaryota","Metazoa","Chordata","Craniata","Vertebrata","Euteleostomi","Mammalia","Eutheria","Euarchontoglires","Primates","Haplorrhini","Catarrhini","Hominidae","Homo"]},"proteinExistence":"1: Evidence at protein level","proteinDescription":{"recommendedName":{"fullName":{"evidences":[{"evidenceCode":"ECO:0000305"}],"value":"Multidrug resistance-associated protein 1"},"ecNumbers":[{"evidences":[{"evidenceCode":"ECO:0000269","source":"PubMed","id":"9281595"}],"value":"7.6.2.2"}]},"alternativeNames":