---
title: "JSON-JavaScript Object Notation"
author:
  - name: 
      given: Chase
      family: Clark
      non-dropping-particle: M
    roles: [original draft, review & editing]
    url: 
    affiliation: EVOQUANT LLC
    orcid: 0000-0001-6439-9397
categories: [beginner, python]
date: "2025-07-22"
description: ""
draft: false
appendix-cite-as: display
funding: "The author(s) received no funding for this work."
citation: true
execute:
  freeze: true
---

```sh
# Fetch three fields for accession P05067 as JSON, pretty-print with jq, page with less
curl -sX 'GET' \
  'https://rest.uniprot.org/uniprotkb/P05067?fields=accession%2Cprotein_name%2Ccc_function' \
  -H 'accept: application/json' |
  jq '.' |
  less
```

>JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language. (https://www.json.org/json-en.html)
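
To make the description above concrete, here is a minimal sketch using Python's standard-library `json` module. The record is invented for illustration (it is not UniProt output); it contains each JSON value type and shows how those types map to Python types when parsed.

```python
import json

# A made-up JSON document (illustration only) containing every JSON value type:
# object, array, string, number, boolean, and null.
json_text = """
{
  "name": "example protein",
  "length": 770,
  "reviewed": true,
  "aliases": ["EP", "EP-1"],
  "comment": null
}
"""

# Parse JSON text into Python objects.
record = json.loads(json_text)

print(type(record))              # JSON object -> dict
print(type(record["aliases"]))   # JSON array  -> list
print(record["reviewed"])        # JSON true   -> True
print(record["comment"])         # JSON null   -> None

# Serialize the Python objects back into JSON text.
print(json.dumps(record, indent=2))
```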

"UniProt is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information."

UniProt provides [several different ways](https://www.uniprot.org/help/programmatic_access) to access its data, including the web interface you may already know, a REST API, a SPARQL endpoint, and a Java API. In this tutorial, we will focus on the [REST API](https://www.uniprot.org/api-documentation/uniprotkb), which lets us retrieve data in JSON format.

In addition to the APIs, UniProt also provides a bulk download service, which allows you to download large datasets via their FTP server. 

This tutorial uses the REST API endpoint that retrieves an entry by accession number, documented here: https://www.uniprot.org/api-documentation/uniprotkb#operations-UniProtKB-getByAccession


The **domain** of this API is `rest.uniprot.org`, and the **path** is `/uniprotkb/{accession}`. The `{accession}` part of the path is a placeholder for the accession number of the protein you want to retrieve. For example, to retrieve the entry for accession number `P05067`, you would use the URL `https://rest.uniprot.org/uniprotkb/P05067`.

You can also limit the response to specific fields with the `fields` query parameter, which takes a comma-separated list. For example, to retrieve only the accession number, protein name, and function of `P05067`, you would use `https://rest.uniprot.org/uniprotkb/P05067?fields=accession,protein_name,cc_function`.
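
The parts of a URL described above can be inspected with the standard library's `urllib.parse` module. This is a small sketch for illustration; it only takes the string apart and does not contact UniProt.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://rest.uniprot.org/uniprotkb/P05067?fields=accession,protein_name,cc_function"

# Split the URL into its components
parts = urlsplit(url)

print(parts.scheme)   # https
print(parts.netloc)   # rest.uniprot.org (the domain)
print(parts.path)     # /uniprotkb/P05067
print(parts.query)    # fields=accession,protein_name,cc_function

# parse_qs turns the query string into a dict mapping each parameter to a list of values
query = parse_qs(parts.query)
requested_fields = query["fields"][0].split(",")
print(requested_fields)
```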

REST APIs use HTTP URLs to access resources, so each URL must be properly formatted and encoded: it cannot contain spaces or certain special characters. Such characters are percent-encoded instead; for example, a space becomes `%20` and an apostrophe becomes `%27`. Correct encoding is essential for the API to interpret the request as intended.

In [None]:
import requests

print("I'm a string that would break a url if not encoded properly")

I'm a string that would break a url if not encoded properly


In [None]:
temp_variable = requests.utils.quote("I'm a string that would break a url if not encoded properly")
print(temp_variable)

I%27m%20a%20string%20that%20would%20break%20a%20url%20if%20not%20encoded%20properly


In [None]:
# Decode the percent-encoded string from the previous cell
decoded = requests.utils.unquote(temp_variable)
print(decoded)

I'm a string that would break a url if not encoded properly
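
The same encoding is available without `requests` through the standard library's `urllib.parse` module (which, as far as we are aware, is where `requests.utils.quote` comes from). The sketch below uses a shortened example string and also shows two variations: `quote_plus`, which writes spaces as `+`, and the `safe` argument, which controls which characters are left unencoded.

```python
from urllib.parse import quote, quote_plus, unquote

text = "I'm a string"

# Percent-encode characters that are not safe in a URL
encoded = quote(text)
print(encoded)                 # I%27m%20a%20string

# quote_plus encodes spaces as '+', the convention used for form data
print(quote_plus(text))        # I%27m+a+string

# By default '/' is left alone; pass safe="" to encode it as well
print(quote("a/b"))            # a/b
print(quote("a/b", safe=""))   # a%2Fb

# Decoding reverses the encoding
print(unquote(encoded))        # I'm a string
```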


In [None]:
import requests

# Define the base URL for the UniProt REST API
rest_domain = "https://rest.uniprot.org"

# Define the specific endpoint for retrieving protein information by accession number
endpoint = "/uniprotkb/{accession}"

# Define the accession number for the protein of interest
accession = "P05067"

# Define the fields to retrieve from the API (comma-separated)
fields = "accession,protein_name,cc_function"

# URL-encode the fields to ensure proper formatting (no illegal characters)
encoded_fields = requests.utils.quote(fields)

# Assemble the full URL: domain + path (with the accession filled in) + query string
url = f"{rest_domain}{endpoint.format(accession=accession)}?fields={encoded_fields}"


In [None]:
url

'https://rest.uniprot.org/uniprotkb/P05067?fields=accession%2Cprotein_name%2Ccc_function'
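
As an alternative sketch, the query string can be assembled with the standard library's `urllib.parse.urlencode`, which takes a dictionary of parameters and percent-encodes the values. The variable names mirror the cell above; this is one possible approach rather than the required one.

```python
from urllib.parse import urlencode

rest_domain = "https://rest.uniprot.org"
accession = "P05067"

# Query parameters as a dictionary; urlencode percent-encodes keys and values
params = {"fields": "accession,protein_name,cc_function"}
query_string = urlencode(params)

# Join domain, path, and encoded query string
url = f"{rest_domain}/uniprotkb/{accession}?{query_string}"
print(url)
```

`requests.get` also accepts a `params=` dictionary and builds the query string itself, which avoids manual encoding altogether.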

In [None]:
# Send the GET request to the API
response = requests.get(url)

Check if the request was successful (status code 200). There are several ways to do this.

In [None]:
# In a notebook, evaluating the response object on its own displays its status code
response

<Response [200]>

In [None]:
response.status_code

200

In [None]:
response.ok

True
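
For context, HTTP status codes are grouped by their first digit: 2xx means success, 4xx a problem with the request, and 5xx a problem on the server. The sketch below uses the standard library's `http.HTTPStatus` to look up the standard phrase for a code; `describe_status` is a hypothetical helper written for this example, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a short description of an HTTP status code (hypothetical helper)."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"

for code in (200, 400, 404, 500):
    print(describe_status(code))
```

When working with a real response, `response.raise_for_status()` raises an exception for 4xx and 5xx codes, which is a convenient way to stop early on a failed request.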

You can get the raw text that the API returned with `response.text`:

In [None]:
response.text

'{"entryType":"UniProtKB reviewed (Swiss-Prot)","primaryAccession":"P05067","proteinDescription":{"recommendedName":{"fullName":{"evidences":[{"evidenceCode":"ECO:0000312","source":"HGNC","id":"HGNC:620"}],"value":"Amyloid-beta precursor protein"},"shortNames":[{"evidences":[{"evidenceCode":"ECO:0000312","source":"HGNC","id":"HGNC:620"}],"value":"APP"}]},"alternativeNames":[{"fullName":{"value":"ABPP"}},{"fullName":{"value":"APPI"}},{"fullName":{"value":"Alzheimer disease amyloid A4 protein homolog"}},{"fullName":{"value":"Alzheimer disease amyloid protein"}},{"fullName":{"evidences":[{"evidenceCode":"ECO:0000305"}],"value":"Amyloid precursor protein"}},{"fullName":{"evidences":[{"evidenceCode":"ECO:0000250","source":"UniProtKB","id":"P12023"}],"value":"Amyloid-beta (A4) precursor protein"}},{"fullName":{"value":"Amyloid-beta A4 protein"}},{"fullName":{"value":"Cerebral vascular amyloid peptide"},"shortNames":[{"value":"CVAP"}]},{"fullName":{"value":"PreA4"}},{"fullName":{"value":"Prot

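If you know the response is supposed to be JSON, the response object has a built-in `json()` method that parses it for you. Because it is a method, it must be called with parentheses; without them you get the method object itself rather than the parsed data.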

In [None]:
response.json

<bound method Response.json of <Response [200]>>

In [None]:
response.json()

{'entryType': 'UniProtKB reviewed (Swiss-Prot)',
 'primaryAccession': 'P05067',
 'proteinDescription': {'recommendedName': {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000312',
      'source': 'HGNC',
      'id': 'HGNC:620'}],
    'value': 'Amyloid-beta precursor protein'},
   'shortNames': [{'evidences': [{'evidenceCode': 'ECO:0000312',
       'source': 'HGNC',
       'id': 'HGNC:620'}],
     'value': 'APP'}]},
  'alternativeNames': [{'fullName': {'value': 'ABPP'}},
   {'fullName': {'value': 'APPI'}},
   {'fullName': {'value': 'Alzheimer disease amyloid A4 protein homolog'}},
   {'fullName': {'value': 'Alzheimer disease amyloid protein'}},
   {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000305'}],
     'value': 'Amyloid precursor protein'}},
   {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000250',
       'source': 'UniProtKB',
       'id': 'P12023'}],
     'value': 'Amyloid-beta (A4) precursor protein'}},
   {'fullName': {'value': 'Amyloid-beta A4 protein'}},
   {'fu

In [None]:
json_data = response.json()
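
If the body is not valid JSON (for example, an error page or a truncated download), parsing fails with an exception. The sketch below illustrates this with the standard library's `json.loads` on hard-coded strings rather than a live response; `parse_json_or_none` is a hypothetical helper name.

```python
import json

def parse_json_or_none(text: str):
    """Parse JSON text, returning None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno describe where parsing failed
        print(f"Could not parse JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None

good = '{"primaryAccession": "P05067"}'
truncated = '{"primaryAccession": "P050'   # cut off mid-string

print(parse_json_or_none(good))
print(parse_json_or_none(truncated))
```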

The parsed JSON is a nested combination of Python dictionaries and lists (holding strings, numbers, booleans, and `None`), mirroring the structure of the JSON data. You can access the data using standard Python dictionary and list syntax.

In [None]:
type(json_data)

dict

In [None]:
json_data.keys()

dict_keys(['entryType', 'primaryAccession', 'proteinDescription', 'comments', 'extraAttributes'])

In [None]:
json_data["primaryAccession"]

'P05067'
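
When a response is deeply nested, it can help to list the path to every value before deciding which keys to index. The sketch below is one way to do that; `list_paths` is a hypothetical helper, and `sample` is a hand-copied excerpt of the parsed response rather than the full data.

```python
def list_paths(node, prefix=""):
    """Recursively collect the index path to every leaf value in nested dicts and lists."""
    if isinstance(node, dict):
        paths = []
        for key, value in node.items():
            paths.extend(list_paths(value, f"{prefix}['{key}']"))
        return paths
    if isinstance(node, list):
        paths = []
        for index, value in enumerate(node):
            paths.extend(list_paths(value, f"{prefix}[{index}]"))
        return paths
    # Anything else (str, int, bool, None) is a leaf
    return [prefix]

# A small excerpt of the parsed response shown in this tutorial
sample = {
    "primaryAccession": "P05067",
    "proteinDescription": {
        "alternativeNames": [
            {"fullName": {"value": "ABPP"}},
            {"fullName": {"value": "APPI"}},
        ]
    },
}

for path in list_paths(sample):
    print("sample" + path)
```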

In [None]:
json_data["proteinDescription"]

{'recommendedName': {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000312',
     'source': 'HGNC',
     'id': 'HGNC:620'}],
   'value': 'Amyloid-beta precursor protein'},
  'shortNames': [{'evidences': [{'evidenceCode': 'ECO:0000312',
      'source': 'HGNC',
      'id': 'HGNC:620'}],
    'value': 'APP'}]},
 'alternativeNames': [{'fullName': {'value': 'ABPP'}},
  {'fullName': {'value': 'APPI'}},
  {'fullName': {'value': 'Alzheimer disease amyloid A4 protein homolog'}},
  {'fullName': {'value': 'Alzheimer disease amyloid protein'}},
  {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000305'}],
    'value': 'Amyloid precursor protein'}},
  {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000250',
      'source': 'UniProtKB',
      'id': 'P12023'}],
    'value': 'Amyloid-beta (A4) precursor protein'}},
  {'fullName': {'value': 'Amyloid-beta A4 protein'}},
  {'fullName': {'value': 'Cerebral vascular amyloid peptide'},
   'shortNames': [{'value': 'CVAP'}]},
  {'fullName': {'value': '

In [None]:
json_data['proteinDescription']['alternativeNames']

[{'fullName': {'value': 'ABPP'}},
 {'fullName': {'value': 'APPI'}},
 {'fullName': {'value': 'Alzheimer disease amyloid A4 protein homolog'}},
 {'fullName': {'value': 'Alzheimer disease amyloid protein'}},
 {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000305'}],
   'value': 'Amyloid precursor protein'}},
 {'fullName': {'evidences': [{'evidenceCode': 'ECO:0000250',
     'source': 'UniProtKB',
     'id': 'P12023'}],
   'value': 'Amyloid-beta (A4) precursor protein'}},
 {'fullName': {'value': 'Amyloid-beta A4 protein'}},
 {'fullName': {'value': 'Cerebral vascular amyloid peptide'},
  'shortNames': [{'value': 'CVAP'}]},
 {'fullName': {'value': 'PreA4'}},
 {'fullName': {'value': 'Protease nexin-II'},
  'shortNames': [{'value': 'PN-II'}]}]
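
A list of dictionaries like this one is usually processed with a loop or a list comprehension. The sketch below works on a hand-copied excerpt of the output above (three of the entries) so that it runs on its own. Note that `shortNames` is present only in some entries, so it is read with `.get()` and a default.

```python
# Excerpt of json_data['proteinDescription']['alternativeNames'] from the output above
alternative_names = [
    {"fullName": {"value": "ABPP"}},
    {"fullName": {"value": "Cerebral vascular amyloid peptide"},
     "shortNames": [{"value": "CVAP"}]},
    {"fullName": {"value": "Protease nexin-II"},
     "shortNames": [{"value": "PN-II"}]},
]

# Pull the full name out of every entry
full_names = [entry["fullName"]["value"] for entry in alternative_names]
print(full_names)

# 'shortNames' is missing from some entries, so fall back to an empty list
short_names = [
    short["value"]
    for entry in alternative_names
    for short in entry.get("shortNames", [])
]
print(short_names)
```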