# Example notebook for using the PDB and BMRB APIs for structural biology data science applications

## Introduction
This notebook walks through some sample queries of both the PDB and the BMRB in order to correlate NMR parameters with structure.  We hope it gives some guidance on the utility of the wwPDB APIs, as well as an overall strategy for gathering data from the different databases.  It is not intended to be a tutorial on Python, and no claim is made about the efficiency or correctness of the code.

## Research Problem
For this example we will explore vicinal disulfide bonds in proteins, that is, disulfide bonds between adjacent cysteines in the sequence.  Vicinal disulfide bonds are rare in nature but can be biologically important<sup>1</sup>. Because such a linkage strains the protein backbone, the hypothetical research question for this notebook is whether any abnormal NMR chemical shifts are associated with this structure. ![Vicinal Image](vicinal.png)

**Figure 1.** This illustration shows a comparison of the abnormal dihedral angles observed for vicinal disulfides (right).  This figure is from the poster presented at the 46th Experimental NMR Conference in Providence, RI. Susan Fox-Erlich, Heidi J.C. Ellis, Timothy O. Martyn, & Michael R. Gryk. (2005) StrucCheck: a JAVA Application to Derive Geometric Attributes from Arbitrary Subsets of Spatial Coordinates Obtained from the PDB.

<sup>1</sup>Xiu-hong Wang, Mark Connor, Ross Smith, Mark W. Maciejewski, Merlin E.H. Howden, Graham M. Nicholson, Macdonald J. Christie & Glenn F. King. Discovery and characterization of a family of insecticidal neurotoxins with a rare vicinal disulfide bridge. *Nat Struct Mol Biol* **7**, 505–513 (2000). https://www.nature.com/articles/nsb0600_505 https://doi.org/10.1038/nsb0600_505

## Strategy
Our overall strategy will be to query the RCSB PDB for all entries which have vicinal disulfide bonds. We will then cross-reference those entries with the BMRB in order to get available chemical shifts. Since we are interested in NMR chemical shifts, when we query the PDB it will be useful to limit our search to structures determined by NMR.

First we need to install and import requests, the Python HTTP library we will use to call the REST APIs of both the PDB and the BMRB.

https://www.rcsb.org/pages/webservices
https://github.com/uwbmrb/BMRB-API

In [1]:
%%capture
!pip install requests;

In [2]:
import requests

### Building the PDB Query (Search API)

In order to find all PDB entries with vicinal disulfides, we will first search for all entries with at least one disulfide bond.  This is the disulfide_filter portion of the query.
In addition, as we are interested in the chemical shifts for vicinal disulfides, we will also restrict the results to only solution NMR studies.
Finally, as this is an example for illustration purposes and we want to keep the number of results small, we will further restrict the results to structures determined by Glenn King.  Hi Glenn!

This section makes use of the Search API at PDB.  Later, we will use the Data API.

In [3]:
pdbAPI = "https://search.rcsb.org/rcsbsearch/v1/query?json="
disulfide_filter = '{"type": "terminal", "service": "text", "parameters": {"operator": "greater_or_equal", "value": 1, "attribute": "rcsb_entry_info.disulfide_bond_count"}}'
NMR_filter = '{"type": "terminal", "service": "text", "parameters": {"operator": "exact_match", "value": "NMR", "attribute": "rcsb_entry_info.experimental_method"}}'
GK_filter = '{"type": "terminal", "service": "text", "parameters": {"operator": "exact_match", "value": "King, G.F.", "attribute": "audit_author.name"}}'

Now we can combine these three filters with a logical AND.

In [4]:
filters = '{"type": "group", "logical_operator": "and", "nodes": [' + disulfide_filter + ',' + NMR_filter + ',' + GK_filter + ']}'

And add the return information.  *Note that we are requesting polymer_instance IDs, as that is where the disulfide bonds are recorded.*

In [5]:
full_query = '{"query": ' + filters + ', "request_options": {"return_all_hits": true}, "return_type": "polymer_instance"}'
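Building JSON by string concatenation works, but a stray comma or quote is easy to miss. As a hedged alternative, the same query can be assembled from Python dictionaries and serialized with `json.dumps`. This is only a sketch: the helper `terminal` and the names `query`, `full_query_json`, and `query_url` are illustrative and not part of the original notebook, while the endpoint, attributes, and operators are copied from the cells above.

```python
import json
from urllib.parse import quote

PDB_SEARCH_API = "https://search.rcsb.org/rcsbsearch/v1/query?json="


def terminal(attribute, operator, value):
    """Build one terminal node for the text service (hypothetical helper)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "operator": operator,
            "value": value,
            "attribute": attribute,
        },
    }


# Same three filters as above, combined with a logical AND
query = {
    "query": {
        "type": "group",
        "logical_operator": "and",
        "nodes": [
            terminal("rcsb_entry_info.disulfide_bond_count", "greater_or_equal", 1),
            terminal("rcsb_entry_info.experimental_method", "exact_match", "NMR"),
            terminal("audit_author.name", "exact_match", "King, G.F."),
        ],
    },
    "request_options": {"return_all_hits": True},
    "return_type": "polymer_instance",
}

# json.dumps turns Python's True into JSON's true and handles all quoting
full_query_json = json.dumps(query)

# Percent-encode the JSON so that spaces and quotes are safe inside a URL
query_url = PDB_SEARCH_API + quote(full_query_json)
print(query_url)
```

The resulting `query_url` could then be passed to `requests.get` in place of the concatenated string.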

And finally submit the request to the PDB.  The response status code should be 200 if the query was successful.

In [6]:
response = requests.get(pdbAPI + full_query)
print(response) # should return 200

<Response [200]>


In [7]:
print(type(response.json()))
#print(response.json())  #uncomment this line if you want to see the results

<class 'dict'>


Next we will extract just the PDB codes from our results and build a list.

In [8]:
pdb_results = response.json()

In [9]:
pdb_list = []
for x in pdb_results['result_set']:
    pdb_list.append(x['identifier'])
print(pdb_list)

['2MFA.A', '1G9P.A', '2MI5.A', '1VTX.A', '6OHX.A', '6AZA.A', '2MPQ.A', '2N1N.A', '2MUB.A', '2N6R.A', '2N6O.A', '2M36.A', '2MUN.A', '2N6N.A', '2KNI.A', '2N8F.A', '1C4E.A', '2M35.A', '6BA3.A', '2MT7.A', '2M6J.A', '2N8K.A', '1JUN.A', '1JUN.B', '6V6T.A', '2KSL.A', '1B8W.A', '1AXH.A', '1HP3.A', '2MF3.A', '6MZT.A', '2NBC.A', '1HVW.A', '2KYJ.A', '1DL0.A', '5WLX.A']
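Each identifier in this list combines a PDB entry ID and a polymer instance (chain) ID, so one entry can appear more than once (1JUN is listed for both chain A and chain B). The sketch below shows one way to split the identifiers and collect the unique entry IDs. The function names and the `sample` dictionary are illustrative, not part of the original notebook; only the `result_set` and `identifier` keys come from the cell above.

```python
def split_instance_id(identifier):
    """Split an identifier such as '1DL0.A' into ('1DL0', 'A')."""
    entry_id, _, chain_id = identifier.partition(".")
    return entry_id, chain_id


def entries_from_results(results):
    """Return the unique entry IDs in a search result, in first-seen order."""
    entry_ids = []
    for hit in results.get("result_set", []):
        entry_id, _ = split_instance_id(hit["identifier"])
        if entry_id not in entry_ids:
            entry_ids.append(entry_id)
    return entry_ids


# Small hand-made sample shaped like the Search API response used above
sample = {
    "result_set": [
        {"identifier": "1JUN.A"},
        {"identifier": "1JUN.B"},
        {"identifier": "1DL0.A"},
    ]
}
print(entries_from_results(sample))
```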


## PDB Data API
The basics of the data API are illustrated with this link:

https://data.rcsb.org/rest/v1/core/polymer_entity_instance/1DL0/A

This illustrates the REST query string, as well as how we need to append the PDB entry ID and polymer instance to the end.

In [10]:
data_query_base = "https://data.rcsb.org/rest/v1/core/polymer_entity_instance/"

In [11]:
def swapSymbols(identifier):
    # Convert 'ENTRY.CHAIN' into the 'ENTRY/CHAIN' path form used by the Data API
    return identifier.replace(".", "/")
pdb_list2 = list(map(swapSymbols, pdb_list))
print(pdb_list2)

['2MFA/A', '1G9P/A', '2MI5/A', '1VTX/A', '6OHX/A', '6AZA/A', '2MPQ/A', '2N1N/A', '2MUB/A', '2N6R/A', '2N6O/A', '2M36/A', '2MUN/A', '2N6N/A', '2KNI/A', '2N8F/A', '1C4E/A', '2M35/A', '6BA3/A', '2MT7/A', '2M6J/A', '2N8K/A', '1JUN/A', '1JUN/B', '6V6T/A', '2KSL/A', '1B8W/A', '1AXH/A', '1HP3/A', '2MF3/A', '6MZT/A', '2NBC/A', '1HVW/A', '2KYJ/A', '1DL0/A', '5WLX/A']


Now we can loop through each PDB entry and request the polymer_entity_instance information. We only care about disulfide bridges between adjacent residues, i.e. those whose two label_seq_id values differ by one.

In [12]:
data_response = requests.get(data_query_base + "1DL0/A")
print(data_response) # should return 200

<Response [200]>


In [13]:
vds_list = []
for instance in pdb_list2:
    data_response = requests.get(data_query_base + instance)
    if data_response.status_code == 200:
        data_result = data_response.json()
        for x in data_result.get('rcsb_polymer_struct_conn', []):  # tolerate instances with no connection records
            if (x['connect_type'] == 'disulfide bridge' and x['connect_partner']['label_seq_id']-x['connect_target']['label_seq_id']==1):
                vds_list.append(data_result['rcsb_polymer_entity_instance_container_identifiers']['entry_id'])
print(vds_list)

['1DL0']
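The test in the loop above matches only when the partner residue number is exactly one greater than the target residue number. A hedged, order-independent variant is sketched below as a standalone function. The function name and the sample records are hypothetical. The use of `label_asym_id` to reject bonds between different chains is an assumption about the record layout and is not taken from the original notebook; results with this variant may differ from the output shown above.

```python
def is_vicinal_disulfide(conn):
    """Return True for a disulfide bridge between sequence-adjacent residues."""
    if conn.get("connect_type") != "disulfide bridge":
        return False
    partner = conn.get("connect_partner") or {}
    target = conn.get("connect_target") or {}
    partner_seq = partner.get("label_seq_id")
    target_seq = target.get("label_seq_id")
    if partner_seq is None or target_seq is None:
        return False
    # Assumption: both ends may carry label_asym_id. If they do and the
    # chains differ, the residues are not neighbours in one sequence.
    partner_chain = partner.get("label_asym_id")
    target_chain = target.get("label_asym_id")
    if partner_chain and target_chain and partner_chain != target_chain:
        return False
    # abs() makes the check independent of which end is listed first
    return abs(partner_seq - target_seq) == 1


# Hypothetical records shaped like entries of rcsb_polymer_struct_conn
forward = {
    "connect_type": "disulfide bridge",
    "connect_partner": {"label_seq_id": 14},
    "connect_target": {"label_seq_id": 13},
}
reverse = {
    "connect_type": "disulfide bridge",
    "connect_partner": {"label_seq_id": 13},
    "connect_target": {"label_seq_id": 14},
}
distant = {
    "connect_type": "disulfide bridge",
    "connect_partner": {"label_seq_id": 20},
    "connect_target": {"label_seq_id": 3},
}
print(is_vicinal_disulfide(forward), is_vicinal_disulfide(reverse), is_vicinal_disulfide(distant))
```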


Our list is small (intentionally) but we can now use it to fetch chemical shifts from the BMRB.

## BMRB API
Our first step is to find the corresponding BMRB entries for the PDB entries in our list. The query we want is shown below:

http://api.bmrb.io/v2/search/get_bmrb_ids_from_pdb_id/

In [14]:
BMRB_LookupString = 'http://api.bmrb.io/v2/search/get_bmrb_ids_from_pdb_id/'

In [15]:
BMRB_ID_List = []
for PDB_ID in vds_list:
    BMRB_response = requests.get(BMRB_LookupString + PDB_ID)
    if BMRB_response.status_code == 200:
        BMRB_result = BMRB_response.json()
        for x in BMRB_result:
            for y in x['match_types']:
                if y == 'Author Provided':
                    BMRB_ID_List.append (x['bmrb_id'])
print(BMRB_ID_List)

['16140', '4685']
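The filtering step in the loop above can also be written as a small function that is easy to test without a network call. This is a sketch only: the function name is illustrative, and the sample records are made up to mirror the `bmrb_id` and `match_types` keys used above (the second match type label is a placeholder, not a value known to be returned by the BMRB).

```python
def author_provided_ids(matches):
    """Return BMRB IDs whose match types include 'Author Provided'."""
    ids = []
    for match in matches:
        is_author_match = "Author Provided" in match.get("match_types", [])
        if is_author_match and match["bmrb_id"] not in ids:
            ids.append(match["bmrb_id"])
    return ids


# Hypothetical response; only the key names come from the loop above
sample_matches = [
    {"bmrb_id": "16140", "match_types": ["Author Provided"]},
    {"bmrb_id": "4685", "match_types": ["Author Provided", "placeholder match type"]},
    {"bmrb_id": "00000", "match_types": ["placeholder match type"]},
]
print(author_provided_ids(sample_matches))
```

Checking membership with `in` also avoids appending the same ID twice if a match type were ever repeated.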


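With the BMRB IDs in hand, we can request the assigned chemical shift saveframes for each entry.

In [16]:
chemical_shifts_list = []
for ID in BMRB_ID_List:
    shift_response = requests.get("http://api.bmrb.io/v2/entry/" + ID + "?saveframe_category=assigned_chemical_shifts")
    if shift_response.status_code == 200:
        chemical_shifts_list.append(shift_response.json())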
#print(chemical_shifts_list)
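Since the research question concerns cysteine residues, a natural next step is to pull the cysteine rows out of each response. The sketch below assumes a particular JSON layout that is not confirmed by this notebook: a dictionary keyed by entry ID, containing a list of saveframes under `assigned_chemical_shifts`, each with `loops` that carry `category`, `tags`, and `data`. The tag names `Seq_ID`, `Comp_ID`, `Atom_ID`, and `Val` follow the NMR-STAR `_Atom_chem_shift` loop. The function name and all sample values are made up for illustration.

```python
def cysteine_shifts(entry_json):
    """Collect (entry_id, residue_number, atom_name, shift) for CYS rows."""
    rows = []
    for entry_id, categories in entry_json.items():
        for saveframe in categories.get("assigned_chemical_shifts", []):
            for loop in saveframe.get("loops", []):
                if loop.get("category") != "_Atom_chem_shift":
                    continue
                # Map each tag name to its column index in the data rows
                col = {tag: i for i, tag in enumerate(loop["tags"])}
                for row in loop["data"]:
                    if row[col["Comp_ID"]] != "CYS":
                        continue
                    try:
                        residue = int(row[col["Seq_ID"]])
                        shift = float(row[col["Val"]])
                    except (TypeError, ValueError):
                        continue  # skip rows with missing values such as '.'
                    rows.append((entry_id, residue, row[col["Atom_ID"]], shift))
    return rows


# Made-up sample in the assumed layout; the shift values are not real data
sample_entry = {
    "16140": {
        "assigned_chemical_shifts": [
            {
                "loops": [
                    {
                        "category": "_Atom_chem_shift",
                        "tags": ["Seq_ID", "Comp_ID", "Atom_ID", "Val"],
                        "data": [
                            ["13", "CYS", "CA", "55.1"],
                            ["14", "CYS", "CB", "40.2"],
                            ["15", "GLY", "CA", "45.0"],
                            ["16", "CYS", "CB", "."],
                        ],
                    }
                ]
            }
        ]
    }
}
print(cysteine_shifts(sample_entry))
```

If the real responses are laid out differently, only the three nested lookups at the top of the function should need adjusting.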

### Alternate Approach
Alternatively, look up the matching entries in the BMRB adit_nmr_matched_pdb_bmrb_entry_ids.csv file.

loop_

      _Assembly_db_link.Author_supplied
      _Assembly_db_link.Database_code
      _Assembly_db_link.Accession_code
      _Assembly_db_link.Entry_mol_code
      _Assembly_db_link.Entry_mol_name
      _Assembly_db_link.Entry_experimental_method
      _Assembly_db_link.Entry_structure_resolution
      _Assembly_db_link.Entry_relation_type
      _Assembly_db_link.Entry_details
      _Assembly_db_link.Entry_ID
      _Assembly_db_link.Assembly_ID

      yes   PDB   1AXH   .   .   .   .   .

In [17]:
bmrb_link = "https://bmrb.io/ftp/pub/bmrb/nmr_pdb_integrated_data/adit_nmr_matched_pdb_bmrb_entry_ids.csv"
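Once the file has been downloaded, it can be turned into a lookup table from PDB ID to BMRB IDs. The sketch below assumes, without confirmation from this notebook, that each row has two columns in the order BMRB ID, PDB ID, with no header row; check the actual file before relying on it. The function name and the sample rows are illustrative and were not copied from the file.

```python
import csv
import io


def bmrb_ids_by_pdb(csv_text):
    """Map each PDB ID to its BMRB IDs, assuming rows of 'bmrb_id,pdb_id'."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        bmrb_id = row[0].strip()
        pdb_id = row[1].strip().upper()
        mapping.setdefault(pdb_id, []).append(bmrb_id)
    return mapping


# Illustrative rows in the assumed column order (not taken from the file)
sample_csv = "4685,1DL0\n16140,1DL0\n\n"
lookup = bmrb_ids_by_pdb(sample_csv)
print(lookup.get("1DL0", []))

# With the real file, the text could be fetched first, for example:
#     csv_text = requests.get(bmrb_link).text
```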