### Define the GraphQL query to fetch structural and publication data from RCSB PDB.
The query requests information about specific Protein Data Bank (PDB) entries.
It retrieves:
- PubMed data (abstract text, central ID, and DOI)
- PDB entry ID
- Structural title and descriptor

#### Import requests and Define the GraphQL query

In [30]:
import requests

my_query = '''
{
  entries(entry_ids: ["6GPB", "5GPB", "8GPB", "7GPB"]) {
    pubmed {
      rcsb_pubmed_abstract_text
      rcsb_pubmed_central_id
      rcsb_pubmed_doi
    }
    rcsb_id
    struct {
      title
      pdbx_descriptor
    }
  }
}
'''

#### Send a GET request to the RCSB PDB GraphQL API using the defined query.
The request is URL-encoded to ensure proper formatting.

In [31]:
response = requests.get('https://data.rcsb.org/graphql?query=%s' % requests.utils.requote_uri(my_query))

#### Convert the API response from JSON format into a Python dictionary.

In [32]:
response_json = response.json()
response_json.keys()

dict_keys(['data'])

In [82]:
response_json['data']

{'entries': [{'pubmed': {'rcsb_pubmed_abstract_text': "The crystal structure of phosphorylase b-heptulose 2-phosphate complex with oligosaccharide and AMP bound has been refined by molecular dynamics and crystallographic least-squares with the program XPLOR. Shifts in atomic positions of up to 4 A from the native enzyme structure were correctly determined by the program without manual intervention. The final crystallographic R value for data between 8 and 2.86 A resolution is 0.201, and the overall root-mean-square difference between the native and complexed structure is 0.58 A for all protein atoms. The results confirm the previous observation that there is a direct hydrogen bond between the phosphate of heptulose 2-phosphate and the pyridoxal phosphate 5'-phosphate group. The close proximity of the two phosphates is stabilized by an arginine residue, Arg569, which shifts from a site buried in the protein to a position where it can make contact with the product phosphate. There is a m

#### Extract relevant data from the response.

In [72]:
pdb_entries = response_json["data"]["entries"]

#### Print PDB ID, Title, Descriptor, and DOI in the requested format.

In [73]:
for entry in pdb_entries:
    print(f"PDB ID: {entry['rcsb_id']}")
    print(f"Title: {entry['struct']['title']}")
    print(f"Descriptor: {entry['struct']['pdbx_descriptor']}")
    print(f"DOI: {entry['pubmed']['rcsb_pubmed_doi'] if entry['pubmed'] else 'N/A'}\n")

PDB ID: 6GPB
Title: REFINED CRYSTAL STRUCTURE OF THE PHOSPHORYLASE-HEPTULOSE 2-PHOSPHATE-OLIGOSACCHARIDE-AMP COMPLEX
Descriptor: None
DOI: 10.1016/0022-2836(90)90271-M

PDB ID: 5GPB
Title: COMPARISON OF THE BINDING OF GLUCOSE AND GLUCOSE-1-PHOSPHATE DERIVATIVES TO T-STATE GLYCOGEN PHOSPHORYLASE B
Descriptor: None
DOI: 10.1021/bi00500a005

PDB ID: 8GPB
Title: STRUCTURAL MECHANISM FOR GLYCOGEN PHOSPHORYLASE CONTROL BY PHOSPHORYLATION AND AMP
Descriptor: None
DOI: 10.1016/0022-2836(91)90887-c

PDB ID: 7GPB
Title: STRUCTURAL MECHANISM FOR GLYCOGEN PHOSPHORYLASE CONTROL BY PHOSPHORYLATION AND AMP
Descriptor: None
DOI: 10.1016/0022-2836(91)90887-c



#### Initialize Redis connection


In [95]:
! pip install redis
import redis
red = redis.Redis(host="my_redis")
print(red.ping())

True


#### Store retrieved data in Redis with appropriate key-value pairs

In [96]:
for entry in pdb_entries:
    pdb_id = entry["rcsb_id"]
    
    # Set the title (ensure it is converted to bytes)
    red.set(f"{pdb_id}:title", entry["struct"]["title"].encode())  
    
    # Check if descriptor exists and is not None before encoding and setting
    descriptor = entry["struct"].get("pdbx_descriptor")
    if descriptor is not None:
        red.set(f"{pdb_id}:descriptor", descriptor.encode()) 
    
    # Set the DOI (ensure it is in bytes)
    red.set(f"{pdb_id}:doi", entry["pubmed"]["rcsb_pubmed_doi"].encode())

#### Retrieve and print data for a specific entry (e.g., '8GPB') from Redis

In [93]:
def get_properties(red, pdb_id):
    print(f"PDB ID: {pdb_id}")
    print(f"Title: {red.get(f'{pdb_id}:title')}")
    print(f"Descriptor: {red.get(f'{pdb_id}:descriptor')}")
    print(f"DOI: {red.get(f'{pdb_id}:doi')}\n")

#### Example retrieval for '8GPB'

In [97]:
get_properties(red, '8GPB')

PDB ID: 8GPB
Title: b'STRUCTURAL MECHANISM FOR GLYCOGEN PHOSPHORYLASE CONTROL BY PHOSPHORYLATION AND AMP'
Descriptor: b'N/A'
DOI: b'10.1016/0022-2836(91)90887-c'



### Function to store properties in Redis dynamically

In [104]:
def set_properties(red, p):
    query = '''
    {
      entries(entry_ids: ["'''+p+'''"]) {
        pubmed { rcsb_pubmed_doi }
        rcsb_id
        struct { title pdbx_descriptor }
      }
    }
    '''
    response = requests.get('https://data.rcsb.org/graphql?query=%s' % requests.utils.requote_uri(query))
    entry = response.json()["data"]["entries"][0]
    
    if entry:
        red.set(f"{p}:title", entry["struct"]["title"])
    else:
        red.set(f"{p}:title", "N/A")
        
        # Only set descriptor if it exists
        if entry:
            red.set(f"{p}:descriptor", entry["struct"]["pdbx_descriptor"])
        else:
            red.set(f"{p}:descriptor", "N/A")

        # Only set DOI if it exists
        if entry:
            red.set(f"{p}:doi", entry["pubmed"]["rcsb_pubmed_doi"])
        else:
            red.set(f"{p}:doi", "N/A")

#### Testing the functions

In [105]:
set_properties(red, '7GPB')
get_properties(red, '7GPB')

set_properties(red, '4GYD')
get_properties(red, '4GYD')

PDB ID: 7GPB
Title: b'STRUCTURAL MECHANISM FOR GLYCOGEN PHOSPHORYLASE CONTROL BY PHOSPHORYLATION AND AMP'
Descriptor: b'N/A'
DOI: b'10.1016/0022-2836(91)90887-c'

PDB ID: 4GYD
Title: b'Nostoc sp Cytochrome c6'
Descriptor: None
DOI: None

