# Day 1: Afternoon Lab

## **Programmatic Access to Biomedical Resources**

In this tutorial you will practice programmatically accessing several biomedical resources.

----

**1️⃣ GenBank Retrieval**

🔸 **Task**: In the following exercise you will retrieve a sequence record from NCBI's Entrez nucleotide database. To do so, you will use two modules from the Biopython library:

1.   Entrez
2.   SeqIO

The accession you will be fetching is **"NM_001301717"**.


📚 **Optional Learning Resources**: David Boo's tutorials:

- https://david-boo.github.io/biopython-tutorial-first/
- https://david-boo.github.io/biopython-tutorial-second/

In [None]:
!pip install biopython
from IPython.display import clear_output
clear_output()

In [None]:
from Bio import Entrez, SeqIO
def fetch_gene_sequence(gene_id):
    Entrez.email = "your_email@example.com" #Enter your email here to use the Entrez Database

    ##Write your Code Here
    handle =  # Use Entrez.efetch to retrieve the record from the "nucleotide" database
    record =  # Use SeqIO.read to parse the fetched data in "genbank" format
    ##


    #Get the information from the record object
    gene_sequence = record.seq
    gene_length = len(gene_sequence)
    gene_description = record.description

    #Display information
    print(f"Gene ID: {gene_id}")
    print(f"Description: {gene_description}")
    print(f"Sequence Length: {gene_length} bp")
    print(f"Sequence: {gene_sequence[:100]}...")
    handle.close()

fetch_gene_sequence("NM_001301717")


<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

from Bio import Entrez, SeqIO
def fetch_gene_sequence(gene_id):
    Entrez.email = "your_email@example.com" #Enter your email here to use the Entrez Database

    ##Write your Code Here
    handle = Entrez.efetch(db="nucleotide", id=gene_id, rettype="gb", retmode="text")  # Retrieve the GenBank-format record from the nucleotide database
    record = SeqIO.read(handle, "genbank")  # Parse the fetched text into a single SeqRecord
    ##


    #Get the information from the record object
    gene_sequence = record.seq
    gene_length = len(gene_sequence)
    gene_description = record.description

    #Display information
    print(f"Gene ID: {gene_id}")
    print(f"Description: {gene_description}")
    print(f"Sequence Length: {gene_length} bp")
    print(f"Sequence: {gene_sequence[:100]}...")
    handle.close()

fetch_gene_sequence("NM_001301717")

```
</details>
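Under the hood, `Entrez.efetch` sends an HTTP request to NCBI's E-utilities `efetch` service. The sketch below only builds that request URL with the standard library. It shows how the `db`, `id`, `rettype` and `retmode` arguments map onto query parameters. `build_efetch_url` is a hypothetical helper, and Biopython adds further parameters of its own (such as `tool`) that are omitted here.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, email, db="nucleotide", rettype="gb", retmode="text"):
    """Build an efetch request URL for one record (hypothetical helper)."""
    params = {
        "db": db,            # which Entrez database to query
        "id": accession,     # accession or UID of the record
        "rettype": rettype,  # "gb" asks for GenBank flat-file format
        "retmode": retmode,  # "text" returns plain text rather than XML
        "email": email,      # NCBI asks users to identify themselves
    }
    # urlencode escapes special characters, e.g. "@" becomes "%40".
    return f"{EFETCH_URL}?{urlencode(params)}"


url = build_efetch_url("NM_001301717", "your_email@example.com")
print(url)
```

Opening the printed URL in a browser should return the same GenBank text that `SeqIO.read` parses. For repeated queries, Biopython's `Entrez` module is still the better choice, because it also throttles requests to respect NCBI's rate limits.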

**2️⃣ Get functional information (GO terms) for a given protein from UniProt using Python**


🔸 **Task**: In this exercise you will use the `requests` library to fetch protein data from the UniProt REST API. The goal is to extract the GO terms listed in the returned record.


Learn more about the `requests` Python module: https://www.w3schools.com/python/module_requests.asp

In [None]:
import requests
def fetch_go_terms(uniprot_id):
    url = f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.json"

    # Write your code here
    response =  # Use the requests package to retrieve the data
    data =  # Parse the response body with the .json() method
    references =  # Get the cross-references out of the data object (Hint: they are stored under "uniProtKBCrossReferences")
    ##


    # Iterate over the references and keep the ones from the GO database
    go_terms = []
    for ref in references:
        if ref.get("database") == "GO":
            print(ref.get("id"))
            go_terms.append(ref.get("id"))
    return go_terms


uniprot_id = 'P38398'  # Example UniProt ID (BRCA1_HUMAN)
go_terms = fetch_go_terms(uniprot_id)

<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

import requests
def fetch_go_terms(uniprot_id):
    url = f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.json"

    # Write your code here
    response = requests.get(url, timeout=30)  # Use the requests package to retrieve the data
    response.raise_for_status()  # Stop early on HTTP errors (e.g. an invalid accession)
    data = response.json()  # Parse the JSON response body into a dictionary
    references = data.get("uniProtKBCrossReferences", [])  # Cross-references to other databases, including GO
    ##


    # Iterate over the references and keep the ones from the GO database
    go_terms = []
    for ref in references:
        if ref.get("database") == "GO":
            print(ref.get("id"))
            go_terms.append(ref.get("id"))
    return go_terms


uniprot_id = 'P38398'  # Example UniProt ID (BRCA1_HUMAN)
go_terms = fetch_go_terms(uniprot_id)

```
</details>
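Each GO cross-reference in the UniProt JSON also carries the term name. The sketch below assumes that this name sits in a `properties` entry with the key `GoTerm` and a value such as `C:nucleus`, where the prefix letter gives the GO aspect. Current UniProt responses appear to follow this layout. The sketch runs on a small hand-written sample rather than a live response, so check the field names against `data` from your own request before relying on it. `go_terms_by_aspect` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written sample shaped like a UniProt entry's JSON (illustrative only,
# not the actual BRCA1 response).
sample_entry = {
    "uniProtKBCrossReferences": [
        {"database": "GO", "id": "GO:0005634",
         "properties": [{"key": "GoTerm", "value": "C:nucleus"}]},
        {"database": "GO", "id": "GO:0003677",
         "properties": [{"key": "GoTerm", "value": "F:DNA binding"}]},
        {"database": "GO", "id": "GO:0006281",
         "properties": [{"key": "GoTerm", "value": "P:DNA repair"}]},
        {"database": "PDB", "id": "placeholder", "properties": []},
    ]
}

# GO aspect letters used as prefixes in the assumed "GoTerm" values.
ASPECTS = {"C": "cellular_component", "F": "molecular_function", "P": "biological_process"}


def go_terms_by_aspect(entry):
    """Group GO cross-references as {aspect: [(GO ID, term name), ...]}."""
    grouped = defaultdict(list)
    for ref in entry.get("uniProtKBCrossReferences", []):
        if ref.get("database") != "GO":
            continue  # skip PDB, InterPro, etc.
        # Assumption: the label is stored as "<aspect letter>:<name>" under "GoTerm".
        props = {p.get("key"): p.get("value") for p in ref.get("properties", [])}
        letter, _, name = props.get("GoTerm", "?:").partition(":")
        grouped[ASPECTS.get(letter, "unknown")].append((ref.get("id"), name))
    return dict(grouped)


for aspect, terms in go_terms_by_aspect(sample_entry).items():
    print(aspect, terms)
```

With a live response, pass the `data` dictionary from the answer above in place of `sample_entry`.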

**3️⃣ PubMed Article Retrieval:**

🔸 **Task**: Use the Entrez module from Biopython to search PubMed for articles on a specific topic. Fetch and list the PubMed IDs, titles and abstracts of articles related to a particular gene or disease.
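`Entrez.esearch` returns XML, and `Entrez.read` turns that XML into the dictionary you index with `record["IdList"]`. To show roughly what that XML looks like, the sketch below parses a trimmed, hand-written ESearch reply with the standard library's `xml.etree.ElementTree`. The IDs are placeholders, and `parse_esearch_ids` is a hypothetical helper, not part of Biopython. Note that `Count` is the total number of matches, while `IdList` holds at most `retmax` IDs (20 by default).

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed reply in the shape of ESearch XML output;
# the count and IDs are placeholders, not real PubMed results.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>250</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>11111111</Id>
    <Id>22222222</Id>
    <Id>33333333</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch_ids(xml_text):
    """Return (total hit count, list of IDs) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))       # all matches
    ids = [el.text for el in root.findall("IdList/Id")]    # only this page
    return count, ids


count, ids = parse_esearch_ids(SAMPLE_ESEARCH_XML)
print(count, ids)
```

In the notebook, `Entrez.read(handle)` does this parsing for you.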

In [None]:
from Bio import Entrez

# Set your email address
Entrez.email = "your_email@example.com"

# Define the search term (e.g., BRCA1 or a disease)
search_term = "BRCA1"

# Search for articles related to the search term in PubMed
handle =  # Use Entrez.esearch to search the "pubmed" database for the search term (retmax limits how many IDs come back)
record = Entrez.read(handle)
handle.close()

# Get the list of PubMed IDs (PMIDs) for the articles
pmid_list = record["IdList"]

# Fetch details for each article using the PMIDs
handle =  # Use Entrez.efetch to download the articles in pmid_list from "pubmed" as XML
records = Entrez.read(handle)
handle.close()


#print the data retrieved
for article in records['PubmedArticle']:
    pmid = article['MedlineCitation']['PMID']
    title = article['MedlineCitation']['Article']['ArticleTitle']
    abstract = article['MedlineCitation']['Article'].get('Abstract', {}).get('AbstractText', ['No abstract available'])[0]

    print(f"PubMed ID: {pmid}")
    print(f"Title: {title}")
    print(f"Abstract: {abstract}")
    print("-" * 80)

<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

from Bio import Entrez

# Set your email address
Entrez.email = "your_email@example.com"

# Define the search term (e.g., BRCA1 or a disease)
search_term = "BRCA1"

# Search for articles related to the search term in PubMed
handle = Entrez.esearch(db="pubmed", term=search_term, retmax=10)  # Search PubMed; retmax caps the number of IDs returned
record = Entrez.read(handle)
handle.close()

# Get the list of PubMed IDs (PMIDs) for the articles
pmid_list = record["IdList"]

# Fetch details for each article using the PMIDs
handle = Entrez.efetch(db="pubmed", id=pmid_list, retmode="xml")  # Download the full records for the PMIDs as XML
records = Entrez.read(handle)
handle.close()


#print the data retrieved
for article in records['PubmedArticle']:
    pmid = article['MedlineCitation']['PMID']
    title = article['MedlineCitation']['Article']['ArticleTitle']
    abstract = article['MedlineCitation']['Article'].get('Abstract', {}).get('AbstractText', ['No abstract available'])[0]

    print(f"PubMed ID: {pmid}")
    print(f"Title: {title}")
    print(f"Abstract: {abstract}")
    print("-" * 80)


```
</details>
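The loop above keeps only `AbstractText[0]`. For structured abstracts, which are split into sections such as background, methods and results, that is just the first section. The sketch below joins all sections instead. It assumes that Biopython exposes each section's XML attributes (for example `Label`) through an `.attributes` dictionary. It is tested here on a hand-written stand-in rather than a real record. `full_abstract` is a hypothetical helper.

```python
def full_abstract(article, missing="No abstract available"):
    """Join all AbstractText sections of one parsed PubMed article."""
    sections = (
        article["MedlineCitation"]["Article"]
        .get("Abstract", {})
        .get("AbstractText", [])
    )
    if not sections:
        return missing
    parts = []
    for section in sections:
        # Assumption: Biopython string elements carry XML attributes in
        # .attributes (e.g. Label="METHODS"); plain strings have no label.
        label = getattr(section, "attributes", {}).get("Label")
        text = str(section)
        parts.append(f"{label}: {text}" if label else text)
    return " ".join(parts)


# Hand-written stand-in for one entry of records['PubmedArticle'].
sample_article = {
    "MedlineCitation": {
        "Article": {
            "ArticleTitle": "Example title",
            "Abstract": {"AbstractText": ["First section.", "Second section."]},
        }
    }
}
print(full_abstract(sample_article))  # prints: First section. Second section.
```

In the notebook loop, you could replace the `abstract = ...` line with `abstract = full_abstract(article)`.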

**4️⃣ Basic Ontology Exploration:**

🔸 **Task**: Download an ontology file in OBO format, load it with the obonet library, and print its basic structure, including term IDs and labels. Write a script that lists the classes and their labels from the ontology.


For details on using obonet, see: https://pypi.org/project/obonet/
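An OBO file is plain text. It consists of a header followed by stanzas such as `[Term]`, and each stanza holds `tag: value` lines like `id:` and `name:`. To make that structure concrete, here is a deliberately minimal parser that collects the ID and name of every `[Term]`. It is a sketch, not a substitute for obonet, which also handles synonyms, relationships, obsolete terms and escaping. The ontology snippet and its `EX:` IDs are hypothetical, and `parse_obo_terms` is a hypothetical helper.

```python
# Hypothetical OBO snippet (made-up EX: IDs) showing the file layout.
SAMPLE_OBO = """format-version: 1.2
ontology: example

[Term]
id: EX:0000001
name: example disease

[Term]
id: EX:0000002
name: example subtype
is_a: EX:0000001 ! example disease

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Return {term id: name} for every [Term] stanza in OBO-format text."""
    terms = {}
    in_term = False   # are we inside a [Term] stanza?
    term_id = None    # ID of the current term, once its id: line is seen
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; [Typedef] and others are ignored.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and ":" in line:
            tag, _, value = line.partition(":")  # split on the first colon only
            value = value.strip()
            if tag == "id":
                term_id = value
                terms.setdefault(term_id, None)
            elif tag == "name" and term_id is not None:
                terms[term_id] = value
    return terms


for term_id, name in parse_obo_terms(SAMPLE_OBO).items():
    print(term_id, name)
```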



In [None]:
!pip install obonet


In [None]:
!wget https://purl.obolibrary.org/obo/doid.obo

In [None]:
import obonet

# Load the ontology file
ontology_file = "doid.obo"  # Update with the path to your OBO file


graph =  # Load the ontology file into a graph object (Hint: use obonet.read_obo)

# Extract and print disease names and synonyms
print("Disease Names and Synonyms:")

for node, data in graph.nodes(data=True):
    if 'disease' in data.get('name', '').lower():  # Simple filter: keep terms whose name contains "disease"
        name = data.get('name', 'No name available')
        synonyms = data.get('synonym', ["No synonyms available"])

        print(f"Disease: {name} ({node})")
        print(f"Synonyms: {', '.join(synonyms)}")
        print()

<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python
import obonet

# Load the ontology file
ontology_file = "doid.obo"  # Update with the path to your OBO file


graph = obonet.read_obo(ontology_file)  # Parse the OBO file into a networkx MultiDiGraph keyed by term ID

# Extract and print disease names and synonyms
print("Disease Names and Synonyms:")
for node, data in graph.nodes(data=True):
    if 'disease' in data.get('name', '').lower():  # Simple filter: keep terms whose name contains "disease"
        name = data.get('name', 'No name available')
        synonyms = data.get('synonym', ["No synonyms available"])

        print(f"Disease: {name} ({node})")
        print(f"Synonyms: {', '.join(synonyms)}")
        print()
```
</details>

**5️⃣ Simple Knowledge Graph Construction:**

🔸 **Task**: Create a basic knowledge graph with the networkx library to represent relationships between a few biomedical entities (e.g., gene-disease connections). Build a graph whose nodes represent genes and diseases and whose edges represent known associations.

A NetworkX graph is made up of two kinds of elements:

| Type   | Description                                        | Example                              |
|--------|----------------------------------------------------|--------------------------------------|
| `Node` | Nodes are the objects we want to analyze.          | a list of genes/diseases             |
| `Edge` | Edges represent the relationships between nodes.   | tuples such as ("Gene", "Disease")   |


For more information about NetworkX, see:
https://networkx.org/documentation/stable/tutorial.html#nodes
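Once nodes and edges are in a graph, NetworkX can answer simple questions about them. The sketch below builds a small directed graph from a subset of the gene-disease pairs used in this exercise, with a `node_type` attribute marking genes and diseases. It then queries the graph with `G.nodes`, `G.successors` and `G.predecessors`. Because each edge points from gene to disease, a gene's successors are diseases and a disease's predecessors are genes.

```python
import networkx as nx

# Small directed gene -> disease graph with typed nodes.
G = nx.DiGraph()
G.add_nodes_from(["BRCA1", "TP53"], node_type="gene")
G.add_nodes_from(["Breast Cancer", "Lung Cancer"], node_type="disease")
G.add_edges_from([
    ("BRCA1", "Breast Cancer"),
    ("TP53", "Breast Cancer"),
    ("TP53", "Lung Cancer"),
])

# Node attributes are stored in a per-node dictionary.
print(G.nodes["TP53"])                           # {'node_type': 'gene'}

# Successors of a gene are the diseases it points to ...
print(sorted(G.successors("TP53")))              # ['Breast Cancer', 'Lung Cancer']

# ... and predecessors of a disease are the genes pointing to it.
print(sorted(G.predecessors("Breast Cancer")))   # ['BRCA1', 'TP53']

# Select nodes by their node_type attribute.
genes = [n for n, t in G.nodes(data="node_type") if t == "gene"]
print(genes)                                     # ['BRCA1', 'TP53']
```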


In [None]:
!pip install networkx matplotlib

In [None]:

import networkx as nx
import matplotlib.pyplot as plt

# Create a new directed graph
G = nx.DiGraph()

# Add nodes with their types
genes = ["BRCA1", "TP53", "EGFR"]
diseases = ["Breast Cancer", "Lung Cancer", "Colorectal Cancer"]

# Add nodes to the graph (HINT: use add_node or add_nodes_from and set a node_type attribute of 'gene' or 'disease')


# Add edges representing relationships
# (gene, disease) format for edges
edges = [
    ("BRCA1", "Breast Cancer"),
    ("TP53", "Breast Cancer"),
    ("EGFR", "Lung Cancer"),
    ("TP53", "Lung Cancer"),
    ("EGFR", "Colorectal Cancer")
]

G.add_edges_from(edges)

# Position nodes using a layout algorithm
pos = nx.spring_layout(G, seed=42)

# Color nodes by their node_type attribute (gray if the attribute is missing)
node_types = nx.get_node_attributes(G, 'node_type')
color_map = {'gene': 'lightblue', 'disease': 'lightgreen'}
node_colors = [color_map.get(node_types.get(node), 'lightgray') for node in G.nodes]

# Draw the graph
plt.figure(figsize=(10, 7))
nx.draw(G, pos, with_labels=True, node_color=node_colors, node_size=2000, font_size=10, font_weight='bold', edge_color='gray', linewidths=0.5, alpha=0.7)

# Show the plot
plt.title("Knowledge Graph: Gene-Disease Associations")
plt.show()

<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

import networkx as nx
import matplotlib.pyplot as plt

# Create a new directed graph
G = nx.DiGraph()

# Add nodes with their types
genes = ["BRCA1", "TP53", "EGFR"]
diseases = ["Breast Cancer", "Lung Cancer", "Colorectal Cancer"]

# Add nodes to the graph
G.add_nodes_from(genes, node_type='gene')
G.add_nodes_from(diseases, node_type='disease')

# Add edges representing relationships
# (gene, disease) format for edges
edges = [
    ("BRCA1", "Breast Cancer"),
    ("TP53", "Breast Cancer"),
    ("EGFR", "Lung Cancer"),
    ("TP53", "Lung Cancer"),
    ("EGFR", "Colorectal Cancer")
]

G.add_edges_from(edges)

# Position nodes using a layout algorithm
pos = nx.spring_layout(G, seed=42)

# Color nodes by their node_type attribute (gray if the attribute is missing)
node_types = nx.get_node_attributes(G, 'node_type')
color_map = {'gene': 'lightblue', 'disease': 'lightgreen'}
node_colors = [color_map.get(node_types.get(node), 'lightgray') for node in G.nodes]

# Draw the graph
plt.figure(figsize=(10, 7))
nx.draw(G, pos, with_labels=True, node_color=node_colors, node_size=2000, font_size=10, font_weight='bold', edge_color='gray', linewidths=0.5, alpha=0.7)

# Show the plot
plt.title("Knowledge Graph: Gene-Disease Associations")
plt.show()


```
</details>

#### Contributed by: Suhaib Alghamdi

- [LinkedIn Profile](https://www.linkedin.com/in/suhaib-alghamdi/)