###  Gene Mapping Summary

This analysis extracts the top genes based on the `rTS` metric from two datasets. For each gene:

1. The NCBI Gene ID (from the dataset) is cleaned by removing the `_AT1` suffix.
2. The NCBI Entrez API is used to retrieve:

   * the official gene symbol (e.g., `PHGDH`),
   * and the full gene name (e.g., `phosphoglycerate dehydrogenase`).
3. The gene symbol is then queried on the [BiGG Models Database](http://bigg.ucsd.edu) to check if it exists in the metabolic model repository.

The final output is a concise mapping of:

* Gene ID → Symbol → Full Name → BiGG Match Status

This allows quick identification of biologically relevant genes and their presence in curated genome-scale models.
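Step 1 above can be sketched as a small helper. This is a minimal illustration rather than the notebook's own code, which simply splits on the first underscore. The function name `clean_gene_id` and the assumption that the suffix always has the form `_AT` followed by digits are hypothetical.

```python
import re

# Assumption: the suffix is always "_AT" followed by one or more digits
_SUFFIX = re.compile(r"_AT\d+$")

def clean_gene_id(raw_id: str) -> str:
    """Return the bare NCBI Gene ID, e.g. '55312_AT1' -> '55312'."""
    return _SUFFIX.sub("", raw_id.strip())

print(clean_gene_id("55312_AT1"))
```

Anchoring the pattern at the end of the string leaves IDs without a suffix unchanged, so the helper can be applied to a mixed column.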


In [1]:
import pandas as pd
from IPython.display import display


In [2]:
# Replace with the correct file path
file1 = "Recon3D_EOL_logFC1_pval0.05.csv"

df1 = pd.read_csv(file1)

# Rename gene name column
df1.rename(columns={"Unnamed: 0": "Gene"}, inplace=True)


In [3]:
# Sort the dataset by descending rTS values
top_genes_df1 = df1.sort_values(by="rTS", ascending=False).reset_index(drop=True)

# Display the top 10 genes
display(top_genes_df1.head(10))


|   | Gene | bTS | mTS | wTS | rTS |
|---|------|-----|-----|-----|-----|
| 0 | 55312_AT1 | 0.190224 | 0.160264 | -0.150324 | 5.457769 |
| 1 | 26227_AT1 | 0.193249 | 0.096598 | -0.191545 | 3.717023 |
| 2 | 9376_AT1 | 0.156475 | 0.106128 | -0.152724 | 3.281462 |
| 3 | 22934_AT1 | 0.069896 | 0.05591 | -0.069453 | 0.779095 |
| 4 | 6888_AT1 | 0.082498 | 0.040016 | -0.081591 | 0.656617 |
| 5 | 2805_AT1 | 0.05329 | 0.061779 | -0.051355 | 0.646488 |
| 6 | 9489_AT1 | 0.066405 | 0.051867 | -0.048756 | 0.5973 |
| 7 | 6519_AT1 | 0.209326 | 0.008701 | -0.208145 | 0.363231 |
| 8 | 51251_AT1 | 0.076526 | 0.025114 | -0.047896 | 0.312472 |
| 9 | 21_AT1 | 0.052962 | 0.033144 | -0.039942 | 0.307918 |


In [4]:
# Extract names of the top-ranked genes
top_genes_df = top_genes_df1[["Gene", "rTS"]].head(5)

print("Recon3D_EOL_logFC1_pval0.05:")
for idx, row in enumerate(top_genes_df.itertuples(index=False), 1):
    print(f'{idx}: {row.Gene} → {row.rTS}')


Recon3D_EOL_logFC1_pval0.05:
1: 55312_AT1 → 5.457768774651536
2: 26227_AT1 → 3.7170231305971737
3: 9376_AT1 → 3.2814619426879306
4: 22934_AT1 → 0.7790951310207553
5: 6888_AT1 → 0.6566166615022351


In [5]:
import requests
import time
import pandas as pd

# Fetch the official gene symbol and full description from NCBI Entrez
def get_gene_info(ncbi_id):
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
    params = {"db": "gene", "id": ncbi_id, "retmode": "json"}
    try:
        response = requests.get(url, params=params, timeout=10)
    except requests.exceptions.RequestException:
        return "N/A", "N/A"
    if response.status_code == 200:
        try:
            summary = response.json()["result"][ncbi_id]
            return summary["name"], summary.get("description", "N/A")
        except (KeyError, ValueError):
            # Missing record fields or a response body that is not valid JSON
            return "N/A", "N/A"
    return "N/A", "N/A"

# Check whether a BiGG search for the gene symbol returns a gene section
def search_bigg_gene(symbol):
    url = "https://bigg.ucsd.edu/search"
    try:
        response = requests.get(url, params={"query": symbol}, timeout=5)  # short timeout
        if response.status_code == 200:
            # Heuristic: any results page containing the text "Genes" counts as a match
            return "Found" if "Genes" in response.text else "Not Found"
        return "Error"
    except requests.exceptions.RequestException:
        return "Connection Error"

# Compile final gene mapping
mapped_results = []

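for _, row in top_genes_df.iterrows():
    gene_id = row["Gene"]
    # Strip the "_AT1" suffix to obtain the bare NCBI Gene ID
    gene_id_clean = gene_id.split("_")[0]
    rts_value = row["rTS"]
    symbol, fullname = get_gene_info(gene_id_clean)
    # Note: bigg_status is computed here but is not added to the results below
    bigg_status = search_bigg_gene(symbol) if symbol != "N/A" else "N/A"
    mapped_results.append({
        "Gene ID": gene_id,
        "Symbol": symbol,
        "Full Name": fullname,
        "rTS": rts_value
    })
    time.sleep(1)  # pause between genes to avoid flooding the remote services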

# Convert the list of dictionaries to a DataFrame
df_mapped = pd.DataFrame(mapped_results)

# Reorder columns for clarity
df_mapped = df_mapped[["Gene ID", "Symbol", "Full Name", "rTS"]]

# Display the final mapping as a clean table
print("\n== Final Gene Mapping Table ==")
display(df_mapped)

df_mapped.to_csv("gene_mapping_output.csv", index=False)
print("File saved as gene_mapping_output.csv")



== Final Gene Mapping Table ==


|   | Gene ID | Symbol | Full Name | rTS |
|---|---------|--------|-----------|-----|
| 0 | 55312_AT1 | RFK | riboflavin kinase | 5.457769 |
| 1 | 26227_AT1 | PHGDH | phosphoglycerate dehydrogenase | 3.717023 |
| 2 | 9376_AT1 | SLC22A8 | solute carrier family 22 member 8 | 3.281462 |
| 3 | 22934_AT1 | RPIA | ribose 5-phosphate isomerase A | 0.779095 |
| 4 | 6888_AT1 | TALDO1 | transaldolase 1 | 0.656617 |


File saved as gene_mapping_output.csv
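
The BiGG check in `search_bigg_gene` looks for the word "Genes" in an HTML page, which may not distinguish a real hit from an empty results page. A possible alternative, sketched here under assumptions, is to query a JSON search endpoint and count the results. The endpoint path `api/v2/search`, the `search_type=genes` parameter, and the `results_count` field are assumptions that should be checked against the BiGG API documentation. The function names are hypothetical, and the standard library is used in place of `requests` to keep the sketch dependency-free.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the BiGG API documentation before relying on it
BIGG_SEARCH_API = "http://bigg.ucsd.edu/api/v2/search"

def bigg_status_from_payload(payload: dict) -> str:
    """Classify a parsed search response as 'Found' or 'Not Found'."""
    # Assumed response field: "results_count" holds the number of hits
    return "Found" if payload.get("results_count", 0) > 0 else "Not Found"

def search_bigg_gene_api(symbol: str, timeout: float = 5.0) -> str:
    """Query the (assumed) JSON search endpoint for genes matching `symbol`."""
    query = urlencode({"query": symbol, "search_type": "genes"})
    try:
        with urlopen(f"{BIGG_SEARCH_API}?{query}", timeout=timeout) as resp:
            return bigg_status_from_payload(json.load(resp))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers invalid JSON
        return "Connection Error"
```

The returned status could then be stored alongside the other fields, for example under a `BiGG Match` key in each `mapped_results` entry, so that the final table carries the match status described in the summary.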


### Literature Search: Gene Associations with Alzheimer's Disease

To further explore the biological relevance of the top-ranked genes identified from the datasets, we queried the PubMed database for scientific publications linking each gene to Alzheimer's disease. Using the NCBI Entrez API, we searched for articles containing both the gene symbol and the term "Alzheimer". For each gene, we retrieved up to five recent articles, including their titles and direct links to PubMed entries. This provides insight into the existing evidence for each gene’s potential involvement in Alzheimer-related mechanisms or pathways.
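
A plain `SYMBOL AND Alzheimer` query matches any indexed field, so short symbols can pull in unrelated articles. One possible refinement, shown as a sketch only, restricts both terms to titles and abstracts using PubMed's `[tiab]` field tag and a trailing `*` wildcard. The helper name `build_pubmed_query` is hypothetical, and whether this restriction suits the analysis is an assumption.

```python
def build_pubmed_query(gene_symbol: str, disease: str = "Alzheimer*") -> str:
    """Build a PubMed query restricted to title/abstract matches."""
    # [tiab] is PubMed's Title/Abstract field tag; "*" truncates the disease term
    return f"{gene_symbol}[tiab] AND {disease}[tiab]"

print(build_pubmed_query("PHGDH"))
# PHGDH[tiab] AND Alzheimer*[tiab]
```

The resulting string can be passed as the `term` parameter of an `esearch.fcgi` request in place of the unrestricted query.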


In [6]:
import requests

def get_pubmed_articles(gene_symbol, disease="Alzheimer", max_results=5):
    """Return up to max_results PubMed articles (title and URL) for a gene and disease."""
    query = f"{gene_symbol} AND {disease}"

    # Step 1: Search PubMed and get article IDs
    search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    search_params = {
        "db": "pubmed",
        "term": query,
        "retmode": "json",
        "retmax": max_results
    }
    search_response = requests.get(search_url, params=search_params, timeout=10)
    search_response.raise_for_status()
    ids = search_response.json()["esearchresult"].get("idlist", [])

    if not ids:
        return []

    # Step 2: Fetch summaries (titles)
    fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
    fetch_params = {
        "db": "pubmed",
        "id": ",".join(ids),
        "retmode": "json"
    }
    fetch_response = requests.get(fetch_url, params=fetch_params, timeout=10)
    fetch_response.raise_for_status()
    summaries = fetch_response.json()["result"]

    # Step 3: Build the article list in the order returned by the search
    articles = []
    for pid in ids:
        if pid in summaries:
            title = summaries[pid].get("title", "No title")
            link = f"https://pubmed.ncbi.nlm.nih.gov/{pid}/"
            articles.append({"title": title, "url": link})

    return articles

for symbol in df_mapped["Symbol"]:
    print(f"\n {symbol} — Articles related to Alzheimer's:")
    for art in get_pubmed_articles(symbol):
        print(f"- {art['title']}\n  {art['url']}")


 RFK — Articles related to Alzheimer's:


- Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics.
  https://pubmed.ncbi.nlm.nih.gov/38123557/
- Blood-Brain Barrier Dysfunction in Normal Aging and Neurodegeneration: Mechanisms, Impact, and Treatments.
  https://pubmed.ncbi.nlm.nih.gov/36848419/
- Biomimetic Remodeling of Microglial Riboflavin Metabolism Ameliorates Cognitive Impairment by Modulating Neuroinflammation.
  https://pubmed.ncbi.nlm.nih.gov/36799538/
- ApoJ/Clusterin concentrations are determinants of cerebrospinal fluid cholesterol efflux capacity and reduced levels are associated with Alzheimer's disease.
  https://pubmed.ncbi.nlm.nih.gov/36572909/
- AGSE: A Novel Grape Seed Extract Enriched for PP2A Activating Flavonoids That Combats Oxidative Stress and Promotes Skin Health.
  https://pubmed.ncbi.nlm.nih.gov/34770760/

 PHGDH — Articles related to Alzheimer's:
- Transcriptional regulation by PHGDH drives amyloid pathology in Alzheimer's disease.
  https://p