# Extraction of Gene Panels from the Genetic Testing Registry (GTR)

This notebook demonstrates how to extract gene information from NCBI’s **Genetic Testing Registry (GTR)** using a predefined list of test identifiers (IDs).  
Each test ID corresponds to a **gene panel** associated with a specific disease.

The objective is to automate the retrieval of genes from GTR and save them in a structured JSON format for further analysis.

## Obtaining GTR Test Identifiers

Before executing the notebook, a text file containing GTR test identifiers must be created.  
Each line in the file should contain a single GTR test ID.  

The following steps describe how to obtain these identifiers:

1. **Identify the MeSH ID for the disease**
   - Access the [MeSH Database](https://www.ncbi.nlm.nih.gov/mesh).
   - Search for the desired disease (e.g., "breast cancer").
   - Record the **MeSH ID** listed in the entry (for breast cancer, the MeSH ID is `D001943`).

2. **Locate related panels in MedGen**
   - Visit [MedGen](https://www.ncbi.nlm.nih.gov/medgen).
   - Search using the MeSH ID (e.g., `D001943`).
   - On the disease page, navigate to the section titled **Genetic Testing Registry (GTR)**.

3. **Collect GTR Test IDs**
   - In the GTR section, each linked test corresponds to a genetic panel.
   - Click on a test name (e.g., "Breast/Gyn Cancer Panel"); the URL will include the test ID, for example:  
     `https://www.ncbi.nlm.nih.gov/gtr/tests/569831/` → Test ID: `569831`.

4. **Create the input file**
   - Create a file named `gene_panels_ids.txt`.
   - Add one ID per line, for example:
     ```
     569831
     560453
     600459
     ```
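   - Save the file in the `tools/` directory so that the full path is `tools/gene_panels_ids.txt`. The code cell below refers to the file by the relative name `gene_panels_ids.txt`, so run the notebook from `tools/`.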
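
When many tests are collected, copying IDs out of URLs by hand is error-prone. The following is a minimal sketch, not part of `create_gene_panel.py`, that pulls the numeric ID out of GTR test URLs of the form shown in step 3. The names `extract_test_id` and `extract_test_ids` and the regular expression are illustrative assumptions; URLs with a different shape are skipped.

```python
import re

# Assumed URL shape, taken from the example in step 3:
#   https://www.ncbi.nlm.nih.gov/gtr/tests/<digits>/
_GTR_TEST_URL = re.compile(r"/gtr/tests/(\d+)")


def extract_test_id(url):
    """Return the numeric test ID from a GTR test URL, or None if absent."""
    match = _GTR_TEST_URL.search(url)
    return match.group(1) if match else None


def extract_test_ids(urls):
    """Return unique test IDs in first-seen order, skipping unmatched URLs."""
    seen = {}  # dicts keep insertion order, so this also de-duplicates
    for url in urls:
        test_id = extract_test_id(url)
        if test_id is not None:
            seen.setdefault(test_id, None)
    return list(seen)


urls = [
    "https://www.ncbi.nlm.nih.gov/gtr/tests/569831/",
    "https://www.ncbi.nlm.nih.gov/gtr/tests/569831/",  # duplicate
    "https://www.ncbi.nlm.nih.gov/medgen",  # not a test URL
]
print(extract_test_ids(urls))  # ['569831']
```

The resulting list can be written to `gene_panels_ids.txt`, one ID per line.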

In [2]:
# Step 1: Import fetch_gene_panels from create_gene_panel.py
from create_gene_panel import fetch_gene_panels

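# Step 2: Define the input file with test IDs and the output directory

PANELS_FILE = "gene_panels_ids.txt"
PANEL_IDS = ["569831", "560453", "600459"]

# Write one ID per line. This overwrites any existing file with the same name;
# skip this block if gene_panels_ids.txt was already created by hand.
with open(PANELS_FILE, "w") as f:
    for panel_id in PANEL_IDS:
        f.write(f"{panel_id}\n")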

OUTPUT_DIR = "../data/gene_panels"

# Step 3: Fetch gene panels using fetch_gene_panels()

fetch_gene_panels(PANELS_FILE, OUTPUT_DIR)

2025-10-21 14:01:23 — ERROR — create_gene_panel — Failed to parse data for test ID: 569831. Error: expected string or bytes-like object, got 'NoneType'
2025-10-21 14:01:24 — INFO — create_gene_panel — Retrieved 23 genes for Gene Panel: Renal_Cancer__Gene_Sequencing_Panel.json
2025-10-21 14:01:24 — INFO — create_gene_panel — Retrieved 99 genes for Gene Panel: Myeloid_Tumor_Panel.json
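
In the run above, one test ID failed to parse while the other two were retrieved. As a rough sketch, failed IDs can be collected from captured log text so they can be inspected or retried later. This assumes the failure message keeps the wording shown in the log; the helper name `failed_test_ids` is hypothetical, and how the log text is captured depends on the logger configuration in `create_gene_panel.py`, which is not shown here.

```python
import re

# Assumption: the failure message keeps the wording seen in the log output.
_FAILED = re.compile(r"Failed to parse data for test ID: (\d+)")


def failed_test_ids(log_text):
    """Return test IDs reported as failed, in order of appearance."""
    return _FAILED.findall(log_text)


# Shortened sample lines modeled on the messages in the log output.
sample = (
    "ERROR create_gene_panel Failed to parse data for test ID: 569831. "
    "Error: expected string or bytes-like object, got 'NoneType'\n"
    "INFO create_gene_panel Retrieved 23 genes for Gene Panel: "
    "Renal_Cancer__Gene_Sequencing_Panel.json\n"
)
print(failed_test_ids(sample))  # ['569831']
```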


## Results

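Each gene panel successfully retrieved from GTR is saved as a `.json` file in the `data/gene_panels/` directory (`../data/gene_panels` relative to the notebook). Files are named after the panel, for example `Renal_Cancer__Gene_Sequencing_Panel.json`. In the run above, test ID `569831` could not be parsed, so only two of the three panels were retrieved.

Each file contains the list of genes associated with a single GTR test, organized for downstream analysis or integration into larger bioinformatics workflows.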
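
For downstream use, the saved files can be read back with the standard library. The sketch below assumes only that each file is valid JSON; the internal schema is not documented in this notebook, so the code reports the top-level type rather than assuming particular keys. The helper name `load_panels` is hypothetical.

```python
import json
from pathlib import Path


def load_panels(output_dir):
    """Map each JSON file's stem (file name without extension) to its parsed content."""
    panels = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            panels[path.stem] = json.load(handle)
    return panels


# Same relative path as OUTPUT_DIR in the cell above.
panels = load_panels("../data/gene_panels")
for name, content in panels.items():
    # The schema is an unknown here, so only the top-level type is shown.
    print(name, type(content).__name__)
```

If the directory does not exist or holds no `.json` files, `load_panels` returns an empty dictionary.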
