# 🧬 Example Report: Genes to SNPs
This notebook demonstrates how to run the Gene → SNP report using Biofilter3R.
It shows how to:
- Check database connection and schema version
- List available reports
- View report documentation
- Run the report using hardcoded gene lists
- Run the report from a file-based list

### 📦 Step 1: Import Biofilter and connect to the database

In [7]:
from biofilter import Biofilter

In [8]:
# Replace with the appropriate connection string for your environment
db_uri = "postgresql+psycopg2://bioadmin:bioadmin@localhost/biofilter"

# Instance of Biofilter
bf = Biofilter(db_uri)
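
Hardcoded credentials in a notebook are easy to leak when the file is shared. As a minimal sketch, assuming the connection details are kept in environment variables (the `BIOFILTER_DB_*` names are hypothetical and not part of Biofilter), the same kind of URI can be assembled with the password percent-encoded:

```python
import os
from urllib.parse import quote


def build_db_uri(env=None):
    """Assemble a PostgreSQL URI from environment-style settings.

    The BIOFILTER_DB_* names are hypothetical; the defaults mirror the
    example connection string used in this notebook.
    """
    env = os.environ if env is None else env
    user = env.get("BIOFILTER_DB_USER", "bioadmin")
    password = env.get("BIOFILTER_DB_PASSWORD", "bioadmin")
    host = env.get("BIOFILTER_DB_HOST", "localhost")
    name = env.get("BIOFILTER_DB_NAME", "biofilter")
    # safe="" also escapes "/" and "@", which would otherwise break the URI
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{name}"
    )


# Passing a dict instead of os.environ keeps the example reproducible
db_uri = build_db_uri({"BIOFILTER_DB_PASSWORD": "p@ss/word"})
print(db_uri)
```

The resulting string can be passed to `Biofilter(db_uri)` exactly like the hardcoded one above.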


### 🔍 Step 2: Confirm database schema version (optional)

In [9]:
print("Current Biofilter Schema Version:", bf.metadata.schema_version)

Current Biofilter Schema Version: 3.1.0
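
If a workflow depends on a minimum schema version, the printed value can be compared numerically rather than as text. The sketch below assumes `bf.metadata.schema_version` returns a dotted string such as `"3.1.0"`; the helper names and the thresholds are illustrative only:

```python
def parse_version(text):
    """Turn a dotted version string such as "3.1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def schema_is_at_least(current, required):
    """Compare versions numerically, so "3.10.0" sorts after "3.9.0"."""
    return parse_version(current) >= parse_version(required)


# "3.1.0" is the value printed above; the thresholds are made-up examples
current_version = "3.1.0"
print(schema_is_at_least(current_version, "3.0.0"))
print(schema_is_at_least(current_version, "3.2.0"))
```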


### 📃 Step 3: Explore available reports (optional)

In [10]:
# List all registered reports with descriptions
bf.report.list_reports()


📄 Available Reports:

1. entity_filter: 
   Validates input list of entity names and returns all matching entities, including conflict and status flags.

2. qry_etl_status: 
   Summarizes the ETL status for each data source

3. gene_to_snp: 
   Given a list of genes, returns gene metadata and associated variants (SNPs) with positional and allelic info.

4. qry_template: 
   Describe what this report does here



True

In [11]:
# Show detailed explanation for the Gene-to-SNP report
bf.report.explain("report_gene_to_snp")

🧬 GENE → SNP Report

    This report takes as input a list of genes; accepted formats include:
    - HGNC symbols (e.g., `TP53`)
    - HGNC IDs (e.g., `HGNC:11998`)
    - Entrez IDs (e.g., `7157`)
    - Ensembl IDs (e.g., `ENSG00000141510`)
    - Any other symbols
    - Any other Names or Alias

    It returns:
    - ✅  Gene metadata (ID, symbol, alias type/source, conflict status)
    - 🧬  Associated SNPs (from dbSNP)
    - 📍  Genomic location (chr/start/end/accession)
    - 🧬  Alleles (ref/alt) and quality
    - ⚠️  Notes for duplicates or missing variants


🧪 EXAMPLE USAGE

    > result = bf.report.run_report(
        "report_gene_to_snp",
        assembly='38',
        input_data=["TXLNGY", "HGNC:18473", "246126", "ENSG00000131002", "HGNC:5"]
        )


    To run an example report:
        > result = bf.report.run_example_report("report_gene_to_snp")


    This returns a Pandas DataFrame with columns:
        > print(result)
        Index,Input G
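
Because the report accepts several identifier formats in one list, it can help to check what kind of identifier each entry looks like before submitting it. The patterns below are a rough sketch based only on the examples in the documentation above; they are not how Biofilter resolves aliases, and `guess_identifier_type` is a hypothetical helper:

```python
import re

# Illustrative patterns only; Biofilter's own alias resolution may differ
PATTERNS = [
    ("HGNC ID", re.compile(r"^HGNC:\d+$")),
    ("Ensembl ID", re.compile(r"^ENSG\d{11}$")),
    ("Entrez ID", re.compile(r"^\d+$")),
]


def guess_identifier_type(value):
    """Return a best-guess label for a gene identifier string."""
    value = value.strip()
    for label, pattern in PATTERNS:
        if pattern.match(value):
            return label
    # Anything else is treated as a symbol, name, or alias
    return "symbol or alias"


for gene in ["TP53", "HGNC:11998", "7157", "ENSG00000141510"]:
    print(gene, "->", guess_identifier_type(gene))
```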

### ▶️ Step 4: Run the example report (predefined inputs inside the report class)

In [12]:
result = bf.report.run_example_report("report_gene_to_snp")

# View result
result.head(10)

Unnamed: 0,Input Gene,HGNC Symbol,Matched Name,Alias Type,Alias Source,Gene ID,Variant ID,Variant Type,Chr,Start,End,Ref Allele,Alt Allele,Accession,Assembly,Quality,Note
0,246126,TXLNGY,246126,code,ENTREZ,38610,rs3900,SNV,Y,19568371.0,19568371.0,"[""G""]","[""C""]",NC_000024.10,GRCh38.p14,8.9,
1,246126,TXLNGY,246126,code,ENTREZ,38610,rs3902,SNV,Y,19568761.0,19568761.0,"[""A""]","[""G""]",NC_000024.10,GRCh38.p14,8.3,
2,246126,TXLNGY,246126,code,ENTREZ,38610,rs3908,DELINS,Y,19571282.0,19571282.0,"[""GGGG""]","[""GGG""]",NC_000024.10,GRCh38.p14,8.1,
3,246126,TXLNGY,246126,code,ENTREZ,38610,rs3909,DELINS,Y,19571277.0,19571278.0,"[""A""]","[""AA""]",NC_000024.10,GRCh38.p14,7.5,
4,246126,TXLNGY,246126,code,ENTREZ,38610,rs3910,SNV,Y,19571345.0,19571345.0,"[""T""]","[""A""]",NC_000024.10,GRCh38.p14,7.5,
5,246126,TXLNGY,246126,code,ENTREZ,38610,rs3911,SNV,Y,19571568.0,19571568.0,"[""A""]","[""G""]",NC_000024.10,GRCh38.p14,7.8,
6,246126,TXLNGY,246126,code,ENTREZ,38610,rs3912,SNV,Y,19570453.0,19570453.0,"[""A""]","[""T""]",NC_000024.10,GRCh38.p14,7.5,
7,246126,TXLNGY,246126,code,ENTREZ,38610,rs1058414,SNV,Y,19590151.0,19590151.0,"[""G""]","[""A""]",NC_000024.10,GRCh38.p14,7.5,
8,246126,TXLNGY,246126,code,ENTREZ,38610,rs1179186,SNV,Y,19583386.0,19583386.0,"[""C""]","[""A""]",NC_000024.10,GRCh38.p14,6.5,
9,246126,TXLNGY,246126,code,ENTREZ,38610,rs1179187,SNV,Y,19582208.0,19582208.0,"[""C""]","[""A"", ""G""]",NC_000024.10,GRCh38.p14,7.5,
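
The `Ref Allele` and `Alt Allele` cells in this output look like JSON-encoded lists. Assuming that is how they are stored (the export above suggests it but does not confirm it), they can be decoded into Python lists, for example to find multi-allelic variants. The sketch uses three rows copied from the output, reduced to four columns:

```python
import csv
import io
import json

# Three rows copied from the output above, reduced to a few columns
sample = '''Variant ID,Variant Type,Ref Allele,Alt Allele
rs3900,SNV,"[""G""]","[""C""]"
rs3908,DELINS,"[""GGGG""]","[""GGG""]"
rs1179187,SNV,"[""C""]","[""A"", ""G""]"
'''

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # Assumption: allele cells hold JSON-encoded lists of strings
    row["Ref Allele"] = json.loads(row["Ref Allele"])
    row["Alt Allele"] = json.loads(row["Alt Allele"])

# Variants with more than one alternate allele
multiallelic = [row["Variant ID"] for row in rows if len(row["Alt Allele"]) > 1]
print(multiallelic)
```

On the DataFrame itself, something like `result["Alt Allele"].apply(json.loads)` would do the same, provided rows without variants (empty allele cells) are filtered out first.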


### 🧪 Step 5: Run the report with a custom gene list (inline)

In [13]:
custom_gene_list = [
    "TXLNGY",             # Symbol
    "HGNC:18473",         # HGNC ID
    "246126",             # Entrez ID
    "ENSG00000131002",    # Ensembl ID
    "HGNC:5"              # Another HGNC ID
]

result2 = bf.report.run_report(
    "report_gene_to_snp",
    assembly='GRC38',
    input_data=custom_gene_list,
)

# Preview the result
result2

Unnamed: 0,Input Gene,HGNC Symbol,Matched Name,Alias Type,Alias Source,Gene ID,Variant ID,Variant Type,Chr,Start,End,Ref Allele,Alt Allele,Accession,Assembly,Quality,Note
0,246126,TXLNGY,246126,code,ENTREZ,38610,rs3900,SNV,Y,19568371.0,19568371.0,"[""G""]","[""C""]",NC_000024.10,GRCh38.p14,8.9,
1,246126,TXLNGY,246126,code,ENTREZ,38610,rs3902,SNV,Y,19568761.0,19568761.0,"[""A""]","[""G""]",NC_000024.10,GRCh38.p14,8.3,
2,246126,TXLNGY,246126,code,ENTREZ,38610,rs3908,DELINS,Y,19571282.0,19571282.0,"[""GGGG""]","[""GGG""]",NC_000024.10,GRCh38.p14,8.1,
3,246126,TXLNGY,246126,code,ENTREZ,38610,rs3909,DELINS,Y,19571277.0,19571278.0,"[""A""]","[""AA""]",NC_000024.10,GRCh38.p14,7.5,
4,246126,TXLNGY,246126,code,ENTREZ,38610,rs3910,SNV,Y,19571345.0,19571345.0,"[""T""]","[""A""]",NC_000024.10,GRCh38.p14,7.5,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3229,246126,TXLNGY,246126,code,ENTREZ,38610,rs2045149522,DELINS,Y,19607647.0,19607647.0,"[""ATAT""]","[""AT""]",NC_000024.10,GRCh38.p14,7.5,
3230,HGNC:5,A1BG,HGNC:5,code,HGNC,1,,,,,,,,,,,No variants found in system
3231,ENSG00000131002,TXLNGY,ENSG00000131002,code,ENSEMBL,38610,,,,,,,,,,,Duplicate entity_id: mapped to same gene as an...
3232,HGNC:18473,TXLNGY,HGNC:18473,code,HGNC,38610,,,,,,,,,,,Duplicate entity_id: mapped to same gene as an...
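
The last rows show why some inputs carry a duplicate note: several identifiers resolve to the same Gene ID. A small sketch that groups inputs by resolved gene makes this explicit. The tuples are copied from the rows visible in the output above; the grouping logic is illustrative and is not Biofilter's own duplicate detection:

```python
from collections import defaultdict

# (Input Gene, HGNC Symbol, Gene ID) taken from rows visible in the output above
resolved = [
    ("246126", "TXLNGY", 38610),
    ("HGNC:5", "A1BG", 1),
    ("ENSG00000131002", "TXLNGY", 38610),
    ("HGNC:18473", "TXLNGY", 38610),
]

# Collect every input string that resolved to the same gene
inputs_by_gene = defaultdict(list)
for input_gene, symbol, gene_id in resolved:
    inputs_by_gene[(gene_id, symbol)].append(input_gene)

for (gene_id, symbol), inputs in inputs_by_gene.items():
    if len(inputs) > 1:
        print(f"{symbol} (Gene ID {gene_id}) was requested as: {', '.join(inputs)}")
```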


### 📂 Step 6: Run the report using a gene list from a local file

In [14]:
# File must contain one gene per line (symbol, ID, or alias)
result3 = bf.report.run_report(
    "report_gene_to_snp",
    input_data="/home/bioadmin/biofilter/notebooks/example_reports/gene_list.txt"
)

# Number of rows returned
print(len(result3))

3232
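
For reference, an input file in the one-gene-per-line layout described in the cell comment can be produced as follows. This is a sketch under assumptions: the file is plain UTF-8 text, and the temporary directory only keeps the example self-contained. Whether Biofilter tolerates blank lines or comments in the file is not stated here.

```python
import tempfile
from pathlib import Path

genes = ["TXLNGY", "HGNC:18473", "246126", "ENSG00000131002", "HGNC:5"]

# A temporary directory keeps the sketch self-contained; use your own path in practice
gene_file = Path(tempfile.mkdtemp()) / "gene_list.txt"
gene_file.write_text("\n".join(genes) + "\n", encoding="utf-8")

# Read the file back, skipping blank lines, to confirm the layout
loaded = [
    line.strip()
    for line in gene_file.read_text(encoding="utf-8").splitlines()
    if line.strip()
]
print(loaded == genes)
```

The path in `gene_file` could then be passed as `input_data`, in the same way as the path used in the cell above.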
