<h1>üìò Biofilter Schema Explorer ‚Äî Quick Reference Guide</h1>

The Schema class, exposed as bf.schema on a Biofilter instance, is an interactive, notebook-friendly interface for exploring all database models, tables, columns, primary keys, and ORM relationships inside Biofilter.

Methods:
* schema()
* schema.tables()
* schema.overview()
* schema.search()
* schema.describe()

<h4>Models vs. Tables</h4>

In Biofilter3R, a model is a Python ORM class that represents a biological concept (such as a gene, variant, or disease) and defines its columns, relationships, and behavior inside the system. A table, on the other hand, is the physical structure stored in the database where the actual data lives. Every model maps to one table, but models provide additional structure (relationships, typing, and a user-friendly API), while tables provide efficient storage and indexing. In practice, you query models when working in Python (e.g., select(GeneMaster)), and you reference tables when executing raw SQL (e.g., SELECT * FROM gene_masters). The distinction allows Biofilter3R to offer both a high-level, intuitive interface for exploration and a low-level interface for performance and debugging.
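To make the model/table distinction concrete, here is a minimal sketch using plain SQLAlchemy and an in-memory SQLite database. The GeneMaster class below is a hypothetical, trimmed-down stand-in (three columns only), not the real Biofilter3R model, and the SQLite engine is an assumption made so the example runs without a Biofilter database.

```python
from sqlalchemy import Column, Integer, String, create_engine, select, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class GeneMaster(Base):
    """Hypothetical stand-in for the real model: the ORM class (model)."""

    __tablename__ = "gene_masters"  # the physical table the model maps to

    id = Column(Integer, primary_key=True)
    symbol = Column(String(64))
    chromosome = Column(String(5))


# Assumption: an in-memory SQLite database instead of the real backend.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(GeneMaster(symbol="BRCA1", chromosome="17"))
    session.commit()

    # Model-level access: query the ORM class and get Python objects back.
    gene = session.execute(select(GeneMaster)).scalars().first()
    orm_result = (gene.symbol, gene.chromosome)

    # Table-level access: raw SQL against the physical table name.
    row = session.execute(
        text("SELECT symbol, chromosome FROM gene_masters")
    ).first()
    sql_result = (row[0], row[1])

print(orm_result, sql_result)
```

Both paths read the same stored row; the ORM path returns a GeneMaster object with attributes, while the raw SQL path returns a plain row.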

<h3>Start Biofilter3R</h3>

In [1]:
from biofilter import Biofilter

In [2]:
# Instance of Biofilter
bf = Biofilter()

[INFO] ════════════════════════════════════
[INFO] 🚀 Initializing Biofilter3R
[INFO]    • Version: 3.2.0
[INFO]    • Debug mode: False
[INFO]    • Config: /home/bioadmin/biofilter/.biofilter.toml
[INFO] ════════════════════════════════════
[INFO] 🔌 Database connection established
[INFO]    • Engine: postgresql+psycopg2
[INFO]    • Host:   localhost
[INFO]    • DB:     biofilter
[INFO]    • Time:   1.0 ms
[INFO] ════════════════════════════════════


<h3>1. List all models</h3>

Returns a list of all SQLAlchemy models registered inside Biofilter3R.
This serves as a quick index of the entire schema.

In [3]:
bf.schema()

['SystemConfig',
 'BiofilterMetadata',
 'GenomeAssembly',
 'Entity',
 'EntityAlias',
 'EntityGroup',
 'EntityRelationshipType',
 'EntityRelationship',
 'EntityLocation',
 'CurationConflict',
 'ConflictStatus',
 'ConflictResolution',
 'OmicStatus',
 'ETLSourceSystem',
 'ETLDataSource',
 'ETLPackage',
 'GeneMaster',
 'GeneGroup',
 'GeneLocusGroup',
 'GeneLocusType',
 'GeneGroupMembership',
 'VariantSNP',
 'VariantSNPMerge',
 'VariantGWAS',
 'VariantGWASSNP',
 'ProteinMaster',
 'ProteinPfam',
 'ProteinPfamLink',
 'ProteinEntity',
 'PathwayMaster',
 'GOMaster',
 'GORelation',
 'DiseaseGroup',
 'DiseaseGroupMembership',
 'DiseaseMaster',
 'ChemicalMaster']

<h3>2. Filter models by keyword</h3>

Returns only the models whose name or group contains "Gene"
(e.g., GeneMaster, GeneGroup, GeneLocusGroup, etc.).
Useful for quickly locating all models related to a biological category.

In [4]:
bf.schema("Gene") # or bf.schema(keyword="Gene")

['GeneMaster',
 'GeneGroup',
 'GeneLocusGroup',
 'GeneLocusType',
 'GeneGroupMembership']
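
The keyword filter described above can be pictured as a substring match over model names and group labels. The sketch below is only an illustration of that idea: the MODELS registry, the filter_models helper, and the case-insensitive matching are all assumptions, not Biofilter3R internals, and the group label for GeneGroup is assumed rather than taken from the output shown in this guide.

```python
# Hypothetical (group, model) registry built from names that appear in this guide.
MODELS = [
    ("System", "SystemConfig"),
    ("System", "GenomeAssembly"),
    ("Entity", "Entity"),
    ("Gene", "GeneMaster"),
    ("Gene", "GeneGroup"),  # group label assumed for illustration
    ("Variant", "VariantSNP"),
]


def filter_models(keyword=None):
    """Return model names whose name or group contains the keyword.

    Assumption: matching is case-insensitive; the real method may differ.
    """
    if keyword is None:
        return [model for _, model in MODELS]
    needle = keyword.lower()
    return [
        model
        for group, model in MODELS
        if needle in model.lower() or needle in group.lower()
    ]


gene_models = filter_models("Gene")
print(gene_models)
```

Note that GenomeAssembly is not returned for "Gene": the match is on the literal substring, and "Genome" does not contain it.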

<h3>3. List all SQL tables</h3>
Returns a list of (model_name, table_name) pairs, allowing you to map ORM classes to actual database tables.

In [8]:
bf.schema.tables()

[('SystemConfig', 'system_config'),
 ('BiofilterMetadata', 'biofilter_metadata'),
 ('GenomeAssembly', 'genome_assemblies'),
 ('Entity', 'entities'),
 ('EntityAlias', 'entity_aliases'),
 ('EntityGroup', 'entity_groups'),
 ('EntityRelationshipType', 'entity_relationship_types'),
 ('EntityRelationship', 'entity_relationships'),
 ('EntityLocation', 'entity_locations'),
 ('CurationConflict', 'curation_conflicts'),
 ('OmicStatus', 'omic_status'),
 ('ETLSourceSystem', 'etl_source_systems'),
 ('ETLDataSource', 'etl_data_sources'),
 ('ETLPackage', 'etl_packages'),
 ('GeneMaster', 'gene_masters'),
 ('GeneGroup', 'gene_groups'),
 ('GeneLocusGroup', 'gene_locus_groups'),
 ('GeneLocusType', 'gene_locus_types'),
 ('GeneGroupMembership', 'gene_group_memberships'),
 ('VariantSNP', 'variant_snps'),
 ('VariantSNPMerge', 'variant_snp_merges'),
 ('VariantGWAS', 'variant_gwas'),
 ('VariantGWASSNP', 'variant_gwas_snp'),
 ('ProteinMaster', 'protein_masters'),
 ('ProteinPfam', 'protein_pfams'),
 ('ProteinPfam
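
Because the result is a list of (model_name, table_name) pairs, it converts directly into lookup dictionaries. The sketch below uses a few pairs copied from the output above as literal data, so it runs without a Biofilter connection; the variable names are illustrative.

```python
# A few (model_name, table_name) pairs copied from the output above.
pairs = [
    ("SystemConfig", "system_config"),
    ("EntityAlias", "entity_aliases"),
    ("GeneMaster", "gene_masters"),
    ("VariantSNP", "variant_snps"),
]

# Model name -> table name, e.g. to build a raw SQL statement.
table_for = dict(pairs)

# Table name -> model name, e.g. to find the ORM class behind a table.
model_for = {table: model for model, table in pairs}

print(table_for["GeneMaster"])
print(model_for["variant_snps"])
```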

<h3>4. Full schema overview (DataFrame)</h3>
Returns a Pandas DataFrame summarizing all Biofilter models, including:

* Group (Gene, Variant, Entity, Disease, Protein, ...)
* Model name
* Table name
* Column names
* Primary key columns
* Relationship names (SQLAlchemy ORM relationships)

This is essentially a fully dynamic "data dictionary" for the Biofilter database.

In [9]:
bf.schema.overview()

Unnamed: 0,group,model,table,columns,primary_keys,relationships
0,System,SystemConfig,system_config,"id, key, value, type, description, editable, c...",id,
1,System,BiofilterMetadata,biofilter_metadata,"id, schema_version, etl_version, build_hash, d...",id,
2,System,GenomeAssembly,genome_assemblies,"id, accession, assembly_name, chromosome, crea...",id,
3,Entity,Entity,entities,"id, group_id, has_conflict, is_active, data_so...",id,"entity_group, data_source, etl_package, entity..."
4,Entity,EntityAlias,entity_aliases,"id, entity_id, group_id, alias_value, alias_ty...",id,"entity, entity_group, data_source, etl_package"
5,Entity,EntityGroup,entity_groups,"id, name, description",id,entities
6,Entity,EntityRelationshipType,entity_relationship_types,"id, code, description",id,relationships
7,Entity,EntityRelationship,entity_relationships,"id, entity_1_id, entity_1_group_id, entity_2_i...",id,"entity_1, entity_1_group, entity_2, entity_2_g..."
8,Entity,EntityLocation,entity_locations,"id, entity_id, entity_group_id, assembly_id, b...",id,"entity, entity_group, assembly, data_source, e..."
9,Curation,CurationConflict,curation_conflicts,"id, data_source_id, entity_type, entity_id, id...",id,data_source


<h3>5. Search schema contents</h3>

Searches through:
* model names
* table names
* column names
* relationship names

and returns only the models that match the keyword.

Example use cases:
* "variant" → all models/tables related to variants
* "entity" → entity models

In [12]:
# bf.schema.search("variant")
bf.schema.search("chrom")

Unnamed: 0,group,model,table,columns,primary_keys,relationships
0,System,GenomeAssembly,genome_assemblies,"id, accession, assembly_name, chromosome, crea...",id,
1,Entity,EntityLocation,entity_locations,"id, entity_id, entity_group_id, assembly_id, b...",id,"entity, entity_group, assembly, data_source, e..."
2,Gene,GeneMaster,gene_masters,"id, symbol, hgnc_status, omic_status_id, entit...",id,"omic_status, entity, data_source, etl_package,..."
3,Variant,VariantSNP,variant_snps,"rs_id, chromosome, position_37, position_38, r...",rs_id,"data_source, etl_package"


<h3>6. Describe a model</h3>

Pretty-prints structured information about a specific model:

* Table name
* Group
* All columns + types
* Primary keys
* ORM relationships

In [11]:
bf.schema.describe("GeneMaster")


📘 Model: GeneMaster
🔹 Table: gene_masters
🔸 Group: Gene

Columns:
  • id: INTEGER (PK)
  • symbol: VARCHAR(64)
  • hgnc_status: VARCHAR(50)
  • omic_status_id: INTEGER
  • entity_id: BIGINT
  • chromosome: VARCHAR(5)
  • data_source_id: INTEGER
  • etl_package_id: INTEGER
  • locus_group_id: INTEGER
  • locus_type_id: INTEGER

Relationships:
  → omic_status (MANYTOONE)
  → entity (MANYTOONE)
  → data_source (MANYTOONE)
  → etl_package (MANYTOONE)
  → gene_locus_group (MANYTOONE)
  → gene_locus_type (MANYTOONE)
  → group_memberships (ONETOMANY)
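
A report like the one above can be produced with SQLAlchemy's runtime inspection API. The sketch below is one possible approach, not the actual describe() implementation: the two model classes are hypothetical, trimmed-down stand-ins, and the describe helper is illustrative.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, inspect
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class OmicStatus(Base):
    """Hypothetical stand-in for the related model."""

    __tablename__ = "omic_status"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


class GeneMaster(Base):
    """Hypothetical stand-in with one foreign key and one relationship."""

    __tablename__ = "gene_masters"
    id = Column(Integer, primary_key=True)
    symbol = Column(String(64))
    omic_status_id = Column(Integer, ForeignKey("omic_status.id"))
    omic_status = relationship("OmicStatus")


def describe(model):
    """Build a plain-text summary of a mapped class from its mapper."""
    mapper = inspect(model)  # returns the Mapper for a mapped class
    lines = [
        f"Model: {model.__name__}",
        f"Table: {mapper.local_table.name}",
        "Columns:",
    ]
    for col in mapper.columns:
        pk = " (PK)" if col.primary_key else ""
        lines.append(f"  {col.name}: {col.type}{pk}")
    lines.append("Relationships:")
    for rel in mapper.relationships:
        lines.append(f"  {rel.key} ({rel.direction.name})")
    return "\n".join(lines)


report = describe(GeneMaster)
print(report)
```

The relationship direction names (MANYTOONE, ONETOMANY) come straight from SQLAlchemy, which is consistent with the labels in the output above.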

