# BV-BRC Genome Conversion

This notebook demonstrates how to use BVBRCUtils to fetch and convert genome data from the BV-BRC (formerly PATRIC) database.

## Overview

BVBRCUtils provides tools for:
- Fetching genome data from the BV-BRC API
- Loading genomes from local BV-BRC files
- Converting BV-BRC data to KBase Genome format
- Creating synthetic genomes from multiple sources
- Aggregating taxonomies across genome sets

## 1. Fetch Genome Metadata from API

Let's fetch metadata for a specific genome (E. coli K-12 MG1655):

In [3]:
%run util.py
# Fetch metadata for E. coli K-12 MG1655
genome_id = '511145.183'

metadata = util.fetch_genome_metadata(genome_id)

print("Genome Metadata:")
print("=" * 60)
print(f"Genome ID: {metadata.get('genome_id')}")
print(f"Genome Name: {metadata.get('genome_name')}")
print(f"GC Content: {metadata.get('gc_content')}%")
print(f"Genetic Code: {metadata.get('genetic_code')}")
print(f"Completion Date: {metadata.get('completion_date')}")
print(f"\nTaxonomy: {'; '.join(metadata.get('taxon_lineage_names', []))}")

2025-12-06 23:38:49,369 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-06 23:38:49,370 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-06 23:38:49,370 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2025-12-06 23:38:49,371 - __main__.NotebookUtil - INFO - Notebook environment detected
2025-12-06 23:38:49,372 - __main__.NotebookUtil - INFO - Fetching genome metadata for 511145.183


/Users/chenry/Dropbox/Projects/KBUtilLib/src
Genome Metadata:
Genome ID: 511145.183
Genome Name: Escherichia coli str. K-12 substr. MG1655
GC Content: 50.79%
Genetic Code: None
Completion Date: 2013-10-25T00:00:00Z

Taxonomy: cellular organisms; Bacteria; Pseudomonadati; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Escherichia; Escherichia coli; Escherichia coli K-12; Escherichia coli str. K-12 substr. MG1655
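
The output above shows `Genetic Code: None`, which means some metadata fields can be absent from the returned record. The following is a minimal sketch of one way to format such a record defensively. The helper name `describe_genome` and the `"not reported"` placeholder are illustrative assumptions and not part of BVBRCUtils; the sample values are copied from the output above, with the lineage shortened.

```python
def describe_genome(metadata: dict) -> dict:
    """Return display-ready fields, replacing missing values with a placeholder.

    Hypothetical helper for illustration only; not a BVBRCUtils function.
    """
    placeholder = "not reported"
    gc = metadata.get("gc_content")
    lineage = metadata.get("taxon_lineage_names") or []
    return {
        "genome_id": metadata.get("genome_id") or placeholder,
        "genome_name": metadata.get("genome_name") or placeholder,
        # Only append the percent sign when a value is actually present
        "gc_content": f"{gc}%" if gc is not None else placeholder,
        "genetic_code": metadata.get("genetic_code") or placeholder,
        "taxonomy": "; ".join(lineage) if lineage else placeholder,
    }


# Sample record: values taken from the output above, lineage shortened
sample_metadata = {
    "genome_id": "511145.183",
    "genome_name": "Escherichia coli str. K-12 substr. MG1655",
    "gc_content": 50.79,
    "genetic_code": None,
    "taxon_lineage_names": ["Bacteria", "Escherichia", "Escherichia coli"],
}

summary = describe_genome(sample_metadata)
for field, value in summary.items():
    print(f"{field}: {value}")
```

With the real record, passing `metadata` from the cell above in place of `sample_metadata` should work the same way, assuming it behaves like a dictionary as the `.get()` calls suggest.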


## 2. Fetch Genome Sequences

Retrieve the contig sequences for this genome:

In [4]:
%run util.py
# Fetch sequences (contigs)
sequences = util.fetch_genome_sequences(genome_id)

print(f"Number of contigs: {len(sequences)}")
print("\nFirst contig:")
if sequences:
    first_contig = sequences[0]
    print(f"  Accession: {first_contig.get('accession')}")
    print(f"  Length: {len(first_contig.get('sequence', ''))} bp")
    print(f"  Sequence (first 100bp): {first_contig.get('sequence', '')[:100]}...")

2025-12-06 23:39:13,111 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-06 23:39:13,112 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-06 23:39:13,113 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2025-12-06 23:39:13,114 - __main__.NotebookUtil - INFO - Notebook environment detected
2025-12-06 23:39:13,115 - __main__.NotebookUtil - INFO - Fetching genome sequences for 511145.183


/Users/chenry/Dropbox/Projects/KBUtilLib/src
Number of contigs: 1

First contig:
  Accession: AYEK01000001
  Length: 4638920 bp
  Sequence (first 100bp): agcttttcattctgactgcaacgggcaatatgtctctgtgtggattaaaaaaagagtgtctgatagcagcttctgaactggttacctgccgtgagtaaat...
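
One quick sanity check on a fetched contig is to compute its GC content and compare it with the `gc_content` value reported in the metadata (50.79% for this genome). The sketch below is a plain-Python illustration; `gc_percent` is a hypothetical helper, not a BVBRCUtils method, and it is run here only on the 100 bp prefix shown in the output above.

```python
def gc_percent(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous bases such as N are ignored. Returns 0.0 for an empty sequence.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# First 100 bp of the contig, copied from the output above
prefix = (
    "agcttttcattctgactgcaacgggcaatatgtctctgtgtggattaaaa"
    "aaagagtgtctgatagcagcttctgaactggttacctgccgtgagtaaat"
)

print(f"GC content of first {len(prefix)} bp: {gc_percent(prefix):.2f}%")
```

To check the whole contig, the same function could be called on `first_contig.get('sequence', '')`. A short prefix is not expected to match the genome-wide value.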


## 5. Build Complete KBase Genome from API

Now let's build a complete KBase Genome object from the BV-BRC API.

**Note**: This will fetch all genome data including features, which may take a few minutes for large genomes.

In [5]:
%run util.py
# Build the complete KBase genome object from the BV-BRC API.
# This runs the full fetch for the genome_id set in section 1.
print("Building KBase genome object...")
print("This will fetch:")
print("  1. Genome metadata")
print("  2. Contig sequences")
print("  3. All features (paginated)")
print("  4. Feature sequences (batched)")
print("\nFor large genomes, this may take several minutes.")
genome = util.build_kbase_genome_from_api(genome_id)

# Save the genome object to a JSON file
util.save(genome, "BVBRC-genome.json")

2025-12-06 23:40:40,805 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-06 23:40:40,806 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-06 23:40:40,807 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2025-12-06 23:40:40,808 - __main__.NotebookUtil - INFO - Notebook environment detected
2025-12-06 23:40:40,808 - __main__.NotebookUtil - INFO - Building KBase genome object for 511145.183
2025-12-06 23:40:40,808 - __main__.NotebookUtil - INFO - Fetching genome metadata for 511145.183


/Users/chenry/Dropbox/Projects/KBUtilLib/src


2025-12-06 23:40:41,019 - __main__.NotebookUtil - INFO - Fetching genome sequences for 511145.183
2025-12-06 23:40:42,378 - __main__.NotebookUtil - INFO - Fetching genome features for 511145.183
2025-12-06 23:40:44,430 - __main__.NotebookUtil - INFO - Total features retrieved: 4764
2025-12-06 23:40:44,444 - __main__.NotebookUtil - INFO - Fetching feature sequences for 9149 unique sequences
2025-12-06 23:41:11,076 - __main__.NotebookUtil - INFO - Processing features...
2025-12-06 23:41:11,085 - __main__.NotebookUtil - INFO - Genome object created: 4518 features, 0 CDS, 4,638,920 bp


Building KBase genome object...
This will fetch:
  1. Genome metadata
  2. Contig sequences
  3. All features (paginated)
  4. Feature sequences (batched)

For large genomes, this may take several minutes.
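
The build step fetches features page by page and feature sequences in batches. The sketch below illustrates that general pattern only; it is not the BVBRCUtils implementation. The names `fetch_all_pages`, `batched`, `fake_fetch_page`, and the `sequence_key` field are hypothetical, and an in-memory list stands in for the remote API.

```python
from typing import Callable, Iterator, List, Sequence


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 1000
) -> List[dict]:
    """Collect records by requesting consecutive pages until a short page arrives."""
    records: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:
            # A page shorter than page_size means there is nothing left to fetch
            break
        offset += page_size
    return records


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in data source (hypothetical): 25 features sharing 7 sequence keys
fake_features = [
    {"feature_id": f"f{i}", "sequence_key": f"key_{i % 7}"} for i in range(25)
]


def fake_fetch_page(offset: int, limit: int) -> List[dict]:
    """Mimic a paginated endpoint by slicing the in-memory list."""
    return fake_features[offset:offset + limit]


features = fetch_all_pages(fake_fetch_page, page_size=10)

# Deduplicate sequence keys before requesting sequences in batches
unique_keys = sorted({f["sequence_key"] for f in features})
batches = list(batched(unique_keys, batch_size=3))

print(f"{len(features)} features, {len(unique_keys)} unique keys, {len(batches)} batches")
```

Deduplicating before batching keeps the number of sequence requests proportional to the number of distinct sequences rather than the number of features.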


## 6. Build Genome with Ontology Events

You can also record functional annotations on the genome as ontology events. This collects annotations from BV-BRC (such as product descriptions, FIGFAM, PGFAM, PLFAM, and GO terms) and adds them as events that can be tracked through the genome's history.

**Note**: Requires a valid workspace name and KBase authentication.

In [None]:
%run util.py

# Example: Build genome with ontology events
# This will collect annotations and add them as events.
# The call is wrapped in a triple-quoted string so that it does not run
# until a valid workspace name is supplied.
'''
genome_with_events = util.build_kbase_genome_from_api(
    genome_id='511145.183',
    add_ontology_events=True,
    workspace_name='your_workspace_name'  # Replace with your workspace
)

# The genome will now include ontology events for:
# - SSO: SEED Subsystem Ontology (product descriptions)
# - RefSeq: RefSeq annotations (product descriptions)
# - FIGFAM: FIG families
# - PGFAM: PATRIC global (cross-genus) families
# - PLFAM: PATRIC local (genus-specific) families
# - GO: Gene Ontology terms

print("Genome with ontology events created")
print(f"Number of events: {len(genome_with_events.get('ontology_events', []))}")
'''

print("Example: Build genome with ontology event tracking")
print()
print("Ontology types collected from BV-BRC:")
print("  - SSO: SEED Subsystem Ontology")
print("  - RefSeq: RefSeq functional annotations")
print("  - FIGFAM: FIG protein families")
print("  - PGFAM: PATRIC global (cross-genus) families")
print("  - PLFAM: PATRIC local (genus-specific) families")
print("  - GO: Gene Ontology terms")
print()
print("Remove the triple quotes around the code above and set a valid workspace to run it")
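
Once a genome carries ontology events, a natural next step is to count how many terms each ontology contributed. The sketch below uses a small mock genome dictionary. The layout assumed here (a top-level `ontology_events` list, and per-feature `ontology_terms` mapping ontology name to term ID to a list of event indices) is an assumption about the KBase Genome structure and should be checked against a real object; the SSO term ID is a placeholder.

```python
from collections import Counter

# Mock genome (assumed layout, for illustration only)
mock_genome = {
    "ontology_events": [
        {"id": "SSO", "method": "example"},
        {"id": "GO", "method": "example"},
    ],
    "features": [
        {
            "id": "gene_1",
            "ontology_terms": {
                "SSO": {"SSO:000001": [0]},  # placeholder term ID
                "GO": {"GO:0008150": [1]},
            },
        },
        {
            "id": "gene_2",
            "ontology_terms": {"GO": {"GO:0003674": [1], "GO:0005575": [1]}},
        },
        {"id": "gene_3"},  # feature without any ontology terms
    ],
}


def count_terms_by_ontology(genome: dict) -> Counter:
    """Count term assignments per ontology across all features."""
    counts: Counter = Counter()
    for feature in genome.get("features", []):
        for ontology, terms in feature.get("ontology_terms", {}).items():
            counts[ontology] += len(terms)
    return counts


term_counts = count_terms_by_ontology(mock_genome)
print(f"Number of events: {len(mock_genome.get('ontology_events', []))}")
for ontology, count in sorted(term_counts.items()):
    print(f"  {ontology}: {count} term assignments")
```

If the real object follows the same layout, `count_terms_by_ontology(genome_with_events)` would give a quick overview of which annotation sources were populated.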

## Summary

This notebook demonstrated:

1. **Initializing BVBRCUtils** - Setting up the API client
2. **Fetching metadata** - Getting genome information from BV-BRC
3. **Fetching sequences** - Retrieving contig sequences
4. **Building genomes** - Creating complete KBase Genome objects
5. **Adding ontology events** - Tracking functional annotations as events
6. **Loading from files** - Working with local BV-BRC data
7. **Aggregating taxonomies** - Finding consensus across genomes
8. **Creating synthetic genomes** - Merging multiple genomes by function
9. **Saving results** - Exporting to JSON format

### Ontology Event Types

BVBRCUtils can collect and add the following ontology annotations:
- **SSO**: SEED Subsystem Ontology (functional roles)
- **RefSeq**: RefSeq functional annotations
- **FIGFAM**: FIG protein families
- **PGFAM**: PATRIC global (cross-genus) protein families
- **PLFAM**: PATRIC local (genus-specific) protein families
- **GO**: Gene Ontology terms

### Next Steps

- Try fetching different genomes from BV-BRC
- Experiment with ontology event tracking
- Create synthetic genomes from real genomic data
- Explore the KBase Genome object structure
- Integrate with other KBUtilLib modules for modeling and analysis