# BV-BRC Genome Conversion

This notebook demonstrates how to use BVBRCUtils to fetch and convert genome data from the BV-BRC (formerly PATRIC) database.

## Overview

BVBRCUtils provides tools for:
- Fetching genome data from the BV-BRC API
- Loading genomes from local BV-BRC files
- Converting BV-BRC data to KBase Genome format
- Creating synthetic genomes from multiple sources
- Aggregating taxonomies across genome sets

## 1. Setup: Add Project to Path

First, we need to add the project source to the Python path:

In [1]:
import sys
from pathlib import Path

# Add the src directory to path (assumes the notebook runs from <project_root>/notebooks)
project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Project root: {project_root}")
print(f"Source path: {src_path}")

Project root: /Users/chenry/Dropbox/Projects/KBUtilLib
Source path: /Users/chenry/Dropbox/Projects/KBUtilLib/src
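The setup cell relies on the working directory being one level below the project root. As a hedged alternative, the sketch below walks upward until it finds a directory containing `src`, so it also works from nested folders. The helper names `find_project_root` and `add_src_to_path` are hypothetical and not part of KBUtilLib.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


def add_src_to_path(start: Path) -> Path:
    """Put <project_root>/src at the front of sys.path, at most once."""
    src_path = find_project_root(start) / "src"
    if str(src_path) not in sys.path:
        sys.path.insert(0, str(src_path))
    return src_path
```

In a notebook this would be called as `add_src_to_path(Path.cwd())`.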


## 2. Initialize BVBRCUtils

In [2]:
from kbutillib import BVBRCUtils, NotebookUtils

# Combine the notebook helpers with the BV-BRC utilities in one class
class NotebookUtil(NotebookUtils, BVBRCUtils):
    def __init__(self, **kwargs):
        super().__init__(
            notebook_folder=project_root / "notebooks",
            name="BVBRCExample",
            **kwargs
        )

# Initialize the NotebookUtil instance
util = NotebookUtil()

2025-12-08 23:48:40,523 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-08 23:48:40,524 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-08 23:48:40,525 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2025-12-08 23:48:40,880 - __main__.NotebookUtil - INFO - Notebook environment detected
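The `NotebookUtil` class above combines two parent classes and forwards keyword arguments through a single `super().__init__` call. The sketch below illustrates how that pattern works in plain Python (cooperative multiple inheritance). All class names here are hypothetical stand-ins; whether KBUtilLib's classes are structured this way internally is an assumption.

```python
class SharedBase:
    """Hypothetical common base that absorbs whatever kwargs remain."""

    def __init__(self, name="unnamed", **kwargs):
        self.name = name
        self.extra = kwargs


class NotebookMixin(SharedBase):
    """Stand-in for a notebook helper class; takes its own argument and passes the rest on."""

    def __init__(self, notebook_folder=None, **kwargs):
        super().__init__(**kwargs)
        self.notebook_folder = notebook_folder


class ClientMixin(SharedBase):
    """Stand-in for an API client class; the URL is a placeholder, not a real endpoint."""

    def __init__(self, base_url="https://example.invalid/api", **kwargs):
        super().__init__(**kwargs)
        self.base_url = base_url


class Combined(NotebookMixin, ClientMixin):
    def __init__(self, **kwargs):
        # One call reaches every parent, in method-resolution order
        super().__init__(notebook_folder="notebooks", name="Example", **kwargs)


util_sketch = Combined(base_url="https://example.invalid/v2")
print([cls.__name__ for cls in Combined.__mro__])
```

Each `__init__` consumes the arguments it knows and passes the rest along, which is why the subclass can accept arbitrary `**kwargs` and still configure both parents.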


## 3. Fetch Genome Metadata from API

Let's fetch metadata for a specific genome (E. coli K-12 MG1655):

In [3]:
genome_id = '511145.183'

metadata = util.fetch_genome_metadata(genome_id)

print("Genome Metadata:")
print("=" * 60)
print(f"Genome ID: {metadata.get('genome_id')}")
print(f"Genome Name: {metadata.get('genome_name')}")
print(f"GC Content: {metadata.get('gc_content')}%")
print(f"Genetic Code: {metadata.get('genetic_code')}")
print(f"Completion Date: {metadata.get('completion_date')}")
print(f"\nTaxonomy: {'; '.join(metadata.get('taxon_lineage_names', []))}")

2025-12-08 23:48:43,515 - __main__.NotebookUtil - INFO - Fetching genome metadata for 511145.183


Genome Metadata:
Genome ID: 511145.183
Genome Name: Escherichia coli str. K-12 substr. MG1655
GC Content: 50.79%
Genetic Code: None
Completion Date: 2013-10-25T00:00:00Z

Taxonomy: cellular organisms; Bacteria; Pseudomonadati; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Escherichia; Escherichia coli; Escherichia coli K-12; Escherichia coli str. K-12 substr. MG1655
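In the output above, `Genetic Code` prints as `None` because `metadata.get('genetic_code')` found no value for that key. A minimal sketch of one way to make missing fields explicit is shown below. The helper `summarize_metadata` is hypothetical, and the sample dictionary is hand-copied from the printed output with the lineage shortened; a real metadata record likely contains more fields.

```python
def summarize_metadata(metadata: dict, fields: list, missing: str = "n/a") -> dict:
    """Return {field: value}, substituting `missing` for absent or None values."""
    summary = {}
    for field in fields:
        value = metadata.get(field)
        summary[field] = missing if value is None else value
    return summary


# Sample values copied from the output above; 'genetic_code' is deliberately absent
sample = {
    "genome_id": "511145.183",
    "genome_name": "Escherichia coli str. K-12 substr. MG1655",
    "gc_content": 50.79,
    "completion_date": "2013-10-25T00:00:00Z",
    "taxon_lineage_names": ["cellular organisms", "Bacteria", "Escherichia coli"],
}

summary = summarize_metadata(
    sample, ["genome_id", "genome_name", "gc_content", "genetic_code"]
)
for field, value in summary.items():
    print(f"{field}: {value}")
```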


## 4. Fetch Genome Sequences

Retrieve the contig sequences for this genome:

In [4]:
sequences = util.fetch_genome_sequences(genome_id)

print(f"Number of contigs: {len(sequences)}")
print("\nFirst contig:")
if sequences:
    first_contig = sequences[0]
    print(f"  Accession: {first_contig.get('accession')}")
    print(f"  Length: {len(first_contig.get('sequence', ''))} bp")
    print(f"  Sequence (first 100bp): {first_contig.get('sequence', '')[:100]}...")

2025-12-08 14:25:02,322 - kbutillib.bvbrc_utils.BVBRCUtils - INFO - Fetching genome sequences for 511145.183


Number of contigs: 1

First contig:
  Accession: AYEK01000001
  Length: 4638920 bp
  Sequence (first 100bp): agcttttcattctgactgcaacgggcaatatgtctctgtgtggattaaaaaaagagtgtctgatagcagcttctgaactggttacctgccgtgagtaaat...
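Once the contigs are in memory, simple statistics can be computed directly from the `sequence` strings. The sketch below computes length and GC percentage per contig, assuming each record is a dictionary with `accession` and `sequence` keys as in the cell above. The helpers `gc_content` and `contig_stats` are illustrative and not part of BVBRCUtils.

```python
from collections import Counter


def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    counts = Counter(sequence.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    return 100.0 * gc / total if total else 0.0


def contig_stats(contigs: list) -> list:
    """Return (accession, length, GC%) for each contig record."""
    stats = []
    for contig in contigs:
        seq = contig.get("sequence", "")
        stats.append((contig.get("accession"), len(seq), gc_content(seq)))
    return stats


# Tiny made-up record, only to show the call shape
print(contig_stats([{"accession": "demo", "sequence": "atgc"}]))
```

With the real data this would be called as `contig_stats(sequences)`. Ambiguous bases such as `N` count toward the length but are excluded from the GC denominator in this sketch.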


## 5. Build Complete KBase Genome from API

Now let's build a complete KBase Genome object from the BV-BRC API.

**Note**: This will fetch all genome data including features, which may take a few minutes for large genomes.
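The cell below describes the feature fetch as paginated and the feature-sequence fetch as batched. The sketch here shows what those two patterns generally look like. It does not reproduce BVBRCUtils internals: `paginate`, `batched`, and `fake_fetch` are hypothetical, and an in-memory list stands in for the remote API.

```python
from typing import Callable, Iterator, Sequence


def paginate(fetch_page: Callable[[int, int], list], page_size: int = 1000) -> Iterator:
    """Yield records page by page until a page comes back short."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Split a sequence of IDs into consecutive chunks of at most batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in for a remote endpoint: 2500 made-up feature records
records = [{"feature_id": i} for i in range(2500)]


def fake_fetch(offset: int, limit: int) -> list:
    return records[offset:offset + limit]


features = list(paginate(fake_fetch, page_size=1000))
batches = list(batched([f["feature_id"] for f in features], 400))
print(len(features), len(batches))
```

Paging keeps each response small, and batching the follow-up sequence lookups avoids one request per feature. Both matter when a genome has thousands of features.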

In [4]:
print("Building KBase genome object...")
print("This will fetch:")
print("  1. Genome metadata")
print("  2. Contig sequences")
print("  3. All features (paginated)")
print("  4. Feature sequences (batched)")
print("\nFor large genomes, this may take several minutes.")

# Run the full fetch, then save the resulting genome object
genome = util.build_kbase_genome_from_api(genome_id)
util.save("BRBRC-genome", genome)

2025-12-08 23:48:48,396 - __main__.NotebookUtil - INFO - Building KBase genome object for 511145.183
2025-12-08 23:48:48,397 - __main__.NotebookUtil - INFO - Fetching genome metadata for 511145.183
2025-12-08 23:48:48,553 - __main__.NotebookUtil - INFO - Fetching genome sequences for 511145.183
2025-12-08 23:48:49,629 - __main__.NotebookUtil - INFO - Fetching genome features for 511145.183
2025-12-08 23:48:50,957 - __main__.NotebookUtil - INFO - Total features retrieved: 4764
2025-12-08 23:48:50,966 - __main__.NotebookUtil - INFO - Fetching feature sequences for 9149 unique sequences
2025-12-08 23:49:28,171 - __main__.NotebookUtil - INFO - Processing features...
2025-12-08 23:49:28,216 - __main__.NotebookUtil - INFO - Genome object created: 4518 features, 4518 CDS, 4,638,920 bp
2025-12-08 23:49:28,216 - __main__.NotebookUtil - INFO - Creating ontology events from collected annotations...


Building KBase genome object...
This will fetch:
  1. Genome metadata
  2. Contig sequences
  3. All features (paginated)
  4. Feature sequences (batched)

For large genomes, this may take several minutes.


## Summary

This notebook demonstrated:

1. **Initializing BVBRCUtils** - Setting up the API client
2. **Fetching metadata** - Getting genome information from BV-BRC
3. **Fetching sequences** - Retrieving contig sequences
4. **Building genomes** - Creating complete KBase Genome objects
5. **Saving results** - Storing the genome object with `util.save`

BVBRCUtils also supports loading genomes from local BV-BRC files, aggregating taxonomies across genome sets, and creating synthetic genomes from multiple sources. These capabilities are listed in the Overview but are not exercised in this notebook.

### Next Steps

- Try fetching different genomes from BV-BRC
- Create synthetic genomes from real genomic data
- Explore the KBase Genome object structure
- Integrate with other KBUtilLib modules for modeling and analysis