# Assembly Upload and Download

This notebook demonstrates how to upload and download KBase Assembly and AssemblySet objects using KBReadsUtils.

## Overview

KBReadsUtils provides tools for:
- Uploading FASTA files as Assembly objects to KBase
- Creating AssemblySet collections from multiple assemblies
- Downloading Assembly and AssemblySet objects
- Working with local FASTA files and JSON metadata

## 1. Setup: Add Project to Path

In [None]:
import sys
import os
from pathlib import Path

# Add the src directory to path
project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Project root: {project_root}")
print(f"Source path: {src_path}")

## 2. Initialize KBReadsUtils

Create an instance with your KBase authentication token:

In [None]:
from kbutillib import KBReadsUtils, SharedEnvUtils

# Load token from environment
env_util = SharedEnvUtils()
token = env_util.get_token('kbase')

util = None  # remains None when no token is available
if not token:
    print("Warning: No KBase token found!")
    print("Set your token using: env_util.set_token('your_token', 'kbase')")
    print("Or configure it in ~/.kbutillib/config.yaml")
else:
    # Initialize KBReadsUtils for the target workspace
    util = KBReadsUtils(token=token, workspace="your_workspace_name")
    print("KBReadsUtils initialized successfully!")

## 3. Working with Assembly Objects

Create and manipulate Assembly objects locally:

In [None]:
from kbutillib import Assembly

# Create an Assembly object
assembly = Assembly(
    name="Example_Assembly",
    fasta_file="/path/to/genome.fasta",
    metadata={
        "num_contigs": 150,
        "dna_size": 5000000,
        "gc_content": 52.3,
        "type": "Isolate"
    }
)

print("Assembly Object Created:")
print("=" * 60)
print(f"Name: {assembly.name}")
print(f"FASTA file: {assembly.fasta_file}")
print(f"Contigs: {assembly.num_contigs}")
print(f"Size: {assembly.dna_size:,} bp")
print(f"GC%: {assembly.gc_content}")

# Save to JSON
assembly.to_json("assembly_metadata.json")
print("\nSaved to assembly_metadata.json")
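
The metadata values above (`num_contigs`, `dna_size`, `gc_content`) are typed in by hand. As a rough illustration of where such numbers come from, the sketch below derives them from a FASTA file using only the standard library. The helper name `fasta_stats` and the demo file are hypothetical and are not part of kbutillib; this is not a description of how KBReadsUtils computes its statistics.

```python
from pathlib import Path
import tempfile


def fasta_stats(fasta_path):
    """Return contig count, total length, and GC percentage for a FASTA file.

    Hypothetical helper for illustration only; not a kbutillib function.
    """
    num_contigs = 0
    dna_size = 0
    gc_count = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                num_contigs += 1  # each header line starts a new contig
            else:
                seq = line.upper()
                dna_size += len(seq)
                gc_count += seq.count("G") + seq.count("C")
    gc_content = 100.0 * gc_count / dna_size if dna_size else 0.0
    return {
        "num_contigs": num_contigs,
        "dna_size": dna_size,
        "gc_content": round(gc_content, 2),
    }


# Tiny demonstration file so the sketch runs without real data
demo_path = Path(tempfile.mkdtemp()) / "demo.fasta"
demo_path.write_text(">contig1\nATGC\nGGCC\n>contig2\nAATT\n")

stats = fasta_stats(demo_path)
print(stats)
```

A dictionary shaped like `stats` could then be passed as the `metadata` argument shown earlier, assuming the keys match what `Assembly` expects.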

## 4. Upload Assemblies to KBase

**Note**: Uploading requires a valid KBase token and write access to the target workspace.

Upload individual FASTA files or entire directories:

In [None]:
# Example: Upload assemblies from FASTA files
# The code is wrapped in a triple-quoted string; remove the quotes to run with real data

'''
# Upload specific files
result = util.upload_assembly(
    input_paths=[
        "./genomes/genome1.fasta",
        "./genomes/genome2.fasta"
    ],
    workspace_name="your_workspace"
)

print(f"Uploaded assemblies: {result['assemblies']}")

# Upload all FASTA files from a directory
result = util.upload_assembly(
    input_paths=["./genomes/"],
    workspace_name="your_workspace",
    assembly_id_map={
        "genome1.fasta": "EcoliK12",
        "genome2.fasta": "SalmonellaLT2"
    },
    assemblyset_id="MyBacteriaSet"
)

print(f"Uploaded {len(result['assemblies'])} assemblies")
print(f"Created AssemblySet: {result['assemblyset_ref']}")
'''

print("Example code shown above.")
print("Remove the triple quotes and provide valid paths/credentials to run upload.")
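
The upload example passes either individual files or a directory in `input_paths`, plus an `assembly_id_map` keyed by file name. The sketch below shows one plausible way such inputs could be expanded into a mapping of assembly ID to file path before an upload. The helper `resolve_fasta_inputs`, the set of accepted extensions, and the fallback to the file stem are all assumptions made for illustration; the actual behaviour of `upload_assembly` may differ.

```python
from pathlib import Path
import tempfile

# Assumed set of FASTA extensions; the real library may accept others
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def resolve_fasta_inputs(input_paths, assembly_id_map=None):
    """Expand files and directories into an {assembly_id: path} mapping.

    Hypothetical helper for illustration only; not a kbutillib function.
    """
    assembly_id_map = assembly_id_map or {}
    resolved = {}
    for raw in input_paths:
        path = Path(raw)
        if path.is_dir():
            # Keep only files whose extension looks like FASTA
            candidates = sorted(
                p for p in path.iterdir() if p.suffix.lower() in FASTA_SUFFIXES
            )
        else:
            candidates = [path]
        for fasta in candidates:
            # Use the mapped ID when given, otherwise fall back to the file stem
            assembly_id = assembly_id_map.get(fasta.name, fasta.stem)
            resolved[assembly_id] = fasta
    return resolved


# Demonstration directory with two FASTA files and one unrelated file
demo_dir = Path(tempfile.mkdtemp())
for filename in ("genome1.fasta", "genome2.fasta", "notes.txt"):
    (demo_dir / filename).write_text(">c1\nACGT\n")

resolved = resolve_fasta_inputs([demo_dir], {"genome1.fasta": "EcoliK12"})
print(sorted(resolved))
```

Checking the resolved mapping before uploading makes it easier to spot a misspelled file name in `assembly_id_map`, since an unmatched file simply keeps its stem as the ID.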

## 5. Download Assemblies from KBase

Download Assembly or AssemblySet objects and save FASTA files locally:

In [None]:
# Example: Download assemblies
# The code is wrapped in a triple-quoted string; remove the quotes to run with real data

'''
# Download assemblies
assemblies = util.download_assembly(
    assembly_refs=[
        "12345/6/1",  # Assembly reference
        "12345/7/1"   # Another assembly
    ],
    output_dir="./downloaded_assemblies"
)

print(f"Downloaded {len(assemblies.assemblies)} assemblies")

# Access downloaded assemblies
for assembly_name, assembly in assemblies.assemblies.items():
    print(f"\nAssembly: {assembly_name}")
    print(f"  FASTA: {assembly.fasta_file}")
    print(f"  Contigs: {assembly.num_contigs}")
    print(f"  Size: {assembly.dna_size:,} bp")
    print(f"  GC%: {assembly.gc_content:.2f}")

# Download an AssemblySet (automatically expands to all assemblies)
assemblies = util.download_assembly(
    assembly_refs=["12345/8/1"],  # AssemblySet reference
    output_dir="./assemblyset_download"
)

print(f"\nDownloaded {len(assemblies.assemblies)} assemblies from set")
'''

print("Example code shown above.")
print("Remove the triple quotes and provide valid workspace references to run download.")
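
The references in the download example, such as `12345/6/1`, follow the numeric workspace/object/version form. A malformed reference is easier to diagnose locally than after a failed service call, so the sketch below validates that form before anything is requested. The helper `parse_ref` is hypothetical, and it deliberately accepts only the all-numeric form used in this notebook; name-based references would need a looser pattern.

```python
import re

# Matches the all-numeric "workspace/object/version" form, e.g. "12345/6/1"
REF_PATTERN = re.compile(r"^(\d+)/(\d+)/(\d+)$")


def parse_ref(ref):
    """Split a numeric reference into its three integer components.

    Hypothetical helper for illustration only; not a kbutillib function.
    """
    match = REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError(f"Not a numeric workspace/object/version reference: {ref!r}")
    workspace_id, object_id, version = (int(part) for part in match.groups())
    return {
        "workspace_id": workspace_id,
        "object_id": object_id,
        "version": version,
    }


parsed = parse_ref("12345/6/1")
print(parsed)
```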

## 6. Working with AssemblySet Collections

Manage collections of assemblies:

In [None]:
from kbutillib import Assembly, AssemblySet

# Create an AssemblySet
assemblyset = AssemblySet(
    name="MyGenomeCollection",
    description="Collection of bacterial genomes"
)

# Add assemblies
assembly1 = Assembly(name="Genome1", metadata={"num_contigs": 100})
assembly2 = Assembly(name="Genome2", metadata={"num_contigs": 150})

assemblyset.add_assembly(assembly1)
assemblyset.add_assembly(assembly2)

print("AssemblySet Created:")
print("=" * 60)
print(f"Name: {assemblyset.name}")
print(f"Description: {assemblyset.description}")
print(f"Number of assemblies: {len(assemblyset.assemblies)}")

# List assemblies
print("\nAssemblies in set:")
for name in assemblyset.list_assemblies():
    assembly = assemblyset.get_assembly(name)
    print(f"  - {name}: {assembly.num_contigs} contigs")

# Save to JSON
assemblyset.to_json("assemblyset.json")
print("\nSaved to assemblyset.json")

## 7. Complete Workflow Example

Demonstration of a complete upload-download-verify workflow:

In [None]:
# Complete workflow example (wrapped in a triple-quoted string so it does not run by accident)
'''
from kbutillib import KBReadsUtils

# Initialize
util = KBReadsUtils(token="your_token", workspace="MyWorkspace")

# Step 1: Upload all genomes from a directory
print("Uploading assemblies...")
upload_result = util.upload_assembly(
    input_paths=["./my_genomes/"],
    assemblyset_id="MyGenomeCollection",
    assembly_type="Isolate",
    taxon_ref="1234/5/6"
)

print(f"Uploaded {len(upload_result['assemblies'])} assemblies")
print(f"Created set: {upload_result['assemblyset_ref']}")

# Step 2: Download the AssemblySet to verify
print("\nDownloading for verification...")
assemblies = util.download_assembly(
    assembly_refs=[upload_result['assemblyset_ref']],
    output_dir="./verification"
)

# Step 3: Verify assemblies
print("\nVerifying downloaded assemblies:")
for name, assembly in assemblies.assemblies.items():
    print(f"  {name}: {assembly.num_contigs} contigs, "
          f"{assembly.dna_size:,} bp, "
          f"GC={assembly.gc_content:.2f}%")
'''

print("Complete workflow example shown above.")
print("This demonstrates: Upload → Create Set → Download → Verify")
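
The verification step in the workflow prints summary statistics, which can match even when sequences differ. A stricter check is to compare the sequence content of the original and downloaded FASTA files contig by contig. The sketch below does this with SHA-256 digests and ignores differences in line wrapping and letter case. The helper `contig_checksums` and the demo files are hypothetical, and the comparison assumes contig IDs are preserved across the round trip, which this notebook does not confirm.

```python
import hashlib
import tempfile
from pathlib import Path


def contig_checksums(fasta_path):
    """Map each contig ID to a SHA-256 digest of its uppercase sequence.

    Hypothetical helper for illustration only; not a kbutillib function.
    """
    sequences = {}
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:].split()
                # The contig ID is the first token of the header line
                current = header[0] if header else f"unnamed_{len(sequences)}"
                sequences[current] = []
            elif line and current is not None:
                sequences[current].append(line.upper())
    return {
        name: hashlib.sha256("".join(parts).encode()).hexdigest()
        for name, parts in sequences.items()
    }


# Two demo files with identical sequences but different line wrapping
work_dir = Path(tempfile.mkdtemp())
original = work_dir / "original.fasta"
downloaded = work_dir / "downloaded.fasta"
original.write_text(">contig1 sample\nACGTACGT\n>contig2\nGGCC\n")
downloaded.write_text(">contig1\nACGT\nacgt\n>contig2\nGGCC\n")

matches = contig_checksums(original) == contig_checksums(downloaded)
print(f"Sequences identical: {matches}")
```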

## Summary

This notebook demonstrated:

1. **Assembly objects** - Creating and managing local Assembly objects
2. **Upload** - Uploading FASTA files to KBase as Assembly objects
3. **Download** - Retrieving assemblies and saving FASTA files
4. **AssemblySet** - Managing collections of assemblies
5. **JSON serialization** - Saving Assembly and AssemblySet metadata to JSON
6. **Complete workflows** - End-to-end upload/download processes

### Key Features

- **Upload**: Individual files or entire directories
- **Download**: Single assemblies or full AssemblySets
- **Metadata**: Automatic calculation of assembly statistics
- **Flexibility**: Custom assembly IDs and grouping

### Next Steps

- Upload your own genome assemblies
- Create organized AssemblySet collections
- Integrate with genome annotation workflows
- Use assemblies for metabolic modeling