# KBUtilLib Environment Configuration

This notebook demonstrates how to set up and configure the KBUtilLib environment.

## Overview

KBUtilLib uses a centralized configuration system with the following priority:

1. **Explicit config parameter** (highest priority)
2. **User config**: `~/.kbutillib/config.yaml`
3. **Project config**: `config.yaml` in project root (lowest priority)

All user-specific settings, databases, and caches are stored in `~/.kbutillib/`.
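
The priority order above can be sketched as a simple first-match lookup. This is a minimal illustration, not KBUtilLib's actual implementation: `resolve_config_path` and its parameters are hypothetical names, and the sketch assumes the first existing file wins (no merging) and that a missing explicit path falls through to the next candidate.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(
    explicit: Optional[str] = None,
    user_config: Optional[Path] = None,
    project_config: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first config file that exists, in priority order (hypothetical helper)."""
    # Default locations follow the list above.
    if user_config is None:
        user_config = Path.home() / ".kbutillib" / "config.yaml"
    if project_config is None:
        project_config = Path.cwd() / "config.yaml"

    candidates = [
        Path(explicit).expanduser() if explicit else None,  # 1. explicit parameter
        user_config,                                        # 2. user config
        project_config,                                     # 3. project config
    ]
    for candidate in candidates:
        if candidate is not None and candidate.is_file():
            return candidate
    return None
```

Calling `resolve_config_path()` with no arguments checks only the user and project locations.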

## 1. Setup: Add Project to Path

First, we need to add the project source to the Python path:

In [1]:
import sys
from pathlib import Path

# Add the src directory to the path.
# Assumes the notebook runs from a directory one level below the project root.
project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Project root: {project_root}")
print(f"Source path: {src_path}")

Project root: /home/chenry/Dropbox/Projects/KBUtilLib
Source path: /home/chenry/Dropbox/Projects/KBUtilLib/src


## 2. Initialize the Environment

The `initialize_environment()` function creates the `~/.kbutillib/` directory and copies the default configuration:

In [2]:
from kbutillib import SharedEnvUtils

# Initialize the environment
result = SharedEnvUtils.initialize_environment()

print("Environment Initialization Results:")
print("=" * 60)
print(f"Success: {result['success']}")
print(f"Directory created: {result['directory_created']}")
print(f"Config copied: {result['config_copied']}")
print(f"Config path: {result['config_path']}")
print()
print("Message:")
print(result['message'])

Environment Initialization Results:
Success: True
Directory created: True
Config copied: True
Config path: /home/chenry/.kbutillib/config.yaml

Message:
Created directory: /home/chenry/.kbutillib
Copied config from /home/chenry/Dropbox/Projects/KBUtilLib/config.yaml to /home/chenry/.kbutillib/config.yaml
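
To make the reported fields easier to interpret, here is a rough sketch of the kind of logic that could produce them. It is an illustration built only on the standard library, not the KBUtilLib source: `init_user_dir` is a hypothetical name, the result keys mirror the ones printed above, and the rule that an existing user config is kept unless `force=True` is an assumption.

```python
import shutil
from pathlib import Path


def init_user_dir(default_config: Path, user_dir: Path, force: bool = False) -> dict:
    """Create user_dir and copy default_config into it (illustrative sketch only)."""
    messages = []
    directory_created = not user_dir.exists()
    user_dir.mkdir(parents=True, exist_ok=True)
    if directory_created:
        messages.append(f"Created directory: {user_dir}")

    target = user_dir / "config.yaml"
    config_copied = False
    # Assumption: keep an existing user config unless force=True.
    if default_config.is_file() and (force or not target.exists()):
        shutil.copy2(default_config, target)
        config_copied = True
        messages.append(f"Copied config from {default_config} to {target}")

    return {
        "success": target.exists(),
        "directory_created": directory_created,
        "config_copied": config_copied,
        "config_path": str(target),
        "message": "\n".join(messages),
    }
```

Under these assumptions, a second call without `force` would report `directory_created` and `config_copied` as `False` while `success` stays `True`.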



## 3. Verify Configuration Loading

Let's verify that the configuration is loaded correctly:

In [3]:
# Create a SharedEnvUtils instance
util = SharedEnvUtils()

# Export environment state
env = util.export_environment()

print("Environment State:")
print("=" * 60)
print(f"Config file loaded: {env['config_file']}")
print(f"Token file: {env['token_file']}")
print(f"KBase token file: {env['kbase_token_file']}")
print(f"Environment variables loaded: {len(env['env_vars'])}")
print(f"Token keys available: {env['token_keys']}")

2025-12-03 20:51:58,271 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded configuration from: /home/chenry/.kbutillib/config.yaml
2025-12-03 20:51:58,272 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded kbase tokens from /home/chenry/.kbase/token


Environment State:
Config file loaded: /home/chenry/.kbutillib/config.yaml
Token file: /home/chenry/.tokens
KBase token file: /home/chenry/.kbase/token
Environment variables loaded: 0
Token keys available: ['kbase']


## 4. Access Configuration Values

Use dot notation to access nested configuration values:

In [4]:
# Access various config values
print("Configuration Values:")
print("=" * 60)

# SKANI configuration
skani_exec = util.get_config_value("skani.executable", default="skani")
skani_cache = util.get_config_value("skani.cache_file")
print(f"SKANI executable: {skani_exec}")
print(f"SKANI cache file: {skani_cache}")
print()

# Path configuration
data_dir = util.get_config_value("paths.data_dir", default="./data")
output_dir = util.get_config_value("paths.output_dir", default="./output")
cache_dir = util.get_config_value("paths.cache_dir", default="./cache")
print(f"Data directory: {data_dir}")
print(f"Output directory: {output_dir}")
print(f"Cache directory: {cache_dir}")
print()

# KBase configuration
kbase_url = util.get_config_value("kbase.url")
print(f"KBase URL: {kbase_url}")
print()

# Modeling configuration
default_obj = util.get_config_value("modeling.default_objective")
fba_timeout = util.get_config_value("modeling.fba_timeout")
print(f"Default objective: {default_obj}")
print(f"FBA timeout: {fba_timeout}")

Configuration Values:
SKANI executable: skani
SKANI cache file: ~/.kbutillib/skani_databases.json

Data directory: ./data
Output directory: ./output
Cache directory: ./cache

KBase URL: https://kbase.us/services

Default objective: bio1
FBA timeout: 300
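
For readers unfamiliar with dot-notation lookups, the idea can be sketched on a plain nested dictionary. `get_nested` is a hypothetical helper written for this illustration; it is not the implementation behind `get_config_value`, and the sample dictionary only borrows two keys from the output above.

```python
from typing import Any


def get_nested(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Walk a nested dict following a dot-separated key (hypothetical helper)."""
    node: Any = config
    for part in dotted_key.split("."):
        # Stop as soon as a level is missing or is not a mapping.
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = {"skani": {"executable": "skani"}, "modeling": {"fba_timeout": 300}}
print(get_nested(config, "modeling.fba_timeout"))              # 300
print(get_nested(config, "paths.data_dir", default="./data"))  # ./data
```

Returning the default for any missing level means callers never have to catch a `KeyError` for optional settings.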


## 5. View Complete Configuration

Let's view the entire configuration dictionary:

In [None]:
import json

print("Complete Configuration:")
print("=" * 60)
print(json.dumps(env['config'], indent=2))

## 6. Check ~/.kbutillib Directory

Let's see what files are in the `~/.kbutillib/` directory:

In [None]:
from pathlib import Path

kbutillib_dir = Path.home() / ".kbutillib"

print(f"Contents of {kbutillib_dir}:")
print("=" * 60)

if kbutillib_dir.exists():
    for item in sorted(kbutillib_dir.iterdir()):
        if item.is_file():
            size = item.stat().st_size
            print(f"  📄 {item.name:40s} ({size:,} bytes)")
        elif item.is_dir():
            # Count items in subdirectory
            item_count = len(list(item.iterdir()))
            print(f"  📁 {item.name:40s} ({item_count} items)")
else:
    print("  Directory does not exist yet.")

## 7. Customize Your Configuration

You can now edit `~/.kbutillib/config.yaml` to customize settings for your environment.

### Common Customizations:

```yaml
# SKANI Configuration
skani:
  executable: "/usr/local/bin/skani"  # Custom path if not in PATH
  cache_file: "~/.kbutillib/skani_databases.json"

# File Paths
paths:
  data_dir: "/mnt/data"          # Custom data directory
  output_dir: "/mnt/output"      # Custom output directory
  cache_dir: "/mnt/cache"        # Custom cache directory

# KBase Configuration
kbase:
  url: "https://kbase.us/services"  # Production
  # url: "https://ci.kbase.us/services"  # CI environment
```
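
Path values such as `cache_file` may be written with a leading `~`, and the Section 4 output shows `get_config_value` returning that string unexpanded. If you pass such a value to your own code, you will probably want to expand it first. A minimal sketch, where `expand_config_path` is a hypothetical helper and not part of KBUtilLib:

```python
from pathlib import Path


def expand_config_path(value: str) -> Path:
    """Expand a leading ~ in a config path string (hypothetical helper)."""
    return Path(value).expanduser()


cache_file = expand_config_path("~/.kbutillib/skani_databases.json")
print(cache_file)
```

Absolute paths such as `/mnt/data` pass through unchanged.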

## 8. Test SKANI Integration

Let's test that SKANI utilities can access the configuration:

In [None]:
from kbutillib import SKANIUtils

# Initialize SKANIUtils (reads config automatically)
skani_util = SKANIUtils()

print("SKANI Configuration:")
print("=" * 60)
print(f"Executable: {skani_util.skani_executable}")
print(f"Cache file: {skani_util.cache_file}")
print(f"SKANI available: {skani_util.skani_available}")

# List any existing databases
databases = skani_util.list_databases()
print(f"\nRegistered databases: {len(databases)}")
for db in databases:
    print(f"  - {db['name']}: {db['genome_count']} genomes")

In [5]:
from kbutillib import SKANIUtils

# Initialize SKANIUtils (reads config automatically)
skani_util = SKANIUtils()
skani_util.add_skani_database(
    "gtdb_bacteria",
    "/storage/fliu/data/ani/skani/gtdb_r220",
    description="GTDB bacterial representatives r220"
)

2025-12-03 21:53:03,272 - kbutillib.skani_utils.SKANIUtils - INFO - Loaded configuration from: /home/chenry/.kbutillib/config.yaml
2025-12-03 21:53:03,273 - kbutillib.skani_utils.SKANIUtils - INFO - Loaded kbase tokens from /home/chenry/.kbase/token
2025-12-03 21:53:03,275 - kbutillib.skani_utils.SKANIUtils - INFO - SKANI database cache: /home/chenry/.kbutillib/skani_databases.json
2025-12-03 21:53:03,283 - kbutillib.skani_utils.SKANIUtils - INFO - SKANI is available: skani 0.3.1 (executable: /opt/skani/0.3.1/skani)
2025-12-03 21:53:03,287 - kbutillib.skani_utils.SKANIUtils - INFO - Added database 'gtdb_bacteria' to cache at /storage/fliu/data/ani/skani/gtdb_r220


True

## 9. Force Reinitialize (Optional)

If you need to reset to default configuration, use `force=True`:

In [None]:
# Uncomment to force reinitialize
# result = SharedEnvUtils.initialize_environment(force=True)
# print(result['message'])
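
Since a forced reset presumably replaces any edits made to `~/.kbutillib/config.yaml`, it may be worth keeping a copy first. The sketch below uses only the standard library; `backup_config` and the timestamped `.bak` naming are hypothetical choices for this example, not a KBUtilLib feature.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_config(config_path: Path) -> Optional[Path]:
    """Copy config_path to a timestamped sibling file and return the backup path."""
    if not config_path.is_file():
        return None  # Nothing to back up.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    shutil.copy2(config_path, backup_path)
    return backup_path
```

For example, `backup_config(Path.home() / ".kbutillib" / "config.yaml")` could be run just before the commented-out `force=True` call.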

## Summary

### Directory Structure

```
~/.kbutillib/
├── config.yaml              # User configuration
├── skani_databases.json     # SKANI sketch database registry
└── skani_sketches/          # SKANI sketch databases
    ├── database1/
    └── database2/
```

### Key Functions

- `SharedEnvUtils.initialize_environment()` - Create and initialize `~/.kbutillib/`
- `util.get_config_value("path.to.key")` - Access config with dot notation
- `util.export_environment()` - View environment state

### Next Steps

1. Customize `~/.kbutillib/config.yaml` for your environment.
2. Instantiate any utility; each one reads your configuration automatically.
3. Re-create the utility after editing the config, because changes take effect only at the next initialization.