# KBUtilLib Environment Configuration

This notebook demonstrates how to set up and configure the KBUtilLib environment.

## Overview

KBUtilLib uses a centralized configuration system that reads settings from the following sources, in priority order:

1. **Explicit config parameter** (highest priority)
2. **User config**: `~/.kbutillib/config.yaml`
3. **Project config**: `config.yaml` in project root (lowest priority)

All user-specific settings, databases, and caches are stored in `~/.kbutillib/`.
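The priority order above can be pictured as a layered merge in which higher-priority sources override lower ones key by key. The sketch below is illustrative only: `deep_merge` and `resolve_config` are hypothetical helpers, not KBUtilLib functions, and the sketch assumes nested sections are merged recursively rather than replaced wholesale. That may differ from the library's actual behavior.

```python
from typing import Optional


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged recursively; any other value in `override`
    replaces the value in `base`. Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(project: dict, user: Optional[dict] = None,
                   explicit: Optional[dict] = None) -> dict:
    """Hypothetical resolver: project < user < explicit (highest wins)."""
    config = deep_merge({}, project)
    for layer in (user, explicit):
        if layer:
            config = deep_merge(config, layer)
    return config


project_cfg = {
    "skani": {"executable": "skani", "cache_file": "~/.kbutillib/skani_databases.json"},
    "modeling": {"fba_timeout": 300},
}
user_cfg = {"skani": {"executable": "/usr/local/bin/skani"}}
explicit_cfg = {"modeling": {"fba_timeout": 600}}

cfg = resolve_config(project_cfg, user_cfg, explicit_cfg)
print(cfg["skani"]["executable"])      # /usr/local/bin/skani (user overrides project)
print(cfg["skani"]["cache_file"])      # ~/.kbutillib/skani_databases.json (from project)
print(cfg["modeling"]["fba_timeout"])  # 600 (explicit overrides both)
```

Merging recursively means a user file only needs to list the keys it changes; everything else falls through to the lower-priority layers.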

## 1. Setup: Add Project to Path

First, we need to add the project source to the Python path:

In [1]:
import sys
import os
from pathlib import Path

# Add <project_root>/src to sys.path (assumes the notebook runs from a direct subdirectory of the project root)
project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Project root: {project_root}")
print(f"Source path: {src_path}")

Project root: /Users/chenry/Dropbox/Projects/KBUtilLib
Source path: /Users/chenry/Dropbox/Projects/KBUtilLib/src


## 2. Initialize the Environment

The `initialize_environment()` function creates the `~/.kbutillib/` directory and copies the default configuration into it. Existing files are left untouched, so the call is safe to re-run:

In [2]:
from kbutillib import SharedEnvUtils

# Initialize the environment
result = SharedEnvUtils.initialize_environment()

print("Environment Initialization Results:")
print("=" * 60)
print(f"Success: {result['success']}")
print(f"Directory created: {result['directory_created']}")
print(f"Config copied: {result['config_copied']}")
print(f"Config path: {result['config_path']}")
print()
print("Message:")
print(result['message'])

Environment Initialization Results:
Success: True
Directory created: False
Config copied: False
Config path: /Users/chenry/.kbutillib/config.yaml

Message:
Directory already exists: /Users/chenry/.kbutillib
Config file already exists: /Users/chenry/.kbutillib/config.yaml


[KBUtilLib] Failed to import rcsb_pdb_utils: ModuleNotFoundError: No module named 'aiohttp'
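The output shows that initialization is idempotent: an existing directory and config file are reported in the message rather than overwritten. A minimal sketch of that create-if-missing logic follows, including the overwrite behavior that `force=True` enables. `init_env_sketch` is a hypothetical stand-in whose result keys mirror those printed above; it is not the library's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def init_env_sketch(base_dir: Path, default_config: Path, force: bool = False) -> dict:
    """Hypothetical re-creation of an idempotent environment setup.

    Creates `base_dir` if needed and copies `default_config` into it as
    config.yaml unless a config already exists (or `force` is True).
    """
    config_path = base_dir / "config.yaml"
    messages = []

    directory_created = not base_dir.exists()
    base_dir.mkdir(parents=True, exist_ok=True)
    messages.append(("Created directory: " if directory_created
                     else "Directory already exists: ") + str(base_dir))

    config_copied = force or not config_path.exists()
    if config_copied:
        shutil.copyfile(default_config, config_path)
        messages.append(f"Copied default config to: {config_path}")
    else:
        messages.append(f"Config file already exists: {config_path}")

    return {
        "success": True,
        "directory_created": directory_created,
        "config_copied": config_copied,
        "config_path": str(config_path),
        "message": "\n".join(messages),
    }


with tempfile.TemporaryDirectory() as tmp:
    default_cfg = Path(tmp) / "default_config.yaml"
    default_cfg.write_text("kbase:\n  url: https://kbase.us/services\n")
    home_dir = Path(tmp) / ".kbutillib"

    first = init_env_sketch(home_dir, default_cfg)               # creates everything
    second = init_env_sketch(home_dir, default_cfg)              # no-op on re-run
    forced = init_env_sketch(home_dir, default_cfg, force=True)  # overwrites config
    print(first["directory_created"], first["config_copied"])    # True True
    print(second["directory_created"], second["config_copied"])  # False False
    print(forced["config_copied"])                                 # True
```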


## 2b. Set Up Local Dependencies Configuration

KBUtilLib depends on several external repositories (ModelSEEDpy, ModelSEEDDatabase, cobrakbase, etc.).
Dependency paths are configured in `dependencies.yaml`, with the following priority:

1. **User config**: `~/.kbutillib/dependencies.yaml` (local overrides)
2. **Repo config**: `dependencies.yaml` in the KBUtilLib project root

Use `create_local_dependency_config()` to copy the repo's `dependencies.yaml` to `~/.kbutillib/` so you can customize paths without modifying the repository:

In [3]:

from kbutillib import SharedEnvUtils
# Copy the repo's dependencies.yaml to ~/.kbutillib/
result = SharedEnvUtils.create_local_dependency_config()

print("Dependencies Config Setup:")
print("=" * 60)
print(f"Success: {result['success']}")
print(f"Config copied: {result['config_copied']}")
print(f"Config path: {result['config_path']}")
print(f"Message: {result['message']}")

Dependencies Config Setup:
Success: True
Config copied: True
Config path: /Users/chenry/.kbutillib/dependencies.yaml
Message: Copied /Users/chenry/Dropbox/Projects/KBUtilLib/dependencies.yaml to /Users/chenry/.kbutillib/dependencies.yaml
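The two-level lookup for `dependencies.yaml` amounts to "use the user copy if it exists, otherwise fall back to the repository copy". A sketch of that rule follows, using a hypothetical `find_dependency_config` helper and temporary directories standing in for `~/.kbutillib/` and the project root:

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_dependency_config(user_dir: Path, repo_root: Path) -> Optional[Path]:
    """Return the highest-priority dependencies.yaml that exists, or None.

    Hypothetical helper: the user override is checked first, then the
    repository default.
    """
    for candidate in (user_dir / "dependencies.yaml", repo_root / "dependencies.yaml"):
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp) / "home" / ".kbutillib"
    repo_root = Path(tmp) / "KBUtilLib"
    user_dir.mkdir(parents=True)
    repo_root.mkdir()
    (repo_root / "dependencies.yaml").write_text("# repo defaults\n")

    before = find_dependency_config(user_dir, repo_root)  # only the repo copy exists
    (user_dir / "dependencies.yaml").write_text("# local overrides\n")
    after = find_dependency_config(user_dir, repo_root)   # the user copy now wins
    print(before.parent.name, "->", after.parent.name)    # KBUtilLib -> .kbutillib
```

Copying the file rather than editing it in place is what lets the repository version stay untouched while local paths change.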


### Check Dependency Status

Access the dependency manager and check which dependencies exist at their configured paths:

In [4]:
from kbutillib import SharedEnvUtils

util = SharedEnvUtils()
# Access the dependency manager attached to this SharedEnvUtils instance
dep_mgr = util.dependency_manager

print(f"Dependencies config loaded from: {dep_mgr.config_path}")
print()
print("Dependency Status:")
print("=" * 60)

missing = []
for dep_name, dep_config in dep_mgr.config.items():
    dep_path = dep_mgr._resolve_path(dep_config['path'])
    exists = dep_path.exists()
    status = "FOUND" if exists else "MISSING"
    git_url = dep_config.get('git', 'N/A')
    print(f"  {dep_name:35s} [{status}]")
    print(f"    Path: {dep_path}")
    print(f"    Git:  {git_url}")
    if not exists:
        missing.append(dep_name)

print()
if missing:
    print(f"{len(missing)} missing dependency(ies): {', '.join(missing)}")
    print("Run the next cell to clone them automatically.")
else:
    print("All dependencies found!")

2026-02-24 10:34:56,888 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-24 10:34:56,889 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-24 10:34:56,890 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


Dependencies config loaded from: /Users/chenry/.kbutillib/dependencies.yaml

Dependency Status:
  modelseedpy                         [FOUND]
    Path: /Users/chenry/Dropbox/Projects/ModelSEEDpy
    Git:  https://github.com/cshenry/ModelSEEDpy
  ModelSEEDDatabase                   [FOUND]
    Path: /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
    Git:  https://github.com/ModelSEED/ModelSEEDDatabase.git
  cobrakbase                          [FOUND]
    Path: /Users/chenry/Dropbox/Projects/cobrakbase
    Git:  https://github.com/Fxe/cobrakbase.git
  cb_annotation_ontology_api          [FOUND]
    Path: /Users/chenry/Dropbox/Projects/cb_annotation_ontology_api
    Git:  https://github.com/kbaseapps/cb_annotation_ontology_api.git

All dependencies found!
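The status check above calls the private `_resolve_path()` method, whose exact rules are not shown here. As an assumption, a resolver of this kind typically expands `~` and anchors relative paths to some base directory. The hypothetical `resolve_dep_path` and `dependency_status` helpers below sketch that idea with plain dictionaries shaped like the entries printed above (a `path` plus an optional `git` URL):

```python
import tempfile
from pathlib import Path


def resolve_dep_path(raw_path: str, base_dir: Path) -> Path:
    """Expand ~ and anchor relative paths to base_dir (assumed rule)."""
    path = Path(raw_path).expanduser()
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()


def dependency_status(deps: dict, base_dir: Path) -> dict:
    """Map each dependency name to its resolved path, existence, and git URL."""
    status = {}
    for name, cfg in deps.items():
        path = resolve_dep_path(cfg["path"], base_dir)
        status[name] = {"path": path, "found": path.exists(), "git": cfg.get("git", "N/A")}
    return status


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "ModelSEEDpy").mkdir()  # pretend this one is already checked out
    deps = {
        "modelseedpy": {"path": "ModelSEEDpy", "git": "https://github.com/cshenry/ModelSEEDpy"},
        "cobrakbase": {"path": "cobrakbase", "git": "https://github.com/Fxe/cobrakbase.git"},
    }
    status = dependency_status(deps, base)
    missing = [name for name, info in status.items() if not info["found"]]
    print(missing)  # ['cobrakbase']
```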


### Install Missing Dependencies (Optional)

If any dependencies are missing, run this cell to clone them from their git URLs.
This uses `initialize_dependencies(checkout_if_missing=True)`, which runs `git clone` for each missing repository and places it at its configured path:

In [None]:
from kbutillib import SharedEnvUtils

util = SharedEnvUtils()
# Access the dependency manager attached to this SharedEnvUtils instance
dep_mgr = util.dependency_manager
dep_mgr.initialize_dependencies(checkout_if_missing=True)

print()
print("Resolved dependency paths:")
print("=" * 60)
for name, path in dep_mgr.dependency_paths.items():
    print(f"  {name:35s} {path}")
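Conceptually, `checkout_if_missing=True` reduces to running `git clone <git-url> <configured-path>` for each dependency whose path does not exist. The sketch below separates planning from execution so the plan can be inspected before anything touches the network. `plan_clones` and `run_clones` are hypothetical helpers, and the real method may differ (for example, in how it handles branches or failed clones).

```python
import subprocess
from pathlib import Path


def plan_clones(deps: dict) -> list:
    """Build `git clone` commands for dependencies whose path is missing.

    `deps` maps names to {"path": ..., "git": ...}; entries without a git
    URL cannot be cloned and are skipped.
    """
    commands = []
    for cfg in deps.values():
        target = Path(cfg["path"]).expanduser()
        if not target.exists() and cfg.get("git"):
            commands.append(["git", "clone", cfg["git"], str(target)])
    return commands


def run_clones(commands: list) -> None:
    """Execute the planned clones; check=True raises on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


deps = {
    "cobrakbase": {"path": "/nonexistent/example/cobrakbase",
                   "git": "https://github.com/Fxe/cobrakbase.git"},
    "local_only": {"path": "/nonexistent/example/local_only"},  # no git URL, skipped
}
for cmd in plan_clones(deps):
    print(" ".join(cmd))
# run_clones(plan_clones(deps))  # uncomment to actually clone
```

Passing the command as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.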

## 3. Verify Configuration Loading

Let's verify that the configuration is loaded correctly:

In [None]:
from kbutillib import SharedEnvUtils
# Create a SharedEnvUtils instance
util = SharedEnvUtils()

# Export environment state
env = util.export_environment()

print("Environment State:")
print("=" * 60)
print(f"Config file loaded: {env['config_file']}")
print(f"Token file: {env['token_file']}")
print(f"KBase token file: {env['kbase_token_file']}")
print(f"Environment variables loaded: {len(env['env_vars'])}")
print(f"Token keys available: {env['token_keys']}")

2026-02-24 10:35:08,365 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-24 10:35:08,365 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-24 10:35:08,366 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


Environment State:
Config file loaded: /Users/chenry/.kbutillib/config.yaml
Token file: /Users/chenry/.tokens
KBase token file: /Users/chenry/.kbase/token
Environment variables loaded: 2
Token keys available: ['kbase', 'berdl']


## 4. Access Configuration Values

Use dot notation to access nested configuration values:

In [None]:
from kbutillib import SharedEnvUtils
# Create a SharedEnvUtils instance
util = SharedEnvUtils()

# Access various config values
print("Configuration Values:")
print("=" * 60)

# SKANI configuration
skani_exec = util.get_config_value("skani.executable", default="skani")
skani_cache = util.get_config_value("skani.cache_file")
print(f"SKANI executable: {skani_exec}")
print(f"SKANI cache file: {skani_cache}")
print()

# Path configuration
data_dir = util.get_config_value("paths.data_dir", default="./data")
output_dir = util.get_config_value("paths.output_dir", default="./output")
cache_dir = util.get_config_value("paths.cache_dir", default="./cache")
print(f"Data directory: {data_dir}")
print(f"Output directory: {output_dir}")
print(f"Cache directory: {cache_dir}")
print()

# KBase configuration
kbase_url = util.get_config_value("kbase.url")
print(f"KBase URL: {kbase_url}")
print()

# Modeling configuration
default_obj = util.get_config_value("modeling.default_objective")
fba_timeout = util.get_config_value("modeling.fba_timeout")
print(f"Default objective: {default_obj}")
print(f"FBA timeout: {fba_timeout}")

2026-02-24 10:35:42,517 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-24 10:35:42,518 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-24 10:35:42,520 - kbutillib.shared_env_utils.SharedEnvUtils - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


Configuration Values:
SKANI executable: skani
SKANI cache file: ~/.kbutillib/skani_databases.json

Data directory: ./data
Output directory: ./output
Cache directory: ./cache

KBase URL: https://kbase.us/services

Default objective: bio1
FBA timeout: 300
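Dot notation treats each segment of the key as one level of nesting and falls back to the default as soon as a segment is missing. A minimal sketch of that lookup, assuming the config is a plain nested dictionary; `get_by_dotted_key` is hypothetical and not the library's implementation:

```python
from typing import Any


def get_by_dotted_key(config: dict, key: str, default: Any = None) -> Any:
    """Walk nested dicts one dot-separated segment at a time."""
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = {
    "skani": {"cache_file": "~/.kbutillib/skani_databases.json"},
    "modeling": {"default_objective": "bio1", "fba_timeout": 300},
}
print(get_by_dotted_key(config, "modeling.fba_timeout"))              # 300
print(get_by_dotted_key(config, "skani.executable", default="skani"))  # skani
print(get_by_dotted_key(config, "paths.data_dir", default="./data"))   # ./data
```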


## 5. Set Configuration Values Programmatically

You can set configuration values programmatically using `set_environment_variable()`. By default, each change is saved to the config file immediately; pass `save=False` to defer saving and then call `save_config()` once:

In [None]:
from kbutillib import SharedEnvUtils
# Create a SharedEnvUtils instance
util = SharedEnvUtils()
# Set a single configuration value (automatically saved to config file)
util.set_environment_variable("ai_curation.backend", "argo")

# Verify the value was set
print(f"AI Curation backend: {util.get_config_value('ai_curation.backend')}")
print()

# Set multiple values without saving each time (more efficient for batch updates)
util.set_environment_variable("custom.setting1", "value1", save=False)
util.set_environment_variable("custom.setting2", "value2", save=False)
util.set_environment_variable("custom.nested.deep.value", 42, save=False)

# Save all changes at once
util.save_config()

# Verify the values
print("Custom settings:")
print(f"  setting1: {util.get_config_value('custom.setting1')}")
print(f"  setting2: {util.get_config_value('custom.setting2')}")
print(f"  nested.deep.value: {util.get_config_value('custom.nested.deep.value')}")
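Setting a dotted key such as `custom.nested.deep.value` has to create any missing intermediate sections. The sketch below shows that step together with the `save=False` batching pattern. `ConfigSketch` and its methods are hypothetical, and the sketch writes JSON only to stay dependency-free, whereas the real config file is YAML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class ConfigSketch:
    """Hypothetical in-memory config that persists to a file on demand."""

    def __init__(self, path: Path):
        self.path = path
        self.data: dict = {}
        self.saves = 0  # counts file writes, to show the effect of save=False

    def set(self, key: str, value: Any, save: bool = True) -> None:
        *parents, leaf = key.split(".")
        node = self.data
        for part in parents:
            # Create an intermediate section, replacing any non-dict value.
            if not isinstance(node.get(part), dict):
                node[part] = {}
            node = node[part]
        node[leaf] = value
        if save:
            self.save()

    def save(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2))
        self.saves += 1


with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigSketch(Path(tmp) / "config.json")
    cfg.set("ai_curation.backend", "argo")              # saved immediately
    cfg.set("custom.setting1", "value1", save=False)    # batched
    cfg.set("custom.nested.deep.value", 42, save=False)
    cfg.save()                                          # one write for the batch
    on_disk = json.loads(cfg.path.read_text())
    print(cfg.saves)                                      # 2
    print(on_disk["custom"]["nested"]["deep"]["value"])  # 42
```

Silently replacing a non-dict value with a new section is a choice made for this sketch; a real implementation might raise an error instead.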

## 6. View Complete Configuration

Let's view the entire configuration dictionary:

In [None]:
from kbutillib import SharedEnvUtils
import json
# Create a SharedEnvUtils instance
util = SharedEnvUtils()
# Refresh the environment export to include our new settings
env = util.export_environment()

print("Complete Configuration:")
print("=" * 60)
print(json.dumps(env['config'], indent=2, default=str))  # default=str stringifies values JSON can't encode (e.g. dates parsed from YAML)

## 7. Check ~/.kbutillib Directory

Let's see what files are in the `~/.kbutillib/` directory:

In [None]:
from pathlib import Path

kbutillib_dir = Path.home() / ".kbutillib"

print(f"Contents of {kbutillib_dir}:")
print("=" * 60)

if kbutillib_dir.exists():
    for item in sorted(kbutillib_dir.iterdir()):
        if item.is_file():
            size = item.stat().st_size
            print(f"  {item.name:40s} ({size:,} bytes)")
        elif item.is_dir():
            # Count items in subdirectory
            item_count = sum(1 for _ in item.iterdir())
            print(f"  {item.name:40s} ({item_count} items)")
else:
    print("  Directory does not exist yet.")

## 8. Customize Your Configuration

You can now edit `~/.kbutillib/config.yaml` to customize settings for your environment.

### Common Customizations

```yaml
# SKANI Configuration
skani:
  executable: "/usr/local/bin/skani"  # Custom path if not in PATH
  cache_file: "~/.kbutillib/skani_databases.json"

# File Paths
paths:
  data_dir: "/mnt/data"          # Custom data directory
  output_dir: "/mnt/output"      # Custom output directory
  cache_dir: "/mnt/cache"        # Custom cache directory

# KBase Configuration
kbase:
  url: "https://kbase.us/services"  # Production
  # url: "https://ci.kbase.us/services"  # CI environment
```
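Because `skani.executable` may be either a bare command name found on `PATH` or an absolute path, it can be worth checking that the configured value actually resolves before starting a long job. A small sketch using the standard library's `shutil.which`, which accepts both forms; `check_executable` is a hypothetical helper, and the `~` expansion is an added convenience, not something the config is known to require:

```python
import os
import shutil
import sys
from typing import Optional


def check_executable(configured: str) -> Optional[str]:
    """Return the full path of the executable, or None if it can't be found.

    `~` is expanded first because shutil.which does not expand it itself.
    """
    return shutil.which(os.path.expanduser(configured))


for candidate in ("skani", "/usr/local/bin/skani", sys.executable):
    resolved = check_executable(candidate)
    print(f"{candidate}: {resolved or 'NOT FOUND'}")
```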

## 9. Force Reinitialize (Optional)

If you need to reset to default configuration, use `force=True`:

In [None]:
from kbutillib import SharedEnvUtils

# Uncomment to force reinitialize (overwrites existing config)
# result = SharedEnvUtils.initialize_environment(force=True)
# print(result['message'])

## Summary

### Directory Structure

```
~/.kbutillib/
├── config.yaml              # User configuration
├── dependencies.yaml        # Local dependency path overrides
├── skani_databases.json     # SKANI sketch database registry
└── skani_sketches/          # SKANI sketch databases
    ├── database1/
    └── database2/
```

### Key Functions

| Function | Description |
|----------|-------------|
| `SharedEnvUtils.initialize_environment()` | Create and initialize `~/.kbutillib/` |
| `SharedEnvUtils.create_local_dependency_config()` | Copy the repo's `dependencies.yaml` to `~/.kbutillib/` |
| `util.dependency_manager.initialize_dependencies(checkout_if_missing=True)` | Clone missing dependencies from git |
| `util.get_dependency_path("name")` | Get resolved path for a dependency |
| `util.get_data_path("name", "subpath")` | Get path to data within a dependency |
| `util.get_config_value("path.to.key")` | Access config with dot notation |
| `util.set_environment_variable("path.to.key", value)` | Set config value and save to file |
| `util.save_config()` | Save current config to file |
| `util.export_environment()` | View environment state |

### Next Steps

1. Customize `~/.kbutillib/config.yaml` for your environment
2. Customize `~/.kbutillib/dependencies.yaml` to point to your local dependency paths
3. All utilities will automatically use your configuration
4. Edits made directly to the config files take effect the next time a utility is initialized
5. Use `set_environment_variable()` to programmatically update settings