# KBase Workspace Utilities

This notebook demonstrates the KBase Workspace Service API utilities for:
- Retrieving datatype lists and specifications
- Generating datatype documentation
- Creating and registering typespec modules

## Overview

The KBase Workspace Service (WSS) provides:
- **Immutable storage** of typed objects with metadata and provenance
- **Versioning** of typed objects
- **Type checking** against KIDL specifications
- **Module management** for custom type definitions

## 1. Setup and Initialization

In [None]:
%run util.py

from kbutillib import KBWSUtils
import json
from pathlib import Path
from collections import defaultdict, Counter

# Initialize workspace utilities
ws_utils = KBWSUtils()

print("KBase Workspace Utilities initialized")
print(f"Workspace URL: {ws_utils.workspace_url}")

## 2. Retrieving All Datatypes

List all available KBase datatypes from the workspace service.

In [None]:
# Retrieve all available datatypes
print("Retrieving all KBase datatypes...")
print("=" * 60)

all_types = ws_utils.list_all_types(include_empty_modules=False)

print(f"Total datatypes found: {len(all_types)}")
print()
print("Sample datatypes (first 15):")
for i, dtype in enumerate(sorted(all_types)[:15], 1):
    print(f"  {i:2d}. {dtype}")
if len(all_types) > 15:
    print(f"  ... and {len(all_types) - 15} more")

In [None]:
# Organize datatypes by module
print("Datatypes organized by module:")
print("=" * 60)

modules = defaultdict(list)
for dtype in all_types:
    module = dtype.split('.')[0]
    modules[module].append(dtype)

print(f"Total modules: {len(modules)}")
print()
print("Top 10 modules by type count:")
sorted_modules = sorted(modules.items(), key=lambda x: len(x[1]), reverse=True)
for i, (module, types) in enumerate(sorted_modules[:10], 1):
    print(f"  {i:2d}. {module}: {len(types)} types")
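`Counter` is imported in the setup cell but never used. The same per-module tally could be computed with it directly. The sketch below uses a small hypothetical list in place of `all_types`, and `top_modules` is an illustrative name rather than a KBWSUtils function:

```python
from collections import Counter

def top_modules(type_names, n=10):
    """Return the n modules with the most types as (module, count) pairs."""
    # The module is everything before the first '.' in "Module.Type"
    counts = Counter(name.split('.')[0] for name in type_names)
    # most_common keeps first-seen order for modules with equal counts
    return counts.most_common(n)

# Hypothetical sample; in the notebook this would be all_types
sample = ["KBaseGenomes.Genome", "KBaseGenomes.ContigSet",
          "KBaseFBA.FBAModel", "KBaseBiochem.Media"]
for module, count in top_modules(sample, n=2):
    print(f"{module}: {count} types")
```

The `defaultdict(list)` version above is still needed when the type names themselves are wanted per module; `Counter` only keeps the counts.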

In [None]:
# Save datatype list to file
output_dir = Path("datacache")
output_dir.mkdir(exist_ok=True)

types_file = output_dir / "all_kbase_types.json"
with open(types_file, 'w') as f:
    json.dump({
        "total_types": len(all_types),
        "total_modules": len(modules),
        "types": sorted(all_types),
        "modules": {k: sorted(v) for k, v in sorted(modules.items())}
    }, f, indent=2)

print(f"Saved datatype list to: {types_file}")
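A later session could reload this cache instead of querying the workspace again. The following is a minimal sketch that assumes the JSON layout written above; `load_cached_types` is an illustrative name:

```python
import json
from pathlib import Path

def load_cached_types(path="datacache/all_kbase_types.json"):
    """Reload the cached datatype list, or return None if the cache is missing.

    Returns a (types, modules) tuple matching the JSON layout written above.
    """
    path = Path(path)
    if not path.exists():
        return None
    with open(path) as f:
        data = json.load(f)
    return data["types"], data["modules"]

cached = load_cached_types()
if cached is None:
    print("No cache yet; run list_all_types() first")
else:
    types, mods = cached
    print(f"Loaded {len(types)} types in {len(mods)} modules from cache")
```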

## 3. Retrieving Type Specifications

Get detailed specifications for specific datatypes including JSON schemas and descriptions.
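Workspace type strings take the form `Module.Type`, optionally followed by a version suffix such as `-17.0`. The `type_def` field of a spec has this form, and the version extraction in section 5 relies on it. A small parser is sketched below. It is illustrative and not part of KBWSUtils, and it assumes module and type names are word characters only:

```python
import re

# Module.Type with an optional -major or -major.minor version suffix
TYPE_RE = re.compile(r"^(?P<module>\w+)\.(?P<name>\w+)(?:-(?P<version>\d+(?:\.\d+)?))?$")

def parse_type_string(type_string):
    """Split a workspace type string into (module, name, version or None)."""
    match = TYPE_RE.match(type_string)
    if match is None:
        raise ValueError(f"Not a Module.Type string: {type_string!r}")
    return match["module"], match["name"], match["version"]

print(parse_type_string("KBaseGenomes.Genome-17.0"))
print(parse_type_string("KBaseFBA.FBAModel"))
```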

In [None]:
# Define key datatypes to retrieve specs for
priority_types = [
    "KBaseGenomes.Genome",
    "KBaseGenomeAnnotations.Assembly",
    "KBaseFBA.FBAModel",
    "KBaseBiochem.Media",
    "KBaseSets.GenomeSet",
    "KBaseExpression.ExpressionMatrix",
    "KBaseRNASeq.RNASeqAlignment"
]

# Filter to only existing types
types_to_fetch = [t for t in priority_types if t in all_types]
print(f"Fetching specifications for {len(types_to_fetch)} types:")
for t in types_to_fetch:
    print(f"  - {t}")
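The filter above silently drops any priority type that is not found in `all_types`. Reporting the dropped names makes that visible, and converting the available list to a set keeps the membership checks fast when it holds hundreds of types. The sketch below uses hypothetical inputs, and `split_available` is an illustrative name:

```python
def split_available(wanted, available):
    """Partition wanted type names into (present, missing) lists, preserving order."""
    available = set(available)  # O(1) membership checks
    present = [t for t in wanted if t in available]
    missing = [t for t in wanted if t not in available]
    return present, missing

present, missing = split_available(
    ["KBaseGenomes.Genome", "KBaseRNASeq.RNASeqAlignment"],
    ["KBaseGenomes.Genome", "KBaseFBA.FBAModel"],  # hypothetical available list
)
print("present:", present)
print("missing:", missing)
```

In the notebook this would be called as `split_available(priority_types, all_types)`.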

In [None]:
# Retrieve type specifications
print("\nRetrieving type specifications...")
print("=" * 60)

if types_to_fetch:
    specs = ws_utils.get_type_specs(types_to_fetch)
    print(f"Successfully retrieved {len(specs)} specifications")
else:
    specs = {}
    print("No types to fetch")

In [None]:
# Display specification details for one type
if specs:
    sample_type = list(specs.keys())[0]
    sample_spec = specs[sample_type]
    
    print(f"Specification for: {sample_type}")
    print("=" * 60)
    print(f"Type Definition: {sample_spec.get('type_def', 'N/A')}")
    print()
    
    # Show description
    desc = sample_spec.get('description', '')
    if desc:
        print("Description:")
        print("-" * 40)
        # Show first 500 chars
        print(desc[:500])
        if len(desc) > 500:
            print("...")
    print()
    
    # Show version info
    print(f"Module Versions: {sample_spec.get('module_vers', [])}")
    print(f"Type Versions: {sample_spec.get('type_vers', [])}")
    print(f"Released Versions: {sample_spec.get('released_type_vers', [])}")
    print()
    
    # Show schema info
    schema = sample_spec.get('json_schema')
    if schema and isinstance(schema, dict):
        print(f"Schema Type: {schema.get('type', 'unknown')}")
        if 'properties' in schema:
            print(f"Number of Properties: {len(schema['properties'])}")
            print("Properties (first 10):")
            for prop in list(schema['properties'].keys())[:10]:
                print(f"  - {prop}")
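The cells in this notebook treat `json_schema` as a dict. In the Workspace API specification, however, `json_schema` is declared as a string (`jsonschema`), and the registration results in section 8 likewise arrive as JSON strings. Unless `get_type_specs` decodes the schema itself (an assumption worth checking), the `isinstance(schema, dict)` guards will skip it silently. A defensive helper could be applied first. `load_schema` is a hypothetical name, and the spec below is made up:

```python
import json

def load_schema(spec):
    """Return a type spec's JSON schema as a dict, decoding it if it is a string."""
    schema = spec.get("json_schema")
    if isinstance(schema, str):
        try:
            schema = json.loads(schema)
        except json.JSONDecodeError:
            return {}
    # Anything that is still not a dict (None, list, ...) is treated as no schema
    return schema if isinstance(schema, dict) else {}

# Hypothetical spec with a string-encoded schema
spec = {"json_schema": '{"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]}'}
schema = load_schema(spec)
print(schema["type"], list(schema["properties"]))
```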

In [None]:
# Save specifications to file
if specs:
    specs_file = output_dir / "type_specifications.json"
    with open(specs_file, 'w') as f:
        json.dump(specs, f, indent=2)
    print(f"Saved {len(specs)} type specifications to: {specs_file}")

## 4. Batch Retrieval of All Type Specifications

Retrieve specifications for all datatypes in batches with progress tracking.

In [None]:
def retrieve_all_specs(type_list, batch_size=50):
    """Retrieve specifications for all types in batches.
    
    Args:
        type_list: List of type strings
        batch_size: Number of types per batch
        
    Returns:
        Tuple of (specs dict, failed types list)
    """
    all_specs = {}
    failed_types = []
    
    total_batches = (len(type_list) + batch_size - 1) // batch_size
    
    for i in range(0, len(type_list), batch_size):
        batch = type_list[i:i+batch_size]
        batch_num = i // batch_size + 1
        
        print(f"Processing batch {batch_num}/{total_batches} ({len(batch)} types)...", end=" ")
        
        try:
            batch_specs = ws_utils.get_type_specs(batch)
            all_specs.update(batch_specs)
            print("OK")
        except Exception as e:
            print(f"Error: {e}")
            # Try individual types
            for dtype in batch:
                try:
                    spec = ws_utils.get_type_specs([dtype])
                    all_specs.update(spec)
                except Exception:
                    failed_types.append(dtype)
    
    return all_specs, failed_types

print("Batch retrieval function defined")
print()
print("To retrieve ALL type specs, run:")
print("  all_specs, failed = retrieve_all_specs(all_types)")
print("  print(f'Retrieved {len(all_specs)} specs, {len(failed)} failed')")
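`retrieve_all_specs` is tied to the global `ws_utils`. Passing the fetch function in as a parameter instead would let the batching and per-type fallback be tested without a workspace connection. This is a sketch; `fetch_in_batches` and `fake_fetch` are illustrative names:

```python
def fetch_in_batches(type_list, fetch, batch_size=50):
    """Call fetch(batch) on successive slices, retrying failed batches one type at a time.

    fetch takes a list of type names and returns a dict of name -> spec.
    Returns (specs, failed) like retrieve_all_specs above.
    """
    specs, failed = {}, []
    for start in range(0, len(type_list), batch_size):
        batch = type_list[start:start + batch_size]
        try:
            specs.update(fetch(batch))
        except Exception:
            # One bad type can fail the whole batch, so isolate it
            for name in batch:
                try:
                    specs.update(fetch([name]))
                except Exception:
                    failed.append(name)
    return specs, failed

# Hypothetical fetcher that rejects any batch containing "Bad.Type"
def fake_fetch(batch):
    if "Bad.Type" in batch:
        raise ValueError("unknown type")
    return {name: {"type_def": name} for name in batch}

specs_demo, failed_demo = fetch_in_batches(["A.X", "Bad.Type", "B.Y"], fake_fetch, batch_size=2)
print(sorted(specs_demo), failed_demo)
```

Against the live service this would be `fetch_in_batches(sorted(all_types), ws_utils.get_type_specs)`. The list is sorted first because slicing requires a sequence.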

In [None]:
# To run the full batch retrieval (takes several minutes), remove the surrounding triple quotes
'''
print("Starting full batch retrieval...")
print("This may take several minutes for 300+ types")
print("=" * 60)

all_specs, failed_types = retrieve_all_specs(all_types, batch_size=50)

print()
print("=" * 60)
print(f"Successfully retrieved: {len(all_specs)} specifications")
print(f"Failed to retrieve: {len(failed_types)} types")

# Save complete specifications
all_specs_file = output_dir / "all_type_specs.json"
with open(all_specs_file, 'w') as f:
    json.dump(all_specs, f, indent=2)
print(f"Saved to: {all_specs_file}")
'''

print("Batch retrieval code ready (commented out)")

## 5. Generating Datatype Documentation

Generate comprehensive markdown documentation from type specifications.

In [None]:
def analyze_type_structure(spec):
    """Analyze the structure of a type specification."""
    analysis = {
        'has_description': bool(spec.get('description')),
        'has_json_schema': bool(spec.get('json_schema')),
        'has_spec_def': bool(spec.get('spec_def')),
        'module_versions': len(spec.get('module_vers', [])),
        'type_versions': len(spec.get('type_vers', [])),
        'released_versions': len(spec.get('released_type_vers', [])),
        'using_types': len(spec.get('using_type_defs', [])),
        'used_types': len(spec.get('used_type_defs', [])),
    }

    # Parse JSON schema for more details
    if 'json_schema' in spec:
        schema = spec['json_schema']
        if isinstance(schema, dict):
            analysis['schema_type'] = schema.get('type', 'unknown')
            analysis['has_properties'] = 'properties' in schema
            analysis['property_count'] = len(schema.get('properties', {}))
            analysis['required_fields'] = len(schema.get('required', []))

    return analysis

# Analyze the specs we have
if specs:
    print("Type Analysis:")
    print("=" * 60)
    for dtype, spec in specs.items():
        analysis = analyze_type_structure(spec)
        print(f"\n{dtype}:")
        print(f"  Schema Type: {analysis.get('schema_type', 'N/A')}")
        print(f"  Properties: {analysis.get('property_count', 0)}")
        print(f"  Required Fields: {analysis.get('required_fields', 0)}")
        print(f"  Dependencies: {analysis.get('used_types', 0)}")

In [None]:
def generate_markdown_documentation(specs_dict, output_file):
    """Generate comprehensive markdown documentation for datatypes."""
    
    # Build module index
    module_index = defaultdict(list)
    for type_name in sorted(specs_dict.keys()):
        module = type_name.split('.')[0]
        module_index[module].append(type_name)
    
    md = []
    md.append("# KBase Workspace Datatypes Reference")
    md.append("")
    md.append(f"**Total Datatypes:** {len(specs_dict)}")
    md.append(f"**Total Modules:** {len(module_index)}")
    md.append("")
    md.append("---")
    md.append("")
    
    # Table of Contents
    md.append("## Table of Contents")
    md.append("")
    md.append("1. [Module Index](#module-index)")
    md.append("2. [Type Specifications](#type-specifications)")
    md.append("3. [Usage Guidelines](#usage-guidelines)")
    md.append("")
    md.append("---")
    md.append("")
    
    # Module Index
    md.append("## Module Index")
    md.append("")
    for module in sorted(module_index.keys()):
        types = module_index[module]
        md.append(f"### {module} ({len(types)} types)")
        md.append("")
        for dtype in types:
            spec = specs_dict[dtype]
            version = spec.get('type_def', '').split('-')[-1] if '-' in spec.get('type_def', '') else "latest"
            md.append(f"- **{dtype}** (v{version})")
        md.append("")
    
    md.append("---")
    md.append("")
    
    # Type Specifications
    md.append("## Type Specifications")
    md.append("")
    
    for dtype in sorted(specs_dict.keys()):
        spec = specs_dict[dtype]
        md.append(f"### {dtype}")
        md.append("")
        md.append(f"**Version:** {spec.get('type_def', 'N/A')}")
        md.append("")
        
        if spec.get('description'):
            desc = spec['description'].split('\n')[0][:200]
            md.append(f"**Description:** {desc}")
            md.append("")
        
        # Schema info
        if 'json_schema' in spec and isinstance(spec['json_schema'], dict):
            schema = spec['json_schema']
            md.append(f"**Schema Type:** `{schema.get('type', 'unknown')}`")
            md.append("")
            
            if 'properties' in schema:
                props = schema['properties']
                required = schema.get('required', [])
                md.append(f"**Fields:** {len(props)} total, {len(required)} required")
                md.append("")
                md.append("| Field | Type | Required |")
                md.append("|-------|------|----------|")
                for field_name in list(props.keys())[:10]:
                    field_spec = props[field_name]
                    field_type = field_spec.get('type', 'unknown')
                    is_req = 'Yes' if field_name in required else 'No'
                    md.append(f"| `{field_name}` | {field_type} | {is_req} |")
                if len(props) > 10:
                    md.append(f"| ... | ... | *{len(props) - 10} more fields* |")
        md.append("")
        md.append("---")
        md.append("")
    
    # Usage Guidelines
    md.append("## Usage Guidelines")
    md.append("")
    md.append("### Working with KBase Datatypes")
    md.append("")
    md.append("1. **Use full type names**: Always use `Module.TypeName` format")
    md.append("2. **Check versions**: Verify compatibility when referencing types")
    md.append("3. **Follow dependencies**: Understand type relationships")
    md.append("4. **Use JSON schemas**: Leverage for validation and code generation")
    md.append("")
    md.append("### Common Operations")
    md.append("")
    md.append("```python")
    md.append("from kbutillib import KBWSUtils")
    md.append("")
    md.append("utils = KBWSUtils()")
    md.append("")
    md.append("# List all types")
    md.append("all_types = utils.list_all_types()")
    md.append("")
    md.append("# Get specifications")
    md.append("specs = utils.get_type_specs(['KBaseGenomes.Genome'])")
    md.append("```")
    md.append("")
    
    # Write to file
    with open(output_file, 'w') as f:
        f.write('\n'.join(md))
    
    return len(md)

print("Documentation generator function defined")
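The table of contents links to anchors such as `#module-index`, which depend on the markdown renderer generating heading IDs. To link each type heading as well, an approximation of GitHub-style slugs (lowercase, punctuation dropped, spaces turned into hyphens) could be used. Other renderers may slug differently, and `heading_anchor` is an illustrative name:

```python
import re

def heading_anchor(heading):
    """Approximate a GitHub-style anchor for a markdown heading."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation such as '.', '(' and ')'
    return "#" + slug.replace(" ", "-")

print(heading_anchor("Module Index"))
print(heading_anchor("KBaseGenomes.Genome"))
```

Inside `generate_markdown_documentation`, the module index entries could then become links, for example `md.append(f"- [{dtype}]({heading_anchor(dtype)})")`.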

In [None]:
# Generate documentation from available specs
if specs:
    doc_file = output_dir / "KBase_Datatypes_Reference.md"
    line_count = generate_markdown_documentation(specs, doc_file)
    
    print(f"Generated {line_count} lines of documentation")
    print(f"Saved to: {doc_file}")
else:
    print("No specs available - fetch some first")

## 6. Working with Typespec Modules

Understanding and creating KBase typespec modules using KIDL.

In [4]:
# Re-initialize workspace utilities against the appdev environment
%run util.py

from kbutillib import KBWSUtils
import json
from pathlib import Path
from collections import defaultdict, Counter

# Initialize workspace utilities
ws_utils = KBWSUtils(kb_version="appdev")

print("KBase Workspace Utilities initialized")
print(f"Workspace URL: {ws_utils.workspace_url}")

2025-12-13 22:57:17,831 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-13 22:57:17,832 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-13 22:57:17,832 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


/Users/chenry/Dropbox/Projects/KBUtilLib/src


2025-12-13 22:57:18,201 - __main__.NotebookUtil - INFO - Notebook environment detected
2025-12-13 22:57:18,204 - kbutillib.kb_ws_utils.KBWSUtils - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-13 22:57:18,204 - kbutillib.kb_ws_utils.KBWSUtils - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-13 22:57:18,205 - kbutillib.kb_ws_utils.KBWSUtils - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


KBase Workspace Utilities initialized
Workspace URL: https://appdev.kbase.us/services/ws


In [None]:
# List all available modules
print("Available KBase Modules:")
print("=" * 60)

module_list = sorted(modules.keys())
print(f"Total modules: {len(module_list)}")
print()
print("Modules (alphabetical):")
for i, mod in enumerate(module_list, 1):
    type_count = len(modules[mod])
    print(f"  {i:2d}. {mod} ({type_count} types)")
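The dryrun cell below expects an `example_typespec` string, but this string is not defined in the notebook. A hypothetical KIDL module that declares the three types it names might look like the one below. The module and field names are illustrative, and registering it would require ownership of `ExampleModule`. The regex is only a quick sanity check that the listed names close `typedef structure` blocks; it is not a KIDL parser:

```python
import re

# Hypothetical KIDL module; module, type, and field names are illustrative only
example_typespec = """
/*
A hypothetical example module.
*/
module ExampleModule {
    /* A single measured sample */
    typedef structure {
        string id;
        string name;
        mapping<string, float> measurements;
    } Sample;

    /* The result of analysing one sample */
    typedef structure {
        string sample_id;
        float score;
    } AnalysisResult;

    typedef structure {
        list<AnalysisResult> results;
    } AnalysisResultSet;
};
"""

def declared_structures(kidl):
    """Naively list the names that close a 'typedef structure { ... } Name;' block."""
    return re.findall(r"\}\s*(\w+)\s*;", kidl)

names = declared_structures(example_typespec)
print(names)
```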

In [None]:
# Example: Test typespec compilation (dryrun)
print("Testing typespec compilation (dryrun mode):")
print("=" * 60)

'''
# Remove the surrounding triple quotes to test registration
# (requires example_typespec to be defined and permissions on the module)
result = util.register_typespec_dryrun(
    example_typespec,
    ['Sample', 'AnalysisResult', 'AnalysisResultSet'],
    dryrun=True
)

if 'error' in result:
    print(f"Error: {result['error']}")
else:
    print("Typespec compiled successfully!")
    print(f"Generated schemas for {len(result)} types")
    for type_name, schema in result.items():
        print(f"  - {type_name}")
'''

print("Dryrun test code ready (commented out)")
print("Uncomment to test with proper workspace permissions")

## 7. Module Ownership and Registration Workflow

Complete workflow for creating and registering new typespec modules.

In [None]:
# Re-initialize workspace utilities against appdev (not needed if section 6 was run)
%run util.py

from kbutillib import KBWSUtils
import json
from pathlib import Path
from collections import defaultdict, Counter

# Initialize workspace utilities
ws_utils = KBWSUtils(kb_version="appdev")

print("KBase Workspace Utilities initialized")
print(f"Workspace URL: {ws_utils.workspace_url}")

In [None]:
# Get info about an existing module
print("Getting module information:")
print("=" * 60)

# Try to get info about KBaseGenomes module
module_info = util.get_module_info('KBaseGenomes')

if 'error' not in module_info:
    print("Module: KBaseGenomes")
    print(f"Version: {module_info.get('ver', 'N/A')}")
    print(f"Owners: {module_info.get('owners', [])}")
    print(f"Is Released: {module_info.get('is_released', False)}")
    print(f"Number of Types: {len(module_info.get('types', {}))}")
    print()
    print("Types in module:")
    for type_name in list(module_info.get('types', {}).keys())[:10]:
        print(f"  - {type_name}")
else:
    print(f"Error: {module_info['error']}")

In [None]:
# List module versions
print("Module version history:")
print("=" * 60)

versions = util.list_module_versions('KBaseGenomes')

if 'error' not in versions:
    print(f"Module: {versions.get('mod', 'N/A')}")
    print(f"All versions: {versions.get('vers', [])}")
    print(f"Released versions: {versions.get('released_vers', [])}")
else:
    print(f"Error: {versions['error']}")
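In the Workspace service, module version numbers are timestamps in milliseconds since the Unix epoch. This is an assumption worth confirming against the values actually returned. If it holds, the `vers` list could be made readable as shown below. `describe_versions` is an illustrative name, and the inputs are made up:

```python
from datetime import datetime, timezone

def describe_versions(vers, released=()):
    """Format epoch-millisecond module versions as UTC dates, flagging released ones."""
    released = set(released)
    lines = []
    for ver in sorted(vers):
        when = datetime.fromtimestamp(ver / 1000, tz=timezone.utc)
        flag = " (released)" if ver in released else ""
        lines.append(f"{ver}: {when:%Y-%m-%d %H:%M:%S} UTC{flag}")
    return lines

# Hypothetical version list
for line in describe_versions([0, 86_400_000], released=[86_400_000]):
    print(line)
```

With real data this would be `describe_versions(versions.get('vers', []), versions.get('released_vers', []))`.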

## 8. Complete Registration Workflow Example

Step-by-step example of registering a new typespec module.

In [2]:
%run util.py

# Combine the notebook helpers with the workspace utilities, targeting appdev
class NotebookUtil(NotebookUtils, KBWSUtils):
    def __init__(self, **kwargs):
        super().__init__(
            notebook_folder=script_dir,
            name="KBWSUtils Example",
            kb_version="appdev",
            **kwargs
        )

# Initialize the NotebookUtil instance
util = NotebookUtil()
print(f"Workspace URL: {util.workspace_url}")

2025-12-14 00:20:45,474 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2025-12-14 00:20:45,475 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2025-12-14 00:20:45,475 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2025-12-14 00:20:45,476 - __main__.NotebookUtil - INFO - Notebook environment detected


/Users/chenry/Dropbox/Projects/KBUtilLib/src
Workspace URL: https://appdev.kbase.us/services/ws


In [3]:
# Complete workflow for registering a new module

# Complete Typespec Registration Workflow
# ========================================

# Step 1: Request module ownership (requires admin approval)
#result = util.request_module_ownership('MyNewModule')
#print(f"Ownership request: {result}")

# Step 2: Wait for admin approval...
# (This is done through KBase admin processes)

# Step 3: Define your typespec
# Load from data/KBaseFBA.spec
my_typespec = Path("data/KBaseFBA.spec").read_text()

# Step 4: Compile and register the listed new types
# (dryrun=False commits the registration; use dryrun=True to only test compilation)
result = util.register_typespec_dryrun(
    my_typespec,
    ['GenomeDataLakeTables'],
    dryrun=False
)
if 'error' in result:
    print(f"Compilation error: {result['error']}")
else:
    print("Compilation successful!")

cleaned_result = {}
for key, value in result.items():
    if isinstance(value, str):
        try:
            cleaned_result[key] = json.loads(value)
        except json.JSONDecodeError:
            cleaned_result[key] = value
    else:
        cleaned_result[key] = value

util.save("KBaseFBARegistration",cleaned_result)

Note: Registering new typespecs requires:
  1. Module ownership (request via request_module_ownership)
  2. Admin approval for new modules
  3. Valid KIDL syntax
Compilation successful!
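The decode-if-JSON loop appears verbatim in both the registration cell and the release cell. It could be factored into one helper, as sketched below. The name `parse_json_values` is illustrative:

```python
import json

def parse_json_values(result):
    """Return a copy of result with JSON-encoded string values decoded.

    Strings that are not valid JSON are kept unchanged.
    """
    cleaned = {}
    for key, value in result.items():
        if isinstance(value, str):
            try:
                cleaned[key] = json.loads(value)
            except json.JSONDecodeError:
                cleaned[key] = value
        else:
            cleaned[key] = value
    return cleaned

print(parse_json_values({"schema": '{"type": "object"}', "note": "not json", "n": 3}))
```

Each cell could then end with a single call such as `util.save("KBaseFBARegistration", parse_json_values(result))`.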


In [4]:
# Step 5: Release module for general use
result = util.release_module('KBaseFBA')

cleaned_result = {}
for key, value in result.items():
    if isinstance(value, str):
        try:
            cleaned_result[key] = json.loads(value)
        except json.JSONDecodeError:
            cleaned_result[key] = value
    else:
        cleaned_result[key] = value

util.save("KBaseFBARelease",cleaned_result)

print("Complete Registration Workflow:")
print("=" * 60)

Complete Registration Workflow:


## Summary

This notebook demonstrated:

1. **Retrieving Datatypes** - List all available KBase datatypes and organize by module
2. **Type Specifications** - Fetch detailed specs including JSON schemas and descriptions
3. **Batch Retrieval** - Efficiently retrieve specs for many types with progress tracking
4. **Documentation Generation** - Create markdown documentation from type specs
5. **Typespec Modules** - Understanding KIDL syntax and module structure
6. **Module Management** - Request ownership, list versions, get module info
7. **Registration Workflow** - Complete process for creating and registering new types

### Key Functions

```python
# List all types
all_types = ws_utils.list_all_types()

# Get type specifications
specs = ws_utils.get_type_specs(['KBaseGenomes.Genome'])

# Module management (requires permissions), called on the NotebookUtil instance
util.request_module_ownership('ModuleName')
util.get_module_info('ModuleName')
util.list_module_versions('ModuleName')
util.release_module('ModuleName')
```

### Next Steps

- Explore specific module types for your use case
- Design custom typespecs for new data types
- Work with KBase admins for module registration
- Generate comprehensive documentation for your types