# Development Utilities Notebook

This notebook collects helpers for testing and debugging during development: reloading project modules without restarting the kernel, and running quick checks against the utility modules.

## Contents
1. Setup and Path Configuration
2. Module Reloading Utilities
3. Test Cases
4. Example Workflows

## 1. Setup and Path Configuration

First, add the project root to `sys.path` so that the project modules can be imported from this notebook.

In [None]:
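import sys
import importlib
from pathlib import Path

# Add project root to path. The relative path is resolved against the
# current working directory, so this assumes the kernel was started in
# the notebook's own folder.
project_root = Path("../..").resolve()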
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

print(f"Project root added to path: {project_root}")
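Because `Path("../..")` depends on where the kernel was started, the setup cell can point at the wrong directory when the notebook is launched from elsewhere. A more robust alternative, sketched here under the assumption that the project root contains a marker such as `.git` or `pyproject.toml`, walks upward until it finds one. The helper name `find_project_root` and the marker names are hypothetical and not taken from the project.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_project_root(
    start: Optional[Path] = None,
    markers: Iterable[str] = (".git", "pyproject.toml"),
) -> Path:
    """Return the nearest ancestor of `start` that contains one of `markers`.

    Hypothetical helper: the marker names are assumptions, not something
    the project is known to use.
    """
    current = (start or Path.cwd()).resolve()
    markers = tuple(markers)
    # Check the starting directory first, then each parent in turn.
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No directory with any of {markers} above {current}")
```

With a helper like this, `project_root = find_project_root()` would replace the hard-coded relative path, at the cost of relying on the marker being present.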

## 2. Module Reloading Utilities

These functions reload edited modules in the running kernel, so that source changes take effect without a restart.

In [None]:
def reload_module(module_name: str) -> None:
    """Reload a module by name.
    
    Args:
        module_name: Full module path (e.g., 'utils.notebook_utils.dataset_utils')
    """
    if module_name in sys.modules:
        print(f"Reloading {module_name}...")
        importlib.reload(sys.modules[module_name])
    else:
        print(f"Module {module_name} not loaded yet")

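def clear_module_cache(module_prefix: str = 'utils') -> None:
    """Clear the cached package `module_prefix` and all of its submodules.
    
    Args:
        module_prefix: Dotted name of the package to clear (e.g., 'utils')
    """
    # Match the package itself or its submodules, so that 'utils' does not
    # also remove unrelated modules such as 'utils_extra'.
    modules_to_clear = [
        m for m in sys.modules
        if m == module_prefix or m.startswith(module_prefix + '.')
    ]
    for m in modules_to_clear:
        del sys.modules[m]
    print(f"Cleared {len(modules_to_clear)} modules from cache")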

def reload_all_utils() -> None:
    """Reload all utility modules in the correct order."""
    # Clear existing cache so the imports below re-execute the source files
    clear_module_cache('utils')
    importlib.invalidate_caches()
    
    # Re-import in dependency order. After the cache is cleared, a plain
    # import already loads fresh code, so calling importlib.reload as well
    # would only execute each module a second time.
    importlib.import_module('utils.notebook_utils.dataset_utils')
    importlib.import_module('utils.notebook_utils.document_utils')
    importlib.import_module('utils.notebook_utils.importable')
    
    print("All utility modules reloaded")
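One caveat applies to all of these helpers: a name bound with `from module import name` keeps pointing at the old object after a reload, so such names have to be imported again. The sketch below demonstrates this with a throwaway module written to a temporary directory. The module name `scratch_mod_demo` and its `VERSION` attribute are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module (hypothetical name) in a temporary directory.
tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir) / "scratch_mod_demo.py"
module_file.write_text("VERSION = 'old'\n")
sys.path.insert(0, tmp_dir)

import scratch_mod_demo
from scratch_mod_demo import VERSION  # binds the current value to a local name

# Edit the source and reload the module object in place. The new text has a
# different length, so a cached .pyc is not reused even if both writes land
# in the same second.
module_file.write_text("VERSION = 'updated'\n")
importlib.invalidate_caches()
importlib.reload(scratch_mod_demo)

stale_value = VERSION            # the from-import binding still holds 'old'
print(scratch_mod_demo.VERSION)  # attribute access sees the reloaded value
print(stale_value)

# Importing the name again picks up the reloaded value.
from scratch_mod_demo import VERSION
print(VERSION)
```

The same applies to classes and functions: instances created before a reload keep their old class, so objects usually need to be recreated after reloading.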

## 3. Test Cases

The test cases below are smoke checks for the utilities: they print what they find and report errors instead of raising.

In [None]:
def test_dataset_utils():
    """Test dataset utilities functionality."""
    from utils.notebook_utils.dataset_utils import load_labeled_dataset, DATASET_REGISTRY
    
    print("Available datasets in registry:")
    for name, info in DATASET_REGISTRY.items():
        print(f"- {name}: {info['description']}")
    
    # Test dataset loading
    dataset_dir = project_root / "datasets/rag_evaluation/labeled/covid19_origin"
    print(f"\nTesting dataset loading from {dataset_dir}")
    
    try:
        dataset, documents = load_labeled_dataset(dataset_dir, download_if_missing=True)
        print(f"Successfully loaded dataset with {len(dataset.examples)} examples and {len(documents)} documents")
    except Exception as e:
        print(f"Error loading dataset: {e}")

def run_all_tests():
    """Run all test cases."""
    print("Running dataset utils tests...\n")
    test_dataset_utils()
    print("\nAll tests completed")
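Because `test_dataset_utils` catches its own exceptions, `run_all_tests` cannot tell whether anything failed. One possible variant, sketched here with hypothetical names (`run_checks` and the two stand-in checks), lets each check raise and records the outcome per check.

```python
import traceback
from typing import Callable, Dict, Iterable


def run_checks(checks: Iterable[Callable[[], None]]) -> Dict[str, bool]:
    """Run each check, print PASS/FAIL, and return a name -> passed mapping."""
    results: Dict[str, bool] = {}
    for check in checks:
        try:
            check()
        except Exception:
            # Keep going after a failure, but show where it came from.
            traceback.print_exc()
            results[check.__name__] = False
        else:
            results[check.__name__] = True
        status = "PASS" if results[check.__name__] else "FAIL"
        print(f"{status}: {check.__name__}")
    return results


# Stand-in checks for illustration only; real checks would call the utilities.
def check_passes() -> None:
    assert 1 + 1 == 2


def check_fails() -> None:
    raise ValueError("simulated failure")


results = run_checks([check_passes, check_fails])
```

Returning a mapping rather than only printing makes it possible to act on the outcome, for example by stopping a workflow cell when any value is `False`.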

## 4. Example Workflows

Here are some common development workflows.

### Workflow 1: Update and Test Dataset Utils

Use this workflow after making changes to `dataset_utils.py`:

In [None]:
# 1. Reload all utilities
reload_all_utils()

# 2. Run tests
test_dataset_utils()

### Workflow 2: Debug Dataset Loading

Use this workflow to debug dataset loading issues:

In [None]:
# 1. Reload modules
reload_all_utils()

# 2. Import with fresh modules
from utils.notebook_utils.dataset_utils import load_labeled_dataset

# 3. Test specific dataset
dataset_dir = project_root / "datasets/rag_evaluation/labeled/covid19_origin"
try:
    dataset, documents = load_labeled_dataset(dataset_dir, download_if_missing=True)
    print(f"Success! Loaded {len(dataset.examples)} examples")
except Exception as e:
    print(f"Error: {e}")
    
    # 4. Debug directory structure
    print("\nChecking directory structure:")
    print(f"Dataset directory exists: {dataset_dir.exists()}")
    if dataset_dir.exists():
        print("Contents:")
        for item in dataset_dir.glob("*"):
            print(f"- {item.name}")
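The `glob("*")` listing in the debugging step shows only the top level of the dataset directory. When the missing piece is nested deeper, a recursive listing can help. The helper below, `list_tree`, is a hypothetical sketch and not part of the project's utilities.

```python
from pathlib import Path
from typing import List


def list_tree(root: Path, max_depth: int = 2) -> List[str]:
    """Return indented entry names under `root`, down to `max_depth` levels."""
    lines: List[str] = []

    def walk(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Directories first, then files, each group sorted by name.
        for item in sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name)):
            suffix = "/" if item.is_dir() else ""
            lines.append("  " * (depth - 1) + f"- {item.name}{suffix}")
            if item.is_dir():
                walk(item, depth + 1)

    walk(root, 1)
    return lines
```

In the workflow above, `print("\n".join(list_tree(dataset_dir)))` could replace the `glob` loop once `dataset_dir.exists()` has been confirmed.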