# Storage Management with idspy

This notebook demonstrates the idspy storage system, which gives pipelines a uniform way to store and retrieve state. The storage components hide where the data actually lives, so the same pipeline code can read and write state across executions regardless of the backend.

## What you'll learn

In this tutorial, you'll discover how to:

1. **Understand the Storage Interface** - The abstract base class and its contract
2. **Use DictStorage** - Simple in-memory storage with dictionary backing
3. **Apply Storage Predicates** - Utility functions for checking storage state
4. **Implement BindedStorage** - Advanced key mapping and translation
5. **Manage Configuration Data** - Storing and retrieving complex settings

## Key Benefits

- **Abstraction**: Clean interface hiding storage implementation details
- **Flexibility**: Swap storage backends without changing pipeline code  
- **State Management**: Persist data across pipeline executions
- **Key Mapping**: Translate between internal and external key names
- **Validation**: Built-in predicates for checking storage state

---

Let's start by setting up our environment and exploring the storage system components.

In [8]:
import sys
import os

# Add the project root to Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

In [9]:
from src.idspy.core.storage.base import Storage, has_key, has_keys, lacks_key, lacks_keys
from src.idspy.core.storage.dict import DictStorage
from src.idspy.core.storage.proxy import BindedStorage

# Sample configuration data for examples
sample_config = {
    "model_name": "neural_network_v1",
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100,
    "optimizer": "adam",
    "loss_function": "cross_entropy"
}

# Sample metrics data
sample_metrics = {
    "accuracy": 0.95,
    "precision": 0.92,
    "recall": 0.89,
    "f1_score": 0.905
}

## Storage Interface Overview

The **Storage** abstract base class defines a consistent interface for all storage implementations. It declares six abstract methods (`get`, `set`, `has`, `delete`, `clear`, and `as_dict`) that together cover CRUD operations on single keys and on batches of keys.

In [10]:
# Examine the Storage abstract base class methods
print("Storage Interface Methods:")
print("=" * 40)

abstract_methods = []
for method_name in dir(Storage):
    if not method_name.startswith('_'):
        method = getattr(Storage, method_name)
        if hasattr(method, '__isabstractmethod__') and method.__isabstractmethod__:
            abstract_methods.append(method_name)

for method_name in abstract_methods:
    print(f"  • {method_name}() - Abstract method that must be implemented")

Storage Interface Methods:
========================================
  • as_dict() - Abstract method that must be implemented
  • clear() - Abstract method that must be implemented
  • delete() - Abstract method that must be implemented
  • get() - Abstract method that must be implemented
  • has() - Abstract method that must be implemented
  • set() - Abstract method that must be implemented
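
To make the contract concrete, here is a minimal sketch of what an abstract base class with these six methods could look like. `StorageSketch`, `abstract_method_names`, and all signatures are assumptions for illustration, not the actual idspy definitions. The sketch also shows that the standard `__abstractmethods__` attribute, which `abc.ABCMeta` sets on every abstract class, yields the same list as the introspection loop above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class StorageSketch(ABC):
    """Hypothetical stand-in for the Storage interface; signatures are assumed."""

    @abstractmethod
    def get(self, keys: Iterable[str]) -> Dict[str, Any]: ...

    @abstractmethod
    def set(self, values: Dict[str, Any]) -> None: ...

    @abstractmethod
    def has(self, key: str) -> bool: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def clear(self) -> None: ...

    @abstractmethod
    def as_dict(self) -> Dict[str, Any]: ...


def abstract_method_names(cls: type) -> List[str]:
    """Return the sorted names of methods a subclass still has to implement."""
    return sorted(getattr(cls, "__abstractmethods__", frozenset()))


print(abstract_method_names(StorageSketch))
# ['as_dict', 'clear', 'delete', 'get', 'has', 'set']
```

Because every method is abstract, calling `StorageSketch()` directly raises a `TypeError`; only a subclass that implements all six methods can be instantiated.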


## DictStorage - Simple In-Memory Storage

**DictStorage** is the simplest storage implementation, using a Python dictionary as the backing store. It's perfect for development, testing, and single-session workflows.

In [11]:
# Create a DictStorage instance and demonstrate basic operations
storage = DictStorage(sample_config)

### Basic CRUD Operations

The following cells read, check, update, and delete entries in the storage created above.

In [12]:
# GET: Retrieve specific keys
retrieved = storage.get(["model_name", "learning_rate", "nonexistent_key"])

print("Retrieved values:")
for key, value in retrieved.items():
    print(f"  {key}: {value}")

print("\nNote: 'nonexistent_key' is not returned since it doesn't exist")

Retrieved values:
  model_name: neural_network_v1
  learning_rate: 0.001

Note: 'nonexistent_key' is not returned since it doesn't exist
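
Since missing keys are silently omitted from the result, a caller who needs to know which keys were absent has to compare the request with the response. The sketch below shows one way to do that. `missing_keys` is a hypothetical helper, and a plain dict stands in for the value returned by `storage.get(...)`.

```python
from typing import Any, List, Mapping, Sequence


def missing_keys(requested: Sequence[str], retrieved: Mapping[str, Any]) -> List[str]:
    """Return the requested keys absent from the result, in request order."""
    return [key for key in requested if key not in retrieved]


requested = ["model_name", "learning_rate", "nonexistent_key"]

# Plain dict standing in for the result of storage.get(requested).
retrieved = {"model_name": "neural_network_v1", "learning_rate": 0.001}

print(missing_keys(requested, retrieved))  # ['nonexistent_key']
```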


In [13]:
# HAS: Check key existence
keys_to_check = ["model_name", "learning_rate", "nonexistent_key"]

print("Key existence checks:")
for key in keys_to_check:
    exists = storage.has(key)
    print(f"  '{key}': {exists}")

Key existence checks:
  'model_name': True
  'learning_rate': True
  'nonexistent_key': False


In [14]:
# UPDATE: Modify existing values and add new ones
updates = {
    "learning_rate": 0.0005,  # Update existing
    "validation_split": 0.2,  # Add new
    "dropout_rate": 0.3       # Add new
}

storage.set(updates)

print("After updates:")
updated_config = storage.as_dict()
for key, value in updated_config.items():
    marker = " (updated)" if key in updates else ""
    print(f"  {key}: {value}{marker}")

After updates:
  model_name: neural_network_v1
  learning_rate: 0.0005 (updated)
  batch_size: 32
  epochs: 100
  optimizer: adam
  loss_function: cross_entropy
  validation_split: 0.2 (updated)
  dropout_rate: 0.3 (updated)


In [15]:
# DELETE: Remove specific key
print(f"Before deletion - has 'dropout_rate': {storage.has('dropout_rate')}")

storage.delete('dropout_rate')

print(f"After deletion - has 'dropout_rate': {storage.has('dropout_rate')}")
print(f"Current keys: {list(storage.as_dict().keys())}")

Before deletion - has 'dropout_rate': True
After deletion - has 'dropout_rate': False
Current keys: ['model_name', 'learning_rate', 'batch_size', 'epochs', 'optimizer', 'loss_function', 'validation_split']


### Storage Predicates

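Predicate functions check conditions on storage state. Each one is built from a key (or a list of keys) and returns a callable that takes a storage and returns a boolean. As the output below shows, `lacks_keys` is true only when every listed key is absent.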

In [16]:
# Test various storage predicates
required_keys = ["model_name", "learning_rate", "batch_size"]
optional_keys = ["validation_split", "early_stopping"]

print("Storage predicate tests:")
print(f"  has_key('model_name'): {has_key('model_name')(storage)}")
print(f"  has_keys({required_keys}): {has_keys(required_keys)(storage)}")
print(f"  lacks_key('early_stopping'): {lacks_key('early_stopping')(storage)}")
print(f"  lacks_keys({optional_keys}): {lacks_keys(optional_keys)(storage)}")

print(f"\nCurrent storage keys: {list(storage.as_dict().keys())}")

Storage predicate tests:
  has_key('model_name'): True
  has_keys(['model_name', 'learning_rate', 'batch_size']): True
  lacks_key('early_stopping'): True
  lacks_keys(['validation_split', 'early_stopping']): False

Current storage keys: ['model_name', 'learning_rate', 'batch_size', 'epochs', 'optimizer', 'loss_function', 'validation_split']
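
The two-step call pattern (`has_keys(keys)(storage)`) suggests that each predicate is a factory returning a closure. Below is a minimal sketch of that pattern, assuming the semantics observed in the output above. The `*_sketch` names and the `Predicate` alias are hypothetical, and a plain dict stands in for a storage object.

```python
from typing import Any, Callable, Iterable, Mapping

# Hypothetical type alias: a predicate takes a store and returns a bool.
Predicate = Callable[[Mapping[str, Any]], bool]


def has_keys_sketch(keys: Iterable[str]) -> Predicate:
    """Build a predicate that is true when every key is present."""
    wanted = list(keys)
    return lambda store: all(key in store for key in wanted)


def lacks_keys_sketch(keys: Iterable[str]) -> Predicate:
    """Build a predicate that is true when every key is absent."""
    unwanted = list(keys)
    return lambda store: not any(key in store for key in unwanted)


# A plain dict stands in for a storage object in this sketch.
store = {
    "model_name": "neural_network_v1",
    "learning_rate": 0.0005,
    "validation_split": 0.2,
}

ready_to_train = has_keys_sketch(["model_name", "learning_rate"])
no_optional = lacks_keys_sketch(["validation_split", "early_stopping"])

print(ready_to_train(store))  # True
print(no_optional(store))     # False, because 'validation_split' is present
```

Building the predicate first and applying it later means a condition can be defined once and then evaluated against whatever storage is in use at that point.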


### Clear Storage

Calling `clear()` removes every entry from the storage.

In [17]:
print(f"Before clear: {len(storage.as_dict())} items")

storage.clear()

print(f"After clear: {len(storage.as_dict())} items")
print(f"Storage contents: {storage.as_dict()}")

Before clear: 7 items
After clear: 0 items
Storage contents: {}


## BindedStorage with Key Mapping

**BindedStorage** wraps another storage and translates between external (caller-facing) keys and the internal keys used by the wrapped storage.

In [18]:
# Create a base storage with internal naming convention
internal_storage = DictStorage({
    "mdl_nm": "neural_network_v2",
    "lr": 0.002,
    "bs": 64,
    "ep": 150,
    "opt": "sgd"
})

print("Internal storage (abbreviated keys):")
for key, value in internal_storage.as_dict().items():
    print(f"  {key}: {value}")

Internal storage (abbreviated keys):
  mdl_nm: neural_network_v2
  lr: 0.002
  bs: 64
  ep: 150
  opt: sgd


### Key Mapping Configuration

Define mappings between user-friendly external keys and internal abbreviated keys.

In [19]:
# Define key mappings (external -> internal)
key_mappings = {
    "model_name": "mdl_nm",
    "learning_rate": "lr",
    "batch_size": "bs",
    "epochs": "ep",
    "optimizer": "opt"
}

# Create binded storage with key translation
binded_storage = BindedStorage(internal_storage, key_mappings, strict=False)

print("Key mappings (external -> internal):")
for ext, internal in key_mappings.items():
    print(f"  '{ext}' -> '{internal}'")

Key mappings (external -> internal):
  'model_name' -> 'mdl_nm'
  'learning_rate' -> 'lr'
  'batch_size' -> 'bs'
  'epochs' -> 'ep'
  'optimizer' -> 'opt'


### External Key Interface

Access internal storage using user-friendly external keys.

In [20]:
# Access data using external (user-friendly) keys
print("Accessing via external keys:")
external_data = binded_storage.as_dict()
for key, value in external_data.items():
    print(f"  {key}: {value}")

print(f"\nCompare with internal storage keys: {list(internal_storage.as_dict().keys())}")

Accessing via external keys:
  model_name: neural_network_v2
  learning_rate: 0.002
  batch_size: 64
  epochs: 150
  optimizer: sgd

Compare with internal storage keys: ['mdl_nm', 'lr', 'bs', 'ep', 'opt']


In [21]:
# GET operations using external keys
config_subset = binded_storage.get(["model_name", "learning_rate", "epochs"])

print("Retrieved config subset:")
for key, value in config_subset.items():
    print(f"  {key}: {value}")

Retrieved config subset:
  model_name: neural_network_v2
  learning_rate: 0.002
  epochs: 150


In [22]:
# SET operations using external keys
new_config = {
    "learning_rate": 0.001,     # Update existing
    "batch_size": 128,          # Update existing
    "momentum": 0.9             # Add new (unmapped key)
}

binded_storage.set(new_config)

print("After setting new config:")
print("External view:")
for key, value in binded_storage.as_dict().items():
    print(f"  {key}: {value}")

print("\nInternal storage view:")
for key, value in internal_storage.as_dict().items():
    print(f"  {key}: {value}")

After setting new config:
External view:
  model_name: neural_network_v2
  learning_rate: 0.001
  batch_size: 128
  epochs: 150
  optimizer: sgd
  momentum: 0.9

Internal storage view:
  mdl_nm: neural_network_v2
  lr: 0.001
  bs: 128
  ep: 150
  opt: sgd
  momentum: 0.9
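
The two views above can be reproduced with a forward mapping for writes and its inverse for reads. This is a minimal sketch of that idea, not the BindedStorage implementation. `translate_keys` is a hypothetical helper, and plain dicts stand in for the storages.

```python
from typing import Any, Dict


def translate_keys(data: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename keys found in the mapping; pass unmapped keys through unchanged."""
    return {mapping.get(key, key): value for key, value in data.items()}


key_mappings = {"learning_rate": "lr", "batch_size": "bs"}

# Inverse mapping (internal -> external), used when reading back.
inverse_mappings = {internal: external for external, internal in key_mappings.items()}

# Plain dict standing in for the wrapped internal storage.
internal = {"lr": 0.002, "bs": 64}

# Write with external keys: translate first, then store.
internal.update(translate_keys({"learning_rate": 0.001, "momentum": 0.9}, key_mappings))

# Read with external keys: translate the internal view back.
external_view = translate_keys(internal, inverse_mappings)

print(internal)       # {'lr': 0.001, 'bs': 64, 'momentum': 0.9}
print(external_view)  # {'learning_rate': 0.001, 'batch_size': 64, 'momentum': 0.9}
```

Passing unmapped keys through unchanged mirrors what the non-strict example shows for `momentum`, which appears under the same name in both views.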


### Key Existence and Deletion

Test key existence checks and deletion operations through the binding layer.

In [23]:
# Test key existence with external keys
test_keys = ["model_name", "learning_rate", "nonexistent", "momentum"]

print("Key existence tests:")
for key in test_keys:
    exists = binded_storage.has(key)
    print(f"  '{key}': {exists}")

Key existence tests:
  'model_name': True
  'learning_rate': True
  'nonexistent': False
  'momentum': True


In [24]:
# Delete using external key
print(f"Before deletion - has 'epochs': {binded_storage.has('epochs')}")
print(f"Internal storage has 'ep': {internal_storage.has('ep')}")

binded_storage.delete('epochs')

print(f"\nAfter deletion - has 'epochs': {binded_storage.has('epochs')}")
print(f"Internal storage has 'ep': {internal_storage.has('ep')}")

print(f"\nRemaining keys (external): {list(binded_storage.as_dict().keys())}")
print(f"Remaining keys (internal): {list(internal_storage.as_dict().keys())}")

Before deletion - has 'epochs': True
Internal storage has 'ep': True

After deletion - has 'epochs': False
Internal storage has 'ep': False

Remaining keys (external): ['model_name', 'learning_rate', 'batch_size', 'optimizer', 'momentum']
Remaining keys (internal): ['mdl_nm', 'lr', 'bs', 'opt', 'momentum']


## Strict Mode Key Binding

With `strict=True`, `BindedStorage` accepts only external keys that appear in the mapping. Using an unmapped key raises a `KeyError`, as the cells below show.

In [25]:
# Create strict binded storage
strict_storage = DictStorage({"lr": 0.01, "bs": 32})
strict_mappings = {"learning_rate": "lr", "batch_size": "bs"}

strict_binded = BindedStorage(strict_storage, strict_mappings, strict=True)

print("Strict mode storage created")
print(f"Available external keys: {list(strict_mappings.keys())}")

# This works - using mapped key
print(f"\nAccessing 'learning_rate': {strict_binded.get(['learning_rate'])}")

Strict mode storage created
Available external keys: ['learning_rate', 'batch_size']

Accessing 'learning_rate': {'learning_rate': 0.01}


In [26]:
# This will raise an error in strict mode - using unmapped key
try:
    strict_binded.set({"epochs": 100})  # 'epochs' not in mapping
    print("Successfully set unmapped key")
except KeyError as e:
    print(f"Strict mode error: {e}")

# This works - using mapped key
try:
    strict_binded.set({"learning_rate": 0.005})
    print("Successfully set mapped key 'learning_rate'")
    print(f"Updated value: {strict_binded.get(['learning_rate'])}")
except KeyError as e:
    print(f"Unexpected error: {e}")

Strict mode error: "Unmapped external key: 'epochs'"
Successfully set mapped key 'learning_rate'
Updated value: {'learning_rate': 0.005}
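
The difference between strict and lenient binding comes down to how an unmapped key is resolved. Below is a minimal sketch, assuming the behavior seen in the outputs of this notebook. `resolve_key` is a hypothetical helper, and its error message is modeled on the one printed above.

```python
from typing import Dict


def resolve_key(key: str, mapping: Dict[str, str], strict: bool) -> str:
    """Map an external key to its internal name (hypothetical helper)."""
    if key in mapping:
        return mapping[key]
    if strict:
        # Assumed behavior, modeled on the error message in the output above.
        raise KeyError(f"Unmapped external key: {key!r}")
    return key  # Lenient mode: unmapped keys pass through unchanged.


mappings = {"learning_rate": "lr", "batch_size": "bs"}

print(resolve_key("learning_rate", mappings, strict=True))  # lr
print(resolve_key("epochs", mappings, strict=False))        # epochs

try:
    resolve_key("epochs", mappings, strict=True)
except KeyError as err:
    print(f"Strict mode error: {err}")
```

The double quotes around the printed error message come from Python itself: converting a `KeyError` to a string returns the `repr` of its argument, so the message is shown quoted.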


## Key Takeaways

1. **Storage Abstraction**: Clean interface that hides implementation details
2. **DictStorage**: Simple in-memory storage perfect for development and testing
3. **Storage Predicates**: Utility functions for checking storage state and validation
4. **BindedStorage**: Advanced wrapper providing key mapping and translation
5. **Batch Operations**: `get` and `set` handle multiple keys in a single call
