# Blueprint Catalog API

The `catalog` module provides programmatic access to the blueprints stored in the blueprints directory. Use it to discover, query, and load the blueprint data needed to instantiate `OcnModel` objects.


## Overview

The `BlueprintCatalog` class provides methods to:

- Discover blueprint files in the blueprints directory
- Load individual blueprint YAML files
- Extract grid parameters from grid YAML files
- Load all blueprints into a pandas DataFrame with all data needed to instantiate `OcnModel` objects


## Basic Usage

The module provides a convenience instance `blueprint` that you can use directly:


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import catalog

### Finding Blueprint Files

You can find all blueprint files in the blueprints directory:


In [3]:
# Find all blueprint files
blueprint_files = catalog.blueprint.find_blueprint_files()
print(f"Found {len(blueprint_files)} blueprint files:")
for bp_file in blueprint_files:
    print(f"  - {bp_file.name}")


Found 4 blueprint files:
  - blueprint_roms-marbl-ccs-12km.yml
  - blueprint_roms-marbl-gulf-guinea-toy.yml
  - blueprint_roms-marbl-hvalfjörður-0.yml
  - blueprint_roms-marbl-wio-toy.yml
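
For intuition, discovery of this kind can be sketched with a plain glob. This is not the `BlueprintCatalog` implementation: the function name `find_blueprint_files_sketch` is hypothetical, and the `blueprint_*.yml` pattern is an assumption inferred from the file names listed above.

```python
from pathlib import Path
import tempfile


def find_blueprint_files_sketch(directory):
    """Return blueprint YAML files in `directory`, sorted by path.

    Assumption: blueprints follow the `blueprint_*.yml` naming seen above.
    """
    return sorted(Path(directory).glob("blueprint_*.yml"))


# Demonstrate on a throwaway directory rather than the real blueprints directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("blueprint_b.yml", "blueprint_a.yml", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = [p.name for p in find_blueprint_files_sketch(tmp)]

print(found_names)
```

Sorting the result keeps the order stable across runs, which matters if you index into the list as the cells below do with `blueprint_files[0]`.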


### Loading a Single Blueprint

You can load and inspect a single blueprint file:


In [4]:
# Load a single blueprint
if blueprint_files:
    bp_data = catalog.blueprint.load_blueprint(blueprint_files[0])
    print(f"Grid name: {bp_data.get('grid_name')}")
    print(f"Model name: {bp_data.get('model_spec', {}).get('name')}")
    print(f"Start time: {bp_data.get('start_time')}")
    print(f"End time: {bp_data.get('end_time')}")
    print(f"Processors: {bp_data.get('np_xi')} x {bp_data.get('np_eta')}")


Grid name: ccs-12km
Model name: roms-marbl
Start time: 2024-01-01T00:00:00
End time: 2024-01-02T00:00:00
Processors: 16 x 20
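
The processor layout above implies a total process count. A minimal sketch, assuming `np_xi` and `np_eta` are the tile counts in the two horizontal directions and that the model runs one process per tile (the helper `total_ranks` is hypothetical):

```python
def total_ranks(bp_data):
    """Product of the two tile counts; assumes one process per tile."""
    return bp_data["np_xi"] * bp_data["np_eta"]


# Values copied from the output above
bp_example = {"np_xi": 16, "np_eta": 20}
ranks = total_ranks(bp_example)
print(f"{bp_example['np_xi']} x {bp_example['np_eta']} = {ranks} processes")
```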


### Loading Grid Parameters

You can extract grid keyword arguments from a grid YAML file:


In [6]:
from pathlib import Path

# Load grid kwargs from a blueprint
if blueprint_files:
    bp_data = catalog.blueprint.load_blueprint(blueprint_files[0])

    # Look up the grid YAML path under inputs -> grid -> yaml_file
    grid_input = bp_data.get("inputs", {}).get("grid", {})
    if "yaml_file" in grid_input:
        grid_yaml_path = Path(grid_input["yaml_file"])

        # Load grid kwargs
        grid_kwargs = catalog.blueprint.load_grid_kwargs(grid_yaml_path)
        print("Grid parameters:")
        for key, value in grid_kwargs.items():
            print(f"  {key}: {value}")


Grid parameters:
  nx: 224
  ny: 440
  size_x: 2688
  size_y: 5280
  center_lon: -134.5
  center_lat: 39.6
  rot: 33.3
  N: 100
  theta_s: 6.0
  theta_b: 6.0
  hc: 250
  topography_source: {'name': 'ETOPO5'}
  mask_shapefile: None
  hmin: 5.0
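
As a quick sanity check on these parameters, the nominal grid spacing follows from the domain size divided by the number of cells. The sketch below assumes `size_x` and `size_y` are given in kilometres, which is consistent with the `ccs-12km` grid name but is not stated in the output itself.

```python
# Subset of the grid parameters printed above
grid_kwargs_example = {"nx": 224, "ny": 440, "size_x": 2688, "size_y": 5280}

# Nominal spacing = domain extent / number of cells (assumes extents are in km)
dx = grid_kwargs_example["size_x"] / grid_kwargs_example["nx"]
dy = grid_kwargs_example["size_y"] / grid_kwargs_example["ny"]

print(f"dx = {dx} km, dy = {dy} km")
```

Both directions come out at 12 km for this grid.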


## Loading All Blueprints into a DataFrame

The main entry point is the `load()` method, which returns a pandas DataFrame with one row per blueprint and all the data needed to instantiate `OcnModel` objects:


In [9]:
# Load all blueprints into a DataFrame
df = catalog.blueprint.load()

print(f"Loaded {len(df)} blueprints")
print(f"\nDataFrame columns: {list(df.columns)}")
print(f"\nDataFrame shape: {df.shape}")

df

Loaded 4 blueprints

DataFrame columns: ['model_name', 'grid_name', 'grid_kwargs', 'boundaries', 'start_time', 'end_time', 'np_eta', 'np_xi', 'blueprint_path', 'grid_yaml_path', 'input_data_dir']

DataFrame shape: (4, 11)


| | model_name | grid_name | grid_kwargs | boundaries | start_time | end_time | np_eta | np_xi | blueprint_path | grid_yaml_path | input_data_dir |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | roms-marbl | ccs-12km | {'nx': 224, 'ny': 440, 'size_x': 2688, 'size_y... | {'east': True, 'north': True, 'south': True, '... | 2024-01-01T00:00:00 | 2024-01-02T00:00:00 | 20 | 16 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
| 1 | roms-marbl | gulf-guinea-toy | {'nx': 10, 'ny': 10, 'size_x': 4000, 'size_y':... | {'east': True, 'north': True, 'south': True, '... | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 5 | 2 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
| 2 | roms-marbl | hvalfjörður-0 | {'nx': 512, 'ny': 512, 'size_x': 1280, 'size_y... | {'east': True, 'north': True, 'south': True, '... | 2024-01-01T00:00:00 | 2024-01-02T00:00:00 | 16 | 16 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
| 3 | roms-marbl | wio-toy | {'nx': 20, 'ny': 20, 'size_x': 8000, 'size_y':... | {'east': True, 'north': True, 'south': True, '... | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 5 | 2 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |


### Inspecting the DataFrame

Let's look at the structure of the DataFrame:


In [10]:
# Display basic information about the DataFrame
if not df.empty:
    print("DataFrame info:")
    print(df.info())
    
    print("\nFirst few rows:")
    # Display non-dict columns for readability
    display_cols = [col for col in df.columns if col not in ['grid_kwargs', 'boundaries']]
    display(df[display_cols].head())


DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   model_name      4 non-null      object
 1   grid_name       4 non-null      object
 2   grid_kwargs     4 non-null      object
 3   boundaries      4 non-null      object
 4   start_time      4 non-null      object
 5   end_time        4 non-null      object
 6   np_eta          4 non-null      int64 
 7   np_xi           4 non-null      int64 
 8   blueprint_path  4 non-null      object
 9   grid_yaml_path  4 non-null      object
 10  input_data_dir  4 non-null      object
dtypes: int64(2), object(9)
memory usage: 484.0+ bytes
None

First few rows:


| | model_name | grid_name | start_time | end_time | np_eta | np_xi | blueprint_path | grid_yaml_path | input_data_dir |
|---|---|---|---|---|---|---|---|---|---|
| 0 | roms-marbl | ccs-12km | 2024-01-01T00:00:00 | 2024-01-02T00:00:00 | 20 | 16 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
| 1 | roms-marbl | gulf-guinea-toy | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 5 | 2 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
| 2 | roms-marbl | hvalfjörður-0 | 2024-01-01T00:00:00 | 2024-01-02T00:00:00 | 16 | 16 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
| 3 | roms-marbl | wio-toy | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 5 | 2 | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/codes/cson-forge/workflows/bluep... | /Users/mclong/cson-forge-data/input-data/roms-... |
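
Note that `start_time` and `end_time` have dtype `object` in the `info()` output, meaning they are stored as ISO 8601 strings rather than timestamps. A minimal sketch of converting one pair to `datetime` objects and computing the run length, using the values shown for `ccs-12km`:

```python
from datetime import datetime, timedelta

# Strings copied from the ccs-12km row above
start_time = datetime.fromisoformat("2024-01-01T00:00:00")
end_time = datetime.fromisoformat("2024-01-02T00:00:00")

# Subtracting two datetimes yields a timedelta
duration = end_time - start_time
print(f"Run length: {duration}")
```

If you prefer to convert whole columns at once, `pandas.to_datetime` is the usual tool for that.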


### Viewing Grid Parameters

The `grid_kwargs` column contains dictionaries with grid parameters:


In [11]:
# Display grid kwargs for the first blueprint
if not df.empty:
    first_row = df.iloc[0]
    print(f"Grid kwargs for {first_row['grid_name']}:")
    grid_kwargs = first_row['grid_kwargs']
    if isinstance(grid_kwargs, dict):
        for key, value in grid_kwargs.items():
            print(f"  {key}: {value}")


Grid kwargs for ccs-12km:
  nx: 224
  ny: 440
  size_x: 2688
  size_y: 5280
  center_lon: -134.5
  center_lat: 39.6
  rot: 33.3
  N: 100
  theta_s: 6.0
  theta_b: 6.0
  hc: 250
  topography_source: {'name': 'ETOPO5'}
  mask_shapefile: None
  hmin: 5.0
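
Because each `grid_kwargs` entry is a plain dictionary, configurations can be compared with ordinary dictionary operations. The sketch below uses a hypothetical helper, `diff_kwargs`, and only the three keys that are visible in the truncated DataFrame display for the two toy grids; the remaining keys are elided there and are not reproduced here.

```python
def diff_kwargs(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Partial dictionaries: only the keys visible in the truncated output
gulf_guinea_toy = {"nx": 10, "ny": 10, "size_x": 4000}
wio_toy = {"nx": 20, "ny": 20, "size_x": 8000}

differences = diff_kwargs(gulf_guinea_toy, wio_toy)
for key, (left, right) in differences.items():
    print(f"  {key}: {left} -> {right}")
```

Keys present in only one dictionary show up with `None` on the other side, so the helper also flags missing parameters.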


### Querying the DataFrame

You can query the DataFrame to find specific blueprints:


In [12]:
# Query by model name
if not df.empty:
    model_name = df['model_name'].iloc[0] if 'model_name' in df.columns else None
    if model_name:
        model_blueprints = df[df['model_name'] == model_name]
        print(f"Found {len(model_blueprints)} blueprints for model '{model_name}':")
        print(model_blueprints[['grid_name', 'start_time', 'end_time']].to_string())

# Query by grid name
if not df.empty and 'grid_name' in df.columns:
    grid_name = df['grid_name'].iloc[0]
    grid_blueprints = df[df['grid_name'] == grid_name]
    print(f"\nFound {len(grid_blueprints)} blueprints for grid '{grid_name}':")
    print(grid_blueprints[['model_name', 'start_time', 'end_time']].to_string())


Found 4 blueprints for model 'roms-marbl':
         grid_name           start_time             end_time
0         ccs-12km  2024-01-01T00:00:00  2024-01-02T00:00:00
1  gulf-guinea-toy  2012-01-01T00:00:00  2012-01-02T00:00:00
2    hvalfjörður-0  2024-01-01T00:00:00  2024-01-02T00:00:00
3          wio-toy  2012-01-01T00:00:00  2012-01-02T00:00:00

Found 1 blueprints for grid 'ccs-12km':
   model_name           start_time             end_time
0  roms-marbl  2024-01-01T00:00:00  2024-01-02T00:00:00


## Instantiating OcnModel from DataFrame

The DataFrame contains all the data needed to instantiate `OcnModel` objects. The cell below is a placeholder; the instantiation step itself is still a TODO:


In [15]:
from cson_forge import OcnModel
from datetime import datetime

# Example: Instantiate OcnModel from the first row
if not df.empty:
    row = df.iloc[0]
    
    # TODO: 
    # - extract data from the row and instantiate an OcnModel object 
    # - requires a cson_forge.OcnModel._from_blueprint() method that skips calls to ROMSInputs methods if 
    #   the files already exist in the input_data_dir
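
One piece of the TODO above, deciding whether input files already exist, can be sketched independently of `OcnModel`. The helper name `inputs_already_generated` and the rule "directory exists and holds at least one file" are assumptions for illustration; the real check would likely need to look for specific files.

```python
from pathlib import Path
import tempfile


def inputs_already_generated(input_data_dir):
    """Return True if the directory exists and contains at least one file."""
    path = Path(input_data_dir)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


# Exercise the helper on a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    empty_result = inputs_already_generated(tmp)
    (Path(tmp) / "placeholder.nc").touch()  # hypothetical file name
    filled_result = inputs_already_generated(tmp)

# The temporary directory has been removed at this point
missing_result = inputs_already_generated(tmp)

print(empty_result, filled_result, missing_result)
```

A check like this could be applied to `row['input_data_dir']` before deciding whether to regenerate inputs.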
    

## Summary

The `catalog.blueprint` instance provides:

1. **Discovery**: Find all blueprint files in the blueprints directory
2. **Loading**: Load individual blueprints or all blueprints at once
3. **Data Extraction**: Extract grid parameters and other configuration data
4. **DataFrame Interface**: Get all blueprint data in a pandas DataFrame for easy querying
5. **OcnModel Integration**: All data needed to instantiate `OcnModel` objects is included

This makes it easy to:
- Query existing blueprints
- Compare configurations across different domains
- Programmatically instantiate models from stored blueprints
- Build analysis workflows that work with multiple domains
