# Step 3: Ingest Data into the SwimContainer

This notebook demonstrates how to ingest extracted data into the SwimContainer. The container provides a unified `ingest` API for all data sources:

- `container.ingest.ndvi()` - NDVI from Earth Engine exports
- `container.ingest.etf()` - ETf from SSEBop or other models
- `container.ingest.gridmet()` - Meteorology
- `container.ingest.snodas()` - Snow water equivalent
- `container.ingest.properties()` - Soils, LULC, irrigation

## Two Data Sources

1. **Extracted data**: If you ran notebook 02, data is in `data/remote_sensing/landsat/extracts/`, `data/snow/snodas/extracts/`, etc.
2. **Pre-built data**: If you don't have Earth Engine access, use data from `data/prebuilt/`

Set `USE_PREBUILT = True` or `False` below to choose your data source.

In [5]:
import os
import sys
from pathlib import Path

root = os.path.abspath('../..')
sys.path.append(root)

from swimrs.container import open_container
from swimrs.swim.config import ProjectConfig

## 1. Configuration

Choose whether to use pre-built data or extracted data.

In [6]:
# Set to True to use pre-built data, False to use data from notebook 02
USE_PREBUILT = False

project_dir = Path.cwd()
data_dir = project_dir / 'data'
container_path = data_dir / '1_Boulder.swim'

# Load config to get feature_id
config_path = project_dir / '1_Boulder.toml'
config = ProjectConfig()
config.read_config(str(config_path))
feature_id = config.feature_id_col

print(f"Using feature_id column: {feature_id}")

# Pre-built data mirrors the extracted-data layout under data/prebuilt/
if USE_PREBUILT:
    print("Using pre-built data from data/prebuilt/")
    source_dir = data_dir / 'prebuilt'
else:
    print("Using extracted data from notebook 02")
    source_dir = data_dir

extracts_dir = source_dir / 'remote_sensing' / 'landsat' / 'extracts'
ndvi_root = extracts_dir / 'ndvi'
etf_root = extracts_dir / 'ssebop_etf'
met_dir = source_dir / 'met_timeseries' / 'gridmet'
snodas_dir = source_dir / 'snow' / 'snodas' / 'extracts'
properties_dir = source_dir / 'properties'

Using feature_id column: FID_1
Using extracted data from notebook 02


## 2. Open the Container

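Open the container created in notebook 01 in read-write mode (`mode='r+'`). Only one process can hold the container at a time; if the open fails with a lock error like the one shown below, close any other notebook or kernel that has it open before retrying.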

In [7]:
container = open_container(str(container_path), mode='r+')

print(f"Opened container: {container.project_name}")
print(f"Fields: {container.n_fields}")
print(f"Date range: {container.start_date} to {container.end_date}")

RuntimeError: Could not acquire lock for container: /home/dgketchum/code/swim-rs/examples/1_Boulder/data/1_Boulder.swim
Another process may have it open, or a previous process crashed.
If you're sure no other process is using this container, delete the lock file:
  /home/dgketchum/code/swim-rs/examples/1_Boulder/data/1_Boulder.swim.lock
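The error above names a lock file that sits next to the container. Before deleting it, it helps to confirm that the file exists and how long ago it was last modified. The following is a minimal sketch using only the standard library. `describe_lock` is a hypothetical helper, not part of swimrs, and the `.lock` suffix is taken from the path in the error message.

```python
import time
from pathlib import Path


def describe_lock(container_path):
    """Report whether '<container>.lock' exists and how old it is.

    Hypothetical helper (not part of swimrs). The '.lock' suffix follows
    the path printed in the RuntimeError message.
    """
    container_path = Path(container_path)
    lock_path = container_path.with_name(container_path.name + '.lock')
    if not lock_path.exists():
        return {'lock_path': lock_path, 'exists': False, 'age_seconds': None}
    # Age is measured from the file's last-modified time; clamp at zero
    age = max(0.0, time.time() - lock_path.stat().st_mtime)
    return {'lock_path': lock_path, 'exists': True, 'age_seconds': age}


info = describe_lock('data/1_Boulder.swim')
print(info['exists'], info['age_seconds'])
```

A lock that is many hours old, with no other kernel running, is more likely to be left over from a crashed process. The sketch only reports; it deliberately does not delete anything.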

## 3. Check Current Status

Before ingestion, let's see what data the container currently holds.

In [None]:
print(container.query.status())

## 4. Ingest NDVI Data

Ingest NDVI for both irrigated (`irr`) and non-irrigated (`inv_irr`) masks.

In [19]:
for mask in ['irr', 'inv_irr']:
    ndvi_dir = ndvi_root / mask
    if ndvi_dir.exists():
        print(f"Ingesting NDVI ({mask})...")
        container.ingest.ndvi(
            source_dir=str(ndvi_dir),
            uid_column=feature_id,
            instrument='landsat',
            mask=mask,
            overwrite=True
        )
    else:
        print(f"Warning: NDVI directory not found: {ndvi_dir}")

{"event": "no_data_found", "component": "ingestor", "timestamp": "2026-01-12T20:39:31.652617Z", "source": "/home/dgketchum/code/swim-rs/examples/1_Boulder/data/remote_sensing/landsat/extracts/ndvi/irr"}
{"event": "no_data_found", "component": "ingestor", "timestamp": "2026-01-12T20:39:31.656829Z", "source": "/home/dgketchum/code/swim-rs/examples/1_Boulder/data/remote_sensing/landsat/extracts/ndvi/inv_irr"}


Ingesting NDVI (irr)...
Ingesting NDVI (inv_irr)...
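The `no_data_found` events above suggest that both mask directories exist but contain nothing the ingestor could read, so the `exists()` check in the loop is not enough on its own. A quick file count per mask can confirm this before calling `container.ingest.ndvi()`. This is a sketch under assumptions: `count_extract_files` is a hypothetical helper, and the `*.csv` pattern is a guess at how notebook 02 names its exports.

```python
from pathlib import Path


def count_extract_files(root, masks=('irr', 'inv_irr'), pattern='*.csv'):
    """Count files matching `pattern` in each mask subdirectory of `root`.

    Hypothetical helper; the '*.csv' pattern is an assumption about the
    export format. A missing directory is counted as zero files.
    """
    root = Path(root)
    counts = {}
    for mask in masks:
        mask_dir = root / mask
        if mask_dir.is_dir():
            counts[mask] = len(list(mask_dir.glob(pattern)))
        else:
            counts[mask] = 0
    return counts


print(count_extract_files('data/remote_sensing/landsat/extracts/ndvi'))
```

If a count is zero, either the Earth Engine exports from notebook 02 have not been downloaded yet or `USE_PREBUILT = True` may be the better choice.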


## 5. Ingest ETf Data

Ingest SSEBop ETf for both masks.

In [20]:
for mask in ['irr', 'inv_irr']:
    etf_dir = etf_root / mask
    if etf_dir.exists():
        print(f"Ingesting ETf ({mask})...")
        container.ingest.etf(
            source_dir=str(etf_dir),
            uid_column=feature_id,
            instrument='landsat',
            model='ssebop',
            mask=mask,
            overwrite=True
        )
    else:
        print(f"Warning: ETf directory not found: {etf_dir}")

Ingesting ETf (irr)...


{"event": "operation_failed", "component": "ingestor", "timestamp": "2026-01-12T20:39:35.979711Z", "operation": "ingest_etf", "target": "remote_sensing/etf/landsat/ssebop/irr", "source": "/home/dgketchum/code/swim-rs/examples/1_Boulder/data/remote_sensing/landsat/extracts/ssebop_etf/irr", "model": "ssebop", "instrument": "landsat", "mask": "irr", "duration_seconds": 0.56, "error": "ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", "error_type": "TypeError"}


TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
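NumPy raises this `TypeError` when `isnan` receives an array it cannot safely cast to float, for example an object or string array. One plausible cause, not confirmed for this dataset, is text in a column that the ingestor expects to be numeric. The sketch below scans a CSV with the standard library and lists cells that do not parse as numbers. `find_non_numeric_cells` is a hypothetical diagnostic, and the assumption that the ETf extracts are CSV files is mine.

```python
import csv


def find_non_numeric_cells(csv_path, skip_columns=(), max_hits=10):
    """Return up to `max_hits` (line_number, column, value) tuples whose
    value is neither empty nor parseable as a float.

    Hypothetical diagnostic; pass the ID column name in `skip_columns`.
    """
    hits = []
    with open(csv_path, newline='') as f:
        reader = csv.DictReader(f)
        # The header is line 1, so data rows start at line 2
        for line_number, row in enumerate(reader, start=2):
            for column, value in row.items():
                if column in skip_columns:
                    continue
                # Skip missing or empty cells, and overflow fields from ragged rows
                if not isinstance(value, str) or not value.strip():
                    continue
                try:
                    float(value)
                except ValueError:
                    hits.append((line_number, column, value))
                    if len(hits) >= max_hits:
                        return hits
    return hits
```

For example, `find_non_numeric_cells(path, skip_columns=(feature_id,))` on one file from `etf_root / 'irr'` would list any offending cells. An empty result points away from this explanation and toward the ingest code itself.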

## 6. Ingest Meteorology Data

Ingest GridMET meteorology including bias-corrected reference ET.

In [None]:
if met_dir.exists():
    print("Ingesting GridMET meteorology...")
    container.ingest.gridmet(
        source_dir=str(met_dir),
        variables=['eto', 'etr', 'prcp', 'tmin', 'tmax', 'srad', 'u2', 'ea'],
        include_corrected=True,
        overwrite=True
    )
else:
    print(f"Warning: GridMET directory not found: {met_dir}")

## 7. Ingest Snow Data (SNODAS)

Ingest SNODAS snow water equivalent.

In [None]:
if snodas_dir.exists():
    print("Ingesting SNODAS SWE...")
    container.ingest.snodas(
        source_dir=str(snodas_dir),
        uid_column=feature_id,
        overwrite=True
    )
else:
    print(f"Warning: SNODAS directory not found: {snodas_dir}")

## 8. Ingest Properties

Ingest static properties: soils, land cover, and irrigation fractions.

In [None]:
soils_csv = properties_dir / '1_Boulder_ssurgo.csv'
lulc_csv = properties_dir / '1_Boulder_landcover.csv'
irr_csv = properties_dir / '1_Boulder_irr.csv'

# Check which files exist
props_exist = {
    'soils': soils_csv.exists(),
    'lulc': lulc_csv.exists(),
    'irrigation': irr_csv.exists()
}

if any(props_exist.values()):
    print("Ingesting properties...")
    container.ingest.properties(
        soils_csv=str(soils_csv) if props_exist['soils'] else None,
        lulc_csv=str(lulc_csv) if props_exist['lulc'] else None,
        irr_csv=str(irr_csv) if props_exist['irrigation'] else None,
        uid_column=feature_id,
        overwrite=True
    )
    print(f"  Soils: {'OK' if props_exist['soils'] else 'not found'}")
    print(f"  Land cover: {'OK' if props_exist['lulc'] else 'not found'}")
    print(f"  Irrigation: {'OK' if props_exist['irrigation'] else 'not found'}")
else:
    print(f"Warning: No property files found in {properties_dir}")

## 9. Check Container Status After Ingestion

In [None]:
print(container.query.status(detailed=True))

## 10. Explore Ingested Data with xarray

The SwimContainer exposes ingested variables as xarray objects through `container.to_xarray()`, addressed by their path inside the container. The cells below load NDVI, ETf, and ETo and plot them for a sample field.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Get NDVI as an xarray DataArray
try:
    ndvi = container.to_xarray('remote_sensing/ndvi/landsat/irr')
    print(f"NDVI shape: {ndvi.shape}")
    print(f"Dimensions: {ndvi.dims}")
    print(f"Sites: {list(ndvi.site.values[:5])}...")
except Exception as e:
    print(f"Could not load NDVI: {e}")

In [None]:
# Plot NDVI time series for a single field
try:
    sample_site = container.field_uids[0]
    ndvi_site = ndvi.sel(site=sample_site)
    
    fig, ax = plt.subplots(figsize=(14, 4))
    ndvi_site.plot(ax=ax, marker='.', linestyle='none', markersize=2)
    ax.set_title(f'NDVI Time Series - Field {sample_site}')
    ax.set_ylabel('NDVI')
    plt.tight_layout()
    plt.show()
except Exception as e:
    print(f"Could not plot NDVI: {e}")

In [None]:
# Plot ETf vs ETo for a single year
try:
    etf = container.to_xarray('remote_sensing/etf/landsat/ssebop/irr')
    eto = container.to_xarray('meteorology/gridmet/eto')
    
    # Select one site and one year
    sample_site = container.field_uids[0]
    etf_2020 = etf.sel(site=sample_site, time='2020')
    eto_2020 = eto.sel(site=sample_site, time='2020')
    
    fig, axes = plt.subplots(2, 1, figsize=(14, 6), sharex=True)
    
    axes[0].plot(eto_2020.time, eto_2020, 'b-', alpha=0.7, label='ETo')
    axes[0].set_ylabel('ETo (mm/day)')
    axes[0].legend()
    
    axes[1].plot(etf_2020.time, etf_2020, 'go', markersize=3, label='ETf')
    axes[1].set_ylabel('ETf (fraction)')
    axes[1].set_xlabel('Date')
    axes[1].legend()
    
    fig.suptitle(f'Reference ET and ETf - Field {sample_site} (2020)')
    plt.tight_layout()
    plt.show()
except Exception as e:
    print(f"Could not plot ETf/ETo: {e}")

## 11. View Provenance

The container records each operation in a provenance log for reproducibility. The cell below prints the ten most recent events.

In [None]:
print("Provenance Log:")
for event in container.provenance.events[-10:]:
    print(f"  {event.timestamp[:19]} - {event.operation}: {event.target or 'container'}")
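The `[:19]` slice in the cell above trims a timestamp to whole seconds, which works if timestamps are ISO 8601 strings like those in the ingestor log lines earlier in this notebook. The sketch below shows the same formatting on a stand-in object. `Event` is a hypothetical dataclass that only mimics the three attributes used above; it is not the swimrs provenance class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical stand-in with the attributes used in the cell above."""
    timestamp: str
    operation: str
    target: Optional[str] = None


def format_event(event):
    """Format one event as 'YYYY-MM-DDTHH:MM:SS - operation: target'."""
    # The first 19 characters of an ISO 8601 string cover date and time to the second
    return f"{event.timestamp[:19]} - {event.operation}: {event.target or 'container'}"


print(format_event(Event("2026-01-12T20:39:31.652617Z", "ingest_etf")))
```

Events without a target fall back to the label `container`, matching the `event.target or 'container'` expression in the notebook cell.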

## 12. Save and Close

In [None]:
container.save()
container.close()

print(f"Container saved to: {container_path}")
print("\nNext: Run notebook 04 to compute dynamics and export model inputs")
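The lock error in section 2 mentions a previous process that crashed. If an ingest step raises before the final cell runs, `container.close()` is never called in that session. One way to guarantee cleanup is `contextlib.closing`, sketched below with a stand-in object. `DummyContainer` and `run_with_cleanup` are hypothetical, and whether `close()` releases the swimrs lock is an assumption based on the error message, not something this notebook confirms.

```python
from contextlib import closing


class DummyContainer:
    """Stand-in for a container; only tracks whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_cleanup(resource, work):
    """Run work(resource) and call resource.close() even if work raises."""
    with closing(resource) as r:
        return work(r)


c = DummyContainer()
run_with_cleanup(c, lambda r: None)
print(c.closed)
```

In a notebook, the same effect comes from wrapping the ingest calls of a single cell in `with closing(container):`, at the cost of reopening the container in later cells.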