# TUFLOW Workflow Demo

This notebook demonstrates how to use `ryan-tools` to load, process, and combine TUFLOW results interactively.

In [None]:
import sys
from pathlib import Path
import pandas as pd

# Ensure ryan-tools is in your path if you haven't installed it as a package
# sys.path.append('path/to/ryan-tools')

from ryan_library.functions.tuflow.notebook_helpers import load_tuflow_data
from ryan_library.processors.tuflow.processor_collection import ProcessorCollection

## 1. Define Paths and Data Types

Specify the directories containing your TUFLOW results and the data types you want to load.

In [None]:
# Update this path to point to your data
data_path = Path("E:/Project/Results/TUFLOW")
paths_to_process = [data_path]

# List required data types (suffixes)
# e.g. Q (Flow), V (Velocity), H (Water Level), POMM (Plot Output Maximums and Minimums)
data_types = ["Q", "H", "V"]
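
Before loading, it can help to check which files a given set of suffixes would pick up. The sketch below is a standalone preview and not part of `ryan-tools`: the helper `find_result_files` is hypothetical, and it assumes result files are CSVs whose names end in `_<suffix>.csv`, which may not match your project's naming.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_result_files(root: Path, data_types: list[str]) -> dict[str, list[Path]]:
    """Group CSV files under root by the data-type suffix in their name."""
    found: dict[str, list[Path]] = {}
    for data_type in data_types:
        # Assumed naming convention: <run name>_<suffix>.csv
        found[data_type] = sorted(root.rglob(f"*_{data_type}.csv"))
    return found


# Demonstrate on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for name in ["run01_1d_Q.csv", "run01_1d_H.csv", "run02_1d_Q.csv", "notes.txt"]:
        (demo_root / name).touch()
    matches = find_result_files(demo_root, ["Q", "H", "V"])
    counts = {data_type: len(files) for data_type, files in matches.items()}

print(counts)
```

A suffix with a count of zero (here `V`) is a hint that either the path or the suffix list needs adjusting.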

## 2. Load Data

Use the `load_tuflow_data` helper to scan for matching files and process them (in parallel when `parallel=True`).
It returns a `ProcessorCollection` object.

In [None]:
collection = load_tuflow_data(
    paths=paths_to_process,
    data_types=data_types,
    parallel=True,       # Set to False if debugging or for small datasets
    log_level="INFO"     # "DEBUG" for more verbose output
)

print(f"Loaded {len(collection.processors)} files.")

## 3. Filter Locations (Optional)

You can restrict the loaded data to specific culvert or node IDs. Uncomment the line below and substitute your own IDs.

In [None]:
# collection.filter_locations(["Culvert_001", "Culvert_002"])
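
For intuition, the same kind of filter on a plain pandas DataFrame can be sketched with `Series.isin`. This is only an illustration on made-up data; the column name `Chan ID` is borrowed from later in this notebook, and nothing here is claimed about how `filter_locations` works internally.

```python
import pandas as pd

# Made-up results table: one row per record, keyed by location ID.
results = pd.DataFrame(
    {
        "Chan ID": ["Culvert_001", "Culvert_002", "Culvert_003", "Culvert_001"],
        "Q": [1.2, 0.8, 2.5, 1.6],
    }
)

wanted = ["Culvert_001", "Culvert_002"]

# Keep only the rows whose ID is in the wanted list.
filtered = results[results["Chan ID"].isin(wanted)]

print(filtered["Chan ID"].unique().tolist())
```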

## 4. Combine Results

Combine the loaded data into a single pandas DataFrame.
The method to call depends on the data format (timeseries vs. maximums); this example uses `combine_1d_timeseries()`.

In [None]:
# Helper to combine 1D timeseries data (Q, V, H, etc.)
# This merges static attributes (from EOF/Chan files if loaded) and calculates HW/D
timeseries_df = collection.combine_1d_timeseries()

if not timeseries_df.empty:
    display(timeseries_df.head())
else:
    print("No timeseries data found or combined.")
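
Conceptually, combining amounts to stacking per-run tables while recording which run each row came from. The sketch below shows that idea with `pandas.concat` on made-up data; the `run` column and the frame names are hypothetical, and it does not reproduce the attribute merging that `combine_1d_timeseries()` is described as doing.

```python
import pandas as pd

# Two made-up per-run timeseries tables with identical columns.
run_a = pd.DataFrame({"Time": [0.0, 0.5], "Chan ID": ["Culvert_001"] * 2, "Q": [0.0, 1.2]})
run_b = pd.DataFrame({"Time": [0.0, 0.5], "Chan ID": ["Culvert_001"] * 2, "Q": [0.0, 1.9]})
frames = {"run_a": run_a, "run_b": run_b}

# Tag each table with its run name, then stack them into one DataFrame.
combined = pd.concat(
    [frame.assign(run=name) for name, frame in frames.items()],
    ignore_index=True,
)

print(combined)
```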

## 5. Caching & Persistence

For large datasets, re-scanning and loading files can be slow. You can save the processed collection to disk to resume work later.
The generic `save()` and `load()` methods default to using a single HDF5 file, which is fast and convenient.

In [None]:
# Option A: Save as HDF5 (Single file, fast, recommended)
collection.save("processed_data.h5")
print("Collection saved to processed_data.h5")

# Option B: Save as directory of Parquet files (Good for debugging)
# collection.save("processed_cache", format="parquet")
# print("Collection saved to processed_cache")

In [None]:
# Resume from HDF5
resumed_collection = ProcessorCollection.load("processed_data.h5")
print(f"Resumed {len(resumed_collection.processors)} processors from file.")

# Resume from a directory of Parquet files
# resumed_collection_parquet = ProcessorCollection.load("processed_cache")
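
A common way to use a cache like this is a "load if present, otherwise build and save" guard. The sketch below shows the pattern with the standard-library `pickle` module standing in for the collection's own persistence; `load_or_build` and `expensive_load` are hypothetical names, not part of `ryan-tools`.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(cache_file: Path, build: Callable[[], T]) -> T:
    """Return the cached object if the file exists, else build and cache it."""
    if cache_file.exists():
        with cache_file.open("rb") as handle:
            return pickle.load(handle)
    result = build()
    with cache_file.open("wb") as handle:
        pickle.dump(result, handle)
    return result


build_calls = []


def expensive_load() -> dict[str, list[float]]:
    # Stand-in for a slow scan-and-process step.
    build_calls.append(1)
    return {"Q": [0.0, 1.2, 0.9]}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "demo_cache.pkl"
    first = load_or_build(cache_file, expensive_load)   # builds and saves
    second = load_or_build(cache_file, expensive_load)  # reads from disk

print(len(build_calls))
```

In this notebook the same guard could wrap `ProcessorCollection.load("processed_data.h5")` on the cached branch and `load_tuflow_data(...)` followed by `collection.save(...)` on the build branch.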

## 6. Advanced Analysis: Mean Max Hydrographs

Identify the 'critical' mean simulation for each AEP (annual exceedance probability) and plot its hydrograph.
This requires both Maximums data (to find the mean) and Timeseries data (to plot).

In [None]:
from ryan_library.functions.tuflow.notebook_helpers import get_critical_hydrographs, plot_hydrographs

# Work on a copy so the analysis cannot mutate the original collection
analysis_collection = collection.copy()

# Identify critical hydrographs (based on Flow 'Q')
critical_flows = get_critical_hydrographs(analysis_collection, metric="Q")

# Plot the results
if critical_flows:
    plot_hydrographs(critical_flows, title="Mean Critical Flow Hydrographs")
else:
    print("No critical hydrographs found (ensure you loaded Timeseries AND Maximums data).")
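
One common reading of "critical mean" (an assumption here, not confirmed by this notebook) is: for each AEP, average the peak flows across the runs of each storm duration, take the duration with the highest mean, and pick the run whose peak is closest to that mean. The sketch below applies that rule to made-up maxima. The column names `aep`, `duration`, `tp` and `max_q` and the function `pick_critical_runs` are hypothetical, and this is not presented as the logic inside `get_critical_hydrographs`.

```python
import pandas as pd

# Made-up peak flows: one row per simulation (AEP x duration x temporal pattern).
maxima = pd.DataFrame(
    {
        "aep": ["1%"] * 6,
        "duration": ["60m"] * 3 + ["120m"] * 3,
        "tp": ["TP01", "TP02", "TP03"] * 2,
        "max_q": [10.0, 12.0, 14.0, 11.0, 15.0, 16.0],
    }
)


def pick_critical_runs(df: pd.DataFrame) -> pd.DataFrame:
    """For each AEP, return the run closest to the highest per-duration mean."""
    picked = []
    for _, aep_group in df.groupby("aep"):
        # Mean peak flow for each duration within this AEP.
        duration_means = aep_group.groupby("duration")["max_q"].mean()
        critical_duration = duration_means.idxmax()
        target = duration_means.loc[critical_duration]
        # Among runs of the critical duration, take the one nearest the mean.
        candidates = aep_group[aep_group["duration"] == critical_duration]
        picked.append((candidates["max_q"] - target).abs().idxmin())
    return df.loc[picked].reset_index(drop=True)


critical = pick_critical_runs(maxima)
print(critical)
```

With this data the 120m duration has the higher mean peak (14.0 against 12.0), and the run nearest that mean is `TP02`.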

## 7. Custom Analysis

You now have a standard pandas DataFrame (`timeseries_df`) to use for any further plotting or analysis.

In [None]:
if not timeseries_df.empty:
    # Example: Plot Max Q by AEP for a specific channel
    # subset = timeseries_df[timeseries_df["Chan ID"] == "Example_Culvert"]
    # subset.plot(x="aep_numeric", y="Q", kind="scatter")
    pass
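
As a starting point, the peak flow per AEP for one channel can be pulled out with a filter and a `groupby`. The sketch below runs on made-up data; it reuses the column names `Chan ID`, `aep_numeric` and `Q` from the commented example above, and assumes your combined DataFrame has columns with those names.

```python
import pandas as pd

# Made-up combined timeseries: several time steps per channel and AEP.
timeseries = pd.DataFrame(
    {
        "Chan ID": ["Example_Culvert"] * 6 + ["Other_Culvert"],
        "aep_numeric": [1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 1.0],
        "Q": [0.0, 4.2, 3.1, 0.0, 2.0, 1.5, 9.9],
    }
)

# Restrict to one channel, then take the maximum flow for each AEP.
subset = timeseries[timeseries["Chan ID"] == "Example_Culvert"]
peak_q = subset.groupby("aep_numeric")["Q"].max().sort_index()

print(peak_q)
```

The resulting Series can be passed straight to `peak_q.plot(...)` if matplotlib is installed.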