# Model-Agnostic Input Data Preprocessing in CONFLUENCE

## Introduction

This notebook focuses on the model-agnostic preprocessing steps for input data in CONFLUENCE. Model-agnostic preprocessing covers tasks that are common across different hydrological models, such as data acquisition and initial formatting.

Key steps covered in this notebook include:

1. Running the model-agnostic orchestrator
2. Optional lapse rate adjustment on forcing variables

In this preprocessing stage, we ensure that our input data is consistent, complete, and properly formatted before we move on to model-specific preprocessing steps. By the end of this notebook, you will have clean, standardized datasets ready for further model-specific processing.

## First we import the libraries and functions we need

In [1]:
import sys
from pathlib import Path
from typing import Dict, Any
import logging
import yaml # type: ignore

current_dir = Path.cwd()
parent_dir = current_dir.parent.parent
sys.path.append(str(parent_dir))

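from utils.dataHandling_utils.data_utils import DataAcquisitionProcessor # type: ignore
# forcingResampler and geospatialStatistics are used in later cells.
# The module path below is an assumption; adjust it to match your CONFLUENCE checkout.
from utils.dataHandling_utils.agnosticPreProcessor_util import forcingResampler, geospatialStatistics # type: ignore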

# Set up logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

## Check configurations

Next, we print the configuration settings this notebook relies on and confirm that each one is defined.

In [2]:
config_path = Path('../../0_config_files/config_active.yaml')
with open(config_path, 'r') as config_file:
    config = yaml.safe_load(config_file)
    print(f"FORCING_DATASET: {config['FORCING_DATASET']}")
    print(f"EASYMORE_CLIENT: {config['EASYMORE_CLIENT']}")
    print(f"FORCING_VARIABLES: {config['FORCING_VARIABLES']}")
    print(f"EXPERIMENT_TIME_START: {config['EXPERIMENT_TIME_START']}")

FORCING_DATASET: ERA5
EASYMORE_CLIENT: easymore cli
FORCING_VARIABLES: longitude,latitude,time,LWRadAtm,SWRadAtm,pptrate,airpres,airtemp,spechum,windspd
EXPERIMENT_TIME_START: 2010-01-01 01:00
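
Printing the settings shows their values, but a missing key would only surface as a `KeyError`. As a minimal sketch, the check below lists every required setting that is absent or empty. The helper name `find_missing_keys` and the `example_config` dictionary are hypothetical and only for illustration; the key names are the ones this notebook reads.

```python
from typing import Any, Dict, Iterable, List

# Settings this notebook reads from the configuration file; extend as needed.
REQUIRED_KEYS = (
    "CONFLUENCE_DATA_DIR",
    "DOMAIN_NAME",
    "FORCING_DATASET",
    "EASYMORE_CLIENT",
    "FORCING_VARIABLES",
    "EXPERIMENT_TIME_START",
)


def find_missing_keys(config: Dict[str, Any],
                      required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty in `config` (hypothetical helper)."""
    return [key for key in required if config.get(key) in (None, "")]


# Stand-in dictionary for illustration; in the notebook, pass the loaded `config` instead.
example_config = {
    "CONFLUENCE_DATA_DIR": "/path/to/CONFLUENCE_data",
    "DOMAIN_NAME": "Bow_at_Banff",
    "FORCING_DATASET": "ERA5",
    "EASYMORE_CLIENT": "easymore cli",
    "FORCING_VARIABLES": "longitude,latitude,time",
}

missing = find_missing_keys(example_config)
if missing:
    print(f"Missing configuration settings: {', '.join(missing)}")
```

In the notebook itself, calling `find_missing_keys(config)` right after loading the YAML file would report all gaps at once rather than one `KeyError` at a time.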


## Define default paths

Now let's define the paths to the data directories and create them before we run the preprocessing scripts.

In [3]:
# Main project directory
data_dir = config['CONFLUENCE_DATA_DIR']
project_dir = Path(data_dir) / f"domain_{config['DOMAIN_NAME']}"

# Data directories
raw_data_dir = project_dir / 'forcing' / 'raw_data'
basin_averaged_data = project_dir / 'forcing' / 'basin_averaged_data'
catchment_intersection_dir = project_dir / 'shapefiles' / 'catchment_intersection'

# Make sure the new directories exist
basin_averaged_data.mkdir(parents=True, exist_ok=True)
catchment_intersection_dir.mkdir(parents=True, exist_ok=True)
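
To see the resulting directory layout without touching real project data, the sketch below builds the same structure inside a temporary directory and confirms that each folder exists. The function name `build_forcing_dirs` is hypothetical; the directory names mirror the ones defined in the cell above.

```python
import tempfile
from pathlib import Path


def build_forcing_dirs(data_dir, domain_name):
    """Create the output directories used in this notebook and return them by name."""
    project_dir = Path(data_dir) / f"domain_{domain_name}"
    dirs = {
        "basin_averaged_data": project_dir / "forcing" / "basin_averaged_data",
        "catchment_intersection": project_dir / "shapefiles" / "catchment_intersection",
    }
    for path in dirs.values():
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Use a throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    created = build_forcing_dirs(tmp, "Bow_at_Banff")
    for name, path in created.items():
        print(f"{name}: {path.relative_to(tmp)} (exists: {path.is_dir()})")
    all_exist = all(path.is_dir() for path in created.values())
```

Because `exist_ok=True` is set, rerunning the cell on an existing project does not raise an error or overwrite anything.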

## 1. Run Model-Agnostic Orchestrator

Now let's run the model-agnostic orchestrator to acquire and preprocess our data.

In [None]:
# Initialize the forcingResampler class
fr = forcingResampler(config, logger)

# Run resampling
fr.run_resampling()
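
The introduction mentions an optional lapse rate adjustment on forcing variables. This notebook does not show how CONFLUENCE implements it, so the following is only a minimal sketch of the general idea, assuming a constant lapse rate of 6.5 K per km applied to air temperature. The function name `adjust_air_temperature`, the lapse rate value, and the elevations are all assumptions for illustration.

```python
# Assumed constant environmental lapse rate in K per metre (6.5 K per km).
LAPSE_RATE_K_PER_M = 0.0065


def adjust_air_temperature(airtemp_k, forcing_elev_m, target_elev_m,
                           lapse_rate=LAPSE_RATE_K_PER_M):
    """Shift air temperatures (K) from the forcing grid elevation to a target elevation.

    Temperature decreases with height, so a target above the forcing grid
    cell gets colder values and a target below it gets warmer values.
    """
    elevation_difference = target_elev_m - forcing_elev_m
    return [t - lapse_rate * elevation_difference for t in airtemp_k]


# Made-up example: forcing grid cell at 1500 m, catchment mean elevation at 2000 m
adjusted = adjust_air_temperature([275.15, 270.0], 1500.0, 2000.0)
print([round(t, 2) for t in adjusted])
```

With a 500 m elevation difference, each temperature drops by 3.25 K under this assumed lapse rate.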

## 2. Preprocess geospatial data

Now let's calculate the zonal statistics of the geospatial attributes (soil class, land class, and elevation) that our model needs.

In [3]:
# Initialize the geospatialStatistics class
gs = geospatialStatistics(config, logger)

# Run the zonal statistics calculations
gs.run_statistics()

2024-10-20 13:47:17,311 - INFO - Calculating soil statistics
2024-10-20 13:47:18,239 - INFO - Created 136 records
2024-10-20 13:47:18,245 - INFO - Soil statistics saved to /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/shapefiles/catchment_intersection/with_soilgrids/catchment_with_soilclass.shp
2024-10-20 13:47:18,246 - INFO - Calculating land statistics
2024-10-20 13:47:19,034 - INFO - Created 136 records
2024-10-20 13:47:19,045 - INFO - Land statistics saved to /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/shapefiles/catchment_intersection/with_landclass/catchment_with_landclass.shp
2024-10-20 13:47:19,047 - INFO - Calculating elevation statistics
2024-10-20 13:47:19,911 - INFO - Updating existing 'elev_mean' column
2024-10-20 13:47:19,966 - INFO - Created 136 records
2024-10-20 13:47:19,973 - INFO - Elevation statistics saved to /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/shapefiles/catchment_intersection/with_dem/catchment_with_dem.shp
2024-10-20 13:47:19,976
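
To make the idea of zonal statistics concrete, here is a minimal sketch of the aggregation step only, using made-up cell values that have already been assigned to catchments. It is not the `geospatialStatistics` implementation, which overlays rasters on the catchment shapefile. The sketch assumes elevation is summarized by its mean (as the `elev_mean` column in the log suggests) and that class rasters are summarized by their most frequent value; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy data, made up for illustration: (catchment_id, raster cell value)
elevation_cells = [(1, 1500.0), (1, 1700.0), (2, 2100.0), (2, 2300.0), (2, 2200.0)]
land_class_cells = [(1, 5), (1, 5), (1, 7), (2, 10), (2, 10)]


def group_by_zone(cells):
    """Collect cell values into one list per catchment id."""
    zones = defaultdict(list)
    for zone_id, value in cells:
        zones[zone_id].append(value)
    return dict(zones)


def zonal_mean(cells):
    """Mean of the cell values in each catchment (continuous data such as elevation)."""
    return {zone: mean(values) for zone, values in group_by_zone(cells).items()}


def zonal_majority(cells):
    """Most frequent cell value in each catchment (categorical data such as land class)."""
    return {zone: Counter(values).most_common(1)[0][0]
            for zone, values in group_by_zone(cells).items()}


elev_mean = zonal_mean(elevation_cells)
land_majority = zonal_majority(land_class_cells)
print(f"elev_mean: {elev_mean}")
print(f"majority land class: {land_majority}")
```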