# Model-Agnostic Input Data Preprocessing in CONFLUENCE

## Introduction

This notebook covers the model-agnostic preprocessing of input data in CONFLUENCE. Model-agnostic preprocessing comprises the tasks that are common across hydrological models, such as data acquisition and initial formatting.

Key steps covered in this notebook include:

1. Running the model-agnostic orchestrator
2. Optional lapse rate adjustment on forcing variables

In this preprocessing stage, we ensure that the input data are consistent, complete, and properly formatted before moving on to the model-specific preprocessing steps. By the end of this notebook, you will have clean, standardized datasets ready for model-specific processing.

## First we import the libraries and functions we need

In [1]:
import sys
from pathlib import Path
from typing import Dict, Any
import logging
import yaml # type: ignore

current_dir = Path.cwd()
parent_dir = current_dir.parent.parent
sys.path.append(str(parent_dir))

from utils.dataHandling_utils.data_utils import DataAcquisitionProcessor # type: ignore

# Set up logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

## Check configurations

Next, we print the key configuration settings to confirm that everything we need is defined.

In [2]:
config_path = Path('../../0_config_files/config_active.yaml')
with open(config_path, 'r') as config_file:
    config = yaml.safe_load(config_file)
    print(f"FORCING_DATASET: {config['FORCING_DATASET']}")
    print(f"EASYMORE_CLIENT: {config['EASYMORE_CLIENT']}")
    print(f"FORCING_VARIABLES: {config['FORCING_VARIABLES']}")
    print(f"EXPERIMENT_TIME_START: {config['EXPERIMENT_TIME_START']}")

FORCING_DATASET: ERA5
EASYMORE_CLIENT: easymore cli
FORCING_VARIABLES: longitude,latitude,time,LWRadAtm,SWRadAtm,pptrate,airpres,airtemp,spechum,windspd
EXPERIMENT_TIME_START: 2010-01-01 01:00
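
Printing the settings relies on reading the output by eye. The check can also be made explicit with a small helper that reports which settings are missing or empty. The following is a minimal sketch, not part of CONFLUENCE: `find_missing_settings`, `split_forcing_variables`, and `REQUIRED_KEYS` are hypothetical names, and the list of required keys is an assumption that only covers the settings this notebook itself reads.

```python
from typing import Any, Dict, List, Sequence

# Assumed list of required settings: only the keys this notebook reads.
REQUIRED_KEYS = (
    "CONFLUENCE_DATA_DIR",
    "DOMAIN_NAME",
    "FORCING_DATASET",
    "EASYMORE_CLIENT",
    "FORCING_VARIABLES",
    "EXPERIMENT_TIME_START",
)


def find_missing_settings(
    config: Dict[str, Any], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required keys that are absent, None, or an empty string."""
    return [key for key in required if config.get(key) in (None, "")]


def split_forcing_variables(value: str) -> List[str]:
    """Split the comma-separated FORCING_VARIABLES string into a list of names."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Example values taken from the output and paths shown in this notebook.
example_config = {
    "CONFLUENCE_DATA_DIR": "/home/darri/data/CONFLUENCE_data",
    "DOMAIN_NAME": "Bow_at_Banff",
    "FORCING_DATASET": "ERA5",
    "EASYMORE_CLIENT": "easymore cli",
    "FORCING_VARIABLES": "longitude,latitude,time,LWRadAtm,SWRadAtm,"
                         "pptrate,airpres,airtemp,spechum,windspd",
    "EXPERIMENT_TIME_START": "2010-01-01 01:00",
}

missing = find_missing_settings(example_config)
variables = split_forcing_variables(example_config["FORCING_VARIABLES"])
print(f"Missing settings: {missing}")
print(f"Number of forcing variables: {len(variables)}")
```

In the notebook itself, the same helper could be called on the loaded `config` dictionary, raising an error when the returned list is not empty.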


## Define default paths

Next, we define the paths to the data directories and create any that do not yet exist before running the preprocessing scripts.

In [3]:
# Main project directory
data_dir = config['CONFLUENCE_DATA_DIR']
project_dir = Path(data_dir) / f"domain_{config['DOMAIN_NAME']}"

# Data directories
raw_data_dir = project_dir / 'forcing' / 'raw_data'
basin_averaged_data = project_dir / 'forcing' / 'basin_averaged_data'
catchment_intersection_dir = project_dir / 'shapefiles' / 'catchment_intersection'

# Make sure the new directories exist
basin_averaged_data.mkdir(parents=True, exist_ok=True)
catchment_intersection_dir.mkdir(parents=True, exist_ok=True)
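
The directory layout above can also be wrapped in a function so that other notebooks derive identical paths from the same two settings. This is a sketch under stated assumptions: `build_forcing_paths` and `ensure_dirs` are hypothetical helpers, not CONFLUENCE functions, and the demonstration runs in a temporary directory rather than the real data directory.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


def build_forcing_paths(config: Dict[str, Any]) -> Dict[str, Path]:
    """Derive the project and data directories from CONFLUENCE_DATA_DIR and DOMAIN_NAME."""
    project_dir = Path(config["CONFLUENCE_DATA_DIR"]) / f"domain_{config['DOMAIN_NAME']}"
    return {
        "project_dir": project_dir,
        "raw_data_dir": project_dir / "forcing" / "raw_data",
        "basin_averaged_data": project_dir / "forcing" / "basin_averaged_data",
        "catchment_intersection_dir": project_dir / "shapefiles" / "catchment_intersection",
    }


def ensure_dirs(paths: Dict[str, Path], names: Iterable[str]) -> None:
    """Create the named directories, including parents, if they do not exist."""
    for name in names:
        paths[name].mkdir(parents=True, exist_ok=True)


# Demonstration in a throwaway directory (stands in for CONFLUENCE_DATA_DIR).
with tempfile.TemporaryDirectory() as tmp:
    demo_config = {"CONFLUENCE_DATA_DIR": tmp, "DOMAIN_NAME": "Bow_at_Banff"}
    demo_paths = build_forcing_paths(demo_config)
    ensure_dirs(demo_paths, ["basin_averaged_data", "catchment_intersection_dir"])
    created = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_dir()
    )

for directory in created:
    print(directory)
```

As in the cell above, only the two output directories are created here; the raw data directory is left for the acquisition step to create.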

## 1. Run Model-Agnostic Orchestrator

Now we run the model-agnostic orchestrator to acquire and preprocess the data.

In [4]:
# Initialize the DataAcquisitionProcessor with the loaded config and logger
dap = DataAcquisitionProcessor(config, logger)

# Run data acquisition
dap.run_data_acquisition()

2024-10-20 19:08:57,973 - INFO - MAF configuration JSON saved to: /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/forcing/maf_config.json
2024-10-20 19:08:57,974 - INFO - Starting data acquisition process


model-agnostic.sh: Model-agnostic workflow job submission to SLURM scheduler on DRA HPC.
model-agnostic.sh: details are logged in /home/darri/data/CONFLUENCE_data/installs/MAF/02_model_agnostic_component/model-agnostic.log
model-agnostic.sh: Script for independant :gis:#1 process is executed
model-agnostic.sh: Script for independant :gis:#2 process is executed
model-agnostic.sh: Script for independant :gis:#3 process is executed


2024-10-20 19:28:51,092 - INFO - Model Agnostic Framework completed successfully.
2024-10-20 19:28:51,093 - INFO - Data acquisition process completed


model-agnostic.sh: Script for :met:#1 process is executed for the parent process
model-agnostic.sh: Script for :remap: for the #1 process is executed for the child process with parent ID(s) of
(2024-10-20 23:09:07) era5_simplified.sh: processing ECMWF ERA5...
(2024-10-20 23:09:07) era5_simplified.sh: creating output directory under /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/forcing/raw_data
(2024-10-20 23:28:51) era5_simplified.sh: results are produced under /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/forcing/raw_data.
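
Once the log reports that results were produced under `forcing/raw_data`, it can be worth confirming that files are actually there before moving on. The sketch below lists matching files in a directory. It assumes the forcing files are NetCDF files with a `.nc` extension, which this notebook does not state; `list_forcing_files` is a hypothetical helper, and the file names in the demonstration are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def list_forcing_files(directory: Path, pattern: str = "*.nc") -> List[Path]:
    """Return files under directory (searched recursively) that match pattern, sorted."""
    return sorted(p for p in Path(directory).rglob(pattern) if p.is_file())


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "forcing" / "raw_data"
    raw_dir.mkdir(parents=True)
    for name in ("demo_2010-01.nc", "demo_2010-02.nc", "notes.txt"):
        (raw_dir / name).write_bytes(b"")  # hypothetical file names
    demo_names = [p.name for p in list_forcing_files(raw_dir)]

print(f"Found {len(demo_names)} forcing file(s): {demo_names}")
```

In this notebook, the call would be `list_forcing_files(raw_data_dir)`; an empty result suggests that the acquisition job has not finished or wrote its output elsewhere.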
