# Acquiring model forcing data for the modelling domain

To run our hydrological models we need meteorological forcing data for our domain. These data include:

1. Precipitation
2. Temperature
3. Wind speed
4. Humidity
5. Solar radiation
6. Long wave radiation
7. Atmospheric pressure
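For RDRS, the dataset used in the datatool run further down, these seven quantities correspond to the `RDRS_v2.1_*` variable names that appear in that command's printed output. The mapping below is our reading of the RDRS v2.1 naming: `_SFC` marks surface fields and `_09944` a near-surface model level. Treat it as an assumption and verify it against the RDRS documentation before relying on it. `RDRS_FORCING_VARIABLES` is a hypothetical name, not a CONFLUENCE setting.

```python
# Hypothetical mapping (assumption, verify against RDRS v2.1 documentation)
# from the seven forcing quantities to the RDRS v2.1 variable names that
# appear in the datatool command printed later in this notebook.
RDRS_FORCING_VARIABLES = {
    "precipitation": "RDRS_v2.1_A_PR0_SFC",
    "temperature": "RDRS_v2.1_P_TT_09944",
    "wind_speed": "RDRS_v2.1_P_UVC_09944",
    "humidity": "RDRS_v2.1_P_HU_09944",           # specific humidity
    "solar_radiation": "RDRS_v2.1_P_FB_SFC",      # downward shortwave flux
    "longwave_radiation": "RDRS_v2.1_P_FI_SFC",   # downward longwave flux
    "atmospheric_pressure": "RDRS_v2.1_P_P0_SFC", # surface pressure
}

# Look up the RDRS name for one quantity
print(RDRS_FORCING_VARIABLES["temperature"])
```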

## Methods of acquiring forcing data
There are several ways of acquiring forcing data for our domain in CONFLUENCE, depending on the resources we have access to:

1. Subsetting full-domain datasets stored on HPC. If you have access to appropriate HPC infrastructure, we can use datatool (https://github.com/CH-Earth/datatool).
2. Downloading data directly from the provider.
3. Using user-supplied data. If you want to use your own forcing data, e.g. a dataset not currently integrated in CONFLUENCE, it can be defined in the CONFLUENCE configuration file.

In this notebook we will cover methods 1 and 2 for acquiring the pertinent meteorological data for our models.

## First we import the libraries and functions we need

In [1]:
import sys
from pathlib import Path
import yaml # type: ignore
import logging

# Add the parent directory to sys.path
current_dir = Path.cwd()
parent_dir = current_dir.parent.parent
sys.path.append(str(parent_dir))

# Import required CONFLUENCE utility functions
from utils.dataHandling_utils.data_acquisition_utils import datatoolRunner # type: ignore

# Confirm that the imports succeeded
print("All modules imported successfully")

All modules imported successfully


## Check configurations

Next, we load the active configuration and print the settings that forcing acquisition depends on, to make sure they are all defined.

In [2]:
config_path = Path('../../0_config_files/config_active.yaml')
with open(config_path, 'r') as config_file:
    config = yaml.safe_load(config_file)

# Print the settings used for forcing acquisition
for key in ['BOUNDING_BOX_COORDS', 'DATATOOL_DATASET_ROOT', 'TOOL_ACCOUNT',
            'FORCING_DATASET', 'FORCING_VARIABLES',
            'EXPERIMENT_TIME_START', 'EXPERIMENT_TIME_END']:
    print(f"{key}: {config[key]}")

BOUNDING_BOX_COORDS: 51.76/-116.55/50.95/-115.5
GISTOOL_DATASET_ROOT: /project/rrg-mclark/data/geospatial-data/
TOOL_ACCOUNT: def-mclark-ab
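Printing the settings only shows what is there. To actually check that nothing is missing, and that `BOUNDING_BOX_COORDS` follows the `lat_max/lon_min/lat_min/lon_max` order the datatool cell relies on, a small helper like the one below could be run after loading `config`. This is a sketch: `REQUIRED_FORCING_KEYS`, `missing_settings` and `bbox_to_lims` are hypothetical names, and the key list is an assumption based on the settings this notebook reads, not CONFLUENCE's full set of requirements.

```python
# Hypothetical list of settings this notebook relies on (assumption, not
# an exhaustive list of what CONFLUENCE requires).
REQUIRED_FORCING_KEYS = [
    "BOUNDING_BOX_COORDS",
    "DATATOOL_DATASET_ROOT",
    "TOOL_ACCOUNT",
    "FORCING_DATASET",
    "FORCING_VARIABLES",
    "EXPERIMENT_TIME_START",
    "EXPERIMENT_TIME_END",
]


def missing_settings(cfg: dict, keys=REQUIRED_FORCING_KEYS) -> list:
    """Return the keys that are absent from cfg or set to None / ''."""
    return [k for k in keys if cfg.get(k) in (None, "")]


def bbox_to_lims(bbox: str) -> tuple:
    """Turn 'lat_max/lon_min/lat_min/lon_max' into ('lat_min,lat_max', 'lon_min,lon_max')."""
    lat_max, lon_min, lat_min, lon_max = (float(v) for v in bbox.split("/"))
    # Reject boxes whose corners are swapped (also rejects antimeridian crossings)
    if lat_min > lat_max or lon_min > lon_max:
        raise ValueError(
            f"Bounding box {bbox!r} is not ordered lat_max/lon_min/lat_min/lon_max"
        )
    return f"{lat_min},{lat_max}", f"{lon_min},{lon_max}"


# Example with a deliberately incomplete configuration
example = {"BOUNDING_BOX_COORDS": "51.76/-116.55/50.95/-115.5", "TOOL_ACCOUNT": ""}
print(missing_settings(example))
print(bbox_to_lims(example["BOUNDING_BOX_COORDS"]))
```

In the notebook, `missing_settings(config)` should return an empty list before moving on. Rejecting boxes that cross the antimeridian is a deliberate simplification.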


## Define default paths

Now let's define the paths to the forcing data before we run the acquisition scripts, and create the containing directories.

In [3]:
# Main project directory
data_dir = config['CONFLUENCE_DATA_DIR']
project_dir = Path(data_dir) / f"domain_{config['DOMAIN_NAME']}"

# Data directory
raw_data_dir = project_dir / 'forcing' / 'raw_data'

# Make sure the directory exists
raw_data_dir.mkdir(parents = True, exist_ok = True)


# 1. Running datatool
Now that we have our configuration loaded, let's run datatool to get the data we need. This involves initializing the datatoolRunner with the appropriate settings and building the extraction command for our forcing dataset.

In [4]:
# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Initialize datatoolRunner class
dr = datatoolRunner(config, logger)

# Get lat and lon limits; BOUNDING_BOX_COORDS is ordered lat_max/lon_min/lat_min/lon_max
bbox = config['BOUNDING_BOX_COORDS'].split('/')
latlims = f"{bbox[2]},{bbox[0]}"  # "lat_min,lat_max"
lonlims = f"{bbox[1]},{bbox[3]}"  # "lon_min,lon_max"

# Create the datatool command
datatool_command = dr.create_datatool_command(
    dataset=config['FORCING_DATASET'],
    output_dir=raw_data_dir,
    lat_lims=latlims,
    lon_lims=lonlims,
    variables=config['FORCING_VARIABLES'],
    start_date=config['EXPERIMENT_TIME_START'],
    end_date=config['EXPERIMENT_TIME_END'],
)
print(datatool_command)

# Run the extraction
dr.execute_datatool_command(datatool_command)

['/installs/gistool/extract-dataset.sh', '--dataset=RDRS', '--dataset-dir=/project/rrg-mclark/data/meteorological-data/RDRS', '--output-dir=/Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_Bow_Dev/forcing/raw_data', '--start-date=2010-01-01 01:00', '--end-date=2018-12-31 23:00', '--lat-lims=50.95,51.76', '--lon-lims=-116.55,-115.5', "--variable=['RDRS_v2.1_P_P0_SFC', 'RDRS_v2.1_P_HU_09944', 'RDRS_v2.1_P_TT_09944', 'RDRS_v2.1_P_UVC_09944', 'RDRS_v2.1_A_PR0_SFC', 'RDRS_v2.1_P_FB_SFC', 'RDRS_v2.1_P_FI_SFC']", '--prefix=domain_Bow_Dev', '--submit-job', '--cache=/home/darri/cache/', '--account=def-mclark-ab']


FileNotFoundError: [Errno 2] No such file or directory: '/installs/gistool/extract-dataset.sh'
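The `FileNotFoundError` above means that the first element of the command, the `extract-dataset.sh` script, does not exist at that path on the machine where the cell was run. A cheap pre-flight check before calling `dr.execute_datatool_command` makes this failure explicit. This is a sketch: `check_datatool_command` is a hypothetical helper, and it assumes, as the printed output suggests, that the command is a list whose first element is the script path.

```python
import os
from pathlib import Path


def check_datatool_command(command: list) -> None:
    """Raise a descriptive error if the script at command[0] cannot be run.

    Hypothetical pre-flight check; assumes `command` is a list of arguments
    whose first element is the path to datatool's extract-dataset.sh.
    """
    if not command:
        raise ValueError("Empty datatool command")
    script = Path(command[0])
    if not script.is_file():
        raise FileNotFoundError(
            f"datatool script not found at {script}; check the datatool "
            "installation path in your configuration, or run this notebook "
            "on the HPC system where datatool is installed."
        )
    # The script is executed directly, so it needs the execute permission
    if not os.access(script, os.X_OK):
        raise PermissionError(f"datatool script {script} is not executable")
```

Calling `check_datatool_command(datatool_command)` just before `dr.execute_datatool_command(datatool_command)` would surface a missing script with a message that points at the likely cause.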

# 2. Downloading forcing data from the provider - TO BE IMPLEMENTED

If you don't have access to HPC infrastructure supported by datatool, data can be downloaded from the original data provider. CONFLUENCE currently supports direct downloads of the following datasets:

1. ECMWF Reanalysis v5 (ERA5)
2. Copernicus Arctic Regional Reanalysis (CARRA)

These scripts are adapted from the CWARHM workflows by Knoben et al., 2021. The user can also develop their own download scripts here. If you do so, please consider contributing them to the CONFLUENCE repository.

## 1. Download ERA5 data

In [5]:
#Code
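Until this cell is implemented, the sketch below shows one way to request hourly ERA5 single-level data from the Copernicus Climate Data Store (CDS) with the `cdsapi` package. It is not the CWARHM or CONFLUENCE implementation. Assumptions: a CDS account with credentials in `~/.cdsapirc`, `cdsapi` installed, the `reanalysis-era5-single-levels` dataset, and a variable list chosen to cover the seven forcing quantities (humidity via 2 m dewpoint temperature, wind via its 10 m components). `ERA5_VARIABLES`, `build_era5_request` and `download_era5_month` are hypothetical names. Conveniently, the CDS `area` order (North/West/South/East) matches the `BOUNDING_BOX_COORDS` order.

```python
import calendar

# Assumed CDS variable names covering the seven forcing quantities
ERA5_VARIABLES = [
    "total_precipitation",
    "2m_temperature",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "2m_dewpoint_temperature",  # humidity is derived from dewpoint
    "surface_solar_radiation_downwards",
    "surface_thermal_radiation_downwards",
    "surface_pressure",
]


def build_era5_request(bbox: str, year: int, month: int) -> dict:
    """Build a CDS request for one month of hourly ERA5 single-level data.

    bbox uses the BOUNDING_BOX_COORDS order lat_max/lon_min/lat_min/lon_max,
    which is also the North/West/South/East order CDS expects for 'area'.
    """
    north, west, south, east = (float(v) for v in bbox.split("/"))
    n_days = calendar.monthrange(year, month)[1]  # days in this month
    return {
        "product_type": ["reanalysis"],
        "variable": ERA5_VARIABLES,
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": [north, west, south, east],
        "data_format": "netcdf",  # the legacy CDS used the key "format"
    }


def download_era5_month(bbox: str, year: int, month: int, target: str) -> None:
    """Submit the request; needs cdsapi and CDS credentials in ~/.cdsapirc."""
    import cdsapi  # imported here so build_era5_request works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        build_era5_request(bbox, year, month),
        target,
    )
```

In the notebook this could be called as `download_era5_month(config['BOUNDING_BOX_COORDS'], 2010, 1, str(raw_data_dir / 'ERA5_2010-01.nc'))`, where the file name is hypothetical. Requesting one month per call keeps each CDS request small. ERA5 precipitation and radiation are hourly accumulations (m and J m⁻²), so they need converting to rates before use as model forcing.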

## 2. Download CARRA data