# Acquiring model forcing data for the modelling domain

To run our hydrological models, we need meteorological forcing data covering the modelling domain. These data include:

1. Precipitation
2. Temperature
3. Wind speed
4. Humidity
5. Solar radiation
6. Long wave radiation
7. Atmospheric pressure

## Methods of acquiring forcing data

1. Subsetting full-domain datasets stored on HPC. If you have access to suitable HPC infrastructure, we can use datatool (https://github.com/CH-Earth/datatool) to extract the forcing variables for our domain.
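Conceptually, subsetting means keeping only the grid cells whose coordinates fall inside the domain's bounding box. The sketch below illustrates that idea on a tiny in-memory grid in plain Python. It is not how datatool is implemented (datatool works on the NetCDF files hosted on the HPC system), and the helper name `subset_grid` and the grid values are hypothetical.

```python
def subset_grid(lats, lons, values, lat_min, lat_max, lon_min, lon_max):
    """Return the coordinates and values of the cells inside the bounding box.

    lats:   1-D list of latitudes, one per row of `values`
    lons:   1-D list of longitudes, one per column of `values`
    values: 2-D list indexed as values[row][col]
    """
    # Indices of rows/columns whose coordinate lies inside the box (inclusive)
    row_idx = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    col_idx = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]

    sub_lats = [lats[i] for i in row_idx]
    sub_lons = [lons[j] for j in col_idx]
    sub_values = [[values[i][j] for j in col_idx] for i in row_idx]
    return sub_lats, sub_lons, sub_values


# Hypothetical 3 x 4 precipitation grid (rows = latitudes, columns = longitudes)
lats = [50.5, 51.0, 51.5]
lons = [-117.0, -116.5, -116.0, -115.5]
precip = [
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3],
]

sub_lats, sub_lons, sub_precip = subset_grid(lats, lons, precip, 50.9, 51.6, -116.6, -115.9)
print(sub_lats)    # [51.0, 51.5]
print(sub_lons)    # [-116.5, -116.0]
print(sub_precip)  # [[1.1, 1.2], [2.1, 2.2]]
```

Only the two middle columns and the two northern rows survive, which is the same kind of reduction datatool performs at much larger scale.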


## First we import the libraries and functions we need

In [1]:
import sys
from pathlib import Path
import yaml # type: ignore
import logging

# Add the parent directory to sys.path
current_dir = Path.cwd()
parent_dir = current_dir.parent.parent
sys.path.append(str(parent_dir))

# Import required CONFLUENCE utility functions
from utils.dataHandling_utils.data_acquisition_utils import datatoolRunner, carraDownloader, era5Downloader # type: ignore

# Confirm that the imports succeeded
print("All modules imported successfully")


## Check configurations

Now let's print the relevant configuration settings and check that everything we need has been defined.

In [None]:
config_path = Path('../../0_config_files/config_active.yaml')
with open(config_path, 'r') as config_file:
    config = yaml.safe_load(config_file)
    print(f"BOUNDING_BOX_COORDS: {config['BOUNDING_BOX_COORDS']}")
    print(f"DATATOOL_DATASET_ROOT: {config['DATATOOL_DATASET_ROOT']}")
    print(f"TOOL_ACCOUNT: {config['TOOL_ACCOUNT']}")
    print(f"FORCING_DATASET: {config['FORCING_DATASET']}")
    print(f"FORCING_VARIABLES: {config['FORCING_VARIABLES']}")
    print(f"EXPERIMENT_TIME_START: {config['EXPERIMENT_TIME_START']}")
    print(f"EXPERIMENT_TIME_END: {config['EXPERIMENT_TIME_END']}")
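Printing lets us eyeball the values, but a missing key only surfaces as a `KeyError` on the first one absent. As a hedged alternative, the sketch below collects every problem at once and also checks that the experiment period is not reversed. The function name `check_config` and the example values are hypothetical, and it assumes the experiment times are strings formatted like `2010-01-01 01:00`; adjust `TIME_FORMAT` and `REQUIRED_KEYS` to match your configuration file.

```python
from datetime import datetime

# Keys used by the cells in this notebook; extend as needed for your setup
REQUIRED_KEYS = [
    "BOUNDING_BOX_COORDS",
    "TOOL_ACCOUNT",
    "FORCING_DATASET",
    "FORCING_VARIABLES",
    "EXPERIMENT_TIME_START",
    "EXPERIMENT_TIME_END",
    "CONFLUENCE_DATA_DIR",
    "DOMAIN_NAME",
]

# ASSUMPTION: experiment times look like "2010-01-01 01:00"
TIME_FORMAT = "%Y-%m-%d %H:%M"


def check_config(config, required_keys=REQUIRED_KEYS, time_format=TIME_FORMAT):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"missing key: {key}" for key in required_keys if key not in config]

    start = config.get("EXPERIMENT_TIME_START")
    end = config.get("EXPERIMENT_TIME_END")
    if start is not None and end is not None:
        try:
            start_dt = datetime.strptime(str(start), time_format)
            end_dt = datetime.strptime(str(end), time_format)
        except ValueError as err:
            problems.append(f"could not parse experiment times: {err}")
        else:
            if start_dt >= end_dt:
                problems.append("EXPERIMENT_TIME_START must be earlier than EXPERIMENT_TIME_END")
    return problems


# Hypothetical, deliberately incomplete configuration with reversed dates
example_config = {
    "BOUNDING_BOX_COORDS": "51.76/-116.55/50.95/-115.5",
    "EXPERIMENT_TIME_START": "2011-01-01 01:00",
    "EXPERIMENT_TIME_END": "2010-12-31 23:00",
}
for problem in check_config(example_config):
    print(problem)
```

In the notebook itself you would call `check_config(config)` after loading the YAML file and stop if the returned list is not empty.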

## Define default paths

Now let's define the paths for the forcing data and create the directories that will hold them before we run the acquisition scripts.

In [None]:
# Main project directory
data_dir = config['CONFLUENCE_DATA_DIR']
project_dir = Path(data_dir) / f"domain_{config['DOMAIN_NAME']}"

# Data directory
raw_data_dir = project_dir / 'forcing' / 'raw_data'

# Make sure the directory exists
raw_data_dir.mkdir(parents=True, exist_ok=True)
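If several notebooks need the same layout, the path logic above can be wrapped in a small helper. This is a sketch only: the name `make_raw_forcing_dir` and the domain name are hypothetical, and the directory structure simply mirrors the cell above. The usage example runs inside a temporary directory so it does not touch real project data.

```python
import tempfile
from pathlib import Path


def make_raw_forcing_dir(data_dir, domain_name):
    """Create <data_dir>/domain_<domain_name>/forcing/raw_data and return its path."""
    raw_dir = Path(data_dir) / f"domain_{domain_name}" / "forcing" / "raw_data"
    # parents=True creates missing intermediate folders; exist_ok=True makes reruns safe
    raw_dir.mkdir(parents=True, exist_ok=True)
    return raw_dir


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = make_raw_forcing_dir(tmp, "Bow_at_Banff")  # hypothetical domain name
    print(raw_dir.relative_to(tmp))  # domain_Bow_at_Banff/forcing/raw_data (on POSIX)
    # Calling it a second time does not raise, thanks to exist_ok=True
    make_raw_forcing_dir(tmp, "Bow_at_Banff")
    print(raw_dir.is_dir())  # True
```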


# 1. Running datatool
Now that we have our configuration loaded, let's run datatool to retrieve the forcing data. This involves initializing the datatoolRunner with our configuration and a logger, converting the bounding box into latitude and longitude limits, and then building and executing the datatool command for the configured dataset.

In [None]:
# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Initialize datatoolRunner class
dr = datatoolRunner(config, logger)

# Get latitude and longitude limits from BOUNDING_BOX_COORDS (lat_max/lon_min/lat_min/lon_max)
bbox = config['BOUNDING_BOX_COORDS'].split('/')
latlims = f"{bbox[2]},{bbox[0]}"
lonlims = f"{bbox[1]},{bbox[3]}"

# Create the datatool command
datatool_command = dr.create_datatool_command(
    dataset=config['FORCING_DATASET'],
    output_dir=raw_data_dir,
    lat_lims=latlims,
    lon_lims=lonlims,
    variables=config['FORCING_VARIABLES'],
    start_date=config['EXPERIMENT_TIME_START'],
    end_date=config['EXPERIMENT_TIME_END'],
)

# Print the command for inspection, then run it
print(datatool_command)
dr.execute_datatool_command(datatool_command)
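The cell above indexes the split bounding box directly, which silently accepts a mis-ordered or malformed string. The sketch below wraps the same conversion in a hypothetical helper, `bbox_to_lims`, that validates the input before any command is built. It assumes the `lat_max/lon_min/lat_min/lon_max` ordering implied by the indexing in the cell, and that the domain does not cross the antimeridian; the example coordinates are hypothetical.

```python
def bbox_to_lims(bbox_coords):
    """Split 'lat_max/lon_min/lat_min/lon_max' into ('lat_min,lat_max', 'lon_min,lon_max')."""
    parts = [p.strip() for p in bbox_coords.split("/")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '/'-separated values, got {len(parts)}: {bbox_coords!r}")

    lat_max, lon_min, lat_min, lon_max = (float(p) for p in parts)
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError(f"latitude limits out of order or range: {lat_min}, {lat_max}")
    # ASSUMPTION: the domain does not cross the antimeridian, so lon_min < lon_max
    if not (lon_min < lon_max):
        raise ValueError(f"longitude limits out of order: {lon_min}, {lon_max}")

    # Same ordering as the notebook cell: bbox[2],bbox[0] and bbox[1],bbox[3]
    return f"{parts[2]},{parts[0]}", f"{parts[1]},{parts[3]}"


# Hypothetical bounding box
latlims, lonlims = bbox_to_lims("51.76/-116.55/50.95/-115.5")
print(latlims)  # 50.95,51.76
print(lonlims)  # -116.55,-115.5

# Swapping the two latitudes is caught before any command is built
try:
    bbox_to_lims("50.95/-116.55/51.76/-115.5")
except ValueError as err:
    print(f"rejected: {err}")
```

Returning the original string fragments (rather than re-formatted floats) keeps the values passed to datatool exactly as written in the configuration file.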