# MassBalanceMachine Data Processing - Example for Iceland

In this notebook, the data processing part of the MassBalanceMachine is outlined through an example with stake data from glaciers in Iceland. This example will help you understand how to use the data processing pipeline, which retrieves topographical and meteorological features for the stake data.

In [6]:
import re
import os
import pandas as pd

# Import the submodules from the MassBalanceMachine core
from mbm.data_processing import Dataset

FILE_DIR = '../../regions/iceland/mbm/data/files/'

## 1. Define and Load your Target Surface Mass Balance Dataset

**Expected columns in the dataset (per stake):** longitude ('lon'), latitude ('lat'), RGI ID, and the hydrological year of the measurement. 
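
Before building the `Dataset`, it can help to confirm that the input table actually contains these columns. The snippet below is a minimal sketch, not part of the MassBalanceMachine API: the helper `missing_columns` is hypothetical, the names `'RGIId'` and `'yr'` are assumed from their use elsewhere in this notebook, and the example row (including the RGI ID) is made up for illustration.

```python
import pandas as pd

# Assumed column names; adjust them to match your own dataset.
REQUIRED_COLUMNS = ['lon', 'lat', 'RGIId', 'yr']


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Toy example: a stake table that lacks the hydrological year column
example = pd.DataFrame({'lon': [-19.5], 'lat': [64.6], 'RGIId': ['RGI60-06.00001']})
print(missing_columns(example))  # → ['yr']
```

An empty list means that all expected columns are present.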

In [7]:
# Specify the filename of the input file with the raw data
input_target_fname = 'Iceland_Stake_Data_Reprojected.csv'
# Construct the full file path
input_file_path = os.path.join(FILE_DIR, input_target_fname)

df = pd.read_csv(input_file_path)

# Provide the name of the column that holds the RGI ID of each stake, the data directory,
# and the RGI region ID ('06' is Iceland)
dataset = Dataset(df, 'RGIId', FILE_DIR, '06')

## 2. Get the Topographical Features per Stake

In [None]:
# Specify the output filename to save the intermediate results
output_topo_fname = 'Iceland_Stake_Data_T_Attributes.csv'

# Specify the topographical features of interest 
vois_topo_columns = ['topo', 'aspect', 'slope', 'slope_factor', 'dis_from_border']

# Retrieve the topographical features for each stake measurement in the dataset
dataset.get_topo_features(output_topo_fname, vois_topo_columns)

## 3. Get the Meteorological Features per Stake

In [None]:
# Specify the paths to the climate data files that will be matched with the coordinates of the stake data
input_era5_fname = '../../regions/iceland/mbm/data/climate/ERA5_monthly_averaged_climate_data.nc'
input_gp_fname = '../../regions/iceland/mbm/data/climate/ERA5_geopotential_pressure.nc'

# Specify the output filename to save the intermediate results
output_climate_fname = 'Iceland_Stake_Data_Climate.csv'

# Provide the name of the column in your dataset that contains the hydrological year. If it is not available, provide the
# column with a measurement date taken at the end of the hydrological year, so that the year can be extracted for each stake,
# and indicate this with TRUE.
dataset.get_climate_features(output_climate_fname, input_era5_fname, input_gp_fname, 'd3')
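
To illustrate what extracting the year from an end-of-season measurement date can look like, here is a minimal pandas sketch. It does not show how `get_climate_features` works internally. It assumes that `'d3'` holds date strings in `YYYY-MM-DD` format, that the hydrological year is labelled by the calendar year in which it ends, and that the final measurement falls in that calendar year. The dates are made up.

```python
import pandas as pd

# Toy data: 'd3' is assumed to hold the date of the end-of-season measurement
stakes = pd.DataFrame({'d3': ['2010-09-28', '2011-10-03']})

# Parse the strings as dates and keep the calendar year as the hydrological year
stakes['yr'] = pd.to_datetime(stakes['d3'], format='%Y-%m-%d').dt.year
print(stakes['yr'].tolist())  # → [2010, 2011]
```

If your dates use another format, change the `format` argument accordingly.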

## 4. Transform Data to Monthly Resolution

In [3]:
# Define which columns are of interest (vois: variables of interest). See the metadata file for the ERA5-Land data for all the variable names
vois_climate = ['t2m', 'tp', 'sshf', 'slhf', 'ssrd', 'fal', 'str']

# Create a dictionary of all the columns in the dataset that match the variables of interest of the ERA5-Land data
vois_climate_columns = {voi: [col for col in df.columns.values if re.match(f'{voi}_[a-zA-Z]*', col)] for voi in vois_climate}

# Specify the column names for the annual and seasonal (winter and summer) mass balance columns in the dataset
smb_column_names = ['ba_stratigraphic', 'bw_stratigraphic', 'bs_stratigraphic']

# Specify any other (miscellaneous) columns to pass along, here the year column
misc_column_names = ['yr']

# Specify the output filename to save the monthly-resolution results
output_climate_fname = 'Iceland_Stake_Data_Monthly.csv'

dataset.convert_to_monthly(output_climate_fname, vois_climate_columns, vois_topo_columns, smb_column_names, misc_column_names)
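
The dictionary comprehension that builds `vois_climate_columns` groups column names by their variable prefix. The sketch below reproduces that matching on a toy list so the behaviour is easy to inspect. The column names, including the `<variable>_<month>` naming, are assumptions made for illustration, and both helper functions are hypothetical. Because `re.match` only anchors at the start of the string and `[a-zA-Z]*` also accepts zero letters, any column that begins with `<variable>_` is picked up. A stricter variant based on `re.fullmatch` is shown for comparison.

```python
import re

# Hypothetical column names: climate columns, one odd column, and unrelated columns
columns = ['t2m_oct', 't2m_nov', 'tp_oct', 'tp_2020', 'str_oct', 'slope', 'lat']
vois = ['t2m', 'tp', 'str']


def group_loose(columns, vois):
    """Same pattern as in the notebook: matches any column starting with '<voi>_'."""
    return {voi: [c for c in columns if re.match(f'{voi}_[a-zA-Z]*', c)] for voi in vois}


def group_strict(columns, vois):
    """Stricter variant: the whole name must be '<voi>_' followed by letters only."""
    return {voi: [c for c in columns if re.fullmatch(f'{voi}_[a-zA-Z]+', c)] for voi in vois}


print(group_loose(columns, vois)['tp'])   # → ['tp_oct', 'tp_2020']
print(group_strict(columns, vois)['tp'])  # → ['tp_oct']
```

Whether the looser or the stricter pattern is appropriate depends on how the climate columns in your own dataset are named.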