# Part VI: Bonus projects

Authors: Jordi Bolibar & Facundo Sapienza

In notebook 5, we re-computed a distributed dataset that allows us to work with 2D gridded data instead of glacier-wide data. In this notebook, we introduce two bonus projects that can help you exploit these datasets further.

For these two projects, we will use the following libraries:

In [1]:
## Auxiliary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import salem
from pathlib import Path
import os

# OGGM
import oggm.cfg as cfg
from oggm import utils, workflow, tasks, graphics

cfg.initialize(logging_level='WARNING') # initialize OGGM
cfg.PARAMS['border'] = 10
cfg.PARAMS['use_multiprocessing'] = True 

2023-04-20 08:46:47: oggm.cfg: Reading default parameters from the OGGM `params.cfg` configuration file.
2023-04-20 08:46:47: oggm.cfg: Multiprocessing switched OFF according to the parameter file.
2023-04-20 08:46:47: oggm.cfg: Multiprocessing: using all available processors (N=4)
2023-04-20 08:46:49: oggm.cfg: PARAMS['border'] changed from `80` to `10`.
2023-04-20 08:46:49: oggm.cfg: Multiprocessing switched ON after user settings.


## 6.1. Bonus project No1: Distributed glacier mass balance

<img src="Figures/eye_logo.png" width="75"/>

In this project, we propose to go beyond the mass balance modelling of the previous notebooks and attempt to learn the spatial distribution of glacier mass balance directly from the gridded geodetic data of Hugonnet et al. (2021), which provide surface elevation change rates (dh/dt). Instead of modelling a single data point per glacier, we will simulate the spatial distribution of the mass balance (e.g. its altitudinal gradient).
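As a minimal illustration of what an altitudinal gradient means here, the sketch below fits a straight line of elevation change rate against surface elevation with `np.polyfit`. The elevations and dh/dt values are synthetic placeholders, not Hugonnet et al. (2021) data, and a single linear fit is only a baseline that a learned distributed model should improve upon.

```python
import numpy as np

# Synthetic placeholder data (not real observations): surface elevation (m)
# and elevation change rate dh/dt (m/yr) for a handful of glacier grid cells
elevation = np.array([1000.0, 1200.0, 1400.0, 1600.0, 1800.0])
dhdt = np.array([-2.0, -1.5, -1.0, -0.5, 0.0])

# Fit dh/dt = gradient * elevation + intercept (a linear altitudinal gradient)
gradient, intercept = np.polyfit(elevation, dhdt, deg=1)

print(f"gradient: {gradient * 100:.2f} m/yr per 100 m")
print(f"elevation where dh/dt = 0: {-intercept / gradient:.0f} m")
```

A distributed model goes further by letting dh/dt depend on more than elevation (aspect, slope, climate), which is what the gridded dataset below makes possible.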

### 6.1.1. Building a distributed dataset

In the previous notebooks, we worked with tabular data. As you can see in the plot above, we will now be working with distributed (2D) data. Therefore, we need to reshape and adjust our dataset for this new problem.
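To make the reshaping concrete, here is a minimal sketch of how 2D fields on a common grid can be flattened into a table with one row per grid cell, which appears to be the layout of `df_subset`. The column names mirror some of its columns, but the grid size and values are made up. Keeping the row-major order (via `ravel`) and the cell indices is what allows predictions to be reshaped back into 2D maps.

```python
import numpy as np
import pandas as pd

# Hypothetical 2D fields on the same (ny, nx) grid; values are placeholders
ny, nx = 3, 4
rng = np.random.default_rng(0)
fields = {
    "topo": rng.uniform(1000, 2000, size=(ny, nx)),
    "slope": rng.uniform(0.0, 0.5, size=(ny, nx)),
    "glacier_mask": (rng.uniform(size=(ny, nx)) > 0.5).astype(int),
}

# Flatten each 2D array in row-major order: one row per grid cell
df_flat = pd.DataFrame({name: arr.ravel() for name, arr in fields.items()})

# Keep the grid indices so that results can be mapped back to 2D later
jj, ii = np.unravel_index(np.arange(ny * nx), (ny, nx))
df_flat["j"], df_flat["i"] = jj, ii

# Reshape a column back into a 2D map (e.g. a model prediction)
topo_2d = df_flat["topo"].to_numpy().reshape(ny, nx)
print(df_flat.shape)
```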

In [8]:
df_subset = pd.read_csv('Data/df_distributed_subset.csv')
df_subset.columns

Index(['Unnamed: 0.1', 'Unnamed: 0', 'RGI_ID', 'PDD_2D', 'rain_2D', 'snow_2D',
       'topo', 'aspect', 'slope', 'dis_from_border', 'glacier_mask',
       'millan_ice_thickness', 'hugonnet_dhdt'],
      dtype='object')

We clean the DataFrame a little by dropping the two leftover index columns:

In [11]:
# Drop the index columns left over from previous CSV exports
df_subset = df_subset.drop(['Unnamed: 0.1', 'Unnamed: 0'], axis=1)

This is a small subset of the whole dataset, a lightweight version to start working with.

In [12]:
df_subset

| | RGI_ID | PDD_2D | rain_2D | snow_2D | topo | aspect | slope | dis_from_border | glacier_mask | millan_ice_thickness | hugonnet_dhdt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RGI60-08.00005 | 609.424791 | 24237.964135 | 16140.702605 | 1024.7790 | 5.331003 | 0.102807 | 1997.02900 | 0 | 0.0 | 0.047467 |
| 1 | RGI60-08.00005 | 615.582145 | 24237.964135 | 16140.702605 | 1034.9397 | 5.075106 | 0.112442 | 1955.76070 | 0 | 0.0 | 0.091581 |
| 2 | RGI60-08.00005 | 626.451758 | 24237.964135 | 16049.763045 | 1052.8350 | 4.781625 | 0.125466 | 1916.53980 | 0 | 0.0 | 0.083521 |
| 3 | RGI60-08.00005 | 632.948060 | 24237.964135 | 16049.763045 | 1063.4498 | 4.490769 | 0.125620 | 1879.49460 | 0 | 0.0 | 0.034988 |
| 4 | RGI60-08.00005 | 637.226233 | 24237.964135 | 16049.763045 | 1070.4403 | 4.222532 | 0.151456 | 1844.75610 | 0 | 0.0 | 0.034173 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53565 | RGI60-08.00014 | 1053.637775 | 23169.686477 | 11568.195539 | 1352.4573 | 1.996007 | 0.119973 | 357.61713 | 0 | 0.0 | -0.010636 |
| 53566 | RGI60-08.00014 | 1051.288168 | 23169.686477 | 11721.653822 | 1349.4329 | 2.012404 | 0.114451 | 373.89438 | 0 | 0.0 | -0.016246 |
| 53567 | RGI60-08.00014 | 1049.039072 | 23169.686477 | 11721.653822 | 1346.5271 | 2.080615 | 0.099085 | 390.62260 | 0 | 0.0 | -0.022051 |
| 53568 | RGI60-08.00014 | 1046.852415 | 23169.686477 | 11721.653822 | 1343.7019 | 2.261690 | 0.075171 | 407.74625 | 0 | 0.0 | -0.024491 |
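A possible next step is to turn this table into a feature matrix and a target vector. The sketch below assumes that `glacier_mask == 1` marks ice-covered cells, that `hugonnet_dhdt` is the target and that the listed columns are the inputs; these are illustrative choices, not the authors' final setup. A tiny hand-made DataFrame, `df_demo`, stands in for `df_subset` so the block runs on its own; replace it with `df_subset` in the notebook.

```python
import numpy as np
import pandas as pd

# Stand-in for df_subset with the same column names (values are made up)
df_demo = pd.DataFrame({
    "RGI_ID": ["RGI60-08.00005"] * 4,
    "PDD_2D": [600.0, 610.0, 620.0, 630.0],
    "rain_2D": [24000.0] * 4,
    "snow_2D": [16000.0, 16000.0, 15900.0, 15900.0],
    "topo": [1020.0, 1030.0, 1050.0, 1060.0],
    "aspect": [5.3, 5.1, 4.8, 4.5],
    "slope": [0.10, 0.11, 0.13, 0.13],
    "dis_from_border": [2000.0, 1950.0, 1900.0, 1880.0],
    "glacier_mask": [0, 1, 1, 0],
    "millan_ice_thickness": [0.0, 35.0, 40.0, 0.0],
    "hugonnet_dhdt": [0.05, -0.8, -0.6, 0.03],
})

# Assumed input features and target
features = ["PDD_2D", "rain_2D", "snow_2D", "topo", "aspect", "slope", "dis_from_border"]
target = "hugonnet_dhdt"

# Keep only ice-covered cells: off-glacier dh/dt is not glacier mass change
on_glacier = df_demo[df_demo["glacier_mask"] == 1]

X = on_glacier[features].to_numpy()
y = on_glacier[target].to_numpy()
groups = on_glacier["RGI_ID"].to_numpy()  # can be used for glacier-wise train/test splits
print(X.shape, y.shape)
```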


### TO BE CONTINUED

Next steps: build training matrices with all the geodetic MB data, and restructure the training dataset from the previous notebooks so that it has the same shape as the flattened matrices of this dataset. For variables that are only available at low resolution (e.g. climate), we can simply repeat the same precipitation value in every grid cell and apply a rough downscaling to temperature.
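Repeating coarse precipitation and roughly downscaling temperature can be sketched as follows. The function name `downscale_to_grid` is hypothetical, and the constant lapse rate of -6.5 K per km is an assumption (a common default); in practice, the reference elevation of the coarse cell would come from the orography of the climate dataset.

```python
import numpy as np

def downscale_to_grid(t_coarse, p_coarse, z_ref, z_grid, lapse_rate=-0.0065):
    """Map one coarse climate cell onto a fine elevation grid (hypothetical helper).

    t_coarse: temperature of the coarse cell (deg C) at elevation z_ref (m)
    p_coarse: precipitation of the coarse cell (any unit), repeated as-is
    z_grid:   2D array of surface elevations (m) of the fine grid
    lapse_rate: assumed constant temperature gradient (K per m)
    """
    # Temperature: shift by the lapse rate times the elevation difference
    t_fine = t_coarse + lapse_rate * (z_grid - z_ref)
    # Precipitation: no downscaling, the same value in every grid cell
    p_fine = np.full_like(z_grid, p_coarse, dtype=float)
    return t_fine, p_fine

z_grid = np.array([[1000.0, 1500.0], [2000.0, 2500.0]])
t_fine, p_fine = downscale_to_grid(t_coarse=5.0, p_coarse=3.2, z_ref=1500.0, z_grid=z_grid)
print(t_fine)
```

Cells 500 m above the reference elevation end up 3.25 degrees colder, while precipitation stays uniform; a more refined approach could also add an elevation-dependent precipitation factor.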

## 6.2. Bonus project No2: Inferring glacier ice thickness

<img src="Figures/eye_logo.png" width="75"/>

In this project, we propose to model something completely different from what we have been doing in the previous notebooks. Instead of glacier mass balance, we will try to infer glacier ice thickness from a data-driven perspective. OGGM provides access to many glacier datasets, including GlaThiDa, the global dataset of glacier ice thickness observations. These observations cover a limited number of glaciers (ca. 2700), and usually only part of each glacier's surface. Nonetheless, all the available data points can be used to train a machine learning model to infer glacier ice thickness from different input features of interest.

Possible input features include:
- Glacier surface elevation
- Surface slope
- Ice surface velocities
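Before reaching for a more complex model, a linear least-squares fit of thickness on these features gives a simple baseline. The sketch below uses `np.linalg.lstsq` on synthetic data generated from made-up coefficients (an assumption for illustration, not GlaThiDa values); in practice, each row would be one thickness observation matched to the surface elevation, slope and velocity at its location.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-ins for the input features at each observation point
elevation = rng.uniform(1500, 3500, n)  # surface elevation (m)
slope = rng.uniform(0.02, 0.6, n)       # surface slope (rad)
velocity = rng.uniform(0, 150, n)       # surface velocity (m/yr)

# Made-up "true" relation plus noise, only to have something to fit
true_coefs = np.array([400.0, -0.05, -250.0, 0.8])  # intercept, elevation, slope, velocity
X = np.column_stack([np.ones(n), elevation, slope, velocity])
thickness = X @ true_coefs + rng.normal(0, 5.0, n)

# Ordinary least squares: minimise ||X @ coefs - thickness||^2
coefs, *_ = np.linalg.lstsq(X, thickness, rcond=None)
rmse = np.sqrt(np.mean((X @ coefs - thickness) ** 2))
print("fitted coefficients:", np.round(coefs, 3))
print(f"training RMSE: {rmse:.1f} m")
```

Real ice thickness depends nonlinearly on slope and velocity, so a nonlinear model (e.g. gradient-boosted trees or a neural network, as in the previous notebooks) would be the natural next step once this baseline is in place.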

In [None]:
parent_path = os.path.dirname(Path().resolve())
workspace_path = os.path.join(parent_path, 'OGGM_data_Finse_glathida')
#workspace_path = '/home/jovyan/shared/glacier-ml-2022/Mass_Balance_ML_Modelling/Data'

# Create the working directory if it does not exist yet
os.makedirs(workspace_path, exist_ok=True)

cfg.PATHS['working_dir'] = workspace_path

In [None]:
gtd_file = utils.file_downloader('https://cluster.klima.uni-bremen.de/~oggm/glathida/glathida-v3.1.0/data/TTT_per_rgi_id.h5')

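# Each key of the HDF store is an RGI ID prefixed with '/': strip it.
# The context manager closes the file once the keys have been read.
with pd.HDFStore(gtd_file, mode='r') as glathida:
    rgi_ids = [key[1:] for key in glathida.keys()]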

In [None]:
# Use the preprocessed directories that include the OGGM-shop data ("W5E5_w_data")
base_url = 'https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/L3-L5_files/2023.1/elev_bands/W5E5_w_data/'
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=3, prepro_base_url=base_url, prepro_border=10)

In [None]:
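# Pick one glacier as an example, load its thickness observations and plot their locations
rgi_id = rgi_ids[0]
df = pd.read_hdf(gtd_file, key=rgi_id)
df.plot(y='POINT_LAT', x='POINT_LON', c='THICKNESS', kind='scatter', cmap='viridis');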
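GlaThiDa observations are points, while the candidate input features live on a grid, so one option is to average all thickness measurements that fall into the same grid cell. The sketch below does this with `np.bincount`. The helper `average_points_on_grid`, the grid origin, the spacing and the point coordinates are all hypothetical, and it assumes the points are already projected to the grid's map coordinates (in metres) with rows counted from the bottom; OGGM and salem grids may store rows top-down, so check the orientation before matching.

```python
import numpy as np

def average_points_on_grid(x, y, values, x0, y0, dx, nx, ny):
    """Average point values per grid cell (hypothetical helper).

    x0, y0: coordinates of the grid's lower-left corner (m)
    dx: cell size (m); nx, ny: number of columns and rows
    Returns a (ny, nx) array with NaN where no point falls.
    """
    i = np.floor((x - x0) / dx).astype(int)  # column index
    j = np.floor((y - y0) / dx).astype(int)  # row index, counted from the bottom
    inside = (i >= 0) & (i < nx) & (j >= 0) & (j < ny)
    flat = j[inside] * nx + i[inside]  # row-major cell index

    sums = np.bincount(flat, weights=values[inside], minlength=nx * ny)
    counts = np.bincount(flat, minlength=nx * ny)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # 0/0 gives NaN for empty cells
    return mean.reshape(ny, nx)

# Hypothetical points on a 2 x 3 grid of 100 m cells starting at (0, 0);
# the last point lies outside the grid and is ignored
x = np.array([10.0, 60.0, 150.0, 250.0, 999.0])
y = np.array([10.0, 40.0, 20.0, 150.0, 10.0])
thickness = np.array([100.0, 120.0, 80.0, 50.0, 30.0])
grid = average_points_on_grid(x, y, thickness, x0=0.0, y0=0.0, dx=100.0, nx=3, ny=2)
print(grid)
```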