# Phase III: Water Storage Retrieval

Only `numpy` and `pandas` are required for this phase. `mean_squared_error` is imported from `sklearn` solely to evaluate the retrieved water storage.

In [83]:
import numpy as np
import pandas as pd

from sklearn.metrics import mean_squared_error as mse

Path to estimated bathymetry:

In [84]:
path = "C:/Users/Dave Mont/Desktop/Master_of_DataScience/TFM/Results/depth_estimation/test-results/"

Path to actual bathymetry:

In [102]:
path_ = "C:/Users/Dave Mont/Desktop/Master_of_DataScience/TFM/Bat_data/IDW/"

The following function estimates water storage with the equation $S=\alpha\sum_{i=1}^n h_i$, where $S$ is the total water storage in $m^3$, $\alpha$ is the pixel area in $m^2$, and $h_i$ is the water depth of pixel $i$ in $m$. If actual water depths are supplied, the function also returns the actual storage and the difference between predicted and actual storage. The output units of volume can be selected.

* `z_pred`: Array of $n$ observations of predicted water depth in $m$.
* `pix_area`: Numeric value. Pixel area in $m^2$.
* `z_true` (Optional): Array of $n$ observations of actual water depth in $m$.
* `units` (Optional): String. Final units of volume for water storage. Available options ['m3','hm3','km3'].
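As a quick check of the equation and the unit conversions, here is a minimal worked example. The three depths and the variable names are hypothetical and are not taken from the study data; the pixel area of 100 $m^2$ is simply assumed for the illustration.

```python
import numpy as np

# Hypothetical depths (m) for three water pixels; not taken from the study data.
depths_m = np.array([2.0, 3.0, 5.0])
pixel_area_m2 = 100.0  # assumed pixel area in m^2

# S = alpha * sum(h_i)
storage_m3 = pixel_area_m2 * depths_m.sum()

# Unit conversions: 1 hm3 = 1e6 m3 and 1 km3 = 1e9 m3
storage_hm3 = storage_m3 * 1e-6
storage_km3 = storage_m3 * 1e-9

print(storage_m3, storage_hm3, storage_km3)
```

The summed depth is 10 m, so the storage is 1000 $m^3$, which is 0.001 $hm^3$.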

In [134]:
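def water_storage_retrieval(z_pred, pix_area, z_true=None, units="hm3"):
    # Factors that convert m3 to the requested output units
    factors = {"m3": 1, "hm3": 1E-6, "km3": 1E-9}
    if units not in factors:
        raise ValueError("units must be one of 'm3', 'hm3' or 'km3'")
    conversion = pix_area * factors[units]
    # S = alpha * sum(h_i), expressed in the requested units
    ws_pred = z_pred.sum() * conversion
    if z_true is None:
        return ws_pred
    # With actual depths, also return the actual storage and the difference
    ws_true = z_true.sum() * conversion
    ws_diff = ws_pred - ws_true
    return ws_pred, ws_true, ws_diff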

Results are stored in a pandas data frame with one row per reservoir:

In [135]:
ws = pd.DataFrame(columns = ['Reservoir','actual_ws',
                             'lr_pred_ws','lr_ws_diff',
                             'rf_pred_ws','rf_ws_diff',
                             'gb_pred_ws','gb_ws_diff'])

ws['Reservoir'] = ['Alto-Lindoso','Bubal','Canelles','Grado']

Load the actual bathymetry.

In [136]:
A = pd.read_csv(path_ + "Alto_Lindoso_3.5_wmt.csv",sep = " ")
B = pd.read_csv(path_ + "Bubal_3.5_wmt.csv",sep = " ")
C = pd.read_csv(path_ + "Canelles_3.5_wmt.csv",sep = " ")
G = pd.read_csv(path_ + "Grado_3.5_wmt.csv",sep = " ")

Drop the rows that fall outside the study area, i.e. those whose `x` or `y` coordinate is 0.

In [137]:
A = A[(A['x'] != 0) & (A['y'] != 0)]
B = B[(B['x'] != 0) & (B['y'] != 0)]
C = C[(C['x'] != 0) & (C['y'] != 0)]
G = G[(G['x'] != 0) & (G['y'] != 0)]
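The filter keeps a row only when both coordinates are non-zero. The sketch below illustrates this on a small hypothetical grid; the coordinate and depth values are made up, and only the column names follow the bathymetry files used above.

```python
import pandas as pd

# Hypothetical grid; values are invented for illustration.
grid = pd.DataFrame({
    "x": [0.0, 553210.0, 553220.0, 553230.0],
    "y": [0.0, 4631050.0, 0.0, 4631070.0],
    "var1.pred": [0.0, 12.4, 7.9, 3.1],
})

# True only where both x and y are non-zero
inside = (grid["x"] != 0) & (grid["y"] != 0)
study_area = grid[inside]

# The first row (x = 0 and y = 0) and the third row (y = 0) are dropped.
print(len(grid), len(study_area))
```

Note that a row is removed when either coordinate is 0, not only when both are.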

## Water Storage Retrieval for each Regression Model

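Water storage is retrieved from the water depth estimated by each model, using a pixel area of 100 $m^2$, and stored in the data frame together with the actual storage and their difference.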

### Linear Regression

In [138]:
A_lr = pd.read_csv(path + "data-all-A-LR.csv")
B_lr = pd.read_csv(path + "data-all-B-LR.csv")
C_lr = pd.read_csv(path + "data-all-C-LR.csv")
G_lr = pd.read_csv(path + "data-all-G-LR.csv")

In [139]:
# Assign with .loc to avoid chained indexing, which may not update ws
lr_cols = ['lr_pred_ws','actual_ws','lr_ws_diff']
ws.loc[0, lr_cols] = water_storage_retrieval(A_lr['z_pred'],100,A['var1.pred'],"hm3")
ws.loc[1, lr_cols] = water_storage_retrieval(B_lr['z_pred'],100,B['var1.pred'],"hm3")
ws.loc[2, lr_cols] = water_storage_retrieval(C_lr['z_pred'],100,C['var1.pred'],"hm3")
ws.loc[3, lr_cols] = water_storage_retrieval(G_lr['z_pred'],100,G['var1.pred'],"hm3")

### Random Forest

In [140]:
A_rf = pd.read_csv(path + "data-all-A-RF-20-sqrt.csv")
B_rf = pd.read_csv(path + "data-all-B-RF-20-sqrt.csv")
C_rf = pd.read_csv(path + "data-all-C-RF-20-sqrt.csv")
G_rf = pd.read_csv(path + "data-all-G-RF-20-sqrt.csv")

In [141]:
# Assign with .loc to avoid chained indexing, which may not update ws
rf_cols = ['rf_pred_ws','actual_ws','rf_ws_diff']
ws.loc[0, rf_cols] = water_storage_retrieval(A_rf['z_pred'],100,A['var1.pred'],"hm3")
ws.loc[1, rf_cols] = water_storage_retrieval(B_rf['z_pred'],100,B['var1.pred'],"hm3")
ws.loc[2, rf_cols] = water_storage_retrieval(C_rf['z_pred'],100,C['var1.pred'],"hm3")
ws.loc[3, rf_cols] = water_storage_retrieval(G_rf['z_pred'],100,G['var1.pred'],"hm3")

### Gradient Boosting

In [142]:
A_gb = pd.read_csv(path + "data-all-A-GB.csv")
B_gb = pd.read_csv(path + "data-all-B-GB.csv")
C_gb = pd.read_csv(path + "data-all-C-GB.csv")
G_gb = pd.read_csv(path + "data-all-G-GB.csv")

In [143]:
# Assign with .loc to avoid chained indexing, which may not update ws
gb_cols = ['gb_pred_ws','actual_ws','gb_ws_diff']
ws.loc[0, gb_cols] = water_storage_retrieval(A_gb['z_pred'],100,A['var1.pred'],"hm3")
ws.loc[1, gb_cols] = water_storage_retrieval(B_gb['z_pred'],100,B['var1.pred'],"hm3")
ws.loc[2, gb_cols] = water_storage_retrieval(C_gb['z_pred'],100,C['var1.pred'],"hm3")
ws.loc[3, gb_cols] = water_storage_retrieval(G_gb['z_pred'],100,G['var1.pred'],"hm3")
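The per-model cells above repeat the same call for every reservoir. One possible way to condense them is to loop over reservoirs and models and build the table row by row. This is only a sketch under stated assumptions: the helper `storage_hm3`, the reservoir labels `R1` and `R2`, and the in-memory depth series are hypothetical stand-ins for the function and the CSV files used in this notebook.

```python
import pandas as pd

def storage_hm3(depths, pix_area=100):
    """Total storage in hm3 from per-pixel depths in m (hypothetical helper)."""
    return depths.sum() * pix_area * 1e-6

# Hypothetical stand-ins for the actual and predicted depth columns
actual = {"R1": pd.Series([10.0, 20.0]), "R2": pd.Series([5.0, 5.0])}
predicted = {
    "lr": {"R1": pd.Series([12.0, 21.0]), "R2": pd.Series([4.0, 5.0])},
    "rf": {"R1": pd.Series([10.0, 19.0]), "R2": pd.Series([5.0, 6.0])},
}

rows = []
for name, z_true in actual.items():
    row = {"Reservoir": name, "actual_ws": storage_hm3(z_true)}
    for model, preds in predicted.items():
        row[model + "_pred_ws"] = storage_hm3(preds[name])
        row[model + "_ws_diff"] = row[model + "_pred_ws"] - row["actual_ws"]
    rows.append(row)

# Building the frame from a list of dicts yields numeric columns
table = pd.DataFrame(rows)
print(table)
```

Building the frame in one step also means the actual storage is computed once per reservoir rather than being overwritten in each model's cell.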

## Estimation Error

Display the data frame with the actual storage, the retrieved storage, and their difference for each model.

In [None]:
ws

RMSE of the retrieved water storage for each model, computed across the four reservoirs.

In [144]:
# RMSE as the square root of the MSE; the `squared` argument is deprecated in recent scikit-learn releases
print('LR RMSE: %0.3f hm3' % np.sqrt(mse(ws['actual_ws'],ws['lr_pred_ws'])))
print('RF RMSE: %0.3f hm3' % np.sqrt(mse(ws['actual_ws'],ws['rf_pred_ws'])))
print('GB RMSE: %0.3f hm3' % np.sqrt(mse(ws['actual_ws'],ws['gb_pred_ws'])))

LR RMSE: 8.934 hm3
RF RMSE: 2.823 hm3
GB RMSE: 3.537 hm3
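To make the metric explicit, the RMSE can also be computed by hand as $\sqrt{\frac{1}{n}\sum_{i=1}^n (S_i - \hat{S}_i)^2}$. The sketch below uses hypothetical storage values for four reservoirs, not the results reported above.

```python
import numpy as np

# Hypothetical actual and retrieved storage in hm3 for four reservoirs
actual = np.array([100.0, 50.0, 200.0, 80.0])
predicted = np.array([104.0, 47.0, 200.0, 80.0])

# Square the errors, average them, then take the square root
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print("RMSE: %0.3f hm3" % rmse)
```

Here the errors are 4, -3, 0 and 0 $hm^3$, so the mean squared error is 6.25 and the RMSE is 2.5 $hm^3$. Because the errors are in absolute units, a large reservoir can dominate the RMSE when only four reservoirs are compared.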
