# Exercise 2
**Evaluate the Scale of Measurement on Soil Moisture**

In this exercise, you will carry out your own evaluation of the H SAF ASCAT 6.25 km surface soil moisture (SSM) product. This time, however, you will use modelled soil moisture estimates from [ECMWF](https://www.ecmwf.int/) as the reference instead of the in situ stations.

## Overview

The particular dataset used here is [ERA5-Land daily](https://cds.climate.copernicus.eu/datasets/derived-era5-land-daily-statistics?tab=overview). We have extracted the volume of water in soil layer 1 (0 to 7 cm depth, with the surface at 0 cm) for you. The soil water content is derived from a combination of modelling and data assimilation techniques. Here is a simplified explanation of how it works:

- **Modelling**: ERA5-Land uses a sophisticated land surface model to simulate various processes that affect soil moisture. This model takes into account factors like rainfall, evaporation, runoff, and infiltration to estimate how much water is present in different layers of the soil.

- **Data Assimilation**: To improve the accuracy of these estimates, ERA5-Land incorporates observational atmospheric variables, such as air temperature and air humidity.

- **Soil Layers**: The model divides the soil into multiple layers, each with its own characteristics and moisture content. By considering the water movement between these layers, ERA5-Land can provide detailed information about soil moisture at different depths. 

In essence, ERA5-Land combines advanced modelling techniques with real-world observations to derive accurate and detailed estimates of water content in soil layers. This information is crucial for applications like weather forecasting, agriculture, and water resource management. The dataset has a spatial resolution of 9 km, which is much coarser than the point-wise in situ stations, and it comes in volumetric units [m$^3$ / m$^3$].

## Imports

In [1]:
import hvplot.pandas  # noqa
import pandas as pd

## Loading Soil Moisture Data

As before, we load the data as `pandas.DataFrame` objects: first the ERA5-Land soil moisture and then the H SAF ASCAT SSM.

In [2]:
%run ../src/download_path.py

url = make_url("era5_ssm_timeseries.csv")  # noqa
df_era5 = pd.read_csv(
    url,
    index_col="time",
    parse_dates=True,
)

url = make_url("ascat-6_25_ssm_timeseries.csv")  # noqa
df_ascat = pd.read_csv(
    url,
    index_col="time",
    parse_dates=True,
)

https://git.geo.tuwien.ac.at/api/v4/projects/1266/repository/files/era5_ssm_timeseries.csv/raw?ref=main&lfs=true
https://git.geo.tuwien.ac.at/api/v4/projects/1266/repository/files/ascat-6_25_ssm_timeseries.csv/raw?ref=main&lfs=true


Now you will perform the same type of analysis as in notebook 2. Follow the steps below and fill in the blanks marked `...`.

1. **Unit Conversions**

- Calculate porosity with `calc_porosity` from the bulk densities in `density_df` using pandas `transform`. The particle density is fixed at 2.65 inside `calc_porosity`.

In [None]:
density_df = pd.DataFrame(
    {
        "name": ["Buzi", "Chokwé", "Mabalane", "Mabote", "Muanza"],
        "bulk_density": [1.25, 1.4, 1.4, 1.35, 1.25],
    }
).set_index("name")

def calc_porosity(x):
    # Porosity from bulk density x, with a fixed particle density of 2.65
    return 1 - x / 2.65


porosity_df = ...  # ADD YOUR CODE
porosity_df

| name | porosity |
|---|---|
| Buzi | 0.528302 |
| Chokwé | 0.471698 |
| Mabalane | 0.471698 |
| Mabote | 0.490566 |
| Muanza | 0.528302 |
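If the mechanics of `transform` are unclear, the following minimal sketch may help. It uses hypothetical toy densities and the made-up names `toy_density` and `toy_porosity`, so it is not the exercise solution, only an illustration of how a function is applied column-wise while the index is kept.

```python
import pandas as pd

# Hypothetical toy bulk densities (not the exercise data)
toy_density = pd.DataFrame(
    {"site": ["A", "B"], "bulk_density": [1.325, 1.59]}
).set_index("site")

PARTICLE_DENSITY = 2.65  # same constant as in calc_porosity


def toy_porosity(bulk_density):
    # porosity = 1 - bulk density / particle density
    return 1 - bulk_density / PARTICLE_DENSITY


# transform applies the function to each column and keeps the index;
# rename gives the result a descriptive column name
toy_porosity_df = toy_density.transform(toy_porosity).rename(
    columns={"bulk_density": "porosity"}
)
print(toy_porosity_df)
```

Site "A" should come out at a porosity of about 0.5 and site "B" at about 0.4.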


- Add the porosity (`porosity_df`) to the ASCAT `DataFrame` as a new column with pandas `merge`.

In [None]:
df_ascat_porosity = ...  # ADD YOUR CODE
df_ascat_porosity.head()

| time | name | type | surface_soil_moisture | unit | porosity |
|---|---|---|---|---|---|
| 2007-01-01 06:37:46.391000064 | Chokwé | ascat | 100.0 | % | 0.471698 |
| 2007-01-01 19:04:05.412999680 | Chokwé | ascat | 82.71 | % | 0.471698 |
| 2007-01-03 07:36:23.573000192 | Chokwé | ascat | 47.1 | % | 0.471698 |
| 2007-01-03 20:02:40.260000256 | Chokwé | ascat | 39.63 | % | 0.471698 |
| 2007-01-04 07:15:40.862000128 | Chokwé | ascat | 74.75 | % | 0.471698 |
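One possible way to attach a per-site attribute to a time series is sketched below on hypothetical toy data (`toy_obs`, `toy_por` and the column `site` are illustrative names only). It assumes the lookup table is indexed by the site name, as `porosity_df` is, and matches a column on the left against the index on the right.

```python
import pandas as pd

# Hypothetical toy time series with one row per observation
toy_obs = pd.DataFrame(
    {"site": ["A", "B", "A"], "ssm": [50.0, 20.0, 80.0]},
    index=pd.DatetimeIndex(
        ["2020-01-01", "2020-01-01", "2020-01-02"], name="time"
    ),
)

# Hypothetical lookup table indexed by site
toy_por = pd.DataFrame(
    {"porosity": [0.5, 0.4]}, index=pd.Index(["A", "B"], name="site")
)

# Match the "site" column on the left against the index on the right;
# a left join keeps every observation and its time index
toy_merged = toy_obs.merge(toy_por, how="left", left_on="site", right_index=True)
print(toy_merged)
```

Each observation row now carries the porosity of its site, and the time index of the left frame is preserved.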


- Convert SSM from degree of saturation (%) to volumetric units with `deg2vol` and pandas `apply` on `df_ascat_porosity`.

In [None]:
def deg2vol(df):
    return df["porosity"] * df["surface_soil_moisture"] / 100


df_ascat_vol = df_ascat.copy()
df_ascat_vol["unit"] = "m³/m³"
df_ascat_vol["surface_soil_moisture"] = ...  # ADD YOUR CODE
df_ascat_vol.head()

| time | name | type | surface_soil_moisture | unit |
|---|---|---|---|---|
| 2007-01-01 06:37:46.391000064 | Chokwé | ascat | 0.471698 | m³/m³ |
| 2007-01-01 19:04:05.412999680 | Chokwé | ascat | 0.390142 | m³/m³ |
| 2007-01-03 07:36:23.573000192 | Chokwé | ascat | 0.22217 | m³/m³ |
| 2007-01-03 20:02:40.260000256 | Chokwé | ascat | 0.186934 | m³/m³ |
| 2007-01-04 07:15:40.862000128 | Chokwé | ascat | 0.352594 | m³/m³ |
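The conversion multiplies the degree of saturation (as a fraction) by the porosity, so a fully saturated soil (100 %) has a volumetric water content equal to its porosity. The sketch below shows row-wise `apply` on hypothetical toy values; the names `toy` and `to_volumetric` are illustrative and mirror `deg2vol` only in form.

```python
import pandas as pd

# Hypothetical toy values: porosity as a fraction, SSM in percent saturation
toy = pd.DataFrame(
    {"porosity": [0.5, 0.4], "surface_soil_moisture": [100.0, 25.0]}
)


def to_volumetric(row):
    # volumetric water content = porosity * saturation fraction
    return row["porosity"] * row["surface_soil_moisture"] / 100


# axis=1 passes one row at a time to the function
toy_vol_rowwise = toy.apply(to_volumetric, axis=1)

# The same function also works on the whole frame at once,
# because the arithmetic is vectorised over the columns
toy_vol_vectorised = to_volumetric(toy)

print(toy_vol_rowwise)
```

Both variants should give about 0.5 and 0.1 m³/m³ here. The vectorised call is usually faster on long time series.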


## Correlations

- Concatenate the `df_ascat_vol` and `df_era5` datasets.

In [None]:
df = ...  # ADD YOUR CODE
df.head()

| time | surface_soil_moisture | name | type | unit |
|---|---|---|---|---|
| 2007-01-01 00:00:00 | 0.363715 | Muanza | era5 | m³/m³ |
| 2007-01-01 06:00:00 | 0.33726 | Muanza | era5 | m³/m³ |
| 2007-01-01 12:00:00 | 0.369484 | Muanza | era5 | m³/m³ |
| 2007-01-01 18:00:00 | 0.381393 | Muanza | era5 | m³/m³ |
| 2007-01-02 00:00:00 | 0.364395 | Muanza | era5 | m³/m³ |
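Concatenation stacks frames with the same columns on top of each other, which gives the long format that `hvplot` can split with `by="type"`. A minimal sketch on hypothetical toy frames (`toy_a`, `toy_b`) could look like this:

```python
import pandas as pd

idx = pd.DatetimeIndex(["2020-01-01", "2020-01-02"], name="time")

# Two hypothetical toy frames with identical columns
toy_a = pd.DataFrame({"surface_soil_moisture": [0.2, 0.3], "type": "ascat"}, index=idx)
toy_b = pd.DataFrame({"surface_soil_moisture": [0.25, 0.35], "type": "era5"}, index=idx)

# Rows are appended; the time index may now contain duplicates
toy_long = pd.concat([toy_a, toy_b])
print(toy_long)
```

The result has four rows, two per source, and the `type` column tells them apart.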


In [7]:
df.hvplot.scatter(
    x="time",
    y="surface_soil_moisture",
    by="type",
    groupby="name",
    frame_width=800,
    padding=(0.01, 0.1),
    alpha=0.5,
)


- Resample `df_ascat_vol` and `df_era5` to daily values and merge the two datasets.

In [None]:
df_era5_daily = (
    df_era5.groupby("name")["surface_soil_moisture"]
    ...  # ADD YOUR CODE
    .median()
    .to_frame("era5")
)

df_ascat_vol_daily = (
    ...  # ADD YOUR CODE
)

df_combined = pd.merge(
    df_ascat_vol_daily, df_era5_daily, left_index=True, right_index=True
)
df_combined.head()

| name | time | ascat | era5 |
|---|---|---|---|
| Buzi | 2007-01-01 | 0.262883 | 0.348961 |
| Buzi | 2007-01-02 | NaN | 0.380474 |
| Buzi | 2007-01-03 | 0.218083 | 0.360645 |
| Buzi | 2007-01-04 | 0.243204 | 0.355381 |
| Buzi | 2007-01-05 | NaN | 0.384015 |
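The `NaN` entries above appear on days without an ASCAT overpass: daily resampling creates a bin for every day in the covered period, and the median of an empty bin is `NaN`. The sketch below reproduces this on hypothetical toy data; `toy_sat`, `toy_model` and the helper `daily_median` are illustrative names, not part of the exercise.

```python
import pandas as pd

# Hypothetical irregular "satellite" observations for one site
toy_sat = pd.DataFrame(
    {"site": "A", "ssm": [0.2, 0.4, 0.3]},
    index=pd.DatetimeIndex(
        ["2020-01-01 06:00", "2020-01-01 18:00", "2020-01-03 06:00"], name="time"
    ),
)

# Hypothetical regular daily "model" values for the same site
toy_model = pd.DataFrame(
    {"site": "A", "ssm": [0.25, 0.35, 0.3]},
    index=pd.date_range("2020-01-01", periods=3, freq="D", name="time"),
)


def daily_median(df, label):
    # Group by site, bin each group into calendar days, take the median
    return df.groupby("site")["ssm"].resample("D").median().to_frame(label)


# Both frames share a (site, time) MultiIndex, so they can be merged on it
toy_combined = pd.merge(
    daily_median(toy_sat, "sat"),
    daily_median(toy_model, "model"),
    left_index=True,
    right_index=True,
)
print(toy_combined)
```

The second day has no satellite observation, so its `sat` value is `NaN` while the `model` value is present.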


- Calculate Pearson's correlation coefficient $R$ with pandas `groupby` on the locations and `corr`. Square the result if you need R$^2$.

In [None]:
...  # ADD YOUR CODE

| name | | ascat | era5 |
|---|---|---|---|
| Buzi | ascat | 1.0 | 0.765487 |
| Buzi | era5 | 0.765487 | 1.0 |
| Chokwé | ascat | 1.0 | 0.613083 |
| Chokwé | era5 | 0.613083 | 1.0 |
| Mabalane | ascat | 1.0 | 0.70777 |
| Mabalane | era5 | 0.70777 | 1.0 |
| Mabote | ascat | 1.0 | 0.660671 |
| Mabote | era5 | 0.660671 | 1.0 |
| Muanza | ascat | 1.0 | 0.810161 |
| Muanza | era5 | 0.810161 | 1.0 |
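Each location gets its own 2×2 correlation matrix, with ones on the diagonal and the coefficient between the two series off the diagonal. The following sketch shows this layout on hypothetical toy data (`toy`, `x`, `y` are illustrative names), with one perfectly correlated and one perfectly anti-correlated site.

```python
import pandas as pd

# Hypothetical toy data: y rises with x at site A and falls with x at site B
toy = pd.DataFrame(
    {
        "site": ["A"] * 4 + ["B"] * 4,
        "x": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0, 4.0, 3.0, 2.0, 1.0],
    }
).set_index("site")

# groupby accepts the name of an index level; corr defaults to Pearson
toy_r = toy.groupby("site").corr()

# Squaring the coefficients gives R^2
toy_r2 = toy_r**2

print(toy_r)
```

Note that `corr` ignores rows where either value is `NaN`, so days without an observation in one dataset do not need to be dropped beforehand.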


1. **Calculate the root mean squared error**

 - Calculate the RMSE with pandas `groupby` on the locations and a user-defined function `RMSE`.

In [None]:
def RMSE(df):
    return ...  # ADD YOUR CODE


df_combined.groupby("name").apply(RMSE)

name
Buzi        0.065389
Chokwé      0.081895
Mabalane    0.069251
Mabote      0.062143
Muanza      0.113982
dtype: float64
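For reference, the root mean squared error between two series $x$ and $y$ with $N$ paired values is $\sqrt{\frac{1}{N}\sum_{i}(x_i - y_i)^2}$. The sketch below applies this per group on hypothetical toy data; `toy`, `obs`, `ref` and `toy_rmse` are illustrative names, and this is only one of several ways to write such a function.

```python
import pandas as pd

# Hypothetical toy data with two sites
toy = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B"],
        "obs": [0.1, 0.3, 0.2, 0.2],
        "ref": [0.2, 0.2, 0.2, 0.4],
    }
).set_index("site")


def toy_rmse(group):
    # Difference, square, mean (NaN values are skipped by default), square root
    diff = group["obs"] - group["ref"]
    return (diff**2).mean() ** 0.5


# apply calls the function once per site and collects the scalars in a Series
toy_result = toy.groupby("site").apply(toy_rmse)
print(toy_result)
```

Site "A" should give about 0.1 and site "B" about 0.141. The result is in the same units as the inputs, which is why the unit conversion has to happen before this step.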