# VPRM timeseries

This tutorial shows how to create emission timeseries with the Vegetation
Photosynthesis and Respiration Model (VPRM) within emiproc.

We will first prepare the input data, then run the model and finally visualize the results.

If you want to learn more about how to use VPRM in emiproc, see [the documentation](https://emiproc.readthedocs.io/en/master/emissions_generation.html#vprm).

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import urllib.request

from emiproc import FILES_DIR
from emiproc.profiles.vprm import calculate_vprm_emissions

plt.style.use('default')

# Set up the working directory, change this if you want to check the files
# in a different location
work_dir = FILES_DIR / 'vprm'
work_dir.mkdir(exist_ok=True, parents=True)
print(work_dir)

## Prepare input data

To run VPRM we need the following input data:

- Satellite data to calculate the vegetation indices (EVI: Enhanced Vegetation Index and 
  LSWI: Land Surface Water Index)
- Meteorological data (temperature and radiation)
- Vegetation parameters (constants for different vegetation types used in the model)


This tutorial focuses on the city of Zurich, which provides good open data.

We will run VPRM for the year 2024 at hourly resolution.
We do not cover the spatial dimension in this tutorial, so we are only interested in the temporal profiles.

### Meteorological data

The city of Zurich provides a good [meteorological dataset](https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte), so we can use it directly.

In [None]:

download_link_meteo = "https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte/download/ugz_ogd_meteo_h1_2024.csv"
# Download the meteorological data (skipped if the file is already present)
meteo_file = work_dir / 'meteo.csv'

if not meteo_file.is_file():
    urllib.request.urlretrieve(download_link_meteo, meteo_file)

df_meteo = pd.read_csv(meteo_file, parse_dates=['Datum'])
df_meteo.head(10)

For VPRM we need the temperature and the radiation.
We therefore extract the `T` and `StrGlo` parameters from the data.

In [None]:
cols = {}
for var_in_data, var in {
    "T": "T",
    "StrGlo": "Rad",
}.items():

    mask_var = df_meteo["Parameter"] == var_in_data
    # Average over the different stations for each timestamp
    series = df_meteo.loc[mask_var, ["Datum", "Wert"]].groupby("Datum").mean()['Wert']
    cols[var] = series

df_meteo_cleaned = pd.concat(cols, axis=1)
# Convert to UTC and drop the timezone information
df_meteo_cleaned.index = df_meteo_cleaned.index.tz_convert('UTC').tz_localize(None)
df_meteo_cleaned

Let's have a look at the data, averaged per day:

In [None]:
fig, axes = plt.subplots(2, 1, sharex=True)
# Daily means, using the uppercase "D" alias for calendar-day frequency
df_daily = df_meteo_cleaned.resample("D").mean()
axes[0].plot(df_daily.index, df_daily["T"], label="Temperature")
axes[0].set_ylabel("Temperature [°C]")
axes[0].legend()
axes[1].plot(df_daily.index, df_daily["Rad"], label="Global radiation")
axes[1].set_ylabel("Global radiation [W/m²]")
axes[1].legend()


As expected, the temperature and radiation show day-to-day fluctuations as well as a seasonal cycle.

It seems we can use that data for the next steps.

### Satellite indices

We will use the `EVI` and `LSWI` indices to calculate the vegetation parameters.

Usually you would need to download satellite data and calculate the indices yourself.
This gives you the indices at different moments of the year.

This is a bit tedious, so here we will simply use timeseries that have already been generated. If you want to do it yourself, you can follow [this Python tutorial](https://documentation.dataspace.copernicus.eu/notebook-samples/openeo/NDVI_Timeseries.html).


The satellite indices are not always provided directly. If you need to calculate them, you can
use the emiproc function 
[calculate_vegetation_indices](https://emiproc.readthedocs.io/en/master/api/models.html#emiproc.profiles.vprm.calculate_vegetation_indices).

In [None]:
df_sat = pd.read_csv(work_dir / "vegetation_indices.csv", index_col=0, header=[0, 1], 
                 parse_dates=True)
df_sat

This table contains the vegetation indices for different vegetation types
on the days for which a satellite pass was available.

In [None]:
df_sat.loc["2024"].plot(linestyle="", marker='o', figsize=(10, 5))

We need to interpolate the data to get estimates for the whole year.
Since the individual values are very noisy, we use a robust method: we take the monthly median and then interpolate between the months.
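
To illustrate why a median is more robust than a mean for this kind of data, here is a minimal sketch on a small synthetic series. The values are made up for illustration and are not taken from the tutorial data. One value is an outlier, as might be produced by a cloud-contaminated satellite pass.

```python
import pandas as pd

# Synthetic index values for January 2024 (hypothetical, not the tutorial data)
dates = pd.to_datetime(
    ["2024-01-03", "2024-01-10", "2024-01-17", "2024-01-24", "2024-01-31"]
)
evi = pd.Series([0.20, 0.22, 0.95, 0.21, 0.19], index=dates)  # 0.95 is an outlier

# Aggregate to one value per month, once with the mean and once with the median
monthly_mean = evi.resample("MS").mean()
monthly_median = evi.resample("MS").median()

print(f"mean:   {monthly_mean.iloc[0]:.3f}")
print(f"median: {monthly_median.iloc[0]:.3f}")
```

The single outlier pulls the mean up to about 0.35, while the median stays at 0.21, close to the typical values.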

In [None]:
# Monthly median (robust against outliers)
df_sat_monthly_median = df_sat.resample("MS").median()
# Add 15 days to be at the middle of the month
df_sat_monthly_median.index = df_sat_monthly_median.index + pd.Timedelta(days=15)
# Resample to hourly data and interpolate for the missing values
df_sat_full = df_sat_monthly_median.resample("h").interpolate(method="akima").reindex(
    df_meteo_cleaned.index
)

ax = df_sat_full.plot(figsize=(10, 5))
ax.legend(loc="upper right", bbox_to_anchor=(1.25, 1))

This is a very rough estimate, but most of the important features are captured.

We see lower values in the winter and higher values in the summer.
Cropland shows a sharp drop in summer, which can happen when the crops are harvested.

### Vegetation parameters

For the parameters, we will use the original parameters from the [VPRM paper](https://doi.org/10.1029/2006GB002735).

In [None]:
df_indices_mahadevan = pd.read_csv(work_dir / "vprm_parameters.csv", index_col="Site")
df_indices_mahadevan


## Run VPRM
Now that we have all the input data, we can run VPRM.

This is done by calling the function [calculate_vprm_emissions](https://emiproc.readthedocs.io/en/master/api/models.html#emiproc.profiles.vprm.calculate_vprm_emissions).
Its documentation also lists the equations used.
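
For orientation, the core equations of the original VPRM paper can be sketched in a few lines of plain Python. This is only an illustrative sketch of the published formulation, not the emiproc implementation, which may differ in details. The function names and the parameter values in the usage part are hypothetical. The paper expresses the light term with PAR, so treat the `par` argument as an assumption about the radiation input.

```python
def temperature_scale(t, t_min, t_opt, t_max):
    """Tscale: equals 1 at t_opt and is set to 0 outside (t_min, t_max)."""
    if t <= t_min or t >= t_max:
        return 0.0
    num = (t - t_min) * (t - t_max)
    return num / (num - (t - t_opt) ** 2)


def water_scale(lswi, lswi_max):
    """Wscale: water stress term based on the land surface water index."""
    return (1.0 + lswi) / (1.0 + lswi_max)


def vprm_fluxes(evi, lswi, t, par, *, lam, par0, alpha, beta,
                t_min, t_opt, t_max, lswi_max, p_scale=1.0):
    """Return (GEE, respiration, NEE) for one time step.

    p_scale is the phenology term; it is kept as a plain input here.
    """
    t_scale = temperature_scale(t, t_min, t_opt, t_max)
    w_scale = water_scale(lswi, lswi_max)
    # Gross ecosystem exchange (uptake), with light saturation via par0
    gee = lam * t_scale * p_scale * w_scale * evi * par / (1.0 + par / par0)
    # Respiration is a linear function of temperature
    resp = alpha * t + beta
    # Net ecosystem exchange: positive values are a net source
    nee = resp - gee
    return gee, resp, nee


# Hypothetical parameter values, for illustration only
gee, resp, nee = vprm_fluxes(
    evi=0.5, lswi=0.3, t=20.0, par=500.0,
    lam=0.1, par0=500.0, alpha=0.2, beta=1.0,
    t_min=0.0, t_opt=20.0, t_max=40.0, lswi_max=0.3,
)
print(gee, resp, nee)
```

With this sign convention, a negative NEE means that the vegetation takes up more CO2 than it respires.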

In [None]:
# Put all the timeseries together
df_vprm = df_sat_full.copy()
df_vprm[('T', 'global')] = df_meteo_cleaned['T']
df_vprm[('RAD', 'global')] = df_meteo_cleaned['Rad']
df_vprm



In [None]:
# Choose which site to use for each vegetation category (rename the index).
# In a real application you should optimize the model to find the best
# parameters, but for this tutorial we simply reuse the parameters
# from the original paper.
df_indices = df_indices_mahadevan.rename(
    index={
        "NOBS": "Evergreen",
        "HARVARD": "Deciduous",
        "CORN_MEAD": "Cropland",
        "VAIRA": "Grassland",
    }
)
df_indices

In [None]:
df_emissions = calculate_vprm_emissions(
    df=df_vprm,
    df_vprm=df_indices,
)

## Plot VPRM results

emiproc provides another function that helps us visualize the results.

### Average daily cycle for each month

In [None]:
from emiproc.plots.vprm import plot_vprm_params_per_veg_type


plot_vprm_params_per_veg_type(df_emissions, df_indices, group_by='%m%H')

We see nice diurnal cycles, which are stronger in summer.

Several other parameters are also shown to help us understand the results.
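
The `group_by` strings used in this section look like `strftime` format codes. Assuming that is how they are interpreted (this has not been checked against the emiproc implementation), the following sketch shows what such a grouping does on a small synthetic hourly series. The helper `group_mean` is hypothetical and only serves as an illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Synthetic hourly series over two days: the value is simply the hour of day
start = datetime(2024, 1, 1)
series = []
for h in range(48):
    ts = start + timedelta(hours=h)
    series.append((ts, float(ts.hour)))


def group_mean(series, fmt):
    """Average the values that share the same strftime(fmt) key."""
    groups = defaultdict(list)
    for ts, value in series:
        groups[ts.strftime(fmt)].append(value)
    return {key: mean(values) for key, values in groups.items()}


# '%m%H': one group per (month, hour of day), i.e. a mean diurnal cycle per month
diurnal = group_mean(series, "%m%H")
# '%m%d': one group per (month, day of month), i.e. daily means
daily = group_mean(series, "%m%d")

print(len(diurnal), len(daily))
```

Here the first grouping yields 24 groups (one per hour of day in January) and the second yields 2 groups (one per day).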

### Daily means

In [None]:
plot_vprm_params_per_veg_type(df_emissions, df_indices, group_by='%m%d')