# ET0 Computation
In this notebook, reference evapotranspiration (ET0) is computed from meteorological measurements.

### Index
* [Implementation tutorial](#tutorial)
    * [Cleaning raw data](#imp_raw_data_cleaner)
    * [Computing ET0](#computing_ET0)
* [Creating datasets](#creating_datasets)
    * [Dataset 1](#dataset_1)
    * [Dataset 2](#dataset_2)

## Notebook preparation

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import re
import glob
import os

Load the functions defined in the ET0_functions notebook.

In [2]:
# Enter the path to the ET0_functions.ipynb notebook and uncomment the %run line
# TO DO: Pack the functions in a package

# %run path\to\notebook

<a id = 'imp_raw_data_cleaner'></a>
#### Cleaning raw data


1. Define the column rename dictionary (`col_rename_dict`, defined below) and the columns to keep in the resulting dataframe (`cols_to_keep`).

In [3]:
# Dictionary of old column names (keys) and new column names (vals)
col_rename_dict = {
    # Station number
    'Istasyon_No': 'st_num',
    '?stasyon Numaras?': 'st_num',
    
    
    # Station name
    'Istasyon_Adi': 'st_name',
    '?stasyon Ad?': 'st_name',
    
    
    # Year and month
    'YIL': 'year',
    'AY': 'month',
    
    
    # Province and district
    '?l': 'province',
    '?l?e': 'district',
    
    
    # Latitude, longitude, and elevation
    'Enlem': 'latitude',
    'Boylam': 'longitude',
    'Rak?m': 'elevation',
    
    
    # Radiation and sun duration
    
    # Total monthly radiation (kWh/m²) - Values available from 2005 only
    'TOPLAM_KURESEL_GUNES_RADYASYONU_kws': 'tot_monthly_rad',
    
    # Maximum radiation (W/m²) - Values available from 2005 only
    'MAKS?MUM KURESEL GUNES RADYASYONU Watt?m?': 'max_rad',
    
    # Monthly average of total daily radiation (cal/cm²)
    'GUNLUK_TOPLAM_GLOBAL_GUNESLENME_SIDDETI_AYLIK_ORTALAMASI': 'inc_rad',
    
    # Total Monthly Sunlight Duration
    'TOPLAM_GUNESLENME_SURESI': 'tot_monthly_sun_duration',
    
    # Monthly average of total daily sunlight duration
    'GUNLUK_TOPLAM_GUNESLENME_SURESI_AYLIK_ORTALAMASI': 'sun_duration',
    
    
    # Relative humidity
    'MAKSIMUM_NEM': 'max_hum',
    'MINIMUM_NEM': 'min_hum',
    'ORTALAMA_NEM_%': 'avg_hum',
    
    
    # Temperature
    'MAKSIMUM_SICAKLIK_C': 'max_temp',
    'MINIMUM_SICAKLIK_C': 'min_temp',
    'ORTALAMA_SICAKLIK_?C': 'avg_temp',
    
    
    # Wind speed (m/s)
    'AYLIK_ORTALAMA_RUZGAR_HIZI': 'avg_ws',
    
    
    # Soil relative humidity
    'ORTALAMA_TOPRAK_NEMI_20': 'soil_hum_20',
    'ORTALAMA_TOPRAK_NEMI_40': 'soil_hum_40',
    'ORTALAMA_TOPRAK_NEMI_80': 'soil_hum_80',
    
    # Free surface evaporation
    'TOPLAM_ACIK_YUZEY_BUHARLASMASI_mm': 'surface_evaporation',
    
    # Precipitation
    'AYLIK_TOPLAM_YAGIS_mm': 'precip_omgi',
    'AYLIK_TOPLAM_YAGIS_mm_Manuel': 'precip_manual'
}

In [4]:
cols_to_keep = [
    'st_num', 'year', 'month', 'latitude',
    'longitude', 'elevation', 'min_temp',
    'max_temp', 'min_hum', 'max_hum', 'avg_ws',
    'inc_rad', 'sun_duration'
]
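
How `RawDataCleaner` applies these two objects is not shown in this notebook. As a rough sketch, assuming it renames the columns first and then selects the kept ones, the idea looks like this on a toy table. The values and the shortened `demo_rename` and `demo_keep` objects are made up for illustration.

```python
import pandas as pd

# Toy raw table using a few of the original column names (values are made up)
raw = pd.DataFrame({
    'Istasyon_No': [17186, 17186],
    'YIL': [2007, 2007],
    'AY': [1, 2],
    'MAKSIMUM_NEM': [99.0, 99.0],
    'TOPLAM_GUNESLENME_SURESI': [65.1, 42.0],  # renamed below but not kept
})

# Shortened, illustrative versions of col_rename_dict and cols_to_keep
demo_rename = {
    'Istasyon_No': 'st_num',
    'YIL': 'year',
    'AY': 'month',
    'MAKSIMUM_NEM': 'max_hum',
    'TOPLAM_GUNESLENME_SURESI': 'tot_monthly_sun_duration',
}
demo_keep = ['st_num', 'year', 'month', 'max_hum']

# Rename first, then keep only the requested columns that actually exist
renamed = raw.rename(columns=demo_rename)
cleaned = renamed[[c for c in demo_keep if c in renamed.columns]]
print(cleaned.columns.tolist())
```

Mapping several raw spellings to the same new name, as `col_rename_dict` does for the station columns, lets files with differently encoded headers end up with one consistent schema.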

2. Define the path to the directory containing the raw data files. This directory should contain **station_definitions.txt**, which has station numbers, latitudes, and longitudes.

In [5]:
# Define the path to the directory containing the raw data files
# path_to_dir = r"path\to\directory"


3. Create a `RawDataCleaner` object and call its `clean_for_ET0()` method.

In [24]:
RawDataCleaner?

In [6]:
cl = RawDataCleaner(path_to_dir, col_rename_dict, cols_to_keep)
df = cl.clean_for_ET0()

Cleaning dataset:
    Dropped 0 missing elevation values
    Dropped 326 missing min_temp values
    Dropped 206 missing max_temp values
    Dropped 256 missing min_hum values
    Dropped 256 missing max_hum values
    Dropped 690 missing avg_ws values
    Dropped 5010 missing inc_rad/sun_duration values
    Dropped 244 values from st-year combos with less than 12 vals/year
----------------------------------
Total dropped values: 6285
Dataset size: 2172
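
The last cleaning step in the log removes station-year combinations with fewer than 12 monthly values. The implementation inside `clean_for_ET0()` is not shown here; a minimal sketch of one way to do this with a grouped count, on made-up data, is:

```python
import pandas as pd

# Toy data: station 1 has a full year, station 2 has only three months
demo = pd.DataFrame({
    'st_num': [1] * 12 + [2] * 3,
    'year': [2007] * 15,
    'month': list(range(1, 13)) + [1, 2, 3],
})

# Count the rows in each station-year group and keep only complete years
months_per_year = demo.groupby(['st_num', 'year'])['month'].transform('count')
complete = demo[months_per_year == 12]
print(len(demo) - len(complete), 'rows dropped')
```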


<a id = 'computing_ET0'></a>
#### Computing ET0

1. Compute average temperature and average humidity

In [7]:
idx = df.columns.get_loc('max_temp') + 1
avg_temp = (df['max_temp'] + df['min_temp']) / 2.
df.insert(idx, 'avg_temp', avg_temp)

idx = df.columns.get_loc('max_hum') + 1
avg_hum = (df['min_hum'] + df['max_hum']) / 2.
df.insert(idx, 'avg_hum', avg_hum)

2. Convert wind speed measured at 10 m above the ground to its equivalent at 2 m above the ground

In [8]:
h = 10
df['avg_ws'] = windspeed_2m(h = h, data = df)
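
`windspeed_2m()` is defined in the ET0_functions notebook and is not reproduced here. A minimal sketch of the conversion, assuming it uses the FAO-56 logarithmic wind profile, is given below. The name `windspeed_2m_sketch` and its signature are hypothetical.

```python
import numpy as np

def windspeed_2m_sketch(u_z, h):
    """Scale wind speed u_z (m/s) measured at height h (m) to 2 m.

    Assumes the FAO-56 logarithmic profile: u2 = u_z * 4.87 / ln(67.8 * h - 5.42).
    """
    return u_z * 4.87 / np.log(67.8 * h - 5.42)

# For h = 10 m the scaling factor is roughly 0.748
print(windspeed_2m_sketch(1.0, 10))
```

Because the function only uses arithmetic and `np.log`, it also works element-wise on a pandas column such as `df['avg_ws']`.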

3. Convert measured incoming solar radiation from cal/cm² to MJ/m²

In [9]:
# Convert solar radiation from cal/cm² to MJ/m²
# (1 cal = 4.184 J and 1 cm² = 1e-4 m², so the factor is 4.184e-2)
def cal_to_MJ(val):
    return val * 4.184e-2

df['inc_rad'] = cal_to_MJ(df['inc_rad'])
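
The factor used in `cal_to_MJ` follows from the unit definitions, assuming the thermochemical calorie (1 cal = 4.184 J):

$$ 1\ \frac{\text{cal}}{\text{cm}^2} = \frac{4.184\ \text{J}}{10^{-4}\ \text{m}^2} = 4.184 \times 10^{4}\ \frac{\text{J}}{\text{m}^2} = 4.184 \times 10^{-2}\ \frac{\text{MJ}}{\text{m}^2} $$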

4. Cap maximum relative humidity (RHmax) values greater than 100 at 100

In [10]:
# Select rows where max_hum exceeds 100 and set them to 100
cond = df['max_hum'] > 100
df.loc[cond, 'max_hum'] = 100
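
The same cap can also be expressed with `Series.clip`. A small sketch on made-up humidity values:

```python
import pandas as pd

# Made-up RHmax readings, one of which exceeds the physical limit of 100 %
hum = pd.Series([84.0, 100.0, 101.5])

# clip(upper=100) leaves valid values untouched and caps the rest at 100
capped = hum.clip(upper=100)
print(capped.tolist())
```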

In [11]:
df.head()

| | st_num | year | month | latitude | longitude | elevation | min_temp | max_temp | avg_temp | min_hum | max_hum | avg_hum | avg_ws | inc_rad | sun_duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17186 | 2007 | 1 | 38.6153 | 27.4049 | 71.0 | -2.6 | 19.7 | 8.55 | 31.0 | 99.0 | 65.0 | 0.972336 | 6.08772 | 2.1 |
| 1 | 17186 | 2007 | 2 | 38.6153 | 27.4049 | 71.0 | -2.2 | 18.8 | 8.3 | 38.0 | 99.0 | 68.5 | 0.897541 | 4.025008 | 1.5 |
| 2 | 17186 | 2007 | 3 | 38.6153 | 27.4049 | 71.0 | 2.0 | 24.7 | 13.35 | 19.0 | 98.0 | 58.5 | 0.373976 | 2.435088 | 1.0 |
| 3 | 17186 | 2007 | 4 | 38.6153 | 27.4049 | 71.0 | 3.9 | 27.7 | 15.8 | 21.0 | 97.0 | 59.0 | 0.598361 | 2.899512 | 1.1 |
| 4 | 17186 | 2007 | 5 | 38.6153 | 27.4049 | 71.0 | 10.8 | 36.3 | 23.55 | 15.0 | 84.0 | 49.5 | 0.747951 | 3.669368 | 1.3 |


5. Compute ET0 using the `compute_ET0()` function

<a id = 'soil_heat_flux'></a>
The soil heat flux (G) is computed from mean monthly air temperatures ($T_{month, i}$ for month $i$) using one of the following equations:

* Method 1 (Eq. 1):
$$ G_{month, i} = 0.14 \cdot \left( T_{month, i} - T_{month, i-1} \right) $$

* Method 2 (Eq. 2):
$$ G_{month, i} = 0.07 \cdot \left( T_{month, i+1} - T_{month, i-1} \right) $$
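
The two equations above can be sketched with shifted pandas Series. This is only an illustration on made-up temperatures: the function name `soil_heat_flux_sketch` is hypothetical, and how `compute_ET0()` treats the first and last month of a record (where a neighbouring month is missing) is not shown in this notebook.

```python
import pandas as pd

def soil_heat_flux_sketch(t_month, method=1):
    """Monthly soil heat flux G from chronologically ordered mean monthly temperatures (deg C)."""
    previous = t_month.shift(1)   # T_{month, i-1}
    following = t_month.shift(-1)  # T_{month, i+1}
    if method == 1:
        return 0.14 * (t_month - previous)   # Eq. 1
    return 0.07 * (following - previous)     # Eq. 2

# Made-up mean temperatures for four consecutive months
t = pd.Series([8.0, 10.0, 14.0, 15.0])
g1 = soil_heat_flux_sketch(t, method=1)
g2 = soil_heat_flux_sketch(t, method=2)
print(g1.tolist())
print(g2.tolist())
```

Months without the required neighbour come out as `NaN` in this sketch. With several stations in one table, the shift would also need to be applied per station (for example through `groupby('st_num')`) so that temperatures are not carried across stations.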


In [12]:
compute_ET0?

In [13]:
et = compute_ET0(G_method = 1,
                maxmin = False,
                compute_inc_rad = False,
                data = df)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [14]:
et.describe()

| | st_num | year | month | latitude | longitude | elevation | min_temp | max_temp | avg_temp | min_hum | max_hum | avg_hum | avg_ws | inc_rad | sun_duration | G | Ra | Rso | Rn | ET0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 | 2172.0 |
| mean | 17627.104972 | 1995.878453 | 6.5 | 37.431709 | 28.721023 | 296.033149 | 5.231998 | 28.887201 | 17.059599 | 23.046041 | 93.552486 | 58.299263 | 1.452616 | 14.957347 | 7.674263 | -0.002345 | 29.13134 | 22.020032 | 7.564482 | 3.894731 |
| std | 298.320522 | 7.007264 | 3.452847 | 0.53831 | 0.70982 | 328.709833 | 7.860968 | 8.113204 | 7.772188 | 7.995029 | 5.793269 | 5.463647 | 0.74359 | 6.29469 | 2.726288 | 0.540278 | 9.521811 | 7.198947 | 4.209858 | 1.846843 |
| min | 17186.0 | 1984.0 | 1.0 | 36.6266 | 27.4049 | 3.0 | -20.0 | 10.0 | -3.15 | 0.0 | 61.0 | 32.0 | 0.224385 | 2.435088 | 1.0 | -1.1795 | 14.436144 | 10.847607 | 1.421682 | 0.73629 |
| 25% | 17296.0 | 1990.0 | 3.75 | 36.97 | 28.1369 | 24.0 | -1.0 | 21.4 | 10.35 | 18.0 | 92.0 | 55.0 | 0.972336 | 9.253962 | 5.4 | -0.483 | 20.279842 | 15.232463 | 3.280613 | 2.28582 |
| 50% | 17824.0 | 1996.0 | 6.5 | 37.3395 | 28.6869 | 84.0 | 4.85 | 29.25 | 17.0 | 22.0 | 95.0 | 58.5 | 1.271517 | 14.838556 | 7.5 | 0.0525 | 29.838789 | 22.417886 | 7.568732 | 3.631573 |
| 75% | 17884.0 | 2001.0 | 9.25 | 37.9135 | 29.0921 | 425.0 | 11.7 | 36.2 | 23.95 | 28.0 | 97.0 | 61.5 | 1.720287 | 20.274618 | 10.3 | 0.497 | 37.802836 | 28.710083 | 11.450577 | 5.299499 |
| max | 17924.0 | 2016.0 | 12.0 | 38.6153 | 30.1531 | 864.0 | 23.0 | 45.7 | 33.2 | 57.0 | 100.0 | 77.0 | 4.562502 | 29.66456 | 13.2 | 1.029 | 41.809913 | 32.063319 | 16.396273 | 10.30237 |
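
The body of `compute_ET0()` lives in the ET0_functions notebook and is not reproduced here. As a rough orientation, and assuming it follows the FAO-56 Penman-Monteith equation, the final step could be sketched as below. The function names and signature are hypothetical, and net radiation (`rn`) and soil heat flux (`g`) are taken as given inputs in MJ m⁻² day⁻¹.

```python
import numpy as np

def sat_vapour_pressure(t):
    """Saturation vapour pressure e0(T) in kPa for air temperature t in deg C."""
    return 0.6108 * np.exp(17.27 * t / (t + 237.3))

def penman_monteith_sketch(rn, g, t_min, t_max, rh_min, rh_max, u2, elevation):
    """FAO-56 Penman-Monteith reference evapotranspiration in mm/day."""
    t_mean = (t_min + t_max) / 2.0
    # Mean saturation vapour pressure and actual vapour pressure (kPa)
    es = (sat_vapour_pressure(t_min) + sat_vapour_pressure(t_max)) / 2.0
    ea = (sat_vapour_pressure(t_min) * rh_max / 100.0
          + sat_vapour_pressure(t_max) * rh_min / 100.0) / 2.0
    # Slope of the saturation vapour pressure curve (kPa per deg C)
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure (kPa) and psychrometric constant (kPa per deg C)
    pressure = 101.3 * ((293.0 - 0.0065 * elevation) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure
    # Radiation term plus aerodynamic term, divided by the common denominator
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    return numerator / (delta + gamma * (1.0 + 0.34 * u2))

# Made-up monthly inputs for a single station
print(penman_monteith_sketch(rn=10.0, g=0.0, t_min=10.0, t_max=25.0,
                             rh_min=30.0, rh_max=90.0, u2=2.0, elevation=100.0))
```

The inputs correspond to the columns prepared earlier (`min_temp`, `max_temp`, `min_hum`, `max_hum`, `avg_ws` at 2 m, `elevation`) together with the `Rn` and `G` columns that appear in the output table.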


6. Add a `date` (datetime) column

In [None]:
idx = et.columns.get_loc('st_num') + 1

# The data are monthly, but datetime requires a day entry,
# so day 15 is assigned to every date

date = pd.to_datetime(et['year'].astype(str) + '-' + et['month'].astype(str) + '-15')

et.insert(idx, 'date', date)
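
As an alternative to building date strings, `pd.to_datetime` can assemble dates from a table with `year`, `month`, and `day` columns. A short sketch on made-up rows comparing the two approaches:

```python
import pandas as pd

demo = pd.DataFrame({'year': [2007, 2007], 'month': [1, 12]})

# String-based construction, as in the cell above
from_strings = pd.to_datetime(
    demo['year'].astype(str) + '-' + demo['month'].astype(str) + '-15'
)

# Component-based construction: to_datetime assembles year/month/day columns
from_parts = pd.to_datetime(demo[['year', 'month']].assign(day=15))

# Both approaches should yield the same timestamps
print((from_strings == from_parts).all())
```

The component-based form avoids string formatting and parsing altogether, which can be convenient for large tables.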

7. Save the data

In [15]:
# Saving to csv file
# Enter path to save the dataset in
# save_path = r'save\path'

file_name = 'ET0_data.csv'
file_path = os.path.join(save_path, file_name)

et.to_csv(file_path, index = False)
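
When the saved file is read back later, the `date` column comes in as plain text unless it is parsed explicitly. A minimal sketch of the round trip, using an in-memory buffer and a made-up row instead of the real file:

```python
import io
import pandas as pd

# Made-up single-row table with the same kind of columns as the saved dataset
demo = pd.DataFrame({
    'st_num': [17186],
    'date': pd.to_datetime(['2007-01-15']),
    'ET0': [0.9],
})

# Write to an in-memory buffer so the sketch has no side effects on disk
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype of the date column
restored = pd.read_csv(buffer, parse_dates=['date'])
print(restored.dtypes)
```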