<a href="https://colab.research.google.com/github/DiaPorntipa/Bushfire_data_analytics/blob/main/Acclimatised_data_compilation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Start


This script prepares vapor pressure deficit (VPD) data from eratos and matches it with VPD from field observations for subsequent analysis.

TODO (HIGH): The information below is the copy of documentation of another script. Update it.

📄 **What this script does**
1. Loads cleaned field data with topography — for example, output from `Nick_phd_data_complilation.ipynb`.
2. Downloads **eratos data** for the grid cells closest to the field sites, spanning from the first day to one day after the last day of the field observations.
3. Finds the field observations with the highest and lowest temperatures within each 9am-to-9am daily window and stores them separately in `df` and `df_min_temp`. When several observations tie for the highest or lowest temperature, the one with the lowest RH is selected.
4. Matches eratos temperature and RH values to the field observations in `df` and `df_min_temp` based on the nearest eratos grid cell and `eratos_observation_date`.
5. For both `df` and `df_min_temp`, calculates **VPD** (vapor pressure deficit) from the matched eratos temperature and RH.
6. Saves the combined field and eratos data as `eratos_max_temp.csv` and `eratos_min_temp.csv` in the `output/csv` folder.


⚠️ **Important notes**
* Before running the script, set all variables in the **first cell**, and delete the **second cell** if not using a Google Colab environment.  
  *(The script was developed for use in Google Colab and has not been tested outside of it.)*


In [None]:
input_file_name = 'in-situ_topography_phd.csv'
output_file_name = 'eratos_vpd_phd.csv'

eratos_rh_path = 'Data/eratos/ANU_CombinedSites_RH.csv'
eratos_temp_path = 'Data/eratos/ANU_CombinedSites_Temp.csv'
eratos_sdi_path = 'Data/eratos/ANU_CombinedSites_SDI.csv'  # daily

In [None]:
import sys

sys.path.append('../..')
import os

import pandas as pd

from Utils.datetime import add_UTC_Datetime
from Utils.vpd import calculate_vpd

# Loading in-situ data


In [None]:
# Load the in-situ data with topography (input_file_name) as the main df

df = pd.read_csv(os.path.join("..", "..", "output", "csv", input_file_name))
df['Datetime'] = pd.to_datetime(df['Datetime'])
df.head()

# Loading ERATOS data

In [None]:
eratos_rh_data_dir = os.path.join("..", "..", eratos_rh_path)
eratos_rh_df = pd.read_csv(eratos_rh_data_dir)
eratos_rh_df.head()

In [None]:
eratos_temp_data_dir = os.path.join("..", "..", eratos_temp_path)
eratos_temp_df = pd.read_csv(eratos_temp_data_dir)
eratos_temp_df.head()

# Combining in-situ and remote data into a single dataframe

In [None]:
# Generate UTC_Datetime for in-situ observations

df = add_UTC_Datetime(df)
df.head()
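
The implementation of `add_UTC_Datetime` lives in `Utils.datetime` and is not shown in this notebook. As a rough sketch of what such a conversion could look like, the hypothetical helper below assumes that `Datetime` holds naive local timestamps and that the sites use the `Australia/Sydney` time zone. Both the helper name and the time zone are assumptions made for illustration, not details taken from the project.

```python
import pandas as pd


def add_utc_datetime_sketch(df, local_tz='Australia/Sydney'):
    """Return a copy of df with a naive UTC_Datetime column (hypothetical helper)."""
    out = df.copy()
    # Attach the assumed local time zone; ambiguous or nonexistent
    # daylight-saving timestamps become NaT instead of raising.
    localized = out['Datetime'].dt.tz_localize(
        local_tz, ambiguous='NaT', nonexistent='NaT'
    )
    # Convert to UTC, then drop the tz info so the column stays naive.
    out['UTC_Datetime'] = localized.dt.tz_convert('UTC').dt.tz_localize(None)
    return out


example = pd.DataFrame(
    {'Datetime': pd.to_datetime(['2020-01-01 12:00:00', '2020-07-01 12:00:00'])}
)
example = add_utc_datetime_sketch(example)
print(example)
```

Storing the UTC column without time zone information keeps later string formatting and comparisons against the eratos `Date_time` column straightforward.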

In [None]:
# Fill in df with eratos data


# For each row, look up the eratos value for the row's site at the nearest hour (UTC).
# eratos_df is expected to have a 'Date_time' column and one column per SiteID.
def get_eratos_value(row, eratos_df):
    SiteID_str = str(row['SiteID'])

    target_time = row['UTC_Datetime'].round('1h').strftime('%Y-%m-%d %H:%M:%S')
    # Use the dataframe passed in, so temperature and RH are read from their own tables
    eratos_value = eratos_df.loc[eratos_df['Date_time'] == target_time, SiteID_str].values[0]
    print(SiteID_str, target_time)
    print(eratos_value)

    return eratos_value


df['eratos_Temperature'] = df.apply(lambda row: get_eratos_value(row, eratos_temp_df), axis=1)
df['eratos_RH'] = df.apply(lambda row: get_eratos_value(row, eratos_rh_df), axis=1)
df.head()
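
The row-wise lookup above scans the eratos table once per observation, which can be slow for large datasets. A possible alternative is to reshape the eratos table and use a single merge. The sketch below assumes the same layout the lookup relies on (a `Date_time` column with one row per hour, plus one column per site ID); the function name `match_eratos_values` and the toy data are hypothetical.

```python
import pandas as pd


def match_eratos_values(df, eratos_df, value_name):
    """Return one eratos value per row of df, matched on site and nearest hour."""
    # Reshape the wide table (one column per site) into long form.
    long_df = eratos_df.melt(
        id_vars='Date_time', var_name='SiteID', value_name=value_name
    )
    long_df['SiteID'] = long_df['SiteID'].astype(str)
    long_df['Date_time'] = pd.to_datetime(long_df['Date_time']).astype('datetime64[ns]')

    # Round observation times to the nearest hour and drop any tz info.
    hours = pd.to_datetime(df['UTC_Datetime']).dt.round('1h')
    if hours.dt.tz is not None:
        hours = hours.dt.tz_localize(None)

    keys = pd.DataFrame(
        {'SiteID': df['SiteID'].astype(str), 'Date_time': hours.astype('datetime64[ns]')}
    )
    # A left merge keeps every observation; unmatched rows become NaN.
    # validate raises if the eratos table has duplicate site/time pairs.
    merged = keys.merge(
        long_df, on=['SiteID', 'Date_time'], how='left', validate='many_to_one'
    )
    return merged[value_name].to_numpy()


# Toy data standing in for the real tables
eratos_temp = pd.DataFrame({
    'Date_time': ['2020-01-01 00:00:00', '2020-01-01 01:00:00'],
    '101': [20.0, 21.0],
    '102': [18.0, 19.0],
})
obs = pd.DataFrame({
    'SiteID': [101, 102],
    'UTC_Datetime': pd.to_datetime(['2020-01-01 00:50:00', '2020-01-01 00:10:00']),
})
obs['eratos_Temperature'] = match_eratos_values(obs, eratos_temp, 'eratos_Temperature')
print(obs)
```

Unlike `.values[0]`, which raises an `IndexError` when no eratos row matches, the merge leaves a NaN that can be inspected afterwards.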

In [None]:
# Investigate the result - There are only NaNs in veg_cover column
df[df.isna().any(axis=1)]
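
To confirm which columns actually contain the missing values, a per-column count can complement the row filter above. This is a small sketch on toy data; the helper name `nan_counts` is hypothetical, and the column names are taken from this notebook.

```python
import pandas as pd


def nan_counts(frame):
    """Return the number of NaNs per column, keeping only columns that have any."""
    counts = frame.isna().sum()
    return counts[counts > 0]


# Toy frame standing in for the real df
toy = pd.DataFrame({
    'SiteID': [1, 2, 3],
    'veg_cover': [0.4, None, 0.7],
    'eratos_RH': [55.0, 60.0, 48.0],
})
print(nan_counts(toy))
```

If the eratos columns appear in this summary, some observations were not matched to an eratos value.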

# Calculating remote VPD from remote temperature and remote relative humidity

In [None]:
df['eratos_VPD'] = df.apply(
    lambda row: calculate_vpd(row['eratos_Temperature'], row['eratos_RH']), axis=1
)
df.head()
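
`calculate_vpd` is imported from `Utils.vpd` and its formula is not shown here. For reference, one common way to compute VPD is to estimate the saturation vapor pressure with the Tetens equation and scale it by the humidity deficit. The sketch below assumes temperature in degrees Celsius, RH in percent, and a result in kPa; the project's own function may use a different formula or different units.

```python
import numpy as np


def vpd_tetens_kpa(temperature_c, relative_humidity_pct):
    """Vapor pressure deficit in kPa from air temperature (°C) and RH (%)."""
    temperature_c = np.asarray(temperature_c, dtype=float)
    relative_humidity_pct = np.asarray(relative_humidity_pct, dtype=float)
    # Saturation vapor pressure (kPa), Tetens approximation
    saturation = 0.6108 * np.exp(17.27 * temperature_c / (temperature_c + 237.3))
    # The deficit is the part of the saturation pressure not filled by actual vapor
    return saturation * (1.0 - relative_humidity_pct / 100.0)


# Works on scalars and on whole columns alike
print(vpd_tetens_kpa(25.0, 50.0))
print(vpd_tetens_kpa([10.0, 30.0], [80.0, 20.0]))
```

Because the function is built on NumPy, it can be applied to entire columns at once instead of row by row with `apply`.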

# Save the resulting dataframe

In [None]:
df.to_csv(os.path.join("..", "..", "output", "csv", output_file_name), index=False)