
# Start


This script matches daily SDI values from Eratos with the daily minimum soil moisture from field observations for subsequent analysis.

📄 **What this script does**
1. Loads cleaned field data with topography from the `output/csv` folder (`in-situ_topography_pcs.csv` by default) and drops rows without a `Soil_mois` value.
2. Loads the daily **Eratos SDI data** from `Data/eratos/ANU_CombinedSites_SDI.csv`.
3. Assigns each field observation an `eratos_observation_date` (the date of its `UTC_Datetime`) and, for each site and date, keeps the observations with the lowest `Soil_mois` in `df_min_soil_mois`. When several observations tie for the lowest value, all of them are kept.
4. Matches an Eratos SDI value to each observation in `df_min_soil_mois`, using the Eratos column named after the digits of `SiteID` and the row whose `Date` equals `UTC_Datetime` rounded to the nearest day.
5. Saves the combined field and Eratos data as `eratos_sdi_pcs_min_soil_mois.csv` in the `output/csv` folder.


⚠️ **Important notes**
* Before running the script, set all variables in the **first code cell**.  
  *(The script was developed for use in Google Colab and has not been tested outside of it.)*


In [None]:
working_dir = '../..'  # This repository's root directory
input_file_name = 'in-situ_topography_pcs.csv'
output_file_name = 'eratos_sdi_pcs_min_soil_mois.csv'

eratos_sdi_path = 'Data/eratos/ANU_CombinedSites_SDI.csv'  # daily

In [None]:
import os
import sys

import pandas as pd
from tqdm import tqdm

# Make the repository's Utils package importable
sys.path.append(working_dir)

from Utils.datetime import add_UTC_Datetime
from Utils.vpd import calculate_vpd

# Register progress_apply on pandas objects
tqdm.pandas()

# Loading in-situ data


In [None]:
# Load the in-situ data (input_file_name) as the main df

df = pd.read_csv(os.path.join(working_dir, "output", "csv", input_file_name))
# Keep only rows that have a soil moisture reading
df = df.dropna(subset=['Soil_mois'])
df['Datetime'] = pd.to_datetime(df['Datetime'])
if 'UTC_Datetime' in df.columns:
    df['UTC_Datetime'] = pd.to_datetime(df['UTC_Datetime'])
else:
    df = add_UTC_Datetime(df)
df.head()

# Loading ERATOS data

In [None]:
eratos_sdi_data_dir = os.path.join(working_dir, eratos_sdi_path)
eratos_sdi_df = pd.read_csv(eratos_sdi_data_dir)
eratos_sdi_df.head()

# Combining in-situ and remote data into a single dataframe

In [None]:
# Select only the observations that have the daily minimum Soil_mois value

# Assign eratos_observation_date to each observation
df['eratos_observation_date'] = df['UTC_Datetime'].dt.date

# Keep only the rows with the minimum Soil_mois of the day at each site
df['Day_min_soil_mois'] = df.groupby(['SiteID', 'eratos_observation_date'])['Soil_mois'].transform(
    'min'
)
# .copy() avoids pandas' SettingWithCopyWarning when a column is added later
df_min_soil_mois = df[df['Soil_mois'] == df['Day_min_soil_mois']].copy()

df_min_soil_mois.head()
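
The daily-minimum selection above can be checked on a small example. The following is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from the field data). It shows that comparing against the `transform('min')` column keeps every row that ties for the minimum, and that `drop_duplicates` is one possible way to keep a single row per site and day if that is preferred.

```python
import pandas as pd

# Hypothetical observations: site S1 has two readings tied for the daily minimum
toy = pd.DataFrame(
    {
        'SiteID': ['S1', 'S1', 'S1', 'S2', 'S2'],
        'eratos_observation_date': ['2021-01-05'] * 5,
        'Soil_mois': [0.30, 0.25, 0.25, 0.40, 0.45],
    }
)

# Broadcast the per-site, per-day minimum back onto every row
toy['Day_min_soil_mois'] = toy.groupby(['SiteID', 'eratos_observation_date'])[
    'Soil_mois'
].transform('min')

# Rows equal to the group minimum; ties are all retained
toy_min = toy[toy['Soil_mois'] == toy['Day_min_soil_mois']].copy()

# Optional: keep only the first row per site and day
one_per_day = toy_min.drop_duplicates(subset=['SiteID', 'eratos_observation_date'])

print(len(toy_min), len(one_per_day))
```

Whether tied rows should be kept or collapsed depends on the downstream analysis; the notebook itself keeps them.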

In [None]:
# Fill in df_min_soil_mois with eratos data


# For each row, look up the SDI value in the loaded eratos table
def get_eratos_value(row, eratos_df):
    # The eratos table has one column per site, named by the digits of SiteID
    site_column = ''.join(c for c in str(row['SiteID']) if c.isdigit())

    # Match on the UTC timestamp rounded to the nearest day
    target_date = row['UTC_Datetime'].round('1d').strftime('%Y-%m-%d')
    # Raises IndexError if target_date is not in the eratos table
    eratos_value = eratos_df.loc[eratos_df['Date'] == target_date, site_column].values[0]

    return eratos_value


df_min_soil_mois['eratos_SDI'] = df_min_soil_mois.progress_apply(
    lambda row: get_eratos_value(row, eratos_sdi_df), axis=1
)
df_min_soil_mois.head()
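
The row-by-row lookup above could also be expressed as a single merge. The sketch below is an alternative, not the notebook's method. It assumes the Eratos table has a `Date` column of `YYYY-MM-DD` strings plus one column per site named by the digits of `SiteID`, as the lookup implies. The frames `eratos_toy` and `obs_toy` and the column `site_key` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Eratos SDI table: 'Date' plus one column per site
eratos_toy = pd.DataFrame(
    {'Date': ['2021-01-05', '2021-01-06'], '1': [10.0, 12.0], '2': [20.0, 22.0]}
)

# Hypothetical field observations
obs_toy = pd.DataFrame(
    {
        'SiteID': ['Site1', 'Site2', 'Site1'],
        'UTC_Datetime': pd.to_datetime(
            ['2021-01-05 03:00', '2021-01-05 20:00', '2021-01-08 01:00']
        ),
    }
)

# Reshape the wide table to one row per (Date, site)
eratos_long = eratos_toy.melt(id_vars='Date', var_name='site_key', value_name='eratos_SDI')

# Build the same keys the row-wise lookup uses: digits of SiteID, timestamp rounded to the day
obs_toy['site_key'] = obs_toy['SiteID'].astype(str).str.replace(r'\D', '', regex=True)
obs_toy['Date'] = obs_toy['UTC_Datetime'].dt.round('D').dt.strftime('%Y-%m-%d')

# A left merge keeps every observation; unmatched dates become NaN
matched = obs_toy.merge(eratos_long, on=['Date', 'site_key'], how='left')
print(matched[['SiteID', 'Date', 'eratos_SDI']])
```

One difference in behaviour: a date missing from the Eratos table yields `NaN` here, whereas `.values[0]` on an empty selection raises an `IndexError`.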

In [None]:
# Investigate the result - list the rows without an eratos_SDI value
df_min_soil_mois[df_min_soil_mois['eratos_SDI'].isna()]
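
When inspecting the matched values, it may also help to confirm which calendar day each observation was matched to. The grouping step uses `.dt.date`, which truncates the timestamp, while the lookup uses `.round('1d')`, which rounds to the nearest day. The sketch below, on made-up timestamps, shows that the two differ for times after 12:00 UTC. Whether that difference is intended is not stated in the notebook.

```python
import pandas as pd

# Hypothetical UTC timestamps, one before and one after midday
ts = pd.Series(pd.to_datetime(['2021-01-05 11:00', '2021-01-05 13:00']))

# Truncation to the calendar day (what .dt.date gives)
truncated = ts.dt.floor('D').dt.strftime('%Y-%m-%d')

# Rounding to the nearest day (what the lookup uses)
rounded = ts.dt.round('D').dt.strftime('%Y-%m-%d')

comparison = pd.DataFrame({'UTC_Datetime': ts, 'truncated': truncated, 'rounded': rounded})
print(comparison)
```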

# Save the resulting dataframe

In [None]:
df_min_soil_mois.to_csv(os.path.join(working_dir, "output", "csv", output_file_name), index=False)