<a href="https://colab.research.google.com/github/DiaPorntipa/Bushfire_data_analytics/blob/main/Acclimatised_data_compilation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Start


This script prepares vapor pressure deficit (VPD) data from the Acclimatised dataset and matches it with VPD from field observations for subsequent analysis.

TODO (HIGH): The information below is the copy of documentation of another script. Update it.

📄 **What this script does**
1. Loads cleaned field data with topography — for example, output from `Nick_phd_data_complilation.ipynb`.
2. Downloads **acclimatised data** for the grid cells closest to the field sites, spanning from the first day to one day after the last day of the field observations.
3. Finds the field observations with the highest and lowest temperature in each 9 am to 9 am window and stores them separately in `df` and `df_min_temp`. When several observations tie for the highest or lowest temperature, the one with the lowest RH is selected.
4. Matches acclimatised temperature and RH values to the field observations in `df` and `df_min_temp` based on the nearest acclimatised grid cell and `acclimatised_observation_date`.
5. For both `df` and `df_min_temp`, calculates **VPD** (vapor pressure deficit) from the matched acclimatised temperature and RH.
6. Saves the combined field and acclimatised data as `acclimatised_max_temp.csv` and `acclimatised_min_temp.csv` in the `output/csv` folder.
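
Step 3 can be sketched as follows. This is a minimal illustration on made-up observations, not the code used elsewhere in this notebook. It assumes the 9 am to 9 am windows are evaluated per site, and `window_date`, `max_rows` and `min_rows` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Hypothetical sample; column names follow the field data used in this notebook.
obs = pd.DataFrame({
    "SiteID": [79] * 5,
    "Datetime": pd.to_datetime([
        "2019-01-20 08:00", "2019-01-20 10:00", "2019-01-20 14:00",
        "2019-01-20 15:00", "2019-01-21 08:00",
    ]),
    "Temperature": [18.0, 25.0, 30.0, 30.0, 17.0],
    "RH": [95.0, 60.0, 40.0, 35.0, 97.0],
})

# Shifting by 9 hours makes each 9 am to 9 am window fall on one calendar date.
obs["window_date"] = (obs["Datetime"] - pd.Timedelta(hours=9)).dt.floor("D")

keys = ["SiteID", "window_date"]

# Highest temperature first; ties are broken by the lowest RH.
max_rows = (
    obs.sort_values(["Temperature", "RH"], ascending=[False, True])
       .groupby(keys, sort=False).head(1)
       .sort_values(keys)
)

# Lowest temperature first; ties are again broken by the lowest RH.
min_rows = (
    obs.sort_values(["Temperature", "RH"], ascending=[True, True])
       .groupby(keys, sort=False).head(1)
       .sort_values(keys)
)

print(max_rows[["Datetime", "Temperature", "RH"]])
print(min_rows[["Datetime", "Temperature", "RH"]])
```

In the sample, the two 30.0 °C readings tie for the maximum in the second window, and the 15:00 reading is kept because its RH is lower.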


⚠️ **Important notes**
* Before running the script, set all variables in the **first cell**, and delete the **second cell** if not using a Google Colab environment.  
  *(The script was developed for use in Google Colab and has not been tested outside of it.)*


In [31]:
working_dir = "/content/drive/My Drive/Work/2025.04 ANU Bushfire"
explore_acclimatised_data = True

In [32]:
# Connect to Google Drive

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Loading in-situ data


In [33]:
# Load in-situ_topography.csv as the main df

import pandas as pd
import os

df = pd.read_csv(os.path.join(working_dir, "output/csv/in-situ_topography.csv"))
df['Datetime'] = pd.to_datetime(df['Datetime'])

first_date = df['Datetime'].min().strftime("%Y%m%d")
day_after_last_date = (df['Datetime'].max() + pd.Timedelta(days=1)).strftime("%Y%m%d")
print("First date: ", first_date, ", One day after the last date: ", day_after_last_date)

df.head()

First date:  20181221 , One day after the last date:  20201107


Unnamed: 0.1,Unnamed: 0,SiteID,X,Y,Datetime,Temperature,RH,VPD,slope,aspect,relief
0,0,79,150.2953,-35.48254,2019-01-20 04:00:00,17.941,97.525473,0.050885,7.76997,187.29736,27.341133
1,1,79,150.2953,-35.48254,2019-01-20 05:00:00,17.753,97.881611,0.043049,7.76997,187.29736,27.341133
2,2,79,150.2953,-35.48254,2019-01-20 06:00:00,17.878,98.236778,0.036114,7.76997,187.29736,27.341133
3,3,79,150.2953,-35.48254,2019-01-20 07:00:00,18.066,97.406114,0.05376,7.76997,187.29736,27.341133
4,4,79,150.2953,-35.48254,2019-01-20 08:00:00,18.379,97.881611,0.044776,7.76997,187.29736,27.341133


# Exploring Acclimatised data

In [34]:
acclimatised_data_dir = os.path.join(working_dir, "Data/Acclimatised")

acclimatised_df = None
if explore_acclimatised_data:
    acclimatised_df = pd.read_csv(os.path.join(acclimatised_data_dir, "SiteID-67_RH.csv"))
    print(acclimatised_df.info())
acclimatised_df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 289 entries, 0 to 288
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   DateTime_GMT  289 non-null    object 
 1   RH            289 non-null    float64
 2   Longitude     289 non-null    float64
 3   Latitude      289 non-null    float64
 4   ModelErrAvg   289 non-null    float64
 5   ModelType     289 non-null    object 
dtypes: float64(4), object(2)
memory usage: 13.7+ KB
None


Unnamed: 0,DateTime_GMT,RH,Longitude,Latitude,ModelErrAvg,ModelType
0,2018-12-21,91.5,150.2697,-35.54484,4.8,Regression Trees with kriging
1,2018-12-21 00:30:00,88.9,150.2697,-35.54484,4.9,TPS +1 covariate
2,2018-12-21 01:00:00,92.5,150.2697,-35.54484,4.2,TPS +3 covariates
3,2018-12-21 01:30:00,92.3,150.2697,-35.54484,4.1,GLM with splines
4,2018-12-21 02:00:00,93.0,150.2697,-35.54484,3.4,TPS +2 covariates
...,...,...,...,...,...,...
284,2018-12-26 22:00:00,71.8,150.2697,-35.54484,5.1,Random Forest with splines
285,2018-12-26 22:30:00,70.0,150.2697,-35.54484,5.6,Random Forest with splines
286,2018-12-26 23:00:00,59.9,150.2697,-35.54484,5.7,Random Forest with kriging
287,2018-12-26 23:30:00,62.5,150.2697,-35.54484,5.7,TPS +1 covariate


# Combining in-situ and remote data into a single dataframe

In [35]:
# Generate UTC_Datetime for in-situ observations

from pytz import timezone

aus_tz = timezone('Australia/Sydney')
df['Datetime'] = df['Datetime'].dt.tz_localize(aus_tz, ambiguous='NaT', nonexistent='NaT')
# Drop rows whose local time is ambiguous or nonexistent (daylight saving transitions).
# .copy() makes df an independent frame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning.
df = df[~df['Datetime'].isna()].copy()
df['UTC_Datetime'] = df['Datetime'].dt.tz_convert('UTC')

df.head()


Unnamed: 0.1,Unnamed: 0,SiteID,X,Y,Datetime,Temperature,RH,VPD,slope,aspect,relief,UTC_Datetime
0,0,79,150.2953,-35.48254,2019-01-20 04:00:00+11:00,17.941,97.525473,0.050885,7.76997,187.29736,27.341133,2019-01-19 17:00:00+00:00
1,1,79,150.2953,-35.48254,2019-01-20 05:00:00+11:00,17.753,97.881611,0.043049,7.76997,187.29736,27.341133,2019-01-19 18:00:00+00:00
2,2,79,150.2953,-35.48254,2019-01-20 06:00:00+11:00,17.878,98.236778,0.036114,7.76997,187.29736,27.341133,2019-01-19 19:00:00+00:00
3,3,79,150.2953,-35.48254,2019-01-20 07:00:00+11:00,18.066,97.406114,0.05376,7.76997,187.29736,27.341133,2019-01-19 20:00:00+00:00
4,4,79,150.2953,-35.48254,2019-01-20 08:00:00+11:00,18.379,97.881611,0.044776,7.76997,187.29736,27.341133,2019-01-19 21:00:00+00:00
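
The cell above localizes the field timestamps to `Australia/Sydney` and drops any that become `NaT`. The sketch below shows which local times are affected, using three made-up timestamps around the 2019 daylight saving transitions. It passes the time zone as a string rather than a `pytz` object, which `tz_localize` also accepts.

```python
import pandas as pd

# Hypothetical local timestamps around the 2019 daylight saving transitions in Sydney.
local = pd.Series(pd.to_datetime([
    "2019-04-07 02:30",  # ambiguous: clocks go back, so 02:30 occurs twice
    "2019-10-06 02:30",  # nonexistent: clocks jump from 02:00 to 03:00
    "2019-01-20 04:00",  # ordinary summer time (UTC+11)
]))

localized = local.dt.tz_localize(
    "Australia/Sydney", ambiguous="NaT", nonexistent="NaT"
)

# Ambiguous and nonexistent times become NaT and are dropped before conversion.
utc = localized.dropna().dt.tz_convert("UTC")

print(localized.isna().tolist())
print(utc.iloc[0])
```

Only the third timestamp survives, and it converts to 2019-01-19 17:00 UTC, matching the first row of the table above.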


In [43]:
# Fill in df with Acclimatised data

# For each row, open the Acclimatised CSV file for the row's site and variable,
# then look up the value at the observation time rounded to the nearest 30 minutes.
def get_acclimatised_value(row, column):
    site_id = row['SiteID']
    file_path = os.path.join(acclimatised_data_dir, f"SiteID-{site_id}_{column}.csv")
    df_acclimatised = pd.read_csv(file_path)

    target_time = row['UTC_Datetime'].round('30 min').strftime('%Y-%m-%d %H:%M:%S')
    # Midnight entries in DateTime_GMT are stored as date-only strings (e.g. "2019-01-16"),
    # so drop the time part to match them.
    if target_time.endswith(' 00:00:00'):
        target_time = target_time.replace(' 00:00:00', '')
    # The value column may carry a prefix (e.g. 'AirTemperature' for 'Temperature'),
    # so select it by substring.
    column_name = df_acclimatised.columns[df_acclimatised.columns.str.contains(column)]
    # Note: .values[0][0] raises IndexError when no row matches target_time.
    acclimatised_value = df_acclimatised.loc[df_acclimatised['DateTime_GMT'] == target_time, column_name].values[0][0]

    return acclimatised_value

df['acclimatised_Temperature'] = df.apply(lambda row: get_acclimatised_value(row, 'Temperature'), axis=1)
df['acclimatised_RH'] = df.apply(lambda row: get_acclimatised_value(row, 'RH'), axis=1)
df.head()


Unnamed: 0.1,Unnamed: 0,SiteID,X,Y,Datetime,Temperature,RH,VPD,slope,aspect,relief,UTC_Datetime,acclimatised_Temperature,acclimatised_RH
0,0,79,150.2953,-35.48254,2019-01-20 04:00:00+11:00,17.941,97.525473,0.050885,7.76997,187.29736,27.341133,2019-01-19 17:00:00+00:00,18.5,87.2
1,1,79,150.2953,-35.48254,2019-01-20 05:00:00+11:00,17.753,97.881611,0.043049,7.76997,187.29736,27.341133,2019-01-19 18:00:00+00:00,18.5,87.6
2,2,79,150.2953,-35.48254,2019-01-20 06:00:00+11:00,17.878,98.236778,0.036114,7.76997,187.29736,27.341133,2019-01-19 19:00:00+00:00,18.6,87.6
3,3,79,150.2953,-35.48254,2019-01-20 07:00:00+11:00,18.066,97.406114,0.05376,7.76997,187.29736,27.341133,2019-01-19 20:00:00+00:00,19.0,85.0
4,4,79,150.2953,-35.48254,2019-01-20 08:00:00+11:00,18.379,97.881611,0.044776,7.76997,187.29736,27.341133,2019-01-19 21:00:00+00:00,19.5,81.2
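
The lookup above reads a CSV file once per row, which can be slow for long field records. One possible alternative is to parse each site's file once and join on the rounded timestamp. The sketch below uses small in-memory stand-ins instead of the real files. `normalise_gmt` and `match_time` are hypothetical names, and the second RH value is invented for illustration.

```python
import pandas as pd

def normalise_gmt(series):
    """Parse DateTime_GMT strings, padding date-only midnight entries."""
    padded = series.where(series.str.len() > 10, series + " 00:00:00")
    return pd.to_datetime(padded, format="%Y-%m-%d %H:%M:%S", utc=True)

# Hypothetical stand-ins for one site's field rows and its Acclimatised RH file.
field = pd.DataFrame({
    "SiteID": [79, 79],
    "UTC_Datetime": pd.to_datetime(
        ["2019-01-19 17:00", "2019-01-20 00:10"], utc=True
    ),
})
site_rh = pd.DataFrame({
    "DateTime_GMT": ["2019-01-19 17:00:00", "2019-01-20"],
    "RH": [87.2, 80.0],
})

# Build a common key: parsed GMT time on one side, rounded UTC time on the other.
site_rh["match_time"] = normalise_gmt(site_rh["DateTime_GMT"])
field["match_time"] = field["UTC_Datetime"].dt.round("30min")

# A left join keeps every field row; unmatched rows get NaN instead of raising.
merged = field.merge(
    site_rh[["match_time", "RH"]].rename(columns={"RH": "acclimatised_RH"}),
    on="match_time",
    how="left",
)
print(merged)
```

With real data, the same join would be repeated per site and per variable (for example by grouping `df` on `SiteID`), so each file is read only once.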


In [44]:
# Investigate the result
print("df length: ", len(df))
df[(df.isna().any(axis=1)) & (df['SiteID'] != 251)]

df length:  100


Unnamed: 0.1,Unnamed: 0,SiteID,X,Y,Datetime,Temperature,RH,VPD,slope,aspect,relief,UTC_Datetime,acclimatised_Temperature,acclimatised_RH


# Calculating remote VPD from remote temperature and remote relative humidity

In [45]:
# Write a function for deriving VPD from temp and RH

import math
import numpy as np

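def calculate_vpd(temp, rh):
    """Return VPD in kPa from air temperature (deg C) and relative humidity (%)."""
    if pd.isna(temp) or pd.isna(rh):
        return np.nan
    # Saturation vapor pressure (kPa) from the Tetens equation
    es = 0.6108 * math.exp(17.27 * temp / (237.3 + temp))
    # Actual vapor pressure (kPa)
    e = es * rh / 100
    vpd = es - e
    return vpd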

df['acclimatised_VPD'] = df.apply(lambda row: calculate_vpd(row['acclimatised_Temperature'], row['acclimatised_RH']), axis=1)
df.head()


Unnamed: 0.1,Unnamed: 0,SiteID,X,Y,Datetime,Temperature,RH,VPD,slope,aspect,relief,UTC_Datetime,acclimatised_Temperature,acclimatised_RH,acclimatised_VPD
0,0,79,150.2953,-35.48254,2019-01-20 04:00:00+11:00,17.941,97.525473,0.050885,7.76997,187.29736,27.341133,2019-01-19 17:00:00+00:00,18.5,87.2,0.272611
1,1,79,150.2953,-35.48254,2019-01-20 05:00:00+11:00,17.753,97.881611,0.043049,7.76997,187.29736,27.341133,2019-01-19 18:00:00+00:00,18.5,87.6,0.264092
2,2,79,150.2953,-35.48254,2019-01-20 06:00:00+11:00,17.878,98.236778,0.036114,7.76997,187.29736,27.341133,2019-01-19 19:00:00+00:00,18.6,87.6,0.265751
3,3,79,150.2953,-35.48254,2019-01-20 07:00:00+11:00,18.066,97.406114,0.05376,7.76997,187.29736,27.341133,2019-01-19 20:00:00+00:00,19.0,85.0,0.329609
4,4,79,150.2953,-35.48254,2019-01-20 08:00:00+11:00,18.379,97.881611,0.044776,7.76997,187.29736,27.341133,2019-01-19 21:00:00+00:00,19.5,81.2,0.426173
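
The same formula can be applied to whole columns at once instead of row by row. The sketch below is a vectorized variant, assuming temperature in °C and RH in percent as in the cell above. `calculate_vpd_vectorised` is a hypothetical name, and the sample frame reuses values from the table above plus one missing temperature.

```python
import numpy as np
import pandas as pd

def calculate_vpd_vectorised(temp_c, rh_percent):
    """VPD in kPa from air temperature (deg C) and relative humidity (%)."""
    temp_c = np.asarray(temp_c, dtype=float)
    rh_percent = np.asarray(rh_percent, dtype=float)
    # Saturation vapor pressure (kPa), same constants as calculate_vpd
    es = 0.6108 * np.exp(17.27 * temp_c / (237.3 + temp_c))
    # es - es * rh / 100, written as a single expression
    return es * (1.0 - rh_percent / 100.0)

sample = pd.DataFrame({
    "acclimatised_Temperature": [18.5, 19.0, np.nan],
    "acclimatised_RH": [87.2, 85.0, 81.2],
})
sample["acclimatised_VPD"] = calculate_vpd_vectorised(
    sample["acclimatised_Temperature"], sample["acclimatised_RH"]
)
print(sample)
```

Missing inputs propagate as `NaN` without an explicit check, because NumPy arithmetic on `NaN` yields `NaN`.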


# Save the resulting dataframe

In [46]:
df.to_csv(os.path.join(working_dir, "output/csv/acclimatised.csv"), index=False)