# Start

This script prepares vapor pressure deficit (VPD) data from BARRA2 and matches it with VPD from field observations for subsequent analysis.

📄 **What this script does**
1. Loads cleaned field data with topography — for example, output from `Nick_phd_data_complilation.ipynb`.
2. Downloads and explores **BARRA2 data** (`tas` for temperature and `hurs` for relative humidity) for the grid cells closest to the field sites, spanning from the first to the last month of the field observations. Both steps are switched on or off with `download_barra2_data` and `explore_barra2_data`.
3. Matches BARRA2 temperature and RH values to the field observations based on the nearest grid cell and UTC timestamp.
4. Calculates **VPD** (vapor pressure deficit) from the matched BARRA2 temperature and RH.
5. Saves the combined field and BARRA2 data to the file named by `output_file_name` (e.g., `barra2_vpd_phd.csv`) in the `output/csv` folder.

⚠️ **Important notes**
* Before running the script, set all variables in the **first cell**, and delete the **second cell** if not using a Google Colab environment.  
  *(The script was developed for use in Google Colab and has not been tested outside of it.)*
* The field data timestamps must be in **Australian local time**. The script will convert them to **UTC** before matching with BARRA2 data.
* BARRA2's temperature and RH are **instantaneous** readings.
* Field observations with **ambiguous timestamps** (e.g., during the daylight saving transition from AEDT to AEST) will be **excluded** from the output.
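
A minimal sketch of the local-time-to-UTC step, assuming the sites observe `Australia/Sydney` time (AEST/AEDT). `to_utc` is a hypothetical helper, not the actual `add_UTC_Datetime` from `Utils`. pandas can mark the repeated hour at the end of daylight saving (and, as an extra choice here, the skipped hour at its start) as `NaT`, so those rows can be dropped afterwards:

```python
import pandas as pd


def to_utc(local_times: pd.Series, tz: str = "Australia/Sydney") -> pd.Series:
    """Hypothetical helper: convert naive Australian local timestamps to UTC.

    Ambiguous times (the hour repeated when AEDT ends) and non-existent times
    (the hour skipped when AEDT starts) become NaT instead of raising.
    """
    # Assumption: all sites share the Australia/Sydney time zone
    localized = pd.to_datetime(local_times).dt.tz_localize(
        tz, ambiguous="NaT", nonexistent="NaT"
    )
    return localized.dt.tz_convert("UTC")


# AEDT ended at 03:00 on 7 April 2019, so 02:00-02:59 occurred twice
times = pd.Series(["2019-04-07 01:30", "2019-04-07 02:30", "2019-04-07 03:30"])
utc = to_utc(times)
print(utc)
```

Here `02:30` comes back as `NaT`, while the times on either side convert with offsets of UTC+11 (AEDT) and UTC+10 (AEST) respectively; a `dropna` on the result reproduces the exclusion described above.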




In [None]:
input_file_name = 'in-situ_topography_phd.csv'
output_file_name = 'barra2_vpd_phd.csv'

download_barra2_data = False  # around 30 min in Colab
explore_barra2_data = False

In [None]:
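import os
import sys

import pandas as pd

# Make the project-level Utils package importable from this notebook's folder
sys.path.append('../..')
from Utils.vpd import calculate_vpd
from Utils.barra2 import *  # BARRA2 grid, download, and lookup helpers
from Utils import add_UTC_Datetime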

# Loading in-situ and remote data


In [None]:
# Load the cleaned in-situ data with topography (input_file_name) as the main df

df = pd.read_csv(os.path.join("..", "..", "output", "csv", input_file_name))
df['Datetime'] = pd.to_datetime(df['Datetime'])

# Pad the end by one day so the last observation day is fully covered
first_datetime = df['Datetime'].min()
last_datetime = df['Datetime'].max() + pd.Timedelta(days=1)
print(f"First date: {first_datetime:%Y%m%d}, last date: {last_datetime:%Y%m%d}")

df.head()

## Downloading BARRA2 data

In [None]:
# Find the nearest BARRA2 grid point for each site (X = longitude, Y = latitude)

barra2_lats, barra2_lons = get_barra2_grid_point()
df[['barra2_X', 'barra2_Y']] = df.apply(
    lambda row: pd.Series(find_nearest_barra2_grid_point(row['X'], row['Y'], barra2_lons, barra2_lats)),
    axis=1
)
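
A minimal sketch of a nearest-grid-point lookup, assuming the grid is rectilinear (1-D latitude and longitude arrays, as `get_barra2_grid_point()` appears to return). `nearest_grid_point` is a hypothetical stand-in for `find_nearest_barra2_grid_point`, and the toy grid's 0.04° spacing is for illustration only:

```python
import numpy as np


def nearest_grid_point(x, y, grid_lons, grid_lats):
    """Hypothetical helper: return the (lon, lat) grid point closest to (x, y).

    On a rectilinear grid the nearest cell centre can be found independently
    along each axis.
    """
    grid_lons = np.asarray(grid_lons)
    grid_lats = np.asarray(grid_lats)
    lon = grid_lons[np.abs(grid_lons - x).argmin()]
    lat = grid_lats[np.abs(grid_lats - y).argmin()]
    return float(lon), float(lat)


# Toy 0.04-degree grid around a single site
lons = np.arange(150.0, 151.0, 0.04)
lats = np.arange(-34.0, -33.0, 0.04)
print(nearest_grid_point(150.51, -33.49, lons, lats))
```

Searching each axis separately is only valid for a rectilinear grid; on a curvilinear grid the distance would have to be minimised over 2-D coordinate arrays instead.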

In [None]:
# Download all barra2 data

# List the unique BARRA2 cells whose data we need
barra2_cell_locations_list = list(set((x, y) for x, y in df[['barra2_X', 'barra2_Y']].values))
print("barra2_cell_locations_list length: ", len(barra2_cell_locations_list))
print("barra2_cell_locations_list: ", [(str(x), str(y)) for x, y in barra2_cell_locations_list])

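if download_barra2_data:
    # Renamed from `vars` to avoid shadowing the built-in vars()
    barra2_vars = ['tas', 'hurs']  # near-surface air temperature, relative humidity
    download_all_barra2_data(barra2_vars, barra2_cell_locations_list, first_datetime, last_datetime)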
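
The file names in the netCDF exploration cell suggest that BARRA-C2 hourly data are published as one file per variable per month, so the date range maps to one URL per month. The sketch below builds those URLs from that pattern; `monthly_barra2_urls` is hypothetical, and how `download_all_barra2_data` actually fetches and subsets the data is not shown here:

```python
import pandas as pd

# URL prefix copied from the BARRA-C2 1-hourly links in the exploration cell
BASE = ("https://thredds.nci.org.au/thredds/fileServer/ob53/output/reanalysis/"
        "AUST-04/BOM/ERA5/historical/hres/BARRA-C2/v1/1hr")


def monthly_barra2_urls(var, start, end):
    """Hypothetical helper: one BARRA-C2 1-hourly file URL per month in [start, end]."""
    months = pd.period_range(pd.Timestamp(start).to_period("M"),
                             pd.Timestamp(end).to_period("M"), freq="M")
    urls = []
    for month in months:
        ym = month.strftime("%Y%m")
        urls.append(f"{BASE}/{var}/latest/"
                    f"{var}_AUST-04_ERA5_historical_hres_BOM_BARRA-C2_v1_1hr_{ym}-{ym}.nc")
    return urls


for url in monthly_barra2_urls("hurs", "2018-12-15", "2019-02-01"):
    print(url)
```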

In [None]:
# # BARRA2 data exploration

# if explore_barra2_data:
#     barra2_data_dir = os.path.join("..", "..", "Data", "barra2")
#     barra2_df = pd.read_csv(os.path.join(barra2_data_dir, os.listdir(barra2_data_dir)[0]))
#     barra2_df.info()

In [None]:
# # BARRA2 netCDF data exploration

# import xarray as xr

# if explore_barra2_data:
#     tas_url = "https://thredds.nci.org.au/thredds/fileServer/ob53/output/reanalysis/AUST-04/BOM/ERA5/historical/hres/BARRA-C2/v1/1hr/tas/latest/tas_AUST-04_ERA5_historical_hres_BOM_BARRA-C2_v1_1hr_201812-201812.nc"
#     hurs_url = "https://thredds.nci.org.au/thredds/fileServer/ob53/output/reanalysis/AUST-04/BOM/ERA5/historical/hres/BARRA-C2/v1/1hr/hurs/latest/hurs_AUST-04_ERA5_historical_hres_BOM_BARRA-C2_v1_1hr_201812-201812.nc"
#     !curl -L {tas_url} -o "barra2_tas.nc"
#     !curl -L {hurs_url} -o "barra2_hurs.nc"

#     tas_ds = xr.open_dataset("barra2_tas.nc")
#     hurs_ds = xr.open_dataset("barra2_hurs.nc")
#     tas_ds

In [None]:
# if explore_barra2_data:
#     hurs_ds

# Combining in-situ and remote data into a single dataframe

In [None]:
# Generate UTC_Datetime for in-situ observations

df = add_UTC_Datetime(df)
df.head()

In [None]:
# For each row, read the matching BARRA2 cell's CSV and look up the value at that row's UTC time (slow: about 40 min)

df['barra2_Temperature'] = df.apply(lambda row: get_barra2_value(row, 'tas'), axis=1)
df['barra2_RH'] = df.apply(lambda row: get_barra2_value(row, 'hurs'), axis=1)
df.head()
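
Reading a CSV for every row is what makes the cell above slow. A hedged alternative sketch: assume the per-cell BARRA2 values have been stacked into one long table with hypothetical columns `barra2_X`, `barra2_Y`, `time` (hourly, UTC) and one column per variable, and that the observation column is a timezone-aware `UTC_Datetime`. Each observation is then snapped to the nearest hour and matched with a single left merge. Rounding to the nearest hour is an assumption; the notebook's `get_barra2_value` may match timestamps differently:

```python
import pandas as pd


def match_barra2(obs: pd.DataFrame, barra2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: attach BARRA2 values by grid cell and nearest hour."""
    obs = obs.copy()
    # Assumption: BARRA2 values are instantaneous on the hour, so each
    # observation is snapped to the nearest whole hour before joining.
    obs["time"] = obs["UTC_Datetime"].dt.round("h")
    # A left merge keeps every observation; unmatched rows get NaN.
    return obs.merge(barra2, on=["barra2_X", "barra2_Y", "time"], how="left")


# Toy data: one grid cell, two observations, two hourly BARRA2 records
obs = pd.DataFrame({
    "barra2_X": [150.52, 150.52],
    "barra2_Y": [-33.48, -33.48],
    "UTC_Datetime": pd.to_datetime(["2019-01-01 00:20", "2019-01-01 00:40"], utc=True),
})
barra2 = pd.DataFrame({
    "barra2_X": [150.52, 150.52],
    "barra2_Y": [-33.48, -33.48],
    "time": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"], utc=True),
    "tas": [20.0, 21.0],
    "hurs": [60.0, 55.0],
})
matched = match_barra2(obs, barra2)
print(matched[["UTC_Datetime", "tas", "hurs"]])
```

The design trade-off is memory for speed: the stacked table must fit in memory, but each BARRA2 file is read once instead of once per observation.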

In [None]:
# Inspect rows that contain any missing values
df[df.isna().any(axis=1)]

# Calculating BARRA2 VPD from BARRA2 temperature and relative humidity

In [None]:
df['barra2_VPD'] = df.apply(lambda row: calculate_vpd(row['barra2_Temperature'], row['barra2_RH']), axis=1)
df.head()
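
For reference, a common way to compute VPD from temperature and RH, assuming temperature in °C and RH in percent. `vpd_kpa` is a hypothetical helper, and `Utils.vpd.calculate_vpd` may use different constants or units. It uses the Tetens-type saturation vapour pressure given in FAO-56, $e_s(T) = 0.6108\exp\left(\frac{17.27\,T}{T + 237.3}\right)$ kPa, with $\mathrm{VPD} = e_s(T)\,(1 - \mathrm{RH}/100)$:

```python
import math


def vpd_kpa(temp_c: float, rh_percent: float) -> float:
    """Hypothetical helper: VPD (kPa) from air temperature (deg C) and RH (%)."""
    # Saturation vapour pressure over water (Tetens form), in kPa
    es = 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))
    # Actual vapour pressure is es * RH / 100, so the deficit is es * (1 - RH / 100)
    return es * (1.0 - rh_percent / 100.0)


print(round(vpd_kpa(25.0, 50.0), 3))  # → 1.584
```

If the BARRA2 temperature is still in kelvin (CF-style `tas` files normally are), subtract 273.15 before applying the formula.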

# Save the resulting dataframe

In [None]:
df.to_csv(os.path.join("..", "..", "output", "csv", output_file_name), index=False)