isoNet Sample Points
====================
I will be using this notebook to generate the sample points for the isoNet dataset. It also needs the HydroGFD files, which are not included in this folder if you are viewing the notebook on GitHub or downloaded it from there. The instructions for downloading them are in the data section of the code. It is the exact same dataset used there!

In [None]:
# Load libraries
import pandas as pd
import numpy as np
import tensorflow as tf
import netCDF4 as nc
import glob
import datetime

In [None]:
# Load in sample data from csv
samplePoints = pd.read_csv('SamplePoints_Alt.csv')

# Change Alt (m) to just Alt
samplePoints = samplePoints.rename(columns={'Alt (m)': 'Alt'})

## Add in dates for the data
I need data from 1988 to 2010. Every lat/lon coordinate must have a row for every month in that range, dated to the first of the month. The HydroGFD data is then looked up at those coordinates and dates.

In [None]:
# I need to create a new dataframe. I will do this by cycling through the samplePoints dataframe and creating a new dataframe with the same columns.
# This time, each row of samplePoints is copied once for every month in the date range 1988-2010.
# This means that each coordinate will have monthly data for 23 years (1988 through 2010 inclusive, 276 rows).

# Create a new dataframe
isoNet_Sample = pd.DataFrame(columns=['Lat', 'Lon', 'Alt', 'Year', 'Month'])

# Create a list of years
years = list(range(1988, 2011))

# Create a list of months
months = list(range(1, 13))

n = len(years) * len(months)

# Cycle through the samplePoints dataframe
for index, row in samplePoints.iterrows():
    temp = pd.DataFrame([row for _ in range(n)])

    # Add the years and months to the dataframe
    temp['Year'] = np.repeat(years, len(months))
    temp['Month'] = np.tile(months, len(years))

    # Append the temp dataframe to the isoNet_Sample dataframe
    isoNet_Sample = pd.concat([isoNet_Sample, temp], ignore_index=True)
    
    percent = (index + 1) / len(samplePoints) * 100
    print('Progress: ' + str(round(percent, 2)) + '%', end='\r')
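
The loop above grows `isoNet_Sample` with one `pd.concat` per sample point. As a possible alternative, the same point-by-month expansion can be sketched as a single cross join. This is only a sketch under assumptions: the `points` frame is made-up data standing in for `samplePoints`, and it builds the `Date` column directly instead of separate `Year` and `Month` columns.

```python
import pandas as pd

# Hypothetical stand-in for samplePoints (two made-up coordinates)
points = pd.DataFrame({
    'Lat': [83.0, 45.6],
    'Lon': [-77.2, 10.3],
    'Alt': [31.0, 420.0],
})

# One row per month start from January 1988 to December 2010
dates = pd.DataFrame({'Date': pd.date_range('1988-01-01', '2010-12-01', freq='MS')})

# Cross join: every point is paired with every month
# (how='cross' needs pandas 1.2 or newer)
expanded = points.merge(dates, how='cross')

print(len(dates))     # months per point
print(len(expanded))  # points x months
```

With 23 years of monthly data, each point should end up with 276 rows, which is a quick sanity check on the expanded frame.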

In [28]:
# Make a Date column and remove the Year and Month columns
isoNet_Sample['Date'] = pd.to_datetime(isoNet_Sample[['Year', 'Month']].assign(day=1))
isoNet_Sample = isoNet_Sample.drop(columns=['Year', 'Month'])

## Extract the HydroGFD data separately using the coordinates

In [37]:
# I will now cycle through the isoNet_Sample dataframe and add the data from the netCDF files one row at a time.
# Each netCDF variable gets its own new column, starting with precipitation.
isoNet_Sample['Precip'] = np.nan

# List the netCDF files and the (start year, end year) range that each file name encodes
files = glob.glob('HydroGFD/prAdjust*')
fileRanges = [(int(file[-20:-16]), int(file[-11:-7])) for file in files]

# Cycle through the isoNet_Sample dataframe
for index, row in isoNet_Sample.iterrows():
    # Get the date and create a time variable representing that date as days since 1850-01-01
    date = row['Date']
    time = (date - datetime.datetime(1850, 1, 1)).days
    # Find the file whose (start year, end year) range in fileRanges contains the date
    for i in range(len(fileRanges)):
        if date.year >= fileRanges[i][0] and date.year <= fileRanges[i][1]:
            file = files[i]
            break
    else:
        # No file covers this year; stop rather than silently reuse the previous file
        raise ValueError('No HydroGFD file covers ' + str(date.year))
    
    # Open the netCDF file
    ncid = nc.Dataset(file)
    # Pull out the coordinates and time variables
    latCDF = ncid.variables['lat'][:].filled()
    lonCDF = ncid.variables['lon'][:].filled()
    timeCDF = ncid.variables['time'][:].filled()

    # Find the index of the closest latitude, longitude, and time to the row's latitude, longitude, and time
    latIndex = np.argmin(np.abs(latCDF - row['Lat']))
    lonIndex = np.argmin(np.abs(lonCDF - row['Lon']))
    timeIndex = np.argmin(np.abs(timeCDF - time))

    # Pull out the precipitation data
    precipData = ncid.variables['prAdjust'][timeIndex, latIndex, lonIndex].filled(np.nan)
    # Close the file before moving on to the next row
    ncid.close()

    # Add the precipitation data to the row
    isoNet_Sample.at[index, 'Precip'] = precipData.item()

    percent = (index + 1) / len(isoNet_Sample) * 100
    print('Progress: ' + str(round(percent, 2)) + '%', end='\r') 


Progress: 100.0%
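
The file selection in the cell above relies on fixed character positions in each file name. A minimal sketch of the same idea with a regular expression is given below. The file names are hypothetical (the real HydroGFD names may differ); the only assumption is that each name ends in `YYYYMMDD-YYYYMMDD.nc`, which is what the `[-20:-16]` and `[-11:-7]` slices imply. The helper names are made up for illustration.

```python
import re
import datetime

# Hypothetical file names, assumed to end in YYYYMMDD-YYYYMMDD.nc
files = [
    'HydroGFD/prAdjust_example_19800101-19891231.nc',
    'HydroGFD/prAdjust_example_19900101-19991231.nc',
    'HydroGFD/prAdjust_example_20000101-20101231.nc',
]

pattern = re.compile(r'(\d{4})\d{4}-(\d{4})\d{4}\.nc$')

def year_range(path):
    """Return (start_year, end_year) parsed from the end of a file name."""
    match = pattern.search(path)
    if match is None:
        raise ValueError('Unexpected file name: ' + path)
    return int(match.group(1)), int(match.group(2))

def file_for_year(paths, year):
    """Return the first path whose year range contains `year`."""
    for path in paths:
        start, end = year_range(path)
        if start <= year <= end:
            return path
    raise ValueError('No file covers ' + str(year))

def days_since_1850(date):
    """Days between 1850-01-01 and `date`, as used for the time lookup."""
    return (date - datetime.datetime(1850, 1, 1)).days

print(year_range(files[0]))
print(file_for_year(files, 1995))
print(days_since_1850(datetime.datetime(1850, 1, 31)))
```

The day count only lines up with the file's time axis if that axis really is in days since 1850-01-01. If the files carry a `units` attribute on the `time` variable, it is worth checking it once before running the full loop.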

In [38]:
# Now I will add the temperature data to the dataframe
isoNet_Sample['Temp'] = np.nan

# List the temperature netCDF files and the (start year, end year) range that each file name encodes
files = glob.glob('HydroGFD/tasAdjust*')
fileRanges = [(int(file[-20:-16]), int(file[-11:-7])) for file in files]

for index, row in isoNet_Sample.iterrows():
    # Get the date and create a time variable representing that date as days since 1850-01-01
    date = row['Date']
    time = (date - datetime.datetime(1850, 1, 1)).days
    # Find the file whose (start year, end year) range in fileRanges contains the date
    for i in range(len(fileRanges)):
        if date.year >= fileRanges[i][0] and date.year <= fileRanges[i][1]:
            file = files[i]
            break
    else:
        # No file covers this year; stop rather than silently reuse the previous file
        raise ValueError('No HydroGFD file covers ' + str(date.year))
    
    # Open the netCDF file
    ncid = nc.Dataset(file)
    # Pull out the coordinates and time variables
    latCDF = ncid.variables['lat'][:].filled()
    lonCDF = ncid.variables['lon'][:].filled()
    timeCDF = ncid.variables['time'][:].filled()

    # Find the index of the closest latitude, longitude, and time to the row's latitude, longitude, and time
    latIndex = np.argmin(np.abs(latCDF - row['Lat']))
    lonIndex = np.argmin(np.abs(lonCDF - row['Lon']))
    timeIndex = np.argmin(np.abs(timeCDF - time))

    # Pull out the temperature data
    tempData = ncid.variables['tasAdjust'][timeIndex, latIndex, lonIndex].filled(np.nan)
    # Close the file before moving on to the next row
    ncid.close()

    # Add the temperature data to the row
    isoNet_Sample.at[index, 'Temp'] = tempData.item()

    percent = (index + 1) / len(isoNet_Sample) * 100
    print('Progress: ' + str(round(percent, 2)) + '%', end='\r') 


Progress: 100.0%
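
Both extraction cells pick the nearest grid cell with `np.argmin(np.abs(...))`, one row at a time. The sketch below shows the same nearest-neighbour lookup done for several coordinates at once on a synthetic grid. The 0.5-degree grid and the `nearest_index` helper are assumptions for illustration, not the actual HydroGFD grid.

```python
import numpy as np

# Synthetic 0.5-degree grid standing in for the 'lat' and 'lon' variables
lat_grid = np.arange(-89.75, 90.0, 0.5)
lon_grid = np.arange(-179.75, 180.0, 0.5)

def nearest_index(grid, values):
    """Index of the grid cell centre closest to each value."""
    values = np.atleast_1d(values)
    # Rows are query values, columns are grid cells
    return np.argmin(np.abs(grid[None, :] - values[:, None]), axis=1)

# Made-up query coordinates
lats = np.array([83.028903, 45.6])
lons = np.array([-77.237688, 10.3])

lat_idx = nearest_index(lat_grid, lats)
lon_idx = nearest_index(lon_grid, lons)

print(lat_grid[lat_idx])
print(lon_grid[lon_idx])
```

One thing to check before trusting the match: if a file stores longitude as 0 to 360 rather than -180 to 180, negative longitudes would need wrapping first, otherwise `argmin` quietly returns the cell at the edge of the grid.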

In [39]:
# Save the dataframe to a csv file
isoNet_Sample.to_csv('isoNet_Sample_nopreds.csv', index=False)

## Generate the estimations for the data

In [40]:
# Convert the Date column to year and day of year columns
isoNet_Sample['Year'] = isoNet_Sample['Date'].dt.year
isoNet_Sample['DOY'] = isoNet_Sample['Date'].dt.dayofyear

isoNet_Sample = isoNet_Sample.drop(columns=['Date'])

In [41]:
# Reorder the columns for the neural network
isoNet_Sample = isoNet_Sample[['Lat', 'Lon', 'Alt', 'Precip', 'Temp', 'Year', 'DOY']]

In [42]:
# Load in the neural network
model = tf.keras.models.load_model('isoNet.keras')

In [43]:
# Do the predictions
isoNet_Sample['Pred'] = model.predict(isoNet_Sample[['Lat', 'Lon', 'Alt', 'Precip', 'Temp', 'Year', 'DOY']])

2024-03-26 09:39:45.294316: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 78810144 exceeds 10% of free system memory.
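
For reference, a small sketch of how the feature matrix and the prediction column can be handled explicitly. It assumes the model returns one value per row with shape `(n, 1)`, which is typical for a single-output Keras model but is not confirmed for `isoNet.keras`. `fake_predict` is a hypothetical stand-in so the example runs without TensorFlow, and the rows are made-up values.

```python
import numpy as np
import pandas as pd

feature_cols = ['Lat', 'Lon', 'Alt', 'Precip', 'Temp', 'Year', 'DOY']

# Made-up rows with the same columns the model input uses
df = pd.DataFrame({
    'Lat': [83.0, 45.6], 'Lon': [-77.2, 10.3], 'Alt': [31.0, 420.0],
    'Precip': [4e-06, 2e-05], 'Temp': [241.8, 275.1],
    'Year': [1988, 1988], 'DOY': [1, 32],
})

def fake_predict(x):
    """Hypothetical stand-in for model.predict: one value per row, shape (n, 1)."""
    return x.sum(axis=1, keepdims=True)

# Fixed column order, explicit float32 matrix
X = df[feature_cols].to_numpy(dtype='float32')
preds = fake_predict(X)

# Flatten (n, 1) to (n,) so the result is an ordinary column
df['Pred'] = np.asarray(preds).ravel()

print(X.shape, preds.shape, df['Pred'].shape)
```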




In [51]:
# Convert the Year and day of year (DOY) columns back to a Date column
isoNet_Sample['Date'] = pd.to_datetime(isoNet_Sample['Year'].astype(str) + ' ' + isoNet_Sample['DOY'].astype(str), format='%Y %j')


In [53]:
# Drop the Year and DOY columns
isoNet_Sample = isoNet_Sample.drop(columns=['Year', 'DOY'])

isoNet_Sample.head()

Unnamed: 0,Lat,Lon,Alt,Precip,Temp,Pred,Date
0,83.028903,-77.237688,31.0,4e-06,241.804398,-26.572453,1988-01-01
1,83.028903,-77.237688,31.0,3e-06,249.465118,-93.013992,1988-02-01
2,83.028903,-77.237688,31.0,0.0,246.644028,-98.664703,1988-03-01
3,83.028903,-77.237688,31.0,3e-06,248.061203,-98.636848,1988-04-01
4,83.028903,-77.237688,31.0,2e-06,254.822495,-98.14312,1988-05-01
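
The notebook converts `Date` to year and day of year for the model and then back to a date afterwards. A minimal round-trip check on made-up dates is sketched below; zero-padding the day of year to three digits is a defensive choice added here, not something the notebook itself does.

```python
import pandas as pd

# Made-up month-start dates, including a leap year (1988) and a non-leap year (1989)
dates = pd.Series(pd.to_datetime(['1988-01-01', '1988-03-01', '1989-03-01', '2010-12-01']))

year = dates.dt.year
doy = dates.dt.dayofyear

# Rebuild the date from "YYYY DDD"; %j is the day of the year
rebuilt = pd.to_datetime(
    year.astype(str) + ' ' + doy.astype(str).str.zfill(3),
    format='%Y %j',
)

print(doy.tolist())
print((rebuilt == dates).all())
```

Because 1988 is a leap year, 1 March falls on day 61 there and on day 60 in 1989, so the day of year on its own does not identify the month; the year has to travel with it.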


In [48]:
# Save the dataframe to a csv file
isoNet_Sample.to_csv('isoNet_Sample_preds.csv', index=False)