isoNet Sample Points
====================
I will be using this notebook to generate the sample points for the isoNet dataset. It also needs the HydroGFD files, which will not be visible in this folder if you are viewing this on GitHub or downloaded it from there. The instructions for downloading them are in the data section of the code. The files you download will be the exact same dataset used here!

In [2]:
# Load libraries
import pandas as pd
import numpy as np
import tensorflow as tf
import netCDF4 as nc
import glob
import datetime


In [3]:
# Load in sample data from csv
samplePoints = pd.read_csv('SamplePoints_Alt.csv')

# Change Alt (m) to just Alt
samplePoints = samplePoints.rename(columns={'Alt (m)': 'Alt'})

## Add in dates for the data
I need data from 1988 to 2010. Every lat/lon coordinate must have a row for every month in that range. I will use the HydroGFD data to get the dates for the data.

In [4]:
# Build a new dataframe by cycling through the samplePoints dataframe, keeping the same columns.
# Each row of samplePoints is copied once for every (year, month) pair from 1988 to 2010.
# This means each coordinate will have monthly rows for 23 years (1988 through 2010 inclusive).

# Create a new dataframe
isoNet_Sample = pd.DataFrame(columns=['Lat', 'Lon', 'Alt', 'Year', 'Month'])

# Create a list of years
years = list(range(1988, 2011))

# Create a list of months
months = list(range(1, 13))

n = len(years) * len(months)

# Cycle through the samplePoints dataframe
for index, row in samplePoints.iterrows():
    temp = pd.DataFrame([row for _ in range(n)])

    # Add the years and months to the dataframe
    temp['Year'] = np.repeat(years, len(months))
    temp['Month'] = np.tile(months, len(years))

    # Append the temp dataframe to the isoNet_Sample dataframe
    isoNet_Sample = pd.concat([isoNet_Sample, temp], ignore_index=True)
    
    percent = (index + 1) / len(samplePoints) * 100
    print('Progress: ' + str(round(percent, 2)) + '%', end='\r')

Progress: 100.0%
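
The expansion above (every sample point paired with every month from 1988 to 2010) can also be written as a single cross join, which avoids growing the dataframe inside a loop. This is only a minimal sketch, not the notebook's method: the `sample_points` values are made up to stand in for `SamplePoints_Alt.csv`, the snake_case names are my own, and `how="cross"` assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical stand-in for SamplePoints_Alt.csv (values are invented for illustration)
sample_points = pd.DataFrame({
    "Lat": [45.0, 50.5],
    "Lon": [-75.0, -100.25],
    "Alt": [70.0, 300.0],
})

# One row per (year, month) pair, 1988 through 2010 inclusive
year_month = pd.MultiIndex.from_product(
    [range(1988, 2011), range(1, 13)], names=["Year", "Month"]
).to_frame(index=False)

# Cross join: every sample point is paired with every (year, month) row
iso_net_sample = sample_points.merge(year_month, how="cross")

print(iso_net_sample.shape)
```

The row order matches the loop version: all months for the first point, year by year, then the next point.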

## Extract the HydroGFD data separately using the coordinates

In [5]:
# Create a dataframe of coords
stationCoords = pd.read_csv("latlon_points.csv")

In [6]:
# Starting with precip files 
path = "HydroGFD/prAdjust*"
precipFiles = glob.glob(path)
precipFiles.sort()
precipFiles

# Create an empty dataframe to store the data from the netCDF files, with the columns: lat, lon, Time, and Precip
precip = pd.DataFrame(columns=['Lat', 'Lon', 'Time', 'Precip'])

# Cycle through the precipFiles
# Loop through each file and pull out the data at each time step for every lat and lon coordinate from the CNIP dataset, which is stored in the stationCoords dataframe
for file in precipFiles:
    ncid = nc.Dataset(file, "r")

    # Pull out the time data and coordinate data
    time = ncid.variables["time"][:].filled(np.nan)
    lat = ncid.variables["lat"][:].filled(np.nan)
    lon = ncid.variables["lon"][:].filled(np.nan)

    for coords in stationCoords.itertuples(index=False):
        latIndex = (np.abs(lat - coords[0])).argmin()
        lonIndex = (np.abs(lon - coords[1])).argmin()

        # Pull out the precipitation data at each time step
        precipData = ncid.variables["prAdjust"][:, latIndex, lonIndex].filled(0) #Filling with 0 is an assumption that if there is no data, then there is no precipitation
        
        # Place the lat, lon, time, and precipitation data into a dataframe
        # The column is named "Precip" to match the columns of the precip dataframe created above
        df = pd.DataFrame({"Lat": coords[0], "Lon": coords[1], "Time": time, "Precip": precipData})
        precip = pd.concat([precip, df], ignore_index=True)
    print("Finished extracting data from " + file[-20:-3])
    ncid.close()

# Convert the time data to datetime format
precip["Time"] = precip["Time"].apply(lambda x: datetime.datetime(1850, 1, 1) + datetime.timedelta(days=float(x)))

KeyboardInterrupt: 
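
The extraction cell above does three things: it finds the grid cell nearest to each station, pulls the time series at that cell, and converts day offsets into dates. Calling `pd.concat` inside the loop copies all accumulated rows on every iteration, so a possible alternative is to collect the per-station frames in a list and concatenate once. The sketch below uses a synthetic grid and random values in place of the HydroGFD files; `nearest_index`, `station_coords`, and the grid spacing are hypothetical, and the 1850-01-01 origin is taken from the conversion in the cell above rather than checked against the files' `time` units.

```python
import numpy as np
import pandas as pd

def nearest_index(axis_values, target):
    """Return the index of the grid coordinate closest to target."""
    return int(np.abs(axis_values - target).argmin())

# Synthetic stand-ins for the netCDF variables (not real HydroGFD values)
lat = np.arange(40.0, 60.0, 0.5)
lon = np.arange(-110.0, -70.0, 0.5)
time = np.array([0.0, 31.0, 59.0])  # assumed to be days since 1850-01-01
rng = np.random.default_rng(0)
pr = rng.random((time.size, lat.size, lon.size))  # shape (time, lat, lon)

# Hypothetical station coordinates
station_coords = pd.DataFrame({"Lat": [45.1, 50.4], "Lon": [-75.2, -100.3]})

frames = []
for station in station_coords.itertuples(index=False):
    i = nearest_index(lat, station.Lat)
    j = nearest_index(lon, station.Lon)
    # Scalars broadcast against the time axis, giving one row per time step
    frames.append(pd.DataFrame({
        "Lat": station.Lat,
        "Lon": station.Lon,
        "Time": time,
        "Precip": pr[:, i, j],
    }))

# Concatenate once, after the loop
precip = pd.concat(frames, ignore_index=True)

# Vectorized conversion of day offsets to timestamps
precip["Time"] = pd.to_datetime(precip["Time"], unit="D", origin="1850-01-01")
print(precip.head())
```

With real files, the same list would be filled across both the file loop and the station loop, with a single `pd.concat` at the end.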