## GREMLIN Dataset Preprocessing

This dataset comes from https://mountainscholar.org/handle/10217/235392 and is, for the most part, licensed under CC BY.

The paper describing the U-Net used by the dataset authors is here: https://journals.ametsoc.org/view/journals/apme/60/1/jamc-d-20-0084.1.xml?tab_body=pdf

In [1]:
import xarray as xr  # reading netCDF files also requires a backend such as netCDF4
import numpy as np
import pandas as pd
import random

import matplotlib.pyplot as plt

In [2]:
# Helper functions written specifically for this notebook
from timeslicer import int_splits, randomizer

# Loading the netCDF File

In [3]:
data = 'gremlin_conus2_dataset.nc'
ds = xr.open_dataset(data)

In [4]:
ds

In [5]:
# Make `time` a coordinate of the dataset (no change if it already is one)
ds = ds.assign_coords(time=ds.time)
ds

## Variable Preprocessing

In [6]:
num_slices = len(ds.time)
print('Number of time slices:', num_slices)

Number of time slices: 2246


In [7]:
train_set, test_set, val_set = int_splits(int_length=num_slices)
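`int_splits` lives in the author's `timeslicer` module, which isn't shown here. The sketch below is a hypothetical stand-in, assuming an 80/15/5 split in which the test and validation sizes are rounded down and the remainder goes to training. Those fractions are inferred from the slice counts printed by the next cell, not taken from `timeslicer` itself, and `int_splits_sketch` is an illustrative name.

```python
def int_splits_sketch(int_length: int, test_frac: float = 0.15, val_frac: float = 0.05):
    """Hypothetical stand-in for timeslicer.int_splits.

    Returns integer (train, test, val) sizes that always sum to int_length.
    The default fractions are assumptions, not the real module's values.
    """
    test_n = int(test_frac * int_length)   # round down
    val_n = int(val_frac * int_length)     # round down
    train_n = int_length - test_n - val_n  # remainder goes to training
    return train_n, test_n, val_n


train_n, test_n, val_n = int_splits_sketch(2246)
print(train_n, test_n, val_n)  # 1798 336 112
```

Giving the remainder to the training split guarantees that no time slice is dropped by rounding.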

In [8]:
train_rnd, test_rnd, val_rnd = randomizer(train_set, test_set, val_set)

number of training slices: 1798
number of testing slices: 336
number of validation slices: 112
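`randomizer` is also defined in `timeslicer` and not shown. A common way to build randomized splits, sketched here as an assumption about what it might do, is to shuffle all time indices once and carve the permutation into three disjoint pieces. `randomizer_sketch` and its `seed` parameter are hypothetical.

```python
import numpy as np


def randomizer_sketch(train_n: int, test_n: int, val_n: int, seed: int = 0):
    """Hypothetical stand-in for timeslicer.randomizer.

    Shuffles the time indices 0..N-1 once and splits the permutation into
    three disjoint index arrays, so no time slice lands in two splits.
    """
    n = train_n + test_n + val_n
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible splits
    perm = rng.permutation(n)
    # Sorting is optional; it keeps each split in chronological order
    train_idx = np.sort(perm[:train_n])
    test_idx = np.sort(perm[train_n:train_n + test_n])
    val_idx = np.sort(perm[train_n + test_n:])
    print('number of training slices:', len(train_idx))
    print('number of testing slices:', len(test_idx))
    print('number of validation slices:', len(val_idx))
    return train_idx, test_idx, val_idx
```

Called as `randomizer_sketch(1798, 336, 112)`, this prints the same three counts as the cell above, although the actual indices depend on the seed.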


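As a sanity check, confirm that each randomized index set selects the same number of time slices as the corresponding contiguous split: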

In [9]:
# Jupyter only echoes a cell's last expression, so combine the three checks with `and`
c07 = ds.GOES_ABI_C07.data
(np.shape(c07[train_rnd]) == np.shape(c07[:train_set])
 and np.shape(c07[test_rnd]) == np.shape(c07[train_set:train_set + test_set])
 and np.shape(c07[val_rnd]) == np.shape(c07[train_set + test_set:train_set + test_set + val_set]))

True
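The shape comparison confirms the split sizes but not that the randomized splits are disjoint. A stricter check, sketched below assuming `train_rnd`, `test_rnd`, and `val_rnd` are integer index arrays, also asserts that no time slice is shared between splits and that none is left out. `check_splits` is a hypothetical helper, not part of `timeslicer`.

```python
import numpy as np


def check_splits(num_slices, train_idx, test_idx, val_idx):
    """Assert the three index sets are disjoint and together cover every slice."""
    splits = (train_idx, test_idx, val_idx)
    sets = [set(np.asarray(ix).tolist()) for ix in splits]
    # A set smaller than its array means an index was repeated inside a split
    for s, ix in zip(sets, splits):
        assert len(s) == len(ix), 'duplicate indices within a split'
    train_s, test_s, val_s = sets
    assert (train_s.isdisjoint(test_s)
            and train_s.isdisjoint(val_s)
            and test_s.isdisjoint(val_s)), 'splits overlap'
    assert train_s | test_s | val_s == set(range(num_slices)), 'some slices unused'
    return True
```

In this notebook it would be called as `check_splits(num_slices, train_rnd, test_rnd, val_rnd)`; an `AssertionError` points at which property failed.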

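Next, flatten each split into a pandas DataFrame with one row per pixel and one column per variable, drop rows where `MRMS_REFC` holds the -99.0 fill value or any column is NaN, and save each split to Parquet.

TODO: Replace the three near-identical cells below with a loop!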
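One way to address the TODO above is sketched below. It assumes each variable's array is indexed by time first (as the existing cells do with `.data[train_rnd]`); `COLUMNS`, `split_to_frame`, and `build_splits` are hypothetical names. The functions work on plain NumPy arrays, so they can be tested without the netCDF file.

```python
import numpy as np
import pandas as pd

# DataFrame column name -> netCDF variable name (same pairs as the cells below)
COLUMNS = {
    'ABI_C07': 'GOES_ABI_C07',
    'ABI_C09': 'GOES_ABI_C09',
    'ABI_C13': 'GOES_ABI_C13',
    'GLM': 'GOES_GLM_GROUP',
    'MRMS_REFC': 'MRMS_REFC',
}


def split_to_frame(arrays, idx, fill_value=-99.0):
    """Flatten the selected time slices of each array into one DataFrame column.

    `arrays` maps netCDF variable names to arrays indexed by time first
    (e.g. ds[name].data); `idx` is an integer index array of time slices.
    """
    frame = pd.DataFrame({col: np.asarray(arrays[var])[idx].ravel()
                          for col, var in COLUMNS.items()})
    frame = frame[frame.MRMS_REFC != fill_value]  # drop the MRMS fill value
    return frame.dropna()                         # drop rows with any NaN


def build_splits(arrays, splits):
    """Apply split_to_frame to every named split, e.g. {'train': train_rnd, ...}."""
    return {name: split_to_frame(arrays, idx) for name, idx in splits.items()}
```

With the notebook's objects this could be called as `frames = build_splits({v: ds[v].data for v in COLUMNS.values()}, {'train': train_rnd, 'test': test_rnd, 'val': val_rnd})`, followed by `frames[name].to_parquet(f'../datasets/df_rnd_{name}.parquet')` for each split. Loading each variable's array once and reusing it for all three splits also avoids repeating `ds.VAR.data` fifteen times.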

In [10]:
# Select the training time slices and flatten each array to 1-D (one entry per pixel)
d_rnd_train = {
    'ABI_C07': ds.GOES_ABI_C07.data[train_rnd].ravel(),
    'ABI_C09': ds.GOES_ABI_C09.data[train_rnd].ravel(),
    'ABI_C13': ds.GOES_ABI_C13.data[train_rnd].ravel(),
    'GLM': ds.GOES_GLM_GROUP.data[train_rnd].ravel(),
    'MRMS_REFC': ds.MRMS_REFC.data[train_rnd].ravel()
    }

df_rnd_train = pd.DataFrame(data=d_rnd_train)
df_rnd_train = df_rnd_train[df_rnd_train.MRMS_REFC != -99.0]  # drop the -99.0 MRMS fill value
df_rnd_train = df_rnd_train.dropna()                           # drop rows with any NaN
df_rnd_train.to_parquet('../datasets/df_rnd_train.parquet')    # requires pyarrow or fastparquet
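Flattening discards the spatial layout, but it can be recovered: `ravel()` flattens in C (row-major) order, and the boolean filter and `dropna()` both keep the original row labels. The sketch below assumes each selected array had shape `(len(split_idx), ny, nx)` before flattening, which isn't confirmed in this notebook; `rows_to_pixels`, `ny`, and `nx` are hypothetical.

```python
import numpy as np


def rows_to_pixels(row_labels, split_idx, ny, nx):
    """Map flattened row labels back to (global time index, y, x).

    Assumes the selected array had shape (len(split_idx), ny, nx) before
    ravel(), so row label r corresponds to np.unravel_index(r, that shape).
    """
    shape = (len(split_idx), ny, nx)
    t_local, y, x = np.unravel_index(np.asarray(row_labels), shape)
    t_global = np.asarray(split_idx)[t_local]  # position on the original time axis
    return t_global, y, x
```

For example, `t, y, x = rows_to_pixels(df_rnd_train.index, train_rnd, ny, nx)` would give the time slice and grid cell of every training row still in memory.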

In [11]:
d_rnd_test = {
    'ABI_C07': ds.GOES_ABI_C07.data[test_rnd].ravel(),
    'ABI_C09': ds.GOES_ABI_C09.data[test_rnd].ravel(),
    'ABI_C13': ds.GOES_ABI_C13.data[test_rnd].ravel(),
    'GLM': ds.GOES_GLM_GROUP.data[test_rnd].ravel(),
    'MRMS_REFC': ds.MRMS_REFC.data[test_rnd].ravel()
    }

df_rnd_test = pd.DataFrame(data=d_rnd_test)

df_rnd_test = df_rnd_test[df_rnd_test.MRMS_REFC != -99.0]
df_rnd_test = df_rnd_test.dropna()
df_rnd_test.to_parquet('../datasets/df_rnd_test.parquet')

In [12]:
d_rnd_val = {
    'ABI_C07': ds.GOES_ABI_C07.data[val_rnd].ravel(),
    'ABI_C09': ds.GOES_ABI_C09.data[val_rnd].ravel(),
    'ABI_C13': ds.GOES_ABI_C13.data[val_rnd].ravel(),
    'GLM': ds.GOES_GLM_GROUP.data[val_rnd].ravel(),
    'MRMS_REFC': ds.MRMS_REFC.data[val_rnd].ravel()
    }

df_rnd_val = pd.DataFrame(data=d_rnd_val)
df_rnd_val = df_rnd_val[df_rnd_val.MRMS_REFC != -99.0]
df_rnd_val = df_rnd_val.dropna()
df_rnd_val.to_parquet('../datasets/df_rnd_val.parquet')