https://aqs.epa.gov/aqsweb/airdata/download_files.html#Raw
➡ CO (42101)

The link above points to the original dataset (hourly CO, parameter code 42101) that the model should ultimately be trained on.

**The data file used in this notebook is a reduced version of the original, so that it runs faster on Colab.**

**Original: 1,000,000+ lines. This version: 200 lines.**

In [46]:
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

In [29]:
#Read csv

df = pd.read_csv("hourly_42101_2021_new.csv")

In [None]:
#check df

df

In [31]:
#dropping all irrelevant columns (only kept the necessary ones)

df = df.drop(["State Code", "County Code", "Parameter Code", "POC", "Datum",
              "Site Num", "MDL", "Uncertainty", "Qualifier", "Method Type",
              "Method Code", "Method Name", "Date of Last Change",
              "Parameter Name", "Units of Measure", "State Name", "County Name",
              "Date GMT", "Time GMT"], axis = 'columns')

In [None]:
#checking drop

df
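As an alternative to reading every column and then dropping most of them, the needed columns can be selected while reading. The sketch below is a minimal illustration, not what this notebook does: the `KEEP` list and the inline `csv_text` rows are made-up stand-ins for the real file, and `nrows=200` only mirrors the reduced size mentioned at the top.

```python
import io
import pandas as pd

# Columns the rest of the notebook relies on (hypothetical constant name)
KEEP = ["Latitude", "Longitude", "Date Local", "Time Local", "Sample Measurement"]

# Tiny made-up sample standing in for the real CSV, so the sketch runs on its own
csv_text = (
    "State Code,Latitude,Longitude,Date Local,Time Local,Sample Measurement,MDL\n"
    "1,33.5,-86.8,01/01/2021,00:00,0.3,0.02\n"
    "1,33.5,-86.8,01/01/2021,01:00,0.2,0.02\n"
)

# usecols keeps only the listed columns; nrows caps how many data rows are read
small = pd.read_csv(io.StringIO(csv_text), usecols=KEEP, nrows=200)
print(small.columns.tolist())
```

With the real file, the first argument would be the CSV path instead of the `io.StringIO` wrapper.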

In [33]:
#First function converts a date (MM/DD/YYYY string) into day of year, then scales it to a range of 0 to 1
#Second function converts a time of day (HH:MM string) into hour of day, then scales it to a range of 0 to 1

def adjust_date(arr):
  adjusted_date_local = []

  for date in arr:
    temp = date.split("/")
    curr = temp[2] + "-" + temp[0] + "-"  + temp[1]
    period = pd.Period(curr)
    adjusted_date_local.append(int(period.day_of_year)/365)

  return adjusted_date_local

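def adjust_time(arr):
  adjusted_time_local = []

  for time in arr:
    #Split "HH:MM" so minutes count as a fraction of an hour (13:30 -> 13.5, not 13.30)
    hours, minutes = time.split(":")
    adjusted_time_local.append((int(hours) + int(minutes)/60)/24)

  return adjusted_time_local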
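The two helpers above loop over the values in Python. A vectorized variant using pandas is sketched below; it assumes the same `MM/DD/YYYY` and `HH:MM` string formats, divides by the actual length of the year so leap years also top out at 1, and treats minutes as a fraction of an hour. The function names and the two sample rows are illustrative only.

```python
import pandas as pd

def adjust_date_vectorized(dates: pd.Series) -> pd.Series:
    """Day of year scaled to (0, 1], assuming MM/DD/YYYY strings."""
    parsed = pd.to_datetime(dates, format="%m/%d/%Y")
    # 365 for normal years, 366 for leap years
    days_in_year = 365 + parsed.dt.is_leap_year.astype(int)
    return parsed.dt.dayofyear / days_in_year

def adjust_time_vectorized(times: pd.Series) -> pd.Series:
    """Hour of day scaled to [0, 1), assuming HH:MM strings."""
    parts = times.str.split(":", expand=True).astype(int)
    # Column 0 holds hours, column 1 holds minutes
    return (parts[0] + parts[1] / 60) / 24

# Made-up sample rows, only to show the call pattern
sample = pd.DataFrame({"Date Local": ["01/01/2021", "12/31/2021"],
                       "Time Local": ["00:00", "13:30"]})
date_scaled = adjust_date_vectorized(sample["Date Local"])
time_scaled = adjust_time_vectorized(sample["Time Local"])
```

Both functions return a `pd.Series`, so the results can be assigned straight to new DataFrame columns.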

In [34]:
#Creating the new adjusted columns by applying the functions to existing values
#Deleting old non-formatted columns

df["Date Local (adjusted)"] = adjust_date(df["Date Local"])
df["Time Local (adjusted)"] = adjust_time(df["Time Local"])

df = df.drop(["Date Local", "Time Local"], axis = 'columns')

In [35]:
#function that scales longitude and latitude by dividing by 180 (longitude ends up in [-1, 1], latitude in [-0.5, 0.5])

def adjust_long_lat(arr):
  adjusted_long_lat = []

  for pos in arr:
    adjusted_long_lat.append(float(pos)/180)

  return adjusted_long_lat
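Dividing by 180 leaves longitude in [-1, 1] and latitude in [-0.5, 0.5]. If a true 0 to 1 range is wanted for both, one option is min-max scaling with the known coordinate bounds. This is a sketch under that assumption; `scale_to_unit` is a hypothetical helper and the coordinates are arbitrary examples.

```python
import numpy as np

def scale_to_unit(values, lower, upper):
    """Map values from [lower, upper] onto [0, 1]."""
    values = np.asarray(values, dtype=float)
    return (values - lower) / (upper - lower)

# Latitude spans -90..90, longitude spans -180..180
lat_scaled = scale_to_unit([-90.0, 0.0, 33.5], -90.0, 90.0)
lon_scaled = scale_to_unit([-180.0, -86.8, 180.0], -180.0, 180.0)
```

Using fixed geographic bounds rather than the minimum and maximum of the data keeps the scaling identical for the train and test splits.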

In [36]:
#Once again adding the new adjusted columns and deleting old non-formatted columns

df["Latitude (adjusted)"] = adjust_long_lat(df["Latitude"])
df["Longitude (adjusted)"] = adjust_long_lat(df["Longitude"])

df = df.drop(["Latitude", "Longitude"], axis = 'columns')

In [None]:
#checking new edits to df

df

In [38]:
#Turning "Sample Measurement" column into target (output) array and dropping from df

target = df["Sample Measurement"]
df = df.drop("Sample Measurement", axis = 'columns')

In [None]:
#Splitting dataset into 80/20 train/test split and checking shape

split_point = int(0.8 * len(df))
x_train = df[0:split_point]
x_test = df[split_point:]
y_train = target[0:split_point]
y_test = target[split_point:]

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

In [40]:
#converting df DataFrame objects to Numpy, so reshape function can be applied

x_train = x_train.to_numpy()
x_test = x_test.to_numpy()
y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

In [41]:
#Reshaping current data format to fit 3D input array for LSTM
#(num_of_samples, num_timesteps, num_features)
#(total # of data points, 1 incremental timestep, 4 input categories)

x_train_reshaped = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
x_test_reshaped = np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
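With a single timestep per sample, the LSTM sees each row in isolation and has no sequence to learn from. A possible next step is to stack several consecutive rows into each sample. The sketch below assumes the rows are already in chronological order and come from one monitoring site, which this notebook does not check; `make_windows` and the demo arrays are illustrative names.

```python
import numpy as np

def make_windows(features, targets, timesteps):
    """Stack `timesteps` consecutive rows into each sample.

    Returns X with shape (n_rows - timesteps + 1, timesteps, n_features)
    and y aligned with the last row of each window.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_windows = features.shape[0] - timesteps + 1
    X = np.stack([features[i:i + timesteps] for i in range(n_windows)])
    y = targets[timesteps - 1:]
    return X, y

# 10 rows with 4 features each, standing in for x_train / y_train
demo_x = np.arange(40, dtype=float).reshape(10, 4)
demo_y = np.arange(10, dtype=float)
X_win, y_win = make_windows(demo_x, demo_y, timesteps=3)
print(X_win.shape, y_win.shape)  # (8, 3, 4) (8,)
```

The model's `input_shape` would then need to be `(timesteps, n_features)` instead of `(1, n_features)`.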

In [42]:
#Building LSTM model

model = Sequential() #For linear stack of layers
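#The first LSTM argument is the number of hidden units, not the number of samples;
#here it is set to the training-row count (≈160). Input shape is (1 timestep, 4 features)
model.add(LSTM(x_train.shape[0], input_shape=(1, x_train.shape[1])))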
model.add(Dense(1)) #Fully connected output layer with one node (for "Sample Measurements")

In [None]:
#Compiling and fitting dataset to model

model.compile(loss='mean_squared_error', optimizer='adam') #MSE loss function since this is a regression problem

model.fit(x_train_reshaped, y_train, epochs=10, batch_size=32)

In [None]:
#Evaluating on the test set (the returned value is the MSE loss, not an accuracy or a percentage)

loss = model.evaluate(x_test_reshaped, y_test)
print('Test loss (MSE):', loss)
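A single MSE value is hard to judge on its own, because it is in squared units of the measurement. One way to put it in context is to report RMSE and MAE alongside it and to compare against a trivial baseline that always predicts the training mean. The helper below is a sketch with a hypothetical name, and the numbers passed to it are made up to show the call.

```python
import numpy as np

def regression_report(y_true, y_pred, y_train_mean):
    """Return MSE, RMSE, MAE, and the MSE of a predict-the-mean baseline."""
    y_true = np.asarray(y_true, dtype=float)
    # Flatten so a (n, 1) prediction array lines up with a (n,) target array
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    mse = float(np.mean((y_true - y_pred) ** 2))
    mae = float(np.mean(np.abs(y_true - y_pred)))
    baseline_mse = float(np.mean((y_true - y_train_mean) ** 2))
    return {"mse": mse, "rmse": mse ** 0.5, "mae": mae,
            "baseline_mse": baseline_mse}

# Made-up values, only to show the call pattern
report = regression_report([0.2, 0.4, 0.6], [0.1, 0.4, 0.8], y_train_mean=0.4)
print(report)
```

In this notebook the call would look like `regression_report(y_test, model.predict(x_test_reshaped), y_train.mean())`. A model whose `mse` is not clearly below `baseline_mse` has learned little beyond the average.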

The model definitely needs fine-tuning and revision.

The extremely sparse data used here (200 of the original 1,000,000+ lines) also contributed to the poor test performance.