# IsoNet
------
This notebook creates the model for predicting stable water isotopes in precipitation. It only builds, trains, and exports the model; it is not used for prediction.

Table of Contents:
1. [Importing Data](#importing-data)
2. [Data Preprocessing](#data-preprocessing)
3. [Model Creation](#model-creation)
4. [Model Training](#model-training)
5. [Model Export](#model-export)

In [168]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
import datetime
import matplotlib.pyplot as plt

# Tensorflow and keras libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, InputLayer
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import RootMeanSquaredError, MeanAbsoluteError

## Importing Data
Importing the data, then splitting it into features and labels

In [169]:
# Load the data and drop the columns that are neither features nor labels
data = pd.read_csv(r'Isoscape_Data.csv')
data = data.drop(['H2avg', 'dex'], axis=1)

# Delete the rows with NaN values
data = data.dropna()

# Separate the features from the label (O18Avg); Station is not used as a feature
features = data.drop(['O18Avg', 'Station'], axis=1)
labels = data['O18Avg']

# Parse the date strings into datetimes. The commented-out line would instead
# convert them to ordinal ints where January 1st of year 1 is 1
features['Date'] = pd.to_datetime(features['Date'])
#features['Date'] = features['Date'].map(datetime.datetime.toordinal)

# Convert the date column into year and day of year 
features['Year'] = features['Date'].dt.year
features['Day'] = features['Date'].dt.dayofyear
features = features.drop(['Date'], axis=1)

numFeatures = features.shape[1]

features.columns

Index(['Lat', 'Long', 'Alt', 'Precipitation (kg/m^2/s)', 'Temperature (K)',
       'Year', 'Day'],
      dtype='object')
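The date handling above can be illustrated on a toy frame. The dates below are invented for illustration only; the sketch shows the Year/Day representation the cell keeps and, for comparison, the ordinal encoding from the commented-out line.

```python
import datetime

import pandas as pd

# Toy stand-in for the CSV's Date column (values are invented for illustration)
dates = pd.DataFrame({'Date': ['2020-03-01', '2021-01-01', '2021-12-31']})
dates['Date'] = pd.to_datetime(dates['Date'])

# Representation kept by the notebook: calendar year plus day of year (1 to 366)
dates['Year'] = dates['Date'].dt.year
dates['Day'] = dates['Date'].dt.dayofyear

# Commented-out alternative: one integer per date, where 0001-01-01 maps to 1
dates['Ordinal'] = dates['Date'].map(datetime.datetime.toordinal)

print(dates[['Year', 'Day', 'Ordinal']])
```

Because 2020 is a leap year, 1 March is day 61 there but day 60 in 2021, so the Day feature shifts by one after February in leap years.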

## Data Preprocessing
Preprocessing the data, including:
* Splitting into training and testing sets
* Standardizing the data
* Converting into numpy arrays and holding out a validation set
* Converting into TensorFlow Datasets
* Windowing the data (skipped for now; to be added later)
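Windowing is listed but not implemented yet. One possible approach, sketched here as an assumption rather than the author's plan, is to stack consecutive rows into fixed-length overlapping windows and label each window with the label of its last row. The helper name `make_windows` is hypothetical, and since the data mixes many stations, windows would presumably have to be built within each station's time-ordered rows.

```python
import numpy as np


def make_windows(x, y, window):
    """Hypothetical helper: turn (n, features) rows into overlapping windows
    of shape (n - window + 1, window, features). Each window is labelled
    with the label of its last row."""
    x = np.asarray(x)
    y = np.asarray(y)
    # sliding_window_view puts the window axis last: (n - window + 1, features, window)
    views = np.lib.stride_tricks.sliding_window_view(x, window_shape=window, axis=0)
    windows = views.transpose(0, 2, 1)  # -> (n - window + 1, window, features)
    return windows, y[window - 1:]


x = np.arange(20).reshape(10, 2)
y = np.arange(10)
w, lab = make_windows(x, y, 3)
print(w.shape, lab.shape)
```

With windows like these, the LSTM's input shape would become `(window, numFeatures)` rather than `(numFeatures, 1)`.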

In [170]:
# Split the data into training and testing sets
splitIndex = int(0.8 * len(features))
xTrain = features[:splitIndex]
xTest = features[splitIndex:]
yTrain = labels[:splitIndex]
yTest = labels[splitIndex:]

# Standardize the features. The scaler is fit on all pre-test rows,
# including the rows split off for validation below
scaler = StandardScaler()
xTrain = scaler.fit_transform(xTrain)
xTest = scaler.transform(xTest)

# Convert the data into numpy arrays and hold out the first 20% of the
# training rows as a validation set
xTrain = np.array(xTrain)
yTrain = np.array(yTrain)
xVal = xTrain[:int(0.2 * len(xTrain))]
yVal = yTrain[:int(0.2 * len(yTrain))]
xTrain = xTrain[int(0.2 * len(xTrain)):]
yTrain = yTrain[int(0.2 * len(yTrain)):]

xTest = np.array(xTest)
yTest = np.array(yTest)

# Wrap the training and validation sets in tf.data Datasets
trainData = tf.data.Dataset.from_tensor_slices((xTrain, yTrain))
valData = tf.data.Dataset.from_tensor_slices((xVal, yVal))
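The splitting above keeps rows in file order (no shuffling): the last 20% becomes the test set, and the first 20% of the remainder becomes the validation set. A minimal sketch of the same slicing as a reusable function follows. One deliberate difference from the cell is an assumption worth stating: here the scaler is fit on the training rows only, so the validation rows do not influence the standardization statistics. The function name `ordered_split` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def ordered_split(x, y, train_frac=0.8, val_frac=0.2):
    """Hypothetical helper mirroring the cell's slicing: last (1 - train_frac)
    of rows -> test, first val_frac of the remaining rows -> validation."""
    split = int(train_frac * len(x))
    x_pre, x_test = x[:split], x[split:]
    y_pre, y_test = y[:split], y[split:]
    n_val = int(val_frac * len(x_pre))
    x_val, y_val = x_pre[:n_val], y_pre[:n_val]
    x_train, y_train = x_pre[n_val:], y_pre[n_val:]
    # Fit the scaler on training rows only, then apply it to every split
    scaler = StandardScaler().fit(x_train)
    return (scaler.transform(x_train), y_train,
            scaler.transform(x_val), y_val,
            scaler.transform(x_test), y_test)


xtr, ytr, xva, yva, xte, yte = ordered_split(
    np.arange(200, dtype=float).reshape(100, 2), np.arange(100))
print(xtr.shape, xva.shape, xte.shape)
```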

## Model Creation
Creating the model, including:
* Creating the model architecture
* Compiling the model

Model architecture:
* Input layer
* 1 LSTM layer
* 1 hidden Dense layer (a second one is currently commented out)
* Dense output layer

In [171]:
numNeurons = 512
model = Sequential()
# Each of the numFeatures features is read as one timestep holding a single value
model.add(InputLayer(input_shape=(numFeatures, 1)))
model.add(LSTM(numNeurons))
model.add(Dense(numNeurons, activation='relu'))
#model.add(Dense(numNeurons, activation='relu'))
model.add(Dense(1))

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=[RootMeanSquaredError(), MeanAbsoluteError()],
)
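The input layer declares a 3-D input of shape `(batch, numFeatures, 1)`, while the preprocessing cells produce 2-D arrays of shape `(samples, numFeatures)`. A small sketch of making the trailing axis explicit, rather than relying on any implicit reshaping, might look like this; the helper name `to_lstm_input` is hypothetical.

```python
import numpy as np


def to_lstm_input(x):
    """Hypothetical helper: reshape (samples, features) to (samples, features, 1).

    With InputLayer(input_shape=(numFeatures, 1)), the LSTM steps through the
    features in column order as timesteps, each carrying a single value.
    """
    x = np.asarray(x, dtype=np.float32)
    if x.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {x.shape}")
    return x[..., np.newaxis]


# Example: 5 samples with 7 features, matching the columns listed earlier
batch = to_lstm_input(np.zeros((5, 7)))
print(batch.shape)  # (5, 7, 1)
```

It would be applied as `xTrain = to_lstm_input(xTrain)` (and likewise for `xVal` and `xTest`) before the Datasets are built.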

## Model Training
Training the model, with early stopping on the validation loss

In [172]:
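# Stop once val_loss has not improved for 150 epochs and keep the best weights
es = EarlyStopping(monitor='val_loss', mode='min', patience=150, restore_best_weights=True)

# The Datasets are batched here, so batch_size is not also passed to fit()
batchSize = 64
model.fit(
    trainData.batch(batchSize),
    epochs=300,
    validation_data=valData.batch(batchSize),
    callbacks=[es]
)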

Epoch 1/300
...
Epoch 78/300

<keras.src.callbacks.History at 0x7f39146f7f10>
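How `patience` interacts with `epochs` can be seen in a simplified model of the `EarlyStopping` callback used above. This is a sketch, not Keras's implementation: it ignores `min_delta`, `baseline`, and `start_from_epoch`, and the function name is hypothetical. With `patience=150` and `epochs=300`, training can only be cut short if the lowest validation loss is reached before roughly epoch 150.

```python
def early_stopping_epoch(val_losses, patience):
    """Simplified model of EarlyStopping(monitor='val_loss', mode='min').

    Returns (last epoch run, epoch with the lowest val_loss), both 1-based.
    The second value is the epoch whose weights restore_best_weights keeps
    when training stops early.
    """
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


print(early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3], patience=3))  # (6, 3)
```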

## Model Export
Evaluating the model on the test set, saving it, and exporting the test-set predictions

In [173]:
# Evaluate on the test data; returns [loss (MSE), RMSE, MAE]
print(model.evaluate(xTest, yTest))

[22.59682273864746, 4.7536115646362305, 3.482576608657837]
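`model.evaluate` returns the compiled loss followed by the metrics in the order they were passed to `compile`, so the three numbers above are MSE, RMSE, and MAE in the units of `O18Avg`. The sketch below computes the same three quantities with NumPy and checks that the reported RMSE is the square root of the reported MSE. The helper name `regression_metrics` is hypothetical.

```python
import math

import numpy as np


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return [MSE, RMSE, MAE] in the same order that
    model.evaluate reports for loss='mse' with RMSE and MAE metrics."""
    err = np.asarray(y_pred, dtype=float).ravel() - np.asarray(y_true, dtype=float).ravel()
    mse = float(np.mean(err ** 2))
    return [mse, math.sqrt(mse), float(np.mean(np.abs(err)))]


# Consistency check on the scores reported above: RMSE should equal sqrt(MSE)
reported = [22.59682273864746, 4.7536115646362305, 3.482576608657837]
print(math.isclose(math.sqrt(reported[0]), reported[1], rel_tol=1e-6))  # True
```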


In [174]:
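import os

# Save the model (creating the Models directory first if it does not exist)
os.makedirs('Models', exist_ok=True)
model.save('Models/UN-NAMED.keras')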

In [137]:
# Load the best saved model
model = tf.keras.models.load_model('Models/BestModel.keras')

In [138]:
# Create a new DataFrame with the test features, predictions and actual values.
# .copy() makes it independent of `features`, which avoids SettingWithCopyWarning
predictions = model.predict(xTest)
df = features[splitIndex:].copy()
df['Predictions'] = predictions.flatten()
df['Actual'] = yTest

# Export the data to a csv file
df.to_csv('results_test.csv', index=False)

In [139]:
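# Summary of the loaded BestModel (256 units, two hidden Dense layers),
# not the 512-unit model defined in Model Creation
model.summary()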

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_8 (LSTM)               (None, 256)               264192    
                                                                 
 dense_24 (Dense)            (None, 256)               65792     
                                                                 
 dense_25 (Dense)            (None, 256)               65792     
                                                                 
 dense_26 (Dense)            (None, 1)                 257       
                                                                 
Total params: 396033 (1.51 MB)
Trainable params: 396033 (1.51 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
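The parameter counts in the summary follow from the standard layer formulas: an LSTM has four gates, each with an input kernel, a recurrent kernel, and a bias, and a Dense layer has a weight matrix plus one bias per unit. The LSTM's input dimension is 1 because each timestep holds a single feature value. The sketch below reproduces the summary's numbers; the function names are illustrative only.

```python
def lstm_params(units, input_dim):
    # Four gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # Weight matrix plus one bias per unit
    return units * input_dim + units


# BestModel as summarised above: LSTM(256), two Dense(256), Dense(1)
best_model = [lstm_params(256, 1), dense_params(256, 256),
              dense_params(256, 256), dense_params(1, 256)]
print(best_model, sum(best_model))  # [264192, 65792, 65792, 257] 396033

# The 512-unit model defined in Model Creation: LSTM(512), Dense(512), Dense(1)
notebook_model = [lstm_params(512, 1), dense_params(512, 512), dense_params(1, 512)]
print(sum(notebook_model))  # 1315841
```

The same formulas show that doubling the units roughly quadruples the LSTM's parameter count, because the recurrent kernel grows with the square of `units`.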
