Model 1 Test Data Analysis
==========================
Table of Contents
-----------------
1. [Data Loading](#Data-Loading)
2. [Latitudinal Analysis](#Latitudinal-Analysis)
3. [Longitudinal Analysis](#Longitudinal-Analysis)
4. [Time Series Analysis](#Time-Series-Analysis)
5. [Conclusion](#Conclusion)

In [42]:
# Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [43]:
# Set the model number for the model that was trained
modelNum = 1

# Data Loading

In [44]:
df = pd.read_csv(f'../Models/Model_1/Model_{modelNum}_TestData.csv')
df.head()

Unnamed: 0,Lat,Lon,Alt,Precip (mm),Temp (°C),Year,JulianDaySin,O18 (‰) Actual,H2 (‰) Actual,O18 (‰) Predicted,H2 (‰) Predicted
0,37.30268,-2.90198,1773.0,0.0,554.3021,2016.0,-0.271958,-12.13,-82.3,-9.577793,-63.93953
1,28.22,-177.37,13.0,136.0,26.3,1974.0,-0.213521,-1.81,-5.2,-1.747705,-4.445577
2,47.12,9.49,450.0,4.274954,555.598486,2013.0,-0.9741,-13.7,-102.15,-9.57074,-64.93119
3,46.398641,6.233804,436.0,0.0,553.571753,2020.0,-0.711657,-7.92,-56.84,-10.892567,-75.69708
4,-7.315278,72.428056,1.0,578.0,573.957068,1987.0,0.951057,-3.04,-15.5,-5.473124,-31.105398


## Preparing the dataframe for analysis
Currently the dataframe looks good but in order to preform some analysis we will need to clean it up just a bit. We will start by converting the date to a datetime object and then by changing column names to not include units of measurement.

In [45]:
# Convert Year and Julian Sine Day to a single date
df['JulianDay'] = np.ceil((np.arcsin(df['JulianDaySin']) / np.pi + 0.5) * 365).astype(int)
df['Year'] = df['Year'].astype(int)

# Combine the Year and Julian Day into a single date
df['Date'] = pd.to_datetime(df['Year'].astype(str) + '/' + df['JulianDay'].astype(str), format='%Y/%j')

In [46]:
# Remove the Year and Julian Day columns
df = df.drop(['Year', 'JulianDay', 'JulianDaySin'], axis=1)

In [47]:
# Cycle through the column names and change the names to be more usable in code
import re
oldColumns = df.columns
codeCols = list(map(lambda x: re.sub(r'\(.*?\)', '', x), oldColumns))
codeCols

# Create a dictionary to map the old column names to the new column names
colDict = dict(zip(oldColumns, codeCols))

# Rename the columns
df = df.rename(columns=colDict)

# Set the Date column as the index
df = df.set_index('Date')

# Sort the columns by date
df = df.sort_index()

df

Unnamed: 0_level_0,Lat,Lon,Alt,Precip,Temp,O18 Actual,H2 Actual,O18 Predicted,H2 Predicted
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
1960-04-03,51.930000,-10.250000,9.0,60.0,14.800000,-3.700,-23.00,-4.961939,-30.295210
1961-01-27,22.316667,114.166667,66.0,31.0,25.800000,-5.500,-49.00,-4.289477,-28.487822
1961-01-27,-40.350000,-9.880000,54.0,290.0,9.400000,-2.700,-14.90,-4.268211,-24.840668
1961-01-27,31.183333,29.950000,7.0,34.0,21.800000,-5.700,-26.70,-4.224815,-26.814116
1961-01-27,19.880000,102.130000,305.0,76.0,23.100000,-11.400,-79.90,-6.248234,-43.279255
...,...,...,...,...,...,...,...,...,...
2022-12-24,47.816667,13.717778,1618.0,0.6,557.845496,-8.647,-86.07,-10.082409,-71.329830
2022-12-29,47.816667,13.717778,1618.0,0.7,554.424078,-6.318,-38.49,-9.609411,-67.893590
2022-12-31,47.816667,13.717778,1618.0,13.4,555.377814,-15.763,-119.83,-9.811516,-69.397630
2023-07-10,47.816667,13.717778,1618.0,4.6,546.149634,-12.192,-84.37,-12.589744,-88.387440
