### Training separate ARIMA models for each city

References - https://medium.com/analysts-corner/comprehensive-guide-to-time-series-modeling-techniques-applications-and-best-practices-fd330eb0a755

In [25]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from pmdarima import auto_arima

## Load datasets for each location

In [7]:
# Forward slashes work on Windows as well as on Linux and macOS
bibile_df = pd.read_csv('ML Dataset/bibile,_monaragala_data.csv')
colombo_df = pd.read_csv('ML Dataset/colombo_proper_data.csv')
deniyaya_df = pd.read_csv('ML Dataset/deniyaya,_matara_data.csv')
jaffna_df = pd.read_csv('ML Dataset/jaffna_proper_data.csv')
kandy_df = pd.read_csv('ML Dataset/kandy_proper_data.csv')
kurunegala_df = pd.read_csv('ML Dataset/kurunegala_proper_data.csv')
nuwara_eliya_df = pd.read_csv('ML Dataset/nuwara_eliya_proper_data.csv')

# Bibile, Monaragala ARIMA model

In [8]:
bibile_df.head()

Unnamed: 0,HCHO_reading,Location,Current_Date,Next_Date
0,1.9e-05,"bibile, monaragala",2019-01-02,2019-01-03
1,2.8e-05,"bibile, monaragala",2019-01-03,2019-01-04
2,3.7e-05,"bibile, monaragala",2019-01-04,2019-01-05
3,3.7e-05,"bibile, monaragala",2019-01-05,2019-01-06
4,0.000146,"bibile, monaragala",2019-01-06,2019-01-07


In [9]:
bibile_df.dtypes

HCHO_reading    float64
Location         object
Current_Date     object
Next_Date        object
dtype: object

In [10]:
# Convert the 'Current_Date' column from object (string) to datetime
bibile_df['Current_Date'] = pd.to_datetime(bibile_df['Current_Date'])

In [11]:
bibile_df.dtypes

HCHO_reading           float64
Location                object
Current_Date    datetime64[ns]
Next_Date               object
dtype: object

We are training an ARIMA model for the 'bibile, monaragala' location only, so the 'Location' column is not needed: it holds the same value in every row. A univariate ARIMA model is also fitted on the sequence of readings alone, and 'Next_Date', which in the rows above is just the day after 'Current_Date', adds nothing to that sequence. We can therefore drop the 'Next_Date' column as well.
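Before dropping the two columns, it can be worth confirming that they really are redundant. The following is a minimal sketch rather than part of the original workflow: it uses a toy frame that copies the first rows shown above, and the name `sample_df` is made up for illustration.

```python
import pandas as pd

# Toy frame copying the first rows of bibile_df shown above (illustration only)
sample_df = pd.DataFrame({
    'HCHO_reading': [1.9e-05, 2.8e-05, 3.7e-05],
    'Location': ['bibile, monaragala'] * 3,
    'Current_Date': ['2019-01-02', '2019-01-03', '2019-01-04'],
    'Next_Date': ['2019-01-03', '2019-01-04', '2019-01-05'],
})

# A column with a single unique value carries no information for a one-city model
n_locations = sample_df['Location'].nunique()

# Check whether 'Next_Date' is always 'Current_Date' plus one day
current = pd.to_datetime(sample_df['Current_Date'])
following = pd.to_datetime(sample_df['Next_Date'])
next_is_redundant = bool(((following - current) == pd.Timedelta(days=1)).all())

# Drop both columns only when the checks pass
if n_locations == 1 and next_is_redundant:
    sample_df = sample_df.drop(columns=['Location', 'Next_Date'])

print(sample_df.columns.tolist())
```

Running the same two checks on the full `bibile_df` would show whether the assumption holds beyond the first few rows.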

In [12]:
# drop the 'Location' and 'Next_Date' columns
bibile_df = bibile_df.drop(['Location', 'Next_Date'], axis=1)

In [13]:
bibile_df.head()

Unnamed: 0,HCHO_reading,Current_Date
0,1.9e-05,2019-01-02
1,2.8e-05,2019-01-03
2,3.7e-05,2019-01-04
3,3.7e-05,2019-01-05
4,0.000146,2019-01-06


In [14]:
bibile_df.dtypes

HCHO_reading           float64
Current_Date    datetime64[ns]
dtype: object

Because this is time series data, we set 'Current_Date' as the index. With a datetime index, each reading is labelled by its observation date, which makes the chronological order of the rows explicit and allows date-based selection.

References - https://mlpills.dev/time-series/date-manipulation-in-python-for-time-series/#:~:text=In%20time%20series%20analysis%2C%20leveraging,index%20using%20the%20set_index%20method.
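As a small illustration of what a datetime index allows, the sketch below builds a toy frame from the first rows shown above (the name `toy_df` is hypothetical) and selects rows by date labels. It also shows two sanity checks that are often useful before modeling.

```python
import pandas as pd

# Toy frame built from the first five rows shown above (illustration only)
toy_df = pd.DataFrame({
    'HCHO_reading': [1.9e-05, 2.8e-05, 3.7e-05, 3.7e-05, 0.000146],
    'Current_Date': pd.to_datetime([
        '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05', '2019-01-06',
    ]),
}).set_index('Current_Date')

# With a DatetimeIndex, rows can be selected by date strings (both ends inclusive)
window = toy_df.loc['2019-01-03':'2019-01-05']

# The index can be checked for chronological order and for duplicate dates
is_sorted = toy_df.index.is_monotonic_increasing
has_duplicates = toy_df.index.has_duplicates

# Missing calendar days can be listed by comparing against a full daily range
full_range = pd.date_range(toy_df.index.min(), toy_df.index.max(), freq='D')
missing_days = full_range.difference(toy_df.index)

print(window)
print(is_sorted, has_duplicates, len(missing_days))
```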

In [15]:
# set 'Current_Date' as the index 
bibile_df.set_index('Current_Date', inplace=True)

In [16]:
bibile_df.head()

Current_Date,HCHO_reading
2019-01-02,1.9e-05
2019-01-03,2.8e-05
2019-01-04,3.7e-05
2019-01-05,3.7e-05
2019-01-06,0.000146


In [19]:
bibile_df.describe()

Unnamed: 0,HCHO_reading
count,1825.0
mean,0.0001331636
std,8.052639e-05
min,1.461232e-07
25%,7.099799e-05
50%,0.000124118
75%,0.0001850499
max,0.0003561278


## Model Training

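In time series forecasting, the data must be split sequentially so that temporal order is preserved: the model is trained on earlier observations and tested on later ones. A shuffled split would let information from the future leak into training.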

In [21]:
# split the data into training and testing sets without shuffling
train_size = 0.8

train_data, test_data = train_test_split(bibile_df, train_size=train_size, shuffle=False)

In [22]:
train_data.shape

(1460, 1)

In [23]:
test_data.shape

(365, 1)
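The same chronological 80/20 split can also be written with positional slicing, which makes the cut point explicit. This is a minimal sketch, assuming a daily series of 1825 rows like the one described above; the zero-filled values and the names `demo_df`, `train_part` and `test_part` are placeholders, not real readings or names from the notebook.

```python
import pandas as pd

# Placeholder daily series with the same length as bibile_df (values are dummies)
dates = pd.date_range('2019-01-02', periods=1825, freq='D', name='Current_Date')
demo_df = pd.DataFrame({'HCHO_reading': [0.0] * 1825}, index=dates)

# First 80% of rows for training, the remaining rows for testing
split_idx = int(len(demo_df) * 0.8)
train_part = demo_df.iloc[:split_idx]
test_part = demo_df.iloc[split_idx:]

# Every training date precedes every test date, so no future data reaches training
assert train_part.index.max() < test_part.index.min()

print(train_part.shape, test_part.shape)
```

With 1825 rows this gives 1460 training rows and 365 test rows, the same sizes as the shapes reported above.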

In [None]:
# Fit a non-seasonal auto-ARIMA model on the training data;
# trace=True prints each candidate model as it is evaluated
model = auto_arima(train_data, seasonal=False, trace=True)