### Time Series Analysis - TSA

TSA is a statistical approach to forecasting the future behavior of data using historical observations recorded in sequential order over a period of time.

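Assumption: The key assumption behind classical TSA models such as AR, MA and ARIMA is that the data is “stationary”, meaning its statistical properties (mean, variance and autocorrelation) do not change over time. Data that is not stationary is usually transformed first, as in Step 2 below.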
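
As an informal illustration of stationarity (a sketch with made-up data, not a formal test), compare the mean of the first and second halves of two synthetic series: pure white noise, and the same noise with a linear trend added. Only the trended series shows its mean drifting with time. The series, the seed and the `half_means` helper are hypothetical; the ADF test used later in this notebook is the formal check.

```python
import numpy as np

# Informal illustration only: compare the mean of the first and second half of a series.
# A stationary series should have roughly the same mean in both halves; a trending one should not.
rng = np.random.default_rng(0)
n = 200
noise = rng.normal(loc=0.0, scale=1.0, size=n)   # white noise: stationary
trended = noise + 0.1 * np.arange(n)             # same noise plus a linear trend: not stationary


def half_means(x):
    """Return (mean of first half, mean of second half)."""
    half = len(x) // 2
    return x[:half].mean(), x[half:].mean()


for name, series in [("white noise", noise), ("trended", trended)]:
    first, second = half_means(series)
    print(f"{name:12s} first half mean = {first:6.2f}, second half mean = {second:6.2f}")
```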

### Components of TSA

Trend: The long-term movement of the series over time. The trend can be either linear or nonlinear in nature.

Seasonality: Repetitive changes that recur with a fixed, known period, such as every calendar year (or every week or month, depending on the data).

Cyclicity: Rises and falls that do not have a fixed period and usually last longer than a calendar year.

Randomness: Unknown, irregular movements or changes in the data that remain after the other components are accounted for.
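
To see how these components combine, the sketch below builds a hypothetical monthly series as trend + seasonality + randomness (an additive combination; cyclicity is left out for brevity). All numbers are made up for illustration. The decomposition later in this notebook uses a multiplicative model, where the components multiply instead of add.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series built from the components described above:
# trend + seasonality (12-month period) + randomness. Cyclicity is omitted for brevity.
rng = np.random.default_rng(42)
index = pd.date_range("2015-01-01", periods=96, freq="MS")   # 8 years of month-start dates
t = np.arange(len(index))

trend = 50 + 0.5 * t                             # linear upward trend
seasonality = 5 * np.sin(2 * np.pi * t / 12)     # repeats every 12 months
randomness = rng.normal(0, 1, len(index))        # irregular noise

series = pd.Series(trend + seasonality + randomness, index=index, name="value")
print(series.head())
```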

### Different TSA models

TSA offers several models, such as AR, MA, ARMA and ARIMA. ARIMA is one of the most widely used of these, but why it is so popular is beyond the scope of this article.
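
As a rough illustration of the two basic building blocks, the sketch below simulates an AR(1) process (the current value depends on the previous value) and an MA(1) process (the current value depends on the previous random shock) with NumPy. ARMA combines both terms, and ARIMA additionally differences the series before fitting (the "I" for integrated). The coefficients 0.7 and 0.5 and the `lag1_autocorr` helper are illustrative choices, not part of the original notebook.

```python
import numpy as np

# AR(1): y_t = phi * y_{t-1} + e_t      (today depends on yesterday's value)
# MA(1): y_t = e_t + theta * e_{t-1}    (today depends on yesterday's shock)
rng = np.random.default_rng(1)
n, phi, theta = 20_000, 0.7, 0.5
e = rng.normal(size=n)

ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + e[t]

ma = e.copy()
ma[1:] += theta * e[:-1]


def lag1_autocorr(x):
    """Sample correlation between x_t and x_{t-1}."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


# Theory: AR(1) -> phi = 0.7; MA(1) -> theta / (1 + theta**2) = 0.4
print(round(lag1_autocorr(ar), 2), round(lag1_autocorr(ma), 2))
```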

TSA can also reveal additional information about the data points, but this article focuses on how to perform a time series analysis in Python.

### Steps involved in TSA

1. Plot the time series: Look for trends, seasonality, outliers, etc.
2. Transform data so that the residuals are stationary: Log transforms or differencing.
3. Fit the residuals: AR, MA, etc.

# Working in Python

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
# The old statsmodels.tsa.arima_model.ARIMA is gone from recent statsmodels; use the current class
from statsmodels.tsa.arima.model import ARIMA
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
```

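```python
# `filename` is the path to the CSV; the 'Date' column name is an assumption, adjust to your file
data = pd.read_csv(filename, parse_dates=['Date'], index_col='Date').sort_index()
```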

```python
data.columns
```

### Step 1: Plot the time series: Look for trends, seasonality, outliers, etc.

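```python
# ETS (error, trend, seasonality) decomposition of the adjusted close price.
# period=12 assumes monthly observations with a yearly seasonal cycle.
result = seasonal_decompose(data['Adj Close**'], model='multiplicative', period=12)
# ETS plot: observed, trend, seasonal and residual components
result.plot()
plt.show()
```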

### Step 2: Transform data so that the residuals are stationary: Log transforms or differencing.

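The Augmented Dickey-Fuller (ADF) test is used to check whether the data is stationary (it does not test for seasonality). Its null hypothesis is that the series has a unit root, i.e. is non-stationary; a p-value below 0.05 lets us reject that hypothesis.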

```python
# Index 1 of the tuple returned by adfuller is the p-value
print('ADF p-value:', adfuller(data['Adj Close**'])[1])
```

Here the p-value is 0.32, which is greater than 0.05, so we cannot reject the null hypothesis: the data is not stationary and needs to be transformed. Let's start by applying a log transform to the target variable.

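```python
# Log (base 2) transform of the adjusted close price
data['logarithm_base1'] = np.log2(data['Adj Close**'])
# Re-run the ADF test on the transformed series
print('p-value after log transform:', adfuller(data['logarithm_base1'])[1])
data
```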

The log transform may bring the p-value into the acceptable range. If it does not, apply differencing and re-run the test, repeating until the p-value falls below 0.05.

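```python
# First difference of the log series (the first row becomes NaN and is dropped)
data['data_d'] = data['logarithm_base1'].diff(periods=1)
data = data.dropna(subset=['data_d'])
print('p-value after differencing:', adfuller(data['data_d'])[1])
```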

### Step 3: Fit the residuals: AR, MA, etc.

In Python, the pmdarima library provides `auto_arima`, which automatically searches for the ARIMA parameters (p, d, q): p is the number of autoregressive terms, d is the number of non-seasonal differences needed for stationarity, and q is the number of lagged forecast errors in the prediction equation. With `seasonal=True` it also searches the seasonal orders (P, D, Q) for a seasonal period `m` (here `m = 12` for monthly data).

```python
# To install the library
!pip install pmdarima
# Import the library
from pmdarima import auto_arima
# Ignore harmless warnings
import warnings
warnings.filterwarnings("ignore")
# Fit auto_arima function to dataset
stepwise_fit = auto_arima(data['data_d'], start_p = 1, start_q = 1,
                          max_p = 3, max_q = 3, m = 12,
                          start_P = 0, seasonal = True,
                          d = None, D = 1, trace = True,
                          error_action ='ignore',   # we don't want to know if an order does not work
                          suppress_warnings = True,  # we don't want convergence warnings
                          stepwise = True)           # set to stepwise
# To print the summary
stepwise_fit.summary()
```

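The summary shows the model that `auto_arima` selected, i.e. the one with the lowest AIC among the candidates it tried. Because the stepwise search does not try every combination, this is the best model found rather than a guaranteed optimum. The notebook uses SARIMAX(0, 1, 1)x(2, 1, 1, 12) in the next step.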

```python
# Split data into train / test sets
train = data.iloc[:len(data)-12]
test = data.iloc[len(data)-12:]  # hold out one year (12 months) for testing
# Fit a SARIMAX(0, 1, 1)x(2, 1, 1, 12) on the training set
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(train['data_d'],
                order=(0, 1, 1),
                seasonal_order=(2, 1, 1, 12))
result = model.fit()
result.summary()
```

Visualize the predictions against the actual values.

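```python
start = len(train)
end = len(train) + len(test) - 1
# Predictions for one year against the test set.
# SARIMAX predicts on the scale of the series it was fitted on, so no `typ` argument is needed.
predictions = result.predict(start, end).rename("Predictions")
predictions.index = test.index  # align with the test dates for plotting
# Plot predictions and actual values
predictions.plot(legend=True)
test['data_d'].plot(legend=True)
plt.show()
```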

Next, measure the error. We compute the mean squared error (MSE) and the root mean squared error (RMSE) between the test values and the predictions to judge accuracy.

```python
# Load specific evaluation tools
from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse
# Root mean squared error
print('RMSE:', rmse(test['data_d'], predictions))
# Mean squared error
print('MSE:', mean_squared_error(test['data_d'], predictions))
```
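
As a check on what the two metrics compute, here are MSE and RMSE worked by hand on three made-up points: MSE is the mean of the squared errors, and RMSE is its square root, which puts the error back in the units of the data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values
actual = np.array([3.0, 5.0, 2.0])
predicted = np.array([2.5, 5.0, 4.0])

errors = actual - predicted         # [0.5, 0.0, -2.0]
mse = np.mean(errors ** 2)          # (0.25 + 0.0 + 4.0) / 3
rmse_value = np.sqrt(mse)           # same units as the data

print(mse, rmse_value)
print(np.isclose(mse, mean_squared_error(actual, predicted)))  # True
```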

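Finally, retrain the model on the full dataset and forecast the next three years (36 months) of crude oil prices. Note that the model is fitted on the transformed column `data_d`, so the forecast values are on that transformed scale rather than in price units.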

```python
# Train the model on the full dataset
model = SARIMAX(data['data_d'],
                order=(0, 1, 1),
                seasonal_order=(2, 1, 1, 12))
result = model.fit()
# Forecast the next 3 years (36 months) beyond the end of the data
forecast = result.predict(start=len(data),
                          end=(len(data) - 1) + 3 * 12).rename('Forecast')
# Plot the history and the forecast values
data['data_d'].plot(figsize=(12, 5), legend=True)
forecast.plot(legend=True)
plt.show()
```
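
The forecasts above are on the scale of the modelled column, not in price units. Assuming that column holds first differences of the base-2 log of the price (that is, `np.log2` followed by `.diff()`), a sketch of undoing both steps is: cumulatively sum the forecast differences starting from the last observed log value, then raise 2 to that power. The `invert_log2_diff` helper and the sample prices are hypothetical.

```python
import numpy as np
import pandas as pd


def invert_log2_diff(diffs, last_log2_value):
    """Turn forecasts of first differences of log2(price) back into price levels.

    diffs: forecast differences, in time order.
    last_log2_value: log2 of the last observed price (the level the differences start from).
    """
    log2_levels = last_log2_value + np.cumsum(diffs)   # undo the differencing
    return np.power(2.0, log2_levels)                  # undo the log2 transform


# Hypothetical check: transform known prices, then invert them again
prices = pd.Series([60.0, 62.5, 61.0, 64.0, 66.5])
log2_prices = np.log2(prices)
diffs = log2_prices.diff().dropna()

recovered = invert_log2_diff(diffs.to_numpy(), log2_prices.iloc[0])
print(np.allclose(recovered, prices.iloc[1:].to_numpy()))  # True
```

Under the same assumption, the notebook's forecast could be converted with something like `invert_log2_diff(forecast.to_numpy(), data['logarithm_base1'].iloc[-1])`.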