# ARIMA

**ARIMA (AutoRegressive Integrated Moving Average)** is a popular statistical method for time series forecasting. ARIMA models are used to understand past data or predict future data points in a series.

## Concept

ARIMA models are composed of three main components:
- **AR (AutoRegressive):** This part uses past values in the time series to predict future values based on a lagged relationship. The AR part involves terms like:
  $$
  \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p}
  $$
  where \( \phi_i \) are the coefficients of the lagged observations.
- **I (Integrated):** This represents the differencing of raw observations to make the time series stationary, i.e., data values are replaced by the difference between the data values and a previous value. The differencing can be represented as:
  $$
  (1 - L)^d Y_t
  $$
  where \( L \) is the lag operator, and \( d \) is the order of differencing.
- **MA (Moving Average):** This part models the current value as a linear combination of past error terms (forecast errors), expressed as:
  $$
  \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q}
  $$
  where \( \theta_i \) are the coefficients of the moving average terms and \( \epsilon_t \) are the error terms.
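
Putting the three components together gives the usual ARIMA(p, d, q) equation. The form below is a sketch that uses two symbols not defined above, both introduced here as assumptions: \( Y'_t \) for the differenced series and \( c \) for an optional constant.

$$
Y'_t = (1 - L)^d Y_t
$$

$$
Y'_t = c + \phi_1 Y'_{t-1} + \ldots + \phi_p Y'_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}
$$

The AR and MA terms are therefore fitted to the differenced series, not to the raw observations.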

## Model Notation

ARIMA models are generally denoted as ARIMA(p, d, q) where:
- **p:** The number of lag observations included in the model (lag order).
- **d:** The number of times that the raw observations are differenced (degree of differencing).
- **q:** The number of lagged forecast errors included in the model (order of the moving average).

## Stationarity

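A key assumption of ARIMA is that the series is stationary once it has been differenced d times. Stationarity means that statistical properties such as the mean, variance, and autocorrelation are constant over time. Non-stationary behavior can take the form of trends, cycles, random walks, or combinations of the three; the differencing step (the "I" in ARIMA) is what removes trend-like behavior before the AR and MA terms are fitted.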

## Seasonal ARIMA (SARIMA)

To handle seasonal effects, ARIMA models can be extended to SARIMA, which adds seasonal terms to the model. These models are denoted as SARIMA(p, d, q)(P, D, Q)[s] where:
- **P, D, Q** represent the seasonal autoregressive order, differencing order, and moving average order, respectively.
- **s:** The number of time steps for a single seasonal period.


# Implementation

### Import Libraries

**Press ▶ to import the libraries.**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

import warnings
warnings.filterwarnings("ignore")

print("Libraries are imported.")

### Import and show Data

**Press ▶ to load the data.**

In [None]:
import os
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output

# List all .csv and Excel files in the ./Data directory
supported_extensions = ['.csv', '.xlsx', '.xls']
files = [f for f in os.listdir('./Data') if any(f.endswith(ext) for ext in supported_extensions)]

# Create a dropdown widget
dropdown = widgets.Dropdown(
    options=files,
    description='Files:',
    disabled=False,
)

# Create a button widget
button = widgets.Button(
    description='Select',
    disabled=False,
    button_style='',
    tooltip='Click to select file',
    icon='check'
)

# Output widget to display messages
output = widgets.Output()

# Function to handle button click
def on_button_click(b):
    with output:
        clear_output()
        selected_file = dropdown.value
        global data
        file_path = os.path.join('./Data', selected_file)
        if selected_file.endswith('.csv'):
            data = pd.read_csv(file_path)
        elif selected_file.endswith(('.xlsx', '.xls')):
            data = pd.read_excel(file_path)
        print(f"File '{selected_file}' loaded as data.")

# Attach the function to the button widget
button.on_click(on_button_click)

# Display the dropdown, button widgets, and initial message within the output widget
with output:
    print("Please select a file from the dropdown and click 'Select'.")
display(output)
display(dropdown)
display(button)



**Press ▶ to display the data.**

In [None]:
display(data.head())
print(f"The data is composed of {data.shape[0]} rows and {data.shape[1]} columns.")

### Data Preprocessing

**Press ▶ to select the target column.**

In [None]:
import ipywidgets as widgets
import pandas as pd

# Create a Dropdown widget for column selection
dropdown = widgets.Dropdown(
    options=data.columns.tolist(),
    value=data.columns[0],
    description='Select Target Column:',
    disabled=False,
    layout=widgets.Layout(width='500px'),
    style={'description_width': '200px'}
)

# Create a Button widget
button = widgets.Button(
    description='Select',
    button_style='',  # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to select the target column as the last column',
    icon='check'  # FontAwesome icon names (without 'fa-')
)

# Create an Output widget for displaying messages
output = widgets.Output()

# Function to handle button click that rearranges the DataFrame
def on_button_clicked(b):
    with output:
        output.clear_output()
        global data
        # Get the selected column name
        selected_column = dropdown.value
        # Reorder the DataFrame columns
        new_columns = [col for col in data.columns if col != selected_column] + [selected_column]
        data = data[new_columns]
        print(f"Column '{selected_column}' has been moved to the last position.")

# Link the button click event to the function
button.on_click(on_button_clicked)

# Display the widgets and output
display(widgets.VBox([dropdown, button, output]))


**Press ▶ to save the target column name.**

In [None]:
target = data.columns[-1]

## Predict Bead Area

## Parameters

### AR Order (p)
**Definition:** The number of lag observations included in the model (also called the lag order).

**Explanation:** This parameter captures the relationship between an observation and a number of its lagged observations. For instance, if p = 2, the model will use the previous two observations to predict the current observation.

**Importance:**
- Helps capture the autocorrelation in the time series.
- Useful for understanding the influence of previous observations on the current observation.
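
To make the p = 2 case concrete, the sketch below computes a one-step AR(2) prediction by hand. The coefficients and the three observations are hypothetical values picked for the example, and the constant term is assumed to be zero.

```python
# Hypothetical AR(2) coefficients phi_1 and phi_2
phi = [0.5, 0.3]

# Hypothetical observations, ordered from oldest to newest
history = [1.0, 2.0, 3.0]

# Prediction = phi_1 * Y_{t-1} + phi_2 * Y_{t-2}
prediction = phi[0] * history[-1] + phi[1] * history[-2]
print(prediction)
```
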

### Difference Order (d)
**Definition:** The number of times that the raw observations are differenced to make the time series stationary.

**Explanation:** Differencing involves subtracting the previous observation from the current observation. If d = 1, the model will use the differences between consecutive observations to stabilize the mean of the time series.

**Importance:**
- Essential for transforming a non-stationary time series into a stationary one.
- Helps remove trends, making the series easier to model with linear methods (seasonal patterns generally call for seasonal differencing, as in SARIMA).
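
A small example of what differencing does, using `numpy.diff` on a made-up series with a quadratic trend. One difference leaves a linear trend; a second difference leaves a constant.

```python
import numpy as np

y = np.array([1, 4, 9, 16, 25])   # illustrative series: t squared

first_diff = np.diff(y, n=1)      # d = 1: Y_t - Y_{t-1}
second_diff = np.diff(y, n=2)     # d = 2: difference of the differences

print(first_diff)   # [3 5 7 9]
print(second_diff)  # [2 2 2]
```

Each round of differencing shortens the series by one observation.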

### MA Order (q)
**Definition:** The number of lagged forecast errors in the prediction equation (also called the moving average order).

**Explanation:** This parameter captures the dependency between an observation and the forecast errors (residuals) from previous time steps. For example, if q = 1, the model uses the previous forecast error when predicting the current observation.

**Importance:**
- Models the current value as a linear combination of past forecast errors.
- Useful for understanding and modeling the residual errors in the time series.
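
The q = 1 case can be sketched by hand as well. All numbers below are hypothetical: `mu` stands for the series mean and `theta` for the single MA coefficient.

```python
# Hypothetical MA(1) parameters
mu, theta = 10.0, 0.4

# Previous step: what was observed versus what had been forecast
last_actual, last_forecast = 12.0, 10.0
last_error = last_actual - last_forecast

# One-step MA(1) forecast: mean plus theta times the previous forecast error
next_forecast = mu + theta * last_error
print(next_forecast)
```
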

## Summary
The AR (p), differencing (d), and MA (q) parameters collectively determine the complexity and accuracy of an ARIMA model. Selecting the right combination of these parameters is key to building a robust model that can effectively forecast future values in a time series.

Understanding these parameters enables you to:
- Identify the appropriate model for your data.
- Improve forecast accuracy.
- Gain insights into the underlying patterns and structures in your time series data.


**Press ▶ to set the model parameters and forecast the data.**

In [None]:
window = 100

# Define widgets with adjusted layout
index_range_slider = widgets.IntRangeSlider(
    value=[0, min(500, len(data))],
    min=0,
    max=len(data),
    step=1,
    description='Index Range:',
    layout=widgets.Layout(width='600px'),  # Increase width for better readability
    style={'description_width': '150px'},  # Increase description width
    continuous_update=False
)

train_size_slider = widgets.IntSlider(
    value=80,
    min=50,
    max=95,
    step=1,
    description='Train %:',
    layout=widgets.Layout(width='600px'),  # Increase width
    style={'description_width': '150px'},  # Increase description width
    continuous_update=False
)

# ARIMA parameter sliders
p_slider = widgets.IntSlider(
    value=5,
    min=0,
    max=10,
    step=1,
    description='AR Order (p):',
    layout=widgets.Layout(width='600px'),
    style={'description_width': '150px'},
    continuous_update=False
)

d_slider = widgets.IntSlider(
    value=1,
    min=0,
    max=2,
    step=1,
    description='Difference Order (d):',
    layout=widgets.Layout(width='600px'),
    style={'description_width': '150px'},
    continuous_update=False
)

q_slider = widgets.IntSlider(
    value=0,
    min=0,
    max=10,
    step=1,
    description='MA Order (q):',
    layout=widgets.Layout(width='600px'),
    style={'description_width': '150px'},
    continuous_update=False
)

# Slider to control future prediction steps
future_steps_slider = widgets.IntSlider(
    value=10,
    min=0,
    max=100,
    step=1,
    description='Future Steps:',
    layout=widgets.Layout(width='600px'),
    style={'description_width': '150px'},
    continuous_update=False)

apply_button = widgets.Button(description="Apply Changes", layout=widgets.Layout(width='800px'))

# Define the function to apply changes and update the plots
def apply_changes(b):
    with output:
        clear_output(wait=True)
        
        # Extract the parameters from widgets
        index_range = index_range_slider.value
        train_size_pct = train_size_slider.value / 100
        p = p_slider.value
        d = d_slider.value
        q = q_slider.value
        future_steps = future_steps_slider.value
        
        # Slice the data
        df = data[index_range[0]:index_range[1]]
        
        # Prepare the data
        y = df[target]
        
        # Train-test split (positional, so it works for any index)
        train_size = int(len(df) * train_size_pct)
        y_train = y.iloc[:train_size]
        y_test = y.iloc[train_size:train_size + future_steps]

        # Nothing to evaluate if no test points were requested or are available
        if len(y_test) == 0:
            print("No test points to evaluate. Increase 'Future Steps' or widen the index range.")
            return

        # Fit ARIMA model on training data
        model = ARIMA(y_train, order=(p, d, q))
        model_fit = model.fit()

        # Forecast `window` steps beyond the end of the training data
        forecast_steps = window
        forecast = model_fit.forecast(steps=forecast_steps)

        # Keep only as many predictions as there are test points
        y_pred = np.asarray(forecast)[:len(y_test)]
        #future_predictions = forecast[len(y_test):]

        mse = mean_squared_error(y_test, y_pred)
        display(HTML(f'<b>Mean Squared Error: {mse:.5f}</b>'))  # Display MSE in bold
        
        # Plot predicted vs actual
        plt.figure(figsize=(10, 6))
        plt.plot(y_train.index, y_train, label='Training', color='green')
        plt.plot(y_test.index, y_test, label='Actual', color='blue')
        plt.plot(y_test.index, y_pred, label='Predicted', color='red', linestyle='--')
        plt.axvline(x=y_test.index[-1], color='gray', linestyle='--')  # Mark the end of actual test data
        #future_index = range(y_test.index[-1] + 1, y_test.index[-1] + 1 + len(future_predictions))
        #plt.plot(future_index, future_predictions, label='Future Predictions', color='green', linestyle='--')
        plt.xlabel('Time')
        plt.ylabel(target)
        plt.title('Actual vs Predicted '+target)
        plt.legend()
        plt.show()
        
        # Calculate loss for each point
        pointwise_mse_loss = (y_test.values - y_pred) ** 2  # Match dimension with y_test
        
        # Plot the pointwise loss
        plt.figure(figsize=(10, 6))
        plt.plot(y_test.index, y_test, label='Actual', color='blue')
        plt.plot(y_test.index, y_pred, label='Predicted', color='red', linestyle='--')
        plt.plot(y_test.index, pointwise_mse_loss, label='Pointwise MSE Loss', color='orange')
        plt.xlabel('Time')
        plt.ylabel('MSE Loss')
        plt.title('Pointwise MSE Loss of Predicted vs Actual '+target)
        plt.legend()
        plt.show()

# Link the apply button to the function
apply_button.on_click(apply_changes)

# Display the widgets and the output area
output = widgets.Output()

display(index_range_slider, train_size_slider, p_slider, d_slider, q_slider, future_steps_slider, apply_button, output)

