## Introduction to ARIMA
#### What is ARIMA?
ARIMA stands for AutoRegressive Integrated Moving Average. It is a class of statistical models designed for analyzing and forecasting time series data.  
ARIMA models are built upon three key components:

1. **AutoRegressive (AR)**: This refers to the model's ability to predict future values based on past values. It uses a linear combination of lagged observations.
2. **Integrated (I)**: This refers to the differencing of raw observations to make the time series stationary (i.e., the mean, variance, and autocorrelation structure of the series do not change over time).
3. **Moving Average (MA)**: This models the current value as a linear combination of the forecast errors from past time steps.
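
To make the three components concrete, the sketch below simulates one small process of each kind with NumPy. The coefficients (`phi`, `theta`), the series length, and the seed are illustrative assumptions, not values taken from any dataset in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 200
noise = rng.normal(size=n)       # white-noise error terms e_t

phi, theta = 0.7, 0.5            # illustrative coefficients (assumed)

# AR(1): each value depends on the previous value plus a new error.
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + noise[t]

# MA(1): each value depends on the current error and the previous error.
ma = noise.copy()
ma[1:] += theta * noise[:-1]

# I (integrated): a random walk is the running sum of the errors,
# so differencing it once gives the errors back.
walk = np.cumsum(noise)
recovered = np.diff(walk)        # matches noise[1:]
```

Plotting `ar`, `ma`, and `walk` side by side is a quick way to see how differently the three behave, even though all of them are driven by the same errors.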

These components are combined into a single model represented as ARIMA **(p,d,q)**, where:

1. $p$: The number of lag observations included in the AR model (autoregressive terms).
2. $d$: The degree of differencing required to make the series stationary.
3. $q$: The number of lagged forecast errors included in the MA model (moving average terms).
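
Written out under the usual textbook convention, the model applies an ARMA equation to the series after it has been differenced $d$ times. The symbols $y'_t$, $c$, $\phi_i$, $\theta_j$, and $\varepsilon_t$ are introduced here for illustration and are not used elsewhere in this notebook:

$$
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
$$

Here $y'_t$ is the series differenced $d$ times, $c$ is a constant, $\phi_i$ are the AR coefficients, $\theta_j$ are the MA coefficients, and $\varepsilon_t$ is the error at time $t$.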

#### What Does ARIMA Solve?
ARIMA is designed to solve forecasting problems for time series data. It provides a method to predict future values based on patterns and relationships within the historical data.

##### Typical problems ARIMA addresses include:
1. Forecasting future sales or demand.
2. Predicting stock prices or economic indicators.
3. Anticipating temperature, rainfall, or traffic levels (strongly seasonal series usually call for the SARIMA extension).

##### When Should ARIMA Be Used?
1. **The data is a time series**: ARIMA is specifically for data points recorded in sequential time order.
2. **The series is autocorrelated**: ARIMA is particularly effective when past values influence future values.
3. **The series is stationary or can be made stationary**: ARIMA assumes the series has a constant mean and variance over time. Non-stationary series can often be transformed using differencing.
4. **There are no clear seasonal effects**: For data with strong seasonal patterns, the Seasonal ARIMA (SARIMA) extension is often better.


#### What is a Time Series?
A time series is a sequence of data points measured at successive points in time, typically at regular intervals. Examples include:

1. Daily temperature measurements.
2. Monthly sales revenue.
3. Hourly website traffic.

Time series data has four key components:

1. **Trend**: The long-term direction of the data (e.g., increasing, decreasing, or constant over time).
2. **Seasonality**: Recurring patterns or cycles over a fixed period (e.g., sales increasing every December).
3. **Noise**: Random variations in the data.
4. **Cyclic patterns**: Fluctuations that are not tied to a fixed calendar period.
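
As a rough illustration, the sketch below builds a synthetic series by adding three of these components together (cyclic patterns are left out for brevity). The slope, the 12-step period, and the noise scale are all assumed values chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(48)                              # 48 monthly time steps (assumed)

trend = 0.5 * t                                # steady upward drift
seasonality = 10 * np.sin(2 * np.pi * t / 12)  # pattern repeating every 12 steps
noise = rng.normal(scale=2.0, size=t.size)     # random variation

series = trend + seasonality + noise           # additive combination

# Averaging over one full seasonal period cancels the seasonal part,
# so a 12-step moving average gives a rough estimate of the trend.
window = np.ones(12) / 12
trend_estimate = np.convolve(series, window, mode="valid")
```

The additive form is one modelling choice; a multiplicative form (components multiplied rather than added) is another common option when the seasonal swings grow with the level of the series.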

##### What is Stationarity?
A stationary time series has statistical properties that remain **constant** over time. Specifically:

1. The mean does not change over time.
2. The variance remains constant.
3. The autocorrelation structure does not change.

##### Why is Stationarity Important?
ARIMA models assume that the series being modeled is stationary once it has been differenced $d$ times. If non-stationarity remains, the estimated coefficients become unreliable and the forecasts tend to be inaccurate.

##### How to Test for Stationarity
There are two common ways to check if a time series is stationary:

1. **Visual Inspection**: Plot the time series. If you observe trends or changing variances, the series is likely not stationary.
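2. **Statistical Tests**: The Augmented Dickey-Fuller (ADF) test is a hypothesis test whose null hypothesis is that the series has a unit root, i.e., is not stationary. A small p-value (commonly below 0.05) is evidence against the null and suggests the series is stationary.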

##### Making a Time Series Stationary
If the data is not stationary, there are techniques to make it stationary:
1. **Differencing**: 
    1. Subtract the previous observation from the current observation.
    2. This helps remove trends and makes the data more stable.

2. **Log Transformation**: 
    1. Applying a logarithm can stabilize the variance.

3. **Detrending**:
    1. Remove the trend component explicitly (e.g., by subtracting a fitted trend line).

4. **Seasonal Differencing**:
    1. Subtract the value from the same season in the previous cycle.
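
The four techniques above can be sketched with NumPy as follows. The synthetic series (exponential growth with a 12-step seasonal factor) and the variable names are assumptions made for the illustration.

```python
import numpy as np

t = np.arange(1, 49)                       # 48 time steps (illustrative)
# Synthetic series (assumed): exponential growth with a 12-step seasonal factor
series = np.exp(0.05 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))

# 1. Differencing: y[t] - y[t-1]
diffed = np.diff(series)

# 2. Log transformation: requires strictly positive values
logged = np.log(series)

# 3. Detrending: fit a straight line to the logged series and subtract it
slope, intercept = np.polyfit(t, logged, deg=1)
detrended = logged - (slope * t + intercept)

# 4. Seasonal differencing: y[t] - y[t-12]
period = 12
seasonal_diffed = logged[period:] - logged[:-period]
```

Note that differencing shortens the series: `diffed` has one value fewer than `series`, and `seasonal_diffed` has `period` values fewer.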

**Example of Differencing**:  
Suppose you have the following series: $[10,12,14,16,18]$  
The differenced series becomes: $[12-10, 14-12, 16-14, 18-16] = [2,2,2,2]$  
The differenced series is stationary: the upward trend is gone and every value equals 2, so the mean stays constant.  
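
The same example can be checked with NumPy. The second step, which is an addition to the example, shows how differencing is reversed with a cumulative sum; this is how forecasts made on a differenced series are brought back to the original scale.

```python
import numpy as np

series = np.array([10, 12, 14, 16, 18])

# First-order differencing: each value minus the one before it.
diffed = np.diff(series)

# Undo the differencing: start from the first value and add the changes back.
restored = np.concatenate(([series[0]], series[0] + np.cumsum(diffed)))

print(diffed)     # [2 2 2 2]
print(restored)   # [10 12 14 16 18]
```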

##### p, d and q
Selecting appropriate values for the p, d, and q parameters is critical for building an effective ARIMA model.  
This process involves analyzing the autocorrelation (ACF) and partial autocorrelation (PACF) plots of the time series: the lag at which the PACF cuts off suggests a value for p, and the lag at which the ACF cuts off suggests a value for q.  
The value of d is the number of differencing steps needed to achieve stationarity.  
If the data is already stationary, d is 0. If one round of differencing makes it stationary, d is 1, which is the most common case; a second round (d = 2) is occasionally needed.
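
To show what the ACF and PACF measure, here is a NumPy-only sketch applied to a simulated AR(1) series. The helper names `sample_acf` and `sample_pacf`, the coefficient 0.7, and the seed are hypothetical choices for this illustration; in practice a library routine would normally be used instead.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """Sample partial autocorrelation for lags 0..nlags via least squares.

    The lag-k value is the last coefficient when x[t] is regressed
    on x[t-1], ..., x[t-k].
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Column j holds the series lagged by j + 1 steps.
        lagged = np.column_stack([x[k - j - 1:len(x) - j - 1] for j in range(k)])
        coefs, *_ = np.linalg.lstsq(lagged, x[k:], rcond=None)
        out.append(coefs[-1])
    return np.array(out)

# Simulated AR(1) series with an assumed coefficient of 0.7
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + noise[t]

acf_vals = sample_acf(x, nlags=5)
pacf_vals = sample_pacf(x, nlags=5)
print(np.round(acf_vals, 2))
print(np.round(pacf_vals, 2))
```

For an AR(1) series like this one, the ACF should decay gradually while the PACF drops to roughly zero after lag 1, which is the pattern that points to p = 1 and q = 0.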

## Main Task 
Analyze and forecast the reach of a professional social media account.

Analyzing and forecasting the reach of a professional social media account can help the content creator plan and optimize their social media strategy.  
By knowing the expected reach of their Instagram account, they can plan the timing and content of their posts to maximize engagement and grow their followers.

```python
import pandas as pd
import numpy as np
```