### How to Choose Values of p, d and q? 

There are various ways to choose the parameters of an ARIMA model. A straightforward approach is to follow these steps:

1. Test the series for stationarity using the Augmented Dickey-Fuller (ADF) test.
2. If the series is stationary, d = 0 and you can fit an **ARMA** model directly. If it is non-stationary, difference it and find the value of d, the number of differences needed to make it stationary.
3. Once the (differenced) series is stationary, plot its autocorrelation function (ACF) and partial autocorrelation function (PACF).
4. Use the PACF plot to find p: for an AR(p) process, the PACF cuts off after lag p.
5. Use the ACF plot to find q: for an MA(q) process, the ACF cuts off after lag q.
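The "cut-off" idea in the last two steps can be made concrete with a small helper. This is a minimal sketch, not part of statsmodels: `cutoff_lag` is a hypothetical name, it assumes the usual approximate 95% significance band of ±1.96/√n, and the correlation values in the example are made up for illustration.

```python
import math


def cutoff_lag(correlations, n_obs, z=1.96):
    """Hypothetical helper: return the last lag of the initial run of
    significant (partial) autocorrelations.

    correlations[0] is lag 0 (always 1.0) and is skipped. A value counts as
    significant when it lies outside the approximate band +/- z / sqrt(n_obs).
    Returns 0 if lag 1 is already insignificant.
    """
    bound = z / math.sqrt(n_obs)
    for lag in range(1, len(correlations)):
        if abs(correlations[lag]) <= bound:
            return lag - 1
    # Every lag was significant: the cut-off lies beyond the lags provided
    return len(correlations) - 1


# Made-up PACF values for a series of 100 observations (band is about +/- 0.196)
example_pacf = [1.0, 0.8, 0.35, 0.05, -0.1, 0.02]
print(cutoff_lag(example_pacf, n_obs=100))  # 2 -> candidate p = 2
```

The fixed band is a simplification: `plot_acf` draws bands that widen with the lag (it uses Bartlett's formula), and a single spurious spike can mislead this rule, so treat the result as a starting point to check against the plots.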



Read more at: https://analyticsindiamag.com/ai-mysteries/quick-way-to-find-p-d-and-q-values-for-arima/

### Step 1: Determining whether differencing (d) is needed using the Augmented Dickey-Fuller test

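In other words, we need to determine whether the time series is stationary. A time series is stationary when its statistical properties (mean, variance and autocorrelation) do not change over time, so a series with a trend or seasonal effects is non-stationary. The null hypothesis of the ADF test is that the series has a unit root (is non-stationary); a p-value below the chosen significance level, commonly 0.05, suggests that the series is stationary.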

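```python
from statsmodels.tsa.stattools import adfuller
result = adfuller(data['Passengers'])  # tuple: (statistic, p-value, lags used, ...)
print(f"ADF statistic: {result[0]:.3f}, p-value: {result[1]:.3f}")
```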

### Step 2: Finding the optimal value of d

The value of d is the number of times the series has to be differenced to become stationary. If the data is already stationary, d = 0; if a single round of differencing makes it stationary, d = 1, which is often the case; occasionally d = 2 is needed.
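Differencing replaces each value by its change from the previous value. The pure-Python sketch below uses a hypothetical `difference` helper, written only to show the arithmetic that pandas' `Series.diff()` performs, to show why one difference removes a linear trend while a quadratic trend needs two, which corresponds to d = 1 and d = 2.

```python
def difference(values, order=1):
    """Hypothetical helper: apply first-order differencing `order` times,
    y'[t] = y[t] - y[t-1]. Each pass shortens the series by one value."""
    result = list(values)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


linear = [2 * t + 5 for t in range(6)]  # 5, 7, 9, 11, 13, 15
quadratic = [t ** 2 for t in range(6)]  # 0, 1, 4, 9, 16, 25

print(difference(linear, 1))     # [2, 2, 2, 2, 2]
print(difference(quadratic, 1))  # [1, 3, 5, 7, 9]
print(difference(quadratic, 2))  # [2, 2, 2, 2]
```

Unlike this helper, `Series.diff()` keeps the original length and puts `NaN` in the first position, which is why the ACF and PACF code in this notebook calls `.dropna()`.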

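```python
import matplotlib.pyplot as plt

# Plot the original series and its 1st- and 2nd-order differences
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
ax1.plot(data.Passengers)
ax1.set_title('Original series')

# First-order differencing
ax2.plot(data.Passengers.diff())
ax2.set_title('1st-order differencing')

# Second-order differencing
ax3.plot(data.Passengers.diff().diff())
ax3.set_title('2nd-order differencing')
plt.show()
```

There is no single method that tells us the optimal value of d. In practice, differencing more than twice is rarely needed, so we try the series with both d = 1 and d = 2.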

### Step 3: Finding the value of p using the partial autocorrelation function (PACF)

PACF (Partial Autocorrelation Function): this plot helps determine the p parameter for the AR part of the model. Look for the lag after which the partial autocorrelations stay inside the shaded confidence band, i.e. are not significantly different from zero; that lag is a candidate for p.

```python
from statsmodels.graphics.tsaplots import plot_pacf
# PACF of the first-differenced series (d = 1); dropna() removes the leading NaN
plot_pacf(data.Passengers.diff().dropna())
```

### Step 4: Finding the value of q using the autocorrelation function (ACF)

ACF (Autocorrelation Function): this plot helps identify the q parameter for the MA part of the model. Look for the lag after which the autocorrelations stay inside the shaded confidence band, i.e. are not significantly different from zero; that lag is a candidate for q.

```python
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(data.Passengers.diff().dropna())
```

### Step 5: Building the model

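```python
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; use the newer class
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(data.Passengers, order=(1, 1, 2))  # order = (p, d, q)
model_fit = model.fit()  # the newer fit() has no `disp` argument
model_fit.summary()
```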

### Step 6: Building a SARIMA model

To implement a SARIMA (seasonal ARIMA) model, we follow the same process we used for the ARIMA model. SARIMA extends ARIMA with a seasonal order (P, D, Q, s): the seasonal AR, differencing and MA orders plus the seasonal period s. Below are the changes needed to turn the ARIMA implementation into a SARIMA one.

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX
```

Now, to implement SARIMA, replace the `ARIMA` model with statsmodels' `SARIMAX` class and pass the seasonal parameters through the `seasonal_order` argument, as shown in the code below:

The value of s in SARIMA represents the seasonal period or the number of time steps in each seasonal cycle. In time series data, seasonality often occurs at regular intervals. For example, in monthly data, the seasonality repeats every 12 months, while in daily data, it may repeat every 7 days (weekly seasonality) or every 30 days (monthly seasonality).

In the SARIMA example below, s is set to 12, indicating that the data has a pattern that repeats every 12 time steps. For monthly data, this corresponds to yearly seasonality.
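The seasonal part of SARIMA works with the seasonal difference `y[t] - y[t - s]`, which is what D = 1 means. The pure-Python sketch below uses a hypothetical `seasonal_difference` helper and a made-up 12-month pattern plus a linear trend to show that with s = 12 the repeating pattern cancels, leaving only the trend's change over 12 months. In pandas the same operation is `Series.diff(12)`.

```python
def seasonal_difference(values, s):
    """Hypothetical helper: seasonal differencing with period s, y[t] - y[t - s].
    The result is s values shorter than the input."""
    return [values[t] - values[t - s] for t in range(s, len(values))]


# Made-up monthly series: a repeating 12-month pattern plus a linear trend of 2 per month
pattern = [5, 3, 8, 10, 14, 20, 25, 24, 18, 12, 7, 4]
series = [pattern[t % 12] + 2 * t for t in range(36)]

print(seasonal_difference(series, 12))  # 24 values, all equal to 24 (= 2 * 12)
```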

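```python
# Non-seasonal order (p, d, q)
p, d, q = 1, 1, 1
# Seasonal order (P, D, Q, s); here P, D, Q reuse the same values and s = 12 for monthly data
P, D, Q, s = 1, 1, 1, 12

model = SARIMAX(data.Passengers, order=(p, d, q), seasonal_order=(P, D, Q, s))
results = model.fit()
print(results.summary())
```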