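pmdarima is a Python library for fitting ARIMA models; its `auto_arima` function searches for a suitable model order automatically. ARIMA stands for AutoRegressive Integrated Moving Average, and the variant with seasonal terms and exogenous regressors is called SARIMAX. It is a class of statistical models that captures a suite of standard temporal structures in time series data and uses them to forecast future values. The acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:

- AR (autoregression): the current value is regressed on a number of its own lagged values.
- I (integrated): the raw observations are differenced to make the series stationary.
- MA (moving average): the current value depends on the residual errors from previous time steps.

The name pmdarima derives from the package's original name, pyramid-arima (a play on "py" + "arima"). The time series to model is supplied separately; here it is loaded from a CSV file below.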
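As a small illustration of the "integrated" part of ARIMA, the sketch below builds a random walk from synthetic noise and shows that first-order differencing recovers the underlying shocks. The data, seed, and variable names are assumptions chosen for the example; none of them come from the dataset used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
noise = rng.normal(size=500)     # synthetic white-noise shocks

# A random walk is the cumulative sum of its shocks. Its level drifts
# over time, so the series is non-stationary.
walk = np.cumsum(noise)

# First-order differencing (d=1 in ARIMA terms) removes the drift and
# leaves the original shocks, minus the first observation.
diffed = np.diff(walk)

print(len(walk), len(diffed))
print(np.allclose(diffed, noise[1:]))
```

Differencing once is enough here because the walk was built with a single cumulative sum; real series may need a different order, which is one of the things `auto_arima` tries to select.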

In [1]:
%pip install pmdarima

Note: you may need to restart the kernel to use updated packages.





## Import the dependencies

In [2]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.holtwinters import ExponentialSmoothing

from sklearn import metrics
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

from joblib import Parallel, delayed
from tqdm import tqdm
from prophet import Prophet
from pmdarima import auto_arima




In [3]:
df = pd.read_csv('data_atmospheric.csv')
df

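| | No | year | month | day | hour | pm2.5 | DEWP | TEMP | PRES | cbwd | Iws | Is | Ir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010 | 1 | 1 | 0 | NaN | -21 | -11.0 | 1021.0 | NW | 1.79 | 0 | 0 |
| 1 | 2 | 2010 | 1 | 1 | 1 | NaN | -21 | -12.0 | 1020.0 | NW | 4.92 | 0 | 0 |
| 2 | 3 | 2010 | 1 | 1 | 2 | NaN | -21 | -11.0 | 1019.0 | NW | 6.71 | 0 | 0 |
| 3 | 4 | 2010 | 1 | 1 | 3 | NaN | -21 | -14.0 | 1019.0 | NW | 9.84 | 0 | 0 |
| 4 | 5 | 2010 | 1 | 1 | 4 | NaN | -20 | -12.0 | 1018.0 | NW | 12.97 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 43819 | 43820 | 2014 | 12 | 31 | 19 | 8.0 | -23 | -2.0 | 1034.0 | NW | 231.97 | 0 | 0 |
| 43820 | 43821 | 2014 | 12 | 31 | 20 | 10.0 | -22 | -3.0 | 1034.0 | NW | 237.78 | 0 | 0 |
| 43821 | 43822 | 2014 | 12 | 31 | 21 | 10.0 | -22 | -3.0 | 1034.0 | NW | 242.70 | 0 | 0 |
| 43822 | 43823 | 2014 | 12 | 31 | 22 | 8.0 | -22 | -4.0 | 1034.0 | NW | 246.72 | 0 | 0 |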
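The timestamp in this data is spread across separate `year`, `month`, `day`, and `hour` columns, while time series tools generally expect a single datetime index. A minimal sketch of one way to combine them is shown here, using a hypothetical three-row frame named `sample` that mirrors the first rows above rather than the full CSV.

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of the hourly frame.
sample = pd.DataFrame({
    "year": [2010, 2010, 2010],
    "month": [1, 1, 1],
    "day": [1, 1, 1],
    "hour": [0, 1, 2],
    "PRES": [1021.0, 1020.0, 1019.0],
})

# pd.to_datetime can assemble timestamps from a frame whose columns
# are named year, month, day (and optionally hour).
sample.index = pd.to_datetime(sample[["year", "month", "day", "hour"]])
series = sample["PRES"]

print(series)
```

With a datetime index in place, the series can be passed to decomposition or forecasting routines that rely on a regular time axis.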


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43824 entries, 0 to 43823
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   No      43824 non-null  int64  
 1   year    43824 non-null  int64  
 2   month   43824 non-null  int64  
 3   day     43824 non-null  int64  
 4   hour    43824 non-null  int64  
 5   pm2.5   41757 non-null  float64
 6   DEWP    43824 non-null  int64  
 7   TEMP    43824 non-null  float64
 8   PRES    43824 non-null  float64
 9   cbwd    43824 non-null  object 
 10  Iws     43824 non-null  float64
 11  Is      43824 non-null  int64  
 12  Ir      43824 non-null  int64  
dtypes: float64(4), int64(8), object(1)
memory usage: 4.3+ MB


In [5]:
df.shape

(43824, 13)

In [6]:
df["No"].nunique()

43824

In [7]:
df.columns

Index(['No', 'year', 'month', 'day', 'hour', 'pm2.5', 'DEWP', 'TEMP', 'PRES',
       'cbwd', 'Iws', 'Is', 'Ir'],
      dtype='object')

In [8]:
df.drop(columns=['No', 'pm2.5', 'DEWP', 'TEMP', 
       'cbwd', 'Iws', 'Is', 'Ir'], inplace=True)

In [9]:
# statistical summary
df.describe()

| | year | month | day | hour | PRES |
|---|---|---|---|---|---|
| count | 43824.0 | 43824.0 | 43824.0 | 43824.0 | 43824.0 |
| mean | 2012.0 | 6.523549 | 15.72782 | 11.5 | 1016.447654 |
| std | 1.413842 | 3.448572 | 8.799425 | 6.922266 | 10.268698 |
| min | 2010.0 | 1.0 | 1.0 | 0.0 | 991.0 |
| 25% | 2011.0 | 4.0 | 8.0 | 5.75 | 1008.0 |
| 50% | 2012.0 | 7.0 | 16.0 | 11.5 | 1016.0 |
| 75% | 2013.0 | 10.0 | 23.0 | 17.25 | 1025.0 |
| max | 2014.0 | 12.0 | 31.0 | 23.0 | 1046.0 |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43824 entries, 0 to 43823
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   year    43824 non-null  int64  
 1   month   43824 non-null  int64  
 2   day     43824 non-null  int64  
 3   hour    43824 non-null  int64  
 4   PRES    43824 non-null  float64
dtypes: float64(1), int64(4)
memory usage: 1.7 MB


In [12]:
df["PRES"].unique()

array([1021.      , 1020.      , 1019.      , 1018.      , 1017.      ,
       1015.      , 1014.      , 1016.      , 1022.      , 1023.      ,
       1024.      , 1025.      , 1026.      , 1027.      , 1028.      ,
       1029.      , 1030.      , 1031.      , 1032.      , 1033.      ,
       1034.      , 1035.      , 1036.      , 1037.      , 1038.      ,
       1039.      , 1013.      , 1012.      , 1011.      , 1010.      ,
       1009.      , 1008.      , 1007.      , 1006.      , 1004.      ,
       1003.      , 1002.      , 1005.      , 1040.      , 1041.      ,
       1042.      , 1001.      , 1000.      ,  999.      ,  997.      ,
        996.      ,  998.      ,  995.      ,  994.      , 1043.      ,
       1019.5     , 1029.666667, 1032.333333, 1044.      , 1045.      ,
        993.      ,  992.      , 1046.      , 1027.5     ,  991.      ])

In [18]:
unique = df.drop(columns=['hour','PRES']).drop_duplicates(keep='first').reset_index(drop=True)

In [19]:
unique

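| | year | month | day |
|---|---|---|---|
| 0 | 2010 | 1 | 1 |
| 1 | 2010 | 1 | 2 |
| 2 | 2010 | 1 | 3 |
| 3 | 2010 | 1 | 4 |
| 4 | 2010 | 1 | 5 |
| ... | ... | ... | ... |
| 1821 | 2014 | 12 | 27 |
| 1822 | 2014 | 12 | 28 |
| 1823 | 2014 | 12 | 29 |
| 1824 | 2014 | 12 | 30 |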
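The table above holds one row per calendar day. The notebook does not state what it is used for next; one plausible use is to reduce the hourly pressure readings to a single value per day. The sketch below shows that idea with a daily mean, using a hypothetical four-row frame named `hourly` with made-up readings. The choice of the mean (rather than, say, the minimum) is an assumption.

```python
import pandas as pd

# Hypothetical stand-in: two hourly readings on each of two days.
hourly = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010],
    "month": [1, 1, 1, 1],
    "day": [1, 1, 2, 2],
    "hour": [0, 1, 0, 1],
    "PRES": [1021.0, 1020.0, 1024.0, 1026.0],
})

# Group by calendar day and average the pressure within each group,
# keeping year/month/day as ordinary columns.
daily = hourly.groupby(["year", "month", "day"], as_index=False)["PRES"].mean()

print(daily)
```

Swapping `.mean()` for `.min()` or `.max()` would give the daily extremes instead.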


In [None]:
Minimum = df["PRES"].min()