# Time Series

## Environment Setup



*   Download and install packages

In [2]:
!pip install orchest
!pip install jupyterlab_code_formatter
!pip install pandas numpy scikit-learn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting orchest
  Downloading orchest-0.3.11.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting pyarrow<8.0,>=1.0.0
  Downloading pyarrow-7.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
     26.7/26.7 MB 42.5 MB/s eta 0:00:00
Building wheels for collected packages: orchest
  Building wheel for orchest (setup.py) ... done
  Created wheel for orchest: filename=orchest-0.3.11-py3-none-any.whl size=19359 sha256=ba22233c7c5088bf8fe931cde0d4f34acaff409d8d70cd024ad9baed29781792
  Stored in directory: /root/.cache/pip/wheels/28/fd/ad/876cab218568a51f43fe21359fda871407b5366898829e089e
Successfully built orchest
Installing collected packages: pyarrow, orchest
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 9.0.0
    Uninstallin


*   Import packages

In [3]:
import orchest
import pandas as pd
import numpy as np
from numpy import array
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
import os

*   [Passenger dataset](https://raw.githubusercontent.com/dhamvi01/Univariate-Time-Series-using-LSTM/master/airline-passengers.csv)

In [4]:
df = pd.read_csv("https://raw.githubusercontent.com/dhamvi01/Univariate-Time-Series-using-LSTM/master/airline-passengers.csv")

In [5]:
df.head(5)

Unnamed: 0,Month,Passengers
0,1949-01,112
1,1949-02,118
2,1949-03,132
3,1949-04,129
4,1949-05,121


In [6]:
max_value, min_value = df["Passengers"].max(), df["Passengers"].min()
print(f"Maximum value {max_value} and minimum value {min_value} in the 'Passengers' column")

Maximum value 622 and minimum value 104 in the 'Passengers' column


## Preprocessing

Take the ```Passengers``` column as a NumPy array

In [7]:
passengers = array(df["Passengers"])



Split the column into _input_ and _output_


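*   ```idx_nilai_akhir = idx_iterasi + langkah```: compute the end index of the current window (```end_ix = i + n_steps``` in the code below)
*   check the condition ```if idx_nilai_akhir > len(kolom) - 1```: stop once ```idx_nilai_akhir``` runs past the last valid index of the column
*   ```kolom_input, kolom_output = kolom[i:idx_nilai_akhir], kolom[idx_nilai_akhir]```: take the values from index ```i``` up to (but not including) ```idx_nilai_akhir``` as the input, and the value at ```idx_nilai_akhir``` as the output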


In [8]:
def split_sequence(sequence, n_steps):
    """
    Split a univariate sequence into samples.

    Each sample pairs a window of n_steps consecutive values (input)
    with the value that immediately follows the window (output).
    """
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps  # find the end of this pattern
        if end_ix > len(sequence) - 1:  # check if we are beyond the sequence
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
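
As a quick check of the windowing logic, the sketch below builds the same kind of input/output pairs on a toy sequence using NumPy's ```sliding_window_view``` (available in NumPy 1.20 and later). The helper name ```split_sequence_windowed``` and the toy values are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_sequence_windowed(sequence, n_steps):
    """Hypothetical vectorized variant: each window of n_steps values
    is paired with the value that immediately follows it."""
    sequence = np.asarray(sequence)
    # Each row holds n_steps inputs followed by one target value.
    windows = sliding_window_view(sequence, n_steps + 1)
    return windows[:, :-1].copy(), windows[:, -1].copy()


toy = [10, 20, 30, 40, 50]  # illustrative values, not from the dataset
X_toy, y_toy = split_sequence_windowed(toy, n_steps=3)
print(X_toy)  # two windows: [10 20 30] and [20 30 40]
print(y_toy)  # the values that follow them: 40 and 50
```

A sequence of length 5 with ```n_steps=3``` yields two samples, because the last window must still leave one value to serve as the output.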

Store the _input_ windows in a DataFrame named ```matrix_data```

In [9]:
# Build windows of three consecutive values; data_actual holds the value that follows each window
data, data_actual = split_sequence(passengers, 3)
matrix_data = pd.DataFrame(data, columns=["Xt-2", "Xt-1", "Xt"])
y_actual = pd.DataFrame(data_actual, columns=["output"])

[Manual calculation](https://docs.google.com/spreadsheets/d/1-Aeaz4VK1eppbsd0klJYrCulL_PP_CdThG12nqEeaNQ/edit?usp=sharing) of the error between the ```Xt``` column and ```pred_y```

In [10]:
matrix_data.head(5)

Unnamed: 0,Xt-2,Xt-1,Xt
0,112,118,132
1,118,132,129
2,132,129,121
3,129,121,135
4,121,135,148


Split ```matrix_data``` into the input features ```X``` and the target ```y```

In [11]:
X = matrix_data.drop(columns="Xt")
y = matrix_data["Xt"]

Normalize the input features ```X``` and the target ```y``` with ```MinMaxScaler()```

In [12]:
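# Use one scaler per array so each keeps its own fitted min/max
# (refitting a single scaler on y would overwrite the statistics fitted on X)
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()

X_norm = x_scaler.fit_transform(X)
y_norm = y_scaler.fit_transform(y.values.reshape(-1, 1))  # reshape to a 2-D single-column array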
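
Min-max scaling maps each column to the range [0, 1] by computing `(x - min) / (max - min)`. The sketch below compares ```MinMaxScaler``` with that formula on a three-value column; the values (the dataset minimum, maximum, and their midpoint) are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative column: the dataset minimum and maximum plus their midpoint.
values = np.array([[104.0], [363.0], [622.0]])

scaled = MinMaxScaler().fit_transform(values)
# Same computation written out by hand.
manual = (values - values.min()) / (values.max() - values.min())

print(scaled.ravel())  # the minimum maps to 0, the midpoint to 0.5, the maximum to 1
assert np.allclose(scaled, manual)
```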

In [13]:
split_percentage = 0.2

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X_norm, y_norm, test_size=split_percentage, random_state=0)
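
The call above shuffles the rows before splitting, so test samples are interleaved in time with training samples. If a strictly chronological hold-out is preferred (an assumption about the goal, not something the notebook states), ```train_test_split``` accepts ```shuffle=False```. A minimal sketch on stand-in arrays, with ```X_demo``` and ```y_demo``` as hypothetical names:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for time-ordered samples (assumption: row order equals time order).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# shuffle=False keeps the original order: the last 20% becomes the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)
print(y_tr)  # the earliest 8 samples
print(y_te)  # the latest 2 samples
```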

In [15]:
# Use KNeighborsRegressor because the target is numeric (a regression task); the error is measured with mean squared error below
neigh = KNeighborsRegressor(n_neighbors=3)
neigh.fit(X_train, y_train)
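
To illustrate what ```n_neighbors=3``` means: with the default uniform weights, the regressor predicts the mean target of the three nearest training points. The toy data below is hypothetical and unrelated to the passenger dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical one-feature training set.
X_demo = np.array([[0.0], [1.0], [2.0], [10.0]])
y_demo = np.array([0.0, 1.0, 2.0, 10.0])

knn = KNeighborsRegressor(n_neighbors=3)  # default weights="uniform"
knn.fit(X_demo, y_demo)

query = np.array([[1.1]])
prediction = knn.predict(query)
# The three training points nearest to 1.1 are 1.0, 2.0 and 0.0,
# so the prediction is the mean of their targets; 10.0 is ignored.
print(prediction)
```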

In [16]:
pred_y = neigh.predict(X_test)

In [17]:
mse = mean_squared_error(y_test, pred_y)
print("Mean Squared Error:", mse)

Mean Squared Error: 0.006128763185080589
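
The MSE above is measured on the normalized scale. One possible way to express the error in passenger counts is to take the square root (RMSE) and undo the scaling with ```inverse_transform```. The sketch below uses made-up values and a separately fitted scaler for the target; both are assumptions for illustration, not results from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Illustrative target values only (not taken from the dataset).
y_true = np.array([[100.0], [200.0], [300.0], [500.0]])
target_scaler = MinMaxScaler().fit(y_true)

y_true_norm = target_scaler.transform(y_true)
# Hypothetical predictions: each one is off by 0.05 on the normalized scale.
y_pred_norm = y_true_norm + np.array([[0.05], [-0.05], [0.05], [-0.05]])

rmse_norm = np.sqrt(mean_squared_error(y_true_norm, y_pred_norm))

# Map predictions back to original units before measuring the error.
y_pred = target_scaler.inverse_transform(y_pred_norm)
rmse_units = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_norm, rmse_units)
```

Because min-max scaling is linear, the RMSE in original units equals the normalized RMSE multiplied by the fitted range (max minus min) of the target.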


In [18]:
# use squeeze to make a 1-dimensional array
# https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html

pred_y_series = pd.Series(pred_y.squeeze())
y_test_series = pd.Series(y_test.squeeze())
passengers_df_norm = pd.concat((pred_y_series, y_test_series), axis=1)
passengers_df_norm.columns = ["pred_y", "y_test"]

In [19]:
passengers_df_norm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   pred_y  29 non-null     float64
 1   y_test  29 non-null     float64
dtypes: float64(2)
memory usage: 592.0 bytes


In [20]:
passengers_df_norm.to_csv("/content/drive/MyDrive/prosaindata/source/tasks/output/passengers-df-norm.csv")