# Time Series

## Environment Setup



*   Download and install the packages

In [None]:
!pip install orchest
!pip install jupyterlab_code_formatter
!pip install pandas numpy scikit-learn

*   Import the packages

In [2]:
import orchest
import pandas as pd
import numpy as np
from numpy import array
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
import os

*   [Passenger dataset](https://raw.githubusercontent.com/dhamvi01/Univariate-Time-Series-using-LSTM/master/airline-passengers.csv)

In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/dhamvi01/Univariate-Time-Series-using-LSTM/master/airline-passengers.csv")

In [4]:
df.head(5)

| | Month | Passengers |
|---|---|---|
| 0 | 1949-01 | 112 |
| 1 | 1949-02 | 118 |
| 2 | 1949-03 | 132 |
| 3 | 1949-04 | 129 |
| 4 | 1949-05 | 121 |


## Preprocessing

Take the ```Passengers``` column as a NumPy array.

In [5]:
passengers = array(df["Passengers"])



Split the column into _input_ and _output_ samples:

*   ```end_ix = i + n_steps```: compute the index just past the last input value of the current window.
*   ```if end_ix > len(sequence) - 1```: stop once ```end_ix``` no longer points at a valid element of the sequence.
*   ```seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]```: take the values from index ```i``` up to (but not including) ```end_ix``` as the input, and the value at ```end_ix``` as the output.


In [6]:
def split_sequence(sequence, n_steps):
    """
    Split a univariate sequence into samples.

    Each sample pairs a window of n_steps consecutive values (input)
    with the value that immediately follows the window (output).
    """
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps  # find the end of this pattern
        if end_ix > len(sequence) - 1:  # check if we are beyond the sequence
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
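
As a cross-check of the windowing logic, the same samples can be produced without an explicit loop. The sketch below assumes NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available. The helper name `split_sequence_vectorized` is hypothetical and not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_sequence_vectorized(sequence, n_steps):
    """Hypothetical loop-free variant of split_sequence (illustration only)."""
    sequence = np.asarray(sequence)
    # Each window holds n_steps inputs followed by the one value to predict.
    windows = sliding_window_view(sequence, n_steps + 1)
    X = windows[:, :-1].copy()  # first n_steps values of every window
    y = windows[:, -1].copy()   # the value right after those inputs
    return X, y


# Toy check using the first five passenger counts shown in df.head(5).
toy = [112, 118, 132, 129, 121]
X_toy, y_toy = split_sequence_vectorized(toy, 3)
print(X_toy)  # two windows: [112 118 132] and [118 132 129]
print(y_toy)  # their next values: [129 121]
```

Unlike the loop version, this variant raises a `ValueError` when the sequence is shorter than `n_steps + 1`, so it suits cases where at least one full window is guaranteed.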

Store the input windows in a DataFrame named ```matrix_data```.

In [7]:
data, data_actual = split_sequence(passengers, 3)
matrix_data = pd.DataFrame(data, columns=["Xt-2", "Xt-1", "Xt"])
y_actual = pd.DataFrame(data_actual, columns=["output"])

In [8]:
matrix_data.head(5)

| | Xt-2 | Xt-1 | Xt |
|---|---|---|---|
| 0 | 112 | 118 | 132 |
| 1 | 118 | 132 | 129 |
| 2 | 132 | 129 | 121 |
| 3 | 129 | 121 | 135 |
| 4 | 121 | 135 | 148 |


Split ```matrix_data``` into the training features ```X``` and the target ```y```.

In [9]:
X = matrix_data.drop(columns="Xt")
y = matrix_data["Xt"]

Normalize the training features ```X``` and the target ```y``` with ```MinMaxScaler()```.

In [10]:
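# Use one scaler per array so that fitting on y does not overwrite the fit on X
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()

X_norm = x_scaler.fit_transform(X)
# MinMaxScaler expects 2D input, so reshape y into a single column
y_norm = y_scaler.fit_transform(y.values.reshape(-1, 1))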
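
For reference, with its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column as `(x - min) / (max - min)`. A minimal sketch that writes this out by hand on the first three rows of the `Xt-2` and `Xt-1` columns (the names `X_toy`, `X_manual`, and `X_sklearn` are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First three rows of the Xt-2 and Xt-1 columns shown in matrix_data.head(5).
X_toy = np.array([[112.0, 118.0], [118.0, 132.0], [132.0, 129.0]])

# Column-wise min-max scaling written out by hand.
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_manual = (X_toy - col_min) / (col_max - col_min)

# The same scaling done by scikit-learn.
X_sklearn = MinMaxScaler().fit_transform(X_toy)

print(np.allclose(X_manual, X_sklearn))
```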

In [11]:
split_percentage = 0.2

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X_norm, y_norm, test_size=split_percentage, random_state=0)
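
`train_test_split` shuffles rows by default, so the test set above mixes early and late months. If a chronological hold-out is preferred instead (an assumption about the evaluation goal, not something the original notebook does), passing `shuffle=False` keeps the last rows as the test set. A minimal sketch on placeholder arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for X_norm and y_norm (10 ordered samples).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10).reshape(-1, 1)

# shuffle=False preserves row order: the final 20% becomes the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

print(y_tr.ravel())  # samples 0 through 7
print(y_te.ravel())  # samples 8 and 9
```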

In [13]:
# Use KNeighborsRegressor because the target is numeric (a regression task); the error is measured below with mean squared error
neigh = KNeighborsRegressor(n_neighbors=3)
neigh.fit(X_train, y_train)
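
With its default uniform weights, `KNeighborsRegressor(n_neighbors=3)` predicts the mean target of the three closest training points. The sketch below reproduces one prediction by hand on made-up one-dimensional data; all names and values in it are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training set: one feature, numeric target (made-up values).
X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_demo = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
query = np.array([[1.2]])

knn = KNeighborsRegressor(n_neighbors=3).fit(X_demo, y_demo)

# Manual version: average the targets of the 3 closest training points.
distances = np.abs(X_demo[:, 0] - query[0, 0])
nearest = np.argsort(distances)[:3]
manual_pred = y_demo[nearest].mean()

print(knn.predict(query)[0], manual_pred)
```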

In [14]:
pred_y = neigh.predict(X_test)

In [15]:
mse = mean_squared_error(y_test, pred_y)
print("Mean Squared Error:", mse)

Mean Squared Error: 0.006128763185080589
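
Because the MSE is computed on normalized targets, it is expressed in squared normalized units. Taking the square root gives the root mean squared error (RMSE) in the same units as the normalized target. The sketch below uses only the first five prediction/target pairs that this notebook prints from `passengers_df_norm`, so its result will not match the full test-set value reported above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# First five rows of passengers_df_norm (normalized predictions and targets).
pred_demo = np.array([0.115187, 0.143501, 0.040541, 0.268983, 0.377735])
true_demo = np.array([0.173745, 0.252896, 0.028958, 0.268340, 0.299228])

mse_demo = mean_squared_error(true_demo, pred_demo)
rmse_demo = np.sqrt(mse_demo)  # RMSE is the square root of the MSE

print("MSE: ", mse_demo)
print("RMSE:", rmse_demo)
```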


In [16]:
# use squeeze to turn the (n, 1) arrays into 1-dimensional arrays
# https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html

pred_y_series = pd.Series(pred_y.squeeze())
y_test_series = pd.Series(y_test.squeeze())
passengers_df_norm = pd.concat((pred_y_series, y_test_series), axis=1)
passengers_df_norm.columns = ["pred_y", "y_test"]

In [17]:
passengers_df_norm.head(5)

| | pred_y | y_test |
|---|---|---|
| 0 | 0.115187 | 0.173745 |
| 1 | 0.143501 | 0.252896 |
| 2 | 0.040541 | 0.028958 |
| 3 | 0.268983 | 0.26834 |
| 4 | 0.377735 | 0.299228 |
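
The values above are on the normalized 0 to 1 scale. To read predictions as passenger counts, they can be mapped back with `inverse_transform` on a scaler that was fitted on the target only. The sketch below assumes such a dedicated target scaler (named `y_scaler_demo` here, a hypothetical name) and uses the first five passenger counts as stand-in data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in target column: the first five passenger counts from df.head(5).
y_demo = np.array([112.0, 118.0, 132.0, 129.0, 121.0]).reshape(-1, 1)

# A scaler fitted only on the target keeps the min/max needed to undo scaling.
y_scaler_demo = MinMaxScaler()
y_demo_norm = y_scaler_demo.fit_transform(y_demo)

# Pretend these are normalized model predictions (illustrative values).
pred_norm = np.array([[0.0], [0.5], [1.0]])
pred_counts = y_scaler_demo.inverse_transform(pred_norm)

print(pred_counts.ravel())  # 112, 122 and 132 passengers
```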


In [18]:
passengers_df_norm.to_csv("/content/drive/MyDrive/prosaindata/source/tasks/output/passengers-df-norm.csv")
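
`to_csv` raises an error if the target folder does not exist. A small sketch that creates the folder first: the temporary directory stands in for the Google Drive path above, and `index=False` is an optional choice that omits the row index from the file.

```python
import os
import tempfile

import pandas as pd

# Temporary folder used as a stand-in for the real output location.
base_dir = tempfile.mkdtemp()
output_dir = os.path.join(base_dir, "tasks", "output")
os.makedirs(output_dir, exist_ok=True)  # create missing parent folders

demo_df = pd.DataFrame({"pred_y": [0.115187], "y_test": [0.173745]})
output_path = os.path.join(output_dir, "passengers-df-norm.csv")
demo_df.to_csv(output_path, index=False)

print(pd.read_csv(output_path))
```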