
# Capstone Project - Selena (Shopee)

**Analysis by the ML Model (Using a .pkl File)**

- **The backend then loads the trained ML model (saved in .pkl format) using Python's pickle library.**
- The ML model processes the user's transaction data, already retrieved from the database, to perform several analysis tasks:
    - Cash flow: identifying income and spending patterns.
    - Financial advice: based on the user's financial trends, the model can offer suggestions, such as reducing spending in a particular category.
    - Spending anomaly detection: the ML model flags unusual or suspicious transactions or expenses.
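Below is a minimal sketch of the save/load round trip with Python's `pickle` module. The `SpendingAnomalyDetector` class and its z-score rule are hypothetical stand-ins, not the project's actual model, and the transaction amounts are made up. It only shows that the backend can restore a fitted object from a `.pkl` file and call it on new data.

```python
import os
import pickle
import statistics
import tempfile


class SpendingAnomalyDetector:
    """Hypothetical stand-in for the trained model: flags amounts whose
    z-score against the fitted history exceeds a threshold."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold
        self.mean = 0.0
        self.stdev = 1.0

    def fit(self, amounts):
        self.mean = statistics.fmean(amounts)
        self.stdev = statistics.pstdev(amounts) or 1.0  # avoid division by zero
        return self

    def predict(self, amounts):
        # True where the amount is more than `threshold` standard deviations from the mean
        return [abs(a - self.mean) / self.stdev > self.threshold for a in amounts]


history = [120_000, 95_000, 110_000, 130_000, 105_000, 98_000, 115_000]  # made-up amounts
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Training side: persist the fitted model to a .pkl file
with open(path, "wb") as f:
    pickle.dump(SpendingAnomalyDetector().fit(history), f)

# Backend side: restore the model and run it on new transactions
with open(path, "rb") as f:
    model = pickle.load(f)

flags = model.predict([112_000, 1_500_000])
```

Unpickling can execute arbitrary code, so the backend should only load `.pkl` files it produced itself or otherwise trusts.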

**Returning Analysis Results to the Backend (ML to CC)**

- The ML model produces analysis output, such as financial recommendations or detected anomalies, which is converted to **JSON format**.
- The backend receives these results and combines them into a single JSON response, ready to send to the mobile app.
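The following sketch shows how the backend might merge several analysis outputs into one JSON response using the standard `json` module. The function name `build_response`, the key names and the sample values are all hypothetical. The real field names depend on the contract agreed with the mobile app.

```python
import json


def build_response(cash_flow, advice, anomalies):
    """Combine separate analysis outputs into a single JSON string.

    Hypothetical response layout; adjust the keys to the mobile app's API contract.
    """
    response = {
        "status": "success",
        "data": {
            "cash_flow": cash_flow,
            "financial_advice": advice,
            "anomalies": anomalies,
        },
    }
    # ensure_ascii=False keeps non-ASCII text (e.g. Indonesian category names) readable
    return json.dumps(response, ensure_ascii=False)


# Made-up sample values
body = build_response(
    cash_flow={"income": 5_000_000, "expense": 3_250_000},
    advice=["Reduce spending in the 'Fashion' category"],
    anomalies=[{"date": "2024-09-15", "amount": 1_500_000}],
)
```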

## Import Libraries

In [1]:
import csv
import pickle
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import pandas as pd

## Preprocessing Data

In [2]:
# FUNCTION FOR PREPROCESSING DATA

def preprocess_data(data_path):
  # Read the dataset file; read 'Total Pembayaran' as text so that thousands
  # separators such as "1.250.000" are not misread as decimal points
  df = pd.read_excel(data_path, dtype={'Total Pembayaran': str})
  df.to_csv("./data/dataset.csv", index=False)  # keep a CSV copy of the raw export

  # Keep only the columns we need
  data_filtered = df[['Waktu Pesanan Selesai', 'Total Pembayaran']].copy()

  # Remove the thousands-separator dots from 'Total Pembayaran'
  data_filtered['Total Pembayaran'] = data_filtered['Total Pembayaran'].astype(str).str.replace('.', '', regex=False)

  # Convert data types: payment to a number, completion time to a calendar date
  data_filtered['Total Pembayaran'] = pd.to_numeric(data_filtered['Total Pembayaran'])
  data_filtered['Waktu Pesanan Selesai'] = pd.to_datetime(data_filtered['Waktu Pesanan Selesai']).dt.date

  # Drop cancelled/incomplete orders (their total payment is 0)
  data_final = data_filtered[data_filtered['Total Pembayaran'] != 0]

  # Sort chronologically (stable, so same-day orders keep their export order)
  data_final = data_final.sort_values('Waktu Pesanan Selesai', kind='stable')

  # Extract the time axis and the value series
  time = data_final['Waktu Pesanan Selesai'].tolist()
  series = data_final['Total Pembayaran'].tolist()

  return time, series
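The cleaning steps can be checked on a few made-up rows without the Excel export. This sketch reuses the export's column names ('Waktu Pesanan Selesai' = order completion time, 'Total Pembayaran' = total payment). It also shows why the payment column should be handled as text. If pandas infers numbers from a value like "150.000", the dot is treated as a decimal point, and stripping dots afterwards gives the wrong amount.

```python
import io

import pandas as pd

# Made-up rows using the Shopee export's column names
raw = pd.DataFrame({
    "Waktu Pesanan Selesai": ["2024-09-03 10:15", "2024-09-04 19:02", "2024-09-05 08:30"],
    "Total Pembayaran": ["1.250.000", "0", "85.500"],
})

clean = raw.copy()
# Remove the thousands-separator dots, then convert to integers
clean["Total Pembayaran"] = pd.to_numeric(
    clean["Total Pembayaran"].astype(str).str.replace(".", "", regex=False)
)
# Keep only the calendar date of the completion time
clean["Waktu Pesanan Selesai"] = pd.to_datetime(clean["Waktu Pesanan Selesai"]).dt.date
# Drop zero-value (cancelled/incomplete) orders
clean = clean[clean["Total Pembayaran"] != 0]

# Pitfall: when a CSV reader infers numbers, "150.000" becomes the float 150.0,
# so stripping dots afterwards yields 1500 instead of 150000
inferred = pd.read_csv(io.StringIO("Total Pembayaran\n150.000\n85.000\n"))
broken = pd.to_numeric(inferred["Total Pembayaran"].astype(str).str.replace(".", "", regex=False))
```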

## Global Variables

In [4]:
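### Initialize global variables ###

DATA_PATH = './data/Order.completed.20240903_20241003.xlsx'
TIME, SERIES = preprocess_data(DATA_PATH)

# Model hyperparameters
SPLIT_TIME = 90    # index separating the training and validation sets
WINDOW_SIZE = 7    # number of past steps in each input window
BATCH_SIZE = 4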
SHUFFLE_BUFFER_SIZE = 100
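The series holds one value per completed order, so `SPLIT_TIME` and `WINDOW_SIZE` count orders, not days. The arithmetic below assumes the export has at least 90 orders. It shows how these settings produce the 21 steps per epoch that appear in the training log.

```python
import math

SPLIT_TIME = 90   # orders used for training
WINDOW_SIZE = 7   # past orders per input window
BATCH_SIZE = 4

# Each sample needs WINDOW_SIZE inputs plus 1 label, sliding one order at a time,
# so n orders give n - WINDOW_SIZE complete windows
n_windows = SPLIT_TIME - WINDOW_SIZE            # 90 - 7 = 83 training pairs
n_batches = math.ceil(n_windows / BATCH_SIZE)   # 20 full batches + 1 batch of 3
```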

## Defining Functions

In [5]:
def plot_series(time, series, format="-", start=0, end=None):
    plt.figure(figsize=(16, 8))
    plt.plot(time[start:end], series[start:end], format)
    plt.xlabel("Time")
    plt.ylabel("Value")
    plt.grid(True)
    plt.gca().xaxis.set_major_formatter(DateFormatter("%Y-%m-%d"))  # Show x-axis ticks as dates
    plt.gcf().autofmt_xdate()  # Rotate the date labels so they don't overlap
    plt.show()

In [6]:
def train_val_split(time, series):
    """Split time series into train and validation sets"""
    time_train = time[:SPLIT_TIME]
    series_train = series[:SPLIT_TIME]
    time_valid = time[SPLIT_TIME:]
    series_valid = series[SPLIT_TIME:]

    return time_train, series_train, time_valid, series_valid

In [7]:
def create_windowed_dataset(series, window_size=WINDOW_SIZE, batch_size=BATCH_SIZE, shuffle_buffer=SHUFFLE_BUFFER_SIZE):
    """
    Build a windowed dataset from a time series.

    Parameters:
    - series: the time series, as an array or list.
    - window_size: number of time steps in one input window.
    - batch_size: number of samples per batch.
    - shuffle_buffer: buffer size used for shuffling.

    Returns:
    - dataset: a dataset of (input window, next value) pairs.
    """

    # Convert the series into a float32 TensorFlow dataset
    dataset = tf.data.Dataset.from_tensor_slices(tf.cast(series, tf.float32))

    # Slide a window of window_size + 1 steps, one step at a time
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)

    # Flatten each window (a nested dataset) into a single tensor
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))

    # Split each window into the input (first window_size steps, shaped
    # (window_size, 1) to match the model's Input layer) and the label (last step)
    dataset = dataset.map(lambda window: (tf.expand_dims(window[:-1], axis=-1), window[-1]))

    # Shuffle the windows using the shuffle buffer
    dataset = dataset.shuffle(buffer_size=shuffle_buffer)

    # Batch and prefetch
    dataset = dataset.batch(batch_size).prefetch(1)

    return dataset
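For intuition, here is a NumPy sketch of the same windowing, without shuffling or batching. It swaps in `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+) as an equivalent of `window(..., shift=1, drop_remainder=True)` followed by `flat_map`. The helper name `windowed_pairs` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_pairs(series, window_size):
    """Return (features, labels): every run of window_size + 1 consecutive values,
    split into the first window_size values and the value that follows them."""
    windows = sliding_window_view(np.asarray(series), window_size + 1)
    return windows[:, :-1], windows[:, -1]


features, labels = windowed_pairs(range(10), window_size=3)

# Add a trailing channel axis to match an Input(shape=(window_size, 1)) layer
model_inputs = features[..., np.newaxis]
```

With 10 values and `window_size=3`, there are 7 windows. The first pair is `[0, 1, 2] -> 3` and the last is `[6, 7, 8] -> 9`.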


In [8]:
def create_uncompiled_model():
    """Define uncompiled model

    Returns:
        tf.keras.Model: uncompiled model
    """

    model = tf.keras.models.Sequential([
        tf.keras.Input(shape=(WINDOW_SIZE, 1)),
        tf.keras.layers.Conv1D(filters=16, kernel_size=3, strides=1, padding='causal', activation='relu'),
        tf.keras.layers.LSTM(16, return_sequences=True),
        tf.keras.layers.LSTM(8),
        tf.keras.layers.Dense(8, activation='relu'),
        tf.keras.layers.Dense(1)
    ])

    return model
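The following is a worked check of the architecture. The comments trace the tensor shapes for `WINDOW_SIZE = 7`. The helper functions (hypothetical names) apply the standard parameter-count formulas for Keras `Conv1D`, `LSTM` and `Dense` layers with biases enabled. Under those assumptions the model has 3,057 trainable parameters, which `model.summary()` should confirm.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One kernel per filter plus one bias per filter
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


layers = [
    ("Conv1D(16, k=3, causal)", conv1d_params(3, 1, 16)),  # (batch, 7, 1) -> (batch, 7, 16)
    ("LSTM(16, return_sequences)", lstm_params(16, 16)),   # -> (batch, 7, 16)
    ("LSTM(8)", lstm_params(16, 8)),                        # -> (batch, 8)
    ("Dense(8, relu)", dense_params(8, 8)),                 # -> (batch, 8)
    ("Dense(1)", dense_params(8, 1)),                       # -> (batch, 1)
]
total = sum(count for _, count in layers)
```

Causal padding keeps the sequence length at 7. Only the second LSTM collapses the time axis, which is why the first one needs `return_sequences=True`.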

In [9]:
def create_model():
    """Creates and compiles the model

    Returns:
        tf.keras.Model: compiled model
    """

    model = create_uncompiled_model()

    model.compile(loss=tf.keras.losses.Huber(),
                  optimizer=tf.keras.optimizers.SGD(momentum=0.9),
                  metrics=["mae"])

    return model
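`tf.keras.losses.Huber()` uses `delta=1.0` by default. The loss is quadratic for errors up to 1 and linear beyond that, where it equals `|e| - 0.5`. The prediction errors here are in the hundreds of thousands, so practically every error falls in the linear region. The reported loss is therefore the MAE minus 0.5, which is exactly the gap visible in the training log further down. A NumPy sketch of the formula:

```python
import numpy as np


def huber(errors, delta=1.0):
    """Element-wise Huber loss: 0.5 * e**2 if |e| <= delta, else delta * (|e| - 0.5 * delta)."""
    abs_e = np.abs(errors)
    return np.where(abs_e <= delta, 0.5 * errors**2, delta * (abs_e - 0.5 * delta))


# Errors of the same order as the logged MAE, all far outside delta = 1
errors = np.array([120_000.0, -150_000.0, 140_000.0])
loss = huber(errors).mean()
mae = np.abs(errors).mean()

# A small error falls in the quadratic region instead
small = huber(np.array([0.5]))[0]
```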

In [10]:
time_train, series_train, time_valid, series_valid = train_val_split(TIME, SERIES)
series_train_windowed = create_windowed_dataset(series_train)

# Build the model without compiling it, just to check shapes against the windowed data
uncompiled_model = create_uncompiled_model()

In [11]:
example_batch = series_train_windowed.take(1)

try:
    predictions = uncompiled_model.predict(example_batch, verbose=False)
except Exception as e:
    print("Your model is not compatible with the dataset you defined earlier. Check that the model's input shape matches the windowed dataset.")
    print(e)
else:
    print("Your current architecture is compatible with the windowed dataset! :)")
    print(f"predictions have shape: {predictions.shape}")

Your current architecture is compatible with the windowed dataset! :)
predictions have shape: (4, 1)


In [12]:
model = create_model()
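The learning-rate schedule in the next cell multiplies the rate by 10 every 20 epochs. It sweeps from 1e-8 at epoch index 0 to about 8.9e-4 at epoch index 99. Computing it directly reproduces the `learning_rate` values in the training log (1.0000e-08, 1.1220e-08, 1.2589e-08, ...). The sweep is presumably meant as a learning-rate range test: plot the loss against the rate and pick a value where the loss is still falling steadily.

```python
def lr_at(epoch):
    """Learning rate for a 0-based epoch index (same formula as the notebook's lambda)."""
    return 1e-8 * 10 ** (epoch / 20)


schedule = [lr_at(epoch) for epoch in range(100)]

for epoch in (0, 1, 2, 20, 99):
    print(f"epoch index {epoch:2d}: {schedule[epoch]:.4e}")
```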

In [13]:
lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10**(epoch / 20))

history = model.fit(series_train_windowed, epochs=100, callbacks=[lr_schedule])

Epoch 1/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - loss: 136782.8594 - mae: 136783.3594 - learning_rate: 1.0000e-08
Epoch 2/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 159255.5312 - mae: 159256.0312 - learning_rate: 1.1220e-08
Epoch 3/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 144413.4531 - mae: 144413.9531 - learning_rate: 1.2589e-08
Epoch 4/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 128240.8516 - mae: 128241.3516 - learning_rate: 1.4125e-08
Epoch 5/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 129213.1172 - mae: 129213.6172 - learning_rate: 1.5849e-08
Epoch 6/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 149128.1094 - mae: 149128.6094 - learning_rate: 1.7783e-08
Epoch 7/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - l