# **Banana Ripeness Classification**

This notebook demonstrates how to classify banana ripeness into three levels (`mentah` = unripe, `matang` = ripe, `busuk` = rotten) from RGB color data. The workflow covers data loading, *feature engineering*, model training, and evaluation with CatBoost.

## Installation, Imports, and Data Loading

In [9]:
!pip install -q catboost pandas scikit-learn matplotlib seaborn

import pandas as pd
import numpy as np
import colorsys
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
file_name = "Untitled spreadsheet - Sheet1.csv"
df = pd.read_csv(file_name)

print("Data loaded successfully:")
print(df.head())

Data loaded successfully:
                     id  label    r    g   b                      timestamp  \
0  NoSJvUrfcoDWISxk9cyK      1  111  187  97  2025-11-08T17:18:05.504+07:00   
1  3hHcToxTZTlfvhyj5XN6      2  116  182  97  2025-11-08T17:38:04.997+07:00   
2  CSUfD9zgyhVh9S0vgENw      3  111  188  97  2025-11-08T17:42:14.559+07:00   
3  4YBW5cHhpdmHZ90c4MLV      4  109  186  99  2025-11-08T17:45:28.646+07:00   
4  Kf4VEdxOmxcLVTLRgRuU      5  111  189  96  2025-11-08T17:55:51.915+07:00   

  ripeness  
0   mentah  
1   mentah  
2   mentah  
3   mentah  
4   mentah  


## Feature Engineering

In [10]:
EPS = 1e-6

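def add_features(d):
    """Return a copy of d with brightness, chromaticity, ratio, and HSV features."""
    d = d.copy()
    for c in ['r','g','b']:
        d[c] = d[c].astype(float)

    # Channel sum; EPS guards against division by zero for pure-black readings
    s = d[['r','g','b']].sum(axis=1) + EPS
    d['brightness'] = s / 3.0
    # Chromaticity: each channel's share of the total intensity
    d['r_chroma'] = d['r'] / s
    d['g_chroma'] = d['g'] / s
    d['b_chroma'] = d['b'] / s
    # Pairwise channel ratios
    d['rg_ratio'] = d['r'] / (d['g'] + EPS)
    d['gb_ratio'] = d['g'] / (d['b'] + EPS)
    d['br_ratio'] = d['b'] / (d['r'] + EPS)

    # colorsys expects channels scaled to [0, 1] and returns (h, s, v) in [0, 1]
    hsv_list = d[['r','g','b']].apply(
        lambda row: colorsys.rgb_to_hsv(row['r']/255, row['g']/255, row['b']/255), axis=1
    )
    d['hue'] = [h for h, _, _ in hsv_list]
    d['saturation'] = [sat for _, sat, _ in hsv_list]
    d['value'] = [v for _, _, v in hsv_list]
    return d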

df_fe = add_features(df)

feature_cols = ['r','g','b','brightness','r_chroma','g_chroma','b_chroma',
                'rg_ratio','gb_ratio','br_ratio',
                'hue', 'saturation', 'value']
X = df_fe[feature_cols]
y = df_fe['ripeness']

print(f"Features (X) ready with {len(feature_cols)} columns.")
print("Target (y) ready: 'ripeness'")

Features (X) ready with 13 columns.
Target (y) ready: 'ripeness'
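
To make the feature definitions concrete, the sketch below recomputes the same 13 features for a single RGB reading using only the standard library. It uses the first row of the dataset shown above (r=111, g=187, b=97, labelled `mentah`). The helper name `features_for_pixel` is hypothetical and not part of the notebook. It mirrors `add_features` but works on plain numbers.

```python
import colorsys

EPS = 1e-6  # same small constant used in add_features


def features_for_pixel(r, g, b):
    """Compute the 13 features for one RGB reading (channel values 0-255).

    Hypothetical helper for illustration; mirrors add_features for a single row.
    """
    r, g, b = float(r), float(g), float(b)
    s = r + g + b + EPS
    h, sat, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return {
        "r": r, "g": g, "b": b,
        "brightness": s / 3.0,
        "r_chroma": r / s, "g_chroma": g / s, "b_chroma": b / s,
        "rg_ratio": r / (g + EPS),
        "gb_ratio": g / (b + EPS),
        "br_ratio": b / (r + EPS),
        "hue": h, "saturation": sat, "value": v,
    }


# First row of the dataset: r=111, g=187, b=97
feats = features_for_pixel(111, 187, 97)
for name, val in feats.items():
    print(f"{name:>10}: {val:.4f}")
```

For this reading the green channel has the largest chromaticity and the hue is about 0.31 on `colorsys`'s 0 to 1 scale, which falls in the green range, consistent with an unripe banana.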


## Train-Test Split

In [11]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)
print("Data split (80% train, 20% test). Distribution of y_train:")
print(y_train.value_counts(normalize=True))

Data split (80% train, 20% test). Distribution of y_train:
ripeness
matang    0.401596
busuk     0.313830
mentah    0.284574
Name: proportion, dtype: float64
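
The `stratify=y` argument keeps the class proportions roughly equal in the train and test sets. The sketch below illustrates that idea in plain Python. It is a simplified stand-in, not scikit-learn's implementation, and both the `stratified_split` helper and the label counts are hypothetical values chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class separately.

    Simplified illustration of stratification; not scikit-learn's algorithm.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class share of the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def proportions(labels, idx):
    """Class proportions among the rows selected by idx."""
    counts = Counter(labels[i] for i in idx)
    return {k: counts[k] / len(idx) for k in sorted(counts)}


# Hypothetical label counts, chosen only for illustration
labels = ["matang"] * 40 + ["busuk"] * 30 + ["mentah"] * 30
train_idx, test_idx = stratified_split(labels)

print("train:", proportions(labels, train_idx))
print("test: ", proportions(labels, test_idx))
```

Because each class is split separately, both subsets end up with the same class mix as the full dataset, which matters when classes are imbalanced as they are here.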


## Model Training

In [12]:
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    loss_function='MultiClass',
    eval_metric='TotalF1',
    iterations=1000,
    depth=6,
    learning_rate=0.1,
    l2_leaf_reg=3.0,
    random_seed=42,
    verbose=100,
    early_stopping_rounds=50,
    auto_class_weights='Balanced'
)

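# Note: the test set doubles as the early-stopping eval set here, so the
# reported test score is not an independent estimate of generalization.
model.fit(X_train, y_train, eval_set=(X_test, y_test))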


0:	learn: 0.8186149	test: 0.8959058	best: 0.8959058 (0)	total: 11.2ms	remaining: 11.2s
Stopped by overfitting detector  (50 iterations wait)

bestTest = 0.8959057739
bestIteration = 0

Shrink model to first 1 iterations.


<catboost.core.CatBoostClassifier at 0x7f3784addeb0>
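
The log above reports `bestIteration = 0` and "Shrink model to first 1 iterations": the `TotalF1` score on the eval set never exceeded its value at the first iteration, so training stopped after waiting 50 iterations and the model kept a single tree. The sketch below illustrates a patience-based stopping rule of this kind. It is a simplified model of the behavior, not CatBoost's internal implementation, and the `early_stop` helper and metric curves are hypothetical.

```python
def early_stop(scores, patience=50):
    """Return (best_iteration, stopped_at) for a higher-is-better metric.

    Simplified patience rule for illustration; not CatBoost's actual detector.
    """
    best_score, best_iter = float("-inf"), -1
    for i, score in enumerate(scores):
        if score > best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            # No improvement for `patience` iterations: stop and keep the best
            return best_iter, i
    return best_iter, len(scores) - 1


# Hypothetical curve where the first iteration is never beaten
flat = [0.8959] + [0.89] * 99
print(early_stop(flat))       # → (0, 50)

# Hypothetical curve that keeps improving, so no early stop is triggered
improving = [0.80 + 0.001 * i for i in range(100)]
print(early_stop(improving))  # → (99, 99)
```

A result like the first case suggests that the metric plateaus immediately on this eval set. Possible follow-ups include trying a smaller `learning_rate`, or monitoring a smoother metric such as the `MultiClass` loss.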

## Saving the Model

In [13]:
try:
    model.save_model("catboost_banana_model.cbm")
    print("CatBoost model saved to 'catboost_banana_model.cbm'")
except Exception as e:
    print(f"Failed to save model: {e}")

CatBoost model saved to 'catboost_banana_model.cbm'
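
A saved `.cbm` file can later be restored with `CatBoostClassifier.load_model`. The following is a minimal sketch, assuming the file name used above. The `load_model_if_available` wrapper is hypothetical and simply returns `None` when the file is missing instead of raising.

```python
import os


def load_model_if_available(path="catboost_banana_model.cbm"):
    """Load a saved CatBoost model, or return None if the file does not exist.

    Hypothetical convenience wrapper; catboost is imported lazily so the
    function can be defined even where the library is not installed.
    """
    if not os.path.exists(path):
        return None
    from catboost import CatBoostClassifier

    model = CatBoostClassifier()
    model.load_model(path)  # the default format is 'cbm'
    return model


model_reloaded = load_model_if_available()
if model_reloaded is None:
    print("No saved model found; run the training and saving cells first.")
else:
    print("Model reloaded; trees kept:", model_reloaded.tree_count_)
```

When predicting with the reloaded model, pass the features in the same column order as `feature_cols`.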
