# **1. Import Libraries**

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report

# **2. Loading the Dataset from the Clustering Results**

Load the clustering-result dataset from a CSV file into a DataFrame variable.

In [6]:
# Load the clustering-result dataset
df = pd.read_csv('fish/fish.csv')  # Make sure this file exists
print(df.head())

              species  length  weight  w_l_ratio
0  Anabas testudineus   10.66    3.45       0.32
1  Anabas testudineus    6.91    3.27       0.47
2  Anabas testudineus    8.38    3.46       0.41
3  Anabas testudineus    7.57    3.36       0.44
4  Anabas testudineus   10.83    3.38       0.31


# **3. Data Splitting**

The data splitting stage separates the dataset into two parts: a training set and a test set.
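No splitting cell is recorded in this section, even though the later cells rely on `X_train`, `X_test`, `y_train`, and `y_test`. The following is a minimal sketch of how such a split might look, not the notebook's actual cell. It assumes `species` is the target column and uses an 80/20 stratified split; both are assumptions. The frame `df_demo` is a stand-in built from the five previewed rows plus made-up "Species B" rows, so the block runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data: the first five rows come from the preview above; the
# "Species B" rows are made up so the example has two classes to split.
df_demo = pd.DataFrame({
    "species": ["Anabas testudineus"] * 5 + ["Species B"] * 5,
    "length": [10.66, 6.91, 8.38, 7.57, 10.83, 20.1, 21.4, 19.8, 22.0, 20.7],
    "weight": [3.45, 3.27, 3.46, 3.36, 3.38, 5.10, 5.32, 4.98, 5.41, 5.20],
    "w_l_ratio": [0.32, 0.47, 0.41, 0.44, 0.31, 0.25, 0.25, 0.25, 0.25, 0.25],
})

# Assumption: "species" is the label. Swap in the cluster-label column if
# the classification target is the clustering result instead.
target_col = "species"
X = df_demo.drop(columns=[target_col])
y = df_demo[target_col]

# Hold out 20% for testing; stratify keeps class proportions equal
# in both parts, and random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
```

In the notebook itself, `df_demo` would be replaced by the `df` loaded in section 2, and the cell would need to run before any model is fitted.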

# **4. Building the Classification Model**


## **a. Building the Classification Model**

In [7]:
# Initialize the models
logreg = LogisticRegression(max_iter=1000, random_state=42)
rf = RandomForestClassifier(random_state=42)

# Train the models (X_train and y_train must already exist from the data splitting step)
logreg.fit(X_train, y_train)
rf.fit(X_train, y_train)

NameError: name 'X_train' is not defined

Write a narrative or explanation of the algorithms you used.

## **b. Evaluating the Classification Model**

In [None]:
# Helper function for evaluating a fitted model
def evaluate_model(model, X_train, y_train, X_test, y_test):
    # Predict on both the training and test sets
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)

    # Compute the metrics (weighted F1 accounts for class imbalance)
    accuracy_train = accuracy_score(y_train, y_pred_train)
    accuracy_test = accuracy_score(y_test, y_pred_test)
    f1_train = f1_score(y_train, y_pred_train, average='weighted')
    f1_test = f1_score(y_test, y_pred_test, average='weighted')

    # Confusion matrix on the test set (rows = true class, columns = predicted class)
    cm = confusion_matrix(y_test, y_pred_test)

    # Print the results (labels kept as in the recorded output; "Akurasi" = accuracy)
    print(f"Akurasi (Train): {accuracy_train:.2f}, Akurasi (Test): {accuracy_test:.2f}")
    print(f"F1-Score (Train): {f1_train:.2f}, F1-Score (Test): {f1_test:.2f}")
    print("Confusion Matrix:")
    print(cm)
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred_test))

# Evaluate Logistic Regression
print("Logistic Regression:")
evaluate_model(logreg, X_train, y_train, X_test, y_test)

# Evaluate Random Forest
print("\nRandom Forest:")
evaluate_model(rf, X_train, y_train, X_test, y_test)

Logistic Regression:
Akurasi (Train): 1.00, Akurasi (Test): 1.00
F1-Score (Train): 1.00, F1-Score (Test): 1.00
Confusion Matrix:
[[ 78   0   0]
 [  0 437   1]
 [  0   0 279]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        78
           1       1.00      1.00      1.00       438
           2       1.00      1.00      1.00       279

    accuracy                           1.00       795
   macro avg       1.00      1.00      1.00       795
weighted avg       1.00      1.00      1.00       795


Random Forest:
Akurasi (Train): 1.00, Akurasi (Test): 1.00
F1-Score (Train): 1.00, F1-Score (Test): 1.00
Confusion Matrix:
[[ 78   0   0]
 [  0 438   0]
 [  0   0 279]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        78
           1       1.00      1.00      1.00       438
           2       1.00      1.00      1.00       279

    accu

Write up the evaluation results of the algorithms used; if you used two algorithms, compare their results.

## **c. Tuning the Classification Model (Optional)**

In [None]:
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# Exhaustive search over the grid with 5-fold cross-validation
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Best model (refit on the full training set with the best parameters)
best_rf = grid_search.best_estimator_
print(f"Best Parameters: {grid_search.best_params_}")

Best Parameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
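The best parameters alone do not show how close the other candidates came. A fitted `GridSearchCV` also exposes `best_score_` and `cv_results_`, which can be tabulated to compare candidates. The sketch below runs on synthetic data from `make_classification` with a deliberately small grid so it finishes quickly; the dataset and the grid values are illustrative assumptions, not the notebook's.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data standing in for the real training set (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)

# Small illustrative grid: 2 x 2 = 4 candidates, 3 folds each.
demo_grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [20, 50], "max_depth": [None, 5]},
    cv=3,
    scoring="accuracy",
)
demo_grid.fit(X_demo, y_demo)

# One row per candidate, best-ranked first.
results = (
    pd.DataFrame(demo_grid.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
)
print(results.to_string(index=False))
print(f"Best CV accuracy: {demo_grid.best_score_:.3f}")
```

When several candidates share nearly the same mean score, the reported best parameters say little about which setting actually matters.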


## **d. Evaluating the Classification Model after Tuning (Optional)**

In [None]:
print("Random Forest (After Tuning):")
evaluate_model(best_rf, X_train, y_train, X_test, y_test)

Random Forest (After Tuning):
Akurasi (Train): 1.00, Akurasi (Test): 1.00
F1-Score (Train): 1.00, F1-Score (Test): 1.00
Confusion Matrix:
[[ 78   0   0]
 [  0 438   0]
 [  0   0 279]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        78
           1       1.00      1.00      1.00       438
           2       1.00      1.00      1.00       279

    accuracy                           1.00       795
   macro avg       1.00      1.00      1.00       795
weighted avg       1.00      1.00      1.00       795



## **e. Analysis of the Classification Model Evaluation Results**

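Logistic Regression: Accuracy and F1-score both round to 1.00 on the training and test sets. The confusion matrix shows one misclassified test sample out of 795 (a class 1 sample predicted as class 2), so test accuracy is about 99.87% rather than exactly 100%.

Random Forest: All 795 test samples are classified correctly (accuracy and F1-score of 100%), which puts it ahead of Logistic Regression by a single sample.

After tuning: Random Forest performance is unchanged (100% accuracy). The best parameters found by the grid search (max_depth=None, min_samples_split=2, n_estimators=100) match the defaults in current scikit-learn releases, so the tuned model is configured the same way as the untuned one.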
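Because the printed metrics are rounded to two decimals, the confusion matrix is the more precise source. As a quick check, test accuracy can be recomputed from the Logistic Regression matrix reported in section 4b: correct predictions sit on the diagonal, and the sum of all entries is the test-set size.

```python
import numpy as np

# Confusion matrix copied from the Logistic Regression output above
# (rows = true class, columns = predicted class).
cm_logreg = np.array([
    [78,   0,   0],
    [ 0, 437,   1],
    [ 0,   0, 279],
])

correct = np.trace(cm_logreg)   # diagonal = correctly classified samples
total = cm_logreg.sum()         # every test sample appears exactly once
accuracy = correct / total

print(f"{correct}/{total} correct, accuracy = {accuracy:.4f}")
# → 794/795 correct, accuracy = 0.9987
```

The same calculation on the Random Forest matrix gives 795/795, so the difference between the two models is exactly one test sample.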

Recommendations:

    If the results are not yet satisfactory, try other algorithms such as Gradient Boosting or SVM.

    Collect more data to improve the model's generalization.
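One way to act on the first recommendation is to score the alternative algorithms with cross-validation before committing to one. The sketch below compares `GradientBoostingClassifier` with an RBF-kernel `SVC` (placed in a pipeline with `StandardScaler`, since SVMs are sensitive to feature scale). It uses synthetic data as a stand-in for the fish dataset, and the hyperparameters shown are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the real features and labels (assumption).
X_demo, y_demo = make_classification(
    n_samples=150, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)

candidates = {
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
    # Scaling happens inside the pipeline, so each CV fold is scaled
    # using only its own training portion.
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
}

cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {scores.mean():.3f} (std = {scores.std():.3f})")
```

Averaging over five folds gives a steadier estimate than a single train/test split, which helps when the models score very close to each other.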