<a href="https://colab.research.google.com/github/Jammyeong/MachineLearningClass/blob/main/3rdWeek/Tugas_ML_3_Infrared.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
# Load dataset
df = pd.read_csv('/content/drive/MyDrive/smt akhir/ml/Infrared.csv')

In [3]:
# Encode categorical columns as integer labels
df_encoded = df.copy()
le = LabelEncoder()
for col in ['Gender', 'Age', 'Ethnicity']:
    df_encoded[col] = le.fit_transform(df_encoded[col])

# Impute missing values in the Distance column with the column median
df_encoded['Distance'] = df_encoded['Distance'].fillna(df_encoded['Distance'].median())

In [4]:
# Split features and target
X = df_encoded.drop('aveOralM', axis=1)
y = df_encoded['aveOralM']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# K-NN Model
knn = KNeighborsRegressor()
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

# Decision Tree Model
tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_train, y_train)
y_pred_tree = tree.predict(X_test)

In [6]:
# K-NN
mse_knn = mean_squared_error(y_test, y_pred_knn)
rmse_knn = np.sqrt(mse_knn)
r2_knn = r2_score(y_test, y_pred_knn)

# Decision Tree
mse_tree = mean_squared_error(y_test, y_pred_tree)
rmse_tree = np.sqrt(mse_tree)
r2_tree = r2_score(y_test, y_pred_tree)

# Collect the results in a table
results = pd.DataFrame({
    'Model': ['K-NN', 'Decision Tree'],
    'MSE': [mse_knn, mse_tree],
    'RMSE': [rmse_knn, rmse_tree],
    'R² Score': [r2_knn, r2_tree]
})

In [7]:
# Output K-NN
print("K-Nearest Neighbors (K-NN) Results:")
print(f"MSE : {mse_knn:.4f}")
print(f"RMSE: {rmse_knn:.4f}")
print(f"R²  : {r2_knn:.4f}\n")

# Output Decision Tree
print("Decision Tree Results:")
print(f"MSE : {mse_tree:.4f}")
print(f"RMSE: {rmse_tree:.4f}")
print(f"R²  : {r2_tree:.4f}")

K-Nearest Neighbors (K-NN) Results:
MSE : 0.0955
RMSE: 0.3091
R²  : 0.5464

Decision Tree Results:
MSE : 0.1136
RMSE: 0.3371
R²  : 0.4604


### 📐 Mean Squared Error (MSE)

MSE (Mean Squared Error) is the average of the squared differences between the predicted values and the actual values.

Formula:

    MSE = (1 / n) * Σ (yᵢ - ŷᵢ)²

Where:
- yᵢ : the i-th actual value
- ŷᵢ : the i-th predicted value
- n  : the number of samples

The smaller the MSE, the better the model, because the predictions lie closer to the actual values. Because the errors are squared, MSE is expressed in squared units of the target and penalizes large errors more heavily than small ones.
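To make the formula concrete, here is a minimal sketch that computes MSE by hand in plain Python. The values in `y_true` and `y_pred` are hypothetical toy numbers, not rows from the Infrared dataset, and the helper name `mse` is chosen only for this illustration.

```python
# Hypothetical toy values, not taken from the Infrared dataset
y_true = [36.5, 37.0, 36.8, 37.2]
y_pred = [36.7, 36.9, 36.8, 37.6]


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


mse_value = mse(y_true, y_pred)
print(f"MSE: {mse_value:.4f}")
```

The squared errors here are 0.04, 0.01, 0 and 0.16, so their mean is 0.0525. On the same inputs, `mean_squared_error` from scikit-learn should return the same value.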


### 📐 Root Mean Squared Error (RMSE)

RMSE is the square root of the Mean Squared Error. It shows how far the model's predictions are from the actual values, expressed in the same units as the original target.

Formula:

    RMSE = √( (1 / n) * Σ (yᵢ - ŷᵢ)² )

Where:
- yᵢ : the i-th actual value
- ŷᵢ : the i-th predicted value
- n  : the number of samples

The smaller the RMSE, the smaller the prediction error and the better the model.
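The relationship between RMSE and MSE can be sketched in a few lines of plain Python. As before, the inputs are hypothetical toy values and the helper name `rmse` is an assumption made for this example.

```python
import math

# Hypothetical toy values, not taken from the Infrared dataset
y_true = [36.5, 37.0, 36.8, 37.2]
y_pred = [36.7, 36.9, 36.8, 37.6]


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    n = len(actual)
    mean_squared = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mean_squared)


rmse_value = rmse(y_true, y_pred)
print(f"RMSE: {rmse_value:.4f}")
```

This is the same step the notebook performs with `np.sqrt(mse_knn)` and `np.sqrt(mse_tree)`: taking the square root brings the error back to the units of the target, which makes it easier to interpret than MSE.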


### 📐 R² Score (Coefficient of Determination)

The R² score measures how well the model explains the variability of the target data. Its value is at most 1 and usually falls between 0 and 1, but it can be negative when the model predicts worse than always returning the mean of the actual values.

Formula:

    R² = 1 - ( Σ (yᵢ - ŷᵢ)² ) / ( Σ (yᵢ - ȳ)² )

Where:
- yᵢ : the i-th actual value
- ŷᵢ : the i-th predicted value
- ȳ  : the mean of the actual values
- The closer the score is to 1, the better the model explains the variance in the data.
- R² = 1 means the predictions are perfect.
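The following sketch applies the R² formula to three hypothetical sets of predictions to show how the score behaves at its reference points. The data and the helper name `r2` are assumptions made for illustration only. Note that R² is not bounded below by 0: predictions worse than the mean give a negative score.

```python
def r2(actual, predicted):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical toy values chosen so the arithmetic is easy to follow
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = [1.0, 2.0, 3.0, 4.0]        # identical to the actual values
mean_only = [2.5, 2.5, 2.5, 2.5]      # always predicts the mean of y_true
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

r2_perfect = r2(y_true, perfect)
r2_mean = r2(y_true, mean_only)
r2_reversed = r2(y_true, reversed_pred)

print(f"perfect  : {r2_perfect:.1f}")
print(f"mean only: {r2_mean:.1f}")
print(f"reversed : {r2_reversed:.1f}")
```

Perfect predictions score 1, the mean-only baseline scores 0, and the reversed predictions score -3 because their squared error (20) is four times the total variation around the mean (5). Read this way, the scores reported above (0.5464 for K-NN and 0.4604 for the Decision Tree) mean each model removes roughly half of the squared error of the mean-only baseline on the test set.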
