# **Naive Bayes**  
Naïve Bayes is a classification algorithm based on Bayes' theorem. It assumes that every feature in the dataset is independent of every other feature once the class is known.

## **1. Basic Theory of Naïve Bayes**
Naïve Bayes works by computing the probability of each class given the observed data, using Bayes' theorem:

$$
P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}
$$

Where:
- $P(A | B)$ = probability that hypothesis $A$ is true given evidence $B$ (posterior).
- $P(B | A)$ = probability of observing evidence $B$ if hypothesis $A$ is true (likelihood).
- $P(A)$ = initial probability of hypothesis $A$ (prior).
- $P(B)$ = probability of observing evidence $B$ (evidence).
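As a worked illustration of the formula, the sketch below plugs in hypothetical numbers: a test that detects a condition $A$ with $P(B | A) = 0.9$, a false-positive rate $P(B | \neg A) = 0.05$, and a prior $P(A) = 0.01$. Because $P(B)$ is not given directly, it is expanded with the law of total probability. The function name `bayes_posterior` is only for illustration.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) using Bayes' theorem."""
    # Law of total probability: P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
    return p_b_given_a * p_a / p_b


# Hypothetical values chosen only for illustration
posterior = bayes_posterior(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.05)
print(f"P(A | B) = {posterior:.3f}")  # prints P(A | B) = 0.154
```

Even with a 90% detection rate, the small prior keeps the posterior around 15%, which shows how strongly $P(A)$ influences the result.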

## **2. The "Naive" Principle (Feature Independence)**
Naive Bayes assumes that the features in the dataset are **independent of one another given the class**. Bayes' theorem for a class $A$ and features $X_1, X_2, ..., X_n$ reads:

$$
P(A | X_1, X_2, ..., X_n) = \frac{P(X_1, X_2, ..., X_n | A) \cdot P(A)}{P(X_1, X_2, ..., X_n)}
$$

Because the features are assumed **independent** given $A$, the likelihood factorizes:

$$
P(X_1, X_2, ..., X_n | A) = P(X_1 | A) \cdot P(X_2 | A) \cdot ... \cdot P(X_n | A)
$$

Therefore:

$$
P(A | X_1, X_2, ..., X_n) = \frac{P(A) \cdot \prod_{i=1}^{n} P(X_i | A)}{P(X_1, X_2, ..., X_n)}
$$
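Because the denominator $P(X_1, X_2, ..., X_n)$ is the same for every class, the class with the largest numerator $P(A) \cdot \prod_i P(X_i | A)$ also has the largest posterior; dividing by the sum of the numerators only rescales them into probabilities. The sketch below shows this for two hypothetical classes whose priors and per-feature likelihoods are made up for illustration.

```python
import math

# Hypothetical priors P(A) and per-feature likelihoods P(X_i | A) for one observation
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.3], "ham": [0.1, 0.5]}


def naive_bayes_posteriors(priors, likelihoods):
    # Numerator of the formula: P(A) * prod_i P(X_i | A)
    numerators = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    # Denominator: the sum of the numerators over all classes (identical for every class)
    evidence = sum(numerators.values())
    return {c: num / evidence for c, num in numerators.items()}


posteriors = naive_bayes_posteriors(priors, likelihoods)
predicted = max(posteriors, key=posteriors.get)  # class with the highest posterior
print(posteriors, predicted)
```

With many features the product of small probabilities can underflow, so implementations usually sum log-probabilities instead; the class with the maximum is unchanged.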

## **3. Types of Naive Bayes**
### **a) Gaussian Naive Bayes** (For Numeric Data)
If a feature $X$ follows a **normal (Gaussian) distribution** within each class, its likelihood is computed as:

$$
P(x | C) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}
$$

Where:
- $\mu$ = mean of the feature within class $C$
- $\sigma$ = standard deviation of the feature within class $C$
- $x$ = the feature value
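The formula translates directly into code. In this sketch the per-class means and standard deviations are hypothetical values for a single feature; in a real model they are estimated from the training data of each class.

```python
import math


def gaussian_likelihood(x: float, mu: float, sigma: float) -> float:
    """P(x | C) for a feature assumed normally distributed within class C."""
    coeff = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coeff * math.exp(exponent)


# Hypothetical (mu, sigma) of one feature in two classes
class_stats = {"class_0": (1.5, 0.2), "class_1": (4.3, 0.5)}
x = 1.7
for label, (mu, sigma) in class_stats.items():
    print(label, gaussian_likelihood(x, mu, sigma))
```

The result is a probability density, not a probability, so it can exceed 1 when $\sigma$ is small (here about 1.21 for `class_0`); only its size relative to the other classes matters.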

### **b) Multinomial Naive Bayes** (For Count Data)
Commonly used for **text classification** based on word frequencies:

$$
P(X | C) = \frac{(N_{c, X} + \alpha)}{(N_c + \alpha \cdot d)}
$$

Where:
- $N_{c, X}$ = number of occurrences of word $X$ in class $C$
- $N_c$ = total number of words in class $C$
- $d$ = number of unique words (vocabulary size) across all classes
- $\alpha$ = smoothing parameter ($\alpha = 1$ gives Laplace smoothing)
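A sketch of the smoothed estimate on a tiny hypothetical corpus (the documents and the helper name `word_likelihood` are made up for illustration). Setting `alpha=0` shows the zero-probability problem: a word never seen in a class gets probability 0.

```python
def word_likelihood(word, class_docs, vocabulary, alpha=1.0):
    """Smoothed P(word | C) = (N_{c,X} + alpha) / (N_c + alpha * d)."""
    tokens = [t for doc in class_docs for t in doc.split()]
    n_cx = tokens.count(word)   # N_{c,X}: occurrences of the word in class C
    n_c = len(tokens)           # N_c: total words in class C
    d = len(vocabulary)         # d: unique words across all classes
    return (n_cx + alpha) / (n_c + alpha * d)


# Hypothetical training documents for two classes
spam_docs = ["win money now", "win a prize"]
ham_docs = ["meeting at noon", "see you at the meeting"]
vocabulary = {t for doc in spam_docs + ham_docs for t in doc.split()}

print(word_likelihood("win", spam_docs, vocabulary))               # (2 + 1) / (6 + 11) = 3/17
print(word_likelihood("meeting", spam_docs, vocabulary))           # (0 + 1) / (6 + 11) = 1/17
print(word_likelihood("meeting", spam_docs, vocabulary, alpha=0))  # 0.0 without smoothing
```

In scikit-learn the corresponding estimator is `MultinomialNB`, whose `alpha` parameter plays the role of $\alpha$.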

### **c) Bernoulli Naive Bayes** (For Binary Data)
Used when each feature has only two possible values (present/absent):

$$
P(X | C) = \prod_{i=1}^{n} P(X_i | C)^{x_i} \cdot (1 - P(X_i | C))^{(1 - x_i)}
$$

Where $x_i \in \{0, 1\}$ indicates whether feature $i$ is present ($x_i = 1$) or absent ($x_i = 0$).
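A direct sketch of the Bernoulli formula; the per-feature probabilities $P(X_i | C)$ below are hypothetical.

```python
def bernoulli_likelihood(x, p):
    """P(X | C) = prod_i p_i^{x_i} * (1 - p_i)^{1 - x_i}, with x_i in {0, 1}."""
    likelihood = 1.0
    for x_i, p_i in zip(x, p):
        # A present feature contributes p_i, an absent feature contributes (1 - p_i)
        likelihood *= p_i if x_i == 1 else 1.0 - p_i
    return likelihood


p_given_c = [0.8, 0.1, 0.6]  # hypothetical P(X_i = 1 | C) for three features
x = [1, 0, 1]                # features 1 and 3 present, feature 2 absent
print(bernoulli_likelihood(x, p_given_c))  # 0.8 * 0.9 * 0.6, approximately 0.432
```

Unlike the multinomial model, an absent feature still contributes the factor $1 - P(X_i | C)$. scikit-learn provides this model as `BernoulliNB`.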

## **4. Advantages and Disadvantages:**
### **Advantages:**
1. Fast and efficient → Handles large datasets with short computation time.
2. Easy to implement → The algorithm is simple and needs relatively little training data.
3. Good performance on text classification → Widely used for sentiment analysis and spam detection.

### **Disadvantages:**
1. Feature-independence assumption → Unrealistic in many cases, because features are often correlated.
2. Zero probabilities → If a feature never appears with a given class in the training data, its estimated probability is 0, which zeroes out the whole product; a smoothing technique such as Laplace smoothing must be used.
3. Less flexible than more complex models → Models such as a Decision Tree or a Neural Network can capture relationships between features better.

## **5. Example Applications of Naive Bayes:**
1. Email spam detection → Classifying emails as spam or not spam.
2. Sentiment analysis → Classifying reviews as positive or negative.
3. Disease classification → Estimating the likely disease from symptoms.
4. Movie recommendation systems → Inferring user preferences from viewing history.

# **Implementing Naive Bayes for Data Classification**
## Implementation on data that still contains outliers
**Import libraries**

In [1]:
```python
%pip install numpy
```

Note: you may need to restart the kernel to use updated packages.


In [2]:
```python
%pip install pandas
```

Note: you may need to restart the kernel to use updated packages.


In [3]:
```python
# Data manipulation
import numpy as np
import pandas as pd

from sklearn.preprocessing import LabelEncoder  # converts categorical labels to integers
from sklearn.model_selection import train_test_split  # splits data into training and test sets
from sklearn.preprocessing import StandardScaler  # standardizes features
from sklearn.naive_bayes import GaussianNB  # Gaussian Naive Bayes classifier

# Model evaluation
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
```

**Load Dataset**

In [4]:
```python
dataset = pd.read_csv('hasil_gabungan.csv')  # Read the dataset from a CSV file
dataset.head()  # Show the first 5 rows to inspect the dataset's structure
```

| | id | class | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|---|---|
| 0 | 1 | Iris-setosa | 86.4 | 70.5 | 20.1 | 30.5 |
| 1 | 2 | Iris-setosa | 1.4 | 0.2 | 4.9 | 3.0 |
| 2 | 3 | Iris-setosa | 1.3 | 0.2 | 4.7 | 3.2 |
| 3 | 4 | Iris-setosa | 1.5 | 0.2 | 4.6 | 3.1 |
| 4 | 5 | Iris-setosa | 1.4 | 0.2 | 5.0 | 3.6 |


**Label encoding**

In [5]:
```python
# LabelEncoder converts categorical values into integers.
en = LabelEncoder()

# fit_transform() learns the mapping and encodes the 'class' column so the model can use it.
dataset['class'] = en.fit_transform(dataset['class'])
dataset.head()
```

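| | id | class | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 86.4 | 70.5 | 20.1 | 30.5 |
| 1 | 2 | 0 | 1.4 | 0.2 | 4.9 | 3.0 |
| 2 | 3 | 0 | 1.3 | 0.2 | 4.7 | 3.2 |
| 3 | 4 | 0 | 1.5 | 0.2 | 4.6 | 3.1 |
| 4 | 5 | 0 | 1.4 | 0.2 | 5.0 | 3.6 |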
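`LabelEncoder` assigns integers to the class names in sorted order (its `classes_` attribute holds the sorted names), which is why `Iris-setosa` becomes 0. The sketch below mimics that mapping in plain Python. The names `Iris-versicolor` and `Iris-virginica` are an assumption here, since only `Iris-setosa` is visible in the preview; the helper `label_encode` is hypothetical.

```python
def label_encode(labels):
    """Mimic LabelEncoder: map each distinct label to its index in sorted order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in labels]


# Assumed class names (only "Iris-setosa" is visible in the dataset preview)
labels = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
classes, codes = label_encode(labels)
print(classes)  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(codes)    # [2, 0, 1, 0]
```

With the fitted encoder from the notebook, `en.inverse_transform(...)` maps integer codes such as predictions back to class names.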


**Separating features and labels**

In [6]:
```python
X = dataset.iloc[:, 2:].values  # All feature columns (everything after 'id' and 'class')
y = dataset.iloc[:, 1].values   # The 'class' column as the label
```

In [7]:
```python
X
```

```
array([[86.4, 70.5, 20.1, 30.5],
       [ 1.4,  0.2,  4.9,  3. ],
       [ 1.3,  0.2,  4.7,  3.2],
       [ 1.5,  0.2,  4.6,  3.1],
       [ 1.4,  0.2,  5. ,  3.6],
       [ 1.7,  0.4,  5.4,  3.9],
       [ 1.4,  0.3,  4.6,  3.4],
       [ 1.5,  0.2,  5. ,  3.4],
       [ 1.4,  0.2,  4.4,  2.9],
       [ 1.5,  0.1,  4.9,  3.1],
       [ 1.5,  0.2,  5.4,  3.7],
       [ 1.6,  0.2,  4.8,  3.4],
       [ 1.4,  0.1,  4.8,  3. ],
       [ 1.1,  0.1,  4.3,  3. ],
       [ 1.2,  0.2,  5.8,  4. ],
       [ 1.5,  0.4,  5.7,  4.4],
       [ 1.3,  0.4,  5.4,  3.9],
       [ 1.4,  0.3,  5.1,  3.5],
       [ 1.7,  0.3,  5.7,  3.8],
       [ 1.5,  0.3,  5.1,  3.8],
       [ 1.7,  0.2,  5.4,  3.4],
       [ 1.5,  0.4,  5.1,  3.7],
       [ 1. ,  0.2,  4.6,  3.6],
       [ 1.7,  0.5,  5.1,  3.3],
       [ 1.9,  0.2,  4.8,  3.4],
       [ 1.6,  0.2,  5. ,  3. ],
       [ 1.6,  0.4,  5. ,  3.4],
       [ 1.5,  0.2,  5.2,  3.5],
       [ 1.4,  0.2,  5.2,  3.4],
       [ 1.6,  0.2,  4.7,  3.2],
       [ 1
...
```

In [8]:
```python
y
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```

In [9]:
```python
# Split the data into training (80%) and test (20%) sets
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

print("x_train shape: ", x_train.shape)  # (training samples, features)
print("x_test shape: ", x_test.shape)    # (test samples, features)
print("y_train shape: ", y_train.shape)  # (training labels,)
print("y_test shape: ", y_test.shape)    # (test labels,)
```

```
x_train shape:  (120, 4)
x_test shape:  (30, 4)
y_train shape:  (120,)
y_test shape:  (30,)
```
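The shapes follow from how `train_test_split` handles a fractional `test_size`: scikit-learn rounds the number of test samples up and assigns the remaining samples to the training set. A sketch of that arithmetic, assuming this rounding rule; the helper `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Hypothetical helper: (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    n_train = n_samples - n_test               # the remaining samples go to training
    return n_train, n_test


print(split_sizes(150, 0.2))  # (120, 30), matching the shapes above
```

For the dataset with outliers removed later in this notebook, `split_sizes(135, 0.2)` gives `(108, 27)`, which matches the shapes reported there.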



**Standardizing the data**

In [13]:
```python
sc = StandardScaler()  # Create a StandardScaler object
x_train = sc.fit_transform(x_train)  # Learn mean/std from the training data and standardize it
x_test = sc.transform(x_test)  # Standardize the test data with the training mean/std
```
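`StandardScaler` computes $z = (x - \mu) / \sigma$ per feature, using the mean and the population standard deviation learned from the training data only. The plain-Python sketch below reproduces that for one column, using the `petal_length` values from the dataset preview (including the outlier 86.4), to show why a single outlier matters: it inflates $\sigma$, so the ordinary values are squeezed into almost identical z-scores. The helper names are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation (ddof=0) of a training column."""
    mu, sigma = mean(column), pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)  # a constant column is left unscaled


def transform(column, mu, sigma):
    """Standardize values with statistics learned from the training data."""
    return [(x - mu) / sigma for x in column]


with_outlier = [86.4, 1.4, 1.3, 1.5, 1.4]  # petal_length values from the preview
without_outlier = [1.4, 1.3, 1.5, 1.4]

for column in (with_outlier, without_outlier):
    mu, sigma = fit_scaler(column)
    print([round(z, 3) for z in transform(column, mu, sigma)])
```

This mirrors the arrays in this section: with the outlier row in the training set, most standardized values sit in a narrow band around zero while the outlier row reaches values such as 10.6. In scikit-learn, the learned statistics are stored in `sc.mean_` and `sc.scale_`.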

In [14]:
```python
x_train
```

```
array([[ 2.05248585e-01, -4.05331282e-02,  1.84045284e-01,
        -1.55822932e-01],
       [ 6.23540005e-02, -9.26937221e-02, -8.69416372e-02,
        -2.46740688e-01],
       [-3.79320170e-01, -2.62215652e-01, -3.38572350e-01,
        -9.52110945e-02],
       [ 1.06365533e+01,  8.83980798e+00,  2.64228378e+00,
         4.04154681e+00],
       [-6.75501673e-02, -1.57894464e-01, -8.69416372e-02,
        -2.46740688e-01],
       [-1.58483085e-01, -1.57894464e-01, -2.80503724e-01,
        -2.31587729e-01],
       [ 2.44219835e-01, -2.74929797e-02,  2.80826327e-01,
        -4.29333848e-03],
       [-3.66329753e-01, -2.36135355e-01, -2.03078889e-01,
         1.08596209e-02],
       [-4.18291420e-01, -2.36135355e-01, -2.03078889e-01,
         1.08596209e-02],
       [-8.05405840e-02, -1.31814167e-01, -1.25654054e-01,
        -1.70975891e-01],
       [-1.32502251e-01, -1.57894464e-01, -2.80503724e-01,
        -2.77046607e-01],
       [-4.31281837e-01, -2.62215652e-01, -2.80503724e-01,
...
```

In [15]:
```python
x_test
```

```
array([[ 0.04936358, -0.09269372, -0.02887301, -0.20128181],
       [ 0.12730608, -0.01445283,  0.06790803, -0.12551701],
       [ 0.1402965 , -0.00141268, -0.0095168 , -0.15582293],
       [-0.05455975, -0.11877402, -0.16436647, -0.12551701],
       [-0.39231059, -0.2752558 , -0.29985993, -0.11036405],
       [ 0.03637317, -0.05357328, -0.08694164, -0.12551701],
       [-0.0155885 , -0.11877402, -0.02887301, -0.23158773],
       [-0.41829142, -0.26221565, -0.39664098, -0.09521109],
       [-0.405301  , -0.26221565, -0.39664098, -0.14066997],
       [-0.0155885 , -0.13181417, -0.18372268, -0.18612885],
       [ 0.07534442,  0.01162747,  0.08726424, -0.11036405],
       [-0.405301  , -0.26221565, -0.18372268,  0.0563185 ],
       [-0.08054058, -0.10573387, -0.24179131, -0.17097589],
       [ 0.12730608, -0.05357328,  0.00983941, -0.12551701],
       [ 0.20524859,  0.01162747,  0.24211391, -0.12551701],
       [ 0.16627733, -0.00141268,  0.00983941, -0.12551701],
       [-0.41829142, -0.
...
```

In [16]:
```python
clasifier = GaussianNB()  # Create a Gaussian Naive Bayes classifier
clasifier.fit(x_train, y_train)  # Train the model on the training data
```
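`fit` only has to estimate, for every class, the prior $P(C)$ and the mean $\mu$ and variance $\sigma^2$ of each feature, which are exactly the quantities in the Gaussian formula from the theory section. The plain-Python sketch below shows that estimation on a tiny hypothetical dataset. Following scikit-learn's `var_smoothing` parameter (default `1e-9`), it adds a small fraction of the largest feature variance to every variance for numerical stability; the helper `fit_gaussian_nb` is hypothetical.

```python
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: per-class prior, feature means and feature variances."""
    n_features = len(X[0])
    # Stability term: var_smoothing times the largest feature variance over all training rows
    epsilon = var_smoothing * max(pvariance([row[j] for row in X]) for j in range(n_features))
    params = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)                                           # P(C)
        means = [mean([row[j] for row in rows]) for j in range(n_features)]  # mu per feature
        variances = [pvariance([row[j] for row in rows]) + epsilon           # sigma^2 per feature
                     for j in range(n_features)]
        params[c] = (prior, means, variances)
    return params


# Tiny hypothetical training set: two features, two classes
X_demo = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.4, 4.4]]
y_demo = [0, 0, 1, 1]
for label, (prior, means, variances) in fit_gaussian_nb(X_demo, y_demo).items():
    print(label, prior, means, variances)
```

In recent scikit-learn versions the fitted `GaussianNB` exposes these estimates as `clasifier.class_prior_`, `clasifier.theta_` (means) and `clasifier.var_` (variances).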

In [17]:
```python
y_pred = clasifier.predict(x_test)  # Predict labels for the test data
y_pred  # Show the predictions
```

```
array([1, 2, 2, 1, 0, 1, 1, 0, 0, 1, 2, 0, 1, 1, 2, 2, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 0, 0, 1, 2, 0])
```

In [18]:
```python
clasifier.predict_proba(x_test)  # Class probabilities for each test sample
```

```
array([[4.94795173e-05, 9.99806387e-01, 1.44133839e-04],
       [1.46102923e-01, 1.41033717e-01, 7.12863360e-01],
       [1.74002226e-01, 4.23104233e-02, 7.83687351e-01],
       [1.36640183e-05, 9.99980582e-01, 5.75400156e-06],
       [9.99999944e-01, 2.79600391e-11, 5.57106190e-08],
       [2.72734812e-04, 9.99085453e-01, 6.41812465e-04],
       [3.00352869e-05, 9.99936519e-01, 3.34457783e-05],
       [9.99999991e-01, 2.53732853e-12, 9.21457038e-09],
       [9.99999976e-01, 1.95046674e-11, 2.36795901e-08],
       [1.72607396e-05, 9.99965506e-01, 1.72334787e-05],
       [1.93456080e-01, 1.12264710e-02, 7.95317449e-01],
       [9.99999978e-01, 5.87987412e-14, 2.21457168e-08],
       [3.24490816e-05, 9.99960581e-01, 6.96991112e-06],
       [6.40242934e-03, 9.64538006e-01, 2.90595642e-02],
       [1.83732070e-01, 1.05566290e-06, 8.16266874e-01],
       [1.90875186e-01, 8.88230910e-03, 8.00242505e-01],
       [9.99999991e-01, 2.44408151e-11, 8.78389160e-09],
       [9.99999999e-01, 1.30081
...
```
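Each row of `predict_proba` is a posterior distribution over the three classes, so it sums to 1, and `predict` returns the class with the highest probability. The sketch checks this on the first three rows printed above, copied by hand, so the sums are only approximately 1 because of display rounding.

```python
# First three rows of predict_proba and the first three entries of y_pred, copied from above
proba_rows = [
    [4.94795173e-05, 9.99806387e-01, 1.44133839e-04],
    [1.46102923e-01, 1.41033717e-01, 7.12863360e-01],
    [1.74002226e-01, 4.23104233e-02, 7.83687351e-01],
]
predicted = [1, 2, 2]

# The predicted class is the index of the largest posterior probability in each row
argmaxes = [max(range(len(row)), key=row.__getitem__) for row in proba_rows]
for row, best, pred in zip(proba_rows, argmaxes, predicted):
    print(f"sum = {sum(row):.6f}, argmax = {best}, y_pred = {pred}")
```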

In [19]:
```python
cm = confusion_matrix(y_test, y_pred)  # Build the confusion matrix (rows: true, columns: predicted)
cm  # Show the confusion matrix
```

```
array([[13,  0,  0],
       [ 0,  6,  0],
       [ 0,  5,  6]], dtype=int64)
```

In [20]:
```python
# Show the evaluation report (precision, recall, F1 and accuracy)
report = classification_report(y_test, y_pred)
print(report)
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.55      1.00      0.71         6
           2       1.00      0.55      0.71        11

    accuracy                           0.83        30
   macro avg       0.85      0.85      0.80        30
weighted avg       0.91      0.83      0.83        30
```
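The numbers in the report follow from the confusion matrix above (rows are true classes, columns are predicted classes): precision divides the diagonal entry by its column sum, recall divides it by its row sum (the support), and F1 is their harmonic mean. A plain-Python sketch with a hypothetical helper name:

```python
def report_from_confusion(cm):
    """Hypothetical helper: accuracy and per-class (precision, recall, f1, support).

    cm[i][j] counts samples of true class i predicted as class j,
    the layout used by sklearn.metrics.confusion_matrix.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        predicted_as_c = sum(cm[r][c] for r in range(k))  # column sum
        support = sum(cm[c])                               # row sum
        precision = tp / predicted_as_c if predicted_as_c else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1, support))
    return accuracy, per_class


cm_with_outliers = [[13, 0, 0], [0, 6, 0], [0, 5, 6]]  # confusion matrix from above
accuracy, per_class = report_from_confusion(cm_with_outliers)
print(f"accuracy: {accuracy:.2f}")  # 25 / 30 -> 0.83
for c, (p, r, f1, support) in enumerate(per_class):
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
```

Running the same helper on the confusion matrix of the cleaned dataset, `[[13, 0, 0], [0, 5, 0], [0, 1, 8]]`, gives an accuracy of 26/27, about 0.96.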



In [21]:
```python
# Put the true and predicted labels side by side for comparison
ydata = pd.DataFrame({'y_test': y_test, 'y_pred': y_pred})
ydata  # Show the ydata DataFrame
```

| | y_test | y_pred |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 0 | 0 |
| 5 | 2 | 1 |
| 6 | 1 | 1 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
| 9 | 1 | 1 |


## Implementation on data with the outliers removed

**Load Dataset**

In [7]:
```python
dataset = pd.read_csv('cleaned_data.csv')  # Read the dataset from a CSV file
dataset.head()  # Show the first 5 rows to inspect the dataset's structure
```

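| | id | class | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|---|---|
| 0 | 2 | Iris-setosa | 1.4 | 0.2 | 4.9 | 3.0 |
| 1 | 3 | Iris-setosa | 1.3 | 0.2 | 4.7 | 3.2 |
| 2 | 4 | Iris-setosa | 1.5 | 0.2 | 4.6 | 3.1 |
| 3 | 5 | Iris-setosa | 1.4 | 0.2 | 5.0 | 3.6 |
| 4 | 6 | Iris-setosa | 1.7 | 0.4 | 5.4 | 3.9 |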


**Label Encoding**

In [4]:
```python
# LabelEncoder converts categorical values into integers.
en = LabelEncoder()

# fit_transform() learns the mapping and encodes the 'class' column so the model can use it.
dataset['class'] = en.fit_transform(dataset['class'])
dataset.head()
```

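| | id | class | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1.4 | 0.2 | 4.9 | 3.0 |
| 1 | 3 | 0 | 1.3 | 0.2 | 4.7 | 3.2 |
| 2 | 4 | 0 | 1.5 | 0.2 | 4.6 | 3.1 |
| 3 | 5 | 0 | 1.4 | 0.2 | 5.0 | 3.6 |
| 4 | 6 | 0 | 1.7 | 0.4 | 5.4 | 3.9 |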


**Separating Features and Labels**

In [5]:
```python
X = dataset.iloc[:, 2:].values  # All feature columns (everything after 'id' and 'class')
y = dataset.iloc[:, 1].values   # The 'class' column as the label
```

In [8]:
```python
X
```

```
array([[1.4, 0.2, 4.9, 3. ],
       [1.3, 0.2, 4.7, 3.2],
       [1.5, 0.2, 4.6, 3.1],
       [1.4, 0.2, 5. , 3.6],
       [1.7, 0.4, 5.4, 3.9],
       [1.4, 0.3, 4.6, 3.4],
       [1.5, 0.2, 5. , 3.4],
       [1.4, 0.2, 4.4, 2.9],
       [1.5, 0.1, 4.9, 3.1],
       [1.5, 0.2, 5.4, 3.7],
       [1.6, 0.2, 4.8, 3.4],
       [1.4, 0.1, 4.8, 3. ],
       [1.1, 0.1, 4.3, 3. ],
       [1.3, 0.4, 5.4, 3.9],
       [1.4, 0.3, 5.1, 3.5],
       [1.7, 0.3, 5.7, 3.8],
       [1.5, 0.3, 5.1, 3.8],
       [1.7, 0.2, 5.4, 3.4],
       [1.5, 0.4, 5.1, 3.7],
       [1. , 0.2, 4.6, 3.6],
       [1.7, 0.5, 5.1, 3.3],
       [1.9, 0.2, 4.8, 3.4],
       [1.6, 0.2, 5. , 3. ],
       [1.6, 0.4, 5. , 3.4],
       [1.5, 0.2, 5.2, 3.5],
       [1.4, 0.2, 5.2, 3.4],
       [1.6, 0.2, 4.7, 3.2],
       [1.6, 0.2, 4.8, 3.1],
       [1.5, 0.4, 5.4, 3.4],
       [1.5, 0.1, 5.2, 4.1],
       [1.4, 0.2, 5.5, 4.2],
       [1.5, 0.1, 4.9, 3.1],
       [1.2, 0.2, 5. , 3.2],
       [1.3, 0.2, 5.5, 3.5],
       [1.5, 0
...
```

In [9]:
```python
y
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2])
```

In [10]:
```python
# Split the data into training (80%) and test (20%) sets
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

print("x_train shape: ", x_train.shape)  # (training samples, features)
print("x_test shape: ", x_test.shape)    # (test samples, features)
print("y_train shape: ", y_train.shape)  # (training labels,)
print("y_test shape: ", y_test.shape)    # (test labels,)
```

```
x_train shape:  (108, 4)
x_test shape:  (27, 4)
y_train shape:  (108,)
y_test shape:  (27,)
```


**Standardizing the Data**

In [11]:
```python
sc = StandardScaler()  # Create a StandardScaler object
x_train = sc.fit_transform(x_train)  # Learn mean/std from the training data and standardize it
x_test = sc.transform(x_test)  # Standardize the test data with the training mean/std
```

In [12]:
```python
x_train
```

```
array([[-1.51694027, -1.37720833, -1.85814471, -0.05929995],
       [ 0.40410978,  0.36164104, -0.32135113, -0.05929995],
       [-1.51694027, -1.37720833, -0.44941726,  1.27494882],
       [ 0.22401133,  0.09412575, -0.32135113, -0.85984921],
       [ 0.76430666,  0.76291397,  0.06284727, -0.05929995],
       [ 0.40410978,  0.36164104,  0.44704567, -2.19409798],
       [-1.51694027, -1.37720833, -1.85814471,  0.47439956],
       [-1.69703871, -1.37720833, -1.60201245,  1.54179858],
       [ 0.40410978,  0.36164104,  0.1909134 , -0.3261497 ],
       [-1.33684183, -1.10969304, -1.08974792,  1.00809907],
       [-1.39687464, -1.37720833, -0.57748339,  1.80864833],
       [ 1.3646348 ,  1.43170218,  2.36803765, -0.05929995],
       [ 0.34407696,  0.09412575,  0.5751118 , -1.92724822],
       [-1.33684183, -1.37720833, -0.96168179,  2.07549809],
       [ 0.70427384,  0.62915632,  1.08737633, -0.05929995],
       [ 0.16397852,  0.09412575, -0.193285  , -0.59299945],
       [ 1.06447073,  0.
...
```

In [13]:
```python
x_test
```

```
array([[ 0.9444051 ,  1.1641869 ,  1.34350859,  0.20754981],
       [ 0.22401133, -0.03963189, -0.193285  , -0.05929995],
       [ 0.88437229,  0.89667161,  0.70317793, -0.85984921],
       [ 1.00443791,  0.76291397,  0.83124406, -0.05929995],
       [-1.27680901, -1.10969304, -0.57748339,  2.34234784],
       [ 1.06447073,  0.22788339,  0.31897953, -1.12669896],
       [-0.01611992, -0.17338954, -0.44941726, -1.66039847],
       [ 0.76430666,  1.03042925,  0.83124406,  0.47439956],
       [ 0.58420822,  0.76291397,  0.44704567, -0.59299945],
       [-1.45690746, -1.37720833, -1.21781405, -0.05929995],
       [-1.39687464, -1.51096597, -1.21781405,  0.20754981],
       [-1.51694027, -1.24345069, -1.08974792,  1.27494882],
       [-1.39687464, -1.10969304, -0.57748339,  1.00809907],
       [ 0.9444051 ,  1.43170218,  0.44704567,  1.00809907],
       [-1.45690746, -1.37720833, -1.08974792,  0.74124932],
       [-1.39687464, -1.51096597, -0.83361566,  2.87604735],
       [-1.39687464, -1.
...
```

In [14]:
```python
clasifier = GaussianNB()  # Create a Gaussian Naive Bayes classifier
clasifier.fit(x_train, y_train)  # Train the model on the training data
```

In [15]:
```python
y_pred = clasifier.predict(x_test)  # Predict labels for the test data
y_pred  # Show the predictions
```

```
array([2, 1, 2, 2, 0, 2, 1, 2, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 0, 1, 2])
```

In [16]:
```python
clasifier.predict_proba(x_test)  # Class probabilities for each test sample
```

```
array([[4.15100357e-181, 1.70962237e-006, 9.99998290e-001],
       [7.33378475e-068, 9.99979584e-001, 2.04163416e-005],
       [1.25498344e-157, 1.77727508e-003, 9.98222725e-001],
       [1.53453925e-159, 9.94946374e-004, 9.99005054e-001],
       [1.00000000e+000, 1.70275359e-019, 5.73531566e-029],
       [9.05567361e-137, 3.93442932e-001, 6.06557068e-001],
       [1.22206630e-051, 9.99999773e-001, 2.27253275e-007],
       [4.44079322e-156, 2.86486550e-004, 9.99713513e-001],
       [9.41002657e-127, 3.05812334e-001, 6.94187666e-001],
       [1.00000000e+000, 7.80444344e-024, 2.85381671e-034],
       [1.00000000e+000, 1.58440388e-024, 2.46413726e-034],
       [1.00000000e+000, 2.02432202e-024, 8.73129504e-035],
       [1.00000000e+000, 1.41638533e-019, 6.67197908e-030],
       [2.82040438e-194, 4.83136368e-008, 9.99999952e-001],
       [1.00000000e+000, 1.50262603e-024, 1.14169811e-034],
       [1.00000000e+000, 2.10979110e-027, 5.05225559e-036],
       [1.00000000e+000, 6.64977997e-024
...
```

In [17]:
```python
cm = confusion_matrix(y_test, y_pred)  # Build the confusion matrix (rows: true, columns: predicted)
cm  # Show the confusion matrix
```

```
array([[13,  0,  0],
       [ 0,  5,  0],
       [ 0,  1,  8]], dtype=int64)
```

In [18]:
```python
# Show the evaluation report (precision, recall, F1 and accuracy)
report = classification_report(y_test, y_pred)
print(report)
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.83      1.00      0.91         5
           2       1.00      0.89      0.94         9

    accuracy                           0.96        27
   macro avg       0.94      0.96      0.95        27
weighted avg       0.97      0.96      0.96        27
```



In [19]:
```python
# Put the true and predicted labels side by side for comparison
ydata = pd.DataFrame({'y_test': y_test, 'y_pred': y_pred})
ydata  # Show the ydata DataFrame
```

| | y_test | y_pred |
|---|---|---|
| 0 | 2 | 2 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
| 4 | 0 | 0 |
| 5 | 2 | 2 |
| 6 | 1 | 1 |
| 7 | 2 | 2 |
| 8 | 2 | 2 |
| 9 | 0 | 0 |
