# Classification Task with Naive Bayes

Reference: [https://en.wikipedia.org/wiki/Naive_Bayes_classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)

## Bayes' Theorem

Bayes' theorem provides a formula for computing the probability of an event using prior knowledge of related conditions; this is commonly known as conditional probability. In a classification setting, the event $A$ becomes the class label $y$ and the condition $B$ becomes the observed features $X$.

$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$ <br/>

$P(y|X) = \frac{P(X|y) \times P(y)}{P(X)}$ 
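
As a minimal sketch of the formula above, the following function evaluates $P(A|B)$ from its three ingredients. The function name `bayes_posterior` and the numeric inputs are hypothetical, chosen only to illustrate the arithmetic.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("evidence P(B) must be positive")
    return likelihood * prior / evidence


# Hypothetical values, for illustration only.
p_b_given_a = 0.9  # P(B|A), the likelihood
p_a = 0.2          # P(A), the prior
p_b = 0.3          # P(B), the evidence (normalizer)

p_a_given_b = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 3))
```

Observing $B$ raises the probability of $A$ here from the prior 0.2 to a posterior of about 0.6, because $B$ is much more likely when $A$ holds.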

## Introduction to Naive Bayes

##### Case Study

![](./images/asep_joko_snack.png)

##### Prior Probability: $P(y)$
- Reference: [https://en.wikipedia.org/wiki/Prior_probability](https://en.wikipedia.org/wiki/Prior_probability)
- $P(Asep) = 0.5$
- $P(Joko) = 0.5$

##### Normalizer: $P(X)$
- Order: lumpia, bakso
- Asep: $0.1 \times 0.8 \times 0.5 = 0.04$
- Joko: $0.3 \times 0.2 \times 0.5 = 0.03$
- Normalizer $= 0.04 + 0.03 = 0.07$
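
The normalizer can be reproduced with a few lines of Python. This is a sketch under one assumption: the factors in the bullets above are taken to be listed in the order lumpia, bakso, prior, so the likelihood values in the dictionary are read off from that order. The variable names are illustrative.

```python
# Prior P(y) for each person, as given in the case study.
priors = {"Asep": 0.5, "Joko": 0.5}

# Likelihoods P(item | person), read off the worked numbers above
# (assumed factor order: lumpia first, then bakso).
likelihoods = {
    "Asep": {"lumpia": 0.1, "bakso": 0.8},
    "Joko": {"lumpia": 0.3, "bakso": 0.2},
}

order = ["lumpia", "bakso"]

# Unnormalized score per person: P(lumpia|y) * P(bakso|y) * P(y).
joint = {}
for person, prior in priors.items():
    score = prior
    for item in order:
        score *= likelihoods[person][item]
    joint[person] = score

# The normalizer P(X) is the sum of the unnormalized scores over all classes.
normalizer = sum(joint.values())
print({name: round(score, 2) for name, score in joint.items()})
print(round(normalizer, 2))
```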

##### Posterior Probability: $P(y|X)$ (kasus 1)
- Reference: [https://en.wikipedia.org/wiki/Posterior_probability](https://en.wikipedia.org/wiki/Posterior_probability)
- Order: lumpia, bakso
- $P($Asep$|$lumpia,bakso$) = \frac{0.04}{0.07} \approx 0.57$
- $P($Joko$|$lumpia,bakso$) = \frac{0.03}{0.07} \approx 0.43$
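
Dividing each unnormalized score by their sum is all that is needed to obtain the posteriors. The helper `normalize` below is a hypothetical name used for this sketch; the input numbers are the ones from case 1.

```python
def normalize(joint: dict) -> dict:
    """Scale unnormalized scores so that they sum to 1."""
    total = sum(joint.values())
    return {label: score / total for label, score in joint.items()}


# Unnormalized scores P(X|y) * P(y) for the order "lumpia, bakso".
joint = {
    "Asep": 0.1 * 0.8 * 0.5,
    "Joko": 0.3 * 0.2 * 0.5,
}

posterior = normalize(joint)
for person, probability in posterior.items():
    print(f"P({person} | lumpia, bakso) = {probability:.2f}")
```

Because the posteriors sum to 1, only the ratio between the scores matters; the normalizer never changes which person is the more probable one.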

##### Posterior Probability: $P(y|X)$ (kasus 2)
- Order: siomay, bakso
- Normalizer = $(0.1 \times 0.8 \times 0.5) + (0.5 \times 0.2 \times 0.5) = 0.09$
- $P($Asep$|$siomay,bakso$) = \frac{0.1 \times 0.8 \times 0.5}{0.09} \approx 0.444$
- $P($Joko$|$siomay,bakso$) = \frac{0.5 \times 0.2 \times 0.5}{0.09} \approx 0.556$
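
Both cases follow the same recipe, so they can be wrapped in one function. The sketch below assumes the likelihood table implied by the worked numbers (factors read in the order the items are listed); `LIKELIHOODS`, `PRIORS`, and `posterior` are illustrative names, and `math.prod` requires Python 3.8 or newer.

```python
from math import prod

# P(item | person), reconstructed from the factors used in cases 1 and 2.
LIKELIHOODS = {
    "Asep": {"siomay": 0.1, "bakso": 0.8, "lumpia": 0.1},
    "Joko": {"siomay": 0.5, "bakso": 0.2, "lumpia": 0.3},
}
PRIORS = {"Asep": 0.5, "Joko": 0.5}


def posterior(order):
    """Return P(person | order), treating the items as conditionally independent."""
    joint = {
        person: PRIORS[person] * prod(LIKELIHOODS[person][item] for item in order)
        for person in PRIORS
    }
    normalizer = sum(joint.values())
    return {person: score / normalizer for person, score in joint.items()}


case_2 = posterior(["siomay", "bakso"])
most_likely = max(case_2, key=case_2.get)
print({person: round(p, 3) for person, p in case_2.items()})
print(most_likely)
```

The "naive" part is the product inside `posterior`: each item's likelihood is multiplied in as if the items were independent given the person.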

## Dataset: Breast Cancer Wisconsin (Diagnostic) 

Reference: [https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))

#### Load Dataset

In [1]:
from sklearn.datasets import load_breast_cancer

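# In IPython/Jupyter, run `load_breast_cancer?` to view the docstring.
# return_X_y=True returns the feature matrix X and the label vector y directly.
X, y = load_breast_cancer(return_X_y=True)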
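
Before training, it can help to check what was loaded. The sketch below loads the same dataset without `return_X_y=True`, so the returned object also exposes the class names; it uses only documented attributes of scikit-learn's bundled copy of the dataset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# Number of samples and number of numeric features.
print(X.shape)

# Class names, indexed by the integer labels in y.
print(data.target_names)

# Number of samples per class, in label order.
class_counts = np.bincount(y)
print(class_counts)
```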

#### Training & Testing Set

In [2]:
from sklearn.model_selection import train_test_split

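# Hold out 20% of the samples for testing; random_state fixes the shuffle
# so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=0)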
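
A quick sanity check on the split, as a self-contained sketch that repeats the loading and splitting steps with the same arguments. With `test_size=0.2`, scikit-learn rounds the test-set size up, so the 569 samples are divided into 455 for training and 114 for testing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Rows are samples, columns are features; labels are 1-D arrays.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```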

## Naive Bayes with scikit-learn

In [3]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

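# GaussianNB models each feature as normally distributed within each class.
model = GaussianNB()
model.fit(X_train, y_train)

# Predict labels for the held-out set and compare them with the true labels.
y_pred = model.predict(X_test)
accuracy_score(y_test, y_pred)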

0.9298245614035088

In [4]:
# For classifiers, score() returns the mean accuracy, so it matches accuracy_score above.
model.score(X_test, y_test)

0.9298245614035088
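
Accuracy is a single number; the sketch below looks one level deeper using `confusion_matrix` and `predict_proba`, both standard scikit-learn calls. It repeats the earlier steps so that it runs on its own. The choice to inspect the first three test samples is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Posterior P(y|X) per class for the first three test samples.
proba = model.predict_proba(X_test[:3])
print(proba.round(3))

# score() and accuracy_score() report the same quantity.
score = model.score(X_test, y_test)
accuracy = accuracy_score(y_test, y_pred)
print(score, accuracy)
```

Each row of `predict_proba` is the posterior $P(y|X)$ from the earlier sections, now estimated from Gaussian likelihoods instead of a hand-written table.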

## Comparing Classifier Performance

##### Naive Bayes vs Logistic Regression

In [5]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
nb = GaussianNB()

train_sizes = range(10, len(X_train), 25)
