# 4. Naive Bayes: An Example

We will work through a small example to illustrate the Naive Bayes classifier.

In this example, we classify texts according to whether they are about China ('zh') or Japan ('ja').

In [None]:
import numpy as np

## Training Data

Suppose we have the following training data:

In [None]:
training = [
    ('chinese beijing chinese', 'zh'),
    ('chinese chinese shangai', 'zh'),
    ('chinese macao', 'zh'),
    ('tokyo japan chinese', 'ja'),
]

In [None]:
X_train = [doc for doc, _ in training]
y_train = [cls for _, cls in training]

In [None]:
X_train

In [None]:
classes = ['zh', 'ja']

In [None]:
features = ['chinese', 'beijing', 'shangai', 'macao', 'tokyo', 'japan']

## Naive Bayes Classifier

### Prior Distribution

Let's compute the prior distribution (the probability of each class) using maximum likelihood:

$$P(Y = y) = \frac{Count(Y = y)}{\sum_{y'} Count(Y = y')}$$

In [None]:
from collections import Counter

class_count = Counter(y_train)
class_count

In [None]:
prior_prob = {}
for c in classes:
    prior_prob[c] = class_count[c] / len(y_train)
    
    print(f'P({c}) = {prior_prob[c]:0.2f}')

In [None]:
prior_prob
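As a quick check by hand: three of the four training documents are labeled 'zh' and one is labeled 'ja', so the formula above gives

$$P(Y = \text{zh}) = \frac{3}{4} = 0.75, \qquad P(Y = \text{ja}) = \frac{1}{4} = 0.25$$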

### Conditional Distributions

Let's compute the conditional distributions, that is, the probability of each feature given each class.

We use maximum likelihood counts with add-one smoothing, where $|V|$ is the number of features (the vocabulary size):

$$P(X_i = x|Y = y) = \frac{Count(X_i = x, Y = y) + 1}{\sum_{x'} Count(X_i = x', Y = y)+ |V|}$$

First we compute the counts:

In [None]:
feature_count = {}

for doc, cls in training:
    tokens = doc.split()  # list of words
    for feature in tokens:
        if (feature, cls) not in feature_count:
            feature_count[feature, cls] = 0
        feature_count[feature, cls] = feature_count[feature, cls] + 1

Or, more concisely, with `defaultdict`:

In [None]:
from collections import defaultdict
feature_count = defaultdict(int)

for doc, cls in training:
    tokens = doc.split()  # list of words
    for feature in tokens:
        feature_count[feature, cls] += 1

In [None]:
dict(feature_count)

Now we compute the distributions:

In [None]:
V = len(features)

cond_prob = {}
for c in classes:
    cond_prob[c] = {}
    
    count_sum = sum(feature_count[f, c] for f in features)
    denom = count_sum + V

    for f in features:
        num = feature_count[f, c] + 1
        cond_prob[c][f] = num / denom

        print(f'P({f}|{c}) = {num} / {denom} ~ {cond_prob[c][f]:0.2f}')
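A useful sanity check on the smoothed estimates is that, for each class, the conditional probabilities over the whole vocabulary add up to exactly 1. The following is a minimal, self-contained sketch that recomputes them with `fractions.Fraction` to avoid rounding; it rebuilds the counts with `Counter` rather than reusing the cells above, which is a choice made here only to keep the block standalone.

```python
from collections import Counter
from fractions import Fraction

training = [
    ('chinese beijing chinese', 'zh'),
    ('chinese chinese shangai', 'zh'),
    ('chinese macao', 'zh'),
    ('tokyo japan chinese', 'ja'),
]
features = ['chinese', 'beijing', 'shangai', 'macao', 'tokyo', 'japan']
V = len(features)

cond_prob = {}
for c in ('zh', 'ja'):
    # Word counts over all training documents of class c.
    counts = Counter(w for doc, cls in training if cls == c for w in doc.split())
    denom = sum(counts[f] for f in features) + V
    # Add-one smoothing, kept as exact fractions.
    cond_prob[c] = {f: Fraction(counts[f] + 1, denom) for f in features}

for c, dist in cond_prob.items():
    print(c, sum(dist.values()))  # the sum should be 1 for every class
```

The sum is 1 because the `+1` in each numerator is added $|V|$ times in total, which is exactly what the `+ |V|` in the denominator accounts for.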

### Prediction

Given a document, let's compute its classification. To do so, we compute the probability of each class or, more precisely, a quantity proportional to it (we skip the denominator $P(X=x)$, which is the same for every class).

$$P(Y=y|X=x) \propto P(Y=y) \prod_{i} P(X_i = x_i|Y=y)$$

In [None]:
doc = 'chinese chinese chinese tokyo japan'.split()

In [None]:
zh_prob = prior_prob['zh']
for w in doc:
    zh_prob = zh_prob * cond_prob['zh'][w]

print(f'P(zh|doc) ~ {zh_prob:0.4f}')

In [None]:
ja_prob = prior_prob['ja']
for w in doc:
    ja_prob = ja_prob * cond_prob['ja'][w]

print(f'P(ja|doc) ~ {ja_prob:0.4f}')
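Written out by hand with the smoothed estimates from the training data (denominators $8 + 6 = 14$ for 'zh' and $3 + 6 = 9$ for 'ja'), the two products are:

$$P(\text{zh}) \cdot P(\text{chinese}|\text{zh})^3 \cdot P(\text{tokyo}|\text{zh}) \cdot P(\text{japan}|\text{zh}) = \frac{3}{4} \cdot \left(\frac{6}{14}\right)^3 \cdot \frac{1}{14} \cdot \frac{1}{14} \approx 0.0003$$

$$P(\text{ja}) \cdot P(\text{chinese}|\text{ja})^3 \cdot P(\text{tokyo}|\text{ja}) \cdot P(\text{japan}|\text{ja}) = \frac{1}{4} \cdot \left(\frac{2}{9}\right)^3 \cdot \frac{2}{9} \cdot \frac{2}{9} \approx 0.0001$$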

**What is the classification?**

Normalizing the two scores so that they sum to 1 gives actual probabilities:

In [None]:
zh_prob / (zh_prob + ja_prob), ja_prob / (zh_prob + ja_prob)
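Multiplying many small probabilities can underflow for long documents, so a common variant works with sums of logarithms instead. The sketch below is one possible way to package the steps above into two functions; the names `train` and `log_scores` are hypothetical helpers introduced here, and skipping words outside the vocabulary is an assumption of this sketch rather than something the cells above do.

```python
import math
from collections import Counter

training = [
    ('chinese beijing chinese', 'zh'),
    ('chinese chinese shangai', 'zh'),
    ('chinese macao', 'zh'),
    ('tokyo japan chinese', 'ja'),
]
classes = ['zh', 'ja']
features = ['chinese', 'beijing', 'shangai', 'macao', 'tokyo', 'japan']

def train(training, classes, features):
    """Return log-priors and add-one-smoothed log-conditionals per class."""
    log_prior, log_cond = {}, {}
    for c in classes:
        docs = [doc for doc, cls in training if cls == c]
        log_prior[c] = math.log(len(docs) / len(training))
        counts = Counter(w for doc in docs for w in doc.split())
        denom = sum(counts[f] for f in features) + len(features)
        log_cond[c] = {f: math.log((counts[f] + 1) / denom) for f in features}
    return log_prior, log_cond

def log_scores(tokens, log_prior, log_cond):
    """Unnormalized log-posterior per class; unknown words are skipped (assumption)."""
    return {
        c: log_prior[c] + sum(log_cond[c][w] for w in tokens if w in log_cond[c])
        for c in log_prior
    }

log_prior, log_cond = train(training, classes, features)
scores = log_scores('chinese chinese chinese tokyo japan'.split(), log_prior, log_cond)
prediction = max(scores, key=scores.get)  # class with the highest log-score
print(prediction, scores)
```

Because the logarithm is monotonic, the class with the highest log-score is the same class that maximizes the product, so the prediction does not change.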

## Naive Bayes with Scikit-learn

Let's see how to classify documents in **scikit-learn** using Naive Bayes.

### Bag of Words

We represent documents as vectors using bags of words:

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

We fit the vectorizer (without labels) so that it assigns a column to each possible feature:

In [None]:
vect.fit(X_train)

In [None]:
vect.get_feature_names_out()

Let's see how the training data is vectorized:

In [None]:
X2 = vect.transform(X_train)

In [None]:
X2  # shape?

In [None]:
X2.todense()

Internally, the vectorizer stores the mapping from features to columns:

In [None]:
vect.vocabulary_

Now we vectorize a new document:

In [None]:
doc = 'chinese chinese chinese tokyo japan'

In [None]:
X_test = vect.transform([doc])

In [None]:
X_test.todense()

In [None]:
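# What happens if we vectorize a document with no known words?
# Separate names keep `doc` and `X_test` pointing at the document we classify below.
oov_doc = 'buenos aires'
X_oov = vect.transform([oov_doc])
X_oov.todense()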
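The vectorization steps above can be mimicked in a few lines of plain Python. This is only a rough sketch of the idea, not how `CountVectorizer` is implemented: it splits on whitespace, whereas `CountVectorizer` by default lowercases the text and tokenizes with a regular expression. The function name `vectorize` is a hypothetical helper.

```python
training_docs = [
    'chinese beijing chinese',
    'chinese chinese shangai',
    'chinese macao',
    'tokyo japan chinese',
]

# Each distinct word gets one column, assigned in alphabetical order.
words = sorted({w for doc in training_docs for w in doc.split()})
vocabulary = {w: i for i, w in enumerate(words)}

def vectorize(doc, vocabulary):
    """Return the count vector of one document; unknown words are ignored."""
    row = [0] * len(vocabulary)
    for w in doc.split():
        if w in vocabulary:
            row[vocabulary[w]] += 1
    return row

print(vocabulary)
print(vectorize('chinese chinese chinese tokyo japan', vocabulary))
print(vectorize('buenos aires', vocabulary))
```

A document made only of words that were not seen during fitting maps to the all-zero vector, so the classifier has nothing but the prior to go on.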

### Multinomial Naive Bayes

We instantiate and train [Naive Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes):

In [None]:
from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb.fit(X2, y_train)

Now we predict:

In [None]:
mnb.predict(X_test)

We can also obtain the probabilities:

In [None]:
mnb.predict_proba(X_test)

### Internal Parameters

Let's look at how the Naive Bayes model is represented internally in scikit-learn.

In [None]:
mnb.classes_

In [None]:
mnb.class_count_

In [None]:
mnb.feature_count_

In [None]:
np.exp(mnb.class_log_prior_)

In [None]:
np.exp(mnb.feature_log_prob_)
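These arrays should agree with the hand-computed values, since `MultinomialNB` uses additive smoothing with `alpha=1.0` by default, which corresponds to add-one smoothing. What changes is the layout: rows follow `mnb.classes_` (labels in sorted order) and columns follow the vectorizer's alphabetically sorted vocabulary. The plain-Python sketch below rebuilds the values in that layout for comparison; `class_prior` and `feature_prob` are hypothetical names, not scikit-learn attributes.

```python
from collections import Counter

training = [
    ('chinese beijing chinese', 'zh'),
    ('chinese chinese shangai', 'zh'),
    ('chinese macao', 'zh'),
    ('tokyo japan chinese', 'ja'),
]
alpha = 1.0  # assumed to match MultinomialNB's default smoothing

# Sorted labels and sorted vocabulary, mirroring the row/column order.
classes = sorted({cls for _, cls in training})
vocab = sorted({w for doc, _ in training for w in doc.split()})

label_count = Counter(cls for _, cls in training)
class_prior = [label_count[c] / len(training) for c in classes]

feature_prob = []
for c in classes:
    counts = Counter(w for doc, cls in training if cls == c for w in doc.split())
    total = sum(counts.values()) + alpha * len(vocab)
    feature_prob.append([(counts[w] + alpha) / total for w in vocab])

print(classes, class_prior)
for c, row in zip(classes, feature_prob):
    print(c, [round(p, 2) for p in row])
```

Comparing this output with `np.exp(mnb.class_log_prior_)` and `np.exp(mnb.feature_log_prob_)` is a quick way to confirm that both implementations describe the same model.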

## Exercises

1. Apply Naive Bayes to the handwritten digit recognition problem.

## References

- [Naive Bayes classifier (Wikipedia)](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)

Python:
- [defaultdict](https://docs.python.org/3/library/collections.html#collections.defaultdict)

Scikit-learn:
- [Working With Text Data](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)
- [CountVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html)
- [Naive Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html#naive-bayes)