# Naive Bayes in scikit-learn

scikit-learn provides several Naive Bayes classifiers, each suited to a different type of data:

### 1. **GaussianNB**
- **For continuous data** (assumes each feature is normally distributed within each class).
- **Example**:
  ```python
  from sklearn.naive_bayes import GaussianNB
  modelo = GaussianNB()
  ```
### 2. MultinomialNB
- **For discrete data** (counts, such as word frequencies).
- **Typical use**: text classification.
- **Example**:
```python
from sklearn.naive_bayes import MultinomialNB
modelo = MultinomialNB()
```
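
As an illustration of what "counts" means here, this small sketch turns a single sentence into a row of word counts with `CountVectorizer`. The sentence is just an example, and stop-word removal is left off so that every word stays visible.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The cat sat on the mat"]

# Defaults: lowercase the text, keep tokens of 2+ characters, no stop-word removal
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# vocabulary_ maps each term to its column index; sort terms by that index
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
row = X.toarray()[0].tolist()
counts = dict(zip(vocab, row))

print(vocab)   # column order is alphabetical
print(row)     # one count per column
print(counts)
```

Rows like this one, one per document, are the kind of input `MultinomialNB` expects.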

### 3. BernoulliNB
- **For binary features** (e.g., 1/0, true/false).
- **Example**:
```python
from sklearn.naive_bayes import BernoulliNB
modelo = BernoulliNB()
```
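
`BernoulliNB` also has a `binarize` parameter that thresholds the input before fitting. The sketch below, on made-up count data, suggests that passing raw counts with the default `binarize=0.0` should give the same fitted probabilities as binarizing by hand and setting `binarize=None`. The array values and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy count data (hypothetical): rows are documents, columns are word counts
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 4, 1],
                     [0, 1, 2]])
y = np.array([1, 1, 0, 0])

# binarize=0.0 (the default) maps every value greater than 0 to 1 before fitting
modelo_counts = BernoulliNB(binarize=0.0).fit(X_counts, y)

# Equivalent route: binarize manually and switch off the internal thresholding
X_bin = (X_counts > 0).astype(int)
modelo_bin = BernoulliNB(binarize=None).fit(X_bin, y)

iguales = np.allclose(modelo_counts.feature_log_prob_, modelo_bin.feature_log_prob_)
print("Same fitted probabilities:", iguales)
```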

In [None]:
### 2. MultinomialNB

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score

# 1. Load the newsgroups dataset (only 4 categories are selected)
categories = ['rec.sport.hockey', 'sci.space', 'talk.politics.misc', 'comp.graphics']
news = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

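# 2. Vectorize the text
# Convert the raw text into a numeric format suitable for machine learning:
# CountVectorizer builds a sparse document-term matrix of token counts.
# Example (before stop-word removal): "The cat sat on the mat" -> {"cat": 1, "mat": 1, "on": 1, "sat": 1, "the": 2},
# so with columns in alphabetical order the document row is [1, 1, 1, 1, 2].
# With stop_words='english', "the" and "on" are dropped, leaving only cat, mat, sat.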
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(news.data)
y = news.target

# 3. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 4. Train the model
modelo = MultinomialNB()
modelo.fit(X_train, y_train)

# 5. Predict and evaluate
y_pred = modelo.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReporte de clasificación:\n", classification_report(y_test, y_pred, target_names=news.target_names))

# 6. Try the model on new texts
ejemplos = [
    "NASA launched a new satellite into space today.",
    "The graphics card performance has increased drastically.",
    "The hockey game last night was thrilling!",
    "The recent political debate sparked controversy."
]
X_nuevos = vectorizer.transform(ejemplos)
pred_nuevos = modelo.predict(X_nuevos)

print("\nPredicciones:")
for texto, pred in zip(ejemplos, pred_nuevos):
    print(f"'{texto}' --> {news.target_names[pred]}")

Accuracy: 0.8884924174843889

Reporte de clasificación:
                     precision    recall  f1-score   support

     comp.graphics       0.92      0.91      0.91       303
  rec.sport.hockey       0.89      0.93      0.91       270
         sci.space       0.94      0.78      0.85       308
talk.politics.misc       0.80      0.96      0.87       240

          accuracy                           0.89      1121
         macro avg       0.89      0.89      0.89      1121
      weighted avg       0.89      0.89      0.89      1121


Predicciones:
'NASA launched a new satellite into space today.' --> sci.space
'The graphics card performance has increased drastically.' --> comp.graphics
'The hockey game last night was thrilling!' --> rec.sport.hockey
'The recent political debate sparked controversy.' --> talk.politics.misc
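
The cell above fits the vectorizer on the full corpus before splitting. One possible alternative is to bundle the vectorizer and the classifier in a `Pipeline`, so that the vocabulary is learned only from whatever data is passed to `fit`. The sketch below uses a tiny made-up corpus instead of 20 Newsgroups so that it runs without a download; the texts and labels are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical), two documents per class
textos = [
    "the rocket reached orbit",
    "astronauts aboard the space station",
    "the goalie stopped the puck",
    "hockey playoffs start tonight",
]
etiquetas = ["space", "space", "hockey", "hockey"]

# The pipeline vectorizes and classifies in a single fit/predict call
pipeline = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
pipeline.fit(textos, etiquetas)

# Raw strings go in directly; no separate vectorizer.transform step is needed
pred = pipeline.predict(["the puck hit the goalie"])
print(pred[0])
```

With real data, the pipeline would be fit on the training split only and evaluated on the held-out texts.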


In [None]:
### 1. **GaussianNB**

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the dataset
iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names
target_names = iris.target_names

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Train the GaussianNB model
modelo = GaussianNB()
modelo.fit(X_train, y_train)

# 4. Predict and evaluate
y_pred = modelo.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReporte de clasificación:\n", classification_report(y_test, y_pred, target_names=target_names))

# 5. Try the model on a new flower
nueva_flor = [[5.1, 3.5, 1.4, 0.2]]  # sepal length, sepal width, petal length, petal width (cm)
pred = modelo.predict(nueva_flor)
print(f"\nFlor predicha: {target_names[pred[0]]}")

Accuracy: 0.9777777777777777

Reporte de clasificación:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45


Flor predicha: setosa
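
Beyond the hard label, `predict_proba` returns the estimated probability of each class. The minimal sketch below prints those probabilities for the same flower. As a simplifying assumption, it trains on the whole Iris dataset rather than on the 70% training split used above, so the exact numbers may differ slightly from what that cell's model would return.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# Simplification: fit on all 150 samples instead of a train/test split
modelo = GaussianNB().fit(iris.data, iris.target)

nueva_flor = [[5.1, 3.5, 1.4, 0.2]]  # sepal length, sepal width, petal length, petal width (cm)
probs = modelo.predict_proba(nueva_flor)[0]

# One probability per class, in the order of iris.target_names
for nombre, p in zip(iris.target_names, probs):
    print(f"{nombre}: {p:.3f}")
```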


In [None]:
### 3. BernoulliNB

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Data: sample emails
correos = [
    "Gana dinero ahora",
    "Reunión a las 10",
    "Compra ya, oferta limitada",
    "Llamada importante del jefe",
    "Premios gratis por registrarte",
    "Te envío el informe",
    "Haz clic para obtener dinero",
    "¿Puedes ayudarme con este reporte?",
    "Gana dinero fácil y rápido",
    "Nos vemos en la reunión"
]

# Labels: 1 = spam, 0 = not spam
etiquetas = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Vectorize (presence/absence of each word)
vectorizer = CountVectorizer(binary=True)  # key setting: binary=True records 1/0 instead of counts
X = vectorizer.fit_transform(correos)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, etiquetas, test_size=0.3, random_state=42)

# Train the model
modelo = BernoulliNB()
modelo.fit(X_train, y_train)

# Predict and evaluate
y_pred = modelo.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReporte de clasificación:\n", classification_report(y_test, y_pred))

# Try the model on new emails
nuevos = [
    "Haz clic aquí para ganar premios",
    "Revisa el informe del proyecto"
]
X_nuevos = vectorizer.transform(nuevos)
pred = modelo.predict(X_nuevos)

print("\nPredicciones:")
for texto, etiqueta in zip(nuevos, pred):
    print(f"'{texto}' --> {'SPAM' if etiqueta == 1 else 'NO SPAM'}")