## Sentiment analysis
Naive Bayes models are very useful for sentiment analysis, for classifying texts by topic, and for recommendations, because the characteristics of these problems fit the model's theoretical and methodological assumptions well.
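
For reference, the assumption in question is that the features $x_1, \dots, x_n$ (here, word counts) are conditionally independent given the class $y$. In generic textbook notation, which is not taken from the project itself, the resulting decision rule is:

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The three variants used later differ only in how $P(x_i \mid y)$ is modelled: a normal distribution (Gaussian), word counts (Multinomial), or word presence or absence (Bernoulli).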

## Project contents
* [0. Import libraries](#c0)
* [1. Load the dataset](#c1)
* [2. Data cleaning](#c2)
    * [2.1 Identify the variables](#s21)
    * [2.2 Remove irrelevant information](#s22)
* [3. Naive Bayes](#c3)
    * [3.1 Gaussian](#s31)
    * [3.2 Multinomial](#s32)
    * [3.3 Bernoulli](#s33)
* [4. Random Forest](#c4)

### 0. Import libraries

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
import pandas as pd

### 1. Load the dataset

In [2]:
total_data = pd.read_csv("https://raw.githubusercontent.com/4GeeksAcademy/naive-bayes-project-tutorial/main/playstore_reviews.csv")
total_data.head()

|   | package_name | review | polarity |
|---|---|---|---|
| 0 | com.facebook.katana | privacy at least put some option appear offli... | 0 |
| 1 | com.facebook.katana | messenger issues ever since the last update, ... | 0 |
| 2 | com.facebook.katana | profile any time my wife or anybody has more ... | 0 |
| 3 | com.facebook.katana | the new features suck for those of us who don... | 0 |
| 4 | com.facebook.katana | forced reload on uploading pic on replying co... | 0 |


### 2. Data cleaning

#### 2.1 Identify the variables

- ***package_name***. Name of the mobile application (categorical)
- ***review***. Comment about the mobile application (free text)
- ***polarity***. Class variable (0 or 1), where 0 is a negative comment and 1 a positive one (numeric categorical)
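
A quick way to check these descriptions against the data is to look at the column types, the class balance, and the missing values. The sketch below uses a hypothetical four-row table (`sample`) as a stand-in for `total_data`; the rows and the package name are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for the reviews table (not the real data).
sample = pd.DataFrame({
    "package_name": ["com.example.app"] * 4,
    "review": ["great app", "keeps crashing", "love it", "too many ads"],
    "polarity": [1, 0, 1, 0],
})

# Column types as pandas infers them.
print(sample.dtypes)

# Class balance: how many negative (0) and positive (1) reviews there are.
class_counts = sample["polarity"].value_counts().sort_index()
print(class_counts)

# Number of missing values per column.
missing = sample.isna().sum()
print(missing)
```

Running the same three calls on `total_data` would show whether the two classes are balanced, which matters when reading the accuracy scores later on.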

In [4]:
total_data.shape

(891, 3)

The dataset has 891 rows and 3 columns.

#### 2.2 Remove irrelevant information

Next we drop the "package_name" column. The name of the application is not relevant here, because we are classifying the reviews regardless of which application they refer to.

In [5]:
total_data.drop(["package_name"], axis = 1, inplace = True)
total_data.head()

|   | review | polarity |
|---|---|---|
| 0 | privacy at least put some option appear offli... | 0 |
| 1 | messenger issues ever since the last update, ... | 0 |
| 2 | profile any time my wife or anybody has more ... | 0 |
| 3 | the new features suck for those of us who don... | 0 |
| 4 | forced reload on uploading pic on replying co... | 0 |
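
As a possible alternative to `inplace = True`, the same column can be dropped by assignment, which leaves the original table untouched and makes the cell safe to re-run. This is only a sketch on a hypothetical two-row table; `sample` and `trimmed` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for total_data.
sample = pd.DataFrame({
    "package_name": ["com.example.app", "com.example.app"],
    "review": ["great app", "keeps crashing"],
    "polarity": [1, 0],
})

# drop() returns a new DataFrame; the original keeps all of its columns.
trimmed = sample.drop(columns=["package_name"])

print(list(trimmed.columns))
print(list(sample.columns))
```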


In [20]:
from sklearn.model_selection import train_test_split

X = total_data["review"]
y = total_data["polarity"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)


X_train.head()

331      just did the latest update on viber and yet ...
733     keeps crashing it only works well in extreme ...
382     the fail boat has arrived the 6.0 version is ...
704     superfast, just as i remember it ! opera mini...
813     installed and immediately deleted this crap i...
Name: review, dtype: object

We split the dataset into training and test sets, holding out 20% of the reviews for testing.
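
To see how many reviews end up on each side, the split can be reproduced on placeholder data with the same number of rows (891) and the same `test_size` and `random_state` as above. The lists `X_demo` and `y_demo` are assumptions made for this sketch; only their length matches the real dataset.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: 891 dummy reviews with alternating dummy labels.
X_demo = [f"review {i}" for i in range(891)]
y_demo = [i % 2 for i in range(891)]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Sizes of the training and test partitions.
print(len(X_tr), len(X_te))
```

If the classes were noticeably imbalanced, passing `stratify=y` to `train_test_split` would keep the class proportions the same in both partitions; the notebook does not do this.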

In [23]:
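from sklearn.feature_extraction.text import CountVectorizer

# Bag-of-words counts, ignoring common English stop words
vec_model = CountVectorizer(stop_words = "english")

# Learn the vocabulary from the training reviews only, then reuse it for the test reviews
X_train_v = vec_model.fit_transform(X_train).toarray()
X_test_v = vec_model.transform(X_test).toarray()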
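
To make the vectorization step concrete, here is a small sketch of what `CountVectorizer` produces on a few hypothetical sentences (the sentences and variable names are made up for illustration). Each column is one word of the vocabulary learned from the training texts, and each cell is the number of times that word appears in the document.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents, not taken from the dataset.
train_docs = ["the app crashes", "love the app", "crashes and crashes"]
test_docs = ["love this update"]

vec = CountVectorizer(stop_words="english")

# fit_transform learns the vocabulary; transform only reuses it.
train_counts = vec.fit_transform(train_docs).toarray()
test_counts = vec.transform(test_docs).toarray()

# Vocabulary words in column order (columns are sorted alphabetically).
vocab = sorted(vec.vocabulary_)
print(vocab)         # ['app', 'crashes', 'love']
print(train_counts)  # rows: [1 1 0], [1 0 1], [0 2 0]
print(test_counts)   # [[0 0 1]]
```

Stop words such as "the", "and" and "this" are dropped, and "update" is ignored in the test document because it never appeared in the training texts.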

### 3. Naive Bayes

#### 3.1 Gaussian

In [24]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from pickle import dump

accuracy_results = []

model = GaussianNB()

model.fit(X_train_v, y_train)
y_pred = model.predict(X_train_v)

accuracy = accuracy_score(y_train, y_pred)
accuracy_results.append(accuracy_score(y_train, y_pred))

print("Accuracy Gaussian:", accuracy_results)

Accuracy Gaussian: [0.9859550561797753]


In [25]:
model = GaussianNB()

model.fit(X_train_v, y_train)
y_pred_test = model.predict(X_test_v)

accuracy = accuracy_score(y_test, y_pred_test)
print("Test accuracy with Gaussian:", accuracy)

Test accuracy with Gaussian: 0.8044692737430168


In [26]:
from pickle import dump

dump(model, open("/workspaces/ivandla96-Proyecto-Naive-Bayes/models/naive-bayes-gaussian.sav", "wb"))

We save the model so that it can be reused later.
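
Reusing a saved model comes down to reading the file back with `pickle.load`. The sketch below shows the round trip on a tiny placeholder dataset and a temporary directory; `X_small`, `y_small` and the temporary path are assumptions for illustration, while the notebook itself writes to its own `models/` folder.

```python
import os
import tempfile
from pickle import dump, load

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny placeholder data; in the notebook this would be X_train_v and y_train.
X_small = np.array([[1, 0], [2, 0], [0, 1], [0, 3]])
y_small = np.array([0, 0, 1, 1])
fitted = GaussianNB().fit(X_small, y_small)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical location inside a temporary directory.
    model_path = os.path.join(tmp_dir, "naive-bayes-gaussian.sav")

    # Save the fitted model...
    with open(model_path, "wb") as f:
        dump(fitted, f)

    # ...and read it back.
    with open(model_path, "rb") as f:
        restored = load(f)

# The restored model should predict exactly like the original one.
same_predictions = bool((restored.predict(X_small) == fitted.predict(X_small)).all())
print(same_predictions)
```

To classify new reviews later, the fitted vectorizer would have to be saved in the same way, since new text must be transformed with the same vocabulary the model was trained on.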

#### 3.2 Multinomial

In [32]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from pickle import dump

accuracy_results = []

model = MultinomialNB(
    alpha=1.0,
    fit_prior=True,
)

model.fit(X_train_v, y_train)
y_pred = model.predict(X_train_v)

accuracy = accuracy_score(y_train, y_pred)
accuracy_results.append(accuracy_score(y_train, y_pred))

print("Accuracy Multinomial:", accuracy_results)

Accuracy Multinomial: [0.9606741573033708]


In [31]:
model = MultinomialNB(
    alpha=1.0,
    fit_prior=True,
)
model.fit(X_train_v, y_train)
y_pred_test = model.predict(X_test_v)

accuracy = accuracy_score(y_test, y_pred_test)
print("Test accuracy with Multinomial:", accuracy)

Test accuracy with Multinomial: 0.8156424581005587


The test accuracy is slightly better than with the Gaussian model (0.816 versus 0.804).

In [35]:
from pickle import dump

dump(model, open("/workspaces/ivandla96-Proyecto-Naive-Bayes/models/naive-bayes-Multinomial.sav", "wb"))

#### 3.3 Bernoulli

In [36]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score
from pickle import dump

accuracy_results = []

model = BernoulliNB(
    alpha=1.0,
    binarize=0.0,
)

model.fit(X_train_v, y_train)
y_pred = model.predict(X_train_v)

accuracy = accuracy_score(y_train, y_pred)
accuracy_results.append(accuracy_score(y_train, y_pred))

print("Accuracy with Bernoulli NB:", accuracy_results)

Accuracy with Bernoulli NB: [0.9199438202247191]


In [37]:
model = BernoulliNB(
    alpha=1.0,
    binarize=0.0,
)
model.fit(X_train_v, y_train)
y_pred_test = model.predict(X_test_v)

accuracy = accuracy_score(y_test, y_pred_test)
print("Test accuracy with Bernoulli NB:", accuracy)

Test accuracy with Bernoulli NB: 0.770949720670391


We discard this model because it has the lowest test accuracy of the three (0.771).
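
The three comparisons above can also be gathered in a single loop that reports training and test accuracy side by side, which makes the gap between the two easier to spot. This is a minimal sketch on hypothetical toy reviews (texts, labels and variable names are made up), so its scores say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data, for illustration only.
train_texts = [
    "great app love it",
    "love this app",
    "terrible app keeps crashing",
    "crashing all the time hate it",
]
train_labels = [1, 1, 0, 0]
test_texts = ["love it", "hate crashing"]
test_labels = [1, 0]

vec = CountVectorizer()
X_tr = vec.fit_transform(train_texts).toarray()
X_te = vec.transform(test_texts).toarray()

candidates = [
    ("Gaussian", GaussianNB()),
    ("Multinomial", MultinomialNB()),
    ("Bernoulli", BernoulliNB()),
]

# Fit each variant once and score it on both partitions.
scores = {}
for name, estimator in candidates:
    estimator.fit(X_tr, train_labels)
    scores[name] = {
        "train": accuracy_score(train_labels, estimator.predict(X_tr)),
        "test": accuracy_score(test_labels, estimator.predict(X_te)),
    }

for name, result in scores.items():
    print(name, result)
```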

In [38]:
from pickle import dump

dump(model, open("/workspaces/ivandla96-Proyecto-Naive-Bayes/models/naive-bayes-Bernouilli.sav", "wb"))

### 4. Random Forest

As the project requires, we will try to improve on these results with a Random Forest.

In [39]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

accuracy_results = []

for dataset in [X_train_v]:
    model = RandomForestClassifier(
        n_estimators=10,
        max_depth=10,
        min_samples_split=2,
        min_samples_leaf=1,
        max_features=None,
        bootstrap=True,
        n_jobs=8,
        random_state=42,
    )
    model.fit(dataset, y_train)
    y_pred = model.predict(dataset)
    # Accuracy measured on the training data itself
    accuracy = accuracy_score(y_train, y_pred)
    accuracy_results.append(accuracy)

print("Accuracy:", accuracy_results)
print("Best accuracy:", max(accuracy_results))

Accuracy: [0.8623595505617978]
Best accuracy: 0.8623595505617978
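
The figure above is measured on the training data only. To compare the forest with the Naive Bayes test scores, it would also need to be scored on the test split. A minimal sketch of that step, on hypothetical toy reviews rather than the real dataset (all texts, labels and variable names here are made up):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score

# Hypothetical toy data, for illustration only.
train_texts = [
    "great app love it",
    "love this app",
    "terrible app keeps crashing",
    "crashing all the time hate it",
]
train_labels = [1, 1, 0, 0]
test_texts = ["love it", "hate crashing"]
test_labels = [1, 0]

vec = CountVectorizer()
X_tr = vec.fit_transform(train_texts).toarray()
X_te = vec.transform(test_texts).toarray()

forest = RandomForestClassifier(n_estimators=10, max_depth=10, random_state=42)
forest.fit(X_tr, train_labels)

# Score the same fitted model on both partitions.
train_acc = accuracy_score(train_labels, forest.predict(X_tr))
test_acc = accuracy_score(test_labels, forest.predict(X_te))

print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

In the notebook, the equivalent would be to call `model.predict(X_test_v)` and compare the result with `y_test`.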
