## Activity 3: Supervised Model

Objective: determine which machine learning model is best suited to predicting the sentiment (output) of a movie review (input).

Input(x) -> movie review
Output(y) -> sentiment

In [166]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, classification_report, confusion_matrix



Load the dataset

In [167]:
df_review = pd.read_excel('Data_Base.xlsx')

df_review

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
...,...,...
49994,I thought this movie did a down right good job...,positive
49995,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49996,I am a Catholic taught in parochial elementary...,negative
49997,I'm going to have to disagree with the previou...,negative
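
The preview above shows raw `<br />` tags inside the review text. With the default tokenizer of `TfidfVectorizer`, those tags would end up as the token `br`, so it may be worth removing them before vectorizing. The following is a minimal sketch of one way to do that; the helper name `strip_br_tags` and the sample string are hypothetical and are not part of the original notebook.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def strip_br_tags(text: str) -> str:
    """Replace HTML line-break tags with a space and collapse whitespace."""
    without_tags = _BR_TAG.sub(" ", text)
    return " ".join(without_tags.split())


# Hypothetical sample, shaped like the second row of the preview.
sample = "A wonderful little production. <br /><br />The rest of the review."
cleaned = strip_br_tags(sample)
print(cleaned)
```

Assuming the column layout shown above, the helper could be applied with `df_review['review'].map(strip_br_tags)` before the text is passed to the vectorizer.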


In [168]:
# Take the first 2000 positive and the first 2000 negative examples
df_positive = df_review[df_review['sentiment']=='positive'][:2000]
df_negative = df_review[df_review['sentiment']=='negative'][:2000]

# Combine the two subsets into one balanced dataset
df_review_imb = pd.concat([df_positive, df_negative])

# Transform the text into TF-IDF feature vectors
tfidf = TfidfVectorizer(stop_words='english')
review_vectors = tfidf.fit_transform(df_review_imb['review'])

# Split the data into training (70%) and test (30%) sets
train_x, test_x, train_y, test_y = train_test_split(review_vectors, df_review_imb['sentiment'], train_size=0.7, random_state=42)
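
In the cell above, the vectorizer is fitted on all 4000 reviews before the split, so the IDF weights also reflect the test reviews, and the split is not stratified. A common alternative is to split the raw text first and let a pipeline fit TF-IDF on the training part only. The sketch below illustrates that idea on a tiny toy corpus; the corpus, the choice of `LogisticRegression`, and the `train_size` value are assumptions made only to keep the example runnable, and the results reported in this notebook were not produced this way.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical data, only to make the sketch runnable).
texts = [
    "a wonderful little film", "great acting and a great plot",
    "I loved every minute", "a truly wonderful story",
    "bad plot and bad acting", "a boring, terrible film",
    "I hated every minute", "truly awful dialogue",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Split the raw text first; stratify keeps the class ratio in both parts.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, train_size=0.75, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training texts only, then the classifier.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)

# At prediction time the pipeline reuses the training vocabulary and IDF.
predictions = model.predict(test_texts)
print(list(zip(test_texts, predictions)))
```

Because vectorizer and classifier travel together, the same `model` object can be handed to cross-validation utilities without refitting TF-IDF by hand.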

In [169]:
# Create the model instances
svc = SVC(kernel='linear')
dec_tree = DecisionTreeClassifier()
gnb = GaussianNB()
log_reg = LogisticRegression()

# Train the models (GaussianNB needs a dense array, hence .toarray())
svc.fit(train_x, train_y)
dec_tree.fit(train_x, train_y)
gnb.fit(train_x.toarray(), train_y)
log_reg.fit(train_x, train_y)
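
`GaussianNB` models each feature as a Gaussian and only accepts dense input, which is why the training matrix has to be converted with `.toarray()`. For sparse, non-negative features such as TF-IDF weights, `MultinomialNB` is a frequently used variant that accepts the sparse matrix directly. The sketch below swaps it in on a hypothetical toy corpus; it is not the model evaluated in this notebook, and whether it would score differently on this dataset would have to be measured.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus (hypothetical data, only to make the sketch runnable).
texts = [
    "a wonderful little film",
    "great acting and a great plot",
    "bad plot and bad acting",
    "a boring, terrible film",
]
labels = ["positive", "positive", "negative", "negative"]

# fit_transform returns a SciPy sparse matrix.
vectors = TfidfVectorizer().fit_transform(texts)

# MultinomialNB trains on the sparse matrix as is; no .toarray() call.
mnb = MultinomialNB()
mnb.fit(vectors, labels)

predictions = mnb.predict(vectors)
print(predictions)
```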


In [170]:
# Evaluate a fitted model on the test set
def evaluar_modelo(modelo, nombre):
    # Predict on the test set (dense input, which GaussianNB requires)
    predicciones = modelo.predict(test_x.toarray())

    # Compute the accuracy (fraction of correct predictions)
    accuracy = accuracy_score(test_y, predicciones)

    # Compute the F1 score for the positive class
    f1 = f1_score(test_y, predicciones, pos_label='positive')

    # Print the results
    print(f"Model: {nombre}")
    print(f"Accuracy: {accuracy}")
    print(f"F1 score: {f1}")
    print(classification_report(test_y, predicciones, labels=['positive', 'negative']))
    print(confusion_matrix(test_y, predicciones, labels=['positive', 'negative']))
    print()
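
The first number printed by the function is the accuracy, which is a different quantity from the per-class precision shown in the classification report. As a worked check, the sketch below recomputes the metrics by hand from the SVM confusion matrix printed in the results further down, `[[538, 84], [90, 488]]`, assuming rows are true labels and columns are predictions in the order `['positive', 'negative']` passed to `confusion_matrix`.

```python
# Counts from the SVM confusion matrix [[538, 84], [90, 488]].
tp, fn = 538, 84   # truly positive reviews: predicted positive / negative
fp, tn = 90, 488   # truly negative reviews: predicted positive / negative

total = tp + fn + fp + tn

# Accuracy: share of all reviews that were classified correctly.
accuracy = (tp + tn) / total

# Precision and recall for the positive class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

The computed accuracy and F1 agree with the 0.855 and 0.8608 reported for the SVM, and the row sum `tp + fn` matches its support of 622 positive test reviews.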



In [171]:
# Evaluate each trained model
evaluar_modelo(svc, "Support Vector Machines (SVM)")
evaluar_modelo(dec_tree, "Decision Tree")
evaluar_modelo(gnb, "Naive Bayes")
evaluar_modelo(log_reg, "Logistic Regression")

Model: Support Vector Machines (SVM)
Accuracy: 0.855
F1 score: 0.8608
              precision    recall  f1-score   support

    positive       0.86      0.86      0.86       622
    negative       0.85      0.84      0.85       578

    accuracy                           0.85      1200
   macro avg       0.85      0.85      0.85      1200
weighted avg       0.85      0.85      0.85      1200

[[538  84]
 [ 90 488]]

Model: Decision Tree
Accuracy: 0.6975
F1 score: 0.7046379170056957
              precision    recall  f1-score   support

    positive       0.71      0.70      0.70       622
    negative       0.68      0.70      0.69       578

    accuracy                           0.70      1200
   macro avg       0.70      0.70      0.70      1200
weighted avg       0.70      0.70      0.70      1200

[[433 189]
 [174 404]]

Model: Naive Bayes
Accuracy: 0.6025
F1 score: 0.6180944755804644
              precision    recall  f1-score   support

    positive       0