### Importing Libraries

Here we import all the libraries needed for sentiment analysis and comment classification.

In [55]:
import praw
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.sentiment import SentimentIntensityAnalyzer
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import classification_report


### Downloading NLTK Resources

We download the NLTK resources this notebook needs: the tokenizer models, the stopword list, WordNet, and the VADER lexicon used for sentiment analysis.

In [56]:
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('vader_lexicon')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\juand\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\juand\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\juand\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\juand\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

### Initializing Reddit and Collecting Comments

Here we initialize the Reddit API client with the PRAW library. We then search the Health subreddit for posts about healthcare in the USA, limited to 20 posts, and store the body of every comment on those posts in a list.


In [57]:
reddit = praw.Reddit(
    client_id='vkbyNxlrdz1Wy9DNxfxnzA',
    client_secret='sC4RfUNF95ThEF0BtuTyfD1sV6DyWA',
    user_agent='my-app by u/Witty-Cause-3665'
)

subreddit = reddit.subreddit('Health')
posts = subreddit.search("Healthcare USA", limit=20)
comments_data = []

for post in posts:
    post.comments.replace_more(limit=None)
    for comment in post.comments.list():
        comments_data.append(comment.body)
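
The cell above embeds the API credentials directly in the notebook. A minimal sketch of one alternative, assuming the credentials are exported as environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` and the helper `load_reddit_credentials` are hypothetical and not part of PRAW:

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}

def load_reddit_credentials(env=None):
    """Return keyword arguments for praw.Reddit, read from the environment."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in REQUIRED_VARS.items()}

# Usage (requires praw and the three variables to be set):
# reddit = praw.Reddit(**load_reddit_credentials())
```

Keeping the secret out of the notebook means the file can be shared or committed without exposing it.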

### Text Preprocessing and Sentiment Analysis

We clean and normalize the comments before running sentiment analysis, and then define a helper that uses NLTK's SentimentIntensityAnalyzer to score each comment.

In [58]:
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    tokens = word_tokenize(text)
    tokens = [word for word in tokens if word not in stop_words]
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    preprocessed_text = ' '.join(tokens)
    return preprocessed_text

sia = SentimentIntensityAnalyzer()

def analyze_sentiment(text):
    sentiment_score = sia.polarity_scores(text)
    return sentiment_score['compound']
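
To make the effect of each step in `preprocess_text` concrete, here is a simplified stand-in that needs no NLTK data. It assumes a tiny hand-picked stop-word set (`DEMO_STOP_WORDS`, hypothetical) and plain whitespace splitting instead of `word_tokenize`, and it skips lemmatization, so its output will differ from the real function on most inputs:

```python
import re

# Hypothetical, deliberately tiny stop-word set; NLTK's English list is much longer.
DEMO_STOP_WORDS = {"the", "is", "a", "of", "in", "and", "to"}

def preprocess_text_demo(text):
    text = text.lower()                        # normalize case
    text = re.sub(r"[^a-zA-Z\s]", "", text)    # drop digits and punctuation
    tokens = text.split()                      # whitespace split (stand-in for word_tokenize)
    tokens = [t for t in tokens if t not in DEMO_STOP_WORDS]
    return " ".join(tokens)

print(preprocess_text_demo("The cost of care in 2020 is HIGH!!!"))
# prints: cost care high
```

Because the regular expression deletes apostrophes as well, a contraction such as `doesn't` becomes `doesnt` before tokenization.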



### Sentiment Analysis

We use NLTK's SentimentIntensityAnalyzer to score the preprocessed comments and convert each compound score into a Positive, Negative, or Neutral label. We then build a pandas DataFrame holding the comments and their sentiment labels, split it into training and test sets, and use TF-IDF to turn the comment text into numerical features. Next, we train a linear SVM classifier on those features and generate predictions for the test set so the model's performance can be assessed. Finally, we build a DataFrame that shows each test comment alongside its actual and predicted sentiment.
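
The labeling step turns a compound score into one of three classes. A small sketch of that rule, using the same ±0.05 cut-offs and strict comparisons as the cell that follows (the helper name `label_from_compound` is hypothetical):

```python
def label_from_compound(score, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    # Scores between -threshold and +threshold, inclusive, count as neutral.
    return "Neutral"

for s in (0.6, 0.05, 0.0, -0.05, -0.4):
    print(s, label_from_compound(s))
```

Scores of exactly 0.05 or -0.05 fall in the Neutral band because both comparisons are strict.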

In [59]:
comments_processed = [preprocess_text(comment) for comment in comments_data]
sentiments = [analyze_sentiment(comment) for comment in comments_processed]
sentiment_labels = ['Positive' if score > 0.05 else 'Negative' if score < -0.05 else 'Neutral' for score in sentiments]

data = pd.DataFrame({'Comment Body': comments_data, 'Sentiment': sentiment_labels})

X_train, X_test, y_train, y_test = train_test_split(data['Comment Body'], data['Sentiment'], test_size=0.2, random_state=42)

tfidf_vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

svm_model = SVC(kernel='linear')
svm_model.fit(X_train_tfidf, y_train)

y_pred = svm_model.predict(X_test_tfidf)

results_df_HealthCare = pd.DataFrame({'Comment Body': X_test, 'Actual Sentiment': y_test, 'Predicted Sentiment': y_pred})

results_df_HealthCare


| | Comment Body | Actual Sentiment | Predicted Sentiment |
|---|---|---|---|
| 327 | In fairness, several of the countries that spe... | Positive | Positive |
| 33 | Do these new case reports ever take into accou... | Neutral | Positive |
| 15 | It’s ironic. Independence Day 2020 could turn ... | Positive | Positive |
| 314 | So, "not for profit" doesn't mean no one is ma... | Positive | Positive |
| 57 | What are you implying | Neutral | Neutral |
| ... | ... | ... | ... |
| 94 | Then you don’t have any issue with Trump havin... | Positive | Negative |
| 195 | So this is going to be another one of his futu... | Positive | Positive |
| 311 | If it wasn't for profit, why would anyone inve... | Negative | Negative |
| 292 | Perhaps a trip overseas ? | Neutral | Positive |
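
The notebook imports `classification_report`, but the cells shown never call it. With scikit-learn available, `print(classification_report(y_test, y_pred))` would be the usual way to get per-class precision and recall. As a dependency-free illustration of what such a report summarizes, the sketch below computes accuracy and per-class recall over only the nine rows visible in the truncated output above, so the numbers describe those rows and not the full test set. The helper name `summarize` is hypothetical:

```python
from collections import Counter

# (actual, predicted) pairs copied from the nine visible rows of the output.
visible_rows = [
    ("Positive", "Positive"),
    ("Neutral", "Positive"),
    ("Positive", "Positive"),
    ("Positive", "Positive"),
    ("Neutral", "Neutral"),
    ("Positive", "Negative"),
    ("Positive", "Positive"),
    ("Negative", "Negative"),
    ("Neutral", "Positive"),
]

def summarize(pairs):
    """Return overall accuracy and per-class recall for (actual, predicted) pairs."""
    total = Counter(actual for actual, _ in pairs)
    correct = Counter(actual for actual, pred in pairs if actual == pred)
    accuracy = sum(correct.values()) / len(pairs)
    recall = {label: correct[label] / total[label] for label in total}
    return accuracy, recall

accuracy, recall = summarize(visible_rows)
print(f"accuracy on visible rows: {accuracy:.2f}")
for label, value in sorted(recall.items()):
    print(f"recall[{label}] = {value:.2f}")
```

Keep in mind that the "actual" labels here come from VADER, so these figures measure how well the SVM reproduces VADER's labels, not agreement with human judgment.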


In [61]:
results_df_HealthCare.to_csv('results_df_HealthCare.csv', index=False)