# Decoding the Comments: Sentiment, Topics, and Predictive Insights

The comments section is often referred to as the "pulse" of online communities, offering an unfiltered view of audience sentiment, engagement, and reactions. In this final notebook of the project, we shift our focus to the comments dataset to uncover deeper insights about the audience and their interactions with video content.

Through a combination of **data analysis** and **machine learning techniques**, this notebook will explore the following key areas:
- **Sentiment Analysis**: Understanding the emotional tone of user comments and its correlation with video engagement.
- **Topic Modeling**: Uncovering recurring themes and patterns in audience discussions.
- **Predictive Modeling**: Using comments to predict engagement metrics and identify key drivers of interaction.
- **Advanced NLP**: Applying modern Natural Language Processing (NLP) techniques to classify sentiment, detect emotions, and extract nuanced insights.

This notebook not only aims to generate actionable insights but also demonstrates the application of machine learning and NLP tools to real-world data. By the end, we hope to bridge the gap between content creators and their audience, providing a clearer understanding of how comments reflect and influence engagement.

As the final notebook in the series, it ties together the insights from the previous notebooks, offering a holistic view of how video content and audience interaction shape overall engagement trends.

In [2]:
!pip install isodate nltk wordcloud vaderSentiment scikit-learn pandas numpy matplotlib seaborn gensim xgboost spacy transformers textblob pyLDAvis tqdm python-dateutil

Collecting transformers
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting huggingface-hub<1.0,>=0.30.0 (from transformers)
  Downloading huggingface_hub-0.30.2-py3-none-any.whl.metadata (13 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Downloading tokenizers-0.21.1-cp39-abi3-macosx_11_0_arm64.whl.metadata (6.8 kB)
Downloading transformers-4.51.3-py3-none-any.whl (10.4 MB)
Downloading huggingface_hub-0.30.2-py3-none-any.whl (481 kB)
Downloading tokenizers-0.21.1-cp39-abi3-macosx_11_0_arm64.whl (2.7 MB)

In [3]:
# Data Manipulation
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil import parser

# Text Processing and NLP
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob
from wordcloud import WordCloud

# Machine Learning
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Topic Modeling and Advanced NLP (Optional)
from gensim import corpora
from gensim.models import LdaModel
import spacy
from transformers import pipeline
import pyLDAvis.gensim_models as gensimvis

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid", color_codes=True)

# Utilities
from collections import Counter
from tqdm import tqdm

# Download NLTK data
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # recent NLTK releases look for this resource in word_tokenize



RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
module 'torch' has no attribute 'compiler'
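
The error above is raised when `pipeline` is looked up on the lazily loaded `transformers` package, and it usually points to an installed PyTorch that is older than this `transformers` release expects. Upgrading `torch` is the likely fix. Since the transformer-based steps are marked optional, one way to keep the rest of the notebook usable in the meantime is to guard that import. The sketch below is a minimal illustration and not part of the original notebook; the helper `optional_attr` and the flag `HAS_TRANSFORMERS` are hypothetical names.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return `module_name.attr_name`, or None if it cannot be imported.

    Hypothetical helper. It catches Exception broadly on purpose, because
    the failure seen above is a RuntimeError rather than an ImportError.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except Exception as exc:
        print(f"Optional dependency unavailable ({module_name}.{attr_name}): {exc}")
        return None


# Stand-in for `from transformers import pipeline`.
pipeline = optional_attr("transformers", "pipeline")
HAS_TRANSFORMERS = pipeline is not None
```

Later cells that depend on `pipeline` could then check `HAS_TRANSFORMERS` first and skip the transformer-based analysis when it is `False`.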

In [4]:
video_df = pd.read_csv("dataFolder/processed/cleanedDataFrame.csv")

In [None]:
comments_df = pd.read_csv("dataFolder/processed/cleanedComments.csv")
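
Before any analysis, it can help to confirm that a loaded frame has the expected shape and to see where values are missing. The sketch below runs on a small in-memory stand-in rather than the real files, so the column names `video_id` and `comment` are assumptions and may not match `cleanedComments.csv`; `summarize_frame` is a hypothetical helper.

```python
import io

import pandas as pd

# In-memory stand-in for cleanedComments.csv; column names are hypothetical.
sample_csv = io.StringIO(
    "video_id,comment\n"
    "a1,Great video!\n"
    "a1,\n"
    "b2,Not my favourite\n"
)
sample_df = pd.read_csv(sample_csv)


def summarize_frame(df):
    """Return the row count, column names, and per-column missing-value counts."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),
    }


summary = summarize_frame(sample_df)
print(summary)
```

The same helper could be called on `video_df` and `comments_df` once both files have loaded.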