# Sentiment Analysis

Sentiment analysis helps us understand the emotional tone of statements and explore how sentiment relates to other features such as text length, word count, and keyword usage. We used VADER (Valence Aware Dictionary and sEntiment Reasoner) to assign each statement a sentiment score and classify it as Positive, Neutral, or Negative. VADER is optimized for short, informal texts, which makes it well-suited to conversational language and social-media-style statements.
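VADER's compound score lies between -1 and +1, so the classification step amounts to thresholding it. The notebook code uses cut-offs of +/-0.05 with a row-wise `apply`. As an alternative, the same rule can be vectorized with `numpy.select`. This is a minimal sketch, assuming the compound scores are already in a pandas Series; the values below are hypothetical and not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical compound scores (illustrative only, not from the dataset)
scores = pd.Series([0.0, -0.73, 0.42, 0.05, -0.05, 0.049])

# Same cut-offs as the notebook: >= 0.05 Positive, <= -0.05 Negative, otherwise Neutral.
# np.select assigns the label of the first condition that is True for each element.
labels = np.select(
    [scores >= 0.05, scores <= -0.05],
    ["Positive", "Negative"],
    default="Neutral",
)

print(labels.tolist())
# ['Neutral', 'Negative', 'Positive', 'Positive', 'Negative', 'Neutral']
```

Assigning `labels` to a DataFrame column gives the same categories as the row-wise `categorize_sentiment` function, but without a Python function call per row.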

Key Objectives:

Identify Emotional Tone – Categorize statements as Positive, Neutral, or Negative to see whether negative sentiment is more common than the other two.

Analyze Trends – Understand how sentiment varies across the dataset, for example between the status labels.

Correlate Sentiment with Other Features – Check whether text length or specific words are associated with more positive or more negative scores.

Prepare Data for Modeling – Use sentiment categories as features for predictive models.

Detect Emotional Patterns – Identify emotionally charged language that may indicate mental health concerns.
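Once each statement has a score and a category, the first three objectives reduce to a few pandas operations. The sketch below is hypothetical: the toy statements, their scores, and the `Other` status label are made up for illustration, and `word_count` is a helper column introduced here rather than one the notebook creates.

```python
import pandas as pd

# Hypothetical toy data with the same columns the notebook produces
df = pd.DataFrame({
    "statement": [
        "cant stop worrying about everything",
        "oh my gosh",
        "my heart races and i cannot sleep at night",
        "had a nice walk",
        "just another day",
        "feeling good",
    ],
    "status": ["Anxiety", "Anxiety", "Anxiety", "Other", "Other", "Other"],
    "sentiment_score": [-0.5, 0.0, -0.4, 0.4, 0.0, 0.5],
    "sentiment_category": ["Negative", "Neutral", "Negative", "Positive", "Neutral", "Positive"],
})

# Objective 1: overall share of each sentiment category
overall = df["sentiment_category"].value_counts(normalize=True)

# Objective 2: share of each category within each status label (each row sums to 1)
by_status = pd.crosstab(df["status"], df["sentiment_category"], normalize="index")

# Objective 3: Pearson correlation between word count and compound score
df["word_count"] = df["statement"].str.split().str.len()
length_corr = df["word_count"].corr(df["sentiment_score"])

print(overall)
print(by_status.round(2))
print(f"word count vs. sentiment score: r = {length_corr:.2f}")
```

Pearson correlation only captures linear association; `Series.corr` also accepts `method="spearman"` for a rank-based comparison on the real data.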

This analysis provides insights into sentiment distribution and helps prepare data for further exploration and modeling.

In [3]:
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

# Ensure VADER is available
nltk.download('vader_lexicon')

# Initialize Sentiment Intensity Analyzer
sia = SentimentIntensityAnalyzer()

# Load the cleaned dataset
file_path = "cleaned_data.csv"  # Update with actual path
df = pd.read_csv(file_path)

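# Replace missing values with empty strings (instead of the literal text "nan"), then ensure every statement is a string
df["statement"] = df["statement"].fillna("").astype(str)

# Apply VADER and keep the compound score, a normalized value from -1 (most negative) to +1 (most positive)
df["sentiment_score"] = df["statement"].apply(lambda text: sia.polarity_scores(text)["compound"])

# Categorize sentiment with the conventional VADER cut-offs: >= 0.05 Positive, <= -0.05 Negative, otherwise Neutral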
def categorize_sentiment(score):
    if score >= 0.05:
        return "Positive"
    elif score <= -0.05:
        return "Negative"
    else:
        return "Neutral"

df["sentiment_category"] = df["sentiment_score"].apply(categorize_sentiment)

# Save the dataset with sentiment scores and categories
sentiment_file_path = "sentiment_data.csv"
df.to_csv(sentiment_file_path, index=False)

# Print confirmation message
print(f"Sentiment analysis complete! New dataset saved as '{sentiment_file_path}'.")

# Display first few rows
df.head()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Sentiment analysis complete! New dataset saved as 'sentiment_data.csv'.


| Unnamed: 0 | statement | status | sentiment_score | sentiment_category |
|---|---|---|---|---|
| 0 | oh my gosh | Anxiety | 0.0 | Neutral |
| 1 | trouble sleeping confused mind restless heart ... | Anxiety | -0.7269 | Negative |
| 2 | all wrong back off dear forward doubt stay in ... | Anxiety | -0.7351 | Negative |
| 3 | ive shifted my focus to something else but im ... | Anxiety | -0.4215 | Negative |
| 4 | im restless and restless its been a month now ... | Anxiety | -0.4939 | Negative |
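For the modeling objective, the text categories must be converted to numbers. One common option is one-hot encoding with `pandas.get_dummies`. The following is a minimal sketch on hypothetical rows. Fixing the category list with a `CategoricalDtype` guarantees the same three indicator columns even when a category is absent from a subset, such as a small test split. Here, no row is Positive, yet the column still appears.

```python
import pandas as pd

# Hypothetical rows mirroring the columns produced above (no Positive rows on purpose)
df = pd.DataFrame({
    "sentiment_score": [0.0, -0.73, -0.74],
    "sentiment_category": ["Neutral", "Negative", "Negative"],
})

# Fix the category set so every split gets identical indicator columns
categories = ["Negative", "Neutral", "Positive"]
sentiment = df["sentiment_category"].astype(pd.CategoricalDtype(categories=categories))

# One 0/1 column per category, named sentiment_Negative, sentiment_Neutral, sentiment_Positive
dummies = pd.get_dummies(sentiment, prefix="sentiment", dtype=int)
features = pd.concat([df[["sentiment_score"]], dummies], axis=1)

print(features)
```

Because `sentiment_category` is derived by thresholding `sentiment_score`, a model given both receives largely redundant information. Which one to keep is a modeling choice.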
