TASK **4**

Analyze and visualize sentiment patterns in social media data to understand public opinion and attitudes towards specific topics or brands.

In [None]:
# ================================================
# 📦 1. Import Required Libraries
# ================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import re
from collections import Counter
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

In [None]:
# ================================================
# 📥 2. Load and Inspect Dataset
# ================================================
df = pd.read_csv("/content/twitter_training.csv", header=None)
df.columns = ['Tweet_ID', 'Topic', 'Sentiment', 'Tweet']
print("✅ Dataset Loaded")
print(df.head())
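
Beyond `head()`, a quick look at missing values and duplicate rows is a common inspection step before cleaning. The sketch below is only an illustration: `demo_df` and its rows are made up and stand in for the real `df`, so the numbers say nothing about the actual dataset.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the notebook's df
demo_df = pd.DataFrame({
    "Tweet_ID": [1, 1, 2],
    "Topic": ["BrandA", "BrandA", "BrandB"],
    "Sentiment": ["Positive", "Positive", "Negative"],
    "Tweet": ["great game", "great game", None],
})

# Count missing values per column
missing = demo_df.isna().sum()
print(missing)

# Drop rows that are exact duplicates of an earlier row
deduped = demo_df.drop_duplicates()
print(len(demo_df), "->", len(deduped))
```

The same two calls can be run on `df` directly to decide whether empty tweets or repeated rows need handling before the cleaning step.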


In [None]:
# ================================================
# 🧹 3. Data Cleaning & Preprocessing
# ================================================
# Basic cleaning: remove URLs, mentions, special chars
def clean_text(text):
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)  # Remove mentions
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # Remove URLs
    text = re.sub(r'#', '', text)  # Remove hashtag symbol
    text = re.sub(r'[^a-zA-Z\s]', '', text)  # Remove special chars
    text = text.lower().strip()
    return text

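# Replace missing tweets with an empty string first; astype(str) alone would turn NaN into the word "nan"
df['clean_tweet'] = df['Tweet'].fillna('').astype(str).apply(clean_text)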

# Remove stopwords
stop_words = set(stopwords.words('english'))
df['clean_tweet'] = df['clean_tweet'].apply(lambda x: " ".join([word for word in x.split() if word not in stop_words]))
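
To see what the cleaning step does to a single tweet, here is a minimal, self-contained sketch. It repeats `clean_text` from this cell; the sample tweet and the tiny `demo_stop_words` set are invented for illustration (the notebook itself uses the full NLTK English stopword list).

```python
import re

def clean_text(text):
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)         # remove mentions
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'#', '', text)                      # remove hashtag symbol
    text = re.sub(r'[^a-zA-Z\s]', '', text)            # remove non-letter characters
    return text.lower().strip()

# Assumption: a small stand-in for stopwords.words('english')
demo_stop_words = {"the", "is", "a", "and", "this"}

sample = "@user123 This game is AMAZING!!! Check https://example.com #gaming"

cleaned = clean_text(sample)
tokens = [word for word in cleaned.split() if word not in demo_stop_words]
result = " ".join(tokens)
print(result)  # game amazing check gaming
```

Note that the last regex also strips digits and emoji, so tokens such as "ps5" become "ps". That may or may not be desirable depending on the topics being tracked.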

In [None]:
# ================================================
# 📊 4. Sentiment Distribution (Overall)
# ================================================
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x='Sentiment', order=df['Sentiment'].value_counts().index, palette='Set2')
plt.title('Overall Sentiment Distribution', fontweight='bold')
plt.xlabel('Sentiment')
plt.ylabel('Tweet Count')
plt.tight_layout()
plt.show()

# Pie Chart
plt.figure(figsize=(6, 6))
df['Sentiment'].value_counts().plot.pie(
    autopct='%1.1f%%',
    startangle=140,
    colors=sns.color_palette('Set2'),
    explode=[0.01] * df['Sentiment'].nunique()  # one offset per sentiment class, not a hardcoded four
)
plt.title('Sentiment Proportion')
plt.ylabel('')
plt.tight_layout()
plt.show()
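
The charts show the distribution visually; the same figures can also be read off as a small table. A minimal sketch, assuming a toy `demo_df` with made-up labels in place of the real data:

```python
import pandas as pd

# Hypothetical labels standing in for df['Sentiment']
demo_df = pd.DataFrame({"Sentiment": ["Positive", "Negative", "Negative", "Neutral"]})

# Absolute counts and percentage share per sentiment class
counts = demo_df["Sentiment"].value_counts()
shares = demo_df["Sentiment"].value_counts(normalize=True).mul(100).round(1)

summary = pd.DataFrame({"Count": counts, "Percent": shares})
print(summary)
```

Printing the table next to the pie chart makes it easier to quote exact percentages in a write-up.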


In [None]:
# ================================================
# ☁️ 5. Word Cloud per Sentiment
# ================================================
def generate_wordcloud(sentiment):
    text = " ".join(df[df['Sentiment'] == sentiment]['clean_tweet'])
    wc = WordCloud(width=800, height=400, background_color='white', colormap='Set2').generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f"Word Cloud - {sentiment}", fontweight='bold')
    plt.tight_layout()
    plt.show()

for sentiment in df['Sentiment'].unique():
    generate_wordcloud(sentiment)

In [None]:
# ================================================
# 📈 6. Top Words by Sentiment
# ================================================
def plot_top_words(sentiment, n=15):
    words = " ".join(df[df['Sentiment'] == sentiment]['clean_tweet']).split()
    most_common = Counter(words).most_common(n)
    word_df = pd.DataFrame(most_common, columns=['Word', 'Count'])

    plt.figure(figsize=(10, 5))
    sns.barplot(data=word_df, x='Count', y='Word', palette='Set2')
    plt.title(f"Top {n} Words - {sentiment}", fontweight='bold')
    plt.tight_layout()
    plt.show()

for sentiment in df['Sentiment'].unique():
    plot_top_words(sentiment)
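
The counting inside `plot_top_words` can be tried in isolation. This is a small sketch of the same `Counter` approach; `demo_tweets` is a made-up list of already-cleaned tweets, not data from the notebook.

```python
from collections import Counter

# Hypothetical cleaned tweets for one sentiment class
demo_tweets = ["love game love graphics", "game crashes again", "love update"]

# Join all tweets, split on whitespace, and count each word
words = " ".join(demo_tweets).split()
top_words = Counter(words).most_common(2)
print(top_words)  # [('love', 3), ('game', 2)]
```

Because the counts are raw frequencies, words that occur in every class (for example a brand name) may top each chart; comparing the lists across sentiments helps spot those.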

In [None]:
# ================================================
# 🧠 7. Topic-Level Sentiment Analysis
# ================================================
# Get top 5 topics by tweet count
top_topics = df['Topic'].value_counts().head(5).index

for topic in top_topics:
    topic_df = df[df['Topic'] == topic]
    plt.figure(figsize=(8, 4))
    sns.countplot(data=topic_df, x='Sentiment', order=['Positive', 'Neutral', 'Negative', 'Irrelevant'], palette='Set3')
    plt.title(f'Sentiment Distribution for Topic: {topic}', fontweight='bold')
    plt.ylabel('Tweet Count')
    plt.xlabel('Sentiment')
    plt.tight_layout()
    plt.show()
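
Raw counts per topic are hard to compare when topics have different tweet volumes. One possible complement to the plots above is a row-normalized cross-tabulation, sketched here with `pd.crosstab` on a made-up `demo_df` (the topic names and labels are hypothetical):

```python
import pandas as pd

# Hypothetical topic/sentiment pairs standing in for df[['Topic', 'Sentiment']]
demo_df = pd.DataFrame({
    "Topic": ["BrandA", "BrandA", "BrandA", "BrandA", "BrandB", "BrandB"],
    "Sentiment": ["Positive", "Positive", "Negative", "Neutral", "Negative", "Negative"],
})

# normalize="index" turns each row (topic) into shares that sum to 1
share = (
    pd.crosstab(demo_df["Topic"], demo_df["Sentiment"], normalize="index")
    .mul(100)
    .round(1)
)
print(share)
```

Each row then reads as the percentage split of sentiments within one topic, which puts large and small topics on the same scale.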

✅ **Conclusion**

This notebook analyzed and visualized sentiment patterns in social media data to better understand public opinion and attitudes toward specific topics and brands. Using a labeled Twitter dataset, the workflow covered text cleaning, exploration of the existing sentiment labels, and visualization at the overall, word, and topic levels.

The key findings and contributions of this analysis are as follows:

- **Effective preprocessing of textual data:** Social media content was systematically cleaned to remove noise such as user mentions, hyperlinks, special characters, and stopwords, resulting in a refined corpus suitable for sentiment analysis.
- **Exploratory analysis of sentiment distribution:** The overall sentiment landscape was visualized through bar and pie charts, revealing a predominance of both positive and negative sentiments, alongside a notable proportion of neutral and irrelevant expressions.
- **Lexical analysis by sentiment category:** Word cloud visualizations and frequency-based plots highlighted the most frequently used terms within each sentiment class, offering insight into common linguistic patterns and emotional expressions.
- **Sentiment analysis at the topic level:** By disaggregating sentiment by topic or brand, the analysis provided nuanced insights into public perception of specific entities, which is critical for brand monitoring and strategic communication.