# Data Exploration for Sentiment Analysis

This notebook performs exploratory data analysis (EDA) on social media data collected for sentiment analysis. The goal is to visualize the distribution of sentiment labels, see how sentiment changes over time, and prepare the data for further analysis and modeling.

In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')  # set_theme is the preferred name; sns.set is an alias for it

# Load the dataset
data_path = '../data/processed/your_processed_data.csv'  # Update with your processed data path
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()
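
Before plotting, it can help to confirm that the columns used in the later cells (`sentiment` and `date`) are present and to see how many values are missing. The following is a minimal sketch under stated assumptions: `sample` is a hypothetical stand-in for the processed CSV, the `text` column is illustrative only, and `summarize_quality` is a helper name introduced here, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample standing in for the processed CSV; the real file may differ.
sample = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', None],
    'sentiment': ['positive', 'negative', None, 'neutral'],
    'text': ['great', 'bad', 'ok', 'fine'],  # illustrative column, not required below
})


def summarize_quality(df, required=('date', 'sentiment')):
    """Report which required columns are absent and how many nulls the present ones hold."""
    missing_columns = [col for col in required if col not in df.columns]
    present = [col for col in required if col in df.columns]
    # Cast to plain int so the report is easy to print and compare.
    null_counts = {col: int(df[col].isna().sum()) for col in present}
    return {'missing_columns': missing_columns, 'null_counts': null_counts}


report = summarize_quality(sample)
print(report)
```

To apply the same check to the real dataset, call `summarize_quality(data)` after loading the CSV.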

In [None]:
# Visualize the distribution of sentiments
plt.figure(figsize=(10, 6))
sns.countplot(x='sentiment', data=data)
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()
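
Raw counts show class sizes, but relative shares are often easier to compare and make class imbalance obvious. A minimal sketch, assuming a `sentiment` column as in the cell above; the labels in `sample` are hypothetical and the real values depend on the processed dataset.

```python
import pandas as pd

# Hypothetical labels; replace `sample` with the loaded `data` frame in the notebook.
sample = pd.DataFrame({'sentiment': ['positive', 'positive', 'negative', 'neutral']})

# Fraction of rows carrying each sentiment label (fractions sum to 1).
sentiment_share = sample['sentiment'].value_counts(normalize=True)
print(sentiment_share.round(2))
```

A strongly skewed share is worth noting before modeling, since it affects how evaluation metrics should be read.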

In [None]:
# Analyze sentiment over time
data['date'] = pd.to_datetime(data['date'])  # Ensure date is in datetime format
sentiment_over_time = data.groupby(data['date'].dt.date)['sentiment'].value_counts().unstack(fill_value=0)  # fill_value keeps integer counts

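# Plot sentiment over time
# DataFrame.plot creates its own figure, so pass figsize here;
# a separate plt.figure() call would leave an empty figure behind
sentiment_over_time.plot(kind='line', figsize=(14, 7))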
plt.title('Sentiment Over Time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.legend(title='Sentiment')
plt.show()
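
Daily counts rise and fall with posting volume, so a spike may reflect more posts rather than a shift in sentiment. One way to separate the two is to look at each day's share per label. The following is a sketch under stated assumptions: `sample` is hypothetical data with the same `date` and `sentiment` columns, and `pd.crosstab` is used here in place of the `groupby` chain above.

```python
import pandas as pd

# Hypothetical posts; replace `sample` with the loaded `data` frame in the notebook.
sample = pd.DataFrame({
    'date': pd.to_datetime([
        '2024-01-01', '2024-01-01',
        '2024-01-02', '2024-01-02', '2024-01-02',
        '2024-01-09',
    ]),
    'sentiment': ['positive', 'negative', 'positive', 'positive', 'neutral', 'negative'],
})

# Rows are calendar days, columns are sentiment labels;
# normalize='index' divides each row by its total so every day sums to 1.
daily_share = pd.crosstab(sample['date'].dt.date, sample['sentiment'], normalize='index')
print(daily_share.round(2))
```

Calling `daily_share.plot(kind='line', figsize=(14, 7))` would chart these shares in the same way as the counts above. Days with very few posts produce noisy shares, so they are best read alongside the raw counts.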

## Conclusion

In this notebook, we performed exploratory data analysis on sentiment data collected from social media. We visualized the distribution of sentiment labels and examined how daily sentiment counts change over time. These findings help identify trends and can inform business decisions and later modeling work.