# Twitter Sentiment Analysis - EDA
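This notebook performs univariate and bivariate/multivariate analysis on the Twitter sentiment dataset. It assumes the data is already loaded into a DataFrame `df` with a `sentiment` label column and a `clean_tweet` text column.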

## Univariate Analysis
### Sentiment Distribution

In [None]:

import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='sentiment', data=df)
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment')
plt.ylabel('Tweet Count')
plt.show()
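
The count plot shows the class balance visually; the same information can be read off as numbers. The sketch below is a minimal illustration on a hypothetical stand-in frame `toy` (the label values `positive`, `negative`, and `neutral` are assumptions, not taken from the dataset). Swapping `toy` for `df` should give the real figures.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the labels are assumed.
toy = pd.DataFrame(
    {"sentiment": ["positive", "negative", "neutral",
                   "positive", "positive", "negative"]}
)

# Absolute and relative frequency of each sentiment class
counts = toy["sentiment"].value_counts()
shares = toy["sentiment"].value_counts(normalize=True)

# Ratio of the largest class to the smallest: 1.0 means perfectly balanced
imbalance_ratio = counts.max() / counts.min()

summary = pd.DataFrame({"count": counts, "share": shares.round(3)})
print(summary)
print(f"Largest-to-smallest class ratio: {imbalance_ratio:.2f}")
```

A ratio well above 1 suggests that the imbalance may need attention before modeling.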


### Tweet Length Distribution

In [None]:

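# Character count of each cleaned tweet; .str.len() yields NaN for missing
# tweets instead of raising, unlike apply(len)
df['tweet_length'] = df['clean_tweet'].str.len()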

plt.figure(figsize=(10, 5))
sns.histplot(df['tweet_length'], kde=True, bins=40)
plt.title('Distribution of Tweet Lengths')
plt.xlabel('Tweet Length')
plt.ylabel('Frequency')
plt.show()

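# Compare the length distribution across sentiment classes
plt.figure(figsize=(10, 5))
sns.boxplot(x='sentiment', y='tweet_length', data=df)
plt.title('Tweet Length by Sentiment')
plt.xlabel('Sentiment')
plt.ylabel('Tweet Length')
plt.show()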
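
Box plots make differences between classes visible, but small shifts are easier to judge from a table. The following is a sketch on a hypothetical stand-in frame `toy` with made-up tweets; it assumes the same column names as `df` and computes the length with `.str.len()`, which returns NaN for missing tweets.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the tweets are made up.
toy = pd.DataFrame({
    "sentiment": ["positive", "negative", "positive", "negative"],
    "clean_tweet": ["great day", "this is terrible service",
                    "love it", "not good"],
})

# Character length per tweet
toy["tweet_length"] = toy["clean_tweet"].str.len()

# Per-class summary statistics of tweet length
length_stats = (
    toy.groupby("sentiment")["tweet_length"]
       .agg(["count", "median", "mean", "max"])
)
print(length_stats)
```

Comparing the medians per class gives a quick numeric counterpart to the box plot.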


## Bivariate/Multivariate Analysis
### Word Count vs Sentiment

In [None]:

# Number of whitespace-separated tokens per tweet (NaN for missing tweets)
df['word_count'] = df['clean_tweet'].str.split().str.len()

sns.boxplot(x='sentiment', y='word_count', data=df)
plt.title("Word Count Distribution by Sentiment")
plt.show()
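
The word count here is a count of whitespace-separated tokens. The short sketch below, on a hypothetical series `tweets`, shows how that definition behaves on edge cases such as repeated spaces, empty strings, and missing values.

```python
import pandas as pd

# Hypothetical edge cases: double spaces, empty text, padding, missing value
tweets = pd.Series(["good  morning world", "", "  ok  ", None])

# split() with no separator splits on runs of whitespace, so extra spaces
# do not create empty tokens; missing tweets propagate as NaN
word_count = tweets.str.split().str.len()
print(word_count)
```

Empty tweets count as zero words, while missing tweets stay missing, so they can be filtered or filled explicitly before plotting.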


### Correlation Matrix of Numeric Features

In [None]:

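# Pairwise Pearson correlation between the two engineered length features
corr_matrix = df[['tweet_length', 'word_count']].corr()

# Fix the color scale to [-1, 1] so the diverging colormap is centered on 0
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)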
plt.title('Correlation Matrix of Numeric Features')
plt.show()
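
With only two features, the heatmap carries a single off-diagonal value, so it can be useful to print it directly and compare it with a rank-based coefficient. This is a sketch on a hypothetical frame `toy` with made-up tweets; `DataFrame.corr` defaults to Pearson, and `method="spearman"` is used as an optional cross-check that is less sensitive to outliers.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the tweets are made up.
toy = pd.DataFrame({
    "clean_tweet": ["ok", "so good", "really not great today", "fine i guess"]
})
toy["tweet_length"] = toy["clean_tweet"].str.len()
toy["word_count"] = toy["clean_tweet"].str.split().str.len()

features = toy[["tweet_length", "word_count"]]

# Linear (Pearson) and rank-based (Spearman) correlation of the two features
pearson = features.corr(method="pearson").loc["tweet_length", "word_count"]
spearman = features.corr(method="spearman").loc["tweet_length", "word_count"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

If both coefficients are close to 1, the two features are largely redundant and one of them could likely be dropped.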


## Insights Summary

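- **Sentiment Distribution**: The count plot shows how tweets are split across sentiment classes; a strongly uneven split indicates class imbalance that would need handling before modeling.
- **Tweet Length & Word Count**: The box plots compare these features across sentiment classes; where the distributions differ between classes, the features may carry some signal for classification.
- **Correlations**: Tweet length and word count are strongly correlated, as expected, because both measure the size of the same text. The two features are therefore largely redundant.
- **Feature Influence**: Textual features (for example via TF-IDF) are expected to be the primary drivers of sentiment classification, with tweet length and word count serving at most as supporting features.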
