# Sentiment Analysis of Twitter Data for Brand Monitoring




### **Introduction**
This notebook demonstrates sentiment analysis of Twitter data for brand monitoring.
It combines text preprocessing, feature engineering, machine-learning classifiers, and visualization
to analyze sentiment trends and evaluate brand perception.



### **Data Preprocessing**
- **Text Cleaning**: Remove noise such as URLs, @mentions, #hashtags (the whole tag, including its word), and special characters.
- **Tokenization**: Split tweets into individual tokens.
- **Stop Word Removal**: Remove common words that do not contribute to sentiment (e.g., "and", "is").
- **Lemmatization**: Reduce words to their dictionary base form, or lemma (e.g., "shows" → "show").

#### Code Implementation


In [None]:

import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download necessary NLTK data
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load this table for word_tokenize
nltk.download('wordnet')

# Example dataset
data = {
    'tweet': [
        'I love the new iPhone! #Apple',
        'Terrible customer service from Amazon. @Amazon',
        'Netflix has great shows, but the app crashes too often.'
    ],
    'sentiment': ['positive', 'negative', 'neutral']
}
df = pd.DataFrame(data)

# Preprocessing function
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess_tweet(tweet):
    # Remove URLs, mentions, hashtags, and special characters
    tweet = re.sub(r"http\S+|@\S+|#\S+|[^A-Za-z0-9\s]", "", tweet)
    tweet = tweet.lower()
    tokens = word_tokenize(tweet)
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return " ".join(tokens)

df['cleaned_tweet'] = df['tweet'].apply(preprocess_tweet)
df
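One caveat for the stop-word step: NLTK's English stop-word list includes negations such as "not", "no", and "nor", so `preprocess_tweet` turns "not good" into "good" and can flip a tweet's apparent sentiment. The minimal pure-Python sketch below applies the same cleaning regex but keeps negations. It assumes a small hand-picked stop-word list (`STOP_WORDS` is hypothetical, not NLTK's list) and swaps in `str.split` for NLTK's `word_tokenize` to stay dependency-free.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration (NOT NLTK's list).
STOP_WORDS = {"i", "the", "a", "an", "and", "is", "from", "but", "too", "has", "not", "no", "nor"}
# Negation words carry sentiment, so they are subtracted from the stop list.
NEGATIONS = {"not", "no", "nor", "never"}
EFFECTIVE_STOP_WORDS = STOP_WORDS - NEGATIONS

# Same noise pattern as preprocess_tweet: URLs, @mentions, #hashtags, non-alphanumerics.
NOISE = re.compile(r"http\S+|@\S+|#\S+|[^A-Za-z0-9\s]")

def clean_tokens(tweet: str) -> list[str]:
    """Remove noise, lowercase, split on whitespace, drop stop words but keep negations."""
    text = NOISE.sub("", tweet).lower()
    return [tok for tok in text.split() if tok not in EFFECTIVE_STOP_WORDS]

print(clean_tokens("I love the new iPhone! #Apple"))                     # ['love', 'new', 'iphone']
print(clean_tokens("The app is not working http://t.co/xyz @Support"))  # ['app', 'not', 'working']
```

The same idea carries over to the notebook by building its set as `set(stopwords.words('english')) - NEGATIONS`.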



### **Feature Engineering**
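- Convert the cleaned tweets into TF-IDF vectors, which serve as the input features for the classifiers.
- Extract surface features such as tweet length, hashtag count, and mention count (from the raw tweets, since cleaning removes hashtags and mentions).
- Perform Named Entity Recognition (NER) to associate entities with sentiment.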
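The surface features in the first bullet are not computed in the cell that follows. A minimal pandas sketch is shown here; it assumes hashtags and mentions are `#` or `@` followed by word characters, and the column names `tweet_length`, `word_count`, `hashtag_count`, and `mention_count` are illustrative. The counts are taken on the raw tweets because `preprocess_tweet` strips hashtags and mentions before vectorization.

```python
import pandas as pd

# The notebook's three example tweets, kept in their raw (uncleaned) form.
tweets = pd.Series([
    'I love the new iPhone! #Apple',
    'Terrible customer service from Amazon. @Amazon',
    'Netflix has great shows, but the app crashes too often.',
])

features = pd.DataFrame({
    "tweet_length": tweets.str.len(),             # number of characters in the raw tweet
    "word_count": tweets.str.split().str.len(),   # whitespace-delimited tokens
    "hashtag_count": tweets.str.count(r"#\w+"),   # regex matches such as "#Apple"
    "mention_count": tweets.str.count(r"@\w+"),   # regex matches such as "@Amazon"
})
print(features)
```

These columns can be concatenated with the TF-IDF matrix (e.g. via `scipy.sparse.hstack`) if the classifiers should use them too.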


In [None]:

from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF Vectorization
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
X = tfidf_vectorizer.fit_transform(df['cleaned_tweet'])
y = df['sentiment']

# Named Entity Recognition Example
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_entities(tweet):
    doc = nlp(tweet)
    return [(ent.text, ent.label_) for ent in doc.ents]

df['entities'] = df['tweet'].apply(extract_entities)
df
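For reference, `TfidfVectorizer` with its default settings lowercases the text, keeps tokens of two or more word characters, weights each term by its raw count times a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and then L2-normalizes each row. The pure-Python sketch below reproduces that computation for a few short strings to show what the numbers in `X` represent; the function name `tfidf_matrix` is illustrative, and it is not a replacement for scikit-learn.

```python
import math
import re
from collections import Counter

def tfidf_matrix(docs):
    """Return (sorted vocabulary, L2-normalized TF-IDF rows) using scikit-learn's default formula."""
    token_re = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token pattern
    tokenized = [token_re.findall(doc.lower()) for doc in docs]
    n = len(docs)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: number of documents containing each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing an all-zero row by 0
        rows.append([v / norm for v in vec])
    return vocab, rows

vocab, rows = tfidf_matrix(["great app", "great show", "app crash"])
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

Terms that occur in fewer documents (here "crash" and "show") receive a larger IDF, so they outweigh shared terms such as "app" within the same row.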

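`extract_entities` lists the entities in each tweet but does not yet associate them with sentiment. One minimal way to do that, sketched here with pandas, is to explode the entity lists into one row per (tweet, entity) pair and cross-tabulate entity text against the tweet's sentiment label. The entity lists below are typed by hand as a hypothetical stand-in for spaCy output, so the example runs without loading a model.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df: entity lists typed by hand, NOT real spaCy output.
df = pd.DataFrame({
    "entities": [[("Apple", "ORG")], [("Amazon", "ORG")], [("Netflix", "ORG")], []],
    "sentiment": ["positive", "negative", "neutral", "positive"],
})

pairs = (
    df.explode("entities")                     # one row per (tweet, entity) pair
      .dropna(subset=["entities"])             # tweets with no entities explode to NaN; drop them
      .assign(entity=lambda d: d["entities"].map(lambda ent: ent[0]))  # keep text, drop label
)

# Count how often each entity co-occurs with each sentiment label.
entity_sentiment = pd.crosstab(pairs["entity"], pairs["sentiment"])
print(entity_sentiment)
```

On real data, normalizing each row (`pd.crosstab(..., normalize="index")`) gives the share of positive, negative, and neutral mentions per brand.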


### **Model Training**
- Train Logistic Regression, linear SVM, Decision Tree, and Random Forest classifiers on the TF-IDF features.
- Evaluate models using accuracy, precision, recall, and F1-score.
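The per-class numbers that `classification_report` prints come from simple counts: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R), accuracy is the fraction of correct predictions, and the macro average is the unweighted mean over labels. The pure-Python sketch below (the function name `per_class_metrics` is illustrative) computes them directly and scores undefined ratios as 0, which corresponds to `zero_division=0` in scikit-learn.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return per-label precision/recall/F1, overall accuracy, and macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per label
    pred_counts = Counter(y_pred)   # TP + FP per label
    true_counts = Counter(y_true)   # TP + FN per label
    report = {}
    for label in labels:
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / true_counts[label] if true_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    return report, accuracy, macro_f1

y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "neutral"]
report, accuracy, macro_f1 = per_class_metrics(y_true, y_pred)
print(accuracy, macro_f1)  # 0.5 0.5
```

Macro averaging weights every sentiment class equally, which matters when one class (often neutral) dominates brand-related tweets.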


In [None]:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

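# Split data. With only three example tweets this yields two training tweets and one
# test tweet, so the scores below only demonstrate the workflow; a real evaluation
# needs a labelled dataset large enough for a stratified split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)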

# Logistic Regression
lr = LogisticRegression(max_iter=100)
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

# Support Vector Machine
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)

# Decision Tree
dt = DecisionTreeClassifier(max_depth=10, random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)

# Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

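# Model Evaluation (zero_division=0 silences warnings for labels missing from the tiny test split)
print("Logistic Regression:\n", classification_report(y_test, lr_pred, zero_division=0))
print("SVM:\n", classification_report(y_test, svm_pred, zero_division=0))
print("Decision Tree:\n", classification_report(y_test, dt_pred, zero_division=0))
print("Random Forest:\n", classification_report(y_test, rf_pred, zero_division=0))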

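The cells above fit `TfidfVectorizer` on every tweet before `train_test_split`, so IDF weights computed from the test tweets leak into training. A sketch that avoids this wraps the vectorizer and classifier in a scikit-learn pipeline, so the vectorizer is refit inside each training fold, and compares models by macro-averaged F1 under stratified cross-validation. It assumes a labelled dataset large enough for that; the nine toy tweets below are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical placeholder tweets (already cleaned); substitute a real labelled dataset.
texts = [
    "love new iphone", "great camera love", "amazing battery great",
    "terrible customer service", "app crash terrible", "awful delivery late",
    "ordered phone today", "new update released", "store open monday",
]
labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": SVC(kernel="linear"),
}

# Stratified folds keep the class proportions the same in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

results = {}
for name, clf in models.items():
    # The vectorizer is fit only on each training fold, so test folds stay unseen.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1_macro")
    results[name] = scores
    print(f"{name}: macro F1 = {scores.mean():.2f} (+/- {scores.std():.2f})")
```

The decision tree and random forest can be added to `models` the same way; with real data, five or ten folds give steadier estimates.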


### **Visualizations**
- **Word Cloud**: Highlight common words in positive and negative tweets.
- **Feature Importance**: Show most influential features for sentiment classification.


In [None]:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Generate word clouds
positive_words = " ".join(df[df['sentiment'] == 'positive']['cleaned_tweet'])
negative_words = " ".join(df[df['sentiment'] == 'negative']['cleaned_tweet'])

# Positive Word Cloud
plt.figure(figsize=(8, 5))
plt.imshow(WordCloud(background_color='white').generate(positive_words))
plt.axis('off')
plt.title("Positive Sentiments Word Cloud")
plt.show()

# Negative Word Cloud
plt.figure(figsize=(8, 5))
plt.imshow(WordCloud(background_color='white').generate(negative_words))
plt.axis('off')
plt.title("Negative Sentiments Word Cloud")
plt.show()
