# Leveraging Sentiment Analysis for Enhanced Brand Management

## Problem Statement
---
In the rapidly evolving technology market, understanding customer sentiment towards products is crucial for companies like Google and Apple. These insights can guide product development, marketing strategies, customer service, and more. However, manually analysing customer sentiment is a time-consuming and labour-intensive process. Given the vast amount of customer feedback available on platforms like Twitter, it is virtually impossible for humans to process all the data in a timely manner.
Moreover, human analysis is subject to bias and inconsistency, and the quality of analysis can vary greatly depending on the individual’s understanding and interpretation. This makes it difficult to scale and standardize the sentiment analysis process.

Therefore, there is a need for an automated, efficient, and reliable solution to analyse customer sentiment towards Google and Apple products. Machine Learning, with its ability to learn patterns from large datasets and make predictions, offers a promising solution to this problem.
By applying Machine Learning techniques to sentiment analysis, we can process vast amounts of data in a fraction of the time it would take a human. This saves time and resources and applies the same criteria to every tweet, which makes the analysis more consistent than manual review. Machine Learning models can also be retrained as new data arrives, so they can adapt to new trends and nuances in customer sentiment.

## Business Understanding
---
Twitter is a platform where users often share their experiences and opinions about products. Analysing these sentiments can provide valuable feedback on what users like or dislike about a product, which can guide improvements and new features. Sentiment analysis can also show how a brand is perceived in the market: positive sentiment is usually associated with a strong brand image, while negative sentiment can indicate issues that need to be addressed. By analysing sentiment, Apple and Google can identify trends in consumer behaviour and preferences, which can inform strategic decisions such as the timing of product releases or marketing campaigns.

Comparing sentiment towards different products can provide insights into competitive positioning. For example, if sentiment towards an Apple product is more positive than sentiment towards a similar Google product, it might indicate a competitive advantage for Apple. Negative tweets can also signal customer service issues, which Apple and Google can use sentiment analysis to identify and resolve proactively. Used this way, a data-driven approach can help foster brand loyalty, increase customer satisfaction, and ultimately support sales growth.

### Key Objectives
$i.$ Utilize Natural Language Processing techniques to construct a machine learning model for automated sentiment analysis of tweets related to Google and Apple products.<br>
$ii.$ Evaluate and select the most suitable machine learning model for sentiment analysis based on its performance metrics.<br>
$iii.$ Analyse the frequency of the sentiments expressed in tweets about Google and Apple products.

## Data Understanding
---
The dataset, sourced from CrowdFlower via data.world, comprises 9,093 tweets whose sentiment was labelled by human raters as positive, negative, no emotion, or "I can't tell". The dataset contains three columns:

$i.$ `tweet_text`: This column contains the text of the tweet, facilitating sentiment analysis based on the content itself.

$ii.$ `emotion_in_tweet_is_directed_at`: This column names the brand or product that the expressed emotion is directed at. It is populated for only 3,291 of the 9,093 tweets and enables sentiment analysis targeted at a specific brand or product.

$iii.$ `is_there_an_emotion_directed_at_a_brand_or_product` (target variable): This column records the sentiment label assigned to the tweet: positive emotion, negative emotion, no emotion toward a brand or product, or "I can't tell".

## Exploratory Data Analysis
---

In [56]:
#importing the needed libraries
import pandas as pd
import re
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#nlp
import nltk
from nltk.corpus import stopwords,wordnet
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import RegexpTokenizer,TweetTokenizer
from nltk.stem import WordNetLemmatizer
from string import punctuation
#modeling
from sklearn.feature_extraction.text import TfidfVectorizer,CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score,precision_score,f1_score,confusion_matrix
from sklearn.pipeline import Pipeline


In [4]:
#importing the data
data = pd.read_csv('dataset.csv', encoding= 'unicode_escape')

data.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |


In [23]:
#changing the column names for easier readability
data.columns = ['tweet_text', 'product', 'sentiment']

In [24]:
#Viewing the first few rows
data.head()

| | tweet_text | product | sentiment |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |


In [25]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   tweet_text  9092 non-null   object
 1   product     3291 non-null   object
 2   sentiment   9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB
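
The `info()` output shows that `product` is populated for only 3,291 of the 9,093 rows and that one `tweet_text` value is missing. A minimal sketch of how the share of missing values per column could be quantified, using a small hypothetical stand-in frame (`toy`) rather than the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the values are illustrative only.
toy = pd.DataFrame({
    "tweet_text": ["great iPad", None, "meh", "love it"],
    "product": ["iPad", None, None, None],
    "sentiment": ["Positive emotion", "No emotion toward brand or product",
                  "No emotion toward brand or product", "Positive emotion"],
})

# isna() flags missing cells; the column-wise mean is the missing fraction.
missing_share = toy.isna().mean()
print(missing_share)
```

Applied to `data`, the same expression would give roughly 0.64 for `product`, since 5,802 of the 9,093 rows have no product recorded.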


In [26]:
data['product'].value_counts()

product
iPad                               946
Apple                              661
iPad or iPhone App                 470
Google                             430
iPhone                             297
Other Google product or service    293
Android App                         81
Android                             78
Other Apple product or service      35
Name: count, dtype: int64
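
Since the business question compares Apple with Google, the nine product categories can be rolled up to the brand level. The sketch below does this with the counts shown above. The `BRAND_OF` mapping is an assumption made for illustration (for example, it treats Android as a Google product) and is not part of the original analysis.

```python
# Counts copied from the value_counts() output above.
product_counts = {
    "iPad": 946,
    "Apple": 661,
    "iPad or iPhone App": 470,
    "Google": 430,
    "iPhone": 297,
    "Other Google product or service": 293,
    "Android App": 81,
    "Android": 78,
    "Other Apple product or service": 35,
}

# Hypothetical product-to-brand mapping (an assumption, not from the dataset).
BRAND_OF = {
    "iPad": "Apple",
    "Apple": "Apple",
    "iPad or iPhone App": "Apple",
    "iPhone": "Apple",
    "Other Apple product or service": "Apple",
    "Google": "Google",
    "Other Google product or service": "Google",
    "Android App": "Google",
    "Android": "Google",
}

# Sum the product counts per brand.
brand_counts = {}
for product, count in product_counts.items():
    brand = BRAND_OF[product]
    brand_counts[brand] = brand_counts.get(brand, 0) + count

print(brand_counts)  # {'Apple': 2409, 'Google': 882}
```

The two totals add up to 3,291, the number of non-null `product` values reported by `info()`. With pandas, the same mapping could be applied row by row with `data['product'].map(BRAND_OF)`.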

In [27]:
data['sentiment'].value_counts(normalize= True)

sentiment
No emotion toward brand or product    0.592654
Positive emotion                      0.327505
Negative emotion                      0.062686
I can't tell                          0.017156
Name: proportion, dtype: float64
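
These proportions show a strong class imbalance, which sets the bar for any model: always predicting the majority class would already be right about 59% of the time. A small sketch of that baseline, using class counts reconstructed from the proportions above and the 9,093 rows reported by `info()` (the counts are derived, not printed in the original output):

```python
# Class counts reconstructed as proportion * 9093, rounded to whole tweets.
class_counts = {
    "No emotion toward brand or product": 5389,
    "Positive emotion": 2978,
    "Negative emotion": 570,
    "I can't tell": 156,
}

total = sum(class_counts.values())
majority_label = max(class_counts, key=class_counts.get)

# Accuracy of a classifier that always predicts the most frequent label.
baseline_accuracy = class_counts[majority_label] / total
print(majority_label, round(baseline_accuracy, 4))
```

Accuracy alone can therefore look respectable while a model misses most negative tweets, which is one reason to report precision and F1 alongside it.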

## Data Preprocessing And Modeling
---

In [90]:
# defining the features 
X = data[['tweet_text']]
y = data['sentiment']

#train test split the data

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state= 42)
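
Because the classes are imbalanced, a plain random split can leave the rarest labels unevenly represented in the test set. One option, not used in the cell above, is to pass `stratify=y` so that each split keeps the label proportions. A sketch on hypothetical toy data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 80 neutral and 20 negative tweets.
toy = pd.DataFrame({
    "tweet_text": [f"tweet {i}" for i in range(100)],
    "sentiment": ["neutral"] * 80 + ["negative"] * 20,
})

X_toy = toy[["tweet_text"]]
y_toy = toy["sentiment"]

# stratify=y_toy preserves the 80/20 label ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

print(y_te.value_counts(normalize=True))
```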

In [91]:
X_train.head(10)

| | tweet_text |
|---|---|
| 3312 | Good design means getting simple. @mention tal... |
| 1302 | Wishing I was at #sxsw to see the rumored demo... |
| 5344 | RT @mention ÷¼ We have problemsÛ_TIME TO STO... |
| 4580 | @mention New iPad Apps For Speech Therapy And ... |
| 3094 | iPad 2 queue is epic. #sxsw {link} |
| 1221 | According to Google's Marissa Mayer, future of... |
| 7873 | #SXSW Gets Its Own Apple Store - {link} |
| 437 | Adobe has developed an engine to essentially c... |
| 3126 | Marissa Meyer showing hot location visuals at ... |
| 4708 | #SXSW keynote Marissa Mayers: 12 billion miles... |


In [93]:
X_train.iloc[100]

tweet_text    Don't miss your chance to win RT @mention Goin...
Name: 7932, dtype: object

In [94]:
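#drop rows with a missing tweet and keep the labels aligned with the features
X_train = X_train.dropna()
y_train = y_train.loc[X_train.index]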

**Cleaning the tweets**<br>
$i.$ Remove "link", because it is just a placeholder for an external link<br>
$ii.$ Remove stopwords<br>
$iii.$ Remove punctuation<br>
$iv.$ Remove "SXSW", because it appears in a large number of tweets and therefore carries little information<br>
$v.$ Remove websites and HTML formatting

Lowercase every word in the corpus<br>
Tokenize<br>
Lemmatize


In [60]:
nltk.download('wordnet', quiet= True)

True

In [62]:
#downloading stopwords
nltk.download('stopwords', quiet= True)
stopword_list = stopwords.words('english')

#instantiate a RegexpTokenizer that keeps only words of 3 or more characters
tokenizer = RegexpTokenizer(r"(?u)\w{3,}")

#add 'sxsw' to the stopword list
stopword_list.append('sxsw')

#add 'link' to the stopword list
stopword_list.append('link')

#add each punctuation character to the stopword list
stopword_list.extend(punctuation)

#instantiating lemmatizer
lemma = WordNetLemmatizer()

#Instantiating tweet tokenizer

tweet_tokenize = TweetTokenizer(strip_handles= True)
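
Two details of this setup are easy to miss. Extending a list with a string adds one element per character, which is how every punctuation mark ends up as its own stopword. And the pattern `\w{3,}` keeps only runs of three or more word characters, so short words, single digits and punctuation never reach the stopword filter. A sketch using only the standard library, with `re.findall` standing in for `RegexpTokenizer` (an assumption that should hold for a simple pattern like this one):

```python
import re
from string import punctuation

# Extending a list with a string appends each character separately.
stopword_demo = ["sxsw", "link"]
stopword_demo.extend(punctuation)
print(stopword_demo[:5])  # ['sxsw', 'link', '!', '"', '#']

# Tokenize a lowercased tweet with the same pattern as the tokenizer above.
tweet = "iPad 2 queue is epic. #sxsw {link}".lower()
tokens = re.findall(r"(?u)\w{3,}", tweet)
print(tokens)  # ['ipad', 'queue', 'epic', 'sxsw', 'link']

# Drop the custom stopwords, leaving only the informative words.
kept = [t for t in tokens if t not in stopword_demo]
print(kept)  # ['ipad', 'queue', 'epic']
```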

In [106]:
def clean_tweets(text):
    """
    This function takes a tweet and preprocesses it in readiness for modelling
    """
    #Use TweetTokenizer object to remove the handles from the Tweet
    no_handle = tweet_tokenize.tokenize(text)

    #Join the list of non-handle tokens back together
    tweet = " ".join(no_handle) 

    #remove http websites, the hashtag sign, any words in curly brackets,
        #HTML entities such as &amp;, www and dot com websites,
        #and non-ASCII characters
    clean = re.sub(r"""
          https?://\S+                 # http and https URLs
        | \#                           # hashtag sign (the tagged word is kept)
        | \{[^{}]*\}                   # placeholders in curly brackets, e.g. {link}
        | &[a-z]+;                     # HTML entities
        | www\.\S+ | [a-z]+\.com       # www and dot com websites
        | [^\x00-\x7F]+                # non-ASCII characters
        """, " ", tweet, flags=re.VERBOSE | re.IGNORECASE)
    
    #Turn all the tokens lowercase
    lower_tweet = clean.lower()
    #Only include words with 3 or more characters
    token_list = tokenizer.tokenize(lower_tweet)

    #Remove stopwords
    stopwords_removed=[token for token in token_list if token not in stopword_list]

    #Lemmatize the remaining word tokens
    lemma_list = [lemma.lemmatize(token) for token in stopwords_removed]

    #Turn the lemma list into a string for the Vectorizer
    cleaned_string = " ".join(lemma_list) 
    
    return cleaned_string
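
A multi-line pattern like the one passed to `re.sub` in `clean_tweets` is sensitive to how the string is continued across lines. With a backslash-newline continuation inside an ordinary string, the indentation of the next line stays in the pattern as literal spaces. With `re.VERBOSE`, whitespace and `#` comments in the pattern are ignored. The sketch below contrasts the two on a made-up tweet:

```python
import re

text = "great app &amp; more http://t.co/abc now"

# Backslash-newline joins the lines, but the spaces before "|" remain part
# of the first alternative, so a URL only matches if many spaces follow it.
continued = "(https?://\\S+) \
             |(&[a-z]+;)"
result_continued = re.sub(continued, " ", text)
print(repr(result_continued))

# re.VERBOSE ignores layout whitespace and allows inline comments.
verbose = re.compile(r"""
      https?://\S+    # URLs
    | &[a-z]+;        # HTML entities
    """, re.VERBOSE)
result_verbose = verbose.sub(" ", text)
print(repr(result_verbose))
```

In the first call the URL survives because its alternative now requires trailing spaces. In the second call both the URL and the HTML entity are replaced.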

In [107]:
#sample tweet before cleaning
X_train['tweet_text'].iloc[100]

"Don't miss your chance to win RT @mention Going to #SXSW? Come by the #EMC Consulting booth for your chance to win an iPad 2! @mention"

In [102]:
#sanity check for the cleaning function
clean_tweets(X_train['tweet_text'].iloc[100])

'miss chance win going come emc consulting booth chance win ipad'

In [105]:
X_train['clean_tweet'] = X_train['tweet_text'].apply(clean_tweets)

X_train.head()

| | tweet_text | clean_tweet |
|---|---|---|
| 3312 | Good design means getting simple. @mention tal... | good design mean getting simple talk text way ... |
| 1302 | Wishing I was at #sxsw to see the rumored demo... | wishing see rumored demo new social network ci... |
| 5344 | RT @mention ÷¼ We have problemsÛ_TIME TO STO... | problem _time stop edchat musedchat sxswi clas... |
| 4580 | @mention New iPad Apps For Speech Therapy And ... | new ipad apps speech therapy communication sho... |
| 3094 | iPad 2 queue is epic. #sxsw {link} | ipad queue epic |
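
The imports at the top of this section include `TfidfVectorizer`, `MultinomialNB` and `Pipeline`, which suggests that the cleaned strings are meant to be vectorised and passed to a Naive Bayes classifier. A minimal sketch of that hand-off on hypothetical toy data (the texts, labels and step names below are illustrative assumptions, not the tuned model):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical cleaned tweets and labels, for illustration only.
clean_texts = [
    "love ipad great design",
    "great app love",
    "hate iphone battery",
    "battery terrible hate",
]
labels = ["Positive emotion", "Positive emotion",
          "Negative emotion", "Negative emotion"]

# The vectorizer turns each string into TF-IDF features for the classifier.
sketch_model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
sketch_model.fit(clean_texts, labels)

predictions = sketch_model.predict(["love great", "hate battery"])
print(list(predictions))
```

Wrapping both steps in a `Pipeline` means the vectorizer vocabulary is learned from the training strings only and then reused unchanged on any text passed to `predict`.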
