# Twitter Sentiment about Apple and Google

## Business Understanding

* The project analyzes the sentiment that people express on Twitter about Apple and Google products.
* The dataset contains 9,093 tweets. Each tweet is labeled as expressing a positive emotion, a negative emotion, no emotion toward a brand or product, or an undetermined sentiment ("I can't tell").

# Data Understanding and Cleaning

In [None]:
# Standard library
import re
import warnings

# Data handling, plotting, and display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

# Suppress warnings to keep notebook output readable (this also hides pandas FutureWarnings)
warnings.filterwarnings("ignore")

# Modelling: preprocessing, resampling, classifiers, tuning, and evaluation
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from imblearn.over_sampling import SMOTE, SMOTEN
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from scipy.stats import randint
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score, roc_curve

In [13]:
data = pd.read_csv("Data/tweet_product_company.csv", encoding='latin-1')
display(data.head())

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |


In [10]:
data.tail()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 9088 | Ipad everywhere. #SXSW {link} | iPad | Positive emotion |
| 9089 | Wave, buzz... RT @mention We interrupt your re... | NaN | No emotion toward brand or product |
| 9090 | Google's Zeiger, a physician never reported po... | NaN | No emotion toward brand or product |
| 9091 | Some Verizon iPhone customers complained their... | NaN | No emotion toward brand or product |
| 9092 | Ï¡Ïàü_ÊÎÒ£Áââ_£â_ÛâRT @... | NaN | No emotion toward brand or product |


In [14]:
data.describe().T

| | count | unique | top | freq |
|---|---|---|---|---|
| tweet_text | 9092 | 9065 | RT @mention Marissa Mayer: Google Will Connect... | 5 |
| emotion_in_tweet_is_directed_at | 3291 | 9 | iPad | 946 |
| is_there_an_emotion_directed_at_a_brand_or_product | 9093 | 4 | No emotion toward brand or product | 5389 |
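The `describe()` output shows that the most frequent label, "No emotion toward brand or product", accounts for 5,389 of the 9,093 rows (about 59.3%). The sentiment classes are therefore imbalanced. The sketch below measures that balance with `value_counts(normalize=True)`. `label_shares` is a hypothetical helper, and the toy rows are invented stand-ins for `data`:

```python
import pandas as pd

LABEL_COL = "is_there_an_emotion_directed_at_a_brand_or_product"


def label_shares(df: pd.DataFrame, col: str = LABEL_COL) -> pd.Series:
    """Return the fraction of rows carrying each sentiment label, largest first."""
    return df[col].value_counts(normalize=True)


# Invented toy rows; on the notebook data this would be label_shares(data).
toy = pd.DataFrame({
    LABEL_COL: [
        "No emotion toward brand or product",
        "No emotion toward brand or product",
        "Positive emotion",
        "Negative emotion",
    ]
})
shares = label_shares(toy)
print(shares)

# Majority-class share implied by the describe() table above.
majority_share = round(5389 / 9093, 3)
print(majority_share)
```

On the real frame, the returned proportions would cover all four labels and sum to 1.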


In [15]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          9092 non-null   object
 1   emotion_in_tweet_is_directed_at                     3291 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB


In [17]:
# checking for null values
data.isna().sum()

tweet_text                                               1
emotion_in_tweet_is_directed_at                       5802
is_there_an_emotion_directed_at_a_brand_or_product       0
dtype: int64
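Raw counts are easier to judge as a share of all rows. For example, the 5,802 missing targets make up about 63.8% of the 9,093 tweets. The sketch below builds a per-column missing-value report. `missing_report` is a hypothetical helper, and the toy frame is invented:

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage (one decimal) of missing values per column."""
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(1),
    })


# Invented toy frame; on the notebook data this would be missing_report(data).
toy = pd.DataFrame({
    "tweet_text": ["a", None, "c", "d"],
    "emotion_in_tweet_is_directed_at": [None, None, "iPad", None],
})
report = missing_report(toy)
print(report)
```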

In [19]:
# Check if missing values correlate with specific products or sentiment patterns
print(data[data['emotion_in_tweet_is_directed_at'].isna()]['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts())


is_there_an_emotion_directed_at_a_brand_or_product
No emotion toward brand or product    5298
Positive emotion                       306
I can't tell                           147
Negative emotion                        51
Name: count, dtype: int64
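The value counts above cover only the rows whose target is missing. A `pd.crosstab` can place rows with missing and present targets side by side, so the two label distributions can be compared directly. This is a sketch, and the toy rows are invented for illustration:

```python
import pandas as pd

TARGET_COL = "emotion_in_tweet_is_directed_at"
LABEL_COL = "is_there_an_emotion_directed_at_a_brand_or_product"

toy = pd.DataFrame({
    TARGET_COL: ["iPad", None, "Google", None, None],
    LABEL_COL: [
        "Positive emotion",
        "No emotion toward brand or product",
        "Negative emotion",
        "No emotion toward brand or product",
        "Positive emotion",
    ],
})

# Label each row by whether its target product is missing.
target_status = toy[TARGET_COL].isna().map({True: "missing", False: "present"})

# Rows: sentiment label; columns: target missing vs present; cells: counts.
table = pd.crosstab(toy[LABEL_COL], target_status.rename("target"))
print(table)
```

On the full data, the "missing" column of such a table would reproduce the counts printed above.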


In [20]:
# Inspect the first 20 tweets whose target product is missing
print(data[data['emotion_in_tweet_is_directed_at'].isna()]['tweet_text'].head(20))

5     @teachntech00 New iPad Apps For #SpeechTherapy...
6                                                   NaN
16    Holler Gram for iPad on the iTunes App Store -...
32    Attn: All  #SXSW frineds, @mention Register fo...
33        Anyone at  #sxsw want to sell their old iPad?
34    Anyone at  #SXSW who bought the new iPad want ...
35    At #sxsw.  Oooh. RT @mention Google to Launch ...
37    SPIN Play - a new concept in music discovery f...
39    VatorNews - Google And Apple Force Print Media...
41    HootSuite - HootSuite Mobile for #SXSW ~ Updat...
42    Hey #SXSW - How long do you think it takes us ...
43    Mashable! - The iPad 2 Takes Over SXSW [VIDEO]...
44    For I-Pad ?RT @mention New #UberSocial for #iP...
46    Hand-Held Û÷HoboÛª: Drafthouse launches Û÷H...
48    Orly....? ÛÏ@mention Google set to launch new...
50    Khoi Vinh (@mention says Conde Nast's headlong...
51    ÛÏ@mention {link} &lt;-- HELP ME FORWARD THIS...
52    ÷¼ WHAT? ÷_ {link} ã_ #edchat #musedcha

In [22]:
# Cross-check each sentiment label against the product the emotion is directed at
print(data.groupby('is_there_an_emotion_directed_at_a_brand_or_product')['emotion_in_tweet_is_directed_at'].value_counts())


is_there_an_emotion_directed_at_a_brand_or_product  emotion_in_tweet_is_directed_at
I can't tell                                        No emotion mentioned                147
                                                    iPad                                  4
                                                    Apple                                 2
                                                    Google                                1
                                                    Other Google product or service       1
                                                    iPhone                                1
Negative emotion                                    iPad                                125
                                                    iPhone                              103
                                                    Apple                                95
                                                    Google                              
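Because the project compares Apple and Google, the product-level targets above could also be rolled up to a brand. The `to_brand` helper below is a hypothetical keyword rule. It assumes that Apple targets mention "Apple", "iPad", or "iPhone", and that Google targets mention "Google" or "Android". Any other value, including the "No emotion mentioned" placeholder used in this notebook, maps to "No brand".

```python
from typing import Optional

import pandas as pd

# Hypothetical keyword rules; adjust them to the target values actually present.
APPLE_KEYWORDS = ("apple", "ipad", "iphone")
GOOGLE_KEYWORDS = ("google", "android")


def to_brand(target: Optional[str]) -> str:
    """Map a product/brand target to 'Apple', 'Google', or 'No brand'."""
    if not isinstance(target, str):  # covers None and NaN
        return "No brand"
    lowered = target.lower()
    if any(word in lowered for word in APPLE_KEYWORDS):
        return "Apple"
    if any(word in lowered for word in GOOGLE_KEYWORDS):
        return "Google"
    return "No brand"


targets = pd.Series([
    "iPad", "iPad or iPhone App", "Apple",
    "Google", "Other Google product or service",
    "No emotion mentioned", None,
])
brands = targets.map(to_brand)
print(brands.value_counts())
```

Plain substring matching keeps the rule simple, but a word such as "pineapple" would be tagged Apple, so a stricter rule may be needed.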

In [23]:
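# Replace missing targets with an explicit placeholder category.
# Assign the result instead of calling fillna(..., inplace=True) on data[...]:
# that chained in-place call stops modifying `data` under pandas Copy-on-Write.
data['emotion_in_tweet_is_directed_at'] = data['emotion_in_tweet_is_directed_at'].fillna('No emotion mentioned')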


In [24]:
# Re-checking for null values
data.isna().sum()

tweet_text                                            1
emotion_in_tweet_is_directed_at                       0
is_there_an_emotion_directed_at_a_brand_or_product    0
dtype: int64

In [25]:
# Find the row with missing data
missing_row = data[data.isna().any(axis=1)]
print(missing_row)

  tweet_text emotion_in_tweet_is_directed_at  \
6        NaN            No emotion mentioned   

  is_there_an_emotion_directed_at_a_brand_or_product  
6                 No emotion toward brand or product  


In [27]:
# Drop the single row whose tweet text is missing
data = data.dropna(subset=['tweet_text'])

In [28]:
# Final check for null values
data.isna().sum()

tweet_text                                            0
emotion_in_tweet_is_directed_at                       0
is_there_an_emotion_directed_at_a_brand_or_product    0
dtype: int64

In [29]:
# Check for duplicates
duplicates = data[data.duplicated(keep=False)]
print(duplicates)

                                             tweet_text  \
7     #SXSW is just starting, #CTIA is around the co...   
9     Counting down the days to #sxsw plus strong Ca...   
17    I just noticed DST is coming this weekend. How...   
20    Need to buy an iPad2 while I'm in Austin at #s...   
21    Oh. My. God. The #SXSW app for iPad is pure, u...   
24    Really enjoying the changes in Gowalla 3.0 for...   
466      Before It Even Begins, Apple Wins #SXSW {link}   
468      Before It Even Begins, Apple Wins #SXSW {link}   
774   Google to Launch Major New Social Network Call...   
776   Google to Launch Major New Social Network Call...   
2230  Marissa Mayer: Google Will Connect the Digital...   
2232  Marissa Mayer: Google Will Connect the Digital...   
2559  Counting down the days to #sxsw plus strong Ca...   
3950  Really enjoying the changes in Gowalla 3.0 for...   
3962  #SXSW is just starting, #CTIA is around the co...   
4897  Oh. My. God. The #SXSW app for iPad is pure, u... 

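The output above shows that some tweets are repeated. `duplicated(keep=False)` marks every copy of a row whose tweet text, target, and sentiment label all match another row, so both members of each pair are listed (for example rows 466 and 468). Beyond these exact copies, many tweets are similar because they share elements such as hashtags (#SXSW), which tend to come with the same words and phrasing. Exact copies add no new information for a classifier, so they are candidates for removal.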
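The sketch below runs both checks on invented toy rows:

* `drop_duplicates()` removes rows that match on every column.
* Comparing a normalized version of the text also catches retweets of the same message. Here, normalized means lower-cased, with `RT`, `@mention`, `{link}`, and extra whitespace stripped.

The `normalize` helper and its rules are assumptions for illustration, not part of the original notebook.

```python
import re

import pandas as pd


def normalize(text: str) -> str:
    """Hypothetical normalization: drop RT markers, @mention/{link}
    placeholders, and extra whitespace, then lower-case."""
    text = re.sub(r"\bRT\b|@mention|\{link\}", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


# Invented toy rows (labels chosen for illustration only).
toy = pd.DataFrame({
    "tweet_text": [
        "Before It Even Begins, Apple Wins #SXSW {link}",
        "Before It Even Begins, Apple Wins #SXSW {link}",
        "RT @mention Before It Even Begins, Apple Wins #SXSW {link}",
        "Ipad everywhere. #SXSW {link}",
    ],
    "is_there_an_emotion_directed_at_a_brand_or_product": ["Positive emotion"] * 4,
})

# 1. Exact duplicates: identical across every column.
exact_deduped = toy.drop_duplicates()

# 2. Near duplicates: identical once the tweet text is normalized.
key = toy["tweet_text"].map(normalize)
near_deduped = toy[~key.duplicated()]

print(len(toy), len(exact_deduped), len(near_deduped))
```

Normalizing before comparing is a design choice. It trades some risk of merging genuinely different tweets for catching reposts that differ only in retweet markup.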

# Exploratory Data Analysis (EDA)

# Feature Engineering

# Model Building 

# Conclusion and Evaluation