# Brand and Products Twitter Sentiments Modeling

## Understanding Data

In [248]:
import pandas as pd

In [249]:
df = pd.read_csv('./data/twitter_data.csv',encoding='unicode_escape')

In [250]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion
...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product


In [251]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          9092 non-null   object
 1   emotion_in_tweet_is_directed_at                     3291 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB


In [252]:
df[df['emotion_in_tweet_is_directed_at']=='Apple'].head(10)

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
9,Counting down the days to #sxsw plus strong Ca...,Apple,Positive emotion
40,@mention - Great weather to greet you for #sx...,Apple,Positive emotion
47,HOORAY RT ÛÏ@mention Apple Is Opening A Pop-U...,Apple,Positive emotion
49,wooooo!!! ÛÏ@mention Apple store downtown Aus...,Apple,Positive emotion
62,#OMFG! RT @mention Heard about Apple's pop-up ...,Apple,Positive emotion
63,#Smile RT @mention I think Apple's &quot;pop-u...,Apple,No emotion toward brand or product
83,"Nice!! RT @mention Hey, Apple fans! Get a peek...",Apple,Positive emotion
109,Kawasaki: &quot;Not C.S. Lewis level reasoning...,Apple,Positive emotion
111,Kawasaki: &quot;pagemaker saved Apple.&quot; O...,Apple,Positive emotion
116,"At #SXSW, #Apple schools the marketing experts...",Apple,Positive emotion


In [253]:
df['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
I can't tell                           156
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

- Tf-Idf: Probably the most commonly used. Useful when the goal is to distinguish the content of documents from others in the corpus.
- Count: Useful when the words themselves matter. If the goal is instead about identifying authors by their words, then the fact that some word appears in many documents of the corpus may be important.
- Hashing: The advantage here is speed and low memory usage. The disadvantage is that you lose the identities of the words being tokenized. Useful for very large datasets where the ultimate model may be a bit of a black box.

In [254]:
df['tweet_text'][1]

"@jessedee Know about @fludapp ? Awesome iPad/iPhone app that you'll likely appreciate for its design. Also, they're giving free Ts at #SXSW"

In [255]:
df['tweet_text'][28]

'The new #4sq3 looks like it is going to rock. Update for iPhone and Android should push tonight http://bit.ly/etsbZk #SXSW #KeepAustinWeird'

In [256]:
# df = df[df.line_race != 0]

df = df[df['is_there_an_emotion_directed_at_a_brand_or_product'] != "I can't tell"]
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion
...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product


In [257]:
df['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

In [258]:
df['tweet_text'][0].split(' ')

['.@wesley83',
 'I',
 'have',
 'a',
 '3G',
 'iPhone.',
 'After',
 '3',
 'hrs',
 'tweeting',
 'at',
 '#RISE_Austin,',
 'it',
 'was',
 'dead!',
 '',
 'I',
 'need',
 'to',
 'upgrade.',
 'Plugin',
 'stations',
 'at',
 '#SXSW.']

In [259]:
import string

def remove_punctuation(text):
    for punctuation in string.punctuation:
        text = text.replace(punctuation,'')
    return text


In [260]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion
...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product


In [261]:
# manual_cleanup = [s.translate(str.maketrans('', '', '0123456789')) \
#                   for s in manual_cleanup]

df.tweet_text = df.tweet_text.str.replace('\d+', '')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


In [262]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley I have a G iPhone. After hrs tweetin...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad also. They...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion
...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product


In [263]:
# Cleaning tweet_text column - lowercase and removing punctuation 


df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8937 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          8936 non-null   object
 1   emotion_in_tweet_is_directed_at                     3282 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  8937 non-null   object
dtypes: object(3)
memory usage: 599.3+ KB


In [264]:
df['tweet_text'][0].lower()

'.@wesley i have a g iphone. after  hrs tweeting at #rise_austin, it was dead!  i need to upgrade. plugin stations at #sxsw.'

In [265]:
df['tweet_clean'] = df['tweet_text'].str.lower()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['tweet_clean'] = df['tweet_text'].str.lower()


In [266]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product,tweet_clean
0,.@wesley I have a G iPhone. After hrs tweetin...,iPhone,Negative emotion,.@wesley i have a g iphone. after hrs tweetin...
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion,@jessedee know about @fludapp ? awesome ipad/i...
2,@swonderlin Can not wait for #iPad also. They...,iPad,Positive emotion,@swonderlin can not wait for #ipad also. they...
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion,@sxsw i hope this year's festival isn't as cra...
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion,@sxtxstate great stuff on fri #sxsw: marissa m...
...,...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion,ipad everywhere. #sxsw {link}
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product,"wave, buzz... rt @mention we interrupt your re..."
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product,"google's zeiger, a physician never reported po..."
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product,some verizon iphone customers complained their...


In [275]:
# tweetscrypto['clean_text']=tweetscrypto['text'].str.replace('(@\w+.*?)',"")

def clean_text(X):
    X = X.split()
    X_new = [x for x in X if not x.startswith("@")]
    return ' '.join(X_new)

df['tweet_clean'] = df['tweet_clean'].apply(clean_text)

AttributeError: 'float' object has no attribute 'split'

In [247]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product,tweet_clean,target
0,.@wesley I have a G iPhone. After hrs tweetin...,iPhone,Negative emotion,wesley i have a g iphone after hrs tweeting a...,0
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion,jessedee know about fludapp awesome ipadiphon...,1
2,@swonderlin Can not wait for #iPad also. They...,iPad,Positive emotion,swonderlin can not wait for ipad also they sh...,1
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion,i hope this years festival isnt as crashy as ...,0
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion,sxtxstate great stuff on fri marissa mayer go...,1
...,...,...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion,ipad everywhere link,1
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product,wave buzz mention we interrupt your regularly...,2
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product,googles zeiger a physician never repoed potent...,2
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product,some verizon iphone customers complained their...,2


In [232]:
import string

string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [233]:
punctuation = '"#$%&\'()*+,-./:;<=>@[\\]^_`{|}~'

In [234]:
punctuation

'"#$%&\'()*+,-./:;<=>@[\\]^_`{|}~'

In [235]:
df['tweet_clean'] = df['tweet_clean'].astype(str).apply(remove_punctuation)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['tweet_clean'] = df['tweet_clean'].astype(str).apply(remove_punctuation)


In [236]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product,tweet_clean
0,.@wesley I have a G iPhone. After hrs tweetin...,iPhone,Negative emotion,wesley i have a g iphone after hrs tweeting a...
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion,jessedee know about fludapp awesome ipadiphon...
2,@swonderlin Can not wait for #iPad also. They...,iPad,Positive emotion,swonderlin can not wait for ipad also they sh...
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion,sxsw i hope this years festival isnt as crashy...
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion,sxtxstate great stuff on fri sxsw marissa maye...
...,...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion,ipad everywhere sxsw link
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product,wave buzz rt mention we interrupt your regular...
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product,googles zeiger a physician never reported pote...
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product,some verizon iphone customers complained their...


In [237]:
df['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

In [238]:
df['target'] = df['is_there_an_emotion_directed_at_a_brand_or_product']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['target'] = df['is_there_an_emotion_directed_at_a_brand_or_product']


In [239]:
df['target'].value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
Name: target, dtype: int64

In [240]:
# Renaming target values

df['target'] = df['target'].replace('No emotion toward brand or product',2)
df['target'] = df['target'].replace('Positive emotion', 1)
df['target'] = df['target'].replace('Negative emotion',0)
df['target'].value_counts()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['target'] = df['target'].replace('No emotion toward brand or product',2)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['target'] = df['target'].replace('Positive emotion', 1)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['target'] = df['target'].replace('Negative emotion',0)


2    5389
1    2978
0     570
Name: target, dtype: int64

In [241]:
df.head()

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product,tweet_clean,target
0,.@wesley I have a G iPhone. After hrs tweetin...,iPhone,Negative emotion,wesley i have a g iphone after hrs tweeting a...,0
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion,jessedee know about fludapp awesome ipadiphon...,1
2,@swonderlin Can not wait for #iPad also. They...,iPad,Positive emotion,swonderlin can not wait for ipad also they sh...,1
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion,sxsw i hope this years festival isnt as crashy...,0
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion,sxtxstate great stuff on fri sxsw marissa maye...,1


In [242]:
# Remove words that start with @
# Remove the word RT - Done
# Remove the hashtag #SXSW - Done
# Make column with product that's being mentioned
# Rename columns - Done
# Limit dataset to just apple df
# Remove non-english ASCII characters {link}

In [244]:
# Remove the word RT 
# Remove the hashtag #SXSW

df['tweet_clean'] = df['tweet_clean'].str.replace('rt','',regex=True)
df['tweet_clean'] = df['tweet_clean'].str.replace('sxsw','',regex=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['tweet_clean'] = df['tweet_clean'].str.replace('rt','',regex=True)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['tweet_clean'] = df['tweet_clean'].str.replace('sxsw','',regex=True)


In [245]:
df

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product,tweet_clean,target
0,.@wesley I have a G iPhone. After hrs tweetin...,iPhone,Negative emotion,wesley i have a g iphone after hrs tweeting a...,0
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion,jessedee know about fludapp awesome ipadiphon...,1
2,@swonderlin Can not wait for #iPad also. They...,iPad,Positive emotion,swonderlin can not wait for ipad also they sh...,1
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion,i hope this years festival isnt as crashy as ...,0
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion,sxtxstate great stuff on fri marissa mayer go...,1
...,...,...,...,...,...
9088,Ipad everywhere. #SXSW {link},iPad,Positive emotion,ipad everywhere link,1
9089,"Wave, buzz... RT @mention We interrupt your re...",,No emotion toward brand or product,wave buzz mention we interrupt your regularly...,2
9090,"Google's Zeiger, a physician never reported po...",,No emotion toward brand or product,googles zeiger a physician never repoed potent...,2
9091,Some Verizon iPhone customers complained their...,,No emotion toward brand or product,some verizon iphone customers complained their...,2
