# Sentiment and Emotion Analysis

Notebook 4 of 4

We will analyse the post from each of the subreddits as well as some of the major topics from each to get understanding on the communities sentiment and emotion. In order to do so, we will utilise the Hugging Face pre-trained models for sentiment analysis as well as emotion analysis.
 
The topics for each subreddit we will explore are:
- Dunkin Donuts
 1. dunkin donuts 
 2. pumpkin spice
 3. cold brew 
 4. iced coffee 
 5. butter pecan
 6. local dunkin
 7. mobile order
 8. drive thru
 9. dunkin worker/ employee/ job
 
- Starbucks
 1. dress code 
 2. pumpkin spice
 3. cold brew
 4. iced coffee
 5. apple crisp
 6. fall launch
 7. mobile order
 8. drive thru
 9. starbucks worker/ employee/ job
 
Dunkin Donuts and Starbucks are the brands name? These follows by the top 3 most popular products for each subreddit based on the frequency of the words appear in the subreddit. The local and upcoming launch of product are also hot topics in both subreddits.


## Import Clean Data

In [1]:
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.tokenize import word_tokenize, RegexpTokenizer
from transformers import pipeline

In [2]:
combined_df = pd.read_csv('./datasets/combined_cleaned.csv')
combined_df.shape

(4997, 6)

In [3]:
combined_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4997 entries, 0 to 4996
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Unnamed: 0      4997 non-null   int64 
 1   subreddit       4997 non-null   int64 
 2   selftext        3279 non-null   object
 3   title           4997 non-null   object
 4   created_utc     4997 non-null   int64 
 5   title_selftext  4997 non-null   object
dtypes: int64(3), object(3)
memory usage: 234.4+ KB


In [4]:
combined_df.head(3)

Unnamed: 0.1,Unnamed: 0,subreddit,selftext,title,created_utc,title_selftext
0,0,1,,My coworker placing the hash browns like army ...,1663204910,my coworker placing the hash browns like army ...
1,1,1,,whats the deal with these?,1663196066,whats the deal with these? nan
2,2,1,I know I asked about this before but I'm just ...,Working for dunkin,1663193081,working for dunkin i know i asked about this b...


In [5]:
# drop unnecessary columns
combined_df = combined_df.drop(columns = ['Unnamed: 0', 'selftext', 'title'], axis=1)

In [6]:
# check for null values
combined_df.isnull().sum()

subreddit         0
created_utc       0
title_selftext    0
dtype: int64

In [7]:
# drop null
combined_df = combined_df.dropna()

In [8]:
# check for empty values
combined_df[combined_df['title_selftext'] == ''].sum()

subreddit         0.0
created_utc       0.0
title_selftext    0.0
dtype: float64

There is no missing values in dataset now.

### Tokenize words and join back into a sentence to remove unwanted characters

In [9]:
tokenizer = RegexpTokenizer(r'\w+')

In [10]:
combined_df['tokenized'] = combined_df['title_selftext'].apply(lambda x: tokenizer.tokenize(x.lower()))
combined_df.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized
0,1,1663204910,my coworker placing the hash browns like army ...,"[my, coworker, placing, the, hash, browns, lik..."
1,1,1663196066,whats the deal with these? nan,"[whats, the, deal, with, these, nan]"
2,1,1663193081,working for dunkin i know i asked about this b...,"[working, for, dunkin, i, know, i, asked, abou..."


In [11]:
combined_df['title_selftext'] = combined_df['tokenized'].apply(lambda x: " ".join(x))
combined_df.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized
0,1,1663204910,my coworker placing the hash browns like army ...,"[my, coworker, placing, the, hash, browns, lik..."
1,1,1663196066,whats the deal with these nan,"[whats, the, deal, with, these, nan]"
2,1,1663193081,working for dunkin i know i asked about this b...,"[working, for, dunkin, i, know, i, asked, abou..."


### Separate into Starbucks and Dunkin Donuts datasets for analysis

In [12]:
ddonuts_text_df = combined_df[combined_df['subreddit'] == 1]
ddonuts_text_df.shape

(2498, 4)

In [13]:
ddonuts_text_df.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized
0,1,1663204910,my coworker placing the hash browns like army ...,"[my, coworker, placing, the, hash, browns, lik..."
1,1,1663196066,whats the deal with these nan,"[whats, the, deal, with, these, nan]"
2,1,1663193081,working for dunkin i know i asked about this b...,"[working, for, dunkin, i, know, i, asked, abou..."


In [14]:
sbucks_text_df = combined_df[combined_df['subreddit'] == 0]
sbucks_text_df.shape

(2499, 4)

In [15]:
sbucks_text_df.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized
2498,0,1663212467,interview tips hi all hopefully this question ...,"[interview, tips, hi, all, hopefully, this, qu..."
2499,0,1663212017,we had horses come through the drive thru rece...,"[we, had, horses, come, through, the, drive, t..."
2500,0,1663211903,having horses in the drive thru makes everythi...,"[having, horses, in, the, drive, thru, makes, ..."


### Create separate dataframe for each of the subtopics for analysis

In [16]:
searchfor = ['dunkin donut', 'dunkin doughnut']
dunkin_donuts = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor))]
dunkin_donuts = dunkin_donuts.copy()
print(f'Dataframe of dunkin_donuts has shape {dunkin_donuts.shape}')
dunkin_c_brew = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('cold brew')]
dunkin_c_brew = dunkin_c_brew.copy()
print(f'Dataframe of dunkin_c_brew has shape {dunkin_c_brew.shape}')
dunkin_i_coffee = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('ice coff')]
dunkin_i_coffee = dunkin_i_coffee.copy()
print(f'Dataframe of dunkin_i_coffee has shape {dunkin_i_coffee.shape}')
dunkin_b_pecan = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('butter pecan')]
dunkin_b_pecan = dunkin_b_pecan.copy()
print(f'Dataframe of dunkin_b_pecan has shape {dunkin_b_pecan.shape}')
dunkin_local = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('local')]
dunkin_local = dunkin_local.copy()
print(f'Dataframe of dunkin_local has shape {dunkin_local.shape}')
searchfor_1 = ['free drink', 'free bev', 'free beverag', 'rewards', 'reward']
dunkin_reward = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor_1))]
dunkin_reward = dunkin_reward.copy()
print(f'Dataframe of dunkin_reward has shape {dunkin_reward.shape}')
searchfor_2 = ['mobile order', 'mobile orders', 'drive thru', 'drive through','app', 'workers', 'staff']
dunkin_service = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor_2))]
dunkin_service = dunkin_service.copy()
print(f'Dataframe of dunkin_service has shape {dunkin_service.shape}')
searchfor_3 = ['barista', 'baristas']
dunkin_barista = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor_3))]
dunkin_barista = dunkin_barista.copy()
print(f'Dataframe of dunkin_barista has shape {dunkin_barista.shape}')

Dataframe of dunkin_donuts has shape (99, 4)
Dataframe of dunkin_c_brew has shape (126, 4)
Dataframe of dunkin_i_coffee has shape (13, 4)
Dataframe of dunkin_b_pecan has shape (60, 4)
Dataframe of dunkin_local has shape (77, 4)
Dataframe of dunkin_reward has shape (96, 4)
Dataframe of dunkin_service has shape (561, 4)
Dataframe of dunkin_barista has shape (9, 4)


In [39]:
dunkin_donuts = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('dunkin donut')]
dunkin_donuts = dunkin_donuts.copy()
print(f'Dataframe of dunkin_donuts has shape {dunkin_donuts.shape}')
dunkin_c_brew = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('cold brew')]
dunkin_c_brew = dunkin_c_brew.copy()
print(f'Dataframe of dunkin_c_brew has shape {dunkin_c_brew.shape}')
dunkin_i_coffee = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('ice coff')]
dunkin_i_coffee = dunkin_i_coffee.copy()
print(f'Dataframe of dunkin_i_coffee has shape {dunkin_i_coffee.shape}')
dunkin_b_pecan = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('butter pecan')]
dunkin_b_pecan = dunkin_b_pecan.copy()
print(f'Dataframe of dunkin_b_pecan has shape {dunkin_b_pecan.shape}')
dunkin_local = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('local')]
dunkin_local = dunkin_local.copy()
print(f'Dataframe of dunkin_local has shape {dunkin_local.shape}')
searchfor_1 = ['free drink', 'free bev', 'free beverag', 'rewards', 'reward']
dunkin_reward = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor_1))]
dunkin_reward = dunkin_reward.copy()
print(f'Dataframe of dunkin_reward has shape {dunkin_reward.shape}')
searchfor_2 = ['mobile order', 'mobile orders', 'drive thru', 'drive through','app', 'workers', 'staff']
dunkin_service = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor_2))]
dunkin_service = dunkin_service.copy()
print(f'Dataframe of dunkin_service has shape {dunkin_service.shape}')
searchfor_3 = ['barista', 'baristas']
dunkin_barista = ddonuts_text_df[ddonuts_text_df['title_selftext'].str.contains('|'.join(searchfor_3))]
dunkin_barista = dunkin_barista.copy()
print(f'Dataframe of dunkin_barista has shape {dunkin_barista.shape}')

Dataframe of dunkin_donuts has shape (97, 4)
Dataframe of dunkin_c_brew has shape (126, 4)
Dataframe of dunkin_i_coffee has shape (13, 4)
Dataframe of dunkin_b_pecan has shape (60, 4)
Dataframe of dunkin_local has shape (77, 4)
Dataframe of dunkin_reward has shape (96, 4)
Dataframe of dunkin_service has shape (561, 4)
Dataframe of dunkin_barista has shape (9, 4)


In [17]:
sbucks_dress = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('dress code')]
sbucks_dress = sbucks_dress.copy()
print(f'Dataframe of sbucks_dress has shape {sbucks_dress.shape}')
sbucks_p_spice = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('pumpkin spice')]
sbucks_p_spice = sbucks_p_spice.copy()
print(f'Dataframe of sbucks_p_spice has shape {sbucks_p_spice.shape}')
sbucks_c_brew = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('cold brew')]
sbucks_c_brew = sbucks_c_brew.copy()
print(f'Dataframe of sbucks_c_brew has shape {sbucks_c_brew.shape}')
sbucks_i_coffee = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('ice coff')]
sbucks_i_coffee = sbucks_i_coffee.copy()
print(f'Dataframe of dunkin_i_coffee has shape {sbucks_i_coffee.shape}')
sbucks_a_crisp = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('appl crisp')]
sbucks_a_crisp = sbucks_a_crisp.copy() 
print(f'Dataframe of sbucks_a_crisp has shape {sbucks_a_crisp.shape}')
searchfor_4 = ['fall launch', 'fall drink', 'fall refresher', 'fall drinks']
sbucks_f_launch = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('|'.join(searchfor_4))]
sbucks_f_launch = sbucks_f_launch.copy()
print(f'Dataframe of sbucks_f_launch has shape {sbucks_f_launch.shape}')
searchfor_5 = ['free drink', 'free bev', 'free beverag', 'rewards', 'reward']
sbucks_reward = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('|'.join(searchfor_1))]
sbucks_reward = sbucks_reward.copy()
print(f'Dataframe of sbucks_reward has shape {sbucks_reward.shape}')
searchfor_6 = ['mobile order', 'mobile orders', 'drive thru', 'drive through', 'app', 'workers', 'staff']
sbucks_service = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('|'.join(searchfor_6))]
sbucks_service = sbucks_service.copy()
print(f'Dataframe of sbucks_service has shape {sbucks_service.shape}')
searchfor_7 = ['barista', 'baristas']
sbucks_barista = sbucks_text_df[sbucks_text_df['title_selftext'].str.contains('|'.join(searchfor_7))]
sbucks_barista = sbucks_barista.copy()
print(f'Dataframe of sbucks_barista has shape {sbucks_barista.shape}')

Dataframe of sbucks_dress has shape (39, 4)
Dataframe of sbucks_p_spice has shape (72, 4)
Dataframe of sbucks_c_brew has shape (88, 4)
Dataframe of dunkin_i_coffee has shape (6, 4)
Dataframe of sbucks_a_crisp has shape (0, 4)
Dataframe of sbucks_f_launch has shape (29, 4)
Dataframe of sbucks_reward has shape (24, 4)
Dataframe of sbucks_service has shape (673, 4)
Dataframe of sbucks_barista has shape (335, 4)


## Sentiment Analysis

Model used: twitter-XLM-roBERTa-base for Sentiment Analysis. This model wastrained on about 124M tweets from Jan 2018 to Dec 2021, and finetuned for sentiment analysis with TweetEval benchmark. [source](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment)

In [18]:
from nltk.sentiment import SentimentIntensityAnalyzer
import operator

In [19]:
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig

In [20]:
model_name = "cardiffnlp/twitter-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [21]:
senti_classifier = pipeline("sentiment-analysis", 
                            model=model,
                            tokenizer = tokenizer)

In [22]:
import pickle
import sys

In [23]:
p = pickle.dumps(senti_classifier)
print(sys.getsizeof(p))

500046474


In [24]:
# define function to extract sentiments
def sentiments(dataset):
    dataset['sentiment'] = dataset['title_selftext'].apply(senti_classifier)
    dataset['sentiments'] = dataset['sentiment'].apply(lambda x: x[0]['label'])
    dataset['sentiments'] = dataset['sentiments'].map({'LABEL_0': 'negative', 'LABEL_1': 'neutral', 'LABEL_2': 'positive'})
    return dataset

## Dunkin Sentiments

### Overall

In [25]:
dunkin_sample_senti = sentiments(ddonuts_text_df.sample(500))

In [26]:
dunkin_sample_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
804,1,1657716634,anyone know what the mixology stuff is appeare...,"[anyone, know, what, the, mixology, stuff, is,...","[{'label': 'LABEL_2', 'score': 0.8710981607437...",positive
240,1,1661398780,oatmilk or almondmilk psl are dunkin psl s mad...,"[oatmilk, or, almondmilk, psl, are, dunkin, ps...","[{'label': 'LABEL_1', 'score': 0.8990364670753...",neutral
495,1,1659987643,promo codes for new accounts i m starting my d...,"[promo, codes, for, new, accounts, i, m, start...","[{'label': 'LABEL_2', 'score': 0.5711829662322...",positive


In [27]:
dunkin_sample_senti['sentiments'].value_counts()

neutral     253
negative    166
positive     81
Name: sentiments, dtype: int64

In [28]:
dunkin_sentiments = pd.DataFrame(dunkin_sample_senti['sentiments'].value_counts())
dunkin_sentiments

Unnamed: 0,sentiments
neutral,253
negative,166
positive,81


### Dunkin Cold Brew

In [29]:
dunkin_c_brew_senti = sentiments(dunkin_c_brew)

In [30]:
dunkin_c_brew_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
22,1,1663020154,3 md lattes my app advertises 3 med lattes and...,"[3, md, lattes, my, app, advertises, 3, med, l...","[{'label': 'LABEL_0', 'score': 0.5559211373329...",negative
41,1,1662915853,this pumpkin cream cold brew is amazing weary_...,"[this, pumpkin, cream, cold, brew, is, amazing...","[{'label': 'LABEL_2', 'score': 0.9879993200302...",positive
52,1,1662825134,anyone know what s the next coffee promo after...,"[anyone, know, what, s, the, next, coffee, pro...","[{'label': 'LABEL_1', 'score': 0.9411011934280...",neutral


In [31]:
dunkin_c_brew_senti['sentiments'].value_counts()

neutral     57
negative    41
positive    28
Name: sentiments, dtype: int64

In [32]:
dunkin_c_brew_sentiments = pd.DataFrame(dunkin_c_brew_senti['sentiments'].value_counts())
dunkin_c_brew_sentiments

Unnamed: 0,sentiments
neutral,57
negative,41
positive,28


### Dunkin Donuts

In [40]:
ddonuts_senti = sentiments(dunkin_donuts)

RuntimeError: The expanded size of the tensor (3123) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 3123].  Tensor sizes: [1, 514]

In [33]:
ddonuts_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments


In [30]:
ddonuts_senti['sentiments'].value_counts()

neutral     373
negative     72
positive     55
Name: sentiments, dtype: int64

In [31]:
ddonuts_sentiments = pd.DataFrame(ddonuts_senti['sentiments'].value_counts())
ddonuts_sentiments

Unnamed: 0,sentiments
neutral,373
negative,72
positive,55


### Dunkin Iced Coffee

In [41]:
dunkin_i_coffee_senti = sentiments(dunkin_i_coffee)

RuntimeError: The expanded size of the tensor (653) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 653].  Tensor sizes: [1, 514]

In [29]:
dunkin_i_coffee_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
3,1,1663190691,make ice tea order door dash tast ice tea orde...,"[make, ice, tea, order, door, dash, tast, ice,...","[{'label': 'LABEL_0', 'score': 0.6549773812294...",negative
68,1,1662659909,blueberri crumbl ice coffe dunkin yesterday he...,"[blueberri, crumbl, ice, coffe, dunkin, yester...","[{'label': 'LABEL_1', 'score': 0.8312905430793...",neutral
69,1,1662652584,nutti pumpkin ice coffe good ive get ice coffe...,"[nutti, pumpkin, ice, coffe, good, ive, get, i...","[{'label': 'LABEL_0', 'score': 0.4688285291194...",negative


In [30]:
dunkin_i_coffee_senti['sentiments'].value_counts()

neutral     133
positive     29
negative     27
Name: sentiments, dtype: int64

In [31]:
dunkin_i_coffee_sentiments = pd.DataFrame(dunkin_i_coffee_senti['sentiments'].value_counts())
dunkin_i_coffee_sentiments

Unnamed: 0,sentiments
neutral,133
positive,29
negative,27


### Dunkin Butter Pecan

In [42]:
dunkin_b_pecan_senti = sentiments(dunkin_b_pecan)

RuntimeError: The expanded size of the tensor (653) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 653].  Tensor sizes: [1, 514]

In [40]:
dunkin_b_pecan_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
20,1,1663022152,discontinu flavor swirl took call morn ladi sa...,"[discontinu, flavor, swirl, took, call, morn, ...","[{'label': 'LABEL_1', 'score': 0.5559210181236...",neutral
69,1,1662652584,nutti pumpkin ice coffe good ive get ice coffe...,"[nutti, pumpkin, ice, coffe, good, ive, get, i...","[{'label': 'LABEL_0', 'score': 0.4688285291194...",negative
221,1,1661538738,tf lemonad season work dunkin also go frequent...,"[tf, lemonad, season, work, dunkin, also, go, ...","[{'label': 'LABEL_0', 'score': 0.5790102481842...",negative


In [41]:
dunkin_b_pecan_senti['sentiments'].value_counts()

neutral     39
positive    12
negative     8
Name: sentiments, dtype: int64

In [42]:
dunkin_b_pecan_sentiments = pd.DataFrame(dunkin_b_pecan_senti['sentiments'].value_counts())
dunkin_b_pecan_sentiments

Unnamed: 0,sentiments
neutral,39
positive,12
negative,8


### Dunkin Local

In [78]:
dunkin_local.head()

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized
34,1,1662958423,someon explain local dunkin dougnut longer ser...,"[someon, explain, local, dunkin, dougnut, long..."
104,1,1662306477,soup like dunkinrunsony survey free donut alwa...,"[soup, like, dunkinrunsony, survey, free, donu..."
145,1,1662028911,tip return local dunkin morn hand girl tip usu...,"[tip, return, local, dunkin, morn, hand, girl,..."
158,1,1661954305,don t know say someth went local dunkin that g...,"[don, t, know, say, someth, went, local, dunki..."
159,1,1661953274,drive person window get keep tip new ladi work...,"[drive, person, window, get, keep, tip, new, l..."


In [82]:
dunkin_local_senti = sentiments(dunkin_local.sample(21))

In [83]:
dunkin_local_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
1113,1,1655534248,im predict right one next drink matcha lemonad...,"[im, predict, right, one, next, drink, matcha,...","[{'label': 'LABEL_1', 'score': 0.6368399262428...",neutral
1597,1,1651601528,scream humili today b nice dunki employe ladi ...,"[scream, humili, today, b, nice, dunki, employ...","[{'label': 'LABEL_0', 'score': 0.6598921418190...",negative
2339,1,1645035383,muffin dont exactli go muffin wonder mayb some...,"[muffin, dont, exactli, go, muffin, wonder, ma...","[{'label': 'LABEL_1', 'score': 0.5718753337860...",neutral


In [84]:
dunkin_local_senti['sentiments'].value_counts()

neutral     15
negative     4
positive     2
Name: sentiments, dtype: int64

In [85]:
dunkin_local_sentiments = pd.DataFrame(dunkin_local_senti['sentiments'].value_counts())
dunkin_local_sentiments

Unnamed: 0,sentiments
neutral,15
negative,4
positive,2


### Summary

In [None]:
dunkin_cols = ["Overall", "Dunkin_Donuts", "Cold_Brew", "Iced_Coffee", "Butter_Pecan", "Dunkin_Local"]
dunkin_sentiments = pd.concat([ps5_sentiments, psplus_sentiments, pselden_sentiments, horizon_sentiments, ff_sentiments, pscont_sentiments], axis=1, ignore_index=True)
playstation_sentiments.columns = ps_cols

Remarks:
- most posts are neutral, most likely are questions and discussions over the topics
- 

## Starbucks Sentiments

### Overall

In [50]:
sbucks_sample_senti = sentiments(sbucks_text_df.sample(500))

In [51]:
sbucks_sample_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
2771,0,1662992585,lid dont stay ceram cup nan,"[lid, dont, stay, ceram, cup, nan]","[{'label': 'LABEL_1', 'score': 0.7076201438903...",neutral
4733,0,1661453752,question reshaken espresso vs latt sbux insid ...,"[question, reshaken, espresso, vs, latt, sbux,...","[{'label': 'LABEL_0', 'score': 0.5602721571922...",negative
4933,0,1661254898,starbuck dish sanit anyon els secur major dryn...,"[starbuck, dish, sanit, anyon, els, secur, maj...","[{'label': 'LABEL_1', 'score': 0.5626117587089...",neutral


In [52]:
sbucks_sample_senti['sentiments'].value_counts()

neutral     394
negative     63
positive     43
Name: sentiments, dtype: int64

In [53]:
sbucks_sentiments = pd.DataFrame(sbucks_sample_senti['sentiments'].value_counts())
sbucks_sentiments

Unnamed: 0,sentiments
neutral,394
negative,63
positive,43


### Dress Code

In [54]:
sbucks_dress_senti = sentiments(sbucks_dress)

In [55]:
sbucks_dress_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
2740,0,1663017656,dress code question realli cute black combat b...,"[dress, code, question, realli, cute, black, c...","[{'label': 'LABEL_1', 'score': 0.8963563442230...",neutral
2871,0,1662882825,still wear flat cap starbuck also plaintshirt ...,"[still, wear, flat, cap, starbuck, also, plain...","[{'label': 'LABEL_1', 'score': 0.8445830941200...",neutral
2960,0,1662808730,new hire bring first day train given list docu...,"[new, hire, bring, first, day, train, given, l...","[{'label': 'LABEL_2', 'score': 0.7134898900985...",positive


In [57]:
sbucks_dress_senti['sentiments'].value_counts()

neutral     30
positive     5
negative     4
Name: sentiments, dtype: int64

In [58]:
sbucks_dress_sentiments = pd.DataFrame(sbucks_dress_senti['sentiments'].value_counts())
sbucks_dress_sentiments

Unnamed: 0,sentiments
neutral,30
positive,5
negative,4


### Pumpkin Spice

In [59]:
sbucks_p_spice_senti = sentiments(sbucks_p_spice)

In [60]:
sbucks_p_spice_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
2636,0,1663109346,happi pumpkin spice season jackolantern nan,"[happi, pumpkin, spice, season, jackolantern, ...","[{'label': 'LABEL_1', 'score': 0.8681311607360...",neutral
2698,0,1663047373,kj correct know syrup heap calori kj pumpkin s...,"[kj, correct, know, syrup, heap, calori, kj, p...","[{'label': 'LABEL_1', 'score': 0.8142969608306...",neutral
2832,0,1662929371,allerg someth fall flavor what syrup i v three...,"[allerg, someth, fall, flavor, what, syrup, i,...","[{'label': 'LABEL_1', 'score': 0.8318116068840...",neutral


In [61]:
sbucks_p_spice_senti['sentiments'].value_counts()

neutral     55
positive    12
negative     5
Name: sentiments, dtype: int64

In [62]:
sbucks_p_spice_sentiments = pd.DataFrame(sbucks_p_spice_senti['sentiments'].value_counts())
sbucks_p_spice_sentiments

Unnamed: 0,sentiments
neutral,55
positive,12
negative,5


### Cold Brew

In [63]:
sbucks_c_brew_senti = sentiments(sbucks_c_brew)

RuntimeError: The expanded size of the tensor (1034) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 1034].  Tensor sizes: [1, 514]

In [55]:
sbucks_c_brew_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
2740,0,1663017656,dress code question realli cute black combat b...,"[dress, code, question, realli, cute, black, c...","[{'label': 'LABEL_1', 'score': 0.8963563442230...",neutral
2871,0,1662882825,still wear flat cap starbuck also plaintshirt ...,"[still, wear, flat, cap, starbuck, also, plain...","[{'label': 'LABEL_1', 'score': 0.8445830941200...",neutral
2960,0,1662808730,new hire bring first day train given list docu...,"[new, hire, bring, first, day, train, given, l...","[{'label': 'LABEL_2', 'score': 0.7134898900985...",positive


In [57]:
sbucks_dress_senti['sentiments'].value_counts()

neutral     30
positive     5
negative     4
Name: sentiments, dtype: int64

In [58]:
sbucks_dress_sentiments = pd.DataFrame(sbucks_dress_senti['sentiments'].value_counts())
sbucks_dress_sentiments

Unnamed: 0,sentiments
neutral,30
positive,5
negative,4


### Apple Crisp

In [72]:
sbucks_a_crisp_senti = sentiments(sbucks_a_crisp)

In [73]:
sbucks_a_crisp_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
2657,0,1663095901,ok i m curiou what correl someon like cilantro...,"[ok, i, m, curiou, what, correl, someon, like,...","[{'label': 'LABEL_1', 'score': 0.8949554562568...",neutral
2681,0,1663070793,typic went starbuck morn decid go insid drive ...,"[typic, went, starbuck, morn, decid, go, insid...","[{'label': 'LABEL_1', 'score': 0.7545986175537...",neutral
2720,0,1663031317,appl crisp tast like straight chemic loudlycry...,"[appl, crisp, tast, like, straight, chemic, lo...","[{'label': 'LABEL_1', 'score': 0.5060799121856...",neutral


In [74]:
sbucks_a_crisp_senti['sentiments'].value_counts()

neutral     21
positive     5
negative     3
Name: sentiments, dtype: int64

In [75]:
sbucks_a_crisp_sentiments = pd.DataFrame(sbucks_a_crisp_senti['sentiments'].value_counts())
sbucks_a_crisp_sentiments

Unnamed: 0,sentiments
neutral,21
positive,5
negative,3


### Fall Launch

In [68]:
sbucks_f_launch_senti = sentiments(sbucks_f_launch)

In [69]:
sbucks_f_launch_senti.head(3)

Unnamed: 0,subreddit,created_utc,title_selftext,tokenized,sentiment,sentiments
3502,0,1662343518,board fall launch nan,"[board, fall, launch, nan]","[{'label': 'LABEL_1', 'score': 0.8202300071716...",neutral
3612,0,1662263514,fall launch sign ft duolingo cake pop nan,"[fall, launch, sign, ft, duolingo, cake, pop, ...","[{'label': 'LABEL_1', 'score': 0.8723303079605...",neutral
3653,0,1662241595,swear last year fall launch didnt hit hard lik...,"[swear, last, year, fall, launch, didnt, hit, ...","[{'label': 'LABEL_1', 'score': 0.5231901407241...",neutral


In [70]:
sbucks_f_launch_senti['sentiments'].value_counts()

neutral     17
positive     3
Name: sentiments, dtype: int64

In [71]:
sbucks_f_launch_sentiments = pd.DataFrame(sbucks_f_launch_senti['sentiments'].value_counts())
sbucks_f_launch_sentiments

Unnamed: 0,sentiments
neutral,17
positive,3


In [None]:
# pip install --ignore-installed --upgrade tensorflow-gpu