# Real Feedback For Product-Focused Companies

<b>Authors:</b> John Newcomb, Doug Mill, Andrew Marinelli

## Overview

For large companies, broad user feedback generally takes a rather rigid form: the company issues a feedback survey with some sort of coupon reward for completion, and the user can choose whether or not to complete it. There are several problems with this approach. First of all, voluntary feedback often does not accurately reflect true sentiment. People generally feel the need to be nice, so they do not share their true thoughts. Negative experiences become neutral, neutral experiences become positive, and positive experiences become legendary.

However, new means of information aggregation let us gather user feedback in a natural way, where users do not feel surveilled. Using information streams from the Twitter API, along with an in-house sentiment analysis tool, we aim to provide real-time tweet flagging for companies interested in getting honest feedback on their products, both synchronously and asynchronously.

## Business Understanding

Gathering user feedback can be both difficult and deceiving. Additionally, tracking down and tackling bad PR has always been anything but predictable. In the digital age, however, we can monitor social media sites for signs of bad press. The idea is simple: hook up to the Twitter API, search for tweets related to your company via keywords (for example, Apple might be interested in any tweet containing "iPad", "iPhone", etc.), then have your user experience team sort through them to identify the company's strengths and weaknesses. Going further, NLP makes it possible not only to bin the tweets but also to analyze them for sentiment. This means a user experience team could get a constant stream of information about the shortcomings of their product, enabling faster iterations and quicker responses. The overall result: decreased customer churn and increased user experience productivity, which, as you might guess, leads to an overall increase in bottom-line revenue.
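The filter-then-score workflow described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: `flag_tweets`, the keyword list, and the `score_negative` callable are hypothetical stand-ins, and no Twitter API client is shown; any iterable of tweet strings can serve as the input stream.

```python
import re
from typing import Callable, Iterable, Iterator, Tuple

def flag_tweets(tweets: Iterable[str],
                keywords: Iterable[str],
                score_negative: Callable[[str], float],
                threshold: float = 0.5) -> Iterator[Tuple[str, float]]:
    """Yield (tweet, score) for keyword-matching tweets whose negativity score meets the threshold.

    Hypothetical helper: `score_negative` stands in for a trained sentiment model
    that returns the probability that a tweet is negative.
    """
    # one case-insensitive pattern that matches any keyword at the start of a word
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')', re.IGNORECASE)
    for tweet in tweets:
        if not pattern.search(tweet):
            continue  # not about our products, skip it
        score = score_negative(tweet)
        if score >= threshold:
            yield tweet, score

# toy scorer for illustration only: treats a few words as negative
def toy_scorer(text):
    return 1.0 if re.search(r'\b(dead|crash\w*|hate)\b', text, re.IGNORECASE) else 0.0

stream = ["My iPhone battery is dead again",
          "Loving the new iPad!",
          "Great weather in Austin today"]
flagged = list(flag_tweets(stream, ['iphone', 'ipad', 'apple'], toy_scorer))
print(flagged)  # [('My iPhone battery is dead again', 1.0)]
```

Because `flag_tweets` is a generator, it would work the same way on an unbounded stream as on a list: each tweet is filtered and scored as it arrives.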

## Methodology

How did we do it? We begin with a dataset of tweets from South By Southwest 2013, a popular music festival in Austin, TX, where high-tech companies make a habit of appearing to flaunt their latest and trendiest updates. The tweets were compiled by hashtag (#sxsw as well as others, case-insensitive). First, we generate metadata per tweet and clean up the dataset overall. Next, we go through an iterative modeling process, settling on a model according to a pre-established evaluation metric. Finally, we cross-validate our results and test our model on unseen data to confirm them.

## EDA

We will begin with the basic necessary EDA.

In [26]:
# import necessary packages
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.max_colwidth', None)  # show full tweet text; -1 is deprecated

import string
import re

from sklearn.feature_extraction.text import CountVectorizer

import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/westonnewcomb/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

Let's take a look at what we're working with!

In [27]:
# load data in as pandas DataFrame object
df = pd.read_csv('data/data.csv', encoding='unicode_escape')
df.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs tweeting at #RISE_Austin, it was dead! I need to upgrade. Plugin stations at #SXSW. | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/iPhone app that you'll likely appreciate for its design. Also, they're giving free Ts at #SXSW | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. They should sale them down at #SXSW. | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as crashy as this year's iPhone app. #sxsw | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa Mayer (Google), Tim O'Reilly (tech books/conferences) &amp; Matt Mullenweg (Wordpress) | Google | Positive emotion |


Our data consists of three columns, each relatively simple to understand: the text of the tweet, the product at which the tweet is directed, and the sentiment of that tweet.

Before we go any further, let's go ahead and rename the columns for the sake of simplicity. 

In [28]:
# rename columns for simplicity
columns_dict = {'tweet_text':'tweet',
                'emotion_in_tweet_is_directed_at':'product',
                'is_there_an_emotion_directed_at_a_brand_or_product':'emotion_response'}

df = df.rename(columns=columns_dict)

In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   tweet             9092 non-null   object
 1   product           3291 non-null   object
 2   emotion_response  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB


In [30]:
df.isna().sum()

tweet               1   
product             5802
emotion_response    0   
dtype: int64

Before we deal with this nightmare of a situation with the product column, let's take a look at the null-valued tweet.

In [31]:
df[df['tweet'].isna()]

| | tweet | product | emotion_response |
|---|---|---|---|
| 6 | | | No emotion toward brand or product |


This appears likely to be a data input error. We can drop that row from the dataset. 

In [32]:
df = df.dropna(subset=['tweet'])

Shall we take a look at the product column next?

In [33]:
df['product'].value_counts()

iPad                               946
Apple                              661
iPad or iPhone App                 470
Google                             430
iPhone                             297
Other Google product or service    293
Android App                        81 
Android                            78 
Other Apple product or service     35 
Name: product, dtype: int64

In [34]:
df['product'].isna().mean()

0.6380334359876815

It appears as though we do not have much information about the products: values are missing in nearly two-thirds of the rows in the 'product' column. Perhaps this is a mistake in the dataset. To circumvent this issue, let's derive a new 'company' column by checking whether each tweet contains words indicative of the company involved.

In [35]:
apple_words = ['iphone', 'ipad', 'apple']
google_words = ['google', 'android']

In [36]:
def brand_classifier(tweet):
    """Label a tweet as 'apple', 'google', 'both', or 'neither' via keyword substring matches."""
    tweet = tweet.lower()

    google = any(g in tweet for g in google_words)
    apple = any(a in tweet for a in apple_words)

    if apple and google:
        return 'both'
    elif apple:
        return 'apple'
    elif google:
        return 'google'
    else:
        return 'neither'

In [37]:
df['company'] = df['tweet'].map(lambda x: brand_classifier(x))

In [38]:
df['company'].value_counts()

apple      5275
google     2781
neither    786 
both       250 
Name: company, dtype: int64

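Wow, significantly better: only 786 tweets (about 8.6%) now lack a company label, compared with 5,802 missing values in the 'product' column.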
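One caveat with `brand_classifier` is that `g in tweet` is a plain substring test, so a word such as "pineapple" counts as an Apple mention. A possible refinement, sketched below under the assumption that keywords should only match at the start of a word, is to compile each keyword list into a word-boundary regex. `brand_classifier_re` is a hypothetical name; the prefix match still catches forms such as "iPad2" and "iPhones".

```python
import re

apple_words = ['iphone', 'ipad', 'apple']
google_words = ['google', 'android']

# \b anchors each keyword to the start of a word, so "pineapple" no longer matches "apple",
# while suffixed forms such as "ipad2" or "iphones" still do
apple_re = re.compile(r'\b(?:' + '|'.join(apple_words) + r')', re.IGNORECASE)
google_re = re.compile(r'\b(?:' + '|'.join(google_words) + r')', re.IGNORECASE)

def brand_classifier_re(tweet):
    apple = bool(apple_re.search(tweet))
    google = bool(google_re.search(tweet))
    if apple and google:
        return 'both'
    elif apple:
        return 'apple'
    elif google:
        return 'google'
    return 'neither'

print(brand_classifier_re("Pineapple smoothies at #SXSW"))  # neither
print(brand_classifier_re("Can not wait for #iPad2"))       # apple
print(brand_classifier_re("Android vs iPhone debate"))      # both
```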

Since we are only interested in classifying negative vs. non-negative, let's reframe our data so that it only contains information about whether the tweet has negative sentiment or not. 

In [39]:
emotions_dict = {'Positive emotion':0,
                 'No emotion toward brand or product':0,
                 'I can\'t tell':0,
                 'Negative emotion':1}

df['sentiment'] = df['emotion_response'].replace(emotions_dict)
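`Series.replace` leaves any label that is missing from `emotions_dict` unchanged, so an unexpected category would silently survive as a string. A slightly stricter variant, sketched here on a made-up toy frame rather than the real data, uses `Series.map`, which turns unmapped labels into `NaN` so they can be caught with an assertion.

```python
import pandas as pd

emotions_dict = {'Positive emotion': 0,
                 'No emotion toward brand or product': 0,
                 "I can't tell": 0,
                 'Negative emotion': 1}

# toy frame standing in for the real data
toy = pd.DataFrame({'emotion_response': ['Negative emotion',
                                         'Positive emotion',
                                         "I can't tell",
                                         'No emotion toward brand or product']})

toy['sentiment'] = toy['emotion_response'].map(emotions_dict)

# map() yields NaN for any label not in the dict, so this fails loudly on surprises
assert toy['sentiment'].notna().all(), "unmapped emotion labels found"
toy['sentiment'] = toy['sentiment'].astype(int)
print(toy['sentiment'].tolist())  # [1, 0, 0, 0]
```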

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9092 entries, 0 to 9092
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   tweet             9092 non-null   object
 1   product           3291 non-null   object
 2   emotion_response  9092 non-null   object
 3   company           9092 non-null   object
 4   sentiment         9092 non-null   int64 
dtypes: int64(1), object(4)
memory usage: 426.2+ KB


In [41]:
df['sentiment'].value_counts()

0    8522
1    570 
Name: sentiment, dtype: int64

In [42]:
df['sentiment'].mean()

0.06269247690277167

So we are clearly dealing with class imbalance: only about 6.3% of tweets are negative, which means we will almost certainly need to upsample or downsample our data. We should also evaluate our models with a metric other than accuracy (for example, recall or F1 on the negative class), since a model that never predicts a negative tweet would still be about 93.7% accurate. In the meantime, we'll drop the columns that are not necessary for our purposes.
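One common way to rebalance is random upsampling of the minority class with `sklearn.utils.resample`. The sketch below is an assumption about how that step could look, not something this notebook does yet. It should be applied to the training split only (after `train_test_split`), so duplicated negative tweets never leak into the test set. The toy frame is made up but reuses the 570/8,522 class counts from above.

```python
import pandas as pd
from sklearn.utils import resample

# toy training frame with the same class counts as the full dataset
train = pd.DataFrame({'tweet': [f'tweet {i}' for i in range(9092)],
                      'sentiment': [1] * 570 + [0] * 8522})

majority = train[train['sentiment'] == 0]
minority = train[train['sentiment'] == 1]

# sample the minority class with replacement until it matches the majority class
minority_up = resample(minority,
                       replace=True,
                       n_samples=len(majority),
                       random_state=42)

# recombine and shuffle the rows
train_balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)
print(train_balanced['sentiment'].value_counts())
```

Downsampling is the mirror image: call `resample(majority, replace=False, n_samples=len(minority), random_state=42)` instead, at the cost of discarding most of the non-negative tweets.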

In [43]:
df = df.drop(['product', 'emotion_response'], axis=1)

For our final preprocessing step, we clean each tweet of unnecessary content (URLs, @mentions, HTML entities, digits, punctuation, and Twitter-specific stopwords) and lemmatize the remaining tokens.

In [44]:
# Twitter-specific stopwords and the lemmatizer are built once, not on every call
twtr_stopwords = {'rt', 'rts', 'retweet', 'quot', 'sxsw', 'amp'}
lemmatizer = nltk.stem.WordNetLemmatizer()

def tweet_cleaner(tweet):
    """Return a list of lowercase, lemmatized tokens from a raw tweet."""
    x = tweet
    x = re.sub(r'https?://\S+', '', x)    # remove URLs
    x = re.sub(r'\{link\}', '', x)        # remove {link} placeholders
    x = re.sub(r'@\w*', '', x)            # remove @mentions
    x = re.sub(r'&[a-z]+;', '', x)        # remove HTML entities such as &amp;
    x = re.sub(r'[^A-Za-z0-9]+', ' ', x)  # replace punctuation, '#' and other symbols with spaces
    x = re.sub(r'\d+', '', x)             # remove all numerals

    # lowercase, drop Twitter stopwords, and discard tokens shorter than 3 characters
    tokens = [word.lower() for word in x.split() if word.lower() not in twtr_stopwords]
    tokens = [w for w in tokens if len(w) > 2]

    return [lemmatizer.lemmatize(token) for token in tokens]

In [45]:
df['tweet_clean'] = df['tweet'].map(lambda x: tweet_cleaner(x))
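To see what the cleaning steps do, here is a regex-only sketch that mirrors `tweet_cleaner` on a made-up tweet. It skips lemmatization so that it runs without the WordNet data; `regex_clean` is a hypothetical name used only for this illustration.

```python
import re

TWTR_STOPWORDS = {'rt', 'rts', 'retweet', 'quot', 'sxsw', 'amp'}

def regex_clean(tweet):
    x = re.sub(r'https?://\S+', '', tweet)  # URLs
    x = re.sub(r'\{link\}', '', x)          # {link} placeholders
    x = re.sub(r'@\w*', '', x)              # @mentions
    x = re.sub(r'&[a-z]+;', '', x)          # HTML entities such as &amp;
    x = re.sub(r'[^A-Za-z0-9]+', ' ', x)    # punctuation and '#' become spaces
    x = re.sub(r'\d+', '', x)               # digits ("iPad2" -> "iPad")
    tokens = [w.lower() for w in x.split() if w.lower() not in TWTR_STOPWORDS]
    return [w for w in tokens if len(w) > 2]

print(regex_clean("RT @jessedee Loving the #iPad2 at #SXSW &amp; more http://t.co/abc {link}"))
# ['loving', 'the', 'ipad', 'more']
```

Note that "the" survives: only the Twitter-specific stopwords are removed, and the length filter drops only tokens of one or two characters such as "at".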

Finally, since we are interested in seeing whether our results generalize, we will split the dataset by brand. Only the tweet text and the sentiment label are relevant for building our model, so the remaining columns can be ignored.

In [52]:
df_apple = df[df['company']=='apple']
df_google = df[df['company'] == 'google']
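Since the point of splitting by brand is to test generalization, one way to use `df_apple` and `df_google` is to fit on one brand and score on the other. The sketch below assumes that plan: `cross_brand_scores` is a hypothetical helper, the toy frames are made up, and a plain `CountVectorizer` stands in for the `tweet_cleaner` tokenizer so the example runs on its own. It reports recall on the negative class rather than accuracy, for the imbalance reasons discussed earlier.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import recall_score
from sklearn.pipeline import Pipeline

def cross_brand_scores(model, df_a, df_b, text_col='tweet', label_col='sentiment'):
    """Fit a fresh copy of `model` on brand A and return negative-class recall on brand B."""
    fitted = clone(model).fit(df_a[text_col], df_a[label_col])
    preds = fitted.predict(df_b[text_col])
    return recall_score(df_b[label_col], preds, zero_division=0)

# toy stand-ins for df_apple and df_google (1 = negative)
toy_apple = pd.DataFrame({
    'tweet': ['ipad battery dead', 'love my ipad', 'iphone app keeps crashing',
              'new iphone looks great', 'apple store line is fun', 'ipad screen broken'],
    'sentiment': [1, 0, 1, 0, 0, 1]})
toy_google = pd.DataFrame({
    'tweet': ['android battery dead', 'google maps is great',
              'android app keeps crashing', 'love google party'],
    'sentiment': [1, 0, 1, 0]})

pipe = Pipeline([('vec', CountVectorizer()),
                 ('rf', RandomForestClassifier(n_estimators=50, random_state=42))])

print(cross_brand_scores(pipe, toy_apple, toy_google))
```

`clone` gives each call an unfitted copy, so the same pipeline object can be reused for the reverse direction (train on Google, test on Apple) without leaking state.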

## Modeling

In [53]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay  # plot_confusion_matrix was removed in scikit-learn 1.2
from sklearn.model_selection import RepeatedStratifiedKFold

X = df['tweet']
y = df['sentiment']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
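Passing `stratify=y` keeps the roughly 6.3% share of negative tweets essentially the same in both splits, which matters when the minority class is this small. A quick check on synthetic labels with the same class counts as the dataset (the placeholder features are an assumption; only the labels matter here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic labels with the dataset's class counts: 8522 non-negative, 570 negative
y = np.array([0] * 8522 + [1] * 570)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

print(len(y_train), len(y_test))  # 6819 2273
print(round(y_train.mean(), 4), round(y_test.mean(), 4))  # both close to 0.0627
```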

In [54]:
from sklearn.pipeline import Pipeline

rf_pipe = Pipeline(steps=[('preprocessing', CountVectorizer(lowercase=False, 
                                                            tokenizer=tweet_cleaner,
                                                            max_features=100)),
                          ('rf', RandomForestClassifier(random_state=42))])

rf_grid = {'rf__n_estimators': [20, 120, 220, 500],
           'rf__max_depth': [3,6,9],
           'rf__min_samples_split': [2, 5, 10],
           'rf__min_samples_leaf': [1, 2, 4]
          }

rf_gs = GridSearchCV(estimator=rf_pipe,
                     param_grid=rf_grid,
                     cv=RepeatedStratifiedKFold(n_splits=3,
                                                n_repeats=1,
                                                random_state=42),
                     verbose=100,
                     n_jobs=-1)
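As configured, `GridSearchCV` ranks candidates by accuracy (its default when `scoring` is not set), even though the class imbalance above makes accuracy a poor guide. A possible adjustment, sketched on a made-up corpus so it runs standalone, is to set `scoring` to a minority-class metric such as `'f1'` or `'recall'` and to let the forest reweight classes with `class_weight='balanced'`. The toy tweets and the reduced grid are assumptions for illustration; a plain `CountVectorizer` stands in for the `tweet_cleaner` tokenizer.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# made-up corpus: 1 = negative, 0 = non-negative
texts = ['battery dead again', 'app keeps crashing', 'screen broke today', 'worst update ever',
         'love the new ipad', 'great keynote', 'fun party tonight', 'nice demo',
         'cool new app', 'long line but worth it', 'awesome panel', 'see you there']
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

pipe = Pipeline(steps=[('preprocessing', CountVectorizer()),
                       ('rf', RandomForestClassifier(random_state=42))])

grid = {'rf__n_estimators': [20, 50],
        'rf__class_weight': [None, 'balanced']}  # 'balanced' upweights the rare class

gs = GridSearchCV(estimator=pipe,
                  param_grid=grid,
                  scoring='f1',  # F1 of class 1 (negative sentiment) instead of accuracy
                  cv=StratifiedKFold(n_splits=2, shuffle=True, random_state=42),
                  n_jobs=1)
gs.fit(texts, labels)
print(gs.best_params_, round(gs.best_score_, 3))
```

With the real data, the same `scoring` argument would also change what `rf_gs.score` reports, so the train and test numbers would then be F1 values rather than accuracies.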


In [55]:
rf_gs.fit(X_train, y_train)

Fitting 3 folds for each of 108 candidates, totalling 324 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.

GridSearchCV(cv=RepeatedStratifiedKFold(n_repeats=1, n_splits=3, random_state=42),
             estimator=Pipeline(steps=[('preprocessing',
                                        CountVectorizer(lowercase=False,
                                                        max_features=100,
                                                        tokenizer=<function tweet_cleaner at 0x7fa5e93af3a0>)),
                                       ('rf',
                                        RandomForestClassifier(random_state=42))]),
             n_jobs=-1,
             param_grid={'rf__max_depth': [3, 6, 9],
                         'rf__min_samples_leaf': [1, 2, 4],
                         'rf__min_samples_split': [2, 5, 10],
                         'rf__n_estimators': [20, 120, 220, 500]},
             verbose=100)

In [56]:
rf_gs.score(X_train, y_train)

0.9372341985628391

In [57]:
rf_gs.score(X_test, y_test)

0.9375274967003959
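Both scores sit right next to the 93.7% of tweets that are non-negative (8,522 of 9,092), which is what a model that always predicts "not negative" would score. A majority-class baseline makes that comparison explicit. The sketch uses scikit-learn's `DummyClassifier` on synthetic labels with the dataset's class counts; with the real data, the `X_train`/`y_train` and `X_test`/`y_test` splits from above would take their place.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# synthetic stand-in: same class counts as the dataset (8522 non-negative, 570 negative)
y = np.array([0] * 8522 + [1] * 570)
X = np.zeros((len(y), 1))  # features are irrelevant to a majority-class baseline

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
y_pred = baseline.predict(X_test)

print(round(accuracy_score(y_test, y_pred), 4))  # high accuracy from predicting 0 everywhere
print(recall_score(y_test, y_pred))              # 0.0: no negative tweet is ever flagged
print(confusion_matrix(y_test, y_pred))
```

Running `confusion_matrix(y_test, rf_gs.predict(X_test))` on the real split would show whether the forest flags any negative tweets at all, which accuracy alone cannot reveal.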

## Evaluation
---

## Conclusion
---

## Future Research
---