# Setup

### Imports

These imports are slightly fewer than in `con_sent.ipynb`, since this notebook is meant for exploring the results of the sentiment analysis rather than running it.

In [1]:
import pandas as pd
import bisect
from gensim import models,corpora,utils
from collections import defaultdict

### Data

The cell below loads `full_df`, a pandas DataFrame of the cleaned tweets indexed by `tweet_id`. Loading takes around 4-5 minutes.

In [2]:
full_df = pd.read_csv("emojiTranslatedCleanedNoUnderscore.csv", na_filter=False, parse_dates=['created_at'],
                      dtype={'tweet_id': str, 'in_response_to_tweet_id': str, 'inbound': bool, 'response_tweet_id': str})
full_df.set_index("tweet_id", inplace=True)

full_df.head()

| tweet_id | Unnamed: 0 | author_id | inbound | created_at | text | response_tweet_id | in_response_to_tweet_id |
|---|---|---|---|---|---|---|---|
| 1 | 0 | sprintcare | False | 2017-10-31 22:10:47+00:00 | I understand. I would like to assist you. We ... | 2.0 | 3.0 |
| 2 | 1 | 115712 | True | 2017-10-31 22:11:45+00:00 | @sprintcare and how do you propose we do that |  | 1.0 |
| 3 | 2 | 115712 | True | 2017-10-31 22:08:27+00:00 | @sprintcare I have sent several private messag... | 1.0 | 4.0 |
| 4 | 3 | sprintcare | False | 2017-10-31 21:54:49+00:00 | Please send us a Private Message so that we c... | 3.0 | 5.0 |
| 5 | 4 | 115712 | True | 2017-10-31 21:49:35+00:00 | @sprintcare I did. | 4.0 | 6.0 |
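
Because `tweet_id` is read as `str` and used as the index, the lookups elsewhere in this notebook convert ids with `str(...)` first. The minimal sketch below illustrates why on a tiny invented CSV (its rows are illustrative only, not taken from the dataset): a string key finds the tweet, while an integer key raises `KeyError`.

```python
import io

import pandas as pd

# Invented sample rows; only the column names mirror full_df.
csv_text = """tweet_id,author_id,inbound,text
1,sprintcare,False,How can we help?
2,115712,True,@sprintcare my phone is broken
"""

# Reading tweet_id as str (as full_df does) gives a string index.
df = pd.read_csv(io.StringIO(csv_text), dtype={"tweet_id": str})
df.set_index("tweet_id", inplace=True)

# The index holds strings, so lookups must use the string form of the id.
print(df.loc["2", "text"])  # @sprintcare my phone is broken

# An integer key is not present in a string index, so .loc raises KeyError.
try:
    df.loc[2, "text"]
except KeyError:
    print("integer lookup failed")
```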


The cell below loads `sentiment_pairs_df`, a pandas DataFrame in which each row holds the ids of the three tweets that make up a (sub)thread (customer, company response, customer), the company, and the recorded difference in sentiment.

In [3]:
sentiment_pairs_df = pd.read_csv("con_sent.csv")
sentiment_pairs_df.set_index('Unnamed: 0', inplace = True)

num_rows = sentiment_pairs_df.shape[0]
sentiment_pairs_df.head()

| Unnamed: 0 | first_tweet_id | second_tweet_id | third_tweet_id | company | sentiment_change |
|---|---|---|---|---|---|
| 0 | 8 | 6 | 5 | sprintcare | 1.0 |
| 1 | 5 | 4 | 3 | sprintcare | -1.0 |
| 2 | 3 | 1 | 2 | sprintcare | 1.0 |
| 3 | 18 | 17 | 16 | sprintcare | 1.333333 |
| 4 | 16 | 15 | 12 | sprintcare | -0.5 |
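
Each row stores only tweet ids, so reading a thread means looking each id up in `full_df`. A vectorized sketch, assuming the id columns are read as integers (as in the preview above) and `full_df` is indexed by string ids: convert each id column with `astype(str)` and `map` it onto the text column. The small `tweets` and `pairs` frames below are invented stand-ins with the same column names, not real data.

```python
import pandas as pd

# Invented stand-in for full_df: text indexed by string tweet ids.
tweets = pd.DataFrame(
    {"text": ["Hi, how can we help?", "@acme my order is late", "@acme thanks, sorted!"]},
    index=pd.Index(["11", "10", "12"], name="tweet_id"),
)

# Invented stand-in for sentiment_pairs_df (integer ids, same column names).
pairs = pd.DataFrame(
    {
        "first_tweet_id": [10],
        "second_tweet_id": [11],
        "third_tweet_id": [12],
        "company": ["acme"],
        "sentiment_change": [1.5],
    }
)

# Convert each id column to str so it matches the string index, then map ids to text.
for position in ("first", "second", "third"):
    ids = pairs[f"{position}_tweet_id"].astype(str)
    pairs[f"{position}_text"] = ids.map(tweets["text"])

print(pairs[["first_text", "second_text", "third_text"]].iloc[0].tolist())
```

If the id columns were instead read as floats, `astype(str)` would produce strings like `"8.0"` that do not match the index, so the dtype is worth checking first.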


# Functions

Below are a number of documented functions, used both to run the analyses shown in our presentation and to explore the data further.

### Tweet Pair/Subthread Filtering

Functions in this section select a subset of the tweet response pairs. **They tend to take several minutes**, so it's recommended that you always save the result when using them.

In [8]:
# Returns the sentiment-change threshold `t` such that pairs with a change >= `t` make up at
# least the top `percent` of all tweet response pairs. Filtering with `t` (inclusive) may
# return more than `percent` of the pairs, since pairs just below the cutoff can tie with `t`.
def lower_sent_for_top_p_responses(percent=.001):
    num_top = max(1, round(num_rows * percent))  # keep at least one value
    deltas = []  # sorted; holds the `num_top` largest changes seen so far
    
    for change in sentiment_pairs_df['sentiment_change']:
        bisect.insort(deltas, change)
        
        if len(deltas) > num_top:
            deltas.pop(0)  # drop the smallest
            
    return deltas[0]

# Returns the sentiment-change threshold `t` such that pairs with a change <= `t` make up at
# least the bottom `percent` of all tweet response pairs. Filtering with `t` (inclusive) may
# return more than `percent` of the pairs, since pairs just above the cutoff can tie with `t`.
def upper_sent_for_bottom_p_responses(percent=.001):
    num_bottom = max(1, round(num_rows * percent))  # keep at least one value
    deltas = []  # sorted; holds the `num_bottom` smallest changes seen so far
    
    for change in sentiment_pairs_df['sentiment_change']:
        bisect.insort(deltas, change)
        
        if len(deltas) > num_bottom:
            deltas.pop()  # drop the largest
            
    return deltas[-1]

# Returns the rows of `sentiment_pairs_df` (as pandas Series) whose sentiment change is
# above `delta`, or equal to it when `inclusive=True`
def get_rows_above_delta(delta=2, inclusive=False):
    rows = []
    
    for _, row_data in sentiment_pairs_df.iterrows():
        change = row_data['sentiment_change']
        if (change >= delta if inclusive else change > delta):
            rows.append(row_data)
        
    return rows

# Returns the rows of `sentiment_pairs_df` (as pandas Series) whose sentiment change is
# below `delta`, or equal to it when `inclusive=True`
def get_rows_below_delta(delta=2, inclusive=False):
    rows = []
    
    for _, row_data in sentiment_pairs_df.iterrows():
        change = row_data['sentiment_change']
        if (change <= delta if inclusive else change < delta):
            rows.append(row_data)
        
    return rows
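
The loops above visit every row in Python. A vectorized sketch of the same thresholds and filters, assuming the `sentiment_change` column shown in the preview above, could use `Series.nlargest`/`nsmallest` and a boolean mask. The function names here are hypothetical, and unlike the notebook's functions they return a DataFrame rather than a list of row Series (`[row for _, row in result.iterrows()]` converts it back) and always keep at least one value.

```python
import pandas as pd

def lower_sent_for_top_p(changes: pd.Series, percent: float = .001) -> float:
    # Smallest value among the `num_top` largest changes.
    num_top = max(1, round(len(changes) * percent))
    return changes.nlargest(num_top).min()

def upper_sent_for_bottom_p(changes: pd.Series, percent: float = .001) -> float:
    # Largest value among the `num_bottom` smallest changes.
    num_bottom = max(1, round(len(changes) * percent))
    return changes.nsmallest(num_bottom).max()

def rows_above(df: pd.DataFrame, delta: float = 2, inclusive: bool = False) -> pd.DataFrame:
    # Boolean mask over the whole column instead of a per-row loop.
    changes = df["sentiment_change"]
    mask = changes >= delta if inclusive else changes > delta
    return df[mask]

# Demo values: the five sentiment_change values from the preview of sentiment_pairs_df.
demo = pd.DataFrame({"sentiment_change": [1.0, -1.0, 1.0, 1.333333, -0.5]})
print(lower_sent_for_top_p(demo["sentiment_change"], 0.4))
print(upper_sent_for_bottom_p(demo["sentiment_change"], 0.4))
print(len(rows_above(demo, 1.0, inclusive=True)))
```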

### Analysis

Functions in this section build word frequencies, unigram counts, and multinomial (relative-frequency) distributions over a collection of tweets.

In [5]:
with open('stopwords.txt', "r") as stopwords:
    stoplist = set(stopwords.read().splitlines())

# Helper (not meant to be called on its own): tokenizes each sentence with gensim's
# simple_preprocess, which lowercases and keeps tokens of 2-15 characters
def sent_to_words(sentences):
    for sentence in sentences:
        yield utils.simple_preprocess(str(sentence), deacc=True)  # deacc=True strips accent marks
        
# Helper (not meant to be called on its own): returns the customer-service response text
# (the second tweet) of each row, lowercased, with stop words removed
def get_texts(rows):
    texts = []
    for row in rows:
        cs_tweet_id = row['second_tweet_id']
        cs_tweet_text = full_df.loc[str(cs_tweet_id), 'text']
        # Stop words are removed before gensim tokenization, so fragments of contractions
        # (e.g. "it" from "it's") can still reach the counts
        words = [word for word in cs_tweet_text.lower().split() if word not in stoplist]
        texts.append(" ".join(words))
        
    return texts

# Helper (not meant to be called on its own): counts every token and the total token count
def build_frequency_map(rows):
    tweet_list = list(sent_to_words(get_texts(rows)))
    
    freq = defaultdict(int)
    total = 0
    
    for text in tweet_list:
        for token in text:
            freq[token] += 1
            total += 1
            
    return freq, total

# Takes rows of `sentiment_pairs_df` and returns [word, percent] pairs for the `top` most
# frequent words in the customer-service responses (stop words excluded), where percent is
# the word's share of all counted tokens. Words tied with the `top`-th count are all
# included, so the result may hold more than `top` entries; it is not sorted.
def top_words(rows, top):
    frequency, total = build_frequency_map(rows)
    if top <= 0 or not frequency:
        return []
    measures = []  # sorted; holds the `top` largest counts seen so far
    
    for word in frequency:
        count = frequency[word]
        if len(measures) < top or count > measures[0]:
            bisect.insort(measures, count)
            if len(measures) > top:
                measures.pop(0)
    ret = []
    for word in frequency:
        count = frequency[word]
        if count >= measures[0]:
            ret.append([word, count * 100.0 / total])
            
    return ret
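
As a rough, dependency-free sketch of the same idea, the version below counts words with `collections.Counter` and uses `most_common` to find the cutoff. Assumptions: `tokenize` is a hypothetical ASCII-only approximation of gensim's `simple_preprocess` (lowercase letter runs of 2-15 characters), stop words are removed after tokenization rather than before, and the result is sorted by frequency. Like `top_words`, it keeps every word tied with the `top`-th count.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical stand-in for simple_preprocess: lowercase ASCII letter runs of 2-15 chars.
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if 2 <= len(tok) <= 15]

def top_words_sketch(texts, top, stoplist=()):
    stop = set(stoplist)
    # Filtering after tokenizing also catches contraction fragments such as "it" in "it's".
    freq = Counter(tok for text in texts for tok in tokenize(text) if tok not in stop)
    total = sum(freq.values())
    if total == 0 or top <= 0:
        return []
    # Count of the `top`-th most common word; every word tied with it is kept.
    cutoff = freq.most_common(top)[-1][1]
    return [[word, count * 100.0 / total]
            for word, count in freq.most_common() if count >= cutoff]

# Invented example texts and stop list.
texts = ["Sorry to hear that, please DM us", "Please DM us your email", "Thanks, it's fixed"]
stoplist = ["to", "that", "us", "your", "it"]
print(top_words_sketch(texts, 2, stoplist))
```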

### Utility

These functions are meant to help visualize output or transform data.

In [6]:
# Will somewhat-pretty print the tweet threads (customer, response, customer) of `rows`. When
# `num_to_print` is specified, only the first `num_to_print` rows are printed; otherwise all are.
separator = '----------------------------------------'
def print_tweets(rows, num_to_print=None):
    count = len(rows) if num_to_print is None else min(num_to_print, len(rows))
    print(separator)
    for row in rows[:count]:
        first_text = full_df.loc[str(row['first_tweet_id']), 'text']
        second_text = full_df.loc[str(row['second_tweet_id']), 'text']
        third_text = full_df.loc[str(row['third_tweet_id']), 'text']
        
        print("customer:", first_text)
        print("response:", second_text)
        print("customer:", third_text)
        print(separator)
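
Long tweets can make this output hard to scan. A hypothetical helper, `format_thread`, sketches one option using the standard library's `textwrap.fill`: it wraps each tweet to a fixed width and indents continuation lines under the speaker label. It takes the three texts directly, so it could be called with the same values `print_tweets` looks up.

```python
import textwrap

SEPARATOR = "-" * 40

def format_thread(first_text, second_text, third_text, width=80):
    # Hypothetical helper: wrap each tweet to `width` columns, indenting continuation
    # lines so they line up after the speaker label.
    lines = [SEPARATOR]
    for label, text in (("customer:", first_text),
                        ("response:", second_text),
                        ("customer:", third_text)):
        lines.append(textwrap.fill(text.strip(), width=width,
                                   initial_indent=label + " ",
                                   subsequent_indent=" " * (len(label) + 1)))
    lines.append(SEPARATOR)
    return "\n".join(lines)

print(format_thread("my phone is broken", "Sorry to hear that! Please DM us.", "thanks"))
```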

# Playground!

A few usage examples are given below if you are unsure where to start.

In [9]:
# The really great tweets! No sentiment change is above 3 or below -3, so this returns the pairs whose change is exactly 3
my_tweet_rows = get_rows_above_delta(3, True)
print_tweets(my_tweet_rows)

----------------------------------------
customer: somebody from @VerizonSupport please help meeeeee  weary face  weary face  weary face  weary face  I'm having the worst luck with your customer service
response:  Help has arrived! We are sorry to see that you are having trouble. How can we help?
customer: @VerizonSupport I finally got someone that helped me, thanks!
----------------------------------------
customer: @AppleSupport Thanks, thing is I still have like 81 cents in credit and won't let me do that until I have zero credit
response:  Try contacting our iTunes Store team here for more help: 
customer: @AppleSupport Awesome, thanks
----------------------------------------
customer: @Uber_Support Thanks - our baby is &lt;12mos and we need our car seat to fly so I was wondering if we can bring our own car seat and install via belt buckle in any UberX to go to the airport
response:  Hi there! Yes, you're always welcome to bring your own car seat along for the ride.
customer: @Uber

In [10]:
top_words(my_tweet_rows, 15)

[['help', 0.8547008547008547],
 ['sorry', 2.5641025641025643],
 ['team', 1.0683760683760684],
 ['hi', 2.7777777777777777],
 ['yes', 1.0683760683760684],
 ['service', 1.9230769230769231],
 ['please', 1.9230769230769231],
 ['dm', 1.9230769230769231],
 ['email', 0.8547008547008547],
 ['let', 0.8547008547008547],
 ['know', 0.8547008547008547],
 ['look', 1.0683760683760684],
 ['it', 0.8547008547008547],
 ['lapse', 1.2820512820512822],
 ['general', 1.2820512820512822],
 ['nature', 1.2820512820512822],
 ['concern', 1.2820512820512822],
 ['clarissa', 1.2820512820512822],
 ['thank', 0.8547008547008547]]