The objective is to compute, for each conversation thread, the difference in sentiment between the customer's first tweet and the customer's last tweet.
That difference (the sentiment improvement) is then summarized by company and by month (year-month).

In [94]:
import pandas as pd


In [95]:
senti_improve = pd.read_csv('thread_first_last_sentiment.csv', dtype = {'tweet_id': str})
senti_improve.head(50)

Unnamed: 0,tweet_id,author_id,company_name,tweet_l,author_l,inbound_l,time_l,length,verify_thread,verify_time,verify_alternance,inbound_first,first_sentim_l,last_sentim_l,first_tweet_text,last_tweet_text
0,2,115712,sprintcare,8|6|5|4|3|1|2,115712|sprintcare|115712|sprintcare|115712|spr...,True|False|True|False|True|False|True,2017-10-31 21:45:10+00:00|2017-10-31 21:46:24+...,7,True,True,True,True,Negative,Neutral,@sprintcare is the worst customer service,@sprintcare and how do you propose we do that
1,11,sprintcare,sprintcare,18|17|16|15|12|11,115713|sprintcare|115713|sprintcare|115713|spr...,True|False|True|False|True|False,2017-10-31 19:56:01+00:00|2017-10-31 19:59:13+...,6,True,True,True,True,Neutral|Neutral|Neutral,Neutral|Neutral,y’all lie about your “great” connection. 5 ba...,@sprintcare You gonna magically change your co...
2,27,Ask_Spectrum,Ask_Spectrum,29|28|24|21|22|25|26|27,115716|Ask_Spectrum|115716|Ask_Spectrum|115716...,True|False|True|False|True|False|True|False,2017-10-31 22:01:35+00:00|2017-10-31 22:05:37+...,8,True,True,True,True,Negative,Neutral,actually that's a broken link you sent me and ...,@Ask_Spectrum I received this from your corpor...
3,23,115716,Ask_Spectrum,29|28|24|21|23,115716|Ask_Spectrum|115716|Ask_Spectrum|115716,True|False|True|False|True,2017-10-31 22:01:35+00:00|2017-10-31 22:05:37+...,5,True,True,True,True,Negative,Negative,actually that's a broken link you sent me and ...,@Ask_Spectrum The correct way to do it is via ...
4,37,VerizonSupport,VerizonSupport,36|34|35|37,115719|VerizonSupport|115719|VerizonSupport,True|False|True|False,2017-10-31 22:10:46+00:00|2017-10-31 22:13:33+...,4,True,True,True,True,Negative,Very positive,somebody from @VerizonSupport please help meee...,@VerizonSupport I finally got someone that hel...
5,50,VerizonSupport,VerizonSupport,59|58|57|56|55|54|53|52|51|50,115723|VerizonSupport|115723|VerizonSupport|11...,True|False|True|False|True|False|True|False|Tr...,2017-10-31 19:54:51+00:00|2017-10-31 19:57:30+...,10,True,True,True,True,Negative,Neutral|Negative,is the worst ISP I’ve ever had,"@VerizonSupport Don’t know, router is downstai..."
6,65,115728,ChipotleTweets,66|64|65,115728|ChipotleTweets|115728,True|False|True,2017-10-31 22:03:38+00:00|2017-10-31 22:14:28+...,3,True,True,True,True,Neutral,Positive,@ChipotleTweets @28 I don't fit in my Veggie B...,@ChipotleTweets @ChipotleTweets Becky is very ...
7,73,ChipotleTweets,ChipotleTweets,76|75|74|73,115731|ChipotleTweets|115731|ChipotleTweets,True|False|True|False,2017-10-31 20:21:10+00:00|2017-10-31 20:37:31+...,4,True,True,True,True,Neutral,Neutral|Positive,When you're the only one in costume #boorito @...,@ChipotleTweets I had excellent service tonigh...
8,160,ChipotleTweets,ChipotleTweets,163|162|161|160,115737|ChipotleTweets|115737|ChipotleTweets,True|False|True|False,2017-10-31 19:43:47+00:00|2017-10-31 19:51:00+...,4,True,True,True,True,Neutral,Negative|Negative,@ChipotleTweets can I dress up as myself and s...,"@ChipotleTweets Tried, didn't work. How rude :/"
9,179,115744,AskPlayStation,180|178|179,115743|AskPlayStation|115744,True|False|True,2017-10-31 08:17:37+00:00|2017-10-31 22:14:49+...,3,True,True,True,True,Neutral,Neutral,"@AskPlayStation So, what's the november ps plu...",@AskPlayStation Can I get help already??


In [96]:
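senti_improve.set_index('tweet_id', inplace = True)
# Placeholder columns for the computed values; floats so that averages such as 2.5 fit the dtype.
senti_improve["first_senti_avg"] = 0.0
senti_improve["last_senti_avg"] = 0.0
senti_improve["senti_improve"] = 0.0
senti_improve["year_month"] = ""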

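The function average_senti below averages the sentiment over the sentences of a tweet. A tweet may contain more than one sentence, and therefore more than one sentiment label, with the labels separated by "|". Each label is converted to its position in senti_value_order (0 = Very negative, 4 = Very positive) before averaging. If the input is not a string (for example, a missing value), the function returns 2 = Neutral.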

In [97]:
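# Ordinal scale: the position of a label in this list is its numeric score (0 to 4).
senti_value_order = ["Very negative", "Negative", "Neutral", "Positive", "Very positive"]

def average_senti(senti_list_text):
    # Non-string input (e.g. NaN for a missing value) is treated as Neutral.
    if not isinstance(senti_list_text, str):
        return 2
    # One label per sentence, separated by "|".
    senti_list = senti_list_text.split("|")
    senti_totalizer = 0
    for sentiment in senti_list:
        senti_totalizer += senti_value_order.index(sentiment)
    return senti_totalizer / len(senti_list)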
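
As a quick check of the averaging rule described above, the following is a minimal, self-contained sketch. The name average_senti_example is hypothetical and is used only so that the block runs on its own; it is meant to mirror average_senti, not to replace it.

```python
# Ordinal scale as in the notebook: list position is the numeric score.
SENTI_VALUE_ORDER = ["Very negative", "Negative", "Neutral", "Positive", "Very positive"]

def average_senti_example(senti_list_text):
    """Average the scores of a '|'-separated label string; 2 (Neutral) if not a string."""
    if not isinstance(senti_list_text, str):
        return 2
    labels = senti_list_text.split("|")
    return sum(SENTI_VALUE_ORDER.index(label) for label in labels) / len(labels)

print(average_senti_example("Negative"))          # 1.0
print(average_senti_example("Neutral|Positive"))  # 2.5
print(average_senti_example(float("nan")))        # 2
```

A label outside the five listed values (including an empty string) raises ValueError in this sketch, since list.index does not find it.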

In [98]:
# Column positions of the result columns and of the source columns, for use with .iloc.
first_avg_col, last_avg_col, improve_col, year_month_col = senti_improve.columns.get_indexer(["first_senti_avg", "last_senti_avg", "senti_improve", "year_month"])
first_sentiment_col, last_sentiment_col, timestamp_col = senti_improve.columns.get_indexer(["first_sentim_l", "last_sentim_l", "time_l"])

for row in range(len(senti_improve)):
    first_sentiment, last_sentiment, timestamp_list = senti_improve.iloc[row, [first_sentiment_col, last_sentiment_col, timestamp_col]]
    first_average = average_senti(first_sentiment)
    last_average = average_senti(last_sentiment)
    # Positive values mean the customer's last tweet scored higher than the first.
    improvement = last_average - first_average
    # The first 7 characters of the first timestamp in time_l are "YYYY-MM".
    year_month = timestamp_list[0:7]
    senti_improve.iloc[row, [first_avg_col, last_avg_col, improve_col, year_month_col]] = [first_average, last_average, improvement, year_month]
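
The row-by-row loop above could probably also be written column-wise. The sketch below is one possible equivalent, not the notebook's own method. It runs on a small made-up DataFrame (toy) with invented timestamps, and the helper mean_score and the dictionary SCORES are hypothetical names introduced for illustration.

```python
import pandas as pd

# Same ordinal scale as senti_value_order, expressed as a lookup table (assumption).
SCORES = {"Very negative": 0, "Negative": 1, "Neutral": 2, "Positive": 3, "Very positive": 4}

def mean_score(labels):
    """Mean score of a '|'-separated label string; 2.0 (Neutral) for non-strings."""
    if not isinstance(labels, str):
        return 2.0
    parts = labels.split("|")
    return sum(SCORES[p] for p in parts) / len(parts)

# Made-up rows that only imitate the layout of the real file.
toy = pd.DataFrame({
    "first_sentim_l": ["Negative", "Neutral", "Negative"],
    "last_sentim_l": ["Neutral", "Neutral|Positive", "Very positive"],
    "time_l": ["2017-10-01 10:00:00+00:00|2017-10-01 10:05:00+00:00",
               "2017-10-02 11:00:00+00:00|2017-10-02 11:30:00+00:00",
               "2017-11-03 12:00:00+00:00|2017-11-03 12:15:00+00:00"],
})

toy["first_senti_avg"] = toy["first_sentim_l"].map(mean_score)
toy["last_senti_avg"] = toy["last_sentim_l"].map(mean_score)
toy["senti_improve"] = toy["last_senti_avg"] - toy["first_senti_avg"]
toy["year_month"] = toy["time_l"].str[:7]  # "YYYY-MM" of the first timestamp

print(toy[["first_senti_avg", "last_senti_avg", "senti_improve", "year_month"]])
```

With this approach the placeholder columns and the positional indexers are not needed, because each column is created in a single assignment.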

In [99]:
# Per-company means; "size" on verify_thread gives the number of threads in each group.
companies = senti_improve.groupby("company_name")
output = companies.agg({"length":"mean", "first_senti_avg":"mean", "last_senti_avg":"mean", "senti_improve":"mean", "verify_thread":"size"})

In [100]:
senti_improve.head()

tweet_id,author_id,company_name,tweet_l,author_l,inbound_l,time_l,length,verify_thread,verify_time,verify_alternance,inbound_first,first_sentim_l,last_sentim_l,first_tweet_text,last_tweet_text,first_senti_avg,last_senti_avg,senti_improve,year_month
2,115712,sprintcare,8|6|5|4|3|1|2,115712|sprintcare|115712|sprintcare|115712|spr...,True|False|True|False|True|False|True,2017-10-31 21:45:10+00:00|2017-10-31 21:46:24+...,7,True,True,True,True,Negative,Neutral,@sprintcare is the worst customer service,@sprintcare and how do you propose we do that,1.0,2.0,1.0,2017-10
11,sprintcare,sprintcare,18|17|16|15|12|11,115713|sprintcare|115713|sprintcare|115713|spr...,True|False|True|False|True|False,2017-10-31 19:56:01+00:00|2017-10-31 19:59:13+...,6,True,True,True,True,Neutral|Neutral|Neutral,Neutral|Neutral,y’all lie about your “great” connection. 5 ba...,@sprintcare You gonna magically change your co...,2.0,2.0,0.0,2017-10
27,Ask_Spectrum,Ask_Spectrum,29|28|24|21|22|25|26|27,115716|Ask_Spectrum|115716|Ask_Spectrum|115716...,True|False|True|False|True|False|True|False,2017-10-31 22:01:35+00:00|2017-10-31 22:05:37+...,8,True,True,True,True,Negative,Neutral,actually that's a broken link you sent me and ...,@Ask_Spectrum I received this from your corpor...,1.0,2.0,1.0,2017-10
23,115716,Ask_Spectrum,29|28|24|21|23,115716|Ask_Spectrum|115716|Ask_Spectrum|115716,True|False|True|False|True,2017-10-31 22:01:35+00:00|2017-10-31 22:05:37+...,5,True,True,True,True,Negative,Negative,actually that's a broken link you sent me and ...,@Ask_Spectrum The correct way to do it is via ...,1.0,1.0,0.0,2017-10
37,VerizonSupport,VerizonSupport,36|34|35|37,115719|VerizonSupport|115719|VerizonSupport,True|False|True|False,2017-10-31 22:10:46+00:00|2017-10-31 22:13:33+...,4,True,True,True,True,Negative,Very positive,somebody from @VerizonSupport please help meee...,@VerizonSupport I finally got someone that hel...,1.0,4.0,3.0,2017-10


In [101]:
senti_improve.to_csv(r'senti_improve_company_yearmonth.csv')  

In [102]:
output.to_csv(r'summary.csv')

In [104]:
# Same summary as before, now per company and month; this overwrites the earlier output.
companies_months = senti_improve.groupby(["company_name","year_month"])
output = companies_months.agg({"length":"mean", "first_senti_avg":"mean", "last_senti_avg":"mean", "senti_improve":"mean", "verify_thread":"size"})
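
In the summary above, the thread count ends up in a column named verify_thread, which can be confusing when reading the CSV. One possible alternative is pandas named aggregation, sketched here on a small made-up DataFrame. The output column names mean_length, mean_improve and n_threads are hypothetical choices, not names used elsewhere in this notebook.

```python
import pandas as pd

# Made-up rows that only imitate the columns used in the summary.
toy = pd.DataFrame({
    "company_name": ["sprintcare", "sprintcare", "VerizonSupport"],
    "year_month": ["2017-10", "2017-10", "2017-10"],
    "length": [7, 6, 4],
    "senti_improve": [1.0, 0.0, 3.0],
})

# Named aggregation: output_name=(source_column, aggregation).
summary = toy.groupby(["company_name", "year_month"]).agg(
    mean_length=("length", "mean"),
    mean_improve=("senti_improve", "mean"),
    n_threads=("senti_improve", "size"),
)

print(summary)
```

Here the sprintcare group for 2017-10 has two threads with a mean improvement of 0.5, and the VerizonSupport group has one thread with an improvement of 3.0.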

In [106]:
output.to_csv(r'summary month.csv')