## Sentiment Scoring

This chapter walks through **how we score the sentiment of each weibo**. It has three parts: Words Dictionary; Class of City Weibo Sentiment Scoring: City_Weibo; and Weibo Sentiment Scoring.

The first two parts prepare for the scoring process; the third part generates the sentiment scores for each city.

In [1]:
%matplotlib inline
import graphviz
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as col
import sys
import jieba
np.set_printoptions(suppress=True, precision = 3)

# Part 1. Words Dictionary 

This part builds the word dictionaries used in the later sentiment scoring. There are four of them:
**level (adverbs of degree), positive sentiments, negative sentiments, and deny (negation) words**

All word dictionaries come from National Taiwan University, Tsinghua University, and HowNet (知网).

National Taiwan University (NTUSD): "NTUSD_negative_simplified.txt" and "NTUSD_positive_simplified.txt".

Tsinghua University: "tsinghua.negative.gb.txt" and "tsinghua.positive.gb.txt".

HowNet (知网): "程度级别词语（中文）.txt", "否定词（中文）.txt", "负面评价词语（中文）.txt", "负面情感词语（中文）.txt", "正面评价词语（中文）.txt", and "正面情感词语（中文）.txt".

## Part 1.1 Input Level words dictionary (six levels)

Step 1: Read the degree-adverb file into a list of lines, called **level_listlines**.

Step 2: The file contains 6 levels. Record the index of the header line that starts each level, called **level_cate_index**.

Step 3: Record the Chinese name of each level, called **level_cate**; these names become the keys of the level dictionary (**dict_level**).

Step 4: Use the index range of each level to collect all the words of that level, and store the list as the value of the corresponding key in the level dictionary.

In [2]:
with open("程度级别词语（中文）.txt", "r", encoding='gb18030') as fo_Level:
    level_listlines = fo_Level.readlines()  # Step 1: read the degree-adverb file as a list of lines

# Step 2: a line whose first character is a digit 1-6 is the header of a level; record its index
level_cate_index = [idx for idx, line in enumerate(level_listlines)
                    if line[:1] in ('1', '2', '3', '4', '5', '6')]

# Step 3: the fifth character of each header line is the Chinese name of the level, used as the dictionary key
level_cate = [level_listlines[i][4] for i in level_cate_index]

# Step 4: the words of a level are the lines between its header and the next header (or the end of the file).
# The inner [:-1] drops the newline of each line; the outer [:-1] drops the last line of each block.
dict_level = {}
block_ends = level_cate_index[1:] + [len(level_listlines)]
for key, start, end in zip(level_cate, level_cate_index, block_ends):
    dict_level[key] = [level_listlines[j][:-1] for j in range(start + 1, end)][:-1]
dict_level  # level words dictionary

{'极': ['百分之百',
  '倍加',
  '备至',
  '不得了',
  '不堪',
  '不可开交',
  '不亦乐乎',
  '不折不扣',
  '彻头彻尾',
  '充分',
  '到头',
  '地地道道',
  '非常',
  '极',
  '极度',
  '极端',
  '极其',
  '极为',
  '截然',
  '尽',
  '惊人地',
  '绝',
  '绝顶',
  '绝对',
  '绝对化',
  '刻骨',
  '酷',
  '满',
  '满贯',
  '满心',
  '莫大',
  '奇',
  '入骨',
  '甚为',
  '十二分',
  '十分',
  '十足',
  '死',
  '滔天',
  '痛',
  '透',
  '完全',
  '完完全全',
  '万',
  '万般',
  '万分',
  '万万',
  '无比',
  '无度',
  '无可估量',
  '无以复加',
  '无以伦比',
  '要命',
  '要死',
  '已极',
  '已甚',
  '异常',
  '逾常',
  '贼',
  '之极',
  '之至',
  '至极',
  '卓绝',
  '最为',
  '佼佼',
  '郅',
  '綦',
  '齁',
  '最'],
 '很': ['不过',
  '不少',
  '不胜',
  '惨',
  '沉',
  '沉沉',
  '出奇',
  '大为',
  '多',
  '多多',
  '多加',
  '多么',
  '分外',
  '格外',
  '够瞧的',
  '够戗',
  '好',
  '好不',
  '何等',
  '很',
  '很是',
  '坏',
  '可',
  '老',
  '老大',
  '良',
  '颇',
  '颇为',
  '甚',
  '实在',
  '太',
  '太甚',
  '特',
  '特别',
  '尤',
  '尤其',
  '尤为',
  '尤以',
  '远',
  '着实',
  '曷',
  '碜'],
 '较': ['大不了',
  '多',
  '更',
  '更加',
  '更进一步',
  '更为',
  '还',
  '还要',
  '较',
  '较比',
  '较为',
  '进一步',
  '那般'

## Part 1.2 Input sentiment words dictionary

Define the function **Union_dict** to merge all the positive (or negative) sentiment dictionaries we have, which enlarges the vocabulary of our word dictionary.

Step 1: Read each dictionary file and append its lines to a list, called **li_fo_li**.

Step 2: Take the union of all dictionaries.

Step 3: Return the merged dictionary as a list, called **sent_list**.

In [3]:
def Union_dict(li_file_name, li_encode):
    li_fo_li = []
    for i in range(len(li_file_name)):  # Step 1: read each dictionary file
        with open(li_file_name[i] + ".txt", "r", encoding=li_encode[i]) as fo_i:
            fo_i_listlines = fo_i.readlines()
        li_fo_li.append(fo_i_listlines)

    # Clean each file and convert it to a set:
    # files 1 and 2 lose their first two lines and the last two characters of every line,
    # files 3 and 4 lose only the trailing newline of every line.
    sent1_set = set([word[:-2] for word in li_fo_li[0][2:]])
    sent2_set = set([word[:-2] for word in li_fo_li[1][2:]])
    sent3_set = set([word[:-1] for word in li_fo_li[2]])
    sent4_set = set([word[:-1] for word in li_fo_li[3]])

    # Step 2: union of all dictionaries, without the empty string
    sent_set = sent1_set | sent2_set | sent3_set | sent4_set
    sent_set = sent_set - {''}
    sent_list = list(sent_set)  # Step 3: return the merged dictionary as a list
    return sent_list
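
**Union_dict** expects exactly four files and relies on fixed slice offsets. A more general variant can be sketched as follows, assuming every entry sits on its own line and that surrounding whitespace can be discarded. The names `union_word_lists` and `skip_header_lines` are hypothetical and not part of the notebook; the function takes lists of lines instead of file names so that it runs without the dictionary files.

```python
def union_word_lists(line_lists, skip_header_lines=None):
    """Merge several lists of raw lines into one de-duplicated word list.

    line_lists: list of lists of strings (for example, results of readlines()).
    skip_header_lines: number of leading lines to drop from each list
        (hypothetical parameter; defaults to 0 for every list).
    """
    if skip_header_lines is None:
        skip_header_lines = [0] * len(line_lists)
    words = set()
    for lines, skip in zip(line_lists, skip_header_lines):
        # strip() removes the newline and any stray spaces in one step
        words |= {line.strip() for line in lines[skip:]}
    words.discard("")  # drop blank lines
    return sorted(words)  # sorted only to make the result reproducible


# Made-up sample lines, not taken from the real dictionary files
sample_a = ["header\n", "count\n", "好 \n", "棒 \n"]
sample_b = ["好\n", "\n", "优秀\n"]
merged = union_word_lists([sample_a, sample_b], skip_header_lines=[2, 0])
print(merged)
```

Using `strip()` instead of `[:-1]` or `[:-2]` also avoids cutting a real character when the last line of a file has no trailing newline.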

### Part 1.2.1 Negative words dictionary

In [4]:
li_neg_name = ["负面评价词语（中文）","负面情感词语（中文）","NTUSD_negative_simplified","tsinghua.negative.gb"]
li_neg_encode = ['gb18030','gb18030','utf-8','gb18030']

neg_list = Union_dict(li_neg_name,li_neg_encode)
neg_list

['交战',
 '不快',
 '不亲切',
 '自惭形秽',
 '没觉察到',
 '变黑',
 '白区',
 '被抑制状态',
 '拘执',
 '桀纣',
 '暗地',
 '代罪羔羊',
 '愧对',
 '丢丑的人',
 '惶惧',
 '恶念',
 '贱男人',
 '落泪',
 '逆',
 '阴湿',
 '爱挑剔',
 '不甘',
 '沉吟',
 '极度悲哀',
 '无动於衷',
 '私底下',
 '虚设',
 '别有用心',
 '刮掉',
 '三天打鱼两天晒网',
 '诡谲',
 '吝啬鬼',
 '溃不成军',
 '挖掘地基',
 '苟延残喘',
 '真假可疑的',
 '有差别地对待',
 '垂涎',
 '惹恼',
 '落後',
 '害虫',
 '重现',
 '世故',
 '贪婪的',
 '从中作梗',
 '陋规',
 '憎恨',
 '卖弄风骚的人',
 '鲁莽的',
 '眩惑',
 '舛',
 '嘲弄',
 '黑茫茫',
 '风浪',
 '精神错乱',
 '闷沉沉',
 '荒凉的',
 '无计划',
 '搁下',
 '二流子',
 '消耗性',
 '无表情',
 '极端',
 '撬动',
 '恶运',
 '自求多福',
 '昏迷',
 '负面',
 '愁肠',
 '亵渎',
 '机八',
 '占着',
 '暴徒',
 '耗尽的',
 '歪歪扭扭',
 '指鸡骂狗',
 '打偏',
 '不习惯的',
 '低声',
 '粗糙',
 '疯狂地',
 '妄言',
 '作秀',
 '嘴皮子',
 '无深虑的',
 '找碴',
 '气壮如牛',
 '懔',
 '痛心',
 '幺麽',
 '冤仇',
 '大杂烩',
 '庸碌',
 '免职',
 '饥饿的',
 '乖僻',
 '犹豫',
 '衰颓',
 '抱恨终天',
 '弱智',
 '噘起',
 '失色',
 '基卖',
 '求饶',
 '骗人的',
 '凶悍',
 '拒绝承认',
 '骂出来',
 '浪费时间',
 '荒芜',
 '混事',
 '夹痛',
 '生疑',
 '撕破脸',
 '窘困',
 '自我中心',
 '谎报',
 '眼泪',
 '腥臊',
 '发出撞击声',
 '夷平',
 '惯犯',
 '纷杂',
 '逃走的人',
 '艰深',
 '掴..耳光',
 '伪',
 '岌岌不可终日',
 '一贫如洗',
 

In [5]:
len(neg_list)

13765

There are **13765** negative sentiment words in total in our negative words dictionary (**neg_list**).

### Part 1.2.2 Positive words dictionary

In [6]:
li_pos_name = ["正面评价词语（中文）","正面情感词语（中文）","NTUSD_positive_simplified","tsinghua.positive.gb"]
li_pos_encode = ['gb18030','gb18030','utf-8','gb18030']

pos_list = Union_dict(li_pos_name,li_pos_encode)
pos_list

['广博',
 '料事如神',
 '熟识',
 '爽直',
 '硬棒',
 '决断',
 '胆力',
 '义士',
 '准确',
 '报答的',
 '春潮',
 '有能力的',
 '盯紧',
 '美化',
 '有用的',
 '全盛',
 '康强',
 '雅致',
 '一致的',
 '洞明',
 '欢欣鼓舞',
 '如花似玉',
 '孝子',
 '同心协力',
 '侠气',
 '垂涎',
 '大雅',
 '发展',
 '奖牌',
 '乖',
 '纪念品',
 '够格',
 '太好',
 '崭新的',
 '口快',
 '讽诵',
 '扶弱抑强',
 '直截了当',
 '明了',
 '酣畅',
 '纪念',
 '膏腴',
 '自拔',
 '绝妙',
 '敞开儿',
 '叹',
 '教坛',
 '创演',
 '佳期',
 '精采',
 '快马加鞭',
 '典型性',
 '赤色',
 '载歌载舞',
 '叮咛',
 '想望',
 '灿烂夺目',
 '刚勇',
 '前驱',
 '有望',
 '秀巧',
 '翕',
 '懔',
 '有能力',
 '察访',
 '学成',
 '嫩绿',
 '鼎助',
 '才气横溢',
 '勇敢的事迹',
 '大公无私',
 '合法的',
 '至尊',
 '精进',
 '盈溢',
 '欢庆胜利',
 '精力充沛的',
 '宁为玉碎',
 '凶悍',
 '慎',
 '自由自在',
 '匡助',
 '动人心弦',
 '好奇的',
 '发现',
 '婉媚',
 '饶有余韵',
 '深',
 '神勇',
 '抚养',
 '沃土',
 '超凡技术',
 '服服帖帖',
 '慈悲为怀',
 '亲热亲热',
 '送往迎来',
 '满勤',
 '劲草',
 '康平',
 '楷式',
 '实则',
 '豁免',
 '春色满园',
 '伟岸',
 '凤凰',
 '豪华',
 '免除的',
 '浩大',
 '有板有眼',
 '善理家',
 '同仇敌忾',
 '强劲',
 '心血',
 '进益',
 '戏剧化',
 '高峰',
 '屏声息气',
 '拓展',
 '执著',
 '诗兴',
 '报效',
 '璀璨夺目',
 '瑞雪',
 '巧扮',
 '恒心',
 '筹办',
 '意境',
 '发育',
 '才略过人',
 '秀外惠中',
 '咏赞',
 '时尚',
 '雅趣'

In [7]:
len(pos_list)

10219

There are **10219** positive sentiment words in total in our positive words dictionary (**pos_list**).
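
Some words appear in both displayed outputs, for example '垂涎', '懔' and '凶悍'. Because **sentence_scoring** (Part 2) tests `pos_list` before `neg_list` in an `if ... elif` chain, such a word is scored as positive. A quick way to list the overlap is a set intersection; the sketch below uses small excerpts copied from the two outputs above in place of the full lists.

```python
# Excerpts from the displayed outputs; in the notebook these would be
# the full neg_list and pos_list.
neg_sample = ["交战", "不快", "垂涎", "懔", "凶悍", "痛心"]
pos_sample = ["广博", "准确", "垂涎", "懔", "凶悍", "雅致"]

# Words that occur in both dictionaries
overlap = sorted(set(pos_sample) & set(neg_sample))
print(len(overlap), overlap)
```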

## Part 1.3 Input Deny words dictionary

In [8]:
with open("否定词（中文）.txt","r+",encoding = 'utf-8') as fo_inv:
    inv_listlines = fo_inv.readlines()
inv_list = [word[:-1] for word in inv_listlines]
inv_list

['不',
 '不是',
 '没',
 '没有',
 '无',
 '非',
 '莫 ',
 '弗',
 '毋',
 '未',
 '否',
 '别',
 '无',
 '不够',
 '不是',
 '不曾',
 '未必',
 '不要',
 '难以',
 '未']
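
The output above contains an entry with a trailing space ('莫 ') and several duplicates ('无', '不是', '未'). Duplicates are harmless for membership tests, but an entry with a trailing space is unlikely to match a segmented token. A possible clean-up, shown here on made-up sample lines and not part of the original notebook, strips whitespace and removes duplicates while keeping the original order:

```python
# Made-up sample lines imitating the file layout; the last line has no newline
raw_lines = ["不\n", "不是\n", "莫 \n", "无\n", "无\n", "不是\n", "未\n", "未"]

# strip() removes newlines and trailing spaces; dict.fromkeys keeps the
# first occurrence of each word and preserves order
cleaned = list(dict.fromkeys(line.strip() for line in raw_lines if line.strip()))
print(cleaned)  # ['不', '不是', '莫', '无', '未']
```

Note that `word[:-1]` would cut off the last character of a final line that has no newline, whereas `strip()` leaves it intact.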

# Part 2. Class of City Weibo Sentiment Scoring: City_Weibo

This part is also preparation: it defines a class dedicated to sentiment scoring, called **City_Weibo**.

An object of this class represents one city together with its weibo data. It also provides methods that perform the text analysis, turning weibo text into sentiment scores. Details are given below:

### Part 2.1 City_Weibo Attributes:
Each object of this class has 5 attributes.

### _"Private"  Attributes:_

**Attribute 1, dict_level**: The level dictionary (adverbs of degree), as built in Part 1

**Attribute 2, neg_list**: The negative sentiment dictionary, as built in Part 1

**Attribute 3, pos_list**: The positive sentiment dictionary, as built in Part 1

**Attribute 4, inv_list**: The deny words dictionary, as built in Part 1

### _"Public" Attributes:_

**Attribute 5, df_City_weibo**: The pandas DataFrame that contains all weibo data from Jan 1st to Mar 25th, read from Excel (crawled in the previous documentation)

### Part 2.2 City_Weibo Methods:
Besides the constructor, each object of this class has 7 methods.

### _"Private" Methods:_

**__init__**: Constructor; initializes the 5 attributes of the object.

**Method 1, judgeodd**: Reports whether the number of deny words found between the previous sentiment word and the current sentiment word is odd or even. We assume that an odd number of deny words inverts the sentiment (positive to negative, or negative to positive), while an even number leaves the original sentiment unchanged. **judgeodd** is used inside **sentence_scoring**.

**Method 2, sentence_scoring**: This method is the **core of the sentiment analysis**. It takes one sentence (one weibo) as input and returns both the positive and the negative sentiment score of that sentence. The algorithm **iterates** over the words of the sentence; the details are given in the code comments below.

**Method 3, label_score**: Given the positive score and negative score of one sentence, this method returns the sentiment score and sentiment label of that sentence (weibo), where sentiment score = positive score - negative score, and sentiment label = 1 if sentiment score > 0, -1 if sentiment score < 0, and 0 otherwise.
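
To make the scoring rule concrete, here is a minimal standalone sketch of the logic described for Methods 1 to 3. It works on an already tokenized sentence, so it does not need jieba. The function name `score_tokens`, the toy word sets and the example sentence are illustrative assumptions; only the level weights (4, 3, 2, 0.5, -2, -4) and the odd/even rule come from this notebook.

```python
# Level weights in the same order as they are tested in sentence_scoring
LEVEL_WEIGHTS = {"极": 4, "很": 3, "较": 2, "稍": 0.5, "欠": -2, "超": -4}


def score_tokens(tokens, pos_words, neg_words, level_words, negations):
    """Return [positive_score, negative_score] for a tokenized sentence."""
    totals = {"pos": 0, "neg": 0}
    last = 0  # index just after the previous sentiment word
    for i, word in enumerate(tokens):
        if word in pos_words:  # positive list is checked first
            kind = "pos"
        elif word in neg_words:
            kind = "neg"
        else:
            continue
        score = 1
        n_neg = 0
        for w in tokens[last:i]:  # modifiers since the previous sentiment word
            for level, weight in LEVEL_WEIGHTS.items():
                if w in level_words.get(level, ()):
                    score *= weight
                    break
            else:  # not a degree adverb: check whether it is a deny word
                if w in negations:
                    n_neg += 1
        if n_neg % 2 == 1:  # an odd number of deny words flips the sign
            score = -score
        totals[kind] += score
        last = i + 1
    return [totals["pos"], totals["neg"]]


def label_score(score):
    """Return [sentiment score, sentiment label] from [positive, negative]."""
    diff = score[0] - score[1]
    return [diff, (diff > 0) - (diff < 0)]


# Toy dictionaries and sentence, made up for illustration
tokens = ["我", "非常", "开心", "但", "不", "难过"]
scores = score_tokens(tokens, {"开心"}, {"难过"}, {"极": ["非常"]}, {"不"})
print(scores, label_score(scores))  # [4, -1] [5, 1]
```

In this toy sentence, "非常" multiplies the positive word by 4, and the single deny word "不" flips the negative word to -1. A negated negative word therefore lowers the negative score instead of raising the positive one, which is why negative values can appear in the Negative Score column.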



### _"Public" Methods:_
**Method 4, showDF**: Returns **df_City_weibo**.

**Method 5, data_clean**: This method is called first after the constructor and cleans the attribute **df_City_weibo**. It removes duplicated weibo (some posts appear twice because the first copy is the unexpanded version, marked "未展开") and sets the date as the index.

**Method 6, output_weibo_score_summary**: This method returns a pandas DataFrame for the city object with one row per weibo, containing its text, positive score, negative score, sentiment score, and sentiment label. To build it, the method calls **sentence_scoring** and **label_score** on every weibo of that city in a loop.

**Method 7, output_daily_score_summary**: This method takes the output of **output_weibo_score_summary** as input and computes a daily sentiment score. Each day has many weibo, and each weibo has a sentiment label (computed by Method 6); the method takes the mean of all sentiment labels of that day (**groupby** Date) and uses this mean as the daily sentiment score of the city. The daily sentiment score therefore lies between -1 and 1 (closer to -1 means more negative, closer to 1 means more positive). Each day is then labeled in the same way as before, according to whether its sentiment score is positive, negative, or zero.
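
Since every label is -1, 0 or 1, the daily mean equals (number of positive weibo - number of negative weibo) / (number of weibo that day). The following dependency-free sketch works through that arithmetic on made-up labels; the notebook itself uses pandas `groupby` for the same step.

```python
from collections import defaultdict

# (date, sentiment label) pairs; made-up values for illustration
labels = [
    ("2022-01-01", 1), ("2022-01-01", 1), ("2022-01-01", -1), ("2022-01-01", 0),
    ("2022-01-02", -1), ("2022-01-02", -1), ("2022-01-02", 1),
]

# Collect the labels of each day
by_day = defaultdict(list)
for date, label in labels:
    by_day[date].append(label)

daily = {}
for date, day_labels in by_day.items():
    mean_label = sum(day_labels) / len(day_labels)  # always within [-1, 1]
    daily_label = (mean_label > 0) - (mean_label < 0)  # sign of the mean
    daily[date] = (mean_label, daily_label)

print(daily)
```

For the first day the score is (2 - 1) / 4 = 0.25 with label 1; for the second day it is (1 - 2) / 3, about -0.33, with label -1.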


In [9]:
class City_Weibo:
    
    def __init__(self, excel_name, li_words_dictionary): # Constructor: initialize the 5 attributes
        self.dict_level = li_words_dictionary[0]
        self.neg_list = li_words_dictionary[1]
        self.pos_list = li_words_dictionary[2]
        self.inv_list = li_words_dictionary[3]
        
        self.df_City_weibo = pd.read_excel(excel_name)
    
    def data_clean(self): # Remove duplicated weibo (the first copy is the unexpanded version, marked "未展开")
        for i in range(self.df_City_weibo.shape[0]):
            weibo_day = self.df_City_weibo.loc[i].dropna()
            for j in range(len(weibo_day)-1):
                # Duplicates only occur next to each other: the first copy is unexpanded and the
                # second one is the full text we need. Two adjacent weibo whose first five
                # characters match are treated as the same weibo.
                if weibo_day[j][:5] == weibo_day[j+1][:5]:
                    weibo_day[j] = np.nan # so the earlier, unexpanded copy is set to NaN
                self.df_City_weibo.loc[i] = weibo_day
        self.df_City_weibo = self.df_City_weibo.set_index("date")
    
    def showDF(self):
        return(self.df_City_weibo)
    
    def judgeodd(self,num): # Judge whether num is odd or even
        if (num % 2) == 0:
            return 'even'
        else:
            return 'odd'
    
    def sentence_scoring(self,sentence):
        li_word = jieba.lcut(sentence, cut_all = False)
        i = 0 # index of the word currently being visited
        a = 0 # index just after the most recent sentiment word found
        poscount = 0 # score of the current positive word
        poscount2 = 0 # inverted score of the current positive word
        poscount3 = 0 # positive score of the whole sentence
        negcount = 0 # score of the current negative word
        negcount2 = 0 # inverted score of the current negative word
        negcount3 = 0 # negative score of the whole sentence
        for word in li_word:
            # Only sentiment words trigger scoring

            if word in self.pos_list: # is it a positive sentiment word?
                poscount += 1
                c = 0 # number of deny words seen
            
                for w in li_word[a:i]: # Why a to i: a marks the position after the previous sentiment word and i is the
                                       # index of the current one, so the degree adverbs and deny words in between
                                       # are taken to modify the current sentiment word.
                    # the weight of each adverb level is given in the report
                    if w in self.dict_level['极']:
                        poscount *= 4
                    elif w in self.dict_level['很']:
                        poscount *= 3
                    elif w in self.dict_level['较']:
                        poscount *= 2
                    elif w in self.dict_level['稍']:
                        poscount *= 0.5
                    elif w in self.dict_level['欠']:
                        poscount *= -2
                    elif w in self.dict_level['超']:
                        poscount *= -4
                    elif w in self.inv_list:
                        c += 1
                if self.judgeodd(c) == 'odd': # is the number of deny words odd or even?
                    poscount *= -1.0 # odd: the positive meaning is inverted to negative, so multiply by -1
                    poscount2 += poscount # poscount2 holds the inverted score
                    poscount = 0 # poscount is the score of the current positive word only; it has been passed
                                 # to poscount2, so reset it to 0 for the next positive word.
                    poscount3 = poscount + poscount2 + poscount3 # add it to the cumulative positive score of the sentence

                    poscount2 = 0 # poscount2 also belongs to one word only; it has been used, so reset it to 0 as well.
                else:
                    # even: the word stays positive
                    poscount3 = poscount + poscount2 + poscount3 # add it directly to the cumulative positive score of the sentence

                    poscount = 0 # poscount is the score of the current positive word only; it has been used,
                                # so reset it to 0 for the next positive word.
                a = i+1 # Python slices are closed on the left and open on the right; to avoid scanning the
                        # previous sentiment word again, move the start index one step past it.
                
            # the procedure for negative sentiment words is exactly the same as for positive sentiment words.
            elif word in self.neg_list: # is it a negative sentiment word?
                negcount += 1
                d = 0 # number of deny words seen
                for w in li_word[a:i]: # Why a to i: a marks the position after the previous sentiment word and i is the
                                       # index of the current one, so the adverbs in between are taken to modify
                                       # the current sentiment word.
                    if w in self.dict_level['极']:
                        negcount *= 4
                    elif w in self.dict_level['很']:
                        negcount *= 3
                    elif w in self.dict_level['较']:
                        negcount *= 2
                    elif w in self.dict_level['稍']:
                        negcount *= 0.5
                    elif w in self.dict_level['欠']:
                        negcount *= -2
                    elif w in self.dict_level['超']:
                        negcount *= -4
                    elif w in self.inv_list:
                        d += 1
                if self.judgeodd(d) == 'odd': # is the number of deny words odd or even?
                    negcount *= -1.0 # odd: the negative meaning is inverted to positive, so multiply by -1
                    negcount2 += negcount # negcount2 holds the inverted score
                    negcount = 0 # negcount is the score of the current negative word only; it has been passed
                                 # to negcount2, so reset it to 0 for the next negative word.
                    negcount3 = negcount + negcount2 + negcount3 # add it to the cumulative negative score of the sentence
                    negcount2 = 0 # negcount2 also belongs to one word only; it has been used, so reset it to 0 as well.
                else:
                    negcount3 = negcount + negcount2 + negcount3 # add it directly to the cumulative negative score of the sentence
                    negcount = 0 # negcount is the score of the current negative word only; it has been used,
                                # so reset it to 0 for the next negative word.
                a = i+1 # Python slices are closed on the left and open on the right; to avoid scanning the
                        # previous sentiment word again, move the start index one step past it.
            i += 1 # whatever kind of word this is, advance the index of the current word by 1
                   # before moving on to the next word.

        count1=[poscount3,negcount3] # return the cumulative positive score and negative score of this sentence (weibo)
        return count1 
    
    def label_score(self,score):
        diff_score = score[0]-score[1] # sentiment score = positive score - negative score
        if diff_score > 0: # if sentiment score > 0, sentiment label = 1
            lab = 1 
        elif diff_score < 0: # if sentiment score < 0, sentiment label = -1
            lab = -1
        else: # otherwise sentiment label = 0
            lab = 0 
        return [diff_score ,lab]
    
    def output_weibo_score_summary(self):
        # initialize the dictionary and its values (6 lists),
        # one per key: Date, Weibo text, positive score, negative score, sentiment score and sentiment label
        dict_City = {}
        dict_City["Date"] = []
        dict_City["Weibo"] = []
        dict_City["Positive Score"] = []
        dict_City["Negative Score"] = []
        dict_City["Sentiment Score"] = []
        dict_City["Sentiment Label"] = []

        for i in range(self.df_City_weibo.shape[0]): # i-th row, i.e. the i-th day
            dayweibo_i = self.df_City_weibo.iloc[i].dropna()
            date_i = list(self.df_City_weibo.index)[i]

            for j in range(dayweibo_i.shape[0]): # j-th column in the i-th row, i.e. the j-th weibo of the i-th day
                wb_i_j = dayweibo_i.iloc[j] # the j-th weibo of the i-th day
                wb_score_i_j = self.sentence_scoring(wb_i_j) # positive score and negative score of this weibo
                wb_senti_i_j = self.label_score(wb_score_i_j) # sentiment score and sentiment label of this weibo
                dict_City["Date"].append(date_i)
                dict_City["Weibo"].append(wb_i_j )
                dict_City["Positive Score"].append(wb_score_i_j[0])
                dict_City["Negative Score"].append(wb_score_i_j[1])
                dict_City["Sentiment Score"].append(wb_senti_i_j[0])
                dict_City["Sentiment Label"].append(wb_senti_i_j[1])

        df_City_senti = pd.DataFrame(dict_City).set_index("Date")
        #self.df_City_senti = df_City_senti
        return (df_City_senti)
    
    def output_daily_score_summary(self, df_City_weibo_senti):
        # Group by date and take the mean of the per-weibo sentiment labels as the daily sentiment score.
        # The label column is selected before mean() so that the text column is never averaged.
        df_City_daily_score = pd.DataFrame(df_City_weibo_senti.groupby("Date")["Sentiment Label"].mean())
        df_City_daily_score.columns = ["Sentiment Score"]
        # Daily label: 1.0 if the daily score is positive, -1.0 if it is negative, 0.0 if it is neutral
        df_City_daily_score["Sentiment Label"] = np.sign(df_City_daily_score["Sentiment Score"])
        #self.df_City_daily_score = df_City_daily_score
        return(df_City_daily_score)

# Part 3.  Weibo Sentiment Scoring

In this part we obtain DataFrames with the sentiment information for each weibo and for each day.

Step 1: Create one object per city from the City_Weibo class by passing in its weibo text data. The objects are called **Obj_BJ, Obj_SH, Obj_GZ, and Obj_SZ**.

Step 2: Call **data_clean** on each city object.

Step 3: Call **output_weibo_score_summary** on each city object to obtain the sentiment information for each weibo. We display these DataFrames and save them to Excel so that other group members can do further analysis.

Step 4: Call **output_daily_score_summary** on each city object to obtain the sentiment information for each day. We display these DataFrames and save them to Excel so that other group members can do further analysis.
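
Steps 2 to 4 are identical for every city, so they can also be written as one helper applied in a loop. The sketch below only illustrates that call order: `run_city` is a hypothetical helper, and `StubCity` is a stand-in for **City_Weibo** so that the example runs without the Excel files.

```python
def run_city(city):
    """Apply steps 2-4 to one city object and return both summaries."""
    city.data_clean()
    per_weibo = city.output_weibo_score_summary()
    per_day = city.output_daily_score_summary(per_weibo)
    return per_weibo, per_day


class StubCity:
    """Stand-in for City_Weibo that only records the order of the calls."""

    def __init__(self):
        self.calls = []

    def data_clean(self):
        self.calls.append("data_clean")

    def output_weibo_score_summary(self):
        self.calls.append("output_weibo_score_summary")
        return "per-weibo table"

    def output_daily_score_summary(self, per_weibo):
        self.calls.append("output_daily_score_summary")
        return "per-day table"


# With the real class, the values would be Obj_BJ, Obj_SH, Obj_GZ and Obj_SZ
cities = {"Beijing": StubCity(), "Shanghai": StubCity()}
results = {name: run_city(obj) for name, obj in cities.items()}
print(results["Beijing"])
```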

### Part 3.1 For each weibo

In [10]:
li_words_dict = [dict_level, neg_list,pos_list,inv_list]
Obj_BJ = City_Weibo("北京疫情-20220101-20220325.xls",li_words_dict)
Obj_SH = City_Weibo("上海疫情-20220101-20220325.xls",li_words_dict)
Obj_GZ = City_Weibo("广州疫情-20220101-20220325.xls",li_words_dict)
Obj_SZ = City_Weibo("深圳疫情-20220101-20220325.xls",li_words_dict)

In [11]:
Obj_BJ.data_clean()
df_BJ_weibo_Senti = Obj_BJ.output_weibo_score_summary()
df_BJ_weibo_Senti


Date,Weibo,Positive Score,Negative Score,Sentiment Score,Sentiment Label
2022-01-01,打卡了航旅纵横看到了去年的记录疫情当前但是去年确实飞的不少经历了国内最长的一次飞行北京到喀什...,7.0,1.0,6.0,1
2022-01-01,昆明富民发布今起宁波进京航班全部取消恢复时间另行通知据宁波机场消息受疫情影响年月日宁波飞往北...,12.0,4.0,8.0,1
2022-01-01,今日头秃计算了好久好久和同去澳门的商量了好久怎样能在核酸检测小时内保证落地澳门的时间也在内去...,9.0,6.5,2.5,1
2022-01-01,是娜小娜丫疫情赶快结束希望摘掉口罩去一趟北京平安喜乐一听就是一个很浪漫的年份对吧爱你爱你爱你...,14.0,1.0,13.0,1
2022-01-01,肝为人先我说疫情会结束这个不是希望疫情结束也不是觉得疫情会结束也不是认为疫情应该结束了是年疫...,9.0,11.0,-2.0,-1
...,...,...,...,...,...
2022-03-25,文明北京文明祭扫疫情防控弦紧绷要去祭扫先预约又逢一年清明至慎终追远祭先人北京市清明指挥部提醒...,27.0,4.0,23.0,1
2022-03-25,沙野轻食官方优秀的人难掩光芒优秀的店也如此呀疫情下的北京东直门店在疫情之下的东直门店也保持着...,4.0,4.0,0.0,0
2022-03-25,中科联中科联中国小康网北京疫情仍有新增月日北京疫情最新消息今天月日时至时无新增本土确诊病例疑...,8.0,13.0,-5.0,-1
2022-03-25,劳动午报午报关注疫情北京疫情北京一例感染者曾临时居住交道口北头条号已拉起警戒线月日东城区通报...,5.0,4.0,1.0,1


In [12]:
Obj_SH.data_clean()
df_SH_weibo_Senti = Obj_SH.output_weibo_score_summary()
df_SH_weibo_Senti

Date,Weibo,Positive Score,Negative Score,Sentiment Score,Sentiment Label
2022-01-01,大胖洁大事记月薅了疫情的羊毛去了上海苏州鼋头渚看了月认识了一个过了注会的学霸会受这个老师的影...,12.0,7.0,5.0,1
2022-01-01,快乐小狗耶嘿的遗憾好几次计划去上海的行程都被疫情拦下了因此错失了几部很想看的音乐剧暑假和年底...,35.0,6.0,29.0,1
2022-01-01,招财进宝小熊仔我觉得上海的魔都名号应该换给西安魔幻现实主义都市西安疫情以来两年了还能一夜梦回...,7.0,4.0,3.0,1
2022-01-01,亦素公子上海今年冬天真是太冷了再加上疫情衰衰只想懒懒地在室内吹空调探店干饭计划也搁置小伙伴们...,1.0,1.0,0.0,0
2022-01-01,车报导北京同步开通条段地铁新线总运营里程达公里年起北京全面禁放烟花爆竹大力打击非法存放销售等...,13.0,4.0,9.0,1
...,...,...,...,...,...
2022-03-25,遗落的叁行情书上海疫情今天有点累满负荷的工作了一天晚上七点多到家洗菜做饭吃饭洗碗等把厨房收拾...,37.0,12.0,25.0,1
2022-03-25,力禾力禾上海的疫情防控效果让人看不出一点精准和科学,3.5,1.0,2.5,1
2022-03-25,小红帽的后妈上海疫情被封天问居委会什么时候解封居委会说等上面通知我不知道上面是谁我也不知道为...,3.0,13.0,-10.0,-1
2022-03-25,财张江上海疫情封控下的居民朋友们是不是小区里有几朵花几棵树都快数遍啦好想再回到月初时在滨江森...,5.0,2.0,3.0,1


In [13]:
Obj_GZ.data_clean()
df_GZ_weibo_Senti = Obj_GZ.output_weibo_score_summary()
df_GZ_weibo_Senti

Date,Weibo,Positive Score,Negative Score,Sentiment Score,Sentiment Label
2022-01-01,儿花新年第一跳第一次这么期待元旦心心从来没有这么大强度的上课虽然全程放炮关是记动作就去了我半...,13.0,8.0,5.0,1
2022-01-01,片甲不刘广州疫情现在广州是完全解放了吗我们公司感觉无所畏惧号将近两百人聚在一块搞培训还不允许...,8.0,-1.0,9.0,1
2022-01-01,渡边星美再见啦充实的还记得去年元旦写下的新年愿望大部分都实现了吧去了好多城市旅拍三月的三亚四...,28.0,16.0,12.0,1
2022-01-01,很酷回家路上因为是拼车司机去机场接人在机场门口我在想到底什么时候我才能回广州或者去其他地方玩...,4.0,3.0,1.0,1
2022-01-01,少吃零食希望疫情真的不要再在广州反反复复的了我想我的朋友,3.0,3.0,0.0,0
...,...,...,...,...,...
2022-03-25,中国市场监管报广州公布涉疫情防控违法典型案例月日广东省广州市市场监管局公布全市查办违反市场监...,12.0,15.0,-3.0,-1
2022-03-25,广东建设职业技术学院防疫抗疫进行时广州校区开展实战抗疫演练月日下午广州校区开展疫情防控应急处...,6.0,7.0,-1.0,-1
2022-03-25,小糖罐儿再不解封姐要疯了微笑微笑微笑微笑微笑微笑广州的疫情到底是有多严重要把所有人都关着微笑...,3.0,4.0,-1.0,-1
2022-03-25,成永明教授容易得前列腺炎的人一般都是免疫比较低下的人炎弟就是免疫力低下的男性多发身体多处炎症...,5.0,10.0,-5.0,-1


In [14]:
Obj_SZ.data_clean()
df_SZ_weibo_Senti = Obj_SZ.output_weibo_score_summary()
df_SZ_weibo_Senti

Date,Weibo,Positive Score,Negative Score,Sentiment Score,Sentiment Label
2022-01-01,深圳新闻网疫情浙江宁波新增例确诊浙江宁波北仑区新增例新冠肺炎确诊病例年月日时分浙江省宁波市召...,6.0,10.0,-4.0,-1
2022-01-01,沧海心岛年第一条微博的心愿愿疫情快点结束可以自由穿行于合各座城市拉上行李去南京去深圳去大东北...,5.0,1.0,4.0,1
2022-01-01,怡钱锟锟哥生日快乐新的一年你愿望成真一切顺利希望疫情早日好转我们能在国内的演唱会上见心心威神...,9.0,1.0,8.0,1
2022-01-01,一听罐头在家过的第一天回家就变成现充冷面鸡可以卖完冰粉和熏鱼不能少很可爱的章鱼抱回家养了所以...,9.0,3.0,6.0,1
2022-01-01,鱼粥夹心糯叽叽挤吧挤吧下次你看布莱恩还帮不帮他接深圳行程他还没戴口罩疫情期间往他身上挤什么还...,2.0,5.0,-3.0,-1
...,...,...,...,...,...
2022-03-25,大菇噜噜想念那些可以随心所欲出门踏青的日子送花花送花花深圳疫情快点过去行吗深圳,2.0,2.0,0.0,0
2022-03-25,会打代码的扫地王大爷我年后初八寄去深圳换的智能门铃今天终于寄新的过来了该事件标志着深圳疫情的...,4.0,2.0,2.0,1
2022-03-25,橘猫少女深圳疫情青春才几年疫情占三年所有计划都变成了等疫情结束以后深圳,1.0,3.0,-2.0,-1
2022-03-25,十点吧就睡计划赶不上变化泪泪原本计划去深圳玩结果发现疫情有点严重那就改去广州吃早茶但是深圳跟...,4.0,5.5,-1.5,-1


In [15]:
df_BJ_weibo_Senti.to_excel("Beijing_weibo_sentiment.xlsx")
df_SH_weibo_Senti.to_excel("Shanghai_weibo_sentiment.xlsx")
df_GZ_weibo_Senti.to_excel("Guangzhou_weibo_sentiment.xlsx")
df_SZ_weibo_Senti.to_excel("Shenzhen_weibo_sentiment.xlsx")

### Part 3.2 For each day

In [16]:
df_BJ_daily_Senti = Obj_BJ.output_daily_score_summary(df_BJ_weibo_Senti)
df_BJ_daily_Senti

Date,Sentiment Score,Sentiment Label
2022-01-01,0.473684,1.0
2022-01-02,0.252525,1.0
2022-01-03,0.242424,1.0
2022-01-04,0.160000,1.0
2022-01-05,0.333333,1.0
...,...,...
2022-03-21,0.204082,1.0
2022-03-22,0.040404,1.0
2022-03-23,0.242424,1.0
2022-03-24,0.191919,1.0


In [17]:
df_SH_daily_Senti = Obj_SH.output_daily_score_summary(df_SH_weibo_Senti)
df_SH_daily_Senti

Date,Sentiment Score,Sentiment Label
2022-01-01,0.446809,1.0
2022-01-02,0.111111,1.0
2022-01-03,0.096774,1.0
2022-01-04,0.157895,1.0
2022-01-05,0.114583,1.0
...,...,...
2022-03-21,0.030000,1.0
2022-03-22,-0.220000,-1.0
2022-03-23,-0.224490,-1.0
2022-03-24,-0.103093,-1.0


In [18]:
df_GZ_daily_Senti = Obj_GZ.output_daily_score_summary(df_GZ_weibo_Senti)
df_GZ_daily_Senti

Date,Sentiment Score,Sentiment Label
2022-01-01,0.656250,1.0
2022-01-02,0.390805,1.0
2022-01-03,0.266667,1.0
2022-01-04,0.152778,1.0
2022-01-05,0.136842,1.0
...,...,...
2022-03-21,0.260870,1.0
2022-03-22,0.129032,1.0
2022-03-23,0.010204,1.0
2022-03-24,-0.020833,-1.0


In [19]:
df_SZ_daily_Senti = Obj_SZ.output_daily_score_summary(df_SZ_weibo_Senti)
df_SZ_daily_Senti

Date,Sentiment Score,Sentiment Label
2022-01-01,0.375000,1.0
2022-01-02,0.444444,1.0
2022-01-03,0.304348,1.0
2022-01-04,0.338983,1.0
2022-01-05,0.230769,1.0
...,...,...
2022-03-21,0.102041,1.0
2022-03-22,0.185567,1.0
2022-03-23,0.020000,1.0
2022-03-24,0.204082,1.0


In [20]:
df_BJ_daily_Senti.to_excel("Beijing_daily_sentiment.xlsx")
df_SH_daily_Senti.to_excel("Shanghai_daily_sentiment.xlsx")
df_GZ_daily_Senti.to_excel("Guangzhou_daily_sentiment.xlsx")
df_SZ_daily_Senti.to_excel("Shenzhen_daily_sentiment.xlsx")