## Network Analysis of Twitter User on the 2019 Hong Kong Protest Movement 

In the summer of 2019, lacking confidence in China’s judiciary system and human rights protection due to its history of suppressing political dissent, Hong Kong citizens demonstrated against the local government led by Chief Executive Carrier Lam to legalize the extradition of suspects to mainland China. Following an escalation in the severity of policing tactics on 12 June 2019, the protester’s objectives were becoming the five demands: fully withdraw the extradition bill; implement universal suffrage in Hong Kong, set up an inquiry to probe police brutality; withdraw a characterization of early protests as “riots and release those arrested at protest. 

Aligned with the decentralized nature of network-movement, Hong Kong 2019 protests are characterized as “formless, shapeless, like water”, as the hardcore protesters organizing in small cells with no formal hierarchy and crowdsourcing their tactics and slogans through social media in a highly dispersed way. This phenomenon is accordance with the theory developed by Swann (2020) in the book of Anarchist Cybernetics: Control and Communication in Radical Politics, which explores how large-scale coordination and communication can happen without central direction, and how social media platforms can facilitate interactive communication and participatory and democratic forms of organization. 

Given the mutation of global scenario of social mobilization towards higher intensity of digital network and higher degree of decentralization and autonomy, it’s critical to understand how the evolution spreads on social media and synchronize with the demonstration. 

This project argue that the decision-making process tend to be more decentralized, the identity can be formed from symbols, and synchrony between network and real-world activities. Using the Twitter data from 2019 Hong Kong Protests caused by the anti-extradition law amendment bill, this paper would use network analysis including K-core decomposition and community detection and sentimental analysis to examine the emotion cascade and propagate from one node to another, the user-hashtag interaction, the evolution of opinion groups and emergence of opinion leaders. 

In [16]:
# Generic ones
import numpy as np
import pandas as pd
import os

# Word processing libraries
import re
from nltk.corpus import wordnet
import string
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.tokenize import WhitespaceTokenizer
from nltk.stem import WordNetLemmatizer

# Widen the size of each cell
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))

# import necessary packages for network analysis
import networkx as nx
import matplotlib.pyplot as plt

  from IPython.core.display import display, HTML


# Import dataset 

The .csv files each contains 15k tweets and the search parameters used are stated below:
1. search_words = "#hongkong OR #hkprotests OR #freehongkong OR #hongkongprotests OR #hkpolicebrutality OR #antichinazi OR #standwithhongkong OR #hkpolicestate OR #HKpoliceterrorist OR #standwithhk OR #hkpoliceterrorism"
2. date_since = ["2019-10-28","2019-10-29","2019-10-30"]


Source: 
- https://towardsdatascience.com/creating-the-twitter-sentiment-analysis-program-in-python-with-naive-bayes-classification-672e5589a7ed
- https://towardsdatascience.com/extracting-twitter-data-pre-processing-and-sentiment-analysis-using-python-3-0-7192bd8b47cf

In [17]:
tweets_1st = pd.read_csv(os.getcwd() + '/data/raw' + '/20191103_131218_sahkprotests_tweets.csv')
tweets_2nd = pd.read_csv(os.getcwd() + '/data/raw' + '/20191103_153932_sahkprotests_tweets.csv')
tweets_3rd = pd.read_csv(os.getcwd() + '/data/raw' + '/20191105_010011_sahkprotests_tweets.csv')
tweets_4th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191105_222815_sahkprotests_tweets.csv')
tweets_5th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191107_000333_sahkprotests_tweets.csv')
tweets_6th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191108_001436_sahkprotests_tweets.csv')
tweets_7th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191109_030106_sahkprotests_tweets.csv')
tweets_8th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191109_120954_sahkprotests_tweets.csv')
tweets_9th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191110_021422_sahkprotests_tweets.csv', engine='python')
tweets_10th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191110_134433_sahkprotests_tweets.csv', engine='python')
tweets_11th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191111_223912_sahkprotests_tweets.csv', engine='python')
tweets_12th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191112_231846_sahkprotests_tweets.csv', engine='python')
tweets_13th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191113_225544_sahkprotests_tweets.csv', engine='python')
tweets_14th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191115_000508_sahkprotests_tweets.csv', engine='python')
tweets_15th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191116_121136_sahkprotests_tweets.csv', engine='python')
tweets_16th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191117_001011_sahkprotests_tweets.csv', engine='python')
tweets_17th = pd.read_csv(os.getcwd() + '/data/raw' + '/20191120_015710_sahkprotests_tweets.csv', engine='python')

In [18]:
# Shape of dataset
print('Size of 1st set is:', tweets_1st.shape)
print('Size of 2nd set is:', tweets_2nd.shape)
print('Size of 3rd set is:', tweets_3rd.shape)
print('Size of 4th set is:', tweets_4th.shape)
print('Size of 5th set is:', tweets_5th.shape)
print('Size of 6th set is:', tweets_6th.shape)
print('Size of 7th set is:', tweets_7th.shape)
print('Size of 8th set is:', tweets_8th.shape)
print('Size of 9th set is:', tweets_9th.shape)
print('Size of 10th set is:', tweets_10th.shape)
print('Size of 11th set is:', tweets_11th.shape)
print('Size of 12th set is:', tweets_12th.shape)
print('Size of 13th set is:', tweets_13th.shape)
print('Size of 14th set is:', tweets_14th.shape)
print('Size of 15th set is:', tweets_15th.shape)
print('Size of 16th set is:', tweets_16th.shape)
print('Size of 17th set is:', tweets_17th.shape)

Size of 1st set is: (15000, 11)
Size of 2nd set is: (15000, 11)
Size of 3rd set is: (15000, 11)
Size of 4th set is: (15001, 11)
Size of 5th set is: (15000, 11)
Size of 6th set is: (15000, 11)
Size of 7th set is: (15000, 11)
Size of 8th set is: (15000, 11)
Size of 9th set is: (15001, 11)
Size of 10th set is: (15001, 11)
Size of 11th set is: (15000, 11)
Size of 12th set is: (15000, 11)
Size of 13th set is: (15000, 11)
Size of 14th set is: (15000, 11)
Size of 15th set is: (15000, 11)
Size of 16th set is: (15000, 11)
Size of 17th set is: (15000, 11)


In [19]:
# Summary statistics
print(tweets_1st.info())
print('\n')
print(tweets_2nd.info())
print('\n')
print(tweets_3rd.info())
print('\n')
print(tweets_4th.info())
print('\n')
print(tweets_5th.info())
print('\n')
print(tweets_6th.info())
print('\n')
print(tweets_7th.info())
print('\n')
print(tweets_8th.info())
print('\n')
print(tweets_9th.info())
print('\n')
print(tweets_10th.info())
print('\n')
print(tweets_11th.info())
print('\n')
print(tweets_12th.info())
print('\n')
print(tweets_13th.info())
print('\n')
print(tweets_14th.info())
print('\n')
print(tweets_15th.info())
print('\n')
print(tweets_16th.info())
print('\n')
print(tweets_17th.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   username        15000 non-null  object
 1   acctdesc        10515 non-null  object
 2   location        5694 non-null   object
 3   following       15000 non-null  int64 
 4   followers       15000 non-null  int64 
 5   totaltweets     15000 non-null  int64 
 6   usercreatedts   15000 non-null  object
 7   tweetcreatedts  15000 non-null  object
 8   retweetcount    15000 non-null  int64 
 9   text            15000 non-null  object
 10  hashtags        15000 non-null  object
dtypes: int64(4), object(7)
memory usage: 1.3+ MB
None


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   username        15000 non-null  object
 1   acctdesc        

In [20]:
# Concat the two dataset together:
data = pd.concat([tweets_1st, tweets_2nd, tweets_3rd, tweets_4th, tweets_5th, tweets_6th, tweets_7th,
                  tweets_8th, tweets_9th, tweets_10th, tweets_11th, tweets_12th, tweets_13th, tweets_14th,
                 tweets_15th, tweets_16th, tweets_17th], axis = 0)

print('Size of concatenated dataset is:', data.shape)

# Reset_index
data.reset_index(inplace = True, drop = True)
data.head()
print(data.info())

Size of concatenated dataset is: (255003, 11)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 255003 entries, 0 to 255002
Data columns (total 11 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   username        255003 non-null  object 
 1   acctdesc        185074 non-null  object 
 2   location        94066 non-null   object 
 3   following       255000 non-null  float64
 4   followers       255000 non-null  float64
 5   totaltweets     255000 non-null  object 
 6   usercreatedts   255000 non-null  object 
 7   tweetcreatedts  255000 non-null  object 
 8   retweetcount    255000 non-null  object 
 9   text            255000 non-null  object 
 10  hashtags        254997 non-null  object 
dtypes: float64(2), object(9)
memory usage: 21.4+ MB
None


### Checking for Duplicated Entries and Removing Them
Since we are performing the scraping close to each other, it is possible to scrape the same tweets as long as they fall within the search window of 7 days from the search_date.
We will remove these duplicated rows from our dataset.

In [21]:
# Check out the number of unique users in the dataset
# It can be seen that some users are more proactive on the social media platform than others.
data['username'].value_counts()

username
xzxzanalazy        651
belleng324         540
wdamidoinhere      399
natalie_hoyin      286
hky2147            264
                  ... 
evangelion01241      1
ST87699605           1
Alice_Man_Man        1
littletaekook01      1
Candy96023268        1
Name: count, Length: 51955, dtype: int64

In [22]:
# It is acceptable to have repeated username since they can tweet multiple times throughout the day.
data[data['username'] == 'Phy32833861']

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags
10271,Phy32833861,,,7.0,7.0,418,2019-09-28 03:09:49,2019-11-03 04:24:41,12,@JoachimWatson @Treerisu @SolomonYue @HawleyMO @amnesty @AmnestyUK @amnestyusa @BonnieGlaser @cafreeland \n\nAnyone can save these girls?\n\n#StandWithHongKong https://t.co/okzc6lTEyz,[]
10292,Phy32833861,,,7.0,7.0,418,2019-09-28 03:09:49,2019-11-03 04:24:33,2,@JoachimWatson @SolomonYue @HawleyMO SOS! Please save our kids 🙏\n#StandWithHongKong https://t.co/gdQdiTwBzD,"[{'text': 'StandWithHongKong', 'indices': [86, 104]}]"
10436,Phy32833861,,,7.0,7.0,418,2019-09-28 03:09:49,2019-11-03 04:23:16,15,More &amp; more cases like Hoi-lam Chan will happen if #hkpolice aren't held in control and international supervision.\n\n#HKPoliceState #HKPoliceTerrorism https://t.co/Dfi7rnd28T,"[{'text': 'hkpolice', 'indices': [74, 83]}]"
10463,Phy32833861,,,7.0,7.0,418,2019-09-28 03:09:49,2019-11-03 04:23:04,34,"This clip is a bit old but this happens on every protests. I sincerely don't want to believe what happens behind any cameras or press, in the detention centers/ police stations, but the answer is pretty straight-forward...\n\n#HKPolice #HKPoliceState https://t.co/7Uft3CNV7B",[]
10549,Phy32833861,,,7.0,7.0,418,2019-09-28 03:09:49,2019-11-03 04:22:28,2818,"Arrestees ALL GIRLS. \nHard to believe they're not caught to serve the cops/PLAs in #HK especially when that's happening in #Xinjiang, and there're so many sexual abuse/ra_e allegations against #hkpolice.\n\n#HongKongProtests\n#HKPoliceState\n#HKPoliceTerrorism \n@SolomonYue @HawleyMO https://t.co/2wX803P3lH","[{'text': 'HK', 'indices': [102, 105]}]"
...,...,...,...,...,...,...,...,...,...,...,...
71978,Phy32833861,,,11.0,9.0,1112,2019-09-28 03:09:49,2019-11-06 14:53:28,2,"@SenRickScott Other than interest, #ccp and its puppet #CarrieLam are restricting hongkongers to restore the human rights and universal suffrage, Senator Scott\n#HongKongHumanRightsAndDemocracyAct \n#HKHumanRightsandDemocracyAct \n#StandWithHongKong","[{'text': 'ccp', 'indices': [56, 60]}, {'text': 'CarrieLam', 'indices': [76, 86]}]"
72020,Phy32833861,,,11.0,9.0,1112,2019-09-28 03:09:49,2019-11-06 14:52:52,590,"“The UK Government says that it is “fully committed to upholding Hong Kong’s high degree of autonomy and its rights and freedoms as enshrined in the ‘one country, two systems’ framework”. Where, though, is the evidence for this?” @agcolehamilton #standwithHK https://t.co/b0wClOORsC",[]
72062,Phy32833861,,,11.0,9.0,1112,2019-09-28 03:09:49,2019-11-06 14:52:07,378,"Top Pro-CCP figure in #HongKong, Junius Ho was attacked and mildly injured.\n\nMy guess: Staged.\nNicely filmed, ineffective stab, &amp; very quick response to subdue.\n\nJust a pretext to cancel the upcoming District Council Election which they know they'll lose.\nhttps://t.co/ZwsbFrqYQM","[{'text': 'HongKong', 'indices': [39, 48]}]"
72153,Phy32833861,,,11.0,9.0,1112,2019-09-28 03:09:49,2019-11-06 14:50:59,132,"Just like the pro-Beijing HK lawmaker Junius Ho #何君堯, my friend Jenital Ho was attacked today as well！\nBut unlike Mr Junius，Mr Jenital was much more unfortunate. Because he has to wash off all the ketchup by himself.\n#StandWithHongKong #FreedomHK #HKprotests https://t.co/dDJSILprx1","[{'text': '何君堯', 'indices': [62, 66]}]"


In [23]:
# Let's drop duplicated rows:
print('Initial size of dataset before dropping duplicated rows:', data.shape)


Initial size of dataset before dropping duplicated rows: (255003, 11)


In [24]:
data.drop_duplicates(keep = False, inplace = True)
print('Current size of dataset after dropping duplicated rows, if any, is:', data.shape)


Current size of dataset after dropping duplicated rows, if any, is: (248652, 11)


In [25]:
print(data.head())

          username  \
0     five5demands   
1   Dejavu53328974   
2         tsksimon   
3  JimmyWo67187904   
4         ARCHI418   

                                                                                        acctdesc  \
0                                            HKer Fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk   
1  #followbackhongkong\n#StandwithHK\n#HKHumanRightsandDemocracyAct\n #科勞手足\n#hongkongprotesters   
2                                                            #FightForFreedom #StandWithHongKong   
3                                                                                            NaN   
4                                                                                            NaN   

    location  following  followers totaltweets        usercreatedts  \
0        NaN      437.0      260.0         466  2019-10-05 13:40:37   
1  Hong Kong      581.0      360.0        7835  2019-08-27 15:47:34   
2        NaN      825.0      456.0        6728  2014-

### Removing Non-English Words/Tokens

Since it might be possible to remove non-english words that are used in daily english conversations such as names etc, it might be better to filter by the chinese language.

##### Remove empty tweets first! 
If there are empty tweets or 'nan', the following codes will break

In [26]:
data.dropna(subset = ['text'], inplace = True)

In [27]:
# The unicode accounts for chinese characters and punctuations.
def strip_chinese_words(string):
    # list of english words
    en_list = re.findall(u'[^\u4E00-\u9FA5\u3000-\u303F]', str(string))
    
    # Remove word from the list, if not english
    for c in string:
        if c not in en_list:
            string = string.replace(c, '')
    return string

In [28]:
# Apply strip_chinese_words(...) on the column 'text'
data['text'] = data['text'].apply(lambda x: strip_chinese_words(x))
data.head()

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags
0,five5demands,HKer Fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN","[{'text': 'HongKong', 'indices': [36, 45]}]"
1,Dejavu53328974,#followbackhongkong\n#StandwithHK\n#HKHumanRightsandDemocracyAct\n #科勞手足\n#hongkongprotesters,Hong Kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[{'text': 'China', 'indices': [11, 17]}, {'text': 'PoliceState', 'indices': [118, 130]}]"
2,tsksimon,#FightForFreedom #StandWithHongKong,,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V","[{'text': 'HKPoliceBrutality', 'indices': [42, 60]}]"
3,JimmyWo67187904,,,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL","[{'text': 'HongKongPolice', 'indices': [41, 56]}]"
4,ARCHI418,,,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[{'text': 'police', 'indices': [0, 7]}, {'text': 'Teargas', 'indices': [16, 24]}, {'text': 'grenade', 'indices': [30, 38]}, {'text': 'hkger', 'indices': [155, 161]}, {'text': 'PoliceState', 'indices': [163, 175]}, {'text': 'PoliceTerrorism', 'indices': [176, 192]}, {'text': 'HKprotests', 'indices': [193, 204]}]"


### Collect @Users mentioned in each tweet

We want to obtain this useful information from each tweet because it could allow us to analyse what are the popular figures in the protest movement.

In [29]:
# Define function to sieve out @users in a tweet:
def mentioned_users(string):
    usernames = re.findall('@[^\s]+', string)
    return usernames


# Create a new column and apply the function on the column 'text'
data['mentioned_users'] = data['text'].apply(lambda x: mentioned_users(x))
data.head()

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users
0,five5demands,HKer Fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN","[{'text': 'HongKong', 'indices': [36, 45]}]",[]
1,Dejavu53328974,#followbackhongkong\n#StandwithHK\n#HKHumanRightsandDemocracyAct\n #科勞手足\n#hongkongprotesters,Hong Kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[{'text': 'China', 'indices': [11, 17]}, {'text': 'PoliceState', 'indices': [118, 130]}]",[@hkgetv]
2,tsksimon,#FightForFreedom #StandWithHongKong,,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V","[{'text': 'HKPoliceBrutality', 'indices': [42, 60]}]",[]
3,JimmyWo67187904,,,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL","[{'text': 'HongKongPolice', 'indices': [41, 56]}]",[]
4,ARCHI418,,,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[{'text': 'police', 'indices': [0, 7]}, {'text': 'Teargas', 'indices': [16, 24]}, {'text': 'grenade', 'indices': [30, 38]}, {'text': 'hkger', 'indices': [155, 161]}, {'text': 'PoliceState', 'indices': [163, 175]}, {'text': 'PoliceTerrorism', 'indices': [176, 192]}, {'text': 'HKprotests', 'indices': [193, 204]}]",[]


### Main Text Cleaning
We had to remove chinese words and @users from our tweets first before we do the usual text cleaning and processing. Otherwise, we will not be able to obtain these information.

In [30]:
# Define Emoji_patterns
emoji_pattern = re.compile("["
         u"\U0001F600-\U0001F64F"  # emoticons
         u"\U0001F300-\U0001F5FF"  # symbols & pictographs
         u"\U0001F680-\U0001F6FF"  # transport & map symbols
         u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
         u"\U00002702-\U000027B0"
         u"\U000024C2-\U0001F251"
         "]+", flags=re.UNICODE)

In [31]:
# Define the function to implement POS tagging:
def get_wordnet_pos(pos_tag):
    if pos_tag.startswith('J'):
        return wordnet.ADJ
    elif pos_tag.startswith('V'):
        return wordnet.VERB
    elif pos_tag.startswith('N'):
        return wordnet.NOUN
    elif pos_tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN


# Define the main function to clean text in various ways:
def clean_text(text):
    
    # Apply regex expressions first before converting string to list of tokens/words:
    # 1. remove @usernames
    text = re.sub('@[^\s]+', '', text)
    
    # 2. remove URLs
    text = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', '', text)
    
    # 3. remove hashtags entirely i.e. #hashtags
    text = re.sub(r'#([^\s]+)', '', text)
    
    # 4. remove emojis
    text = emoji_pattern.sub(r'', text)
    
    # 5. Convert text to lowercase
    text = text.lower()
    
    # 6. tokenize text and remove punctuation
    text = [word.strip(string.punctuation) for word in text.split(" ")]
    
    # 7. remove numbers
    text = [word for word in text if not any(c.isdigit() for c in word)]
    
    # 8. remove stop words
    stop = stopwords.words('english')
    text = [x for x in text if x not in stop]
    
    # 9. remove empty tokens
    text = [t for t in text if len(t) > 0]
    
    # 10. pos tag text and lemmatize text
    pos_tags = pos_tag(text)
    text = [WordNetLemmatizer().lemmatize(t[0], get_wordnet_pos(t[1])) for t in pos_tags]
    
    # 11. remove words with only one letter
    text = [t for t in text if len(t) > 1]
    
    # join all
    text = " ".join(text)
    
    return(text)

In [32]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/shiruizhou/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [33]:
# Apply function on the column 'text':
data['cleaned_text'] = data['text'].apply(lambda x: clean_text(x))
data.head()

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text
0,five5demands,HKer Fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN","[{'text': 'HongKong', 'indices': [36, 45]}]",[],disgust police mark protestors pen write number hand like nazi germany
1,Dejavu53328974,#followbackhongkong\n#StandwithHK\n#HKHumanRightsandDemocracyAct\n #科勞手足\n#hongkongprotesters,Hong Kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[{'text': 'China', 'indices': [11, 17]}, {'text': 'PoliceState', 'indices': [118, 130]}]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit
2,tsksimon,#FightForFreedom #StandWithHongKong,,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V","[{'text': 'HKPoliceBrutality', 'indices': [42, 60]}]",[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence
3,JimmyWo67187904,,,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL","[{'text': 'HongKongPolice', 'indices': [41, 56]}]",[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away
4,ARCHI418,,,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[{'text': 'police', 'indices': [0, 7]}, {'text': 'Teargas', 'indices': [16, 24]}, {'text': 'grenade', 'indices': [30, 38]}, {'text': 'hkger', 'indices': [155, 161]}, {'text': 'PoliceState', 'indices': [163, 175]}, {'text': 'PoliceTerrorism', 'indices': [176, 192]}, {'text': 'HKprotests', 'indices': [193, 204]}]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill


In [34]:
# Check out the shape again and reset_index
print(data.shape)
data.reset_index(inplace = True, drop = True)

# Check out data.tail() to validate index has been reset
data.tail()

(248651, 13)


Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text
248646,zizihui1,"Five demands, not one less 🇭🇰#FollowBackHongKong #科勞手足",Hong Kong,505.0,326.0,1301,2019-09-25 17:05:21,2019-11-19 17:19:37,72,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy
248647,Candy96023268,#手足互科 #StandWithHongKong,Hong Kong,15.0,155.0,116,2019-09-12 06:32:09,2019-11-19 17:19:36,796,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom
248648,Vincent03231883,Hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,,248.0,170.0,362,2019-09-28 14:29:13,2019-11-19 17:19:36,1162,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read
248649,terryyipomni,Hongkonger 🇭🇰 Stand with Hong Kong! Fight for freedom! 香港人加油!! #FollowBackHongKong #FiveDemandsNotOneLess #StandwithHongKong #科勞手足 #FightForFreedom,Hong Kong,4653.0,2490.0,4627,2015-03-09 06:57:20,2019-11-19 17:19:36,9,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[{'text': 'SOSHK', 'indices': [74, 80]}, {'text': 'StandWithHongKong', 'indices': [81, 99]}, {'text': 'HongKongProtests', 'indices': [101, 118]}]",[],huh excuse hong kong government lie \n.\n\n
248650,lajiaer1,言論自由是一切自由之母。川普总统支持者；支持爆料革命；,,348.0,45.0,5042,2019-06-01 23:58:30,2019-11-19 17:19:35,10,@y345a678 @YuriStoyanov I can’t stop crying watching clips showing how our young protesters are being assaulted and heavily injured by #HKpolice. This is utterly unlawful for police to think they can use this kind of violence to anyone! #SOSHK #SOSPolyU may the world please #StandWithHongKong https://t.co/0OsOrGI55l,[],"[@y345a678, @YuriStoyanov]",can’t stop cry watch clip show young protester assault heavily injure utterly unlawful police think use kind violence anyone may world please


### Process the Column 'hashtags'

In [35]:
# Import ast to convert a string representation of list to list
# The column 'hashtags' is affected
import ast

# Define a function to convert a string rep. of list to list
## Function should also handle NaN values after conversion
def strlist_to_list(text):
    
    # Remove NaN
    if pd.isnull(text) == True: # if true
        text = ''
    else:
        text = ast.literal_eval(text)
    
    return text

In [36]:
# Apply strlist_to_list(...) to the column 'hashtags'
# Note that doing so will return a list of dictionaries, where there will be one dictionary for each hashtag in a single tweet.
data['hashtags'] = data['hashtags'].apply(lambda x: strlist_to_list(x))
data.head()

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text
0,five5demands,HKer Fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN","[{'text': 'HongKong', 'indices': [36, 45]}]",[],disgust police mark protestors pen write number hand like nazi germany
1,Dejavu53328974,#followbackhongkong\n#StandwithHK\n#HKHumanRightsandDemocracyAct\n #科勞手足\n#hongkongprotesters,Hong Kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[{'text': 'China', 'indices': [11, 17]}, {'text': 'PoliceState', 'indices': [118, 130]}]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit
2,tsksimon,#FightForFreedom #StandWithHongKong,,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V","[{'text': 'HKPoliceBrutality', 'indices': [42, 60]}]",[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence
3,JimmyWo67187904,,,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL","[{'text': 'HongKongPolice', 'indices': [41, 56]}]",[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away
4,ARCHI418,,,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[{'text': 'police', 'indices': [0, 7]}, {'text': 'Teargas', 'indices': [16, 24]}, {'text': 'grenade', 'indices': [30, 38]}, {'text': 'hkger', 'indices': [155, 161]}, {'text': 'PoliceState', 'indices': [163, 175]}, {'text': 'PoliceTerrorism', 'indices': [176, 192]}, {'text': 'HKprotests', 'indices': [193, 204]}]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill


In [37]:
# Since each 'hashtag' contain a list of dictionaries, we need to loop through the list to extract the actual hashtags in the tweets.
# Define a function to perform this extraction:
def extract_hashtags(hashtag_list):
    # argument:
    # hashtag_list - a list of dictionary(ies), each containing a hashtag
    
    # Create a list to store the hashtags
    hashtags = []
    
    # Loop through the list:
    for i in range(0, len(hashtag_list)):
        # extract the hashtag value using the key - 'text'
        # For our purposes, we can ignore the indices, which tell us the position of the hashtags in the string of tweet
        # lowercase the text as well
        hashtags.append(hashtag_list[i]['text'].lower())
        
    return hashtags

In [38]:
# Apply function on the column - data['hashtags']
data['hashtags'] = data['hashtags'].apply(lambda x: extract_hashtags(x))

# Check out the updated column 'hashtags'
print(data.head()['hashtags'])

0                                                                     [hongkong]
1                                                           [china, policestate]
2                                                            [hkpolicebrutality]
3                                                               [hongkongpolice]
4    [police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]
Name: hashtags, dtype: object


### Cleaning up the Column 'location'

In [39]:
# Replace NaN (empty) values with n.a to indicate that the user did not state his location
# Define a function to handle this:
def remove_nan(text):
    if pd.isnull(text) == True: # entry is NaN
        text = 'n.a'
    else:
        # lowercase text for possible easy handling
        text = text.lower()
        
    return text

In [40]:
# Apply function on column - data['location']
data['location'] = data['location'].apply(lambda x: remove_nan(x))

# Check out the updated columns
print(data.head()['location'])

0          n.a
1    hong kong
2          n.a
3          n.a
4          n.a
Name: location, dtype: object


In [41]:
# Let's take a quick look at the value_counts()
data['location'].value_counts()

location
n.a                  157316
hong kong             49093
香港                     6829
united states          1058
hk                      912
                      ...  
standing rock             1
setúbal, portugal         1
tarragona                 1
^.^===-~                  1
rio de janeiro 🇧🇷         1
Name: count, Length: 6218, dtype: int64

### Cleaning up the Column 'acctdesc'

Likewise, we will clean up this column by removing NaN values and replacing them with 'n.a'.
For now, this is all we do to this column.

In [42]:
# Apply the function already defined above: remove_nan(...)
# Apply function on column - data['location']
data['acctdesc'] = data['acctdesc'].apply(lambda x: remove_nan(x))

# Check out the updated columns
print(data.head()['acctdesc'])

0                                              hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk
1    #followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters
2                                                              #fightforfreedom #standwithhongkong
3                                                                                              n.a
4                                                                                              n.a
Name: acctdesc, dtype: object


In [43]:
data

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text
0,five5demands,hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,n.a,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN",[hongkong],[],disgust police mark protestors pen write number hand like nazi germany
1,Dejavu53328974,#followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters,hong kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[china, policestate]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit
2,tsksimon,#fightforfreedom #standwithhongkong,n.a,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V",[hkpolicebrutality],[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence
3,JimmyWo67187904,n.a,n.a,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away
4,ARCHI418,n.a,n.a,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill
...,...,...,...,...,...,...,...,...,...,...,...,...,...
248646,zizihui1,"five demands, not one less 🇭🇰#followbackhongkong #科勞手足",hong kong,505.0,326.0,1301,2019-09-25 17:05:21,2019-11-19 17:19:37,72,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy
248647,Candy96023268,#手足互科 #standwithhongkong,hong kong,15.0,155.0,116,2019-09-12 06:32:09,2019-11-19 17:19:36,796,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom
248648,Vincent03231883,hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,n.a,248.0,170.0,362,2019-09-28 14:29:13,2019-11-19 17:19:36,1162,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627,2015-03-09 06:57:20,2019-11-19 17:19:36,9,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n


## Feature Engineering - Rule-based Word Processing
So far, we have removed duplicated rows, extract important information such as hashtags, mentioned users and users' locations, and also cleaned up the tweets in the previous section. In the coming section, we will focus on Rule-based word processing for our sentiment analysis. We will postpone some exploratory data visualization till later once we have all the ingredients.

### Generating Sentiments from Tweets with NLTK Vader_Lexicon Library
We will be using the Vader_lexicon library from NLTK to generate sentiment for each review. Vader uses lexicon of words to determine which words in the reviews are positive or negative. It will return a set of 4 scores on the positivity, negativity, neutrality of a text, and also an overall score whether a text is positive or negative.
1. Positivity - 'pos'
2. Negativity - 'neg'
3. Neutrality - 'neu'
4. Overall Score - 'compound'

In [44]:
# Importing VADER from NLTK
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create a sid object called SentimentIntensityAnalyzer()
sid = SentimentIntensityAnalyzer()

# Apply polarity_score method of SentimentIntensityAnalyzer()
data['sentiment'] = data['cleaned_text'].apply(lambda x: sid.polarity_scores(x))

# Keep only the compound scores under the column 'Sentiment'
data = pd.concat([data.drop(['sentiment'], axis = 1), data['sentiment'].apply(pd.Series)], axis = 1)

In [45]:
data 

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound
0,five5demands,hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,n.a,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN",[hongkong],[],disgust police mark protestors pen write number hand like nazi germany,0.218,0.391,0.391,0.2732
1,Dejavu53328974,#followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters,hong kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[china, policestate]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit,0.256,0.543,0.202,-0.1779
2,tsksimon,#fightforfreedom #standwithhongkong,n.a,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V",[hkpolicebrutality],[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence,0.339,0.590,0.070,-0.8074
3,JimmyWo67187904,n.a,n.a,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away,0.364,0.498,0.137,-0.7430
4,ARCHI418,n.a,n.a,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill,0.203,0.519,0.277,0.1779
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
248646,zizihui1,"five demands, not one less 🇭🇰#followbackhongkong #科勞手足",hong kong,505.0,326.0,1301,2019-09-25 17:05:21,2019-11-19 17:19:37,72,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy,0.000,0.882,0.118,0.5574
248647,Candy96023268,#手足互科 #standwithhongkong,hong kong,15.0,155.0,116,2019-09-12 06:32:09,2019-11-19 17:19:36,796,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom,0.337,0.358,0.305,-0.1265
248648,Vincent03231883,hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,n.a,248.0,170.0,362,2019-09-28 14:29:13,2019-11-19 17:19:36,1162,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read,0.107,0.785,0.107,0.0000
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627,2015-03-09 06:57:20,2019-11-19 17:19:36,9,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n,0.000,0.794,0.206,0.0772


#### Additional Features: no. of characters and no. of words

In [46]:
# New column: number of characters in 'review'
data['numchars'] = data['cleaned_text'].apply(lambda x: len(x))

# New column: number of words in 'review'
data['numwords'] = data['cleaned_text'].apply(lambda x: len(x.split(" ")))

# Check the new columns:
data.tail(2)

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627,2015-03-09 06:57:20,2019-11-19 17:19:36,9,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n,0.0,0.794,0.206,0.0772,40,7
248650,lajiaer1,言論自由是一切自由之母。川普总统支持者；支持爆料革命；,n.a,348.0,45.0,5042,2019-06-01 23:58:30,2019-11-19 17:19:35,10,@y345a678 @YuriStoyanov I can’t stop crying watching clips showing how our young protesters are being assaulted and heavily injured by #HKpolice. This is utterly unlawful for police to think they can use this kind of violence to anyone! #SOSHK #SOSPolyU may the world please #StandWithHongKong https://t.co/0OsOrGI55l,[],"[@y345a678, @YuriStoyanov]",can’t stop cry watch clip show young protester assault heavily injure utterly unlawful police think use kind violence anyone may world please,0.378,0.458,0.163,-0.8176,141,22


#### Word Embeddings - Training Word2Vec using Gensim

Word embeddings involve the mapping of words in the corpus to numerical vectors, where similar words sharing similar contexts will have similar vectors as well. It will involve a shallow two-layer neural network that will train a matrix/tensor called the embedding matrix. By taking the matrix product of the embedding matrix and one-hot vector representation of each word in the corpus, we will arrive at the embedding vector.

We will be using Gensim - an open-source Python library - to generate doc2vec. <br/>

To confirm: apparently, doc2vec should be used over word2vec to obtain the vector representation of a 'document', in this case, our 'reviews' from the travellers. Word2vec will only give us the vector representation of a 'word'.

In [47]:
# # Import the Gensim package
# from gensim.test.utils import common_texts
# from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(data["cleaned_text"].apply(lambda x: x.split(" ")))]

# # Train a Doc2Vec model with our text data
# model = Doc2Vec(documents, vector_size = 10, window = 2, min_count = 1, workers = 4)

# # Transform each document into a vector data
# doc2vec_df = data["cleaned_text"].apply(lambda x: model.infer_vector(x.split(" "))).apply(pd.Series)
# doc2vec_df.columns = ["doc2vec_vector_" + str(x) for x in doc2vec_df.columns]
# data = pd.concat([data, doc2vec_df], axis = 1)

# # Check out the newly added columns:
# data.tail(2)

#### TD-IDF Columns
Next, we will compute the TD-IDF of the reviews using the sklearn library. TD-IDF stands for Term Frequency-Inverse Document Frequency, which is used to reflect how important a word is to a document in a collection or corpus. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.

1. Term Frequency - the number of times a term occurs in a document.
2. Inverse Document Frequency - an inverse document frequency factor that diminishes the weight of terms that occur very frequently in the document set and increases the weight of terms that occur rarely.

Since NLTK does not support TF-IDF, we will use the tfidfvectorizer function from the Python sklearn library.

In [48]:
pip install --upgrade scikit-learn


Note: you may need to restart the kernel to use updated packages.


In [49]:
# from sklearn.feature_extraction.text import TfidfVectorizer
# import pandas as pd

# # Assuming 'data' is your DataFrame containing the 'cleaned_text' column.

# tfidf = TfidfVectorizer(
#     max_features=100,
#     min_df=10,
#     stop_words='english'
# )

# tfidf_result = tfidf.fit_transform(data['cleaned_text']).toarray()

# # Extract the feature names using the get_feature_names_out() method.
# feature_names = tfidf.get_feature_names_out()

# # Create a DataFrame from tfidf_result using the feature names
# tfidf_df = pd.DataFrame(tfidf_result, columns=feature_names)

# # Rename the column names and index
# tfidf_df.columns = ["word_" + str(x) for x in tfidf_df.columns]
# tfidf_df.index = data.index

# # Concatenate the two DataFrames - 'data' and 'tfidf_df'
# data = pd.concat([data, tfidf_df], axis=1)

# # Check out the new 'data' DataFrame
# print(data.tail(2))


#### Expected Vocabulary 

Looking at the wordcloud, the only obvious word that has a strong negative connotation would be 'disgust', appearing at the bottom. Most of the remaining words thrown out align with what one has been seeing regularly on news and social media. From our wordcloud, we see that these words are - police, tear gas cannisters, protest, protestors etc. 

Outdated as more tweets are taken into consideration: Interestingly, 'good' appears in the wordcloud. But as of now, it still remains to be seen as to what or who is mentioned or associated numerously with 'good'. Let's see if we can find out more.

### 2 No. of Positive Sentiments vs No. of Negative Sentiments
Next, we look at what is the overall distribution of positive and negative tweets. Based on the SentimentIntensityAnalyzer from the NLTK Vader-Lexicon library, this analyzer examines the sentiment of a sentence, on how positive, neutral or negative it is. We can interpret the sentiment in the following manner. If a sentiment is positive, it could mean that it is pro-government and/or police. Whereas, a negative sentiment could mean that it is anti- government and/or police, and supportive towards the protesters.

The analyzer returns 4 scores for each sentence namely, 'positive', 'negative', 'neutral' and 'compound'. The score 'compound' returns the overall sentiment of a sentence with range of [-1, 1]. For our current purpose, we want to classify each tweet into 5 classes and assign a range of values for each of them:
1. Very positive '5' - [0.55, 1.00]
2. Positive '4' - [0.10, 0.55)
3. Neutral '3' - (-0.10, 0.10)
4. Negative '2' - (-0.55, -0.10]
5. Very negative '1' - [-1.00, -0.55]

Note: the range of values for a neutral sentiment is more stringent.

In [50]:
# Focus on 'compound' scores
# Create a new column called 'sentiment_class'
sentimentclass_list = []

for i in range(0, len(data)):
    
    # current 'compound' score:
    curr_compound = data.iloc[i,:]['compound']
    
    if (curr_compound <= 1.0 and curr_compound >= 0.55):
        sentimentclass_list.append(5)
    elif (curr_compound < 0.55 and curr_compound >= 0.10):
        sentimentclass_list.append(4)
    elif (curr_compound < 0.10 and curr_compound > -0.10):
        sentimentclass_list.append(3)
    elif (curr_compound <= -0.10 and curr_compound > -0.55):
        sentimentclass_list.append(2)
    elif (curr_compound <= -0.55 and curr_compound >= -1.00):
        sentimentclass_list.append(1)

# Add the new column 'sentiment_class' to the dataframe
data['sentiment_class'] = sentimentclass_list

# Check out the new column
data.tail()['sentiment_class']

248646    5
248647    2
248648    3
248649    3
248650    1
Name: sentiment_class, dtype: int64

In [51]:
# Verify if the classification assignment is correct:
data.iloc[0:5, :][['compound', 'sentiment_class']]

Unnamed: 0,compound,sentiment_class
0,0.2732,4
1,-0.1779,2
2,-0.8074,1
3,-0.743,1
4,0.1779,4


In [52]:
data['sentiment_class']

0         4
1         2
2         1
3         1
4         4
         ..
248646    5
248647    2
248648    3
248649    3
248650    1
Name: sentiment_class, Length: 248651, dtype: int64

In [53]:
# Display full text:
pd.set_option('display.max_colwidth', None)

In [54]:
pip install tabulate

Note: you may need to restart the kernel to use updated packages.


In [55]:
from tabulate import tabulate

In [56]:
# Look at some examples of negative, neutral and positive tweets

# Filter 10 negative original tweets:
print("10 random negative original tweets and their sentiment classes:")
data[(data['sentiment_class'] == 1) | (data['sentiment_class'] == 2)].sample(n=10)[['text', 'sentiment_class']]

10 random negative original tweets and their sentiment classes:


Unnamed: 0,text,sentiment_class
102609,"Hong Kong lawmaker Eddie Chu arrested over scuffle at legislature in May, with police set to nab 6 other democrats https://t.co/yc2RpblajY #HongKong #China #antiELABhk #antiELAB #HongKongProtests @ChuHoiDick @holmeschan",2
14837,Police officers have been filmed shoving a fireman around in Central after he complained they had shot tear gas straight at his truck. The officers also pepper sprayed a journalist who dared to report on the bust up. #HongKong\n\nhttps://t.co/rubbu7eQe2 https://t.co/Yt1FKU4BRm,2
59944,This photo was taken when the 22 y/o boy was brain-bleeding and waiting for ambulance.\n\nFour police cars blocked the way and didn’t allow the ambulance to pick up the casualty.\n\nNo CCTV are not released.\n\nNo one knows why he fell frok height.\n\n#SOSHK #HongKong #HongKongPolice https://t.co/4TpkHjdqfD,1
146594,[1631 Sha Tin]\n\n#HongKongPolice was trying to obstruct the arrestee from yelling his name &amp; identity no. to the bystanders trying to help him.\n\nLook into the boy's eyes only panic &amp; despration could be seen.\n\n[h/t SocRec Channel]\n\n#HongKongPoliceState #HongKong\n#HongKongProtests https://t.co/rNa6GGtOEA,2
79605,"When the #YuenLongAttack took place, cops who patrolled at the station turn their back on the civilians, local police station closed, so when victims calling 999 for help that’s an abuse use of emergency call?! \n#HongKong\n#PoliceBrutality https://t.co/5Aay74ugr0",1
3651,"A #Porsche driver was playing #GloryToHongKong the song in car in HK. Riot police aren't happy about it, brutally pulled driver out of his car &amp; arrested him for unknown cause. #HKPolice also did car search without consent/presence of owner. \n\n#StandWithHK #FightForFreedom https://t.co/LdCr604BCq",1
124954,"#HongKong people are gathering at Tamar Park today for the memorial prayer meeting, to mourning the death of 22-yr-old college student Chow Tsz-Lok, and remonstrate the #PoliceBrutality. \n\n#HumanitarianCrisis \n#HumanRights \n\nhttps://t.co/vWiUmXsWSe",1
60817,"The men was actually wanted to rescue the students who fell from 3/F. So he tried to stop #HKPolice, in order to let the ambulance in. Unfortunately, he was arrested and got hurt at the end.\n\n🥺🙏\n\nCredit: appledaily\n\n#HongKongProtests #StandWithHongKong https://t.co/JfqEvyPY0g",2
128463,Did #HongKong Apple Daily report the name(s) of the arresting officer(s)? https://t.co/CSYVHoaMOe,2
2584,"Council candidate conducted election rallies today yet #hkpolice claimed they were participating an unlawful assembly, and violently arrested the candidate. \n#HKprotests #HKPoliceTerrorism #HKRioters #HKPoliceState https://t.co/0BnOM7r51e",1


In [57]:
# Filter 10 neutral original tweets:
print("10 random neutral original tweets and their sentiment classes:")
data[(data['sentiment_class'] == 3)].sample(n=10)[['text', 'sentiment_class']]

10 random neutral original tweets and their sentiment classes:


Unnamed: 0,text,sentiment_class
37853,This is a photo deleted by facebook for several times. \n#HKPoliceTerrorism #HongKong #SOSHK \nsource : internet https://t.co/SiSGTSyV3c,3
203814,"The tear gas made in China which has higher burning point (450°c) is even more toxic than that of the USA. The highest burning point is said to be 3000°c! It contain more than80%of CS, when the CS break down, more dioxins and cyanide gas may be released.#StandWithHongKong #SOSHK https://t.co/RxiZGYnvET",3
14070,This rat was not moving. People thought it might have been teargassed too. #HKprotests #HKPolice #HongKong #HongKongProtests #HongKongProtesters #HongKongProtest https://t.co/HLvSg7U7Uc,3
96958,Lunch break in Central. These are not the typical young protesters. You're looking at people in dress shirts and business attire speaking up against #HKPolice during their lunch hour in memory of Chow\n\n#StandWithHongKong #HongKongProtests https://t.co/wRqW4THqMS,3
181285,Today we’re wearing black to rehearsals to #StandWithHongKong #HKWEARSBLACK \n#HongKongProtests https://t.co/trZ4NNe828,3
77632,A mainland student held a knife towards the graduates of CUHK and sang the Chinese National Anthem.\n\nto b honest i think his singing is more scare than the little kinfe…🀄️🔪🀄️\n\nfrom @cuhkcampusradio \n #HongKongProtests #CUHK\nhttps://t.co/MM1siFPIlf,3
151978,"This is Central in #HongKong\n\nCan you believe #HongKongPolice shot tear gas in random towards most office ladies, man in suit?\n\nWill #HongKong still exist tomorrow? https://t.co/KoN0JGnFED",3
235557,"When dictatorship becomes a fact, revolution becomes a duty.\n\n#StandWithHongKong\n#SOSPolyU \n#HongKongProtests https://t.co/A1Qh9Vr6hy",3
114622,"#hongkongpolice basically aren’t professional anymore, just doing everything based on their emotion. no police will be held accountable though.\n-\n@SolomonYue @arnohb112 #freedomhk #freehk #hongkongprotest #policebrutality #standwithhongkong # #hongkongprotests #hongkong https://t.co/4GLFEMzJS7",3
172772,Photo description from Apple Daily\n#HongKong #CUHK https://t.co/fNakKugNF4,3


In [58]:
# Filter 20 positive original tweets:
print("10 random positive original tweets and their sentiment classes:")
data[(data['sentiment_class'] == 4) | (data['sentiment_class'] == 5)].sample(n=10)[['text', 'sentiment_class']]

10 random positive original tweets and their sentiment classes:


Unnamed: 0,text,sentiment_class
205629,US Congress urged to suspend Hong Kong’s special trade status if Chinese troops used in city https://t.co/YrUHNSi0S9 @krislc #HongKong #China,4
136970,"Thanks again for this #Japanese who concern about the movement in #HongKong, we appreciated every single effort from all over the world to stand in solidarity, #StandWithHongKong \n\nhttps://t.co/riJLhLpYn1",5
49736,Speaking to parliamentarians this morning about an UK #HongKong Human Rights and Democracy Act. Lots of enthusiasm. UK based lawyers who love HK and want to help should DM me. #StandWithHongKong,5
238532,This is the beginning of the end for #HongKong. Beijing is usurping the court system of the special administrative region. The rule of law is all Hong Kong had to differentiate itself from the mainland. Now we will have rule by fiat as in China. https://t.co/sK7Ske5wv3 via @FT,4
74720,@limlouisa @unimelb The CCP is surely a threat to universal values. Please #StandWithHongKong to defend freedom and human rights https://t.co/ncNlyAUp7d,5
211655,@revmahoney @Andychanhotin @FreedomHKG @Fight4HongKong @Stand_with_HK @hk_watch @HKWORLDCITY @mschlapp We do need some spiritual supports...Thank you very much for your kindness. #StandWithHongKong,4
21534,"Police have fired multiple rounds of tear gas into Victoria Park, where hundreds, if not thousand of people were gathering peacefully to support the ongoing pro-democracy movement. #HongKong https://t.co/YgAxoQnRSy",5
192309,@marcorubio We need your help🙏🏻🙏🏻\n#StandWithHongKong https://t.co/YQWnjcuOKT,4
132754,Suzuko Hirano who has been organizing weekly demonstrations in Tokyo since June to show solidarity with the protesters in #HongKong.\nI’m glad to see people from all over the world stand in solidarity with Hong Kong.\nThanks #Japan\n\n#StandWithHongKong \n\nhttps://t.co/lOyjE06sjc,5
79011,"In the past 7 days, the Hong Kong Human Rights and Democracy Act got 2 more cosponsors. Thank you Senators @SenMcSallyAZ and @CoryBooker! #StandWithHongKong https://t.co/2zA718sNTf",4


In [59]:
print(data.columns)


Index(['username', 'acctdesc', 'location', 'following', 'followers',
       'totaltweets', 'usercreatedts', 'tweetcreatedts', 'retweetcount',
       'text', 'hashtags', 'mentioned_users', 'cleaned_text', 'neg', 'neu',
       'pos', 'compound', 'numchars', 'numwords', 'sentiment_class'],
      dtype='object')


### Transform data 

In [60]:
pd.set_option('display.max_columns', None)

In [61]:
data

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
0,five5demands,hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,n.a,437.0,260.0,466,2019-10-05 13:40:37,2019-11-03 02:57:49,292,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN",[hongkong],[],disgust police mark protestors pen write number hand like nazi germany,0.218,0.391,0.391,0.2732,70,11,4
1,Dejavu53328974,#followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters,hong kong,581.0,360.0,7835,2019-08-27 15:47:34,2019-11-03 02:57:49,1315,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[china, policestate]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit,0.256,0.543,0.202,-0.1779,58,8,2
2,tsksimon,#fightforfreedom #standwithhongkong,n.a,825.0,456.0,6728,2014-10-05 15:57:07,2019-11-03 02:57:49,105,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V",[hkpolicebrutality],[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence,0.339,0.590,0.070,-0.8074,148,20,1
3,JimmyWo67187904,n.a,n.a,22.0,1.0,656,2019-10-08 12:58:51,2019-11-03 02:57:48,660,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away,0.364,0.498,0.137,-0.7430,152,23,1
4,ARCHI418,n.a,n.a,19.0,11.0,121,2014-06-19 16:19:36,2019-11-03 02:57:46,0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill,0.203,0.519,0.277,0.1779,84,15,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
248646,zizihui1,"five demands, not one less 🇭🇰#followbackhongkong #科勞手足",hong kong,505.0,326.0,1301,2019-09-25 17:05:21,2019-11-19 17:19:37,72,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy,0.000,0.882,0.118,0.5574,177,27,5
248647,Candy96023268,#手足互科 #standwithhongkong,hong kong,15.0,155.0,116,2019-09-12 06:32:09,2019-11-19 17:19:36,796,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom,0.337,0.358,0.305,-0.1265,184,26,2
248648,Vincent03231883,hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,n.a,248.0,170.0,362,2019-09-28 14:29:13,2019-11-19 17:19:36,1162,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read,0.107,0.785,0.107,0.0000,184,21,3
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627,2015-03-09 06:57:20,2019-11-19 17:19:36,9,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n,0.000,0.794,0.206,0.0772,40,7,3


In [62]:
data['retweetcount'] = pd.to_numeric(data['retweetcount'], errors='coerce')
data['totaltweets'] = pd.to_numeric(data['totaltweets'], errors='coerce')
data['followers'] = pd.to_numeric(data['followers'], errors='coerce')

In [63]:
data

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
0,five5demands,hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,n.a,437.0,260.0,466.0,2019-10-05 13:40:37,2019-11-03 02:57:49,292.0,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN",[hongkong],[],disgust police mark protestors pen write number hand like nazi germany,0.218,0.391,0.391,0.2732,70,11,4
1,Dejavu53328974,#followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters,hong kong,581.0,360.0,7835.0,2019-08-27 15:47:34,2019-11-03 02:57:49,1315.0,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[china, policestate]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit,0.256,0.543,0.202,-0.1779,58,8,2
2,tsksimon,#fightforfreedom #standwithhongkong,n.a,825.0,456.0,6728.0,2014-10-05 15:57:07,2019-11-03 02:57:49,105.0,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V",[hkpolicebrutality],[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence,0.339,0.590,0.070,-0.8074,148,20,1
3,JimmyWo67187904,n.a,n.a,22.0,1.0,656.0,2019-10-08 12:58:51,2019-11-03 02:57:48,660.0,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away,0.364,0.498,0.137,-0.7430,152,23,1
4,ARCHI418,n.a,n.a,19.0,11.0,121.0,2014-06-19 16:19:36,2019-11-03 02:57:46,0.0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill,0.203,0.519,0.277,0.1779,84,15,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
248646,zizihui1,"five demands, not one less 🇭🇰#followbackhongkong #科勞手足",hong kong,505.0,326.0,1301.0,2019-09-25 17:05:21,2019-11-19 17:19:37,72.0,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy,0.000,0.882,0.118,0.5574,177,27,5
248647,Candy96023268,#手足互科 #standwithhongkong,hong kong,15.0,155.0,116.0,2019-09-12 06:32:09,2019-11-19 17:19:36,796.0,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom,0.337,0.358,0.305,-0.1265,184,26,2
248648,Vincent03231883,hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,n.a,248.0,170.0,362.0,2019-09-28 14:29:13,2019-11-19 17:19:36,1162.0,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read,0.107,0.785,0.107,0.0000,184,21,3
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627.0,2015-03-09 06:57:20,2019-11-19 17:19:36,9.0,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n,0.000,0.794,0.206,0.0772,40,7,3


In [64]:
# Writing the DataFrame to a CSV file
file_path = 'data_processed.csv'
data.to_csv(file_path, index=False)

In [65]:
data

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
0,five5demands,hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,n.a,437.0,260.0,466.0,2019-10-05 13:40:37,2019-11-03 02:57:49,292.0,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN",[hongkong],[],disgust police mark protestors pen write number hand like nazi germany,0.218,0.391,0.391,0.2732,70,11,4
1,Dejavu53328974,#followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters,hong kong,581.0,360.0,7835.0,2019-08-27 15:47:34,2019-11-03 02:57:49,1315.0,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[china, policestate]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit,0.256,0.543,0.202,-0.1779,58,8,2
2,tsksimon,#fightforfreedom #standwithhongkong,n.a,825.0,456.0,6728.0,2014-10-05 15:57:07,2019-11-03 02:57:49,105.0,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V",[hkpolicebrutality],[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence,0.339,0.590,0.070,-0.8074,148,20,1
3,JimmyWo67187904,n.a,n.a,22.0,1.0,656.0,2019-10-08 12:58:51,2019-11-03 02:57:48,660.0,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away,0.364,0.498,0.137,-0.7430,152,23,1
4,ARCHI418,n.a,n.a,19.0,11.0,121.0,2014-06-19 16:19:36,2019-11-03 02:57:46,0.0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill,0.203,0.519,0.277,0.1779,84,15,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
248646,zizihui1,"five demands, not one less 🇭🇰#followbackhongkong #科勞手足",hong kong,505.0,326.0,1301.0,2019-09-25 17:05:21,2019-11-19 17:19:37,72.0,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy,0.000,0.882,0.118,0.5574,177,27,5
248647,Candy96023268,#手足互科 #standwithhongkong,hong kong,15.0,155.0,116.0,2019-09-12 06:32:09,2019-11-19 17:19:36,796.0,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom,0.337,0.358,0.305,-0.1265,184,26,2
248648,Vincent03231883,hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,n.a,248.0,170.0,362.0,2019-09-28 14:29:13,2019-11-19 17:19:36,1162.0,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read,0.107,0.785,0.107,0.0000,184,21,3
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627.0,2015-03-09 06:57:20,2019-11-19 17:19:36,9.0,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n,0.000,0.794,0.206,0.0772,40,7,3


In [66]:
data[data['username'] == 'SolomanYue']

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
10040,SolomanYue,n.a,n.a,15.0,4.0,25.0,2019-10-29 05:41:44,2019-11-03 04:26:16,2817.0,"Arrestees ALL GIRLS. \nHard to believe they're not caught to serve the cops/PLAs in #HK especially when that's happening in #Xinjiang, and there're so many sexual abuse/ra_e allegations against #hkpolice.\n\n#HongKongProtests\n#HKPoliceState\n#HKPoliceTerrorism \n@SolomonYue @HawleyMO https://t.co/2wX803P3lH",[hk],"[@SolomonYue, @HawleyMO]",arrestees girl \nhard believe they're catch serve cops/plas especially that's happen there're many sexual abuse/ra_e allegation \n\n\n\n,0.085,0.915,0.0,-0.1027,131,17,2
10061,SolomanYue,n.a,n.a,15.0,4.0,25.0,2019-10-29 05:41:44,2019-11-03 04:26:06,2684.0,This is why the Senate should vote on the Hong Kong Human Rights &amp; Democracy Act - and my Be Water Act - now #StandWithHongKong #HongKong https://t.co/DokQaccB2d,[],[],senate vote hong kong human right amp democracy act water act,0.0,1.0,0.0,0.0,61,11,3
10137,SolomanYue,n.a,n.a,15.0,4.0,26.0,2019-10-29 05:41:44,2019-11-03 04:25:39,1498.0,"Disturbing reports that Beijing is planning new steps to “safeguard national security” in #HongKong. If so, that will only make things worse.\n\n@Senatemajldr McConnell: bring the #HKHumanRightsandDemocracyAct to a vote as soon as possible to support #HK!\nhttps://t.co/CtAx5vr2Rj",[hongkong],[@Senatemajldr],disturb report beijing plan new step “safeguard national security” make thing worse.\n\n mcconnell bring vote soon possible support,0.247,0.638,0.115,-0.4767,129,18,2
46879,SolomanYue,n.a,n.a,21.0,7.0,42.0,2019-10-29 05:41:44,2019-11-05 12:36:55,4358.0,"#LIVE: The ambience at New Town Plaza has intensified again as police have attempted to storm into the mall a 2nd time today. The chaotic scene has led to an arrest of a few. In this footage, a cop covers a man’s face as he tries to shout out his name. #antiRLAB #HongKongProtests https://t.co/vRhNYhtgPR",[live],[],ambience new town plaza intensified police attempt storm mall time today chaotic scene lead arrest footage cop cover man’s face try shout name,0.211,0.789,0.0,-0.6808,142,23,1
50060,SolomanYue,n.a,n.a,21.0,7.0,42.0,2019-10-29 05:41:44,2019-11-05 12:36:55,4380.0,"#LIVE: The ambience at New Town Plaza has intensified again as police have attempted to storm into the mall a 2nd time today. The chaotic scene has led to an arrest of a few. In this footage, a cop covers a man’s face as he tries to shout out his name. #antiRLAB #HongKongProtests https://t.co/vRhNYhtgPR",[live],[],ambience new town plaza intensified police attempt storm mall time today chaotic scene lead arrest footage cop cover man’s face try shout name,0.211,0.789,0.0,-0.6808,142,23,1


In [67]:
data[data['username'] == 'Stand_with_HK']

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
6407,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,191.0,66942.0,1404.0,2017-11-24 21:37:26,2019-11-03 03:31:18,87.0,New York City in support of Hong Kong Democracy! \n\n#FightForFreedom\n#StandwithHK https://t.co/5qTBnT2uIj,"[fightforfreedom, standwithhk]",[],new york city support hong kong democracy \n\n\n,0.0,0.69,0.31,0.4019,45,8,4
12243,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,191.0,66963.0,1406.0,2017-11-24 21:37:26,2019-11-03 04:09:53,90.0,"""HongKongers, we will fight alongside you in the #FightForFreedom"" -- From Sydney, Australia with love\n\n#StandwithHK https://t.co/hu8DUmjhb3","[fightforfreedom, standwithhk]",[],hongkongers fight alongside sydney australia love\n\n,0.241,0.37,0.389,0.3818,51,6,4
30399,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,191.0,67706.0,1422.0,2017-11-24 21:37:26,2019-11-04 14:38:54,21.0,"#HongKong supporters in London came together en masse last weekend! Thank you for standing up for democracy, and with #Hongkongers\n\n#StandwithHK \n\nCredit: @InkyWonders (Perseus) https://t.co/vFz5wVmv4J","[hongkong, hongkongers, standwithhk]",[@InkyWonders],supporter london come together en masse last weekend thank stand democracy \n\n \n\ncredit perseus,0.0,0.581,0.419,0.7351,94,14,5
49700,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,191.0,68046.0,1428.0,2017-11-24 21:37:26,2019-11-05 12:43:52,87.0,House of Commons debates Hong Kong! @DesmondSwayne standing up for the Sino-British Joint Agreement and @joshuawongcf's right to stand as a Council candidate. #StandWithHK #FightForFreedom https://t.co/UXhiqor7CS,"[standwithhk, fightforfreedom]","[@DesmondSwayne, @joshuawongcf's]",house common debate hong kong stand sino-british joint agreement right stand council candidate,0.0,0.789,0.211,0.4939,94,13,4
53332,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,191.0,68050.0,1429.0,2017-11-24 21:37:26,2019-11-05 12:43:52,201.0,House of Commons debates Hong Kong! @DesmondSwayne standing up for the Sino-British Joint Agreement and @joshuawongcf's right to stand as a Council candidate. #StandWithHK #FightForFreedom https://t.co/UXhiqor7CS,"[standwithhk, fightforfreedom]","[@DesmondSwayne, @joshuawongcf's]",house common debate hong kong stand sino-british joint agreement right stand council candidate,0.0,0.789,0.211,0.4939,94,13,4
73262,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,194.0,68802.0,1468.0,2017-11-24 21:37:26,2019-11-07 13:49:14,23.0,"If this Beijing official really believes the @CommonsForeign report is 'fictitious' then why doesn't his Ministry prove it? \n\nThere is no room for foreign interference at UK academic institutions, Mr Shuang. \n\n#StandwithHK https://t.co/umeu1vx8H5",[standwithhk],[@CommonsForeign],beijing official really believe report fictitious ministry prove \n\nthere room foreign interference uk academic institution mr shuang \n\n,0.0,1.0,0.0,0.0,135,18,3
77283,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,194.0,68815.0,1468.0,2017-11-24 21:37:26,2019-11-07 13:49:14,41.0,"If this Beijing official really believes the @CommonsForeign report is 'fictitious' then why doesn't his Ministry prove it? \n\nThere is no room for foreign interference at UK academic institutions, Mr Shuang. \n\n#StandwithHK https://t.co/umeu1vx8H5",[standwithhk],[@CommonsForeign],beijing official really believe report fictitious ministry prove \n\nthere room foreign interference uk academic institution mr shuang \n\n,0.0,1.0,0.0,0.0,135,18,3
87358,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,194.0,69176.0,1478.0,2017-11-24 21:37:26,2019-11-08 16:45:25,15.0,"""We fear that... the Chinese government’s approach to Hong Kong is moving closer to 'One Country, One System' than it is to maintaining its treaty commitments under the Joint Declaration”, says the UK's Foreign Affairs Committee. #StandWithHK #HKSOS https://t.co/8wqFnjDZqQ","[standwithhk, hksos]",[],fear chinese government’s approach hong kong move close one country one system maintain treaty commitment joint declaration” say uk's foreign affair committee,0.124,0.775,0.101,-0.1531,158,22,2
96518,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,194.0,69203.0,1479.0,2017-11-24 21:37:26,2019-11-08 18:16:27,0.0,"//“Rest well on the other side. We will continue walking down this road for you.”\n\nChow will not be at Legco, but we will remember him. Hongkongers, keep fighting.//\n\nOur Op-Ed at @HongKongFP \nhttps://t.co/fWRNFiMKan\n\n#FightForFreedom \n#StandwithHK\n#IfNotNowWhen \n#DemocracyNow https://t.co/F3lbzHkqPV","[fightforfreedom, standwithhk, ifnotnowwhen, democracynow]",[@HongKongFP],“rest well side continue walk road you.”\n\nchow legco remember hongkongers keep fighting.//\n\nour op-ed \n\n\n \n\n,0.0,0.87,0.13,0.2732,108,15,4
99763,Stand_with_HK,a group of ordinary hongkongers urging the world to safeguard hong kong’s rights and freedoms. #standwithhk (rt ≠ endorsement),hong kong,194.0,69210.0,1479.0,2017-11-24 21:37:26,2019-11-08 18:20:48,16.0,"//“Rest well on the other side. We will continue walking down this road for you.”\n\nChow will not be at Legco, but we will remember him. Hongkongers, keep fighting.//\n\nOur Op-Ed at @HongKongFP \nhttps://t.co/fWRNFiMKan\n\n#FightForFreedom \n#StandwithHK\n#IfNotNowWhen \n#DemocracyNow","[fightforfreedom, standwithhk, ifnotnowwhen, democracynow]",[@HongKongFP],“rest well side continue walk road you.”\n\nchow legco remember hongkongers keep fighting.//\n\nour op-ed \n\n\n \n\n,0.0,0.87,0.13,0.2732,108,15,4


In [68]:
username_frequency = data['username'].value_counts()

In [69]:
username_frequency[username_frequency > 7]


username
xzxzanalazy      643
belleng324       540
wdamidoinhere    395
natalie_hoyin    272
hky2147          252
                ... 
Dawnwillcome1      8
yeung08572660      8
J_oktokki_k        8
irisluv3           8
HandelnH           8
Name: count, Length: 7498, dtype: int64

In [70]:
data[data['username'] == 'xzxzanalazy']

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
5483,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. #followbackhongkong",hong kong,708.0,680.0,27446.0,2018-03-04 14:47:07,2019-11-03 03:37:49,33.0,"Here are what happened yesterday in HK. \nIf you think it is ridiculous, it happens in HK almost everyday. \n@SolomonYue @SouthPark \n#HKprotests #HKPoliceState #StandwithHonKong #antichinazi #hongkongpolicebrutality #HKPoliceTerrorism https://t.co/ysaVesiAkg",[],"[@SolomonYue, @SouthPark]",happen yesterday hk \nif think ridiculous happens hk almost everyday,0.217,0.783,0.000,-0.3612,67,10,2
5716,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. #followbackhongkong",hong kong,708.0,680.0,27446.0,2018-03-04 14:47:07,2019-11-03 03:36:05,59.0,"#HKPolice attacked a firefighter, after they fired tear gas at the fire engine he was operating\n\nAbsolutely disgusting by the #HKPoliceTerrorists\n\nYou should support the Emergency service personnel (Fire Service and Hospital workers and Medics)\n\nSource: FB\n#SOSHK\n#HKPoliceState https://t.co/HiQJzLdbdf",[hkpolice],[],attack firefighter fire tear gas fire engine operating\n\nabsolutely disgust \n\nyou support emergency service personnel fire service hospital worker medics)\n\nsource fb\n\n,0.488,0.428,0.085,-0.9206,166,20,1
5730,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. #followbackhongkong",hong kong,708.0,680.0,27446.0,2018-03-04 14:47:07,2019-11-03 03:36:01,22.0,2 Nov 2019 Mongkok\n#HKPolice bombed people on the street with tear gas right over their heads\n#Murderer #HKRioters #HKPoliceBrutality #HKPoliceTerrorism #HKHumanRightsandDemocracyAct \n\n@benedictrogers @HawleyMO @WhiteHouse @senatemajldr @SolomonYue https://t.co/cIIzZ4PbHP,"[hkpolice, murderer, hkrioters]","[@benedictrogers, @HawleyMO, @WhiteHouse, @senatemajldr, @SolomonYue]",nov mongkok\n bomb people street tear gas right heads\n \n\n,0.286,0.714,0.000,-0.4939,56,10,2
5734,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. #followbackhongkong",hong kong,708.0,680.0,27446.0,2018-03-04 14:47:07,2019-11-03 03:35:59,636.0,#SOSHK It is worrying that #HongKong police allegedly target young female protestors for unjustified mass arrests. \nShot in HK in the city centre near Southern Hotel https://t.co/DFKewXWFYs,"[soshk, hongkong]",[],worry police allegedly target young female protestors unjustified mass arrest \nshot hk city centre near southern hotel,0.261,0.739,0.000,-0.6486,118,17,1
5754,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. #followbackhongkong",hong kong,708.0,680.0,27446.0,2018-03-04 14:47:07,2019-11-03 03:35:51,678.0,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away,0.364,0.498,0.137,-0.7430,152,23,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
240453,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. 🖐🏻😷🎗#followbackhongkong",hong kong,847.0,849.0,40113.0,2018-03-04 14:47:07,2019-11-19 16:15:14,7.0,"This is a video of last midnight,\nHong Kong Police arrest citizens on a roof top.hand tied behind their backs.\n\nWhat did they do?You need to tied them like this??unbelievable!!\n\n#HKPoliceTerrorism \n#soshongkong https://t.co/Ft6DLec2BL",[],[],"video last midnight,\nhong kong police arrest citizen roof top.hand tie behind backs.\n\nwhat do?you need tie like this??unbelievable!!\n\n",0.203,0.689,0.108,-0.3013,134,17,2
240553,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. 🖐🏻😷🎗#followbackhongkong",hong kong,847.0,849.0,40113.0,2018-03-04 14:47:07,2019-11-19 16:14:49,7.0,@hklemontea Tear gas was shot by #HKPoliceTerrorism https://t.co/NcvNlcK2cP,[hkpoliceterrorism],[@hklemontea],tear gas shot,0.000,1.000,0.000,0.0000,13,3,3
240580,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. 🖐🏻😷🎗#followbackhongkong",hong kong,847.0,849.0,40113.0,2018-03-04 14:47:07,2019-11-19 16:14:44,6293.0,"Under the Sino-Brit Joint Declaration China agreed to respect Hong Kong’s sovereignty. This weekend, mainland Chinese soldiers were deployed to Hong Kong's streets\n\nTrump admin officials should loudly oppose China's aggression &amp; support #HongKongProtests\nhttps://t.co/jxNBI1pRDt",[],[],sino-brit joint declaration china agree respect hong kong’s sovereignty weekend mainland chinese soldier deploy hong kong's streets\n\ntrump admin official loudly oppose china's aggression amp support,0.068,0.677,0.255,0.7269,198,25,5
240756,xzxzanalazy,"hongkonger, my love is for liverpool!!!! fight for freedom, please stand with hong kong. 🖐🏻😷🎗#followbackhongkong",hong kong,847.0,849.0,40113.0,2018-03-04 14:47:07,2019-11-19 16:14:05,20.0,The first of the group that made a last minute dash for freedom from Polytechnic University are taken away in handcuffs. The standoff is nearing an end. #HongKong https://t.co/F4yRI0k5hg,[],[],first group make last minute dash freedom polytechnic university take away handcuffs standoff near end,0.000,0.769,0.231,0.6369,102,15,5


In [71]:
data

Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
0,five5demands,hker fight for freedom 當獨裁成為事實 革命就是義務 #followbackhk,n.a,437.0,260.0,466.0,2019-10-05 13:40:37,2019-11-03 02:57:49,292.0,"Disgusting: Police in #HongKong are marking protestors with a pen, and writing a number on their hands just like the Nazis did in Germany. #China https://t.co/aAF6Jw7yXN",[hongkong],[],disgust police mark protestors pen write number hand like nazi germany,0.218,0.391,0.391,0.2732,70,11,4
1,Dejavu53328974,#followbackhongkong\n#standwithhk\n#hkhumanrightsanddemocracyact\n #科勞手足\n#hongkongprotesters,hong kong,581.0,360.0,7835.0,2019-08-27 15:47:34,2019-11-03 02:57:49,1315.0,"#China-made teargas canister stuck in the road. Citizen tried to pull it out, but failed\nCredit - @hkgetv \n#PoliceState #StandWithHongKong #FreeHongKong #HongKong #HongKongProtests https://t.co/IkyHfFtrPH","[china, policestate]",[@hkgetv],teargas canister stick road citizen try pull failed\ncredit,0.256,0.543,0.202,-0.1779,58,8,2
2,tsksimon,#fightforfreedom #standwithhongkong,n.a,825.0,456.0,6728.0,2014-10-05 15:57:07,2019-11-03 02:57:49,105.0,"This is only tiny bits of #HKPoliceBrutality.\nWe've been protesting for 5 months, yet the govn't still haven't established an Independent Commission of Inquiry into police conduct -- #HKPolice are allowed to torture and insult civilians w/o consequences. #HongKongProtests #SOSHK https://t.co/Ew6yVG5p8V",[hkpolicebrutality],[],tiny bit \nwe've protest month yet govn't still establish independent commission inquiry police conduct allow torture insult civilian w/o consequence,0.339,0.590,0.070,-0.8074,148,20,1
3,JimmyWo67187904,n.a,n.a,22.0,1.0,656.0,2019-10-08 12:58:51,2019-11-03 02:57:48,660.0,"A fireman complained to the #HongKongPolice for the unnecessary firing of Tear Gas that landed on the fire engine. Fireman was affected by the TG as well.\n\nA few police had lost their cool; confronted, ”we are dispersing crowd” and pushed the fireman away. \n#hkpolicestate https://t.co/AAGFkuzmxL",[hongkongpolice],[],fireman complain unnecessary firing tear gas land fire engine fireman affect tg well.\n\na police lose cool confront ”we disperse crowd” push fireman away,0.364,0.498,0.137,-0.7430,152,23,1
4,ARCHI418,n.a,n.a,19.0,11.0,121.0,2014-06-19 16:19:36,2019-11-03 02:57:46,0.0,"#police threw a #Teargas hand #grenade. It blasted on a citizen’s back and made a large area burn. It is not tear gas, it is read hand grenade and killing #hkger. #PoliceState #PoliceTerrorism #HKprotests https://t.co/hcDG6FNd7z","[police, teargas, grenade, hkger, policestate, policeterrorism, hkprotests]",[],threw hand blast citizen’s back make large area burn tear gas read hand grenade kill,0.203,0.519,0.277,0.1779,84,15,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
248646,zizihui1,"five demands, not one less 🇭🇰#followbackhongkong #科勞手足",hong kong,505.0,326.0,1301.0,2019-09-25 17:05:21,2019-11-19 17:19:37,72.0,"@marcorubio CCP is sending PLA to Hong Kong already, using exactly the same tactics as it invaded East Turkestan\nCCP has no intention to keep ""one country, two systems"", otherwise HKers would have genuine autonomy 22 years ago, CCP is not trustworthy \n#antichinazi https://t.co/l5FQVFBFC1",[],[@marcorubio],ccp send pla hong kong already use exactly tactic invade east turkestan\nccp intention keep one country two system otherwise hkers would genuine autonomy year ago ccp trustworthy,0.000,0.882,0.118,0.5574,177,27,5
248647,Candy96023268,#手足互科 #standwithhongkong,hong kong,15.0,155.0,116.0,2019-09-12 06:32:09,2019-11-19 17:19:36,796.0,"As violence continues to escalate in Hong Kong, the cause for freedom has never been greater. The world is watching as police shamefully attack students. I will continue to support &amp; stand w the students &amp; brave activists fighting to preserve freedom in #HongKong. https://t.co/mMncxgAehL",[],[],violence continue escalate hong kong cause freedom never great world watch police shamefully attack student continue support amp stand student amp brave activist fight preserve freedom,0.337,0.358,0.305,-0.1265,184,26,2
248648,Vincent03231883,hongkonger never surrender\n願榮光歸香港\n#followbackhongkong #科勞手足,n.a,248.0,170.0,362.0,2019-09-28 14:29:13,2019-11-19 17:19:36,1162.0,“We call on Chief Exec. Carrie Lam to promote accountability by supplementing the Independent Police Complains Council review with an independent investigation into the protest-related incidents.” - @SecPompeo on the political unrest in #HongKong. READ: https://t.co/RZMCdcgLQN https://t.co/i00X8iM8ky,[],[@SecPompeo],“we call chief exec carrie lam promote accountability supplement independent police complains council review independent investigation protest-related incidents.” political unrest read,0.107,0.785,0.107,0.0000,184,21,3
248649,terryyipomni,hongkonger 🇭🇰 stand with hong kong! fight for freedom! 香港人加油!! #followbackhongkong #fivedemandsnotoneless #standwithhongkong #科勞手足 #fightforfreedom,hong kong,4653.0,2490.0,4627.0,2015-03-09 06:57:20,2019-11-19 17:19:36,9.0,"Huh excuse me😃, Hong Kong government lie again and again \n.\n#SOSHK\n#StandWithHongKong \n#HongKongProtests \n#HongKongPoliceTerrorists \n#PolyU https://t.co/W5vRNMePOR","[soshk, standwithhongkong, hongkongprotests]",[],huh excuse hong kong government lie \n.\n\n,0.000,0.794,0.206,0.0772,40,7,3


In [72]:
original_df = data[(data['retweetcount'] > 20) & (data['totaltweets'] > 20) &  (data['followers'] > 20)]

In [73]:
# Now, let's find the rows where user_name is 'SolomonYue'
original_df[original_df['username'] == 'SolomonYue']



Unnamed: 0,username,acctdesc,location,following,followers,totaltweets,usercreatedts,tweetcreatedts,retweetcount,text,hashtags,mentioned_users,cleaned_text,neg,neu,pos,compound,numchars,numwords,sentiment_class
107827,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",240.0,87014.0,10967.0,2013-04-09 15:38:48,2019-11-09 02:26:03,581.0,Picking up my passport to travel to Hong Kong to pray and be a witness for freedom! I also invite you to join me in Chater Garden Nov. 15-18 at 9 AM and 7 PM! #StandWithHongKong #HongKong ⁦@Andychanhotin⁩ ⁦@FreedomHKG⁩ ⁦@Fight4HongKong⁩ ⁦@mschlapp⁩ https://t.co/F91UmnT1s2,[],"[@Andychanhotin⁩, @FreedomHKG⁩, @Fight4HongKong⁩, @mschlapp⁩]",pick passport travel hong kong pray witness freedom also invite join chater garden nov pm,0.0,0.516,0.484,0.8519,89,15,5
110216,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",240.0,87028.0,10969.0,2013-04-09 15:38:48,2019-11-09 02:47:15,45.0,"Queen Elizabeth Hospital staff set up a temporary memorial last night with flowers, paper cranes and a Lennon wall in memory of HKUST student Alex Chow. Source: USP Photo: Kevin Cheng #StandWithHongKong #RIPChow https://t.co/Mlen9hh8om",[],[],queen elizabeth hospital staff set temporary memorial last night flower paper crane lennon wall memory hkust student alex chow source usp photo kevin cheng,0.0,1.0,0.0,0.0,155,24,3
110665,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",240.0,87030.0,10969.0,2013-04-09 15:38:48,2019-11-09 02:43:18,688.0,Watch #HongKong protesters completely dismantle a road barricade in 22 seconds\n😭😭😭😭😭 \nhttps://t.co/2IMjHNBQ5T https://t.co/USVf4nQ8Dz,[hongkong],[],watch protester completely dismantle road barricade seconds\n,0.0,1.0,0.0,0.0,60,7,3
122329,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",240.0,87304.0,11000.0,2013-04-09 15:38:48,2019-11-09 16:24:40,38.0,IPCC hires intl experts.\nIntl experts complaints IPCC lack of capacity. #antiELAB #AntiMaskLaw #HongKongProtests https://t.co/WabhmtLaWw,"[antielab, antimasklaw, hongkongprotests]",[],ipcc hire intl experts.\nintl expert complaint ipcc lack capacity,0.36,0.64,0.0,-0.5423,64,9,2
154610,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",241.0,89385.0,11136.0,2013-04-09 15:38:48,2019-11-11 13:50:21,35.0,@SolomonYue 9/6/1987 Yonsei University and 11/11/2019 Hong Kong Chinese University 😢🇨🇳🔪🇭🇰☠️ WHERE IS JUSTICE? #StandwithHK https://t.co/hdmqXIPiS2,[standwithhk],[@SolomonYue],yonsei university hong kong chinese university justice,0.0,0.638,0.362,0.5267,54,7,4
173026,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",241.0,90521.0,11263.0,2013-04-09 15:38:48,2019-11-12 14:53:37,447.0,".@standnewshk reporter: ""Why can't the police inspector directly order frontline police to stop firing?""\n\nDennis Ng, Pro-VC of #CUHK, ""They did already, a few times,"" shaking his head. Au Nok-hin adds that there's a disconnect between inspector &amp; frontline police. \n#HongKong #HK https://t.co/qQC9uM0nUc",[],[@standnewshk],"reporter can't police inspector directly order frontline police stop firing?""\n\ndennis ng pro-vc already time shake head au nok-hin add there's disconnect inspector amp frontline police",0.14,0.86,0.0,-0.4404,184,25,2
219520,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",244.0,97986.0,11965.0,2013-04-09 15:38:48,2019-11-16 13:49:16,218.0,"According to the #BasicLaw and the underlying #1C2S principles, #PLA can *ONLY* be deployed under #HongKong Government's REQUEST for 2 reasons:\n1. Maintain public order\n2. Disaster relief\n \nIs PLA picking up bricks requested by HK Government? Is it about public order / disaster? https://t.co/Y8mFo5SR0I https://t.co/mFdQE9gZ2U","[basiclaw, 1c2s, pla, hongkong]",[],accord underlying principle deploy government's request maintain public disaster relief\n \nis pla pick brick request hk government public order disaster,0.29,0.601,0.11,-0.7269,151,20,1
220867,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",244.0,97991.0,11965.0,2013-04-09 15:38:48,2019-11-16 13:42:51,406.0,"Pan-dems say Saturday's PLA deployment in #HongKong breached the law one way or another, while a group of protesters warn ""voluntary service"" today can easily be ""violent suppression"" tomorrow https://t.co/Cv9RowPkHS",[hongkong],[],pan-dems say saturday's pla deployment breach law one way another group protesters warn voluntary service today easily violent suppression tomorrow,0.281,0.625,0.094,-0.5859,147,20,1
221360,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",244.0,98028.0,11969.0,2013-04-09 15:38:48,2019-11-16 14:15:09,835.0,"Highly contentious move as the PLA directly intervenes to clear protester barricades. Under the Basic Law, the PLA “shall not interfere in the local affairs” of the SAR. Carrie Lam silent on this so far #HK #HongKongProtests #StandWithHongKong https://t.co/Ba0YNtxevt",[],[],highly contentious move pla directly intervene clear protester barricade basic law pla “shall interfere local affairs” sar carrie lam silent far,0.103,0.789,0.108,0.0276,144,21,3
229015,SolomonYue,"vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee","salem, oregon, usa",244.0,98231.0,11980.0,2013-04-09 15:38:48,2019-11-16 15:22:13,81.0,Hong Kong journalism groups condemn alleged police firing of projectile at reporter as officer put on leave https://t.co/zXwqbgzpWu @krislc #HongKong #China #antielab #antiELABhk #HongKongProtests,[],[@krislc],hong kong journalism group condemn allege police fire projectile reporter officer put leave,0.383,0.617,0.0,-0.6369,91,13,1


In [74]:
original_df.shape

(123104, 20)

In [75]:
def find_users(df):
    # df: dataframe to look at
    # returns a list of usernames
    
    # Create empty list
    list_users = []
    
    for i in range(0, len(df)):
        users_ith_text = re.findall('@[^\s]+', df.iloc[i,:]['text'])
        # returns a list
        # append to list_users by going through a for-loop:
        for j in range(0, len(users_ith_text)):
            list_users.append(users_ith_text[j])
    
    return list_users

In [76]:
# Apply on dataframe data['text']
list_users = find_users(data)

mentioned_users_df = pd.DataFrame({
    'mentioned_users': list_users
})

mentioned_users_df.head()

Unnamed: 0,mentioned_users
0,@hkgetv
1,@SenateMajLdr
2,@SolomonYue
3,@SenRickScott
4,@seariousforhkg


In [77]:
# Save the DataFrame to a CSV file
original_df.to_csv('original_df.csv', index=False)


In [78]:
original_df = pd.read_csv('/Users/shiruizhou/Desktop/network analysis/hk_2019_social network/original_df.csv')

In [None]:
original_df.shape

(123104, 20)

In [1]:
unique_username_count = original_df['username'].nunique()
print(unique_username_count)

NameError: name 'original_df' is not defined

In [81]:
import pandas as pd

# Assuming 'original_df' is your original DataFrame containing tweet and user information

# Create a dictionary to store user information and their associated tweets
user_data = {}

# Iterate through the original DataFrame
for index, row in original_df.iterrows():
    username = row['username']
    user_info = {
        'following': row['following'],
        'followers': row['followers'],
        'totaltweets': row['totaltweets'],
        'usercreatedts': row['usercreatedts'],
        'location': row['location'],
        'acctdesc': row['acctdesc']
    }
    tweet_info = {
        'tweetcreatedts': row['tweetcreatedts'],
        'retweetcount': row['retweetcount'],
        'hashtags': row['hashtags'],
        'mentioned_users': row['mentioned_users'],
        'cleaned_text': row['cleaned_text'],
        'compound': row['compound'],    # Added 'compound' field
        'sentiment_class': row['sentiment_class']    # Added 'sentiment_class' field
    }
    # If the user already exists in the dictionary, add the tweet information to the user's list of tweets
    if username in user_data:
        user_data[username]['tweets'].append(tweet_info)
    # If the user does not exist, create a new entry with the user and tweet information
    else:
        user_data[username] = {'user_info': user_info, 'tweets': [tweet_info]}

# Create a list to store rows for the transformed DataFrame
transformed_rows = []

# Iterate through the user_data dictionary to create rows for the transformed DataFrame
for username, user_info in user_data.items():
    user_row = {'username': username}
    # Add user information to the user_row
    user_row.update(user_info['user_info'])
    # Add tweet information as separate columns for each tweet associated with the user
    for i, tweet in enumerate(user_info['tweets'], start=1):
        user_row[f'tweetcreatedts_{i}'] = tweet['tweetcreatedts']
        user_row[f'retweetcount_{i}'] = tweet['retweetcount']
        user_row[f'hashtags_{i}'] = tweet['hashtags']
        user_row[f'mentioned_users_{i}'] = tweet['mentioned_users']
        user_row[f'cleaned_text_{i}'] = tweet['cleaned_text']
        user_row[f'compound_{i}'] = tweet['compound']    # Added 'compound' field
        user_row[f'sentiment_class_{i}'] = tweet['sentiment_class']    # Added 'sentiment_class' field
    transformed_rows.append(user_row)

# Create the transformed DataFrame
transformed_df = pd.DataFrame(transformed_rows)

# Display the transformed DataFrame
print(transformed_df)


In [None]:
transformed_df[transformed_df['cleaned_text_332'].notnull()]


Unnamed: 0,username,following,followers,totaltweets,usercreatedts,location,acctdesc,tweetcreatedts_1,retweetcount_1,hashtags_1,...,cleaned_text_331,compound_331,sentiment_class_331,tweetcreatedts_332,retweetcount_332,hashtags_332,mentioned_users_332,cleaned_text_332,compound_332,sentiment_class_332
9813,belleng324,2562.0,730.0,37698.0,2019-08-22 09:11:37,n.a,n.a,2019-11-08 16:37:35,100.0,[],...,break car windows baton attempt arrest driver ...,-0.6996,1.0,2019-11-12 14:31:16,187.0,['hongkongprotests'],"['@annielab_jmsc', '@business:']",spot fake news \n\nuniversity hong kong studen...,-0.4767,2.0


In [None]:
pd.set_option('display.max_colwidth', None)

In [None]:
transformed_df[transformed_df['username'] == 'SolomonYue']

Unnamed: 0,username,following,followers,totaltweets,usercreatedts,location,acctdesc,tweetcreatedts_1,retweetcount_1,hashtags_1,...,cleaned_text_331,compound_331,sentiment_class_331,tweetcreatedts_332,retweetcount_332,hashtags_332,mentioned_users_332,cleaned_text_332,compound_332,sentiment_class_332
11304,SolomonYue,240.0,87014.0,10967.0,2013-04-09 15:38:48,"salem, oregon, usa","vice chairman & ceo at republicans overseas, rnc member since 2000, co-founder of rnc republican national conservative caucus & conservative steering committee",2019-11-09 02:26:03,581.0,[],...,,,,,,,,,,


In [None]:
transformed_df.to_csv('transformed_df.csv', index=False)

### Hashtag and time windows

In [None]:
data

NameError: name 'data' is not defined

In [None]:
from datetime import datetime

ind_to_drop = []
date = []

# First find out which 'tweetcreatedts' is not a string or in other weird formats
for i in range(0, len(data)):
    ith_date_str = data.iloc[i,:]['tweetcreatedts']
    ith_match = re.search(r'\d{4}-\d{2}-\d{2}', ith_date_str)
    if ith_match == None:
        ind_to_drop.append(i)
    else:
        continue

NameError: name 'data' is not defined