The code hosted under this repository is for research and may be undocumented/partially documented and is primarily provided to help researchers replicate results.

centre-for-humanities-computing/china-twitter

Tables

Table 1: Dataset

Data collection
Date range Nov 1, 2019 - Feb 28, 2021
Days collected 486
Handles collected 46
Diplomacy 34
Media 12
Datasets Number of cases
1. Total original tweets and retweets collected (network analysis) 343.148
1a. Diplomacy original tweets 37.344
1b. Diplomacy retweets 23.512
1c. Media original tweets 253.578
1d. Media retweets 28.714
2. Subsample of original tweets in English (topic analysis) 239.943
2a. Diplomacy 25.830
2b. Media 214.113
3. Coded sample of diplomacy original tweets in English (discourse analysis) 4.879

Table 2: Diplomacy

@handle User Followers Followings Original tweets in period Original tweets in English Retweets by handle in period Retweets in English Hashtags in original tweets Hashtags in retweets Total tweets in period Total tweets since created Date created
@Amb_ChenXu CHEN Xu, Ambassador, Permanent Representative of the P.R.C. to the U.N. office in Geneva, Switzerland 8036 68 299 298 49 46 22 41 348 414 2019.12
@AmbassadeChine Embassy of the P.R.C. in Paris, France 36568 1136 5485 136 1365 263 2349 2290 6850 8350 2019.08
@AmbCina Embassy of the P.R.C. in Rome, Italy 35146 151 1553 0 137 95 3715 160 1690 2866 2018.05
@AmbCuiTiankai CUI Tiankai, former Chinese Ambassador to the U.S. 135871 43 260 250 29 28 150 33 289 416 2019.06
@AmbLiuXiaoMing LIU Xiaoming, former Chinese Ambassador to the U.K. 121828 46 4606 4465 11 11 2865 10 4617 4745 2019.10
@CCGBelfast Consulate General of the P.R.C. in Belfast, U.K. 964 8 2 2 2 2 0 0 4 4 2020.03
@China_Lyon Consulate General of the P.R.C. in Lyon, France 721 323 25 2 179 8 10 164 204 300 2020.03
@ChinaAmbUN ZHANG Jun, Ambassador, Permanent Representative of the P.R.C. to the U.N. 10395 231 406 381 294 290 174 218 700 1390 2020.02
@ChinaCGCalgary Consulate General of the P.R.C. in Calgary, Canada 2993* 196* 1292 1260 16 16 49 5 1308 2305 2019.12
@chinacgedi Consulate General of the P.R.C. in Edinburgh, U.K. 1463 23 37 37 89 86 8 115 126 241 2020.02
@ChinaCGMTL Consulate General of the P.R.C. in Montreal, Canada 484* 68* 103 26 86 57 39 100 189 1430 2020.01
@ChinaConsulate Consulate General of the P.R.C. in Chicago, U.S. 3676 146 442 428 81 81 139 61 523 1309 2017.02
@ChinaConSydney Consulate General of the P.R.C. in Sydney, Australia 5851* 340* 2081 2066 333 321 2285 329 2414 4746 2020.04
@ChinaEmbGermany Embassy of the P.R.C. in Berlin, Germany 4409 214 929 50 449 354 1217 431 1378 1820 2019.12
@ChinaEmbOttawa Embassy of the P.R.C. in Ottawa, Canada 13176* 335* 559 543 687 680 27 768 1246 4723 2014.06
@ChinaEUMission Mission of the P.R.C. to the E.U. 21131 1913 1865 1675 292 286 1427 293 2157 10000 2013.09
@ChinaInDenmark Embassy of the P.R.C. in Copenhagen, Denmark 1255 511 488 470 300 291 108 361 788 2099 2017.05
@ChinainVan Consulate General of the P.R.C. in Vancouver, Canada 716* 57* 1 1 2 2 0 0 3 727 2021.02
@Chinamission2un Mission of the P.R.C. to the U.N. 63414 619 1292 1263 1637 1617 796 1613 2929 4744 2015.04
@ChinaMissionGva Mission of the P.R.C. to the U.N. office in Geneva, Switzerland 3482 175 1146 1069 1401 1379 584 1076 2547 4526 2015.05
@ChinaMissionVie Mission of the P.R.C. to the U.N. office in Vienna, Austria 3976 396 655 654 149 148 924 123 804 1081 2019.10
@chinascio State Council Information Office of the P.R.C. 47975 172 2478 2465 102 100 4062 103 2580 16800 2015.09
@ChineseEmbinUK Embassy of the P.R.C. in London, U.K. 28965 40 1670 1395 139 138 846 110 1809 2456 2019.11
@ChineseEmbinUS Embassy of the P.R.C. in Washington, D.C., U.S. 86782 255 1347 1262 648 630 1256 681 1995 2375 2019.06
@CHN_UN_NY Spokesperson of Mission of the P.R.C. to the U.N. 1438 54 254 253 969 942 24 673 1223 1906 2020.05
@ChnConsul_osaka Consulate General of the P.R.C. in Osaka, Japan 18171* 741* 477 1 673 59 99 344 1150 3125 2019.09
@ChnEmbassy_jp Embassy of the P.R.C. in Tokyo, Japan 91103* 774* 1110 7 500 103 27 168 1610 5172 2014.04
@ChnMission LIU Yuyin, Spokesperson, Permanent Representative of the P.R.C. to the U.N. office in Geneva, Switzerland 1011 97 11 10 189 178 0 22 200 1158 2020.01
@consulat_de Consulate General of the P.R.C. in Strasbourg, France 955 361 565 6 1412 513 1631 1642 1977 2482 2020.02
@GeneralkonsulDu DU Xiaohui, Consul General, Consulate General of the P.R.C. to Hamburg, Germany 1581 85 291 23 325 102 292 405 616 690 2020.02
@MFA_China Ministry of Foreign Affairs, Beijing, P.R.C. 298033 160 1630 1595 1754 1234 636 1222 3384 4177 2019.10
@SpokespersonCHN HUA Chunying, Spokesperson & Director General, Information Department, Ministry of Foreign Affairs, Beijing, P.R.C. 894507 159 2223 2042 132 122 2136 100 2355 3522 2019.10
@SpokespersonHZM HU Zhaoming, Spokesperson & Director General, Bureau of Public Information and Communication, International Department, C.P.C. Central Committee, Beijing, P.R.C. 7707 35 97 97 0 0 13 0 97 150 2020.04
@zlj517 ZHAO Lijian, Spokesperson & Deputy Director General, Information Department, Ministry of Foreign Affairs, Beijing, P.R.C. 960093 174492 1665 1598 9081 8423 326 8506 10746 65400 2010.05

(*) metadata retrieved on 25.02.2022, whereas the rest were retrieved on 21.06.2021

Table 3: Media

@handle User Followers Followings Original tweets in period Original tweets in English Retweets by handle in period Retweets in English Hashtags in original tweets Hashtags in retweets Total tweets in period Total tweets since created Date created
@CGTNOfficial China Global Television Network (CGTN) 13528250 70 44640 43984 12883 12825 44498 7598 57523 174500 2013.01
@chenweihua CHEN Weihua, China Daily E.U. Bureau Chief and columnist 98677 2814 10015 9417 12768 12349 0 4436 22783 38400 2009.11
@ChinaDaily China Daily 4284437 537 38612 38412 1658 1649 69934 3343 40270 152400 2009.11
@CNS1952 China News Service 475273 146 20204 0 3 0 4877 1 20207 59000 2013.07
@globaltimesnews Global Times 1870039 520 59120 58646 573 568 80108 794 59693 191100 2009.06
@HuXijin_GT HU Xijin, Global Times Editor-in-chief 439720 670 880 880 14 14 2 18 894 2551 2014.08
@PDChina People's Daily 6928270 4360 17972 17933 93 93 18144 111 18065 99400 2011.05
@PDChinese People's Daily (Chinese) 753245 332 13577 0 0 0 3851 0 13577 52300 2013.06
@QiushiJournal Qiushi Journal 1691 158 133 128 0 0 352 0 133 388 2020.05
@shen_shiwei SHEN Shiwei, CGTN News Producer 36485 4956 4922 4554 377 357 9740 398 5299 6902 2012.05
@XHNews Xinhua News 12395089 65 40199 40019 29 29 21828 10 40228 202300 2012.02
@XinWen_Ch Voice of China 4242 1221 3304 140 316 26 208 79 3620 3793 2019.12

Table 4: Followings within the network

@Amb_ChenXu follows… @AmbassadeChine follows… @AmbCina follows… @AmbCuiTiankai follows… @AmbLiuXiaoMing follows… @CCGBelfast follows… @China_Lyon follows… @ChinaAmbUN follows… @ChinaCGCalgary follows… @chinacgedi follows… @ChinaCGMTL follows… @ChinaConsulate follows… @ChinaConSydney follows… @ChinaEmbGermany follows… @ChinaEmbOttawa follows… @ChinaEUMission follows… @ChinaInDenmark follows… @ChinainVan follows… @Chinamission2un follows… @ChinaMissionGva follows… @ChinaMissionVie follows… @chinascio follows… @ChineseEmbinUK follows… @ChineseEmbinUS follows… @CHN_UN_NY follows… @ChnConsul_osaka follows… @ChnEmbassy_jp follows… @ChnMission follows… @consulat_de follows… @GeneralkonsulDu (deleted account) @MFA_China follows… @SpokespersonCHN follows… @SpokespersonHZM follows… @zlj517 follows… @CGTNOfficial follows… @chenweihua follows… @ChinaDaily follows… @CNS1952 follows… @globaltimesnews follows… @HuXijin_GT follows… @PDChina follows… @PDChinese follows… @QiushiJournal follows… @shen_shiwei follows… @XHNews follows… @XinWen_Ch follows…
@Amb_ChenXu N/A x x x x x x x x x x x x x x x x x x x x x x x x x 25
@AmbassadeChine x N/A x x x x x x x x x x x x x x x x x x x x x x x x x x x 28
@AmbCina x N/A x x x x x x x x x x x x x x x 16
@AmbCuiTiankai x x x N/A x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x 38
@AmbLiuXiaoMing x x x N/A x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x 39
@CCGBelfast N/A x x x x x x x x x x 10
@China_Lyon x N/A x x x x x x x x x x x x 13
@ChinaAmbUN x x x x N/A x x x x x x x x x x x x x x x x x x x x x x x x x x 30
@ChinaCGCalgary x x x N/A x x x x x x x x x x x x x x x x x x 21
@chinacgedi x x x N/A x x x x x x x x x 12
@ChinaCGMTL x x N/A x x x x x x 8
@ChinaConsulate x x x x x x N/A x x x x x x x x x x x x x x x 21
@ChinaConSydney x x x x N/A x x x x x x x x x x x x x x 18
@ChinaEmbGermany x x x x N/A x x x x x x x x x x x x x 17
@ChinaEmbOttawa x x x x x x x x N/A x x x x x x x x x x x x x x x x x x x x 28
@ChinaEUMission x x x x x x x x x x x x N/A x x x x x x x x x x x x x x x x x x x x x 33
@ChinaInDenmark x x x x N/A x x x x x x x 11
@ChinainVan x x x N/A x x x x x 8
@Chinamission2un x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x x x x x x x x x x x 37
@ChinaMissionGva x x x x x x x N/A x x x x x x x x x x x 18
@ChinaMissionVie x x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x x x 30
@chinascio x x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x x x x x x 33
@ChineseEmbinUK x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x x 33
@ChineseEmbinUS x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x x x x x x 37
@CHN_UN_NY x x x x x N/A x x x x x 10
@ChnConsul_osaka x x x x x x x x x x N/A x x x x x x 16
@ChnEmbassy_jp x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x 27
@ChnMission x x x x x x x N/A x x x x 11
@consulat_de x x x x x x x N/A x x x x 11
@GeneralkonsulDu (deleted account) N/A 0
@MFA_China x x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x 41
@SpokespersonCHN x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x x x x x x x 41
@SpokespersonHZM x x x x x x x x x x x x x N/A x x x x x x 19
@zlj517 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x x x 40
@CGTNOfficial x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x x x 37
@chenweihua x x x x x x x x x x x x x x N/A x x x x x 19
@ChinaDaily x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x x x x 38
@CNS1952 x x x x x x x x x N/A x x 11
@globaltimesnews x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x x x 37
@HuXijin_GT x x x x x x x x x x x x x x x x x N/A x x x x 21
@PDChina x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A x x x x 39
@PDChinese x x x x x x x x x x x x x x x x x x x x N/A x x 22
@QiushiJournal x x x x x x x x x x x x x x N/A x 15
@shen_shiwei x x x x x x x x x x x x x x x N/A x 16
@XHNews x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x N/A 39
@XinWen_Ch x N/A 1
15 36 20 8 9 29 38 22 29 15 17 26 30 29 23 40 18 20 28 22 38 25 17 34 11 39 29 20 30 0 33 34 13 43 7 36 22 17 21 18 19 6 30 39 7 13

Table 5a: Popular hashtags in original tweets and retweets from all diplomats

Hashtags Number of cases
#covid19 4783
#china 3537
#coronavirus 1355
#xinjiang 1102
#us 994
#chine 855
#hongkong 753
#wuhan 572
#cina 498
#xijinping 478
#cpec 347
#beijing 340
#bri 294
#ciie 254
#5g 246
#covid 245
#covid_19 240
#who 189
#shanghai 188
#un 180
#poverty 177
#畅游友城 176
#tibet 174
#voyagezdanslesvilleschinoisesjumelées 171
#pompeo 163
#pressroomhighlights 161
#chinese 158
#multilateralism 155
#vaccine 144
#unsc 143

Table 5b: Popular hashtags in original tweets from MFA and SCIO

Hashtags Number of cases
#covid19 772
#china 484
#us 342
#xinjiang 162
#pressroomhighlights 159
#coronavirus 141
#poverty 116
#wuhan 98
#hongkong 91
#beijing 71
#trade 67
#5g 60
#tibet 59
#pompeo 56
#beltandroad 54
#vaccines 51
#economy 47
#springfestival 44
#xijinping 41
#un 38
#twosessions 37
#cpc 36
#ciie 34
#humanrights 34
#nationalsecurity 33
#who 33
#vaccine 32
#economic 31
#gdp 31
#multilateralism 31

Table 5c: Popular hashtags in original tweets from embassies, ambassadors and missions

Hashtags Number of cases
#covid19 2612
#china 1935
#xinjiang 633
#coronavirus 599
#chine 514
#cina 485
#hongkong 450
#wuhan 278
#xijinping 247
#畅游友城 176
#covid 172
#voyagezdanslesvilleschinoisesjumelées 171
#us 159
#covid_19 152
#beijing 138
#5g 136
#ciie 131
#xizang 124
#shanghai 115
#nationalsecuritylaw 114
#魅力疆藏 102
#who 100
#uk 98
#covidー19 90
#jiangsu 90
#chinese 89
#multilateralism 89
#climatechange 85
#hksar 85
#forzacinaitalia 83

Table 6: Foreign influencers mentioning Chinese diplomats and media

@handle User
@AndyBxxx Andy Boreham (Reports On China)
@BarrettYouTube Lee and Oli Barrett
@BeehiveChina Barrie Jones (Best China Info)
@ChinaTeacher1 Fernando Munoz Bernal (FerMuBe)
@DanielDumbrill Daniel Dumbrill
@JaYoeLife Matthew Galat
@Jingjing_Li Li Jingjing
@LivingChina Jason Lightfoot (Living in China)
@Noel_Calibre Noel Lee
@thecyrusjanssen Cyrus Janssen

Influencers identified by ASPI: https://www.aspi.org.au/report/borrowing-mouths-speak-xinjiang

Figures

Figure 1: Top mentionees (in-degree) and mentioners (out-degree)

Figure 2: Total mentions to handles in network from all Twitter users

Figure 3: Total mentions between handles in network

Figure 4: Mentionees in network (in-degree)

Figure 5: Mentioners in network (out-degree)

Figure 6 (diplomacy) open here

Figure 7 (media) open here

Figure 8a

Figure 8b

Figure 8c

Figure 8d

Extract data

The code used for creating dataframes from the JSON files can be found in the extract_data folder.

Topic model

Topic modelling was performed with Latent Dirichlet Allocation (LDA) using the gensim package in Python (see documentation: https://radimrehurek.com/gensim_3.8.3/models/ldamodel.html). LDA is a hierarchical Bayesian model with three levels, in which each item of a collection, in this case each tweet, is modeled as a finite mixture over an underlying set of topics. In turn, each topic is modeled as an infinite mixture over an underlying set of topic probabilities. The topic probabilities provide an explicit representation of each tweet.

A total of 180 models (6 × 6 × 5 hyperparameter combinations) were trained for both diplomat and media tweets, varying the following three hyperparameters:

  • Number of Topics (K)
    • The topic model was trained requesting 10, 15, 20, 25, 30 and 35 latent topics
  • Dirichlet hyperparameter alpha: A-priori document-topic density
    • The topic model was trained using 6 different a-priori beliefs about the document-topic density: 0.01, 0.31, 0.61, 0.91, symmetric and asymmetric
  • Dirichlet hyperparameter beta: A-priori word-topic density
    • The topic model was trained using 5 different a-priori beliefs about the word-topic density: 0.01, 0.31, 0.61, 0.91 and symmetric

The model with the best coherence score is chosen for analysis.
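The hyperparameter grid described above can be enumerated as in the following minimal sketch. It is not the code in gen_model.py: the variable names and the `select_best` helper are hypothetical, and the scores in the example are made up for illustration. The key `eta` is used because that is the name gensim's `LdaModel` gives to the word-topic prior (beta).

```python
from itertools import product

# Values taken from the list above
num_topics = [10, 15, 20, 25, 30, 35]
alphas = [0.01, 0.31, 0.61, 0.91, "symmetric", "asymmetric"]
betas = [0.01, 0.31, 0.61, 0.91, "symmetric"]

# One configuration per (K, alpha, beta) combination
grid = [
    {"num_topics": k, "alpha": a, "eta": b}
    for k, a, b in product(num_topics, alphas, betas)
]


def select_best(scored_models):
    """Return the configuration with the highest coherence score.

    `scored_models` is a list of (configuration, coherence_score) pairs.
    """
    best_config, _ = max(scored_models, key=lambda pair: pair[1])
    return best_config


print(len(grid))  # number of configurations: 6 * 6 * 5

# Hypothetical scores for two configurations, for illustration only
example = [(grid[0], 0.41), (grid[1], 0.47)]
print(select_best(example))
```

Each dictionary in `grid` could then be passed as keyword arguments to a model constructor, with the coherence score computed separately for every trained model.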

Usage:

  1. Navigate to the topic model folder
cd topic_model
  2. Install requirements
  • Install the Python dependencies
pip install -r requirements.txt
  • Download en_core_web_sm
python -m spacy download en_core_web_sm
  3. Lemmatization and cleaning of tweets
python preprocess/prep_text.py
  4. Generate multiple models with a variety of hyperparameters
python preprocess/gen_model.py
  5. Evaluate the topic models created above and determine which is the best one
python preprocess/eval_model.py

Once the models have been generated using the commands above, run the code in topic_model.ipynb to visualize the results. Visualisations of how prevalent each topic was over time (averaged topic weight) can be found in topics_over_time.ipynb.
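As an illustration of what "averaged topic weight" over time means, the sketch below groups per-tweet topic weights by period and averages them. It is not the notebook code; the data layout (a period label plus a list of topic weights per tweet) and the function name are assumptions, and the numbers are toy values.

```python
from collections import defaultdict

# Hypothetical input: one (period, topic-weight vector) pair per tweet
docs = [
    ("2020-01", [0.8, 0.2]),
    ("2020-01", [0.6, 0.4]),
    ("2020-02", [0.1, 0.9]),
]


def average_topic_weights(docs):
    """Average the topic weights of all tweets that fall in the same period."""
    sums = {}
    counts = defaultdict(int)
    for period, weights in docs:
        if period not in sums:
            sums[period] = [0.0] * len(weights)
        for i, w in enumerate(weights):
            sums[period][i] += w
        counts[period] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


averages = average_topic_weights(docs)
for period in sorted(averages):
    print(period, averages[period])
```

Plotting one line per topic over the sorted periods gives the kind of prevalence-over-time view described above.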

Semantic kernel

There are three parameters which can be adjusted:

  • First order associations: Indicates the number of associations wanted from each of the seeds (written in uppercase letters alongside the seeds in the graph)
  • Second order associations: Indicates the number of associations wanted from each of the first order associations (written in lowercase in the graph)
  • Pruning: Can be set to none, soft or hard. According to the pruning settings, words that are not linked closely enough to the rest of the graph across the hierarchical levels are removed.

The input given to the model is lists of seeds. Each list should be written as a separate txt file in the res folder.
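The role of the two association parameters can be pictured with the following toy sketch. It is not the semantic kernel implementation: the `ASSOCIATIONS` table, its scores and both function names are hypothetical, and pruning is left out entirely. In the real pipeline the associations come from the trained model.

```python
# Hypothetical association scores: word -> {neighbour: similarity}
ASSOCIATIONS = {
    "trade": {"tariff": 0.9, "export": 0.8, "economy": 0.7},
    "tariff": {"trade": 0.9, "tax": 0.6, "import": 0.5},
    "export": {"trade": 0.8, "goods": 0.7, "import": 0.6},
}


def top_k(word, k, exclude):
    """Return the k strongest neighbours of `word` that are not in `exclude`."""
    neighbours = ASSOCIATIONS.get(word, {})
    ranked = sorted(neighbours, key=neighbours.get, reverse=True)
    return [w for w in ranked if w not in exclude][:k]


def build_kernel(seeds, n_first, n_second):
    """Expand seeds into first and second order associations (no pruning)."""
    nodes = {seed: "seed" for seed in seeds}
    edges = []
    first_order = []
    for seed in seeds:
        for word in top_k(seed, n_first, nodes):
            nodes[word] = "first"
            edges.append((seed, word))
            first_order.append(word)
    for parent in first_order:
        for word in top_k(parent, n_second, nodes):
            nodes[word] = "second"
            edges.append((parent, word))
    return nodes, edges


nodes, edges = build_kernel(["trade"], n_first=2, n_second=1)
print(nodes)
print(edges)
```

Raising `n_first` or `n_second` widens the graph at the corresponding level, which is the effect of editing the txt file in the assoc folder.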

Usage:

  1. Navigate to the semantic kernel folder
cd semantic_kernel
  2. Prepare data for semantic kernel (the data used is what was preprocessed for the topic model)
python prep_semantic/create_subsets.py

python prep_semantic/csv2txt.py -i data/text_diplomat.csv -o data/data_semantic/text_diplomat
python prep_semantic/csv2txt.py -i data/text_diplomat_orig.csv -o data/data_semantic/text_diplomat_orig
  3. Train model and generate graphs

The first time you run this, make sure to set train to True.

cd semantic_kernel/semantic-kernel

run_diplomats.sh
run_diplomats_orig.sh
  4. Tweaking of parameters
  • Pruning: modify the txt file in the prun folder
  • First and second order associations: modify the txt file in the assoc folder

Network Analysis

Network analysis was performed using the networkx package in Python (https://networkx.org/), and the network visualizations are generated by the file network_main.py (see usage below). Nodes in the networks are Twitter handles, and edges (connections) are weighted by the number of mentions between the Twitter handles that are displayed. The network visualizations only plot Twitter handles that are flagged as either (i) Chinese diplomats or (ii) Chinese media outlets. The edge width (strength of connections) is determined by the number of mentions between Twitter handles of Chinese diplomats and media outlets (see below). The node size (size of handle) is determined by one of the following attributes:

  • total mentions (Figure 2): number of total mentions to the Twitter handle in question from all users (also non-diplomats and non-media that are not shown as nodes in the plot). This shows how "popular" the Chinese diplomats and media outlets are on Twitter broadly, rather than just their popularity/activity within the diplomat/media sub-network.
  • weighted degree (Figure 3): node size scaled by the total number of connections between the Twitter handle in question and other Chinese diplomats and media outlets (both directions counted, and each mention counted). The weighted degree corresponds to in-degree + out-degree.
  • in-degree (Figure 4): number of mentions from other Chinese diplomats and media outlets to the Twitter handle in question (only one direction counted).
  • out-degree (Figure 5): number of mentions from the Twitter handle in question to other Chinese diplomats and media outlets (only one direction counted).
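The four node-size attributes above can be computed from a list of mentions as in the sketch below. This is a standard-library illustration under stated assumptions, not the code in network_main.py: the handle names, the `(mentioner, mentionee)` record layout and all counts are hypothetical.

```python
from collections import Counter

# Hypothetical mention records: (mentioner, mentionee), one entry per mention
mentions = [
    ("dip_a", "media_a"),
    ("dip_a", "media_a"),
    ("dip_a", "dip_b"),
    ("dip_b", "media_a"),
    ("media_a", "media_b"),
    ("outsider", "media_a"),  # user outside the diplomat/media network
]
network = {"dip_a", "dip_b", "media_a", "media_b"}

# Total mentions: mentions of a network handle by any user
total_mentions = Counter(dst for _, dst in mentions if dst in network)

# Edge weights: mentions where both ends are inside the network
edge_weight = Counter(
    (src, dst) for src, dst in mentions if src in network and dst in network
)

in_degree = Counter()
out_degree = Counter()
for (src, dst), weight in edge_weight.items():
    out_degree[src] += weight  # mentions made by the handle
    in_degree[dst] += weight   # mentions received by the handle

# Weighted degree counts both directions
weighted_degree = {h: in_degree[h] + out_degree[h] for h in network}

for handle in sorted(network):
    print(handle, total_mentions[handle], in_degree[handle],
          out_degree[handle], weighted_degree[handle])
```

With networkx, the same quantities are available on a weighted `DiGraph` through `in_degree(weight="weight")` and `out_degree(weight="weight")`.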

In addition to the network visualizations, we also show the top 10 handles (based on weighted degree) in Figure 1. The plot is generated in summary_stats_focus.py (see usage below). Clearly, some handles are primarily mentionees and have high in-degree (e.g. XHNews) while others are primarily mentioners and have high out-degree (e.g. zlj517) within the diplomat/media sub-network.

Usage:

  1. Activate environment
source cnenv/bin/activate
  2. Navigate to the network code folder
cd networks/src
  3. Run bash script
bash main.sh

In main.sh, set:
PRE=true
NET=true
SUM=true

This ensures that the bash script calls (runs)

  1. preprocessing (concat_files.py)
  2. network visualizations (network_main.py)
  3. summary data analysis (summary_stats_focus.py)

Bot Detection

We train a logistic regression classifier on the cresci-2017 data set (Cresci et al., 2017; available at https://botometer.osome.iu.edu/bot-repository/datasets.html) to classify Twitter handles as genuine or spam/bot/fake. We use the widely used fofo metric (e.g. Yang et al., 2013; Tavazoee et al., 2020), which is the ratio following/followers of an account. We use (following + 1)/(followers + 1) to avoid division by zero, and when an account appears more than once in a data set we use only its last appearance (i.e. the number of following and followers for the handle at that time). The intuition behind the metric is that bot accounts tend to follow many other accounts but have few followers, so they generally have a high fofo ratio. Using the trained model, we estimate the fraction of genuine accounts vs. spam/bot/fake accounts in our own data set, as well as in a baseline data set consisting of vaccine-related tweets from 2020-2021 (https://www.kaggle.com/datasets/gpreda/all-covid19-vaccines-tweets). We estimate 27.22% of the accounts in the baseline (vaccine) data set and 46.44% of the accounts in our data set of Chinese state media and diplomats to be non-genuine. There is considerable uncertainty around these estimates since (1) our data set might differ from the baseline data set in respects other than the amount of bot activity and (2) while the fofo metric is widely used (Yang et al., 2013), it is not universally found to be accurate in detecting bots.
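The fofo feature and a one-feature logistic classifier can be sketched as follows. This is a pure-Python illustration, not the repository's code: the function names, the record layout and the toy accounts are hypothetical, and taking the logarithm of the fofo ratio before fitting is an assumption made here for numerical stability.

```python
import math


def fofo(following, followers):
    """Smoothed ratio (following + 1) / (followers + 1)."""
    return (following + 1) / (followers + 1)


def last_appearance(records):
    """Keep only the last record per handle (records are in time order)."""
    latest = {}
    for rec in records:
        latest[rec["handle"]] = rec  # later records overwrite earlier ones
    return latest


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(bot) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Toy labelled accounts (hypothetical values): 1 = bot, 0 = genuine
train = [
    {"handle": "a", "following": 2000, "followers": 10, "bot": 1},
    {"handle": "b", "following": 900, "followers": 0, "bot": 1},
    {"handle": "c", "following": 150, "followers": 3000, "bot": 0},
    {"handle": "d", "following": 80, "followers": 400, "bot": 0},
    {"handle": "a", "following": 2500, "followers": 12, "bot": 1},  # later record
]

accounts = last_appearance(train)
# Assumption: log-transform the ratio before fitting
xs = [math.log(fofo(r["following"], r["followers"])) for r in accounts.values()]
ys = [r["bot"] for r in accounts.values()]
w, b = fit_logistic(xs, ys)


def predict_bot_probability(following, followers):
    return sigmoid(w * math.log(fofo(following, followers)) + b)


print(predict_bot_probability(5000, 5))
print(predict_bot_probability(10, 10000))
```

The fraction of non-genuine accounts in a data set would then be the share of handles whose predicted probability exceeds a chosen threshold, for example 0.5.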
