Data related to investigation of chat client censorship


Collection of keyword lists used to censor content on chat apps and live streaming apps used in China.

Full details on data collection, analysis methods, and results are available in the reports below:

Chat program censorship and surveillance in China: Tracking TOM-Skype and Sina UC

Asia Chats: Investigating Regionally-based Keyword Censorship in LINE

Every Rose Has Its Thorn: Censorship and Surveillance on Social Video Platforms in China

Keyword Content Analysis

Datasets include raw keyword lists collected from the applications and processed datasets that include translations of keywords. Keywords were translated to English using a combination of machine and human translation. Based on interpreting these translations with contextual information, we coded each keyword into content categories grouped under six general themes according to a code book.
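As a rough illustration of how a processed dataset might be summarized, the sketch below tallies keywords per theme. The column names (`keyword`, `translation`, `category`, `theme`) and the sample rows are placeholders assumed for this example; check the actual files and the code book for the real layout and labels.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a processed dataset. The real files may use
# different column names, and these rows are placeholders, not real data.
SAMPLE_CSV = """\
keyword,translation,category,theme
keyword_1,translation_1,Category A1,Theme A
keyword_2,translation_2,Category A2,Theme A
keyword_3,translation_3,Category B1,Theme B
"""


def count_by_theme(csv_file, theme_column="theme"):
    """Count how many keyword rows fall under each theme."""
    reader = csv.DictReader(csv_file)
    return Counter(row[theme_column] for row in reader)


counts = count_by_theme(io.StringIO(SAMPLE_CSV))
for theme, n in counts.most_common():
    print(theme, n)
```

To run this against a real file, pass an open file handle (opened with `encoding="utf-8"` and `newline=""`) in place of the `io.StringIO` object, and set `theme_column` to whatever header the dataset uses.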


All data is provided under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. It is available in full here and summarized here.