## WeChat_Censored

Our bot is an informational bot that tweets out articles censored on WeChat, the most popular social media platform in China.

The bot's data comes from WeChatSCOPE, a project launched by the University of Hong Kong that “scrapes” data from WeChat's API. When an article is deleted, a copy of it appears in the database at https://wechatscope.jmsc.hku.hk/api/update_weixin_public_pretty?days=7.

Whenever the database is updated with newly deleted articles, our bot tweets the original title, the censored date, a link to the restored article, and the reason it was deleted. There are three reasons for the deletion of a WeChat article:
1. The article was deleted by the author;
2. The content cannot be viewed due to violations (related complaints were received; the content violates the "Internet User Public Account Information Service Management Regulations");
3. The account has been blocked and the content cannot be viewed (in response to related complaints; the account is suspected of violating the "Regulations on the Management of Public Account Information Services for Internet Users").
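
The first reason can be told apart from the other two by looking at the `censored_msg` field of each record. The sketch below is only an illustration: it relies on the one complete message string that appears in this notebook's sample output, and the helper names `deleted_by_author` and `split_by_reason` are made up for this example. Every other message is simply treated as a removal by the platform.

```python
# Message the API returns for reason 1; it appears in this notebook's sample output.
# Rough translation: "This content has been deleted by the publisher."
AUTHOR_DELETED_MSG = "该内容已被发布者删除"


def deleted_by_author(article):
    """Return True when the record's censored_msg says the publisher removed it."""
    return article.get("censored_msg", "").strip() == AUTHOR_DELETED_MSG


def split_by_reason(articles):
    """Partition records into (author_deleted, platform_removed).

    Assumption: any message other than AUTHOR_DELETED_MSG corresponds to
    reason 2 or reason 3 in the list above.
    """
    author_deleted, platform_removed = [], []
    for article in articles:
        if deleted_by_author(article):
            author_deleted.append(article)
        else:
            platform_removed.append(article)
    return author_deleted, platform_removed
```

Calling `split_by_reason(data)` on the list returned by the API would give two lists that can be tweeted or counted separately.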
    

In [1]:
from requests import get

# Specify the location of the information you want as a string

url = 'https://wechatscope.jmsc.hku.hk/api/update_weixin_public?days=7'

# Then fetch the data (the resource) at that address using get() from
# the "requests" package

response = get(url)
data = response.json()
data

[{'url': 'mp.weixin.qq.com/s?__biz=MjM5Njc4NTgzMA==&mid=2650667017&idx=2&sn=f35a7546a9aef9e8c27e95e81c367004',
  'title': '退伍回家，发现弟弟含冤九泉，一声令下，十万将士直奔而来',
  'title_eng': 'When the veteran returned home and found that his brother was wronged, Jiuquan was ordered, and one hundred thousand soldiers came straight.',
  'nickname': '探索者',
  'created_at': '2020-03-01',
  'archive': 'gh_715befbd50ba_2020-03-01_2650667017_VYcPnRlA3k.y.tar.gz',
  'censored_date': '2020-03-03 02:38:13',
  'censored_msg': '该内容已被发布者删除',
  'update_date': '2020-03-02 12:33:40'},
 {'url': 'mp.weixin.qq.com/s?__biz=MzAwNDE3MjY2Ng==&mid=2653503741&idx=1&sn=2442e802761c678e3b873ebc0f7516ae',
  'title': '喝酒上头难受，三两就断片？行家教你这样喝酒，酒量翻倍，不难受',
  'title_eng': 'Drinking is uncomfortable, just break the film? Experts teach you to drink like this',
  'nickname': '警笛',
  'created_at': '2020-03-02',
  'archive': 'gh_2877643e575a_2020-03-02_2653503741_2SAxNuHKzI.y.tar.gz',
  'censored_date': '2020-03-03 07:46:21',
  'censored_msg': '此内容因

The most recently censored article is the last item in the list.

In [2]:
data[-1]

{'url': 'mp.weixin.qq.com/s?__biz=Mzg2NzA1NzcyMA==&mid=2247501756&idx=3&sn=abb0d832052d9ee5b957c2d34920083d',
 'title': '用它抹脸，7天白1度，色斑，暗沉通通消失，15天黄脸婆变少女 !',
 'title_eng': 'Use it to wipe your face, 7 days white, 1 degree, stains, dullness disappear, 15 days yellow face woman becomes a girl!',
 'nickname': '言情说',
 'created_at': '2020-03-06',
 'archive': 'gh_00e66b7f0a40_2020-03-06_2247501756_r7Z3DGv06s.y.tar.gz',
 'censored_date': '2020-03-08 11:35:04',
 'censored_msg': '该内容已被发布者删除',
 'update_date': '2020-03-07 10:38:39'}
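
Taking `data[-1]` assumes the API always returns records in chronological order. If that ordering is not guaranteed, a slightly more defensive option is to pick the record with the largest `censored_date`. The following is a minimal sketch; the helper name `latest_censored` is hypothetical, and the date format is inferred from the sample output above.

```python
from datetime import datetime

# Format inferred from values such as '2020-03-08 11:35:04' in the output above
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def latest_censored(articles):
    """Return the record with the most recent censored_date, or None if the list is empty."""
    if not articles:
        return None
    return max(
        articles,
        key=lambda article: datetime.strptime(article["censored_date"], DATE_FORMAT),
    )


# Small stand-in for the API response, using dates from the sample output
sample = [
    {"title": "a", "censored_date": "2020-03-03 02:38:13"},
    {"title": "b", "censored_date": "2020-03-08 11:35:04"},
    {"title": "c", "censored_date": "2020-03-03 07:46:21"},
]
print(latest_censored(sample)["title"])  # prints: b
```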

In [1]:
# insert your own keys and secrets here...
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API

# setup the authentication
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# create an object we will use to communicate with the Twitter API
api = API(auth)
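
If the notebook is shared, it may be safer to keep the keys out of the code. One possible approach, sketched below, reads them from environment variables. The `TWITTER_...` variable names and the `load_credentials` helper are assumptions made for this example, not something tweepy requires.

```python
import os

# The four values used in the cell above
CREDENTIAL_NAMES = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Read the four Twitter credentials from environment variables.

    Assumption: each value is stored under a hypothetical name of the form
    TWITTER_<NAME>, for example TWITTER_CONSUMER_KEY.
    """
    missing = ["TWITTER_" + name for name in CREDENTIAL_NAMES if not env.get("TWITTER_" + name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env["TWITTER_" + name] for name in CREDENTIAL_NAMES}
```

The returned dictionary can then be passed on, for example `OAuthHandler(creds["CONSUMER_KEY"], creds["CONSUMER_SECRET"])`.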

Tweet the latest article

In [12]:
from time import sleep
from requests import get


url = 'https://wechatscope.jmsc.hku.hk/api/update_weixin_public?days=7'

# keep track of the link (url) of the last article we tweeted
prev_tweeted_link = ''

# loop forever!
while True:

    # fetch and parse the data
    response = get(url)
    data = response.json()

    # get the latest article (the last item in the list)
    latest_article = data[-1]

    # take the link of the latest article and see if we've tweeted it before
    link = latest_article['url']
    if link != prev_tweeted_link:

        print('new article - lets tweet it: ' + link)

        # build the text of our tweet
        tweet_text = (latest_article['title'] + '   '
                      + 'censored date: ' + latest_article['censored_date'] + '   '
                      + 'censored message: ' + latest_article['censored_msg'] + ' '
                      + 'https://wechatscope.jmsc.hku.hk/api/html?fn=' + latest_article['archive'])

        # fire it off to twitter
        api.update_status(status=tweet_text)

        # keep track of the link that we just tweeted
        prev_tweeted_link = link
    else:
        print('no new article...lets wait a little while')

    # sleep for a little while
    sleep(300)

    
# if you want to stop this script, hit the Stop button in your notebook

new article - lets tweet it: mp.weixin.qq.com/s?__biz=MjM5NTk0Nzg0MQ==&mid=2665720512&idx=1&sn=5291a6a5db7fd7f42c4f4b46ba049c26
new article - lets tweet it: mp.weixin.qq.com/s?__biz=MzUyMDYwMTA5Ng==&mid=2247485604&idx=4&sn=5a56f91b268dc14e01a771fc8023e173
new article - lets tweet it: mp.weixin.qq.com/s?__biz=MzI4ODA4MDczMw==&mid=2247487572&idx=1&sn=64306f73ccc267c1e12adf60e53569b4
no new article...lets wait a little while
no new article...lets wait a little while
no new article...lets wait a little while
no new article...lets wait a little while
no new article...lets wait a little while


KeyboardInterrupt: 
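
A long title can push the tweet past Twitter's length limit. The sketch below builds the same text as the loop above but shortens the title when needed. The helper name `build_tweet` and the `limit` parameter are assumptions for this example, and Twitter's own counting differs from Python's `len()` (for example for URLs and CJK text), so the limit here is only a rough guard.

```python
# Prefix of the restored-article link, as used in the cell above
ARCHIVE_BASE = "https://wechatscope.jmsc.hku.hk/api/html?fn="


def build_tweet(article, limit=280):
    """Build the tweet text, truncating the title so len(text) <= limit where possible."""
    link = ARCHIVE_BASE + article["archive"]
    # Everything after the title, in the same layout as the loop above
    tail = "   censored date: {}   censored message: {} {}".format(
        article["censored_date"], article["censored_msg"], link
    )
    title = article["title"]
    room = max(limit - len(tail), 0)  # characters left for the title
    if len(title) > room:
        # Cut the title and mark the cut with an ellipsis; drop it if no room is left
        title = (title[: room - 1] + "…") if room > 0 else ""
    return title + tail
```

Passing a smaller `limit` leaves a safety margin for the way Twitter weights characters.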

**Some variations:**

Tweet out only the articles that were manually deleted by the censors, by filtering out those marked "deleted by the author" (which usually means that they were identified and deleted by the algorithm).

In [11]:

from time import sleep
from requests import get

url = 'https://wechatscope.jmsc.hku.hk/api/update_weixin_public?days=7'

# keep track of the link (url) of the last article we tweeted
prev_tweeted_link = ''

# loop forever!
while True:

    # fetch and parse the data
    response = get(url)
    data = response.json()

    # get the latest article (the last item in the list)
    latest_article = data[-1]

    # take the link of the latest article and see if we've tweeted it before
    link = latest_article['url']
    if link != prev_tweeted_link:
        # skip articles whose message says "deleted by the publisher"
        if latest_article['censored_msg'] != "该内容已被发布者删除":

            print('new article - lets tweet it: ' + link)

            # build the text of our tweet (this time including the English title)
            tweet_text = (latest_article['title'] + ' ' + latest_article['title_eng'] + ' '
                          + 'censored date: ' + latest_article['censored_date'] + ' '
                          + 'censored message: ' + latest_article['censored_msg'] + ' '
                          + 'https://wechatscope.jmsc.hku.hk/api/html?fn=' + latest_article['archive'])

            # fire it off to twitter
            api.update_status(status=tweet_text)

            # keep track of the link that we just tweeted
            prev_tweeted_link = link
        else:
            print('the article was deleted by the author...lets wait a little while')
    else:
        print('no new article...lets wait a little while')

    # sleep for a little while
    sleep(60)

    
# if you want to stop this script, hit the Stop button in your notebook

new article - lets tweet it: mp.weixin.qq.com/s?__biz=MzUyMDYwMTA5Ng==&mid=2247485616&idx=3&sn=d0e53d8914acab894b194f89f0ae8f88


KeyboardInterrupt: 

Tweet out all the available articles that were not deleted by the author.

In [10]:
from requests import get

url = 'https://wechatscope.jmsc.hku.hk/api/update_weixin_public?days=7'

# fetch and parse the data once
response = get(url)
data = response.json()

for article in data:
    # only tweet articles that were not deleted by the publisher
    if article['censored_msg'] != "该内容已被发布者删除":
        tweet_text = (article['title'] + '   '
                      + 'censored date: ' + article['censored_date'] + '   '
                      + 'censored message: ' + article['censored_msg'] + ' '
                      + 'https://wechatscope.jmsc.hku.hk/api/html?fn=' + article['archive'])
        api.update_status(status=tweet_text)
    else:
        print("deleted")
    


deleted
[... "deleted" printed 66 times in total ...]


TweepError: [{'code': 187, 'message': 'Status is a duplicate.'}]
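
The `Status is a duplicate` error is raised when the same text has already been posted, which can happen here if an article was tweeted by one of the earlier cells. One way to reduce the chance of this, sketched below, is to remember which URLs have already been tweeted and skip them. The helper name `select_unseen` and the `seen_urls` set are assumptions for this example, and the set only lives as long as the notebook session.

```python
def select_unseen(articles, seen_urls):
    """Return the records whose url is not in seen_urls, without repeats.

    seen_urls is not modified here, so the caller can add a url only after
    the tweet for it has actually been sent.
    """
    fresh = []
    in_batch = set()  # urls already picked from this batch
    for article in articles:
        url = article["url"]
        if url in seen_urls or url in in_batch:
            continue
        in_batch.add(url)
        fresh.append(article)
    return fresh


# Stand-in records; in the notebook this would be the `data` list from the API
records = [{"url": "u1"}, {"url": "u2"}, {"url": "u2"}, {"url": "u3"}]
seen = {"u1"}  # pretend u1 was tweeted earlier

for article in select_unseen(records, seen):
    # the call to api.update_status(...) would go here
    seen.add(article["url"])

print(sorted(seen))  # prints: ['u1', 'u2', 'u3']
```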