Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content
This repository contains the implementation of the methods in "Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content".

Requirements:
- Python 2.7
- Keras (with the Theano backend)
Three sub-datasets of our CCMR dataset are saved in the folder CCMR as three JSON files (each a list of JSON objects): "CCMR/CCMR_Twitter.txt", "CCMR/CCMR_Google.txt", and "CCMR/CCMR_Baidu.txt".
For CCMR Twitter, each tweet is saved as a JSON object with keys "tweet_id", "content", "image_id", "event", and "timestamp". For CCMR Google and CCMR Baidu, each webpage is saved as a JSON object with keys "url", "title", "image_id", and "event". The values of "image_id" are lists of image or video names from the VMU 2015 dataset. All of these image files and video URLs are available in "images.zip".
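Since each file is a JSON list of objects, loading and indexing a sub-dataset takes only a few lines. The records below are synthetic stand-ins with made-up field values (the real data lives in "CCMR/CCMR_Twitter.txt"), but the loading code applies unchanged to the real files:

```python
import json
import tempfile
from collections import defaultdict

def load_ccmr(path):
    """Load one CCMR sub-dataset (a JSON list of objects), indexed by event."""
    with open(path) as f:
        records = json.load(f)
    by_event = defaultdict(list)
    for r in records:
        by_event[r["event"]].append(r)
    return by_event

# Synthetic records mirroring the CCMR Twitter schema described above;
# all values here are invented for illustration only.
sample = [
    {"tweet_id": "101", "content": "example tweet text",
     "image_id": ["img_001"], "event": "sandy", "timestamp": "2012-10-29"},
    {"tweet_id": "102", "content": "another tweet",
     "image_id": ["img_002"], "event": "boston", "timestamp": "2013-04-15"},
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    json.dump(sample, f)
    path = f.name

by_event = load_ccmr(path)
print(sorted(by_event))  # ['boston', 'sandy']
```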
To reproduce the experimental results, simply run main.py.
Download the parallel English-Mandarin sentences from the news and microblog domains of UM-Corpus and save them in a folder named 'UM_Corpus'.
Run prepare_UM_Corpus.py to split and tokenize the data in UM-Corpus.
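As a rough sketch of what the tokenization step can look like (a hypothetical simplification; prepare_UM_Corpus.py may use a different tokenizer), English sentences can be lowercased and split on non-word characters, while Chinese sentences fall back to character-level tokens:

```python
import re

def tokenize_en(sentence):
    """Lowercase an English sentence and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def tokenize_zh(sentence):
    """Split a Chinese sentence into individual characters — a common
    fallback when no word segmenter is available."""
    return [ch for ch in sentence if not ch.isspace()]

print(tokenize_en("Rumor verification is hard!"))
# ['rumor', 'verification', 'is', 'hard']
```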
Run train_multilingual_embedding.py to train the multilingual sentence embedding.
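To illustrate the underlying idea only (this is not the paper's actual Keras model), a linear map fitted by least squares can already align toy "sentence embeddings" across two languages, in the spirit of Mikolov-style cross-lingual mappings; the dimensions and data below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired embeddings: row i of `en` and row i of `zh` represent the
# same sentence in the two languages (synthetic data for illustration).
en = rng.normal(size=(200, 16))
zh = en @ rng.normal(size=(16, 16)) + 0.01 * rng.normal(size=(200, 16))

# Fit W so that zh @ W approximates en.
W, *_ = np.linalg.lstsq(zh, en, rcond=None)
err = np.linalg.norm(zh @ W - en) / np.linalg.norm(en)
print(err < 0.1)  # the relative residual is small on this toy data
```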
Run prepare_FNC_split.py to tokenize, embed and split the data from Fake News Challenge.
Run train_agreement_classifier.py to train the agreement classifier.
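The agreement classifier scores how a page's text relates to a claim. As a deliberately simplified stand-in for the trained Keras classifier, cosine similarity over averaged word vectors (hand-crafted toy vectors here, not learned embeddings) already separates related from unrelated text:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def avg_embedding(tokens, vectors):
    """Average the vectors of the given tokens (toy lookup, no OOV handling)."""
    return np.mean([vectors[t] for t in tokens], axis=0)

# Hand-crafted toy word vectors, purely for illustration.
vocab = {
    "storm": np.array([1.0, 0.0, 0.0]),
    "flood": np.array([0.9, 0.1, 0.0]),
    "shark": np.array([0.5, 0.5, 0.0]),
    "cat":   np.array([0.0, 0.0, 1.0]),
    "piano": np.array([0.0, 0.1, 1.0]),
}

claim     = avg_embedding(["storm", "flood"], vocab)
related   = avg_embedding(["flood", "storm", "shark"], vocab)
unrelated = avg_embedding(["cat", "piano"], vocab)

print(cosine(claim, related) > cosine(claim, unrelated))  # True
```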
Run prepare_CCMR.py to tokenize the CCMR dataset.
Run extract_clcp_feats.py to extract all cross-lingual cross-platform (CLCP) features and the data splits needed for the experiments; the output files are saved in the folder CLCP.
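The aggregation behind such features can be sketched as follows. This is a hypothetical simplification: the function name and the max/mean/count feature choice are ours for illustration, not necessarily what extract_clcp_feats.py computes:

```python
import numpy as np

def clcp_features(tweet_vec, page_vecs):
    """Aggregate tweet-to-page cosine similarities into a small feature
    vector (max, mean, count) over pages sharing the same image."""
    if not page_vecs:
        return [0.0, 0.0, 0.0]
    sims = [float(np.dot(tweet_vec, p) /
                  (np.linalg.norm(tweet_vec) * np.linalg.norm(p) + 1e-9))
            for p in page_vecs]
    return [max(sims), float(np.mean(sims)), float(len(sims))]

tweet = np.array([1.0, 0.0])
pages = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
feats = clcp_features(tweet, pages)
print(feats)  # approximately [1.0, 0.5, 2.0]
```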
Use main.py and the other scripts to reproduce all experiments reported in the paper.