In [1]:
import numpy as np
import json
import pickle
from keras.models import load_model

import sys
sys.path.append("..")
from RNN.scripts.gen import DataGenerator
from RNN.scripts.rnn import singleRNN

Using TensorFlow backend.


First, we load the word2token and token2word converters, as well as the embedding matrix that maps each token to a 300-dimensional embedding.

In [2]:
with open('word2token.pickle', 'rb') as f:
    word2token = pickle.load(f)
    
with open('token2word.pickle', 'rb') as f:
    token2word = pickle.load(f)
    
with open('embedding_matrix.pickle', 'rb') as f:
    embedding_matrix = pickle.load(f)

Then, we import a small demo dataset containing a subset of 1000 samples from the original test set (i.e. not used for training/validation).

In [3]:
with open('sample_data.pickle', 'rb') as f:
    partition = pickle.load(f)

As shown in the example below, the title of each claim and the textual rating of each review are tokenized, and the numerical rating is converted to a boolean value.

In [4]:
ID = list(partition['demo'].keys())[1]

print('Example ID:', ID)
print(partition['demo'][ID])

Example ID: 88adff73-f7bd-d2e9-b510-a155ce088ba7
{'claim': array([   5,  464, 1283,   66, 1122,    3, 7941,  543,    2, 9050, 4828]), 'rating': array([31,  7]), 'isFake': True}


In this case, the given claim was evaluated as fake by the original fact checker. We can convert the tokenized values back to their original textual representation to see what the claim and textual ratings were:

In [5]:
claim = ' '.join([token2word.get(t) for t in partition['demo'][ID]['claim']])
rating = ' '.join([token2word.get(t) for t in partition['demo'][ID]['rating']])

print("Claim title:\t", claim)
print("Text rating:\t", rating)

Claim title:	 a single immigrant can bring in unlimited numbers of distant relatives
Text rating:	 mostly false
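The lookup above can be tried in isolation. The sketch below uses a made-up two-word vocabulary (`toy_word2token` and `toy_token2word` are hypothetical names; the real mappings come from the pickle files), with the token values 31 and 7 taken from the example rating shown earlier. The `decode` helper and its `<unk>` fallback are assumptions for illustration, not part of the original scripts.

```python
# Toy vocabulary (hypothetical); the real mappings are loaded from the pickle files.
toy_word2token = {"mostly": 31, "false": 7}
# Invert the mapping to get the token -> word direction.
toy_token2word = {token: word for word, token in toy_word2token.items()}

def decode(tokens, mapping, unknown="<unk>"):
    """Map a sequence of integer tokens back to a space-separated string."""
    # int(t) lets the helper accept NumPy integers as well as plain ints.
    return " ".join(mapping.get(int(t), unknown) for t in tokens)

decoded = decode([31, 7], toy_token2word)
print(decoded)
```

Falling back to a placeholder for unknown tokens avoids the `TypeError` that `' '.join` raises when a plain `.get(t)` returns `None`.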


We then feed all of these data samples to our model by constructing a small data generator.

In [6]:
gen_demo = DataGenerator(partition, mode='demo', all_text=True, train_oov=False, batch_size=1, return_id=False)

Our example batch now contains the tokenized rating description followed by the tokenized claim title, together with the label to be predicted: 1 (the item is fake) or 0 (factually correct):

In [7]:
print('Sample batch:\n', gen_demo[1])

Sample batch:
 (array([[  31,    7,    5,  464, 1283,   66, 1122,    3, 7941,  543,    2,
        9050, 4828]]), array([[1]]))
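The printed batch is consistent with the rating tokens being prepended to the claim tokens. The sketch below reproduces that layout by hand from the example sample; it is inferred from the output above and is only an assumption about what `DataGenerator` does with `all_text=True`, not its actual implementation.

```python
import numpy as np

# Values copied from the example sample printed earlier in the notebook.
claim = np.array([5, 464, 1283, 66, 1122, 3, 7941, 543, 2, 9050, 4828])
rating = np.array([31, 7])
is_fake = True

# Assumed layout: rating tokens first, then claim tokens, with a batch axis of size 1.
x = np.concatenate([rating, claim])[np.newaxis, :]
# The boolean label becomes a 1x1 integer array.
y = np.array([[int(is_fake)]])

print(x.shape, y.shape)
```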


Since the pretrained model occupies 273.1 MB, it must be downloaded separately and placed in the demo folder to proceed. The model is made available for download via the following [link](https://drive.google.com/file/d/1SpgTDMaSFUG-cBT3Zh1TZrGU7GFr4HNn/view?usp=sharing).

Having downloaded the model, we can now construct the RNN with our pretrained weights:

In [9]:
rnn_single = singleRNN(embedding_matrix)
rnn_single.load_weights("pretrained_model.hdf5")

Finally, we can compute the predicted scores and compare them to the true scores (computation may take about a minute):

In [10]:
def preds(model, generator):
    # Predicted probabilities, rounded to hard 0/1 labels
    y_prob = model.predict_generator(generator)
    y_pred = y_prob.round().astype(int).reshape(len(y_prob),)

    # Collect the true label of every batch (assumes batch_size=1)
    y_true = np.empty(shape=(len(generator),), dtype=int)
    for i in range(len(generator)):
        y_true[i] = generator[i][1].item()

    return y_true, y_pred

y_true, y_pred = preds(rnn_single, gen_demo)
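The rounding step in `preds` turns the model's output scores into hard labels. A minimal sketch with made-up scores (the array `probs` is hypothetical, not model output) shows the effect, including the edge case at exactly 0.5:

```python
import numpy as np

# Hypothetical scores with shape (n_samples, 1), mimicking a sigmoid output layer.
probs = np.array([[0.91], [0.12], [0.50], [0.73]])

# Same transformation as in preds: round, cast to int, flatten to shape (n_samples,).
labels = probs.round().astype(int).reshape(len(probs),)

# NumPy rounds exact halves to the nearest even value, so 0.5 becomes 0.
# An explicit threshold makes that choice visible:
labels_explicit = (probs > 0.5).astype(int).ravel()

print(labels, labels_explicit)
```

Both variants agree here; they would differ only if the threshold comparison were changed to `>=`.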

The resulting accuracy on the demo set is:

In [11]:
print("Accuracy = {:.1f}%".format(100 * np.mean(y_true == y_pred)))

Accuracy = 94.3%


Not bad at all!
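Accuracy alone does not show which kinds of errors the model makes. As a rough sketch, the same comparison can be split into the four confusion-matrix counts. The arrays below are made-up toy labels (`y_true_toy` and `y_pred_toy` are hypothetical), not the notebook's results; with the real `y_true` and `y_pred` the same expressions apply.

```python
import numpy as np

# Toy labels for illustration only: 1 = fake, 0 = factually correct.
y_true_toy = np.array([1, 1, 0, 0, 1, 0])
y_pred_toy = np.array([1, 0, 0, 1, 1, 0])

# Fraction of matching labels.
accuracy = np.mean(y_true_toy == y_pred_toy)

# Confusion-matrix counts, treating "fake" as the positive class.
tp = int(np.sum((y_true_toy == 1) & (y_pred_toy == 1)))  # fake, predicted fake
fn = int(np.sum((y_true_toy == 1) & (y_pred_toy == 0)))  # fake, predicted correct
fp = int(np.sum((y_true_toy == 0) & (y_pred_toy == 1)))  # correct, predicted fake
tn = int(np.sum((y_true_toy == 0) & (y_pred_toy == 0)))  # correct, predicted correct

print("Accuracy = {:.1f}%".format(100 * accuracy))
print("TP={} FN={} FP={} TN={}".format(tp, fn, fp, tn))
```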