<a href="https://colab.research.google.com/github/RobinSmits/FakeNews-Generator-And-Detector/blob/main/FakeNews_Generator_And_Detector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this last notebook we use the 'test' part of the 'ag_news_subset' dataset. It contains 7600 rows that neither the T5 nor the RoBERTa model has seen before.

We again feed the 'title' to the T5 model to generate fake news. The generated output is stored in the file 't5_generated_fake_news_final.csv'.

As a final step, the RoBERTa model classifies each news text as real or fake.

In [1]:
import numpy as np
import os
import pandas as pd
from tqdm.notebook import tqdm
from urllib.request import urlopen
import tarfile

# Install Specific Versions
!pip install tensorflow==2.4.1
!pip install tensorflow-datasets==4.1.0
!pip install transformers==4.2.2
!pip install sentencepiece==0.1.95

# Import Packages
import tensorflow as tf
import tensorflow_datasets as tfds
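from transformers import (RobertaConfig, RobertaTokenizer, T5Config, T5Tokenizer,
                          TFRobertaForSequenceClassification, TFT5ForConditionalGeneration)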
import sentencepiece


I've created and tested these notebooks on Google Colab Pro and used Google Drive to store and load any files created. 

If you run the code locally, change 'WORK_DIR' accordingly. Google Drive is not needed in that case.
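
As a minimal sketch of the local-versus-Colab switch described above, the folder could be picked automatically. The helper names `running_on_colab` and `resolve_work_dir` and the local folder are my own assumptions for illustration and are not part of the original notebooks. On Colab, Google Drive still has to be mounted before the folder is used.

```python
import importlib.util
import os
import tempfile


def running_on_colab():
    # The google.colab package is only importable inside a Colab runtime
    try:
        return importlib.util.find_spec('google.colab') is not None
    except (ImportError, ValueError):
        return False


def resolve_work_dir(local_dir):
    # On Colab keep the Drive folder used in this notebook, otherwise use the local folder
    if running_on_colab():
        return '/content/drive/My Drive/fake_news/'
    return os.path.join(local_dir, '')  # joining with '' adds a trailing separator


# Hypothetical local folder, only used when not running on Colab
LOCAL_DIR = os.path.join(tempfile.gettempdir(), 'fake_news')
WORK_DIR = resolve_work_dir(LOCAL_DIR)
print(f'Work Dir: {WORK_DIR}')
```

The trailing separator matters because the rest of the notebook builds file paths with plain string concatenation, for example `WORK_DIR + 't5_base_model.h5'`.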

In [2]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set Folder to use...
WORK_DIR = '/content/drive/My Drive/fake_news/'
os.makedirs(WORK_DIR, exist_ok = True) 

Mounted at /content/drive


Next we configure the device to use (note: this notebook runs on TPU, GPU and CPU). We also set the necessary constants.

Finally, all the necessary settings for the T5 and RoBERTa models are defined.

In [3]:
# Configure Strategy. Assume TPU...if not set default for GPU/CPU
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.get_strategy()

# Set Auto Tune
AUTO = tf.data.experimental.AUTOTUNE

# Suppress Warnings
tf.autograph.set_verbosity(0, False)

# Set Pandas Display Options
pd.set_option('display.max_colwidth', 256)

# Constants
MAX_LEN = 512
VERBOSE = 1

# Batch Size
GENERATE_BATCH_SIZE = 19 * strategy.num_replicas_in_sync
PREDICT_BATCH_SIZE = 16 * strategy.num_replicas_in_sync
print(f'Predict Batch Size: {PREDICT_BATCH_SIZE}')
print(f'Generate Batch Size: {GENERATE_BATCH_SIZE}')

Predict Batch Size: 16
Generate Batch Size: 19


In [4]:
# Set T5 Type
t5_size = 't5-base'
print(f'T5 Model Type: {t5_size}')

# Set T5 Task Name
task_name = 'generate fake news: '
print(f'T5 Task Name: {task_name}')

# Set T5 Config
t5_config = T5Config.from_pretrained(t5_size)

# Set T5 Tokenizer
t5_tokenizer = T5Tokenizer.from_pretrained(t5_size)

T5 Model Type: t5-base
T5 Task Name: generate fake news: 


In [5]:
# Set RoBERTa Type
roberta_type = 'roberta-base'
print(f'RoBERTa Model Type: {roberta_type}')

# Set RoBERTa Config
roberta_config = RobertaConfig.from_pretrained(roberta_type, num_labels = 2) # Binary classification so set num_labels = 2

# Set RoBERTa Tokenizer
roberta_tokenizer = RobertaTokenizer.from_pretrained(roberta_type, 
                                                     add_prefix_space = False,
                                                     do_lower_case = False)

RoBERTa Model Type: roberta-base



For generation we use the 'test' split of the TensorFlow Dataset 'ag_news_subset'. The dataset contains a train set of 120K rows and a test set of 7600 rows.

Neither the T5 nor the RoBERTa model was trained on the 'test' split, so it is completely unseen by both models.

Each row contains a 'title', which is a newspaper headline, and a 'description', which is a short excerpt from the newspaper article.

The 'title' is used as input for the T5 model to generate the fake news.

!! Note: The initial download of the dataset has failed for me multiple times. If you run the cell again, it just works.
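
The "run it again" advice above can also be automated. The following is only a sketch: the helper `retry` is hypothetical, and since the notebook does not record which error the failed download raises, catching `OSError` (which covers the `urllib` errors) and `tarfile.TarError` is an assumption.

```python
import tarfile
import time


def retry(func, attempts = 3, delay_seconds = 2.0, exceptions = (OSError, tarfile.TarError)):
    """Call func() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            last_error = error
            print(f'Attempt {attempt} of {attempts} failed: {error}')
            # Wait a moment before the next try, but not after the last one
            if attempt < attempts:
                time.sleep(delay_seconds)

    # Every attempt failed, so surface the last error to the caller
    raise last_error
```

To use it, the download and extract steps would be wrapped in a function of your own and passed in, for example `retry(download_and_extract)`, where `download_and_extract` is a placeholder name.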

In [6]:
# AG News Subset Download URL from TFDS
AGNEWSSUBSET_URL = 'https://drive.google.com/uc?export=download&id=0Bz8a_Dbh9QhbUDNpeUdjb0wxRms'
AGNEWSSUBSET_DIR = '/tmp/agnewssubet/'

# Download Tar.Gz File and Extract
with urlopen(AGNEWSSUBSET_URL) as targzstream:
    # Stream-extract the archive; the context manager closes the tar file afterwards
    with tarfile.open(fileobj = targzstream, mode = 'r|gz') as thetarfile:
        thetarfile.extractall(AGNEWSSUBSET_DIR)
    
# List Dataset files
agnewssubset_files = os.listdir(AGNEWSSUBSET_DIR + 'ag_news_csv/')
print(agnewssubset_files)

# Load Test Csv
test_df = pd.read_csv(AGNEWSSUBSET_DIR + 'ag_news_csv/test.csv', names = ['label', 'title', 'description'])
print(test_df)

# Add column for generated text
test_df['generated'] = ''

# Dataset features
print(test_df.columns)

# Samples
total_samples = test_df.shape[0] 
print(f'Total Samples: {total_samples}')

['train.csv', 'readme.txt', 'classes.txt', 'test.csv']
      label  ...                                                                                                                                                                                                                                                      description
0         3  ...                                                                                                                                  Unions representing workers at Turner   Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.
1         4  ...                       SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the  #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.
2         4  ...                                           AP - A company founded by a chemistry researcher at the Universi

Create the Keras Model to be used for T5.

In [7]:
class KerasTFT5ForConditionalGeneration(TFT5ForConditionalGeneration):
    def __init__(self, *args, log_dir = None, cache_dir = None, **kwargs):
        super().__init__(*args, **kwargs)
        self.loss_tracker= tf.keras.metrics.Mean(name='loss') 
    
    @tf.function
    def train_step(self, data):
        x = data[0]
        y = x['labels']
        y = tf.reshape(y, [-1, 1])
        with tf.GradientTape() as tape:
            outputs = self(x, training = True)
            loss = outputs[0]
            logits = outputs[1]
            loss = tf.reduce_mean(loss)

        # Compute the gradients after leaving the tape context
        grads = tape.gradient(loss, self.trainable_variables)
            
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.loss_tracker.update_state(loss)        
        self.compiled_metrics.update_state(y, logits)
        metrics = {m.name: m.result() for m in self.metrics}
        
        return metrics

    def test_step(self, data):
        x = data[0]
        y = x["labels"]
        y = tf.reshape(y, [-1, 1])
        output = self(x, training = False)
        loss = output[0]
        loss = tf.reduce_mean(loss)
        logits = output[1]
        
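        self.loss_tracker.update_state(loss)
        self.compiled_metrics.update_state(y, logits)
        # update_state() returns None, so collect the results from the tracked metrics
        metrics = {m.name: m.result() for m in self.metrics}
        
        return metrics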

Next create the model and load the weights file.

In [8]:
# Create Model
with strategy.scope():
    model = KerasTFT5ForConditionalGeneration.from_pretrained(t5_size, config = t5_config)
    model.compile(optimizer = tf.keras.optimizers.Adam(), 
                  metrics = [tf.keras.metrics.SparseTopKCategoricalAccuracy(name = 'accuracy')])

# Summary
model.summary()

# Load Weights
model.load_weights(WORK_DIR + 't5_base_model.h5')

All model checkpoint layers were used when initializing KerasTFT5ForConditionalGeneration.

All the layers of KerasTFT5ForConditionalGeneration were initialized from the model checkpoint at t5-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use KerasTFT5ForConditionalGeneration for predictions without further training.


Model: "keras_tf_t5for_conditional_generation"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
shared (TFSharedEmbeddings)  multiple                  24674304  
_________________________________________________________________
encoder (TFT5MainLayer)      multiple                  84954240  
_________________________________________________________________
decoder (TFT5MainLayer)      multiple                  113275008 
Total params: 222,903,554
Trainable params: 222,903,552
Non-trainable params: 2
_________________________________________________________________


Use test_df to prepare the final dataframe that will be saved to disk after the fake news generation.

Perform the text generation based on the prepared dataframe. Note that the 'title' is used as input. The generated fake news is stored in the dataframe's 'generated' column.

The dataframe is saved to storage for reference.
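
The input preparation described above can be sketched without any model: prefix each title with the task name and split the prompts into batches. The helpers `build_prompts` and `iter_batches` and the placeholder titles are assumptions made for illustration. Unlike a plain "every N rows" check, this version also yields a final partial batch.

```python
def build_prompts(titles, task_prefix = 'generate fake news: '):
    # T5 expects the task name in front of every input text
    return [task_prefix + title for title in titles]


def iter_batches(items, batch_size):
    # Yield consecutive slices; the last slice may be shorter than batch_size
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder titles standing in for the 7600 test rows
titles = [f'Headline {i}' for i in range(7600)]
prompts = build_prompts(titles)

batches = list(iter_batches(prompts, 19))
print(len(batches), len(batches[-1]))   # 400 19

batches = list(iter_batches(prompts, 32))
print(len(batches), len(batches[-1]))   # 238 16
```

With a batch size of 19 the 7600 rows split into exactly 400 full batches. With a size such as 32 the last batch holds only 16 rows, and those rows would be lost if only full batches were processed.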

In [9]:
text_list = []
generated = []

for index, (_, row) in tqdm(enumerate(test_df.iterrows(), start = 1), total = total_samples):
    # Prep input text
    text_list.append(task_name + row['title'])
    
    # Generate when the batch is full, or at the last row so a partial final batch is not dropped
    if index % GENERATE_BATCH_SIZE == 0 or index == total_samples:
        # Batch Encode with Special Tokens
        textlist_encoded = t5_tokenizer.batch_encode_plus(text_list, add_special_tokens = True, max_length = MAX_LEN, truncation = True, padding = 'max_length', return_tensors = 'tf')
        
        input_ids = textlist_encoded['input_ids']
        
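        # Generate FakeNews
        # Note: do_sample is not set here, so decoding stays greedy. As far as I know, top_p, top_k
        # and temperature only take effect when do_sample = True is passed.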
        generated_fakenews = model.generate(input_ids, 
                                          max_length = MAX_LEN, 
                                          top_p = 0.96, 
                                          top_k = 256, 
                                          temperature = 1.3,
                                          num_beams = 1, 
                                          num_return_sequences = 1, 
                                          repetition_penalty = 1.3)
        
        for mapping in generated_fakenews.numpy():
            generated.append(t5_tokenizer.decode(mapping, skip_special_tokens = True))

        # Reset Text List
        text_list = []

# Generate Final File
test_df['generated'] = generated
test_df.to_csv(WORK_DIR + 't5_generated_fake_news_final.csv')

# Summary...
test_df.head()

Unnamed: 0,label,title,description,generated
0,3,Fears for T N pension after talks,Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.,"The government is preparing to pay the pension of its workers, but it is not clear whether the government will be able to meet its obligations."
1,4,The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com),"SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.","SPACE.com - The second private team of astronauts has set a launch date for the first human spaceflight, which will take place in the next few weeks."
2,4,Ky. Company Wins Grant to Study Peptides (AP),"AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.","AP - A Kentucky company has won a grant to study the effects of peptides on the body, a key part of its research program."
3,4,Prediction Unit Helps Forecast Wildfires (AP),"AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry a...","AP - A new unit of the National Weather Service's forecasting service helps forecast wildfires in the United States, including the latest in a series of hurricanes that have killed at least 2,000 people."
4,4,Calif. Aims to Limit Farm-Related Smog (AP),"AP - Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.","AP - California's air quality is a key issue in the fight against greenhouse gas emissions, but it's also a challenge for the state."


### RoBERTa FakeNews Detector

We load the 't5_generated_fake_news_final.csv' file and do some preprocessing to get the input news and labels correct for the classifier.

In [10]:
# Import Generated Fake News
df = pd.read_csv(WORK_DIR + 't5_generated_fake_news_final.csv', usecols = ['title', 'description', 'generated'])
df.head()

Unnamed: 0,title,description,generated
0,Fears for T N pension after talks,Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.,"The government is preparing to pay the pension of its workers, but it is not clear whether the government will be able to meet its obligations."
1,The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com),"SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.","SPACE.com - The second private team of astronauts has set a launch date for the first human spaceflight, which will take place in the next few weeks."
2,Ky. Company Wins Grant to Study Peptides (AP),"AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.","AP - A Kentucky company has won a grant to study the effects of peptides on the body, a key part of its research program."
3,Prediction Unit Helps Forecast Wildfires (AP),"AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry a...","AP - A new unit of the National Weather Service's forecasting service helps forecast wildfires in the United States, including the latest in a series of hurricanes that have killed at least 2,000 people."
4,Calif. Aims to Limit Farm-Related Smog (AP),"AP - Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.","AP - California's air quality is a key issue in the fight against greenhouse gas emissions, but it's also a challenge for the state."


In [11]:
# Split out 'description', rename column to 'news' and set label to 0
df_description = df[['title', 'description']].copy()
df_description.rename(columns = {'description': 'news'}, inplace = True)
df_description['label'] = 0
df_description.head()

Unnamed: 0,title,news,label
0,Fears for T N pension after talks,Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.,0
1,The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com),"SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.",0
2,Ky. Company Wins Grant to Study Peptides (AP),"AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.",0
3,Prediction Unit Helps Forecast Wildfires (AP),"AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry a...",0
4,Calif. Aims to Limit Farm-Related Smog (AP),"AP - Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.",0


In [12]:
# Split out 'generated', rename column to 'news' and set label to 1
df_generated = df[['title', 'generated']].copy()
df_generated.rename(columns = {'generated': 'news'}, inplace = True)
df_generated['label'] = 1
df_generated.head()

Unnamed: 0,title,news,label
0,Fears for T N pension after talks,"The government is preparing to pay the pension of its workers, but it is not clear whether the government will be able to meet its obligations.",1
1,The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com),"SPACE.com - The second private team of astronauts has set a launch date for the first human spaceflight, which will take place in the next few weeks.",1
2,Ky. Company Wins Grant to Study Peptides (AP),"AP - A Kentucky company has won a grant to study the effects of peptides on the body, a key part of its research program.",1
3,Prediction Unit Helps Forecast Wildfires (AP),"AP - A new unit of the National Weather Service's forecasting service helps forecast wildfires in the United States, including the latest in a series of hurricanes that have killed at least 2,000 people.",1
4,Calif. Aims to Limit Farm-Related Smog (AP),"AP - California's air quality is a key issue in the fight against greenhouse gas emissions, but it's also a challenge for the state.",1


In [13]:
# Combine Dataframes to a final dataframe.
test_df = pd.concat([df_description, df_generated], ignore_index = True)
test_df.sample(n = 10)

Unnamed: 0,title,news,label
9711,Oil Hits \$46 as YUKOS Cuts China Supply,"Oil prices hit $46 on Friday as the US government cut its supply of crude to China, a move that could boost demand for oil.",1
610,AL Wrap: Ortiz Fuels Red Sox Fire as Blue Jays Go Down,TORONTO (Reuters) - David Ortiz thumped two homers and drove in four runs to fire the Boston Red Sox to an 11-5 win over the Toronto Blue Jays in the American League Wednesday.,0
6657,Najeh to the rescue for Packers,"No Ahman Green, no problem. At least that #39;s what it looked like on Monday night, as the Green Bay Packers ran roughshod over the St.",0
7664,UN launches 210-million-dollar appeal for flood-hit Bangladesh (AFP),"AFP - The United Nations has launched an appeal for 210 million dollars to help rebuild the country after a devastating flood that killed at least 2,000 people and injured more than 2,000.",1
3296,Old Bones Give New Date for Giant Deer Demise (Reuters),"Reuters - Many large mammals were wiped out in the\last Ice Age but the Eurasian giant deer managed to survive,\scientists said on Wednesday.",0
4002,"Odyssey Warns of Weak Quarter, CEO Quits","CHICAGO (Reuters) - Odyssey Healthcare Inc. &lt;A HREF=""http://www.investor.reuters.com/FullQuote.aspx?ticker=ODSY.O target=/stocks/quickinfo/fullquote""&gt;ODSY.O&lt;/A&gt; on Monday warned of an earnings shortfall, announced the resignation of its...",0
6170,All-rounder McGrath,"Glenn McGrath, thoroughbred fast bowler for a decade, embarked on a new career as an all-rounder in his 102nd Test match at the Gabba, hitting his first half-century as Australia",0
13794,Ohio State #39;s big plays kill Wolverines,"ATHENS (Sports Network) - The Ohio State football team has been a force in the NFL for the last two years. They have been a force in the Big East since the 1990s, and they are still a force in the league.",1
8650,Not a big hit everywhere,"The sluggers aren #39;t going to be able to get the ball in the end zone, but they can still get the ball.",1
14784,Klitschko too good for Williams,"ATHENS (Reuters) - Russian tennis star Vitali Klitschko is too good for Williams to be the man in the middle of the pack, according to a report on Monday.",1


Next we define a function to process the Pandas Test Dataframe. We loop through all rows and from each row we use the columns 'news' and 'label'.

Note that the 'label' column is only used for validation of the predictions.
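
To make clear what ends up in the 'input_ids' and 'attention_mask' arrays, here is a simplified sketch of padding to a fixed length. The helper `pad_and_mask` and the example token ids are made up, and using 1 as the padding id is an assumption about the RoBERTa vocabulary. The real tokenizer does more, for example it keeps the closing special token when it truncates.

```python
def pad_and_mask(token_ids, max_len, pad_id = 1):
    # Cut off anything beyond max_len (simplified truncation)
    ids = list(token_ids)[:max_len]

    # Real tokens get mask value 1, padding positions get mask value 0
    mask = [1] * len(ids)
    padding = max_len - len(ids)

    return ids + [pad_id] * padding, mask + [0] * padding


# Made-up token ids for a short text, padded to a length of 8
input_ids, attention_mask = pad_and_mask([0, 100, 200, 2], max_len = 8)
print(input_ids)        # [0, 100, 200, 2, 1, 1, 1, 1]
print(attention_mask)   # [1, 1, 1, 1, 0, 0, 0, 0]
```

The attention mask tells the model which positions hold real tokens, so the padding does not influence the prediction.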

In [14]:
def create_dataset(df):
    total_samples = df.shape[0]

    # Placeholders input
    input_ids = np.zeros((total_samples, MAX_LEN), dtype = 'int32')
    input_masks = np.zeros((total_samples, MAX_LEN), dtype = 'int32')
    labels = np.zeros((total_samples, ), dtype = 'int32')

    for index, row in tqdm(zip(range(0, total_samples), df.iterrows()), total = total_samples):
        
        # Get news and label...
        news = row[1]['news']
        label = row[1]['label']

        # Process News - Set Label.....
        input_encoded = roberta_tokenizer.encode_plus(news, add_special_tokens = True, max_length = MAX_LEN, truncation = True, padding = 'max_length')
        input_ids_sample = input_encoded['input_ids']
        input_ids[index,:] = input_ids_sample
        attention_mask_sample = input_encoded['attention_mask']
        input_masks[index,:] = attention_mask_sample
        labels[index] = int(label)

    # Create Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(({'input_ids': input_ids, 'attention_mask': input_masks}, labels))

    # Return Dataset
    return dataset

In [15]:
# Show Sizes
print(f'Test DF Shape: {test_df.shape}')

# Create Test Dataset
test_dataset = create_dataset(test_df)
test_dataset = test_dataset.batch(PREDICT_BATCH_SIZE)
test_dataset = test_dataset.repeat()
#test_dataset = test_dataset.prefetch(128)

# Steps
test_steps = test_df.shape[0] // PREDICT_BATCH_SIZE
print(f'Test Steps: {test_steps}')

Test DF Shape: (15200, 3)


Test Steps: 950


Define a function to create and compile the RoBERTa base model.

In [16]:
def build_model():
    # Create Model
    with strategy.scope():      
        model = TFRobertaForSequenceClassification.from_pretrained(roberta_type, config = roberta_config)
        
        optimizer = tf.keras.optimizers.Adam()
        loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True)
        metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

        model.compile(optimizer = optimizer, loss = loss, metrics = [metric])        
        
        return model

Create the model and load the weights file.

In [17]:
# Create Model
model = build_model()

# Summary
model.summary()

# Load Weights
model.load_weights(WORK_DIR + 'roberta_base_model.h5')

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model: "tf_roberta_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
roberta (TFRobertaMainLayer) multiple                  124055040 
_________________________________________________________________
classifier (TFRobertaClassif multiple                  592130    
Total params: 124,647,170
Trainable params: 124,647,170
Non-trainable params: 0
_________________________________________________________________


Next, let's evaluate the test set and see how well the RoBERTa model can classify the generated data.

With an evaluation accuracy of around 96%, the RoBERTa model does a good job of separating real from fake news.

In [18]:
# Evaluate Dataset
eval_results = model.evaluate(test_dataset, steps = test_steps, verbose = 1)
print(f'Detection Accuracy: {eval_results[1] * 100}%')

Detection Accuracy: 96.38158082962036%
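
The accuracy above can be translated back into a number of rows. This small calculation assumes that all 15200 rows were evaluated (950 steps of 16 rows) and that the reported metric is the plain fraction of correct predictions.

```python
total_samples = 15200
accuracy = 0.9638158082962036

# Accuracy is correct / total, so multiply back and round to whole rows
correct = round(accuracy * total_samples)
wrong = total_samples - correct

print(f'Correct: {correct}')   # Correct: 14650
print(f'Wrong: {wrong}')       # Wrong: 550
```

So under these assumptions roughly 550 of the 15200 news texts were classified incorrectly.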


We can also run prediction on the test set. This is basically the same action as the evaluation. However, evaluation returns the evaluation metrics, whereas prediction returns the raw predictions.

In [19]:
# Predict Dataset
preds = model.predict(test_dataset, steps = test_steps, verbose = 1)

# Raw Predictions
print(preds.logits[:5])

# Probabilities
probs = tf.nn.softmax(preds.logits).numpy()
print(probs[:5])

[[ 4.028013   -4.328664  ]
 [ 4.0794563  -4.396236  ]
 [-0.65141207  0.5614262 ]
 [ 3.6734707  -3.9799175 ]
 [ 2.7360373  -3.0042768 ]]
[[9.9976522e-01 2.3476820e-04]
 [9.9979156e-01 2.0843127e-04]
 [2.2919923e-01 7.7080077e-01]
 [9.9952579e-01 4.7420900e-04]
 [9.9679655e-01 3.2034637e-03]]
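
To show how the raw predictions turn into the probabilities printed above, here is a small softmax sketch in plain Python. The function `softmax` is my own helper, not the TensorFlow implementation. The two input rows are copied from the raw predictions in the output.

```python
import math


def softmax(logits):
    # Subtract the largest logit first for numerical stability
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# First and third row of the raw predictions shown above
rows = [[4.028013, -4.328664], [-0.65141207, 0.5614262]]

for row in rows:
    probs = softmax(row)
    predicted = probs.index(max(probs))   # same idea as np.argmax
    print([round(p, 4) for p in probs], predicted)
```

The first row gives a probability of about 0.9998 for label 0 (real). The third row gives about 0.7708 for label 1 (fake), which matches the printed probabilities.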


In [20]:
test_df['label_pred'] = np.argmax(probs, axis = 1)

So the majority of the real and fake news items were classified correctly. That is very nice, but to be honest I am a lot more interested in the wrong predictions for the real and fake news.

Let's take a look at some of the predictions where our model messed up.

In [21]:
# Real News ... but classified as Fake...
test_df[test_df.label.eq(0) & test_df.label_pred.eq(1)].head()

Unnamed: 0,title,news,label,label_pred
2,Ky. Company Wins Grant to Study Peptides (AP),"AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.",0,1
14,Socialites unite dolphin groups,"Dolphin groups, or ""pods"", rely on socialites to keep them from collapsing, scientists claim.",0,1
22,IBM Chips May Someday Heal Themselves,New technology applies electrical fuses to help identify and repair faults.,0,1
26,Giddy Phelps Touches Gold for First Time,Michael Phelps won the gold medal in the 400 individual medley and set a world record in a time of 4 minutes 8.26 seconds.,0,1
32,Sister of man who died in Vancouver police custody slams chief (Canadian Press),Canadian Press - VANCOUVER (CP) - The sister of a man who died after a violent confrontation with police has demanded the city's chief constable resign for defending the officer involved.,0,1


In [22]:
# Fake News ... but classified as Real...
test_df[test_df.label.eq(1) & test_df.label_pred.eq(0)].head()

Unnamed: 0,title,news,label,label_pred
8941,The Discreet Charm of the Very Bourgeois Toy Store?,The Discreet Charm of the Very Bourgeois Toy Store?,1,0
9148,"Post-Olympic Greece tightens purse, sells family silver to fill budget holes (AFP)","AFP - Greece tightened its purse after the Olympic Games, selling its family silver medal to fill budget holes and reducing the number of athletes competing in the Olympics.",1,0
10821,National Geographic Photo Camps Give Kids New Views,National Geographic Photo Camps Give Kids New Views!,1,0
12737,"Retail, auto sales, job numbers suggest tougher times","Retail sales, auto sales and job numbers are all pointing to tougher times ahead for the economy.",1,0
12997,WordPerfect Office 12 - Home Edition Defines Home Productivity &lt;b&gt;...&lt;/b&gt;,WordPerfect Office 12 - Home Edition Defines Home Productivity &lt;strong&gt;WordPerfect Office 12 - Strong&lt;/strong&gt; &lt;strong&gt;,1,0
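
To see how the errors split between the two directions shown above, the label pairs can simply be counted. This is a sketch with made-up labels and predictions. The helper `confusion_counts` is hypothetical, and with the real data you would pass in the 'label' and 'label_pred' columns.

```python
from collections import Counter


def confusion_counts(labels, predictions):
    # Count every (true label, predicted label) pair; 0 = real news, 1 = fake news
    counts = Counter(zip(labels, predictions))
    return {
        'real_as_real': counts[(0, 0)],
        'real_as_fake': counts[(0, 1)],
        'fake_as_real': counts[(1, 0)],
        'fake_as_fake': counts[(1, 1)],
    }


# Made-up example values, not the results from this notebook
labels = [0, 0, 0, 1, 1, 1]
predictions = [0, 1, 0, 1, 0, 1]

print(confusion_counts(labels, predictions))
```

For the toy values this gives 2 real texts and 2 fake texts classified correctly, plus 1 error in each direction.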
