### Main goals:
Clarify and reflect on the definition of the term "fake news", which may vary between datasets and is sometimes non-binary.\
Research where the data comes from and inspect it: what are the labels, sources, and authors?\
Is any person, source, or topic over- or under-represented?\
Study the related literature on how others approach this task and select a model architecture of your choice: LSTM, ...\
Develop a classification model to predict fake news from the text. How do you judge the quality of your results, i.e. which metrics do you consider?
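
One way to approach the representation question is to count how often each label or source value occurs. The following is only a minimal sketch on toy records: the `value_shares` helper, the `source` field, and the site names are hypothetical and do not come from the dataset used below.

```python
from collections import Counter


def value_shares(rows, key):
    """Return the relative frequency of each value of `key` across `rows`."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Toy records standing in for dataset rows (hypothetical values).
toy_rows = [
    {"label": 0, "source": "site_a"},
    {"label": 1, "source": "site_b"},
    {"label": 1, "source": "site_b"},
    {"label": 1, "source": "site_c"},
]

label_shares = value_shares(toy_rows, "label")
source_shares = value_shares(toy_rows, "source")
print(label_shares)
print(source_shares)
```

A strongly skewed share for one label or source would be a hint that accuracy alone may be misleading and that the model could be learning the source rather than the content.
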
### Optional:
Inspect the misclassified examples. What can you learn from them?\
Investigate how the model handles the edge cases you found during data inspection.\
Experiment with ways to mitigate poor coverage of edge cases.
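
Collecting the misclassified examples could look roughly like the sketch below. It assumes the texts, true labels, and predicted labels are available as equally long sequences; the helper name `misclassified_examples` and the toy data are hypothetical.

```python
def misclassified_examples(texts, true_labels, predicted_labels):
    """Collect index, text, true label, and prediction for every wrong prediction."""
    if not (len(texts) == len(true_labels) == len(predicted_labels)):
        raise ValueError("texts, true_labels and predicted_labels must have equal length")
    return [
        {"index": i, "text": text, "true": true, "predicted": predicted}
        for i, (text, true, predicted) in enumerate(
            zip(texts, true_labels, predicted_labels)
        )
        if true != predicted
    ]


# Toy data (hypothetical), standing in for test texts and model output.
toy_texts = ["article a", "article b", "article c", "article d"]
toy_true = [0, 1, 1, 0]
toy_pred = [0, 0, 1, 1]

errors = misclassified_examples(toy_texts, toy_true, toy_pred)
for error in errors:
    print(error)
```

Reading a sample of these records by hand, grouped by true label, is usually a quicker way to spot shared patterns than looking at aggregate scores.
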


In [1]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
import numpy as np
import evaluate
from sklearn.metrics import classification_report

In [2]:
model_name = "google-bert/bert-base-uncased"

In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [4]:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

In [5]:
data = load_dataset('GonzaloA/fake_news')

Repo card metadata block was not found. Setting CardData to empty.


In [6]:
data = data.remove_columns('Unnamed: 0')

In [7]:
tokenized_data = data.map(tokenize_function, batched=True)

In [59]:
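# Use small random subsets (100 examples each) for quick training and validation.
small_train_dataset = tokenized_data["train"].shuffle(seed=42).select(range(100))
small_eval_dataset = tokenized_data["validation"].shuffle(seed=42).select(range(100))
# Despite its name, this is the full test split, not a subsample.
small_test_dataset = tokenized_data['test']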

In [9]:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [10]:
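# Evaluate once per epoch. Newer transformers releases may expect `eval_strategy` in place of `evaluation_strategy`.
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")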

In [11]:
metric = evaluate.load("accuracy")

In [12]:
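def compute_metrics(eval_pred):
    # eval_pred holds the raw model outputs (logits) and the reference labels.
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit.
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)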

In [13]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)


In [14]:
trainer.train()

{'eval_loss': 0.31016087532043457, 'eval_accuracy': 0.92, 'eval_runtime': 9.747, 'eval_samples_per_second': 10.26, 'eval_steps_per_second': 1.334, 'epoch': 1.0}


{'eval_loss': 0.22731147706508636, 'eval_accuracy': 0.91, 'eval_runtime': 9.4725, 'eval_samples_per_second': 10.557, 'eval_steps_per_second': 1.372, 'epoch': 2.0}


{'eval_loss': 0.19146320223808289, 'eval_accuracy': 0.94, 'eval_runtime': 9.4679, 'eval_samples_per_second': 10.562, 'eval_steps_per_second': 1.373, 'epoch': 3.0}
{'train_runtime': 129.383, 'train_samples_per_second': 2.319, 'train_steps_per_second': 0.301, 'train_loss': 0.2629552743373773, 'epoch': 3.0}


TrainOutput(global_step=39, training_loss=0.2629552743373773, metrics={'train_runtime': 129.383, 'train_samples_per_second': 2.319, 'train_steps_per_second': 0.301, 'train_loss': 0.2629552743373773, 'epoch': 3.0})

In [60]:
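# Keep the reference labels aside for the classification report.
test_labels = small_test_dataset['label']
# Drop the labels so that predict() only returns model outputs; token_type_ids are not needed for single-sentence input.
small_test_dataset = small_test_dataset.remove_columns(['label','token_type_ids'])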

In [61]:
small_test_dataset

Dataset({
    features: ['title', 'text', 'input_ids', 'attention_mask'],
    num_rows: 8117
})

In [62]:
predictions = trainer.predict(small_test_dataset)

In [63]:
predicted_labels = predictions.predictions.argmax(axis=1)

In [67]:
predicted_labels

array([0, 0, 0, ..., 0, 1, 0])

In [68]:
print(classification_report(test_labels, predicted_labels))

              precision    recall  f1-score   support

           0       0.95      0.90      0.92      3782
           1       0.91      0.96      0.94      4335

    accuracy                           0.93      8117
   macro avg       0.93      0.93      0.93      8117
weighted avg       0.93      0.93      0.93      8117
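
The per-class figures in a report like the one above follow from counts of true positives, false positives, and false negatives. The sketch below shows that calculation in plain Python on toy labels; the `per_class_metrics` helper and the toy data are hypothetical and are not the notebook's actual predictions.

```python
def per_class_metrics(true_labels, predicted_labels, target_class):
    """Compute precision, recall and F1 for one class from paired labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for y, p in pairs if y == target_class and p == target_class)
    fp = sum(1 for y, p in pairs if y != target_class and p == target_class)
    fn = sum(1 for y, p in pairs if y == target_class and p != target_class)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}


# Toy labels (hypothetical).
toy_true = [0, 0, 0, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 1, 1, 1, 1, 1, 0]

metrics_class_1 = per_class_metrics(toy_true, toy_pred, target_class=1)
metrics_class_0 = per_class_metrics(toy_true, toy_pred, target_class=0)
print(metrics_class_1)
print(metrics_class_0)
```

Looking at precision and recall per class, rather than accuracy alone, shows whether errors are concentrated on one of the two labels.
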



In [70]:
# trainer.save_model('model')