# How to use this notebook

This notebook creates two files: `model.py`, which loads the model and contains the inference logic, and `app.py`, the Streamlit GUI.

Run all the cells in order to start *unReal*; comments in each cell explain what to do at that step.

# Install requirements



In [1]:
%%writefile requirements.txt

transformers==3.1.0 # BERT model; PyTorch is already preinstalled on Colab
streamlit==1.11.1
pyngrok==4.1.1 # pinned: newer versions do not work with this notebook

Writing requirements.txt


In [2]:
# install dependencies
!pip install -r requirements.txt

# the output of this cell ends with a "RESTART RUNTIME" button;
# press it, then continue with the rest of the notebook as normal

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers==3.1.0
  Downloading transformers-3.1.0-py3-none-any.whl (884 kB)
Collecting streamlit==1.11.1
  Downloading streamlit-1.11.1-py2.py3-none-any.whl (9.1 MB)
Collecting pyngrok==4.1.1
  Downloading pyngrok-4.1.1.tar.gz (18 kB)
Collecting tokenizers==0.8.1.rc2
  Downloading tokenizers-0.8.1rc2-cp37-cp37m-manylinux1_x86_64.whl (3.0 MB)
Collecting sentencepiece!=0.1.92
  Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
Collecting rich
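After restarting the runtime, it may be worth confirming that the pinned versions are the ones actually installed, since a different `transformers` or `pyngrok` version could break later cells. The sketch below is one way to check; `parse_pins` and `check_pins` are hypothetical helpers (not part of the notebook), and the parser assumes `requirements.txt` holds only plain `name==version` lines with optional inline ` # comments`, which pip ignores. `importlib.metadata` needs Python 3.8+; the `cp37` wheels in the log above suggest a Python 3.7 runtime, where the `importlib_metadata` backport provides the same `version()` function.

```python
# Sketch: compare installed package versions against simple "name==version" pins.
# Assumption: requirements.txt contains only plain pins, optionally followed by
# an inline " # comment" (pip treats whitespace + "#" as the start of a comment).
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Return {package: pinned_version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blank lines, full-line comments, unpinned specs
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return (package, pinned, installed) for every pin that is not satisfied.

    `installed` is None when the package is not installed at all.
    """
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

In the notebook, `check_pins(parse_pins(open('requirements.txt').read()))` should return an empty list once the install and restart have succeeded.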

In [1]:
# mount Google Drive to give access to the trained model saved there
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
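Because `model.py` loads the weights from a Drive path, a wrong `PATH` only shows up later as an error inside `torch.load`. A small pre-check like the following can fail earlier with a clearer message; `check_weights_path` is a hypothetical helper, not something the notebook defines.

```python
# Sketch: verify that the weights file exists before model.py tries to load it.
import os


def check_weights_path(path):
    """Return `path` unchanged, or raise FileNotFoundError with a hint."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"No weights file at {path!r}. Is Google Drive mounted, and does "
            "PATH in model.py point to your saved .pt file?"
        )
    return path
```

For example, `check_weights_path('/content/drive/MyDrive/saved_weights_bert_2.pt')` returns the path if the file is there.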


## Model loading and logic file

This file defines and instantiates the BERT model class, then loads the trained weights (which should be saved in your Google Drive).
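The layer sizes in `BERT_Arch` follow from a short calculation. Each `bert-base-uncased` encoder yields a 768-dimensional pooled output, so concatenating headline and body gives 1536 features. `cls_hs.unsqueeze(0)` has shape `(1, batch, 1536)`, so `nn.MaxPool1d(4, stride=4)` pools along the last (feature) dimension. The output-length formula from the PyTorch `nn.MaxPool1d` documentation then gives 384, which matches `nn.Linear(384, 768)`. The helper name `maxpool1d_out_len` is illustrative only.

```python
# Worked arithmetic for the feature sizes in BERT_Arch, using the output-length
# formula from the PyTorch nn.MaxPool1d documentation (ceil_mode=False):
#   L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)


def maxpool1d_out_len(l_in, kernel_size, stride, padding=0, dilation=1):
    """Length of the pooled dimension after nn.MaxPool1d."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


HIDDEN = 768                      # pooled output size of bert-base-uncased
concat = HIDDEN * 2               # headline + body vectors concatenated
pooled = maxpool1d_out_len(concat, kernel_size=4, stride=4)
print(concat, pooled)             # -> 1536 384 (384 = in_features of self.fc)
```

With a single headline/body pair (batch size 1), the later `torch.squeeze` also removes the batch dimension, so `self.fc` receives a 1-D vector of length 384.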

In [2]:
%%writefile model.py

# IMPORTANT: change the PATH variable (defined after the model class
# definition) so that it points to your saved weights in Google Drive!

import torch
import torch.nn as nn
import transformers
from transformers import AutoModel, BertTokenizerFast

# define model architecture
class BERT_Arch(nn.Module):
    def __init__(self, bert_head, bert_body):
      super(BERT_Arch, self).__init__()
      self.bert_head = bert_head
      self.bert_body = bert_body

      # max pooling layer: reduces the concatenated 1536-dim vector to 384 dims
      self.max_pooling = nn.MaxPool1d(4, stride=4)
      # dropout layer
      self.dropout = nn.Dropout(0.1)
      # ReLU activation function
      self.relu = nn.ReLU()
      # dense layers 1 and 2
      self.fc = nn.Linear(384, 768)
      self.fc1 = nn.Linear(768, 512)
      # dense layer 3 (output layer, one unit per stance)
      self.fc2 = nn.Linear(512, 4)
      # log-softmax activation function
      self.softmax = nn.LogSoftmax(dim=-1)
 
    # define forward pass
    def forward(self, sent_id_head, sent_id_body, mask_head, mask_body):
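      # encode headline and body separately; in transformers 3.x the model
      # returns a tuple whose second element is the pooled [CLS] output
      _, cls_hs_h = self.bert_head(sent_id_head, attention_mask=mask_head)
      _, cls_hs_b = self.bert_body(sent_id_body, attention_mask=mask_body)
      # concatenate both representations (768 + 768 = 1536 features)
      cls_hs = torch.cat((cls_hs_h, cls_hs_b), dim=1)
      # max-pool over the feature dimension (1536 -> 384), drop singleton dims
      max_pool_out = torch.squeeze(self.max_pooling(cls_hs.unsqueeze(0)))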

      fc_out = self.fc(max_pool_out)
      fc_act_out = self.relu(fc_out)

      x = self.fc1(fc_act_out)
      x = self.relu(x)
      x = self.dropout(x)

      # output layer
      x = self.fc2(x)

      # apply log-softmax activation (outputs log-probabilities)
      x = self.softmax(x)
      return x


# load the trained model weights
# change PATH to point to the .pt file in your own Google Drive!
# galen's PATH
PATH = '/content/drive/MyDrive/saved_weights_bert_2.pt'
# szechang's PATH
# PATH = 'drive/MyDrive/AI bert/saved_weights_bert_2.pt'

bert_head = AutoModel.from_pretrained('bert-base-uncased')
bert_body = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

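# use the GPU if the runtime has one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BERT_Arch(bert_head, bert_body)
model = model.to(device)
model.load_state_dict(torch.load(PATH, map_location=device))
# switch to inference mode so that dropout is disabled during prediction
model.eval()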

label_to_idx = {"agree":0, "disagree":1, "discuss":2, "unrelated":3}
idx_to_label = {0:"agree", 1:"disagree", 2:"discuss", 3:"unrelated"}

max_seq_len_h = 20
max_seq_len_b = 512

def input_to_tensor(user_input_head, user_input_body):
  # tokenize the headline and body separately, padding/truncating each to its
  # own maximum length; returns (ids_head, ids_body, mask_head, mask_body)
  tokens_head = tokenizer.batch_encode_plus(
      [user_input_head],
      max_length=max_seq_len_h,
      padding='max_length',
      truncation=True,
      return_token_type_ids=False
  )

  tokens_body = tokenizer.batch_encode_plus(
      [user_input_body],
      max_length=max_seq_len_b,
      padding='max_length',
      truncation=True,
      return_token_type_ids=False
  )

  seq_head = torch.tensor(tokens_head['input_ids'])
  mask_head = torch.tensor(tokens_head['attention_mask'])
  seq_body = torch.tensor(tokens_body['input_ids'])
  mask_body = torch.tensor(tokens_body['attention_mask'])

  return seq_head.to(device), seq_body.to(device), mask_head.to(device), mask_body.to(device)


Writing model.py
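`input_to_tensor` relies on the tokenizer to make every headline exactly 20 tokens and every body exactly 512 tokens (BERT-base's maximum sequence length), with an attention mask marking real tokens versus padding. The function below is a simplified, pure-Python illustration of that output for one already-tokenized sequence; it is not the Hugging Face implementation. It assumes the `bert-base-uncased` special-token ids `[CLS]=101`, `[SEP]=102` and `[PAD]=0`, and `pad_and_mask` is a hypothetical name.

```python
# Simplified illustration of padding='max_length' + truncation=True for a single
# sequence of content token ids (special-token ids assumed for bert-base-uncased).
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_and_mask(token_ids, max_length):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    # reserve two positions for [CLS] and [SEP], truncating the content to fit
    content = list(token_ids)[: max_length - 2]
    ids = [CLS_ID] + content + [SEP_ID]
    mask = [1] * len(ids)                      # 1 = real token, attend to it
    n_pad = max_length - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad  # 0 = padding, ignored
```

For example, `pad_and_mask([2023, 2003], 6)` (arbitrary content ids) gives `([101, 2023, 2003, 102, 0, 0], [1, 1, 1, 1, 0, 0])`, while a long input is cut so that `[SEP]` still ends the sequence.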


## Streamlit GUI file

This file contains the GUI code: it imports `model.py` and lets the user enter a *news headline* and a *news article (body)*.

The model then classifies the headline and body pair into one of four stances: `Agree`, `Disagree`, `Discuss`, `Unrelated`.
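Because the model ends in `LogSoftmax`, its four outputs are log-probabilities: the predicted stance is the index with the largest value (log is monotonic, so this is also the most probable class), and `exp` recovers the probability itself. The sketch below shows this decoding on a plain list standing in for the output tensor; `decode_prediction` is hypothetical, and reporting the probability is an optional extension, since the app as written displays only the label.

```python
# Sketch: turn the model's 4 log-softmax scores into a stance label and its
# probability. A plain list stands in for the output tensor here.
import math

IDX_TO_LABEL = {0: "agree", 1: "disagree", 2: "discuss", 3: "unrelated"}


def decode_prediction(log_probs):
    """Return (label, probability) for the highest-scoring stance."""
    best = max(range(len(log_probs)), key=lambda i: log_probs[i])
    return IDX_TO_LABEL[best], math.exp(log_probs[best])
```

For instance, log-probabilities corresponding to `[0.1, 0.2, 0.6, 0.1]` decode to `"discuss"` with probability 0.6 (up to floating-point rounding).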

In [3]:
%%writefile app.py

import streamlit as st
import numpy as np
import torch
from model import *

#---------------------------------#
# Page configuration

PAGE_CONFIG = {"page_title":"AI Project Group 18",
              "page_icon":":newspaper:",
              "layout":"wide"}
st.set_page_config(**PAGE_CONFIG)

#---------------------------------#
# Home page

st.write("""
# unReal

#### *Fake News Classification and Prediction through AI*

In this implementation, the *BERT base model (uncased)* was trained and applied for Fake News Stance Detection.

Enter your news headline and news article in the corresponding boxes below and hit *Submit* to check if they are categorised under:

`Agree`, `Disagree`, `Discuss`, or `Unrelated`.
""")

st.markdown("""---""") 

if 'output' not in st.session_state:
    st.session_state.output = ""

def get_output():
  # read the current inputs from session state (set via the widget keys below)
  head_tensor, body_tensor, head_mask, body_mask = input_to_tensor(
      st.session_state.headline, st.session_state.body)

  # inference only, so no gradients are needed
  with torch.no_grad():
    output_tensor = model(head_tensor, body_tensor, head_mask, body_mask)
  output_idx = int(np.argmax(output_tensor.cpu().numpy()))

  st.session_state.output = idx_to_label[output_idx]

st.text_area("Enter your news headline here:", key="headline")
st.text_area("Enter your news article (body) here:", key="body")

st.button(label="Submit", on_click=get_output)

st.write("### Your news headline and news article are categorised under:")
st.write(st.session_state.output)


Writing app.py


# Running the GUI

In [4]:
# check if required py files have been written to colab sandbox
# app.py and model.py should be seen
!ls

app.py	drive  model.py  requirements.txt  sample_data


In [5]:
# ngrok authentication, only needs to be done once per session
# replace the placeholder with your own authtoken from the ngrok dashboard
!ngrok authtoken YOUR_NGROK_AUTHTOKEN

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


In [6]:
# start streamlit app instance
!streamlit run app.py &>/dev/null&
!pgrep streamlit # prints the Streamlit process ID (PID), needed to stop it later

420
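Since Streamlit is started in the background, it may take a few seconds before it listens on port 8501, and a tunnel opened too early can point at nothing. One option is to poll the port first, as in the sketch below; `wait_for_port` is a hypothetical helper built only on the standard `socket` and `time` modules.

```python
# Sketch: block until something accepts TCP connections on host:port,
# e.g. the Streamlit server on its default port 8501.
import socket
import time


def wait_for_port(port, host="localhost", timeout=30.0, interval=0.5):
    """Return True once host:port accepts a connection, False after `timeout` s."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # connection succeeded, server is up
        except OSError:
            time.sleep(interval)  # not listening yet, try again shortly
    return False
```

Calling `wait_for_port(8501)` before `ngrok.connect` returns `True` once the app is reachable.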


In [7]:
from pyngrok import ngrok
# set up a tunnel to port 8501 (Streamlit's default port)
pub_url = ngrok.connect(port='8501')
print(pub_url) # public URL for the app

http://6a8d-35-221-38-46.ngrok.io


In [8]:
# shutdown
!kill 420 # replace 420 with the PID printed by pgrep above
ngrok.kill()