# Q&A on live news feed

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

## Install transformers library

In [2]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
Collecting tokenizers!=0.11.3,>=0.11.1
  Downloading tokenizers-0.11.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.5 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
  Attempting uninstall: pyyaml

## Load pre-trained model and tokenizer

In [3]:
from transformers import TFBertForQuestionAnswering
from transformers import BertTokenizer

In [4]:
model = TFBertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

All model checkpoint layers were used when initializing TFBertForQuestionAnswering.

All the layers of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


## Install wget
Used to download the news corpus from the web server.

In [5]:
!pip install wget

Collecting wget
  Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9675 sha256=0f85ad41e2a78e58435e1c392b73f19a786a4f111b6dd8aee24a383c068694a9
  Stored in directory: /root/.cache/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c13e3c02
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


## Define function for Q&A
Parameters:

  **question**: the question entered by the user

  **text**: the corpus searched for the answer

In [6]:
def question_answer(question, text):

    # Tokenize the question and text as a pair; truncation keeps the input within the model's maximum length
    inputs = tokenizer(question, text, return_tensors="tf", truncation=True)

    # String form of the token ids, used to rebuild the answer
    tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])

    # Model output: start and end logits for every token
    output = model(inputs)

    # Most likely start and end positions (argmax on the raw float logits, without casting to int)
    answer_start = int(tf.argmax(output.start_logits, axis=1)[0])
    answer_end = int(tf.argmax(output.end_logits, axis=1)[0])

    # Fallback, so that answer is always defined even when no valid span is found
    answer = "Unable to find the answer to your question."

    if answer_end >= answer_start and tokens[answer_start] != "[CLS]":
        # Rebuild the answer, merging WordPiece continuations ("##...") into the previous token
        answer = tokens[answer_start]
        for i in range(answer_start + 1, answer_end + 1):
            if tokens[i].startswith("##"):
                answer += tokens[i][2:]
            else:
                answer += " " + tokens[i]

    print("\nAnswer:\n{}".format(answer.capitalize()))
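The function above takes the highest-scoring start and end positions separately and only builds an answer when the end does not come before the start. The following is a minimal sketch of an alternative, not the notebook's method: it scores every (start, end) pair with start <= end jointly, then joins the WordPiece tokens of the chosen span. The names `best_span`, `join_wordpieces`, and `max_answer_len` are hypothetical, and the logits are made-up numbers rather than real model output.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with start <= end that maximizes
    start_logits[start] + end_logits[end]."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_logits)):
        # Only consider ends at or after the start, within the length limit
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

def join_wordpieces(tokens):
    """Merge '##' continuation pieces into the preceding token."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)

# Made-up example: the independent argmax would give start=3, end=1 (invalid)
tokens = ["[CLS]", "what", "[SEP]", "ukraine", "'", "s",
          "parliament", "web", "##sites", "[SEP]"]
start_logits = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0]
end_logits = [0, 6, 0, 0, 0, 0, 0, 0, 4, 0]

start, end = best_span(start_logits, end_logits)
print(join_wordpieces(tokens[start:end + 1]))
```

With these numbers the joint search selects the span from position 3 to position 8, whereas picking each position on its own would pair start 3 with end 1 and yield no answer.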

## Function for accepting user questions

In [7]:
def question_answer_auto():
    question = input("\nPlease enter your question: \n")

    while True:
        # Answer from the current corpus (the global updated_news)
        question_answer(question, updated_news)

        # Repeat the prompt until the reply starts with Y or N (any case, empty input allowed)
        response = ""
        while not response.startswith(("Y", "N")):
            response = input("\nDo you want to ask another question based on this text (Y/N)? ").strip().upper()

        if response.startswith("N"):
            print("\nOK!")
            break

        question = input("\nPlease enter your question: \n")
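The Y/N prompt appears in more than one place in this notebook, so it could be factored into one helper. The sketch below is an illustration under assumptions: `ask_yes_no` and its `read` parameter are hypothetical names, and `read` defaults to the built-in `input` so the helper can be exercised with scripted replies.

```python
def ask_yes_no(prompt, read=input):
    """Ask until the reply starts with Y or N (any case).

    Returns True for yes and False for no. Empty or unrelated
    replies simply trigger the prompt again.
    """
    while True:
        reply = read(prompt).strip().upper()
        if reply.startswith("Y"):
            return True
        if reply.startswith("N"):
            return False

# Scripted replies stand in for keyboard input in this example
replies = iter(["", "maybe", " y "])
print(ask_yes_no("Continue (Y/N)? ", read=lambda _: next(replies)))
```

Passing the reader function as a parameter keeps the helper free of global state and makes the retry behavior easy to check.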

## Final run

Loop that extends the news corpus with each new file and then prompts for questions
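The loop below filters the accumulated text with a regular expression before it is passed to the model. The following sketch isolates that step on a made-up sample string. The pattern is copied from the notebook, while `clean_text` and `append_news` are hypothetical helper names introduced only for illustration.

```python
import re

# Same pattern as in the notebook. Inside the character class, '-. is a
# range from ' to ., so it also keeps ( ) * + , and - in the text.
KEEP_PATTERN = r"[^a-zA-Z0-9$'-., ]"

def clean_text(text):
    """Remove every character that the pattern does not whitelist."""
    return re.sub(KEEP_PATTERN, "", text)

def append_news(corpus, new_text):
    """Hypothetical helper: add cleaned text to the corpus, separated by a space."""
    return (corpus + " " + clean_text(new_text)).strip()

sample = "Hello, world! (test) #1"
print(clean_text(sample))  # prints: Hello, world (test) 1
```

The exclamation mark and the hash are dropped, while the parentheses survive because they fall inside the `'-.` range.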

In [8]:
import wget
import re

updated_news = ''
files = ['First', 'Second', 'Third', 'Fourth', 'Fifth']

for j in files:
    # wget.download returns the path of the saved file
    doc = wget.download('https://raw.githubusercontent.com/abcom-mltutorials/dataset/main/' + j + '.csv')

    # head(0) keeps only the column names, so this takes the first header field as the news text
    updated_news = updated_news + ' ' + list(pd.read_csv(doc).head(0))[0]
    # Drop unwanted characters; inside the class, '-. is a range covering ' ( ) * + , - .
    updated_news = re.sub(r"[^a-zA-Z0-9$'-., ]", "", updated_news)
    print('Corpus updated with ' + j + '.csv')
    question_answer_auto()

    # Repeat the prompt until the reply starts with Y or N (any case, empty input allowed)
    response = ""
    while not response.startswith(("Y", "N")):
        response = input("\nDo you want to quit (Y/N)? ").strip().upper()

    if response.startswith("Y"):
        print("\n Bye bye! \n")
        break
    print("\n Let's continue \n")

Corpus updated with First.csv

Please enter your question: 
What did Ukraine's ambassador say?

Answer:
Russian president vladimir putin has declared war on ukraine

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
What Ukraine's websites were attacked?

Answer:
Ukraine ' s parliament and other government and banking websites

Do you want to ask another question based on this text (Y/N)? N

OK!

Do you want to quit (Y/N)? N

 Let's continue 

Corpus updated with Second.csv

Please enter your question: 

Answer:
Would lead to consequences you have never seen

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
How many people were killed on the first day?

Answer:
137

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
Which organization is assisting to evacuate Indian nationals?

Answer:
Mea

Do you want to ask another question based on this text (Y/N)? N

O