# Question Answering with T5

In [1]:
! pip install -q pyspark==3.1.2 spark-nlp


In [2]:
import sparknlp

# Start a Spark session with Spark NLP available
spark = sparknlp.start()

In [3]:
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import pandas as pd
from pyspark.sql.functions import col, when, count, isnan
from pyspark.sql import functions as F

### Answering closed-book questions

First, set up the document assembler and load the pretrained T5 model. In the closed-book setting, the model answers from what it learned during training, with no supporting passage.

In [4]:
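# Wrap the raw "text" column in a Spark NLP document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Load the pretrained T5 model and set the task used for every input
t5 = T5Transformer.pretrained(name="t5_base", lang="en") \
    .setInputCols("document") \
    .setOutputCol("answer") \
    .setTask("question")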

t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]
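One way to picture what `setTask` does: T5 is a text-to-text model, and the task string is placed in front of each input before tokenization. The sketch below is only a mental model in plain Python, under the assumption that the task and the text are joined with a single space. `with_task_prefix` is a hypothetical helper, not part of Spark NLP.

```python
def with_task_prefix(task: str, text: str) -> str:
    """Return the text as the model is assumed to see it: task + space + text."""
    # Hypothetical helper for illustration; Spark NLP applies the task internally.
    return f"{task} {text}" if task else text


print(with_task_prefix("question", "What is the capital of Ireland?"))
print(with_task_prefix("", "question: Who is Mark Knopfler?"))
```

Under this picture, leaving the task unset means the input text is used as written, so any markers such as "question:" have to be part of the text itself.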


Build the pipeline, fit it on the sample questions, and transform them to generate answers.

In [5]:
nlp_pipeline = Pipeline(stages=[
    documentAssembler,
    t5
])

# Sample questions
text = [["Who is the most famous artist in the world?"],
        ["What is the capital of Ireland?"]]

df = spark.createDataFrame(text).toDF("text")
model = nlp_pipeline.fit(df)
result = model.transform(df)

Display each question next to its generated answer.

In [6]:
result.select("text", "answer.result").show(truncate=False)

+-------------------------------------------+-------------+
|text                                       |result       |
+-------------------------------------------+-------------+
|Who is the most famous artist in the world?|[John Lennon]|
|What is the capital of Ireland?            |[Dublin]     |
+-------------------------------------------+-------------+
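The `result` field is an array for each row, which is why every answer prints inside brackets. To work with plain strings, the rows can be collected and unpacked on the driver. The sketch below uses ordinary tuples as stand-ins for the collected rows (values copied from the table above), and `first_answer` is a hypothetical helper name.

```python
from typing import List, Optional


def first_answer(answers: List[str]) -> Optional[str]:
    """Return the first generated answer, or None if the array is empty."""
    return answers[0] if answers else None


# Stand-in for: result.select("text", "answer.result").collect()
collected = [
    ("Who is the most famous artist in the world?", ["John Lennon"]),
    ("What is the capital of Ireland?", ["Dublin"]),
]

answers_by_question = {text: first_answer(answers) for text, answers in collected}
for question, answer in answers_by_question.items():
    print(f"{question} -> {answer}")
```

With a real DataFrame, the same loop should work on the collected `Row` objects, since they unpack like tuples. Collecting to the driver is only sensible for small results such as this one.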



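### Answering open-book questions

Build the context passage and the questions. In the open-book setting, each input carries both a "question:" part and a "context:" part, and the model answers from the passage.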


In [28]:
context= '''context: Mark Knopfler is a British singer-songwriter, guitarist, and record producer. 
            He was born in Glasgow, Scotland, and raised in Blyth, near Newcastle in England, from the age of seven. 
            He became known as the lead guitarist, singer and songwriter of the rock band Dire Straits. He pursued a solo career after leaving the band in 1987. 
            Dire Straits reunited in early 1991, but dissolved again in 1995. He is now an independent solo artist.
        '''

question_1= "question: Who is Mark Knopfler?"
question_2= "question: When was Mark Knopfler born?"
question_3= "question: What is the name of Mark Knopfler's rock band?"
question_4= "question: When did Mark Knopfler leave the band?"

# Join with a space so the question and "context:" do not run together
data= [[question_1 + " " + context],
       [question_2 + " " + context],
       [question_3 + " " + context],
       [question_4 + " " + context]]

df= spark.createDataFrame(data).toDF("text")
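Concatenating prompts by hand makes it easy to lose the separator between the question and the context. A small helper can make the format explicit. This is a minimal sketch: `build_prompt` is a hypothetical name, and collapsing the whitespace of the triple-quoted string is a design choice of this example rather than something the model is known to require.

```python
def build_prompt(question: str, context: str) -> str:
    """Build a 'question: ... context: ...' input for open-book question answering."""
    # Collapse newlines and indentation left over from triple-quoted strings.
    question = " ".join(question.split())
    context = " ".join(context.split())
    return f"question: {question} context: {context}"


passage = """Mark Knopfler is a British singer-songwriter, guitarist, and record producer.
             He became known as the lead guitarist, singer and songwriter of the rock band Dire Straits."""

prompt = build_prompt("What is the name of Mark Knopfler's rock band?", passage)
print(prompt)
```

The resulting strings can be passed to `spark.createDataFrame` in the same way as `data`, for example `[[build_prompt(q, passage)] for q in questions]`, where `questions` is an assumed list of plain question strings.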
      

The context and questions are ready. Next, build the pipeline. No task is set this time, because each input already contains its own "question:" and "context:" markers.


In [12]:
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Same pretrained model as before, this time without a task
t5 = T5Transformer.pretrained(name="t5_base", lang="en") \
    .setInputCols("document") \
    .setOutputCol("answer")

pipeline = Pipeline(stages=[
    documentAssembler,
    t5
])

model = pipeline.fit(df)
result = model.transform(df)

t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]


In [30]:
result.columns

['text', 'document', 'answer']

Inspecting the results

In [31]:
result.select("answer.result").show(truncate=False)

+-------------------------------------------------------------+
|result                                                       |
+-------------------------------------------------------------+
|[a British singer-songwriter, guitarist, and record producer]|
|[Glasgow, Scotland]                                          |
|[Dire Straits]                                               |
|[1987]                                                       |
+-------------------------------------------------------------+
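Because the `text` column holds the full prompt, a side-by-side view of questions and answers needs the question cut back out. The sketch below does this in plain Python on stand-in data taken from the cells above, with the context passage abbreviated. `question_of` is a hypothetical helper that assumes the "question:" and "context:" markers used in this notebook.

```python
def question_of(prompt: str) -> str:
    """Extract the question from a 'question: ... context: ...' prompt."""
    head = prompt.split("context:", 1)[0]            # drop the context passage
    return head.replace("question:", "", 1).strip()  # drop the marker


# Stand-in for: result.select("text", "answer.result").collect()
# The context passage is abbreviated here for readability.
collected = [
    ("question: What is the name of Mark Knopfler's rock band? context: Mark Knopfler is ...",
     ["Dire Straits"]),
    ("question: When did Mark Knopfler leave the band? context: Mark Knopfler is ...",
     ["1987"]),
]

pairs = [(question_of(text), answers[0]) for text, answers in collected]
for question, answer in pairs:
    print(f"{question} -> {answer}")
```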

