![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/english/question-answering/MPNetForQuestionAnswering.ipynb)

## Colab Setup

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.2.3 and Spark NLP 5.3.1
setup Colab for PySpark 3.2.3 and Spark NLP 5.3.1


## Download MPNetForQuestionAnswering Model and Create Spark NLP Pipeline

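Let's start a Spark session and create a Spark NLP pipeline with two stages: a `MultiDocumentAssembler`, which turns the question and context columns into document annotations, and an `MPNetForQuestionAnswering` annotator, which extracts the answer span from the context.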

In [None]:
import sparknlp
from sparknlp.base import *
from sparknlp.common import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import pandas as pd

# To run on a GPU, use: sparknlp.start(gpu=True)
spark = sparknlp.start()

print("Spark NLP version", sparknlp.version())
print("Apache Spark version:", spark.version)

Spark NLP version 5.3.1
Apache Spark version: 3.2.3


In [51]:
# Convert the raw question and context columns into document annotations
document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Load the default pretrained model; it extracts the answer span from the context
span_classifier = MPNetForQuestionAnswering.pretrained() \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False)

pipeline = Pipeline().setStages([
    document_assembler,
    span_classifier
])

mpnet_base_question_answering_squad2 download started this may take some time.
Approximate size to download 384.9 MB
[OK!]
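
The annotator above is an extractive question-answering model: it returns a span of the context as the answer. As a rough illustration only, the sketch below shows how extractive QA models commonly pick a span from per-token start and end scores. It is not Spark NLP's internal implementation; the function `best_span`, the `max_len` limit, and the score values are all hypothetical.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 15) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # The end index may not precede the start or exceed the length limit
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Hypothetical scores for the context "I'm from Tokyo and love sushi."
tokens = ["I'm", "from", "Tokyo", "and", "love", "sushi", "."]
start_scores = [0.1, 0.3, 4.2, 0.0, 0.2, 1.1, -1.0]
end_scores = [0.0, 0.1, 3.9, 0.2, 0.1, 1.5, -0.5]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

With these made-up scores the highest-scoring span is the single token `Tokyo`.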


Let's create a DataFrame with some questions and their contexts to use as input for the pipeline.

In [55]:
examples = [
    ["Do you know where I'm from?", "I'm from Tokyo and love sushi."],
    ["Can you guess my favorite color?", "My favorite color is blue and I love the ocean."],
    ["What do you think I do for a living?", "I'm a teacher in New York and enjoy reading."],
    ["Are you aware of my hobby?", "I enjoy painting and often visit art galleries."],
    ["Do you know my pet's name?", "My dog's name is Max and he loves long walks."]
]

In [57]:
data = spark.createDataFrame(examples).toDF("question", "context")

Fit the pipeline, transform the data, and display the extracted answer for each question.

In [61]:
result = pipeline.fit(data).transform(data)
result.select("question", "context", "answer.result").show(truncate=False)

+------------------------------------+-----------------------------------------------+----------+
|question                            |context                                        |result    |
+------------------------------------+-----------------------------------------------+----------+
|Do you know where I'm from?         |I'm from Tokyo and love sushi.                 |[Tokyo]   |
|Can you guess my favorite color?    |My favorite color is blue and I love the ocean.|[blue]    |
|What do you think I do for a living?|I'm a teacher in New York and enjoy reading.   |[teacher] |
|Are you aware of my hobby?          |I enjoy painting and often visit art galleries.|[painting]|
|Do you know my pet's name?          |My dog's name is Max and he loves long walks.  |[Max]     |
+------------------------------------+-----------------------------------------------+----------+
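
The `result` column in the table above is an array of strings, so downstream code usually needs to unwrap it. Here is a minimal sketch of doing that in plain Python. The helper `first_answers` is hypothetical, the first two rows are copied from the table so the block runs without Spark, and the last row with an empty result is an invented case that shows the defensive handling.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def first_answers(rows: Iterable[Sequence]) -> List[Tuple[str, Optional[str]]]:
    """Pair each question with its first predicted answer (None if there is none)."""
    pairs = []
    for question, _context, answers in rows:
        pairs.append((question, answers[0] if answers else None))
    return pairs


# With a live Spark session, the rows would come from:
#   rows = result.select("question", "context", "answer.result").collect()
rows = [
    ("Do you know where I'm from?", "I'm from Tokyo and love sushi.", ["Tokyo"]),
    ("Do you know my pet's name?", "My dog's name is Max and he loves long walks.", ["Max"]),
    ("What is my age?", "I enjoy painting.", []),  # hypothetical row with no answer
]

for question, answer in first_answers(rows):
    print(f"{question} -> {answer}")
```

Because Spark `Row` objects behave like tuples, the same helper should work unchanged on collected rows.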

