![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/Spark_NLP_Udemy_MOOC/Open_Source/27.01.QuestionAnswering_with_Transformers.ipynb)

# QuestionAnswering with Transformers

This notebook covers the parameters and usage of Transformers-based question answering annotators.



**📖 Learning Objectives:**

1. Be able to create a pipeline for question answering using a Transformers-based annotator.

2. Understand how to use the annotators for predictions.

3. Become comfortable using the different parameters of the annotator.

4. Import Transformers models from Hugging Face to Spark NLP.



**🔗 Helpful Links:**

- Documentation : [Transformers in Spark NLP](https://nlp.johnsnowlabs.com/docs/en/transformers)



- Scala Doc : [BertForQuestionAnswering](https://nlp.johnsnowlabs.com/api/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForQuestionAnswering.html)


- For extended examples of usage, see the [Spark NLP Workshop repository.](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/14.Transformers_for_Token_Classification_in_Spark_NLP.ipynb)



## Transformers and Spark NLP

Spark NLP has extended support for importing models from `HuggingFace` 🤗 and `TF Hub` since `3.1.0`. You can use the `saved_model` feature in HuggingFace to import any of the following types of models into Spark NLP in a few lines of code.

<br/>
<div align="center">
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">AlbertForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">BertForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">DeBertaForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">DistilBertForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">LongformerForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">RoBertaForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">XlmRoBertaForQuestionAnswering</span></td>
  </tr>
  <tr>
    <td class="tg-0pky"><span style="color:#905;background-color:#ddd">TapasForQuestionAnswering</span></td>
  </tr>
</tbody>
</table>
</div>
<br/>

> We will keep working on the remaining annotators and extend this support to additional Transformers models. To stay up to date, visit [this page](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) on the compatibility and development of the TF Hub and HuggingFace adaptations for Spark NLP. Stay tuned for the next releases.

### Question Answering

As mentioned above, many Transformers models are already implemented in Spark NLP. For question answering specifically, they are available as **`XForQuestionAnswering`** annotators, where `X` can be any of:

- `BERT` ([BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805), Jacob Devlin et al.): Randomly replaces a fraction of the input tokens (for example, 15% of them) with a _MASK_ token or a random token in order to learn a language model. Given two sentences, pretraining involves two tasks:
    - Predict the original tokens at the corrupted positions.
    - Predict whether the two sentences are consecutive.
- `ALBERT` ([ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), Zhenzhong Lan et al.): Same as BERT, with changes that reduce memory usage. Instead of predicting whether two sentences are consecutive, the model receives two consecutive sentences and predicts whether they were given in the correct order or swapped.
- `RoBERTa` ([RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), Yinhan Liu et al.): Same as BERT, but with a modified training procedure (e.g., dynamic masking, where the masked positions change across epochs instead of being fixed once).
- `CamemBERT` ([CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894), Louis Martin et al.): Based on the RoBERTa model, trained on a French dataset.
- `DistilBERT` ([DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108), Victor Sanh et al.): A distilled version of BERT. The number of parameters is reduced by training a smaller model to reproduce the behavior of the larger one (knowledge distillation).
- `Longformer` ([Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150), Iz Beltagy et al.): Allows inputs of up to 4096 tokens instead of the usual limit of 512. To contain the added computational cost, it replaces dense attention matrices with sparse ones.
- `XlmRoBerta` ([Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), Alexis Conneau et al.): Applies the training methods from RoBERTa to the XLM model.
- `Xlnet` ([XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237), Zhilin Yang et al.): Unlike the token masking applied in BERT models, it trains the language model over permutations of the token order.
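
To make the masked-language-model objective described for `BERT` more concrete, here is a minimal, library-free sketch. It assumes the 80/10/10 replacement split from the BERT paper (mask token, random token, unchanged token). The function name `mask_tokens`, the toy vocabulary, and the fixed seed are illustrative and are not part of Spark NLP or HuggingFace.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt a token list in the style of BERT pretraining (sketch).

    Returns (corrupted, labels). labels[i] holds the original token at
    every selected position and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Position not selected: keep the token, nothing to predict.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK)               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(token)              # 10%: leave unchanged
    return corrupted, labels

tokens = "alibaba group founder jack ma has made his first appearance".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.15, seed=0)
print(corrupted)
print(labels)
```

During pretraining the model only has to predict the positions where `labels` is not `None`; all other positions are ignored by the loss.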


For more details on these models and others available on HuggingFace, please visit the [HuggingFace documentation](https://huggingface.co/docs/transformers/model_summary).

## **🎬 Colab Setup**

In [None]:
!pip install -q pyspark==3.1.2 spark-nlp==4.2.4


In [None]:
import sparknlp

from sparknlp.annotator import *
from sparknlp.common import *
from sparknlp.base import *

spark = sparknlp.start()

print("Spark NLP version", sparknlp.version())
print("Apache Spark version:", spark.version)

spark

Spark NLP version 4.2.4
Apache Spark version: 3.1.2


## **🖨️ Input/Output Annotation Types**

- Input: `DOCUMENT`, `DOCUMENT`

- Output: `CHUNK`

## **🔎Parameters**

- `setCaseSensitive()`: Controls whether case is respected (`True`) or ignored (`False`) in index lookups. (Default depends on model)

- `setMaxSentenceLength()`: Maximum sentence length to process, limited to 512 for all models except `Longformer`, which has a limit of 4096. (Default: 512)

- `setBatchSize()`: Larger values allow faster processing but require more memory. (Default depends on model)

- `setConfigProtoBytes()`: ConfigProto from TensorFlow, serialized into a byte array. Get it with `config_proto.SerializeToString()`.
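
As a rough illustration of what the `setMaxSentenceLength()` limit means for a question/context pair, the sketch below packs both into a single BERT-style sequence and truncates the context to fit. This is a toy under stated assumptions: whitespace splitting stands in for real subword tokenization, and `pack_pair` is a hypothetical helper, not how Spark NLP implements truncation internally.

```python
def pack_pair(question, context, max_length=512):
    """Build [CLS] question [SEP] context [SEP], truncating the context.

    Whitespace tokens are a stand-in for real subword tokens.
    """
    q_tokens = question.split()
    c_tokens = context.split()
    # Three positions are reserved for [CLS] and the two [SEP] markers.
    budget = max_length - len(q_tokens) - 3
    if budget < 0:
        raise ValueError("question alone exceeds max_length")
    c_tokens = c_tokens[:budget]
    return ["[CLS]", *q_tokens, "[SEP]", *c_tokens, "[SEP]"]

packed = pack_pair(
    "Who is founder of Alibaba Group?",
    "Alibaba Group founder Jack Ma has made his first appearance.",
    max_length=12,
)
print(len(packed), packed)
```

With a small limit the tail of the context is dropped, so an answer located there can no longer be found. This is the practical reason the limit matters for long contexts.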

## Defining the Spark NLP Pipeline

In [None]:
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import Tokenizer, BertForQuestionAnswering, DistilBertForQuestionAnswering, RoBertaForQuestionAnswering
from pyspark.ml import Pipeline
import pyspark.sql.functions as F

➤ Let's prepare the prerequisite columns first, so we can use them in different annotators.

In [None]:
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

pipeline = Pipeline(stages=[documentAssembler])

In [None]:
example_df = spark.createDataFrame([["Who is founder of Alibaba Group?", "Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire."]]).toDF("question", "context")

example_df.show()

+--------------------+--------------------+
|            question|             context|
+--------------------+--------------------+
|Who is founder of...|Alibaba Group fou...|
+--------------------+--------------------+



In [None]:
example_df = pipeline.fit(example_df).transform(example_df)
example_df.show()

+--------------------+--------------------+--------------------+--------------------+
|            question|             context|   document_question|    document_context|
+--------------------+--------------------+--------------------+--------------------+
|Who is founder of...|Alibaba Group fou...|[{document, 0, 31...|[{document, 0, 12...|
+--------------------+--------------------+--------------------+--------------------+



### Bert

In [None]:
question_answering = BertForQuestionAnswering.pretrained("bert_base_cased_qa_squad2") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

bert_base_cased_qa_squad2 download started this may take some time.
Approximate size to download 385.5 MB
[OK!]


In [None]:
question_answering.getCaseSensitive()

True

In [None]:
result = question_answering.transform(example_df)
result.selectExpr("question as Question", "answer.result as Answer").show(truncate=False)

+--------------------------------+---------+
|Question                        |Answer   |
+--------------------------------+---------+
|Who is founder of Alibaba Group?|[Jack Ma]|
+--------------------------------+---------+
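
The answer is returned as a `CHUNK` because extractive question answering selects a span of the context rather than generating new text. A minimal sketch of one common decoding scheme follows: every token gets a score as a possible start and as a possible end, and the best valid pair wins. The scores, the `best_span` helper, and the `max_answer_len` cap are made up for illustration and are not taken from Spark NLP's implementation.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_scores, end_scores, max_answer_len=10):
    """Return (start, end, score) for the best span with start <= end."""
    start_probs = softmax(start_scores)
    end_probs = softmax(end_scores)
    best = (0, 0, -1.0)
    for i, p_start in enumerate(start_probs):
        # Only consider ends at or after the start, within the length cap.
        for j in range(i, min(i + max_answer_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best

context = "Alibaba Group founder Jack Ma has made his first appearance".split()
# Hypothetical per-token scores, peaking on "Jack" (start) and "Ma" (end).
start_scores = [0.1, 0.2, 0.3, 5.0, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1]
end_scores = [0.1, 0.1, 0.2, 0.5, 5.0, 0.1, 0.1, 0.1, 0.1, 0.1]

start, end, score = best_span(start_scores, end_scores)
print(" ".join(context[start:end + 1]), round(score, 3))
```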



### DistilBert

In [None]:
question_answering = DistilBertForQuestionAnswering \
    .pretrained("distilbert_qa_BERT", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")


result = question_answering.transform(example_df)
result.selectExpr("question as Question", "answer.result as Answer").show(truncate=False)

distilbert_qa_BERT download started this may take some time.
Approximate size to download 232.8 MB
[OK!]
+--------------------------------+---------+
|Question                        |Answer   |
+--------------------------------+---------+
|Who is founder of Alibaba Group?|[Jack Ma]|
+--------------------------------+---------+



### RoBerta for Spanish


In [None]:
example_df = spark.createDataFrame([["¿En qué ciudad vive su madre?", "Su madre vive en Madrid y es maestra." ]]).toDF("question", "context")

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

question_answering = RoBertaForQuestionAnswering \
    .pretrained("roberta_qa_roberta_base_bne_squad_2.0_es_jamarju", "es") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline(stages=[
    documentAssembler,
    question_answering
    ])

result = pipeline.fit(example_df).transform(example_df)
result.selectExpr("question as Question", "answer.result as Answer").show(truncate=False)

roberta_qa_roberta_base_bne_squad_2.0_es_jamarju download started this may take some time.
Approximate size to download 435.2 MB
[OK!]
+-----------------------------+--------+
|Question                     |Answer  |
+-----------------------------+--------+
|¿En qué ciudad vive su madre?|[Madrid]|
+-----------------------------+--------+



In [None]:
result_df = result.select(F.explode(F.arrays_zip(result.document_question.result,
                                                 result.document_context.result, 
                                                 result.answer.result)).alias("cols"))\
                  .select(F.expr("cols['0']").alias("question"),
                          F.expr("cols['1']").alias("context"),
                          F.expr("cols['2']").alias("answer"))

result_df.show(50, truncate=100)

+-----------------------------+-------------------------------------+------+
|                     question|                              context|answer|
+-----------------------------+-------------------------------------+------+
|¿En qué ciudad vive su madre?|Su madre vive en Madrid y es maestra.|Madrid|
+-----------------------------+-------------------------------------+------+



###  Using LightPipeline

[LightPipelines](https://nlp.johnsnowlabs.com/docs/en/concepts#using-spark-nlps-lightpipeline) are Spark NLP-specific pipelines, equivalent to Spark ML Pipelines but meant for smaller amounts of data. They are useful when working with small datasets, debugging results, or running training or prediction from an API that serves one-off requests.

Spark NLP LightPipelines are Spark ML pipelines converted to run as a multi-threaded task on a single machine, **becoming more than 10x faster** for smaller amounts of data (small is relative, but 50k sentences is roughly a good maximum). To use them, we simply plug in a trained (fitted) pipeline and then annotate plain text. We don't even need to convert the input text to a DataFrame in order to feed it into a pipeline that accepts a DataFrame as input in the first place. This is quite useful for getting a prediction for a few lines of text from a trained ML model.

For more details, check the following 
[Medium post](https://medium.com/spark-nlp/spark-nlp-101-lightpipeline-a544e93f20f1).

This class accepts a string or a list of strings as input, without the need to convert your text into a Spark DataFrame. The [.annotate()](https://nlp.johnsnowlabs.com/api/python/reference/autosummary/sparknlp/base/light_pipeline/index.html#sparknlp.base.light_pipeline.LightPipeline.annotate) method returns a dictionary (or a list of dictionaries if a list is passed as input) with the results of each step in the pipeline. To retrieve all metadata from the annotators in the result, use the [.fullAnnotate()](https://nlp.johnsnowlabs.com/api/python/reference/autosummary/sparknlp/base/light_pipeline/index.html#sparknlp.base.light_pipeline.LightPipeline.fullAnnotate) method instead, which always returns a list.

To extract the results from the object, you just need to parse the dictionary.

Let's use the `distilbert_base_cased_qa_squad2` model with `LightPipeline` and `.fullAnnotate()` it with sample data.

In [None]:
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

question_answering = DistilBertForQuestionAnswering \
    .pretrained("distilbert_base_cased_qa_squad2", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")\
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)

pipeline = Pipeline(stages=[
    documentAssembler,
    question_answering
    ])

model = pipeline.fit(spark.createDataFrame([["", ""]]).toDF("question", "context"))

distilbert_base_cased_qa_squad2 download started this may take some time.
Approximate size to download 232.8 MB
[OK!]


In [None]:
light_model = LightPipeline(model)
light_result = light_model.fullAnnotate(
    "What type of flight decks are aircraft carriers equipped with?",
    "An aircraft carrier is a warship that serves as a seagoing airbase, equipped with a full-length flight deck and facilities for carrying, arming, deploying, and recovering aircraft."
)[0]

In [None]:
light_result

{'document_question': [Annotation(document, 0, 61, What type of flight decks are aircraft carriers equipped with?, {})],
 'document_context': [Annotation(document, 0, 179, An aircraft carrier is a warship that serves as a seagoing airbase, equipped with a full-length flight deck and facilities for carrying, arming, deploying, and recovering aircraft., {})],
 'answer': [Annotation(chunk, 0, 12, full - length, {'chunk': '0', 'start_score': '0.8622168', 'score': '0.81347287', 'end': '33', 'start': '31', 'end_score': '0.7647289', 'sentence': '0'})]}

In [None]:
light_result.keys()

dict_keys(['document_question', 'document_context', 'answer'])
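
Parsing the dictionary can be sketched as follows. So that the snippet runs without a Spark session, it uses a `namedtuple` stand-in for Spark NLP's `Annotation` and a hand-copied `sample_result` taken from the output above. Both are assumptions for illustration, as is the `extract_answers` helper; with a real result only the `.result` and `.metadata` accesses are needed.

```python
from collections import namedtuple

# Stand-in for sparknlp's Annotation class; only `result` and `metadata`
# are read below.
Annotation = namedtuple(
    "Annotation", ["annotator_type", "begin", "end", "result", "metadata"]
)

# Hand-copied from the fullAnnotate output shown above.
sample_result = {
    "answer": [
        Annotation(
            "chunk", 0, 12, "full - length",
            {"chunk": "0", "start_score": "0.8622168", "score": "0.81347287",
             "end": "33", "start": "31", "end_score": "0.7647289",
             "sentence": "0"},
        )
    ]
}

def extract_answers(annotated):
    """Flatten the 'answer' annotations into plain dictionaries."""
    return [
        {
            "answer": ann.result,
            # Metadata values arrive as strings, so convert the scores.
            "score": float(ann.metadata["score"]),
            "start_score": float(ann.metadata["start_score"]),
            "end_score": float(ann.metadata["end_score"]),
        }
        for ann in annotated["answer"]
    ]

answers = extract_answers(sample_result)
print(answers)
```

In this particular output, `score` appears to be the mean of `start_score` and `end_score`, up to rounding.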

# From HuggingFace to Spark NLP

Here you will learn how to export a model from HuggingFace to Spark NLP. 

For compatibility details and examples, check [this page](https://nlp.johnsnowlabs.com/docs/en/transformers#import-transformers-into-spark-nlp).

### Export and Save HuggingFace model

- Let's install `HuggingFace` and `TensorFlow`. You don't need `TensorFlow` installed to use Spark NLP; however, we need it to load and save models from HuggingFace.
- We pin TensorFlow to version `2.11.0` and Transformers to `4.25.1`. This doesn't mean it won't work with future releases, but we wanted you to know which versions have been tested successfully.

In [None]:
!pip install -q transformers==4.25.1 tensorflow==2.11.0

- HuggingFace comes with a native `saved_model` feature inside the `save_pretrained` function for TensorFlow-based models. We will use it to save the model as a TF `SavedModel`.
- We'll use the [deepset/minilm-uncased-squad2](https://huggingface.co/deepset/minilm-uncased-squad2) model from HuggingFace as an example.
- In addition to `TFBertForQuestionAnswering`, we also need to save the `BertTokenizer`. This is the same for every model: these are the assets needed for tokenization inside Spark NLP.

In [None]:
from transformers import TFBertForQuestionAnswering, BertTokenizer
import tensorflow as tf

MODEL_NAME = 'deepset/minilm-uncased-squad2'

# Save the tokenizer assets (vocab.txt is needed by Spark NLP later)
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
tokenizer.save_pretrained('./{}_tokenizer/'.format(MODEL_NAME))

# Load TensorFlow weights if available; otherwise convert from PyTorch
try:
  model = TFBertForQuestionAnswering.from_pretrained(MODEL_NAME)
except Exception:
  model = TFBertForQuestionAnswering.from_pretrained(MODEL_NAME, from_pt=True)

# Define TF Signature
@tf.function(
  input_signature=[
      {
          "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
          "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
          "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"),
      }
  ]
)
def serving_fn(inputs):
    return model(inputs)

# Export as a TF SavedModel with the explicit serving signature
model.save_pretrained("./{}".format(MODEL_NAME), saved_model=True, signatures={"serving_default": serving_fn})



All PyTorch model weights were used when initializing TFBertForQuestionAnswering.

All the weights of TFBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


➤ Let's have a look inside these two directories and see what we are dealing with:

In [None]:
!ls -l {MODEL_NAME}

total 130028
-rw-r--r-- 1 root root       657 Mar  5 21:00 config.json
drwxr-xr-x 3 root root      4096 Mar  5 20:59 saved_model
-rw-r--r-- 1 root root 133136696 Mar  5 21:00 tf_model.h5


In [None]:
!ls -l {MODEL_NAME}/saved_model/1

total 9112
drwxr-xr-x 2 root root    4096 Mar  5 20:59 assets
-rw-r--r-- 1 root root      55 Mar  5 21:00 fingerprint.pb
-rw-r--r-- 1 root root  164802 Mar  5 21:00 keras_metadata.pb
-rw-r--r-- 1 root root 9150273 Mar  5 21:00 saved_model.pb
drwxr-xr-x 2 root root    4096 Mar  5 21:00 variables


In [None]:
!ls -l {MODEL_NAME}_tokenizer

total 236
-rw-r--r-- 1 root root    125 Mar  5 20:58 special_tokens_map.json
-rw-r--r-- 1 root root    492 Mar  5 20:58 tokenizer_config.json
-rw-r--r-- 1 root root 231508 Mar  5 20:58 vocab.txt


- As you can see, we need the SavedModel from the `saved_model/1/` path.
- We will also need `vocab.txt` from the tokenizer.
- All we need to do is copy `vocab.txt` to `saved_model/1/assets`, where Spark NLP will look for it.

In [None]:
asset_path = '{}/saved_model/1/assets'.format(MODEL_NAME)

!cp {MODEL_NAME}_tokenizer/vocab.txt {asset_path}

➤ Our `vocab.txt` is now inside the assets directory

In [None]:
!ls -l {MODEL_NAME}/saved_model/1/assets

total 228
-rw-r--r-- 1 root root 231508 Mar  5 21:00 vocab.txt
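
Outside a notebook, where the `!cp` shell magic is not available, the same copy step can be sketched with the Python standard library. The helper name `copy_vocab_to_assets` and the throwaway directory tree are hypothetical; the tree only mimics the layout shown above so that the snippet can run on its own.

```python
import shutil
import tempfile
from pathlib import Path

def copy_vocab_to_assets(tokenizer_dir, saved_model_dir):
    """Copy vocab.txt from the tokenizer export into <saved_model_dir>/assets."""
    source = Path(tokenizer_dir) / "vocab.txt"
    if not source.is_file():
        raise FileNotFoundError(f"missing tokenizer vocabulary: {source}")
    assets = Path(saved_model_dir) / "assets"
    assets.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source, assets / "vocab.txt"))

# Demonstration on a temporary tree that mimics the export layout.
with tempfile.TemporaryDirectory() as tmp:
    tokenizer_dir = Path(tmp) / "model_tokenizer"
    saved_model_dir = Path(tmp) / "model" / "saved_model" / "1"
    tokenizer_dir.mkdir(parents=True)
    (tokenizer_dir / "vocab.txt").write_text("[PAD]\n[UNK]\n[CLS]\n[SEP]\n")
    copied = copy_vocab_to_assets(tokenizer_dir, saved_model_dir)
    copied_lines = copied.read_text().splitlines()
    print(copied.name, len(copied_lines))
```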


## Import and Save BertForQuestionAnswering in Spark NLP

- Let's use the `loadSavedModel` function in `BertForQuestionAnswering`, which allows us to load a TensorFlow model in SavedModel format.
- Most params, such as `setMaxSentenceLength`, can be set later when you load this model in `BertForQuestionAnswering` at runtime, so don't worry about what you set them to now.
- `loadSavedModel` accepts two params: the first is the path to the TF SavedModel, and the second is the SparkSession, that is, the `spark` variable we previously started via `sparknlp.start()`.
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind that the best and recommended way to move/share/reuse Spark NLP models is to use `write.save` so you can use `.load()` from any file system natively.
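
Since `loadSavedModel` only receives a path, it can help to check the folder before calling it. The sketch below looks for the entries this notebook relies on (`saved_model.pb`, `variables`, and `assets/vocab.txt`). The helper name `missing_saved_model_parts` and the `REQUIRED` tuple are illustrative, and the list is an assumption based on this notebook, not an official Spark NLP requirement.

```python
import tempfile
from pathlib import Path

# Entries expected inside <model>/saved_model/1 (assumed from this notebook).
REQUIRED = ("saved_model.pb", "variables", "assets/vocab.txt")

def missing_saved_model_parts(saved_model_dir):
    """Return the expected entries that are absent from a SavedModel folder."""
    root = Path(saved_model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

# Demonstration on a temporary folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "saved_model" / "1"
    (root / "variables").mkdir(parents=True)
    (root / "saved_model.pb").write_bytes(b"")
    before = missing_saved_model_parts(root)  # vocab.txt not copied yet
    (root / "assets").mkdir()
    (root / "assets" / "vocab.txt").write_text("[PAD]\n")
    after = missing_saved_model_parts(root)
    print(before, after)
```

An empty list means the folder has the layout used in this notebook; any names returned point to a step that was skipped.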

In [None]:
from sparknlp.annotator import *
from sparknlp.base import *

# setCaseSensitive is set to False because the model we imported is `uncased`
spanClassifier = BertForQuestionAnswering.loadSavedModel(
     '{}/saved_model/1'.format(MODEL_NAME), spark)\
  .setInputCols(["document_question", 'document_context'])\
  .setOutputCol("answer")\
  .setCaseSensitive(False)\
  .setMaxSentenceLength(512)

➤ Let's save it to disk so it is easier to move around and can be used later via the `.load` function

In [None]:
spanClassifier.write().overwrite().save("./{}_spark_nlp".format(MODEL_NAME))

➤ Let's clean up stuff we don't need anymore

In [None]:
!rm -rf {MODEL_NAME}_tokenizer {MODEL_NAME}




Awesome 😎  !

➤ This is your BertForQuestionAnswering model from HuggingFace 🤗  loaded and saved by Spark NLP 🚀

In [None]:
! ls -l {MODEL_NAME}_spark_nlp

total 138552
-rw-r--r-- 1 root root 141868303 Mar  5 21:00 bert_classification_tensorflow
drwxr-xr-x 4 root root      4096 Mar  5 21:00 fields
drwxr-xr-x 2 root root      4096 Mar  5 21:00 metadata


➤ Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny BertForQuestionAnswering model in a Spark NLP 🚀 pipeline!

In [None]:
document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier_loaded = BertForQuestionAnswering.load("./{}_spark_nlp".format(MODEL_NAME))\
  .setInputCols(["document_question",'document_context'])\
  .setOutputCol("answer")

pipeline = Pipeline().setStages([
    document_assembler,
    spanClassifier_loaded
])

example = spark.createDataFrame([["The most populated city in the United States is which city?", "New York is the most populous city in the United States and the center of the New York metropolitan area"]]).toDF("question", "context")
result = pipeline.fit(example).transform(example)


result.selectExpr("question as Question", "answer.result as Answer").show(truncate=False)

+-----------------------------------------------------------+----------+
|Question                                                   |Answer    |
+-----------------------------------------------------------+----------+
|The most populated city in the United States is which city?|[New York]|
+-----------------------------------------------------------+----------+



➤ Cool! You can now go wild and use hundreds of BertForQuestionAnswering models from HuggingFace 🤗 in Spark NLP 🚀