![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/english/document-assembler/Loading_Multiple_Documents.ipynb)

# Loading Multiple Documents with MultiDocumentAssembler

This notebook shows how to load multiple documents with the MultiDocumentAssembler.

## Colab Setup

In [None]:
!pip install -q pyspark==3.3.0 spark-nlp==4.3.1

In [None]:
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *

spark = sparknlp.start()

print("Spark NLP version", sparknlp.version())
print("Apache Spark version:", spark.version)

spark

Spark NLP version 4.3.1
Apache Spark version: 3.3.0


### Question-Answering with RoBertaForQuestionAnswering

Here, the input has one column for the question and one for its context. The model
predicts the answer from these two columns. MultiDocumentAssembler provides an easy
way to convert both columns into document annotations in a single stage.

In [None]:
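# Convert the raw "question" and "context" text columns into document annotations
document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained RoBERTa span classifier; it reads the question document and the context document
spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_qa_roberta_base_squad2_covid","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(True)

pipeline = Pipeline().setStages([document_assembler,
                                 spanClassifier])

# One example row: a question and the context that should contain its answer
data = spark.createDataFrame([["Do I have Covid?", "I have a fever and a cough and for the past few days, I have lost my sense of smell and taste. Later I was diagnosed with Covid."]]).toDF("question", "context")

# Neither stage needs training; fit() only assembles the PipelineModel that transform() runs
result = pipeline.fit(data).transform(data)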

roberta_qa_roberta_base_squad2_covid download started this may take some time.
Approximate size to download 442.8 MB
Download done! Loading the resource.

2023-02-02 15:43:02.762789: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


[OK!]
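The pipeline processes each DataFrame row independently, so several question/context pairs can be answered in one pass. The sketch below shows one way to build such a DataFrame. The helper name `to_qa_dataframe` and the second pair in `qa_pairs` are hypothetical, invented here for illustration; only the column names `question` and `context` come from the cell above.

```python
# Question/context pairs. The first pair is the one used in this notebook;
# the second is a made-up example added only for illustration.
qa_pairs = [
    ("Do I have Covid?",
     "I have a fever and a cough and for the past few days, I have lost my "
     "sense of smell and taste. Later I was diagnosed with Covid."),
    ("What symptoms do I have?",
     "I have a fever and a cough."),
]


def to_qa_dataframe(spark_session, pairs):
    """Build a DataFrame with the "question" and "context" columns
    that the MultiDocumentAssembler above reads from.

    `spark_session` is expected to be an active SparkSession, such as the
    `spark` object created in the setup cell.
    """
    rows = [[question, context] for question, context in pairs]
    return spark_session.createDataFrame(rows).toDF("question", "context")
```

In the notebook, `batch = to_qa_dataframe(spark, qa_pairs)` can then be passed through `pipeline.fit(batch).transform(batch)` in the same way as `data`, which should yield one `answer` annotation list per row.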


In [None]:
result.select('answer.result').show(truncate=False)

+----------------------------------+
|result                            |
+----------------------------------+
|[Later I was diagnosed with Covid]|
+----------------------------------+
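To use the predictions outside Spark, the selected columns can be collected into plain Python objects. The following is a minimal sketch, assuming the rows come from `result.select("question", "answer.result").collect()`, where each row holds the question string and a list of answer strings. The helper name `first_answers` is hypothetical, and the example row simply mirrors the output shown above.

```python
def first_answers(rows):
    """Map each question to its first predicted answer.

    `rows` is any iterable of (question, answers) pairs, for example the
    Row objects returned by collect(). A question with an empty answer
    list is mapped to None.
    """
    return {
        question: (answers[0] if answers else None)
        for question, answers in rows
    }


# Stand-in for collected rows, copied from the table printed above.
example_rows = [("Do I have Covid?", ["Later I was diagnosed with Covid"])]

print(first_answers(example_rows))
# {'Do I have Covid?': 'Later I was diagnosed with Covid'}
```

Note that `collect()` pulls every row to the driver, so this pattern suits small result sets; larger results are better kept in the DataFrame.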



                                                                                