![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/english/sequence-classification/MPNetForSequenceClassification.ipynb)

## Colab Setup

In [1]:
!wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.2.3 and Spark NLP 5.3.1
setup Colab for PySpark 3.2.3 and Spark NLP 5.3.1
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m281.5/281.5 MB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m564.8/564.8 kB[0m [31m49.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m199.7/199.7 kB[0m [31m23.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for pyspark (setup.py) ... [?25l[?25hdone


# Download MPNetForSequenceClassification Model and Create Spark NLP Pipeline

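Let's create a Spark NLP pipeline with the following stages:

- `DocumentAssembler`: wraps the raw `text` column into `document` annotations
- `Tokenizer`: splits each document into `token` annotations
- `MPNetForSequenceClassification`: a pretrained MPNet model that assigns a class `label` to each document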

In [2]:
import sparknlp
from sparknlp.base import *
from sparknlp.common import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import pandas as pd

# To run on a GPU, start the session with sparknlp.start(gpu=True) instead
spark = sparknlp.start()

print("Spark NLP version", sparknlp.version())
print("Apache Spark version:", spark.version)

Spark NLP version 5.3.1
Apache Spark version: 3.2.3


In [3]:
MPNetForSequenceClassification

In [4]:
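# Wrap the raw "text" column in Spark NLP document annotations
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# With no arguments, pretrained() downloads the default model
# (mpnet_sequence_classifier_ukr_message, as the download log below shows)
sequenceClassifier = MPNetForSequenceClassification.pretrained() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("label")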

pipeline = Pipeline().setStages([document, tokenizer, sequenceClassifier])


mpnet_sequence_classifier_ukr_message download started this may take some time.
Approximate size to download 384.5 MB
[OK!]
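Each stage reads the columns named in `setInputCol`/`setInputCols` and adds the column named in `setOutputCol`, so the stages have to be ordered such that every input column already exists when a stage runs. The plain-Python sketch below mimics only that column wiring. The `ToyStage` class, `run_stages`, the whitespace tokenizer, and the fixed placeholder label are hypothetical stand-ins, not Spark NLP's implementation (the real `Tokenizer` is rule-based and the real classifier runs the MPNet model).

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class ToyStage:
    """Hypothetical stand-in for a pipeline stage (not a Spark NLP class)."""

    def __init__(self, input_cols: List[str], output_col: str, fn: Callable[..., object]):
        self.input_cols = input_cols
        self.output_col = output_col
        self.fn = fn

    def transform(self, row: Row) -> Row:
        # Fail early if no earlier stage has produced a required column.
        missing = [c for c in self.input_cols if c not in row]
        if missing:
            raise KeyError(f"missing input columns: {missing}")
        out = dict(row)  # keep the existing columns, then add the new one
        out[self.output_col] = self.fn(*(row[c] for c in self.input_cols))
        return out


def run_stages(stages: List[ToyStage], row: Row) -> Row:
    # Apply the stages in list order, mirroring the order passed to setStages.
    for stage in stages:
        row = stage.transform(row)
    return row


# Toy stages reusing the column names from the pipeline above:
# text -> document -> token -> label
toy_document = ToyStage(["text"], "document", lambda text: text)
toy_tokenizer = ToyStage(["document"], "token", lambda doc: doc.split())  # whitespace split only
# Stub classifier: returns a fixed placeholder instead of running a model.
toy_classifier = ToyStage(["document", "token"], "label", lambda doc, tokens: ["PLACEHOLDER"])

row = run_stages([toy_document, toy_tokenizer, toy_classifier], {"text": "I love driving my car."})
print(sorted(row))   # ['document', 'label', 'text', 'token']
print(row["token"])  # ['I', 'love', 'driving', 'my', 'car.']
```

Swapping `toy_document` and `toy_tokenizer` in the list makes `transform` raise `KeyError`, an analogue of the ordering constraint the real pipeline imposes through its input and output column names.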


Let's create a DataFrame with a few example sentences to use as input for the pipeline.

In [5]:
data = spark.createDataFrame([
    ["I love driving my car."],
    ["The next bus will arrive in 20 minutes."],
    ["pineapple on pizza is the worst 🤮"]]).toDF("text")

# fit() builds a PipelineModel from the stages; the classifier is already pretrained
pipelineModel = pipeline.fit(data)
results = pipelineModel.transform(data)

Display the results:

In [9]:
results.select("text", "label.result").show(truncate=False)

+---------------------------------------+--------------------+
|text                                   |result              |
+---------------------------------------+--------------------+
|I love driving my car.                 |[TRANSPORT/CAR]     |
|The next bus will arrive in 20 minutes.|[TRANSPORT/MOVEMENT]|
|pineapple on pizza is the worst 🤮     |[FOOD]              |
+---------------------------------------+--------------------+
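The `result` column is an array of strings; in this example each row yields a single label. If you want one plain label per row, you can unwrap the array. The plain-Python sketch below does this on the rows copied from the table above (rows of the same shape could come from `results.select("text", "label.result").collect()`). Grouping by the part before `/` is an assumption based only on the label names shown here, not on documented behavior of the model.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Rows copied from the output table above: (text, label.result)
rows: List[Tuple[str, List[str]]] = [
    ("I love driving my car.", ["TRANSPORT/CAR"]),
    ("The next bus will arrive in 20 minutes.", ["TRANSPORT/MOVEMENT"]),
    ("pineapple on pizza is the worst 🤮", ["FOOD"]),
]


def first_label(labels: List[str]) -> str:
    """Unwrap a one-element result array; return '' if the array is empty."""
    return labels[0] if labels else ""


flat: Dict[str, str] = {text: first_label(labels) for text, labels in rows}

# Count labels by the prefix before "/" (assumed to act as a top-level category).
top_level = Counter(label.split("/")[0] for label in flat.values())

print(flat["I love driving my car."])  # TRANSPORT/CAR
print(top_level)                       # Counter({'TRANSPORT': 2, 'FOOD': 1})
```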

