

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/SENTENCE_GRAMMAR.ipynb)



# **Evaluate Sentence Grammar**

## 1. Colab Setup

In [None]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.3.0 spark-nlp==4.2.8

In [2]:
import json
import pandas as pd
import numpy as np

import sparknlp
import pyspark.sql.functions as F

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
from sparknlp.annotator import *
from sparknlp.base import *
from sparknlp.pretrained import PretrainedPipeline
from pyspark.sql.types import StringType, IntegerType

## 2. Start Spark Session

In [3]:
spark = sparknlp.start()
print("Spark NLP Version :", sparknlp.version())
spark

Spark NLP Version : 4.2.8


## 3. Select the model to use

In [5]:
#MODEL_NAME = 't5_small'
MODEL_NAME = 't5_base'

## 4. Examples to try on the model

In [6]:
text_list = ['Anna and Mike is going skiing and they is liked is', 'Anna and Mike like to dance']

## 5. Define the Spark NLP pipeline

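The `T5Transformer` model can perform 18 different tasks (ref.: [this paper](https://arxiv.org/abs/1910.10683)). To check whether a sentence is grammatically acceptable, we set the task prefix `cola sentence:`, which selects the CoLA (Corpus of Linguistic Acceptability) task. The model then answers with the text `acceptable` or `unacceptable`.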
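T5 treats every task as text-to-text: a task prefix is prepended to the input, and the model generates its answer as plain text. The snippet below is a minimal sketch of what the model conceptually receives. It assumes the prefix and the sentence are joined by a single space; the exact concatenation done inside `T5Transformer` is an assumption here, and `build_t5_input` is a hypothetical helper, not part of Spark NLP.

```python
TASK_PREFIX = "cola sentence:"


def build_t5_input(sentence: str, prefix: str = TASK_PREFIX) -> str:
    """Prepend the CoLA task prefix to a sentence (hypothetical helper, illustration only)."""
    # Assumption: prefix and sentence are separated by a single space
    return f"{prefix} {sentence.strip()}"


for s in ["Anna and Mike is going skiing and they is liked is", "Anna and Mike like to dance"]:
    print(build_t5_input(s))
```

Changing only the prefix is what switches the same pretrained weights to a different task.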

In [7]:
# Task prefix passed to T5Transformer.setTask()
task_prefix = 'cola sentence:'

In [8]:
# Wrap the raw text column into Spark NLP "document" annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Load the pretrained T5 model and configure it for the CoLA task
t5 = T5Transformer.pretrained(MODEL_NAME) \
    .setTask(task_prefix) \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("T5")

pipeline = Pipeline(stages=[document_assembler, t5])

t5_base download started this may take some time.
Approximate size to download 451.8 MB
[OK!]


## 6. Run the pipeline

In [9]:
# Fit on an empty DataFrame (the model is pretrained, so nothing is learned here)
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = pipeline.fit(empty_df)

# Create Light Pipeline
lmodel = LightPipeline(pipeline_model)

# Use the model to make predictions
res = lmodel.fullAnnotate(text_list)



## 7. Visualize the results

Using the LightPipeline:

In [10]:
for r in res:
    print(f"{r['documents'][0].result} => Grammar: {r['T5'][0].result}")

Anna and Mike is going skiing and they is liked is => Grammar: unacceptable
Anna and Mike like to dance => Grammar: acceptable
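Because the model returns its verdict as free text, downstream code usually maps it to a boolean. Be careful: `"acceptable" in "unacceptable"` is `True`, so a substring check silently misclassifies ungrammatical sentences. A minimal sketch using exact matching; the label strings come from the output above, and `to_is_grammatical` is a hypothetical helper.

```python
from typing import Optional


def to_is_grammatical(label: str) -> Optional[bool]:
    """Map a CoLA label generated by T5 to a boolean (hypothetical helper).

    Returns None for any unexpected generation instead of guessing.
    """
    normalized = label.strip().lower()
    if normalized == "acceptable":
        return True
    if normalized == "unacceptable":
        return False
    return None  # the model produced something other than the two expected labels


# (text, label) pairs copied from the LightPipeline output above
predictions = [
    ("Anna and Mike is going skiing and they is liked is", "unacceptable"),
    ("Anna and Mike like to dance", "acceptable"),
]
for text, label in predictions:
    print(f"{text!r}: grammatical={to_is_grammatical(label)}")

# Pitfall: a substring test treats "unacceptable" as acceptable
print("acceptable" in "unacceptable")  # True
```

With the real results, the same helper applies to `r['T5'][0].result` for each `r` in `res`.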


Using the pipeline model:

In [11]:
# Load the example texts into a Spark DataFrame
text_df = spark.createDataFrame(pd.DataFrame({'text': text_list}))

# Predict with the model
result = pipeline_model.transform(text_df)

In [12]:
result.select('text', 'T5.result').show(truncate=False)

+--------------------------------------------------+--------------+
|text                                              |result        |
+--------------------------------------------------+--------------+
|Anna and Mike is going skiing and they is liked is|[unacceptable]|
|Anna and Mike like to dance                       |[acceptable]  |
+--------------------------------------------------+--------------+
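The `result` column is an array with one generated string per document. To analyse the predictions locally, one option is to collect them with `toPandas()` and flatten the array. The sketch below uses a stand-in DataFrame with the values from the table above, assuming the array column arrives as one list per row; with real data, replace it with `result.select('text', 'T5.result').toPandas()`.

```python
import pandas as pd

# Stand-in for result.select('text', 'T5.result').toPandas(); values copied from the table above.
# Assumption: each row of the array column is a list holding one generated string.
pdf = pd.DataFrame({
    "text": [
        "Anna and Mike is going skiing and they is liked is",
        "Anna and Mike like to dance",
    ],
    "result": [["unacceptable"], ["acceptable"]],
})

# Take the single generated label per document (None if the array is empty)
pdf["label"] = pdf["result"].map(lambda xs: xs[0] if len(xs) > 0 else None)

# Exact comparison, so "unacceptable" is not mistaken for "acceptable"
pdf["is_grammatical"] = pdf["label"].eq("acceptable")

print(pdf[["text", "label", "is_grammatical"]])
print("Share judged acceptable:", pdf["is_grammatical"].mean())
```

The first element can also be extracted inside Spark before collecting, for example with `F.element_at('T5.result', 1)` (1-based index), which keeps the flattening distributed for large inputs.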

