

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/SENTENCE_GRAMMAR.ipynb)



# **Evaluate Sentence Grammar**

## 1. Colab Setup

```python
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash
# Alternatively, run a downloaded copy of the script with explicit versions:
# -p sets the PySpark version, -s sets the Spark NLP version
# (both default to the latest release), e.g.
# !bash colab.sh -p 3.1.1 -s 3.0.1
```

```
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.18.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)
setup Colab for PySpark 3.1.1 and Spark NLP 3.0.0
```


```python
import pandas as pd
import numpy as np
import json
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
```

## 2. Start Spark Session

```python
spark = sparknlp.start()
```

## 3. Select the model to use

```python
# 't5_base' is used here; 't5_small' is a smaller, faster alternative
# MODEL_NAME = 't5_small'
MODEL_NAME = 't5_base'
```

## 4. Examples to try on the model

```python
text_list = ['Anna and Mike is going skiing and they is liked is',
             'Anna and Mike like to dance']
```

## 5. Define the Spark NLP pipeline

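The `T5Transformer` annotator wraps Google's T5 (Text-To-Text Transfer Transformer) model, which can perform 18 different tasks (ref.: [this paper](https://arxiv.org/abs/1910.10683)). Each task is selected with a text prefix. To check whether a sentence is grammatically acceptable, we use the prefix `cola sentence:`, which corresponds to the CoLA (Corpus of Linguistic Acceptability) task.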
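T5 casts every task as text-to-text: the task is chosen by prepending its prefix to the input, and the answer comes back as generated text. The plain-Python sketch below only illustrates that input format. The helper `build_t5_input` is hypothetical, and joining the prefix and the sentence with a single space is an assumption about how the prefix is applied, not a description of Spark NLP internals.

```python
def build_t5_input(task_prefix: str, sentence: str) -> str:
    """Hypothetical helper: prepend a T5 task prefix to a sentence.

    Assumes the prefix and the sentence are joined by a single space.
    """
    return f"{task_prefix.strip()} {sentence.strip()}"


texts = ['Anna and Mike is going skiing and they is liked is',
         'Anna and Mike like to dance']
for t in texts:
    print(build_t5_input('cola sentence:', t))
# cola sentence: Anna and Mike is going skiing and they is liked is
# cola sentence: Anna and Mike like to dance
```

For the CoLA task, the model is trained to answer with `acceptable` or `unacceptable`, which is the output returned by the pipeline below.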

```python
# Prefix to be used on the T5Transformer().setTask(<<prefix>>)
task_prefix = 'cola sentence:'
```

```python
# Convert the raw "text" column into Spark NLP documents
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

# Load the pretrained T5 model and select the CoLA task with its prefix
t5 = T5Transformer.pretrained(MODEL_NAME) \
    .setTask(task_prefix)\
    .setMaxOutputLength(200)\
    .setInputCols(["documents"]) \
    .setOutputCol("T5")

pipeline = Pipeline(stages=[document_assembler, t5])
```

```
t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]
```


## 6. Run the pipeline

```python
# Fit on an empty DataFrame: nothing is trained, the model is already pretrained
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = pipeline.fit(empty_df)

# Create a LightPipeline for fast, local inference on small inputs
lmodel = LightPipeline(pipeline_model)

# Use the model to make predictions
res = lmodel.fullAnnotate(text_list)
```
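Fitting on an empty DataFrame works because, in Spark ML, `Pipeline.fit()` only trains stages that are estimators; stages that are already transformers (as `DocumentAssembler` and a pretrained `T5Transformer` are) are kept as-is. The toy below mimics that rule in plain Python to show why no real data is needed. The classes and `toy_pipeline_fit` are hypothetical stand-ins, not the Spark ML API.

```python
class ToyTransformer:
    """Hypothetical stand-in for a stage that is ready to use (e.g. a pretrained model)."""

    def __init__(self, name):
        self.name = name

    def transform(self, rows):
        # Tag each row with the stage name to show the stage ran.
        return [row + [self.name] for row in rows]


class ToyEstimator:
    """Hypothetical stand-in for a stage that must learn from data first."""

    def __init__(self, name):
        self.name = name

    def fit(self, rows):
        if not rows:
            raise ValueError(f"{self.name} needs training data")
        return ToyTransformer(self.name + "_model")


def toy_pipeline_fit(stages, rows):
    """Mimic Spark ML: fit estimators, keep transformers, and only push data
    through the stages that come before the last estimator."""
    last_est = max((i for i, s in enumerate(stages) if isinstance(s, ToyEstimator)),
                   default=-1)
    fitted = []
    for i, stage in enumerate(stages):
        if isinstance(stage, ToyEstimator):
            stage = stage.fit(rows)
        if i < last_est:
            rows = stage.transform(rows)
        fitted.append(stage)
    return fitted


# Two ready-made stages, like DocumentAssembler + pretrained T5: empty data is fine.
model = toy_pipeline_fit([ToyTransformer("document_assembler"), ToyTransformer("t5")], [])
print([s.name for s in model])  # ['document_assembler', 't5']
```

With an estimator in the list, the same empty input would raise `ValueError`, which is the situation the pretrained pipeline avoids.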

## 7. Visualize the results

Using the `LightPipeline`:

```python
for r in res:
    print(f"{r['documents'][0].result} => Grammar: {r['T5'][0].result}")
```

```
Anna and Mike is going skiing and they is liked is => Grammar: unacceptable
Anna and Mike like to dance => Grammar: acceptable
```


Using the fitted pipeline model on a Spark DataFrame:

```python
# Send example texts to spark data frame
text_df = spark.createDataFrame(pd.DataFrame({'text': text_list}))

# Predict with the model
result = pipeline_model.transform(text_df)
```

```python
result.select('text', 'T5.result').show(truncate=False)
```

```
+--------------------------------------------------+--------------+
|text                                              |result        |
+--------------------------------------------------+--------------+
|Anna and Mike is going skiing and they is liked is|[unacceptable]|
|Anna and Mike like to dance                       |[acceptable]  |
+--------------------------------------------------+--------------+
```
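`T5.result` is an array column (one entry per annotation), which is why each label is shown in brackets. Below is a minimal plain-Python sketch of post-processing such rows: unwrap the single label and map it to a boolean. The helper `parse_grammar_rows` is hypothetical, and raising on anything other than the two CoLA labels is a design choice: T5 generates free text, so nothing strictly guarantees the output is always `acceptable` or `unacceptable`.

```python
from typing import List, Tuple

# Labels the T5 CoLA task is trained to generate.
LABEL_TO_BOOL = {"acceptable": True, "unacceptable": False}


def parse_grammar_rows(rows: List[Tuple[str, List[str]]]) -> List[Tuple[str, bool]]:
    """Hypothetical helper: turn (text, [label]) rows into (text, is_grammatical)."""
    parsed = []
    for text, labels in rows:
        # One document per row, so exactly one generated label is expected.
        if len(labels) != 1:
            raise ValueError(f"expected one label for {text!r}, got {labels!r}")
        label = labels[0].strip().lower()
        if label not in LABEL_TO_BOOL:
            raise ValueError(f"unexpected model output for {text!r}: {label!r}")
        parsed.append((text, LABEL_TO_BOOL[label]))
    return parsed


# Rows shaped like the table above.
rows = [
    ("Anna and Mike is going skiing and they is liked is", ["unacceptable"]),
    ("Anna and Mike like to dance", ["acceptable"]),
]
for text, ok in parse_grammar_rows(rows):
    print(text, "=>", ok)
```

In the notebook, the rows returned by `result.select('text', 'T5.result').collect()` should unpack the same way, since PySpark `Row` objects are tuples and array columns come back as Python lists.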

