

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/T5_LINGUISTIC.ipynb)




### Spark NLP documentation and instructions:
https://nlp.johnsnowlabs.com/docs/en/quickstart

### Spark NLP Google T5 Article
https://towardsdatascience.com/hands-on-googles-text-to-text-transfer-transformer-t5-with-spark-nlp-6f7db75cecff

### For T5 models:
https://nlp.johnsnowlabs.com/models?q=T5

### You can find details about Spark NLP models here:
https://nlp.johnsnowlabs.com/models



## 1. Colab Setup

In [None]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.1.2 spark-nlp==4.2.8

## 2. Start the Spark session

In [2]:
import json
import pandas as pd
import numpy as np

import sparknlp
import pyspark.sql.functions as F

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
from sparknlp.annotator import *
from sparknlp.base import *
from sparknlp.pretrained import PretrainedPipeline
from pyspark.sql.types import StringType, IntegerType

In [3]:
# Start a Spark session with Spark NLP available
spark = sparknlp.start()
print("Spark NLP Version :", sparknlp.version())
spark

Spark NLP Version : 4.2.8


# T5 for grammar error correction

In [4]:
# Wrap the raw "text" column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Pretrained T5 model; "gec:" is the task prefix used for grammar error correction
t5 = T5Transformer.pretrained("t5_grammar_error_corrector") \
    .setTask("gec:") \
    .setInputCols(["documents"]) \
    .setMaxOutputLength(200) \
    .setOutputCol("corrections")

pipeline = Pipeline().setStages([documentAssembler, t5])


t5_grammar_error_corrector download started this may take some time.
Approximate size to download 883.7 MB
[OK!]


In [5]:
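# Both stages are pretrained, so fitting on an empty DataFrame only assembles the model
pipeline_model = pipeline.fit(spark.createDataFrame([['']]).toDF('text'))
# LightPipeline annotates plain Python strings without building a DataFrame
T5model = LightPipeline(pipeline_model)

example_txt = """

Anna and Mike is going skiing and they is liked is
"""

# fullAnnotate returns one result dict per input text
res = T5model.fullAnnotate(example_txt)[0]

print('Prediction:', res['corrections'][0].result)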

Before _validateStagesInputCols
Prediction: Anna and Mike are going skiing and they like it.
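The `setTask("gec:")` call matters because T5 models are steered by a text prefix: conceptually, the task string is placed in front of the input before it reaches the model. The sketch below illustrates that idea in plain Python. The helper name `with_task_prefix` is hypothetical, and both the single-space join and the whitespace cleanup are assumptions made for illustration, not a statement of what Spark NLP does internally.

```python
def with_task_prefix(task: str, text: str) -> str:
    """Return the prefixed string a T5 model would conceptually receive."""
    # Collapse newlines and repeated spaces (an illustrative choice, not library behavior)
    cleaned = " ".join(text.split())
    # Assumption: prefix and text are joined by a single space
    return f"{task} {cleaned}" if task else cleaned


example = with_task_prefix("gec:", "\n\nAnna and Mike is going skiing and they is liked is\n")
print(example)
```

Swapping the prefix for `"transfer Casual to Formal:"` gives the input form used by the style transfer models later in this notebook.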


# T5 for Informal to Formal Style Transfer

In [6]:
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

t5 = T5Transformer.pretrained("t5_informal_to_formal_styletransfer") \
    .setTask("transfer Casual to Formal:") \
    .setInputCols(["documents"]) \
    .setMaxOutputLength(200) \
    .setOutputCol("transfers")

pipeline = Pipeline().setStages([documentAssembler, t5])


t5_informal_to_formal_styletransfer download started this may take some time.
Approximate size to download 881.2 MB
[OK!]


In [7]:
pipeline_model = pipeline.fit(spark.createDataFrame([['']]).toDF('text'))
T5model = LightPipeline(pipeline_model)

example_txt = """

btw - ur looks familiar.
"""

res = T5model.fullAnnotate(example_txt)[0]

print('Prediction:', res['transfers'][0].result)

Before _validateStagesInputCols
Prediction: By the way, your appearance is familiar.


# T5 for Passive to Active Style Transfer

In [8]:
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

t5 = T5Transformer.pretrained("t5_passive_to_active_styletransfer") \
    .setTask("transfer Passive to Active:") \
    .setInputCols(["documents"]) \
    .setMaxOutputLength(200) \
    .setOutputCol("transfers")

pipeline = Pipeline().setStages([documentAssembler, t5])


t5_passive_to_active_styletransfer download started this may take some time.
Approximate size to download 253.2 MB
[OK!]


In [9]:
pipeline_model = pipeline.fit(spark.createDataFrame([['']]).toDF('text'))
T5model = LightPipeline(pipeline_model)

example_txt = """

The flat tire was changed by Sue.
"""

res = T5model.fullAnnotate(example_txt)[0]

print('Prediction:', res['transfers'][0].result)

Before _validateStagesInputCols
Prediction: Sue changed the flat tire.
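The three examples in this notebook differ only in the model name, the task prefix, and the output column. One way to avoid repeating the setup is to keep those values in a small table and build the pipeline from it. This is a sketch under assumptions: the dictionary keys and the function name `build_t5_light_pipeline` are hypothetical, while the model names, prefixes, and column names are taken from the cells in this notebook.

```python
# Model name, task prefix, and output column as used in this notebook.
T5_TASKS = {
    "grammar": ("t5_grammar_error_corrector", "gec:", "corrections"),
    "formal": ("t5_informal_to_formal_styletransfer", "transfer Casual to Formal:", "transfers"),
    "active": ("t5_passive_to_active_styletransfer", "transfer Passive to Active:", "transfers"),
}


def build_t5_light_pipeline(spark, key, max_output_length=200):
    """Build a LightPipeline for one entry of T5_TASKS (the keys are hypothetical labels)."""
    # Imported lazily so the table above can be inspected without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import T5Transformer
    from sparknlp.base import DocumentAssembler, LightPipeline

    model_name, task, output_col = T5_TASKS[key]
    assembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
    t5 = (
        T5Transformer.pretrained(model_name)
        .setTask(task)
        .setInputCols(["documents"])
        .setMaxOutputLength(max_output_length)
        .setOutputCol(output_col)
    )
    # Same empty-DataFrame fit as in the cells of this notebook
    fitted = Pipeline().setStages([assembler, t5]).fit(
        spark.createDataFrame([[""]]).toDF("text")
    )
    return LightPipeline(fitted), output_col
```

A call such as `model, col = build_t5_light_pipeline(spark, "active")` followed by `model.fullAnnotate("The flat tire was changed by Sue.")[0][col][0].result` would then mirror the last example.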
