

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/T5TRANSFORMER_TRANSLATION.ipynb)




# **Text Translation using Google's T5 Transformer**

### Spark NLP documentation and instructions:
https://nlp.johnsnowlabs.com/docs/en/quickstart

### Spark NLP Google T5 article:
https://towardsdatascience.com/hands-on-googles-text-to-text-transfer-transformer-t5-with-spark-nlp-6f7db75cecff

### You can find details about Spark NLP annotators here:
https://nlp.johnsnowlabs.com/docs/en/annotators

### You can find details about Spark NLP models here:
https://nlp.johnsnowlabs.com/models


## 1. Colab Setup

In [1]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.1.2 spark-nlp

## 2. Start the Spark session

Import the dependencies and start the Spark session.

In [2]:
import json
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp

spark = sparknlp.start()

## 3. Select the DL model

For the complete model list:
https://nlp.johnsnowlabs.com/models

For `T5` models:
https://nlp.johnsnowlabs.com/models?tag=t5

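## 4. Text Translation using T5 Transformer - English to German

Define the Spark NLP pipeline: a document assembler, a sentence detector, and the T5 annotator configured with a translation task.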

In [3]:
from sparknlp.annotator import *
from sparknlp.base import *

from pyspark.ml import Pipeline

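# Wrap the raw "text" column into Spark NLP document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Split each document into sentences with the pretrained deep-learning detector
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en") \
    .setInputCols(["documents"]) \
    .setOutputCol("sentence")

# Translate each sentence; the task string tells T5 which translation to perform
t5 = T5Transformer.pretrained("t5_small", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation") \
    .setTask("translate English to German:") \
    .setMaxOutputLength(200)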
  
pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    t5
])

data = spark.createDataFrame([
  [1, "My name is Spark NLP! It's nice to meet you."],
  [2, "My name is Wolfgang and I live in Berlin"]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("translation.result").show(truncate=False)

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
t5_small download started this may take some time.
Approximate size to download 139 MB
[OK!]
+---------------------------------------------------------+
|result                                                   |
+---------------------------------------------------------+
|[Mein Name ist Spark NLP!, Es ist schön, Sie zu treffen.]|
|[Mein Name ist Wolfgang und ich lebe in Berlin.]         |
+---------------------------------------------------------+
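
The `setTask` value acts as a text prefix: T5 is a text-to-text model, so the task is expressed as part of the input string rather than as a separate model parameter. The sketch below illustrates that input format in plain Python. `build_t5_input` is a hypothetical helper, not a Spark NLP function, and joining the prefix and the sentence with a single space is an assumption about the exact formatting; the annotator is expected to handle this step internally once `setTask` is set.

```python
# Hypothetical helper (not part of Spark NLP): illustrates the text-to-text
# input format, where the task prefix is placed in front of each sentence.
def build_t5_input(task: str, sentence: str) -> str:
    """Join a task prefix and a sentence with a single space (assumed format)."""
    return f"{task.strip()} {sentence.strip()}"


# The two sentences the sentence detector produces for the first sample row
sentences = ["My name is Spark NLP!", "It's nice to meet you."]

prompts = [build_t5_input("translate English to German:", s) for s in sentences]

for prompt in prompts:
    print(prompt)
```

Because the task is plain text, switching the target language only requires changing the prefix; the model weights and the rest of the pipeline stay the same.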



## 5. Text Translation using T5 Transformer - English to French

In [4]:
from sparknlp.annotator import *
from sparknlp.base import *

from pyspark.ml import Pipeline

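# Wrap the raw "text" column into Spark NLP document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Split each document into sentences with the pretrained deep-learning detector
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en") \
    .setInputCols(["documents"]) \
    .setOutputCol("sentence")

# Same model as in the German example; only the task string changes
t5 = T5Transformer.pretrained("t5_small", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation") \
    .setTask("translate English to French:") \
    .setMaxOutputLength(200)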
  
pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    t5
])

data = spark.createDataFrame([
  [1, "My name is Spark NLP! It's nice to meet you."],
  [2, "My name is Wolfgang and I live in Berlin"]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("translation.result").show(truncate=False)

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
t5_small download started this may take some time.
Approximate size to download 139 MB
[OK!]
+------------------------------------------------------------+
|result                                                      |
+------------------------------------------------------------+
|[Mon nom est Spark NLP!, C'est agréable de vous rencontrer.]|
|[Mon nom est Wolfgang et je réside à Berlin.]               |
+------------------------------------------------------------+
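
The German and French examples differ only in the string passed to `setTask`, so a small helper can generate that string for a given target language. The following is a minimal sketch: `translation_task` and `SUPPORTED_TARGETS` are hypothetical names, and the list of targets assumes the translation pairs from the original T5 training mixture (English to German, French, and Romanian). How well a particular pretrained checkpoint handles each pair should be checked on real data.

```python
# Hypothetical helper (not a Spark NLP API): builds the string for setTask.
# Assumption: targets mirror the translation pairs of the original T5 training mix.
SUPPORTED_TARGETS = ("German", "French", "Romanian")


def translation_task(target: str) -> str:
    """Return a task prefix such as 'translate English to German:'."""
    if target not in SUPPORTED_TARGETS:
        raise ValueError(f"Unsupported target language: {target!r}")
    return f"translate English to {target}:"


tasks = [translation_task(target) for target in SUPPORTED_TARGETS]

for task in tasks:
    print(task)
```

The returned string could then be passed to the annotator, for example `t5.setTask(translation_task("French"))`, so that one pipeline definition is reused for each language.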

