

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/multilingual/Translation_Marian.ipynb)

# Translate text with the Marian Transformer

## 1. Colab Setup

In [None]:
# Only run this cell when you are using Spark NLP on Google Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [None]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.3.1 spark-nlp

# Install Spark NLP Display lib
! pip install --upgrade -q spark-nlp-display

## 2. Start the Spark session

Import the dependencies and start the Spark session.

In [None]:
import json
import pandas as pd
import numpy as np

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

## 3. Select the DL model

For the complete list of models:
https://nlp.johnsnowlabs.com/models

For `Translation` models:
https://nlp.johnsnowlabs.com/models?tag=translation
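
The model used later in this notebook is named `opus_mt_it_en`, which suggests a `opus_mt_<source>_<target>` naming pattern. The sketch below builds a candidate name from two language codes. The helper `marian_model_name` is hypothetical (it is not part of Spark NLP), and the assumption that other language pairs follow the same pattern should be checked against the translation models page before the name is passed to `MarianTransformer.pretrained`.

```python
def marian_model_name(source_lang: str, target_lang: str) -> str:
    """Build a candidate Marian model name such as "opus_mt_it_en".

    Assumption: other language pairs follow the same
    "opus_mt_<source>_<target>" pattern as the model used in this notebook.
    This helper is illustrative and is not part of Spark NLP.
    """
    src = source_lang.strip().lower()
    tgt = target_lang.strip().lower()
    if not src or not tgt:
        raise ValueError("Both a source and a target language code are required")
    return f"opus_mt_{src}_{tgt}"


print(marian_model_name("it", "en"))  # opus_mt_it_en
```

A name built this way is only a guess until it is confirmed on the models page; a pair that has no published model will fail at download time.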

## 4. Sample Italian text to translate into English

In [None]:
text = """La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo. Si tiene al Louvre di Parigi."""

## 5. Define Spark NLP pipeline

In [None]:
# Wrap the raw text column in a Spark NLP document annotation
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# More accurate sentence detection using a deep learning model
sentencerDL = SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

# Italian-to-English Marian translation model, applied sentence by sentence
marian = MarianTransformer.pretrained("opus_mt_it_en", "xx")\
    .setInputCols(["sentences"])\
    .setOutputCol("translation")

nlp_pipeline = Pipeline(stages=[
    documentAssembler,
    sentencerDL,
    marian
])

sentence_detector_dl download started this may take some time.
Approximate size to download 514.9 KB
[OK!]
opus_mt_it_en download started this may take some time.
Approximate size to download 454.8 MB
[OK!]


## 6. Run the pipeline

In [None]:
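# Every stage is pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = nlp_pipeline.fit(empty_df)

# LightPipeline annotates plain Python strings without building a DataFrame
lmodel = LightPipeline(pipeline_model)
res = lmodel.fullAnnotate(text)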

## 7. Visualize results

In [None]:
print('Original:', text, '\n\n')

print('Translated:\n')
for sentence in res[0]['translation']:
    print(sentence.result)

Original: La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo. Si tiene al Louvre di Parigi. 


Translated:

La Gioconda is an oil painting of the sixteenth century created by Leonardo.
It's held at the Louvre in Paris.
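
To read each translation next to its source sentence, the two output columns of the `fullAnnotate` result can be zipped together. This is a minimal sketch: `pair_translations` is a hypothetical helper, not a Spark NLP function, and it assumes the pipeline emits exactly one translation annotation per detected sentence, in the same order. The stand-in objects below only mimic the `.result` attribute of the annotations so that the example runs without a Spark session.

```python
from types import SimpleNamespace


def pair_translations(annotated, source_col="sentences", target_col="translation"):
    """Pair each source sentence with its translation.

    `annotated` is one element of the list returned by fullAnnotate:
    a dict mapping output column names to lists of annotation objects.
    Assumption: both columns have the same length and order.
    """
    sources = [a.result for a in annotated[source_col]]
    targets = [a.result for a in annotated[target_col]]
    if len(sources) != len(targets):
        raise ValueError("Sentence and translation counts differ")
    return list(zip(sources, targets))


# Stand-in for res[0], using the sentences and translations shown above
demo = {
    "sentences": [
        SimpleNamespace(result="La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo."),
        SimpleNamespace(result="Si tiene al Louvre di Parigi."),
    ],
    "translation": [
        SimpleNamespace(result="La Gioconda is an oil painting of the sixteenth century created by Leonardo."),
        SimpleNamespace(result="It's held at the Louvre in Paris."),
    ],
}

for source, target in pair_translations(demo):
    print(f"{source}\n  -> {target}")
```

With the objects from this notebook, the call would be `pair_translations(res[0])`.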
