# MarianMT - PyTorch
This notebook shows how to compile a pre-trained MarianMT/PyTorch model for AWS Inferentia (inf1 instances) using the Neuron SDK. The original implementation is provided by HuggingFace.

**Reference:** https://huggingface.co/Helsinki-NLP/opus-mt-en-de

## 1) Install dependencies

In [None]:
# Point pip at the Neuron repository as an extra package index
%pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
# now restart the kernel

In [None]:
# Install Neuron PyTorch, the Neuron compiler and the HuggingFace dependencies
%pip install -U "torch-neuron==1.7.*" "protobuf<4" "transformers==4.0.1" "neuron-cc[tensorflow]" sentencepiece
# use --force-reinstall if you're facing some issues while loading the modules
# now restart the kernel again
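
After restarting the kernel, it can help to confirm which versions were actually installed. The following is a minimal sketch, not part of the original notebook: the helper name `installed_versions` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages installed by the cell above
packages = ["torch-neuron", "neuron-cc", "transformers", "sentencepiece", "protobuf"]
for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows up as not installed, re-running the install cell (optionally with `--force-reinstall`, as noted above) is the first thing to try.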

## 2) Initialize libraries and prepare input samples

In [None]:
import sys
# Make the parent directory importable so that `common.wrapper` can be found
if ".." not in sys.path:
    sys.path.append("..")
    
from transformers import MarianMTModel, MarianTokenizer, MarianConfig

model_name='Helsinki-NLP/opus-mt-en-de'   # English -> German model
num_texts = 1                             # Number of input texts to decode
num_beams = 4                             # Number of beams per input text
max_encoder_length = 32                   # Maximum input token length
max_decoder_length = 32                   # Maximum output token length

tokenizer = MarianTokenizer.from_pretrained(model_name)

text='I am a small frog'

## 3) Load a pre-trained model


In [None]:
import torch
import torch.neuron
from common.wrapper import infer, NeuronGeneration
model = MarianMTModel.from_pretrained(model_name)
model.eval()

infer(model, tokenizer, text, num_beams, max_encoder_length, max_decoder_length)

## 4) Compile the model for Inferentia with NeuronSDK

This model is complex, so we use a wrapper around the encoder and decoder sub-modules to make it traceable. The wrapper was extracted [from this implementation](https://github.com/aws/aws-neuron-sdk/blob/master/src/examples/pytorch/transformers-marianmt.ipynb).

For more details, please check the wrapper source code: [wrapper.py](../common/wrapper.py)

The PyTorch-Neuron trace Python API generates PyTorch models for execution on Inferentia, and the result can be serialized as TorchScript. It is analogous to the `torch.jit.trace()` function in PyTorch. For details, see the [trace API reference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html?highlight=trace).
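
Like `torch.jit.trace()`, tracing records the model for the shapes of the example inputs, which is why this notebook fixes `max_encoder_length` and `max_decoder_length` up front. The sketch below illustrates the idea in plain Python: every input is brought to exactly the same length, with an attention mask marking the real tokens. It is an illustration only, not the code in `wrapper.py`. The helper `pad_or_truncate`, the pad id `0` and the token ids are all made up; in practice the tokenizer would normally do this and supply its own `pad_token_id`.

```python
def pad_or_truncate(token_ids, length, pad_id):
    """Return exactly `length` ids and a matching mask (1 = real token, 0 = padding)."""
    ids = list(token_ids)[:length]      # truncate inputs that are too long
    mask = [1] * len(ids)
    n_pad = length - len(ids)           # 0 when the input was truncated or fits exactly
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


max_encoder_length = 32                 # same value as in section 2
pad_id = 0                              # hypothetical; read tokenizer.pad_token_id in practice
example_ids = [11, 22, 33, 44, 55]      # made-up token ids, for illustration only

ids, mask = pad_or_truncate(example_ids, max_encoder_length, pad_id)
print(len(ids), sum(mask))              # fixed length, number of real tokens
```

Note that this simplified truncation just cuts the sequence, so it would drop an end-of-sequence token on overly long inputs; a real tokenizer handles that case.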

In [None]:
model_neuron = NeuronGeneration(model.config)
neuron_name='marianmt_en2gb'
# 1. Compile the model
# Note: This may take a couple of minutes, since both the encoder and the decoder are compiled
model_neuron.trace(
    model=model,
    num_texts=num_texts,
    num_beams=num_beams,
    max_encoder_length=max_encoder_length,
    max_decoder_length=max_decoder_length,
)

# 2. Serialize an artifact
# After this call you will have an `encoder.pt`, `decoder.pt` and `config.json` in the neuron_name folder
model_neuron.save_pretrained(neuron_name)
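
Before moving on, you may want to confirm that the three files mentioned in the comment above were actually written. This is a small sketch using only the standard library. The helper name `missing_artifacts` is hypothetical, and the file list is taken from the comment in the previous cell rather than from the wrapper's source.

```python
from pathlib import Path

# File names as listed in the comment of the compilation cell
EXPECTED_ARTIFACTS = ("encoder.pt", "decoder.pt", "config.json")


def missing_artifacts(folder, expected=EXPECTED_ARTIFACTS):
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_artifacts("marianmt_en2gb")   # same folder name as `neuron_name`
print("all artifacts present" if not missing else f"missing: {missing}")
```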

## 5) A simple test to check the predictions

In [6]:
infer(model_neuron, tokenizer, text, num_beams, max_encoder_length, max_decoder_length)

Texts:
1 Ich bin ein kleiner Frosch
2 Ich bin ein kleiner Frosch.
3 Ich bin ein kleiner Frosch!
4 - Ich bin ein kleiner Frosch.
