## Content

1. [Machine Translation](#1.-Machine-Translation)
2. [Neural Machine Translation](#2.-Neural-Machine-Translation)
3. [Loading the Dataset](#3.-Loading-the-Dataset)
4. [Preparing the Text Data](#4.-Preparing-the-Text-Data)
5. [Build, Train Model](#5.-Build,-Train-Model)
6. [Evaluate Model](#6.-Evaluate-Model)

## 1. Machine Translation

When we hear about language translation, the first thing that comes to mind is Google Translate, and it is a real lifesaver. It has changed the world by letting people communicate when it would otherwise not be possible.<br>
Machine translation, sometimes abbreviated MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. (Source: Wikipedia)<br><br>
***
As the late Susan Sontag famously said,<br>
> _"Translation is the circulatory system of the world's literatures"_
***


Machine translation is the task of automatically converting source text in one language to text in another language. Given a sequence of text in a source language, there is no single best translation of that text into another language, because human language is naturally ambiguous and flexible. It is a challenging task that traditionally involves large statistical models developed using highly sophisticated linguistic knowledge.<br>
**Classical machine translation** methods often involve rules for converting text in the source language to the target language. The rules are often developed by linguists and may operate at the lexical, syntactic, or semantic level. This focus on rules gives the name to this area of study: Rule-based Machine Translation, or RBMT. The key limitations of the classical machine translation approaches are both the expertise required to develop the rules, and the vast number of rules and exceptions required.<br>
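
To make the rule-based idea concrete, here is a toy sketch with one lexical rule (word lookup) and one syntactic rule (moving the German past participle). The lexicon, the `PARTICIPLES` set, and the function names are all made up for illustration and do not come from any real RBMT system.

```python
# Toy rule-based translator: a hand-written lexicon plus one reordering rule.
# Everything here is an illustrative assumption, not a real RBMT system.

LEXICON = {
    "ich": "i",
    "habe": "have",
    "den": "the",
    "hund": "dog",
    "gesehen": "seen",
    "bin": "am",
    "hier": "here",
}

PARTICIPLES = {"gesehen"}


def translate_lexical(sentence):
    """Lexical rule: replace each German word using the lexicon."""
    return [LEXICON.get(word, f"<{word}>") for word in sentence.lower().split()]


def move_participle(german_words, english_words):
    """Syntactic rule: German puts the past participle at the end of the
    clause, while English wants it right after the auxiliary 'have'."""
    if german_words and german_words[-1] in PARTICIPLES and "have" in english_words:
        participle = english_words[-1]
        rest = english_words[:-1]
        aux = rest.index("have")
        return rest[: aux + 1] + [participle] + rest[aux + 1:]
    return english_words


def translate(sentence):
    german_words = sentence.lower().split()
    english_words = translate_lexical(sentence)
    return " ".join(move_participle(german_words, english_words))


print(translate("Ich bin hier"))                # lexical rule alone is enough
print(translate("Ich habe den Hund gesehen"))   # needs the reordering rule
print(" ".join(translate_lexical("Ich habe den Hund gesehen")))  # without it
```

Even this tiny example needs a special case for word order, and unknown words fall through as `<word>` placeholders, which hints at why the number of rules and exceptions grows so quickly.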


## 2. Neural Machine Translation

Neural machine translation, or NMT for short, is the use of neural network models to learn a statistical model for machine translation. The key benefit of the approach is that a single system can be trained directly on source and target text, no longer requiring the pipeline of specialized systems used in statistical machine translation.<br>
As such, neural machine translation systems are said to be _end-to-end systems_ as only one model is required for the translation.


### Encoder-Decoder Model

Multilayer Perceptron neural network models can be used for machine translation, although the models are limited by a fixed-length input sequence where the output must be the same length.<br>
These early models have been greatly improved upon through the use of recurrent neural networks (RNNs) organized into an encoder-decoder architecture that allows for variable-length input and output sequences.<br>
As stated in [Neural Machine Translation by Jointly Learning to Align and Translate, 2014](https://arxiv.org/abs/1409.0473),
>An encoder neural network reads and encodes a source sentence into a fixed-length vector. A decoder then outputs a translation from the encoded vector. The whole encoder–decoder system, which consists of the encoder and the decoder for a language pair, is jointly trained to maximize the probability of a correct translation given a source sentence.

Key to the encoder-decoder architecture is the ability of the model to encode the source text into an internal fixed-length representation called the context vector. Interestingly, once encoded, different decoding systems could be used, in principle, to translate the context into different languages.
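
The fixed-length context vector can be illustrated with a rough pure-Python sketch of the encoder half only. The sizes, the random untrained weights, and the names `embed` and `encode` are assumptions made for this illustration; a real system would learn the weights and would normally use a deep learning library.

```python
import math
import random

HIDDEN_SIZE = 4   # length of the context vector (assumed for illustration)
EMBED_SIZE = 3    # length of each word vector (assumed for illustration)

rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = random_matrix(HIDDEN_SIZE, EMBED_SIZE)    # input-to-hidden weights
W_rec = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)  # hidden-to-hidden weights
embeddings = {}  # word -> random vector, filled lazily


def embed(word):
    """Look up (or create) an untrained random vector for a word."""
    if word not in embeddings:
        embeddings[word] = [rng.uniform(-1, 1) for _ in range(EMBED_SIZE)]
    return embeddings[word]


def encode(words):
    """Fold a sentence of any length into one fixed-length context vector."""
    hidden = [0.0] * HIDDEN_SIZE
    for word in words:
        x = embed(word)
        # Simple recurrent update: new state depends on the word and old state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_in[i], x))
                + sum(w * hj for w, hj in zip(W_rec[i], hidden))
            )
            for i in range(HIDDEN_SIZE)
        ]
    return hidden


short = encode("ich bin hier".split())
long = encode("ich habe den hund gestern im park gesehen".split())
print(len(short), len(long))  # both equal HIDDEN_SIZE
```

A three-word phrase and an eight-word sentence both end up as vectors of the same length, which is exactly the bottleneck a decoder has to work from.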

### Encoder-Decoders with Attention

Although effective, the Encoder-Decoder architecture has problems with long sequences of text to be translated. The problem stems from the fixed-length internal representation that must be used to decode each word in the output sequence.<br>
The solution is the use of an attention mechanism that allows the model to learn where to place attention on the input sequence as each word of the output sequence is decoded.<br>
> Using a fixed-sized representation to capture all the semantic details of a very long sentence is very difficult. A more efficient approach, however, is to read the whole sentence or paragraph, then to produce the translated words one at a time, each time focusing on a different part of the input sentence to gather the semantic details required to produce the next output word.

The encoder-decoder recurrent neural network architecture with attention has achieved state-of-the-art results on some benchmark problems for machine translation. This architecture is used at the heart of the Google Neural Machine Translation system, or GNMT, used in the Google Translate service. Although effective, neural machine translation systems still suffer from some issues, such as scaling to larger vocabularies of words and the slow speed of training the models.
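
The attention idea described above can be sketched for a single decoding step as follows. This is a simplified illustration: it scores each encoder state with a plain dot product, which is swapped in here for brevity in place of the learned alignment model used in the cited paper, and the state values and the names `softmax` and `attend` are made up for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Scoring uses a plain dot product, a simplification chosen for brevity.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    size = len(encoder_states[0])
    # The context is the attention-weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(size)
    ]
    return weights, context


# One made-up encoder state per source word (values are illustrative only).
encoder_states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
decoder_state = [0.1, 2.0, 0.1]  # most similar to the second source word

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
```

Because a fresh set of weights is computed for every output word, the decoder is no longer limited to a single summary of the whole input sentence.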

We will develop a neural machine translation system for translating German phrases to English. 

## 3. Loading the Dataset

The dataset is available on [ManyThings.org](http://www.manythings.org/anki/) and consists of German phrases and their English counterparts.<br><br>
Dataset: German–English deu-eng.zip<br>

```python
# Unzip the archive into the destination folder
!unzip 'data/deu-eng.zip' -d 'data/'
```

```
Archive:  data/deu-eng.zip
  inflating: data/deu.txt
  inflating: data/_about.txt
```


Unzipping the archive yields a file, _deu.txt_, that contains pairs of English and German phrases, one pair per line, with a tab separating the two languages. We will frame the prediction problem as follows: given a sequence of words in German as input, translate or predict the corresponding sequence of words in English.
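
A minimal sketch of reading that format into Python, assuming the file is UTF-8 and that the English phrase comes first on each line. The helper names `load_doc` and `to_pairs` are hypothetical, and the inline sample stands in for the real file so the snippet runs on its own.

```python
def load_doc(path):
    """Read the whole file as UTF-8 text (encoding is an assumption)."""
    with open(path, mode="rt", encoding="utf-8") as handle:
        return handle.read()


def to_pairs(text):
    """Split raw text into [english, german] pairs.

    Only the first two tab-separated fields are kept, in case a line
    carries extra columns; lines with fewer than two fields are skipped.
    """
    pairs = []
    for line in text.strip().split("\n"):
        fields = line.split("\t")
        if len(fields) >= 2:
            pairs.append([fields[0].strip(), fields[1].strip()])
    return pairs


# Tiny inline sample in the assumed format; real use would be
# to_pairs(load_doc("data/deu.txt")).
sample = "Hi.\tHallo!\nRun!\tLauf!\nI am here.\tIch bin hier.\n"
for english, german in to_pairs(sample):
    print(f"{german!r} -> {english!r}")
```

Keeping each pair as `[english, german]` mirrors the column order in the file; the German side is then the model input and the English side the target.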

## 4. Preparing the Text Data

The next step is to prepare the text data for modeling. It is important to decide up front which cleaning steps to perform, since they vary from dataset to dataset.<br>
A few observations about the dataset at our disposal:
- The text contains punctuation and other special characters.
- The text contains both uppercase and lowercase letters.
- There are duplicate phrases in English with different translations in German.
- The file is ordered by sentence length, with very long sentences toward the end of the file.

A good text cleaning procedure may handle some or all of these observations. Data preparation is divided into two subsections:
- Clean Text
- Split Text

### Clean Text
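
One possible cleaning pass, sketched under the assumption that handling the case and punctuation observations is enough for a first model. The function names `clean_phrase` and `clean_pairs` are hypothetical. Only ASCII punctuation is removed, so non-ASCII letters such as German umlauts are left untouched.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_phrase(phrase):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    words = phrase.lower().translate(_STRIP_PUNCT).split()
    return " ".join(words)


def clean_pairs(pairs):
    """Apply clean_phrase to both sides of every [english, german] pair."""
    return [[clean_phrase(side) for side in pair] for pair in pairs]


# Illustrative sample pairs, not taken from the dataset file.
sample_pairs = [["Hi.", "Hallo!"], ["Go on.", "Mach weiter."]]
print(clean_pairs(sample_pairs))
```

Note that this also strips apostrophes, so a contraction like "I'm" becomes "im"; whether that is acceptable depends on the vocabulary the model is expected to learn.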

## 5. Build, Train Model

## 6. Evaluate Model