<center> <font size = 24 color = 'steelblue'> <b>Machine Translation<br>


![Image Description](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/NLP_November/Lesson%205/Machine%20translation.jpg)

## Overview:

The goal is to build a simple machine translation pipeline that uses word embeddings to translate English words into French. The notebook loads the required libraries and precomputed embeddings, builds aligned embedding matrices from an English-French dictionary, learns a transformation matrix with gradient descent, and evaluates the result with a cosine-similarity nearest-neighbor search.

# <a id= 'f0'>
<font size = 4>
    
**Table of Contents:**<br>
[1. Introduction](#f1)<br>
[2. Loading libraries](#f2)<br>
[3. Loading embeddings](#f3)<br>
[4. Translating English dictionary to French](#f4)<br>
> [4.1 Working with embeddings](#f4.1)<br>
> [4.2 Computing the gradient of loss with respect to transform matrix R](#f4.2)<br>
[5. Cosine Similarity](#f3)<br>

##### <a id = 'f1'>
<font size = 10 color = 'midnightblue'> **Introduction**

<div class="alert alert-block alert-success">
<font size = 4>

- Machine translation involves the use of automated systems to translate text or speech from one language to another.
- NLP plays a crucial role in understanding, interpreting, and generating human language in a way that considers context and meaning.
- NLP techniques are employed to enhance the quality and accuracy of machine translation systems.
- NLP helps in addressing linguistic nuances, context understanding, and idiosyncrasies specific to each language.

##### <a id = 'f2'>
<font size = 10 color = 'midnightblue'> **Load the Libraries**

In [None]:
# install the libraries below if they are not already installed
!pip install numpy==1.23.5
!pip install tensorflow==2.13.1
!pip install nltk==3.8.1
!pip install pandas==1.5.3
!pip install gensim==4.2.0
!pip install scipy==1.9.3
!pip install matplotlib==3.6.3
!pip install scikit-learn==1.3.1

In [None]:
import nltk
import pdb
import pickle
import string
import pandas as pd
import time
import gensim
import matplotlib.pyplot as plt
import numpy as np
import scipy

from gensim.models import KeyedVectors
from nltk.corpus import stopwords, twitter_samples
from nltk.tokenize import TweetTokenizer
import re
from nltk.stem import PorterStemmer

from sklearn.metrics.pairwise import cosine_similarity

In [None]:
nltk.download('stopwords')
nltk.download('twitter_samples')

In [None]:
twitter_samples.fileids()

In [None]:
data = twitter_samples.strings('positive_tweets.json')

##### <a id = 'f3'>
<font size = 10 color = 'midnightblue'> **Load English and French Embeddings**

In [None]:
with open("en_embeddings.p", 'rb') as f:
    en_emb_subset = pickle.load(f)
with open("fr_embeddings.p", 'rb') as f:
    fr_emb_subset = pickle.load(f)

In [None]:
file =  pd.read_csv('en-fr.train.txt', delimiter = ' ', header =None, index_col = [0]).squeeze('columns')
eng_to_fr_dict_train =  file.to_dict()

In [None]:
len(eng_to_fr_dict_train)

In [None]:
file2 =  pd.read_csv('en-fr.test.txt', delimiter = ' ', header =None, index_col = [0]).squeeze('columns')
eng_to_fr_dict_test =  file2.to_dict()

In [None]:
len(en_emb_subset)

[top](#f0)

##### <a id= 'f4'>
<font size = 10 color = 'midnightblue'> **Translating English Dictionary to French** <br>


##### <a id = 'f4.1'>
<font size = 6 color = 'pwdrblue'> <b>Working with embeddings

<div class="alert alert-block alert-success">
<font size = 4>
    
- Generate a matrix `X` whose rows are the English embeddings.
- Generate a matrix `Y` whose rows are the corresponding French embeddings.
- Find the projection matrix `R` that minimizes the squared Frobenius norm $\|XR - Y\|_F^2$.

> - The goal is to find a transformation matrix `R` that makes `XR` as close as possible to `Y`.
> - The Frobenius norm measures the magnitude of a matrix: it is the square root of the sum of its squared entries.
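
As a small illustration of the Frobenius norm, the sketch below computes it by hand and with NumPy for a toy matrix. The matrix `A` is made up for this example and is not part of the notebook's data.

```python
import numpy as np

# Toy matrix, chosen only for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm: square root of the sum of squared entries.
fro_manual = np.sqrt(np.sum(A ** 2))

# NumPy's built-in equivalent.
fro_numpy = np.linalg.norm(A, ord="fro")

# The squared norm is simply the sum of squared entries (1 + 4 + 9 + 16).
fro_squared = np.sum(A ** 2)

print(fro_manual, fro_numpy, fro_squared)
```

The squared form is the one that matters for the objective here, since minimizing the norm and minimizing its square lead to the same `R`.
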

In [None]:
# get the vocabularies (words that have an embedding) for English and French

eng_words = en_emb_subset.keys()
fr_words = fr_emb_subset.keys()

<font size = 5 color = 'seagreen'> <b>Check that embeddings exist for both the English and French words in the translation dictionary

In [None]:
eng_emb =[]
frnch_emb = []

for eng, fr in eng_to_fr_dict_train.items():
    if (eng in eng_words) and (fr in fr_words):
        # get the embeddings and store them
        eng_emb.append(en_emb_subset[eng])
        frnch_emb.append(fr_emb_subset[fr])

<font size = 5 color = 'seagreen'> <b>Create English and French Embedding Matrices

In [None]:
X = np.vstack(eng_emb)
X.shape

In [None]:
Y = np.vstack(frnch_emb)
Y.shape
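
The filtering and stacking steps can also be wrapped in a reusable helper, which makes it easy to build matrices for a second dictionary such as a held-out test set. The sketch below is one possible way to do this; `build_matrices` and the toy dictionaries are hypothetical names introduced for illustration only.

```python
import numpy as np

def build_matrices(pairs, src_emb, tgt_emb):
    """Stack embeddings for every (source, target) pair found in both vocabularies."""
    src_rows, tgt_rows = [], []
    for src_word, tgt_word in pairs.items():
        # keep a pair only if both words have an embedding
        if src_word in src_emb and tgt_word in tgt_emb:
            src_rows.append(src_emb[src_word])
            tgt_rows.append(tgt_emb[tgt_word])
    # each embedding becomes one row of the resulting matrix
    return np.vstack(src_rows), np.vstack(tgt_rows)

# Toy embeddings, invented for illustration (2 dimensions instead of hundreds).
en_toy = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
fr_toy = {"chat": np.array([0.5, 0.5]), "chien": np.array([0.2, 0.8])}
pairs_toy = {"cat": "chat", "dog": "chien", "bird": "oiseau"}  # "bird" has no embedding

X_toy, Y_toy = build_matrices(pairs_toy, en_toy, fr_toy)
print(X_toy.shape, Y_toy.shape)
```

Row `i` of the first matrix and row `i` of the second always belong to the same translation pair, which is what the loss and the accuracy check rely on.
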

<font size = 5 color = 'seagreen'> <b>Translation

<div class="alert alert-block alert-success">
<font size = 4>
    
The loss function is the squared Frobenius norm of the difference between the projected
English embeddings $\mathbf{XR}$ and the French embeddings $\mathbf{Y}$, divided by the number of training examples $m$.
</div>

<font size = 5>
$$ L(X, Y, R)=\frac{1}{m}\sum_{i=1}^{m} \sum_{j=1}^{n}\left( a_{i j} \right)^{2}$$


<font size = 4>
    
<center> where $a_{i j}$ is the value in the $i$th row and $j$th column of the matrix $\mathbf{XR}-\mathbf{Y}$, and $n$ is its number of columns (the embedding dimension).
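
The loss formula can be sketched as a short function. This is a minimal illustration on toy data; `compute_loss`, `X_toy`, `Y_toy`, and `R_true` are names assumed for this example and do not appear elsewhere in the notebook.

```python
import numpy as np

def compute_loss(X, Y, R):
    """Squared Frobenius norm of X @ R - Y, divided by the number of rows m."""
    m = X.shape[0]
    diff = X @ R - Y
    return np.sum(diff ** 2) / m

# Toy data (hypothetical): 3 "words" with 2-dimensional embeddings.
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
R_true = np.array([[0.0, 1.0],
                   [-1.0, 0.0]])   # a 90-degree rotation
Y_toy = X_toy @ R_true

loss_perfect = compute_loss(X_toy, Y_toy, R_true)       # exact mapping
loss_identity = compute_loss(X_toy, Y_toy, np.eye(2))   # wrong mapping

print(loss_perfect, loss_identity)
```

With the matrix that generated `Y_toy` the loss is zero, while the identity matrix leaves a positive residual, which is the behavior the optimization exploits.
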

##### <a id = 'f4.2'>
<font size = 6 color = 'pwdrblue'> <b>Computing the gradient of loss with respect to transform matrix R

<div class="alert alert-block alert-success">
<font size = 4>
    
* Calculate the gradient of the loss with respect to the transform matrix `R`.
* The gradient is a matrix that encodes how much a small change in `R`
affects the loss function.
* The gradient points in the direction of steepest increase of the loss, so we update `R`
in the opposite direction to minimize the loss.
* $m$ is the number of training examples (number of rows in $X$).
* The formula for the gradient of the loss function $L(X, Y, R)$ is:

$$\frac{d}{dR}L(X, Y, R)=\frac{d}{dR}\Big(\frac{1}{m}\| X R -Y\|_{F}^{2}\Big) = \frac{2}{m}X^{T} (X R - Y)$$



[top](#f0)
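
One way to gain confidence in the gradient formula $\frac{2}{m}X^{T}(XR - Y)$ is to compare it with a finite-difference approximation on small random matrices. The sketch below does this; the helper names and the toy shapes are assumptions made for illustration.

```python
import numpy as np

def compute_loss(X, Y, R):
    """Squared Frobenius norm of X @ R - Y, divided by the number of rows."""
    return np.sum((X @ R - Y) ** 2) / X.shape[0]

def compute_gradient(X, Y, R):
    """Analytic gradient (2/m) * X^T (X R - Y)."""
    return (2.0 / X.shape[0]) * X.T @ (X @ R - Y)

# Small random problem (hypothetical sizes): 5 examples, 3 dimensions.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 3))
Y_toy = rng.normal(size=(5, 3))
R_toy = rng.normal(size=(3, 3))

analytic = compute_gradient(X_toy, Y_toy, R_toy)

# Central finite differences: perturb one entry of R at a time.
eps = 1e-6
numeric = np.zeros_like(R_toy)
for i in range(R_toy.shape[0]):
    for j in range(R_toy.shape[1]):
        step = np.zeros_like(R_toy)
        step[i, j] = eps
        numeric[i, j] = (compute_loss(X_toy, Y_toy, R_toy + step)
                         - compute_loss(X_toy, Y_toy, R_toy - step)) / (2 * eps)

max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```

If the two gradients disagree by more than round-off error, the analytic expression (or its implementation) likely contains a mistake.
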

The code below implements a simple gradient descent algorithm to optimize the transformation matrix `R` by minimizing the loss defined above: the squared error between the transformed input `X @ R` and the target `Y`, averaged over the training examples.

#### Initialization:

- `R` is initialized as a random matrix of shape `(X.shape[1], X.shape[1])`.
- The number of training steps is `train_steps = 600` and the learning rate is `learning_rate = 0.8`.

#### Training Loop:

- On every even iteration, the code computes the loss and prints it.
- The gradient of the loss with respect to `R` is computed and used to update `R` in the direction that reduces the loss.

This process iteratively adjusts `R` to minimize the difference between the predicted values `X @ R` and the actual values in `Y`.

In [None]:
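# initialize R randomly; it maps the English embedding space to the French one
R = np.random.rand(X.shape[1], X.shape[1])
train_steps = 600
learning_rate = 0.8
m = X.shape[0]  # number of training examples

for i in range(train_steps + 1):
    diff = X @ R - Y
    if i % 2 == 0:
        # loss: squared Frobenius norm of (XR - Y) divided by m
        loss = np.sum(diff ** 2) / m
        print(f"loss at iteration {i} is : {loss:.3f}")
    # gradient of the loss with respect to R: (2/m) * X^T (XR - Y)
    gradient = (2 / m) * (X.T @ diff)
    # step against the gradient to reduce the loss
    R = R - learning_rate * gradient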

In [None]:
R

[top](#f0)
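
Because the objective is an ordinary least-squares problem, the matrix found by gradient descent can be sanity-checked against a closed-form solution. The sketch below uses `np.linalg.lstsq` for this purpose, which is a different technique from the gradient descent used in this notebook. The toy data, the learning rate of 0.1, and the 2000 steps are assumptions chosen so that the example converges quickly.

```python
import numpy as np

# Toy problem (hypothetical): 50 examples, 4 dimensions, noiseless targets.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 4))
R_true = rng.normal(size=(4, 4))
Y_toy = X_toy @ R_true

# Closed-form least-squares solution of X R = Y.
R_lstsq, *_ = np.linalg.lstsq(X_toy, Y_toy, rcond=None)

# Gradient descent with the same update rule as the training loop.
R_gd = rng.random((4, 4))
learning_rate = 0.1
for _ in range(2000):
    grad = (2.0 / X_toy.shape[0]) * X_toy.T @ (X_toy @ R_gd - Y_toy)
    R_gd = R_gd - learning_rate * grad

# Largest entry-wise gap between the two solutions.
print(np.max(np.abs(R_gd - R_lstsq)))
```

On real embeddings the two solutions may differ more, since gradient descent is stopped after a fixed number of steps, but a large gap would suggest that the learning rate or step count needs adjusting.
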

In [None]:
pred = np.dot(X, R)  # project the English embeddings into the French embedding space

In [None]:
pred

The code below checks how well the transformation matrix `R` maps the English embeddings `X` to the French embeddings `Y`, using cosine similarity.

- `cosine_similarity()`: Computes the cosine of the angle between two vectors. This definition replaces the function of the same name imported earlier from `sklearn.metrics.pairwise`.

- `nearest_neighbor()`: Returns the indices of the `k` vectors in `candidates` that are most similar to a given vector `v`.

- `test_vocab()`:

>> - Transforms `X` using the matrix `R`.
>> - For each transformed vector, finds the closest match in `Y`.
>> - Measures accuracy as the fraction of vectors whose closest match has the same row index.

In [None]:
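def cosine_similarity(vec1, vec2):
    # cosine of the angle between two 1-D vectors
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def nearest_neighbor(v, candidates, k=1):
    # indices of the k rows of candidates most similar to v (most similar last)
    return np.argsort([cosine_similarity(v, row) for row in candidates])[-k:]

def test_vocab(X, Y, R):
    pred = np.dot(X, R)
    # a prediction is correct when its nearest neighbor in Y has the same row index
    correct = sum(nearest_neighbor(row, Y)[0] == index for index, row in enumerate(pred))
    return correct / len(pred)

test_vocab(X, Y, R)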
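
The row-by-row search above calls `cosine_similarity` once per pair of vectors, which can be slow for large vocabularies. A possible alternative is to normalize the rows and compute all similarities with a single matrix product. The sketch below illustrates this on toy data; `vectorized_accuracy` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def vectorized_accuracy(X, Y, R):
    """Fraction of rows of X @ R whose most cosine-similar row of Y has the same index."""
    pred = X @ R
    # L2-normalize rows so that a dot product equals cosine similarity.
    pred_unit = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sims = pred_unit @ Y_unit.T          # (m, m) matrix of cosine similarities
    nearest = np.argmax(sims, axis=1)    # best candidate for each predicted vector
    return float(np.mean(nearest == np.arange(len(pred))))

# Toy check (hypothetical data): Y is an exact rotation of X.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
Y_toy = X_toy @ Q

acc_exact = vectorized_accuracy(X_toy, Y_toy, Q)
print(acc_exact)
```

The similarity matrix has one entry per pair of rows, so for very large vocabularies it may need to be computed in chunks to fit in memory.
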