
## NLP

## Setup and Installation

In [None]:
# for storing and loading file directly from google drive
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
!git clone "https://github.com/anoopkunchukuttan/indic_nlp_library"
!git clone https://github.com/anoopkunchukuttan/indic_nlp_resources.git
!pip install Morfessor
INDIC_NLP_LIB_HOME=r"/content/indic_nlp_library"
INDIC_NLP_RESOURCES="/content/indic_nlp_resources"

fatal: destination path 'indic_nlp_library' already exists and is not an empty directory.
fatal: destination path 'indic_nlp_resources' already exists and is not an empty directory.


In [None]:
!pip install nltk -U
!python3 -m spacy download en

Requirement already up-to-date: nltk in /usr/local/lib/python3.7/dist-packages (3.5)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/lib/python3.7/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.7/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')


In [None]:
!pip install revtok



In [None]:
import sys
sys.path.append(r'{}'.format(INDIC_NLP_LIB_HOME))
from indicnlp import common
common.set_resources_path(INDIC_NLP_RESOURCES)
from indicnlp import loader
loader.load()
from tqdm import tqdm
import nltk
nltk.download('wordnet')
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.meteor_score import single_meteor_score
from indicnlp.tokenize import indic_tokenize 
import csv 
import re
import warnings
# warnings.filterwarnings("ignore") #uncomment only if code is done

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


## Creating and Preprocessing Data

In [None]:
dataset = []
with open("gdrive/MyDrive/train.csv",encoding="utf-8") as f:
  csv_reader = csv.reader(f, delimiter=',')
  next(csv_reader, None)  # skip the header row
  for r in csv_reader:
    # column 1 is the Hindi sentence, column 2 the English sentence
    dataset.append([r[1],r[2]])

In [None]:
# symbols that should not appear in a Hindi sentence
non_hindi_chr = ['♫', '#', '$', '%', '&', '£', '¥', '§', '©', 'Â', 'è', 'Ã', '€','[',']']

In [None]:
# filter the sentence pairs
def clean_data(dataset,max_length=20):
  # keep a pair only if the English side is pure ASCII, the Hindi side contains none of
  # non_hindi_chr, and both sides have at most max_length tokens
  new_dataset = []
  i = 0
  for data in dataset:
    l_1 = len(indic_tokenize.trivial_tokenize(data[0]))
    l_2 = len(data[1].split(" "))
    check_chr = True  
    for nh in non_hindi_chr:
      if nh in data[0]:
        check_chr = False
        break

    if re.search(r'[^\x00-\x7F]+',data[1]) is None and max(l_1,l_2) <= max_length and check_chr:
      new_dataset.append(data)
    elif i<5:
      if i == 0: print("Some removed datasets")
      i += 1
      print("{}. \"{}\" , \"{}\"".format(i,data[0],data[1]))
  print("removed {} of {} datasets".format(len(dataset)-len(new_dataset),len(dataset)))
  return new_dataset

In [None]:
# preprocess an English sentence
def preprocess_eng(sentence):
  sentence = sentence.lower().strip() # lowercase
  # expand contractions
  sentence = re.sub(r"i'm","i am",sentence)
  sentence = re.sub(r"let's","let us",sentence)
  sentence = re.sub(r"\'ll", " will", sentence)
  sentence = re.sub(r"\'ve", " have", sentence)
  sentence = re.sub(r"\'re", " are", sentence)
  sentence = re.sub(r"\'d", " would", sentence)
  sentence = re.sub(r"n't"," not",sentence)

  sentence = re.sub(r"([?.!,])", r" \1 ", sentence) # put spaces around punctuation
  sentence = re.sub(r'[" "]+', " ", sentence) # collapse runs of spaces (and double quotes) into one space
  sentence = sentence.strip()
  return sentence

# undo part of the preprocessing so that predictions score better against the raw references
def postprocess_eng(sentence,remove_unk=False):
  sentence = sentence.capitalize()
  sentence = re.sub(r" i ",r" I ",sentence)
  sentence = re.sub(r" ([?.!,])",r"\1",sentence)
  if remove_unk: sentence = re.sub(r" <unk> ",r" ",sentence)
  return sentence
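
A note on the contraction rules: the generic `n't` substitution turns "can't" into "ca not" and "won't" into "wo not". The sketch below is one possible way to handle such irregular forms before the generic rules run. The name `expand_contractions` and the small lookup table are assumptions made for illustration; they are not part of this notebook.

```python
import re

# Irregular forms have to be expanded before the generic "n't" rule.
# This table is illustrative and far from complete.
IRREGULAR = {"can't": "cannot", "won't": "will not", "shan't": "shall not"}
GENERIC = [(r"n't\b", " not"), (r"'ll\b", " will"), (r"'ve\b", " have"), (r"'re\b", " are")]

def expand_contractions(text):
    """Lowercase the text and expand a few common English contractions."""
    text = text.lower()
    for short, full in IRREGULAR.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    for pattern, replacement in GENERIC:
        text = re.sub(pattern, replacement, text)
    return text

print(expand_contractions("I can't go"))       # i cannot go
print(expand_contractions("They don't care"))  # they do not care
```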


In [None]:
def data_preprocessing(dataset,max_length=20):
  new_dataset = []
  for data in dataset:
    new_dataset.append([data[0],preprocess_eng(data[1]),data[1]])
  new_dataset = clean_data(new_dataset,max_length)
  # measure how much BLEU and METEOR are lost by preprocessing alone
  # (postprocessed text scored against the raw English sentence)
  total_bleu_score = 0
  total_meteor_score = 0
  for i in tqdm(range(len(new_dataset))):
    total_bleu_score += sentence_bleu([new_dataset[i][2].split(" ")], postprocess_eng(new_dataset[i][1]).split(" "))
    total_meteor_score += single_meteor_score(new_dataset[i][2],postprocess_eng(new_dataset[i][1]))

  l = len(new_dataset)
  print("\nbleu score {}".format(round(total_bleu_score/l,2)))
  print("meteor score {}".format(round(total_meteor_score/l,2)))

  return new_dataset


In [None]:
orginal_dataset = dataset
dataset = data_preprocessing(dataset)

Some removed datasets
1. "एल सालवाडोर मे, जिन दोनो पक्षों ने सिविल-युद्ध से वापसी ली, उन्होंने वही काम किये जो कैदियों की कश्मकश के निदान हैं।" , "in el salvador , both sides that withdrew from their civil war took moves that had been proven to mirror a prisoner's dilemma strategy ."
2. "पर मेरे लिए उसका यहुदी विरोधी होना उसके कार्यों को और भी प्रशंसनीय बनाता है क्योंकि उसके पास भी पक्षपात करने के वही कारण थे जो बाकी फौजियों के पास थे पर उसकी सच जानने और उसे बनाए रखने की प्रेरणा सबसे ऊपर थी" , "but personally , for me , the fact that picquart was anti-semitic actually makes his actions more admirable , because he had the same prejudices , the same reasons to be biased as his fellow officers , but his motivation to find the truth and uphold it trumped all of that ."
3. "नहीं, नहीं, नहीं... ठीक है, हम उह हूँ... हम कार्ड का उपयोग करेंगे." , "no , no , no . . . fine , we will uh . . . we will use the card ."
4. "तो स्मार्ट में, हमारे पास लक्ष्य के अलावा, मलेरिया टीका विकसित करने के, हम अफ्

  0%|          | 0/82889 [00:00<?, ?it/s]

removed 19433 of 102322 datasets


The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
100%|██████████| 82889/82889 [00:17<00:00, 4798.62it/s]


bleu score 0.56
meteor score 0.92
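
The warnings above appear because unsmoothed sentence-level BLEU collapses to 0 as soon as one n-gram order has no match, which is common for short sentences. NLTK's own remedy is the `SmoothingFunction` class named in the warning. The block below is only a simplified, dependency-free sketch of the idea (zero match counts are replaced by a small `epsilon`), written for a single reference. The function name and the `epsilon` default are assumptions, and its scores will not match NLTK's exactly.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def smoothed_sentence_bleu(reference, hypothesis, max_n=4, epsilon=0.1):
    """Single-reference BLEU in which a zero n-gram match count becomes epsilon."""
    if not hypothesis:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(hyp_counts.values())
        if total == 0:
            # hypothesis is shorter than n tokens: count it as one unmatched n-gram
            matches, total = epsilon, 1
        else:
            # clipped matches: an n-gram counts at most as often as it occurs in the reference
            matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            if matches == 0:
                matches = epsilon
        log_precision += math.log(matches / total) / max_n
    # brevity penalty for hypotheses shorter than the reference
    if len(hypothesis) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(hypothesis))
    return brevity * math.exp(log_precision)

reference = "so the next question is".split()
print(smoothed_sentence_bleu(reference, reference))
print(smoothed_sentence_bleu(reference, "so the question".split()))
```

With smoothing, a short hypothesis that shares only unigrams and bigrams with the reference still gets a small positive score instead of 0.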





In [None]:
from sklearn.model_selection import train_test_split

In [None]:
train_set,val_set = train_test_split(dataset,test_size=0.2,random_state=42)
val_set,test_set = train_test_split(val_set,test_size=0.5,random_state=42)

In [None]:
len(train_set),len(val_set),len(test_set)

(66311, 8289, 8289)
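
The sizes above follow from splitting twice: 20% is held out first, then the held-out part is halved into validation and test. A minimal arithmetic sketch, assuming scikit-learn rounds the held-out share up and gives the remainder to the other part (`split_sizes` is a hypothetical helper):

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_kept, n_held_out), rounding the held-out share up."""
    n_held_out = math.ceil(n_samples * test_fraction)
    return n_samples - n_held_out, n_held_out

n_train, n_holdout = split_sizes(82889, 0.2)   # 82889 pairs survive cleaning
n_val, n_test = split_sizes(n_holdout, 0.5)
print(n_train, n_val, n_test)  # 66311 8289 8289
```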

## Tokenization

In [None]:
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import Vocab
import torch
import random
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# word-level tokenizers for now (subword tokenization comes later)
eng_tokenizer=get_tokenizer('spacy', language='en')
hindi_tokenizer = get_tokenizer(indic_tokenize.trivial_tokenize)

In [None]:
def get_vocab(dataset,eng_tokenizer,hindi_tokenizer,max_size_eng=5000,max_size_hindi=5000):
  eng_counter = Counter()
  hindi_counter = Counter()
  for data in tqdm(dataset):
    
    eng_counter.update(eng_tokenizer(data[1]))
    hindi_counter.update(hindi_tokenizer(data[0]))
  eng_vocab = Vocab(eng_counter,max_size=max_size_eng,specials=('<pad>','<unk>','<eos>','<sos>'))
  hindi_vocab = Vocab(hindi_counter,max_size=max_size_hindi,specials=('<pad>','<unk>','<eos>','<sos>'))
  return eng_vocab,hindi_vocab

In [None]:
eng_vocab,hindi_vocab = get_vocab(train_set,eng_tokenizer,hindi_tokenizer,2**13,2**13)

100%|██████████| 66311/66311 [00:04<00:00, 15557.51it/s]


In [None]:
def tokenize(dataset,eng_tokenizer,hindi_tokenizer,eng_vocab,hindi_vocab):
  tokenized_data = []
  for data in dataset:
    eng_data = torch.tensor([eng_vocab['<sos>']]+[eng_vocab[t] for t in eng_tokenizer(data[1])]+[eng_vocab['<eos>']], dtype=torch.long)
    hindi_data = torch.tensor([hindi_vocab['<sos>']]+[hindi_vocab[t] for t in hindi_tokenizer(data[0])]+[hindi_vocab['<eos>']], dtype=torch.long)
    tokenized_data.append([hindi_data,eng_data])
  return tokenized_data
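
To make the vocabulary and encoding steps concrete, here is a small pure-Python sketch of the same idea: special tokens take the first indices, the most frequent tokens fill the rest up to `max_size`, and anything else maps to `<unk>`. `build_vocab` and `encode` are hypothetical stand-ins, not the torchtext `Vocab` API, and the tie-breaking order is an assumption.

```python
from collections import Counter

SPECIALS = ('<pad>', '<unk>', '<eos>', '<sos>')

def build_vocab(counter, max_size, specials=SPECIALS):
    """Return (stoi, itos): specials first, then the max_size most frequent tokens."""
    itos = list(specials)
    # most frequent first; ties broken alphabetically (an assumption)
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    itos.extend(token for token, _ in ranked[:max_size])
    stoi = {token: index for index, token in enumerate(itos)}
    return stoi, itos

def encode(tokens, stoi):
    """Wrap a token list in <sos>/<eos> and map out-of-vocabulary tokens to <unk>."""
    unk = stoi['<unk>']
    return [stoi['<sos>']] + [stoi.get(t, unk) for t in tokens] + [stoi['<eos>']]

counts = Counter("a b a c a b".split())
stoi, itos = build_vocab(counts, max_size=2)
print(encode(['a', 'c', 'b'], stoi))  # [3, 4, 1, 5, 2]
```

With this special-token order `<sos>` is index 3 and `<eos>` is index 2, which is why the encoded sentences in this notebook start with 3 and end with 2.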

In [None]:
tokenized_train_data= tokenize(train_set,eng_tokenizer,hindi_tokenizer,eng_vocab,hindi_vocab)
tokenized_val_data= tokenize(val_set,eng_tokenizer,hindi_tokenizer,eng_vocab,hindi_vocab)
tokenized_test_data = tokenize(test_set,eng_tokenizer,hindi_tokenizer,eng_vocab,hindi_vocab)

In [None]:
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

BATCH_SIZE = 128
pad_hindi = hindi_vocab['<pad>']
pad_eng = eng_vocab['<pad>']

In [None]:
def get_data(data):
  hindi_data = []
  eng_data = []
  for hindi_sen,eng_sen in data:
    hindi_data.append(hindi_sen)
    eng_data.append(eng_sen)
  hindi_data = pad_sequence(hindi_data,padding_value=pad_hindi)
  eng_data = pad_sequence(eng_data,padding_value=pad_eng)
  return hindi_data,eng_data
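
`pad_sequence` with its default `batch_first=False` returns a time-major batch of shape (max_len, batch_size), which is why the models read the batch size from `target.shape[1]`. The plain-Python sketch below shows that layout on nested lists; `pad_time_major` is a hypothetical helper used only for illustration.

```python
def pad_time_major(sequences, pad_value):
    """Pad variable-length sequences and lay them out as rows of time steps."""
    max_len = max(len(seq) for seq in sequences)
    return [
        [seq[t] if t < len(seq) else pad_value for seq in sequences]
        for t in range(max_len)
    ]

batch = pad_time_major([[3, 7, 2], [3, 2]], pad_value=0)
print(batch)  # [[3, 3], [7, 2], [2, 0]]
```

Each inner list is one time step across the whole batch; the shorter sentence is padded with index 0 (`<pad>`).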

In [None]:
train_data = DataLoader(tokenized_train_data, batch_size=BATCH_SIZE,shuffle=True,collate_fn=get_data)
# validation and test batches do not need shuffling
val_data = DataLoader(tokenized_val_data, batch_size=BATCH_SIZE,shuffle=False,collate_fn=get_data)
test_data = DataLoader(tokenized_test_data, batch_size=BATCH_SIZE,shuffle=False,collate_fn=get_data)

## Model

In [None]:
from torch import nn
from torch.nn import LSTM,GRU,Linear,Embedding
import torch.optim as optim

### Seq2Seq

**Encoder**

In [None]:
class Encoder(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,num_layers=1,dropout=0.5,typ = "LSTM"):
    super().__init__()
    assert typ in ["LSTM","GRU"]
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.num_layers = num_layers
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.typ = typ
    if typ == "LSTM": self.rnn = LSTM(emb_size,hid_size,num_layers,dropout=dropout)
    if typ == "GRU": self.rnn = GRU(emb_size,hid_size,num_layers,dropout=dropout)
  
  def forward(self,input):
    embedded = self.dropout(self.embedding(input))
    if self.typ == "LSTM":
      outputs,(h,c) = self.rnn(embedded)
      # print(h.shape,c.shape)
      return h,c
    if self.typ == "GRU":
      outputs,h = self.rnn(embedded)
      return h


**Decoder**

In [None]:
class Decoder(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,num_layers=1,dropout=0.5,typ = "LSTM"):
    super().__init__()
    assert typ in ["LSTM","GRU"]
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.num_layers = num_layers
    self.typ = typ
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    if typ == "LSTM": self.rnn = LSTM(emb_size,hid_size,num_layers,dropout=dropout)
    if typ == "GRU": self.rnn = GRU(emb_size,hid_size,num_layers,dropout=dropout)
    self.out = Linear(hid_size,vocab_size)

  def forward(self,input,h,c=None):
    input = input.unsqueeze(0)
    embedded = self.dropout(self.embedding(input))
    if self.typ == "LSTM":
      output,(h,c) = self.rnn(embedded,(h,c))
      output = self.out(output.squeeze(0))
      return output,(h,c)
    if self.typ == "GRU":
      output,h = self.rnn(embedded,h)
      output = self.out(output.squeeze(0))
      return output,h

**Seq2Seq**

In [None]:
class seq2seq(nn.Module):
  def __init__(self,device,e_vocab_size,d_vocab_size,emb_size=256,hid_size=512,num_layers=1,e_type="LSTM",d_type="LSTM",dropout=0.5):
    super().__init__()
    # an LSTM decoder needs both h and c, so the encoder must also be an LSTM
    if d_type=="LSTM": assert e_type == "LSTM"
    self.e_type = e_type
    self.d_type = d_type
    self.d_vocab_size = d_vocab_size
    self.e_vocab_size = e_vocab_size
    self.encoder = Encoder(e_vocab_size,emb_size,hid_size,num_layers,dropout,typ=e_type)
    self.decoder = Decoder(d_vocab_size,emb_size,hid_size,num_layers,dropout,typ=d_type)
    self.device = device

  def forward(self,src,target,teacher_forcing_ratio = 0.5):
    batch_size = target.shape[1]
    len = target.shape[0]

    output = torch.zeros(len,batch_size,self.d_vocab_size).to(self.device)
    if self.e_type == "LSTM": h,c = self.encoder(src)
    if self.e_type == "GRU": h = self.encoder(src)

    input = target[0,:]
    for i in range(1,len):
      if self.d_type == "LSTM": out,(h,c) = self.decoder(input,h,c)
      if self.d_type == "GRU": out,h = self.decoder(input,h)
      output[i] = out
      force = random.random() < teacher_forcing_ratio
      if force: input = target[i]
      else: input = out.argmax(1) 
    
    return output
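
The decoding loop in `forward` uses teacher forcing: at each step the next decoder input is the ground-truth token with probability `teacher_forcing_ratio`, and the model's own prediction otherwise. The toy sketch below isolates that choice. `decoder_inputs` and the `predict` callable are hypothetical stand-ins for the real decoder.

```python
import random

def decoder_inputs(target, predict, teacher_forcing_ratio, rng):
    """Return the tokens fed to the decoder, one per decoding step."""
    fed = []
    token = target[0]                      # decoding always starts from <sos>
    for i in range(1, len(target)):
        fed.append(token)
        predicted = predict(token)         # stand-in for decoder output + argmax
        use_truth = rng.random() < teacher_forcing_ratio
        token = target[i] if use_truth else predicted
    return fed

target = [3, 10, 11, 2]                    # <sos>, two words, <eos>
always_99 = lambda token: 99               # a deliberately bad "model"
rng = random.Random(0)
print(decoder_inputs(target, always_99, 1.0, rng))  # [3, 10, 11]
print(decoder_inputs(target, always_99, 0.0, rng))  # [3, 99, 99]
```

A ratio of 1.0 always feeds the reference tokens, while 0.0 (as used in `evaluate`) makes the decoder consume its own predictions.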

In [None]:
def inference_seq2seq(model,sentence,eng_vocab,hindi_vocab,max_len=40):
  model.eval()
  sentence = sentence.unsqueeze(1).to(device)
  with torch.no_grad():
    if model.e_type == "LSTM":
      h,c = model.encoder(sentence)
    if model.e_type == "GRU":
      h = model.encoder(sentence)
  output = [eng_vocab['<sos>']]
  for i in range(max_len):
    target = torch.tensor([output[-1]],dtype=torch.long).to(device)
    with torch.no_grad():
      if model.d_type == "LSTM":
        out,(h,c) = model.decoder(target,h,c)
      if model.d_type == "GRU":
        out,h = model.decoder(target,h)
    prediction = out.argmax(1).item()
    if prediction == eng_vocab['<eos>']:
      break
    output.append(prediction)
  # drop the leading <sos>; <eos> is never appended, so nothing is trimmed from the end
  return output[1:]
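
The inference routine is greedy decoding: feed the last predicted token back in, take the argmax, and stop at `<eos>` or after `max_len` steps. Because the loop breaks before `<eos>` is appended, only the leading `<sos>` has to be removed from the result. A minimal sketch with a scripted stand-in for the model (`greedy_decode` and `next_token` are hypothetical names):

```python
def greedy_decode(next_token, sos, eos, max_len=40):
    """Repeatedly take the most likely next token until <eos> or max_len."""
    output = [sos]
    for _ in range(max_len):
        prediction = next_token(output[-1])
        if prediction == eos:
            break                 # <eos> itself is not stored
        output.append(prediction)
    return output[1:]             # strip only the leading <sos>

# scripted "model": 3 (<sos>) -> 34 -> 6 -> 2 (<eos>)
script = {3: 34, 34: 6, 6: 2}
print(greedy_decode(script.get, sos=3, eos=2))  # [34, 6]
```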
  

In [None]:
e_vocab_size = len(eng_vocab)
h_vocab_size = len(hindi_vocab)

In [None]:
# initialise all parameters from a uniform distribution on [-0.08, 0.08]
def init_weights(m):
    for name, param in m.named_parameters():
        nn.init.uniform_(param.data, -0.08, 0.08)

**Train**

In [None]:
def train(model,dataset,optimizer,loss_fn,clip=1):
  model.train()
  epoch_loss = 0
  for src,target in dataset:
    optimizer.zero_grad()
    src = src.to(device)
    target = target.to(device)
    output = model(src,target)
    target = target[1:].view(-1)
    output = output[1:].view(-1,output.shape[-1])
    loss = loss_fn(output,target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    epoch_loss += loss.item()
  
  return epoch_loss/len(dataset)
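
`clip_grad_norm_` rescales all gradients together when their combined L2 norm exceeds `clip`, which keeps a single bad batch from producing a huge update. The sketch below reproduces that rule on a flat list of numbers, as I understand it. `clip_by_global_norm` is a hypothetical helper, and the small `eps` term is an assumption about the exact formula.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale grads so their joint L2 norm is at most max_norm; return (grads, norm)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm)     # joint norm of the unclipped gradients
print(clipped)  # same direction, norm just under 1

unchanged, _ = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
print(unchanged)  # already within the limit, so left as is
```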


In [None]:
def evaluate(model,dataset,loss_fn):
  model.eval()
  epoch_loss = 0
  with torch.no_grad():
    for src,target in dataset:
      src = src.to(device)
      target = target.to(device)
      output = model(src,target,0)
      target = target[1:].view(-1)
      output = output[1:].view(-1,output.shape[-1])
      loss = loss_fn(output,target)
      epoch_loss += loss.item()
  
  return epoch_loss/len(dataset)


In [None]:
def parameters_count(model):
    return sum(param.numel() for param in model.parameters() if param.requires_grad)

### LSTM model

In [None]:
# the encoder reads Hindi and the decoder emits English, so the Hindi vocabulary size comes first
lstm_model = seq2seq(device,h_vocab_size,e_vocab_size,emb_size=256,hid_size=512).to(device)
lstm_model.apply(init_weights)



seq2seq(
  (encoder): Encoder(
    (embedding): Embedding(5004, 256)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(256, 512, dropout=0.5)
  )
  (decoder): Decoder(
    (embedding): Embedding(5004, 256)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(256, 512, dropout=0.5)
    (out): Linear(in_features=512, out_features=5004, bias=True)
  )
)

In [None]:
parameters_count(lstm_model)

8283020
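
The count can be checked by hand from the printed architecture (vocabulary 5004, embedding 256, hidden 512). A PyTorch LSTM layer holds four gates, each with an input weight matrix, a hidden weight matrix and two bias vectors. The helper names below are hypothetical.

```python
def embedding_params(vocab_size, emb_size):
    return vocab_size * emb_size

def linear_params(n_in, n_out):
    return n_in * n_out + n_out            # weights plus bias

def lstm_params(n_in, hid_size, num_layers=1):
    total = 0
    for layer in range(num_layers):
        layer_in = n_in if layer == 0 else hid_size
        # 4 gates x (input weights + hidden weights) + 2 bias vectors per gate
        total += 4 * hid_size * (layer_in + hid_size) + 2 * 4 * hid_size
    return total

def seq2seq_params(src_vocab, tgt_vocab, emb_size, hid_size, num_layers=1):
    encoder = embedding_params(src_vocab, emb_size) + lstm_params(emb_size, hid_size, num_layers)
    decoder = (embedding_params(tgt_vocab, emb_size)
               + lstm_params(emb_size, hid_size, num_layers)
               + linear_params(hid_size, tgt_vocab))
    return encoder + decoder

print(seq2seq_params(5004, 5004, 256, 512))                # 8283020
print(seq2seq_params(5004, 5004, 256, 512, num_layers=2))  # 12485516
```

The second figure matches the parameter count reported for the 2-layer model.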

In [None]:
optimizer = optim.Adam(lstm_model.parameters())
loss_fn = nn.CrossEntropyLoss(ignore_index = pad_eng)


In [None]:
import time

In [None]:
def get_time(start,end):
  t = end-start
  return int(t/60),int(t%60)

In [None]:
EPOCHS = 10
best_val = float('inf')
for epoch in range(EPOCHS):
  start = time.time()
  train_loss = train(lstm_model, train_data, optimizer,loss_fn)
  val_loss = evaluate(lstm_model, val_data,loss_fn)  
  end = time.time()

  print("train loss: {:.3f} val loss: {:.3f}".format(train_loss,val_loss))
  minutes,seconds = get_time(start,end)  # avoid shadowing the built-in min
  print("time taken by {} epoch {} min {} s".format(epoch+1,minutes,seconds))
  # keep only the checkpoint with the lowest validation loss
  if val_loss<best_val:
    best_val = val_loss
    torch.save(lstm_model.state_dict(), 'lstm_model.pt')



train loss: 3.755 val loss: 4.163
time taken by 1 epoch 0 min 41 s
train loss: 3.567 val loss: 4.044
time taken by 2 epoch 0 min 41 s
train loss: 3.394 val loss: 3.963
time taken by 3 epoch 0 min 41 s
train loss: 3.248 val loss: 3.931
time taken by 4 epoch 0 min 41 s
train loss: 3.128 val loss: 3.876
time taken by 5 epoch 0 min 41 s
train loss: 2.997 val loss: 3.852
time taken by 6 epoch 0 min 41 s
train loss: 2.896 val loss: 3.821
time taken by 7 epoch 0 min 41 s
train loss: 2.779 val loss: 3.800
time taken by 8 epoch 0 min 41 s
train loss: 2.687 val loss: 3.830
time taken by 9 epoch 0 min 41 s
train loss: 2.614 val loss: 3.836
time taken by 10 epoch 0 min 41 s


In [None]:
test_loss = evaluate(lstm_model,test_data,loss_fn)

In [None]:
test_loss

3.8345588097205527

In [None]:
torch.save(lstm_model.state_dict(), 'lstm_model_l.pt')

In [None]:
lstm_model.load_state_dict(torch.load('lstm_model.pt'))

<All keys matched successfully>

In [None]:
test_loss = evaluate(lstm_model,test_data,loss_fn)
test_loss

3.8025996098151573

In [None]:
tokenized_train_data[0][1]

tensor([   3,   34,    6,  246,  307,   18,   70,   46,   20,  460, 3395,   10,
           2])

In [None]:
inference_seq2seq(lstm_model,tokenized_train_data[0][0],eng_vocab,hindi_vocab)

[34, 6, 307, 307, 70, 70, 20, 20, 20, 3395]

In [None]:
for i in range(5):
  output = inference_seq2seq(lstm_model,tokenized_train_data[i][0],eng_vocab,hindi_vocab)
  output = " ".join([eng_vocab.itos[t] for t in output])
  output = postprocess_eng(output)
  print("pred: {} actual: {}".format(output,train_set[i][2]))

pred: So the question question : : we we we neurogenesis actual: So the next question is: can we control neurogenesis?
pred: <unk> : ( : : ( laughter ) so you see, you see it actual: TZ: (Exhales) SB: Yay! (Laughter) You know, there's something interesting.
pred: Thank you actual: Thank you. Thank you.
pred: My dad, oh, I me just like my <unk> actual: Me oh my, my oh me, guess I'm having company
pred: You will me me, please actual: Will you let go of me, please?


In [None]:
#bleu score and meteor score on test set
total_bleu_score_p = 0
total_meteor_score_p = 0
total_bleu_score = 0
total_meteor_score = 0
for i in tqdm(range(len(test_set))):
  output = inference_seq2seq(lstm_model,tokenized_test_data[i][0],eng_vocab,hindi_vocab)
  output = " ".join([eng_vocab.itos[t] for t in output])
  total_bleu_score += sentence_bleu([test_set[i][1].split(" ")], output.split(" "))
  total_bleu_score_p += sentence_bleu([test_set[i][2].split(" ")], postprocess_eng(output).split(" "))
  total_meteor_score += single_meteor_score(test_set[i][1],output)
  total_meteor_score_p += single_meteor_score(test_set[i][2],postprocess_eng(output))

l = len(test_set)
print("\nbleu score {}, bleu score with on actual {}".format(round(total_bleu_score/l,2),round(total_bleu_score_p/l,2)))
print("meteor score {}, meteor score with on actual {}".format(round(total_meteor_score/l,2),round(total_meteor_score_p/l,2)))


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
100%|██████████| 8289/8289 [00:42<00:00, 193.78it/s]


bleu score 0.02, bleu score with on actual 0.0
meteor score 0.22, meteor score with on actual 0.15





In [None]:
train_set[0][1]

'so the next question is: can we control neurogenesis ?'

### LSTM with 2 layers

In [None]:
# the encoder reads Hindi and the decoder emits English, so the Hindi vocabulary size comes first
lstm2_model = seq2seq(device,h_vocab_size,e_vocab_size,emb_size=256,hid_size=512,num_layers=2).to(device)
lstm2_model.apply(init_weights)
optimizer = optim.Adam(lstm2_model.parameters())
loss_fn = nn.CrossEntropyLoss(ignore_index = pad_eng)

In [None]:
parameters_count(lstm2_model)

12485516

In [None]:
EPOCHS = 10
best_val = float('inf')
for epoch in range(EPOCHS):
  start = time.time()
  train_loss = train(lstm2_model, train_data, optimizer,loss_fn)
  val_loss = evaluate(lstm2_model, val_data,loss_fn)  
  end = time.time()

  print("train loss: {:.3f} val loss: {:.3f}".format(train_loss,val_loss))
  minutes,seconds = get_time(start,end)  # avoid shadowing the built-in min
  print("time taken by {} epoch {} min {} s".format(epoch+1,minutes,seconds))
  # keep only the checkpoint with the lowest validation loss
  if val_loss<best_val:
    best_val = val_loss
    torch.save(lstm2_model.state_dict(), 'lstm2_model.pt')

train loss: 2.867 val loss: 3.809
time taken by 1 epoch 0 min 56 s
train loss: 2.774 val loss: 3.786
time taken by 2 epoch 0 min 55 s
train loss: 2.681 val loss: 3.842
time taken by 3 epoch 0 min 56 s
train loss: 2.598 val loss: 3.827
time taken by 4 epoch 0 min 56 s
train loss: 2.519 val loss: 3.849
time taken by 5 epoch 0 min 55 s
train loss: 2.449 val loss: 3.861
time taken by 6 epoch 0 min 56 s
train loss: 2.375 val loss: 3.899
time taken by 7 epoch 0 min 56 s
train loss: 2.322 val loss: 3.881
time taken by 8 epoch 0 min 55 s
train loss: 2.273 val loss: 3.934
time taken by 9 epoch 0 min 56 s
train loss: 2.205 val loss: 3.919
time taken by 10 epoch 0 min 56 s


In [None]:
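# evaluate the 2-layer model (the value recorded below came from a run that passed lstm_model here)
test_loss = evaluate(lstm2_model,test_data,loss_fn)
test_loss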

3.8025260338416467

In [None]:
torch.save(lstm2_model.state_dict(), 'lstm2_model1.pt')

In [None]:
lstm2_model.load_state_dict(torch.load('lstm2_model.pt'))

<All keys matched successfully>

In [None]:
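# evaluate the best 2-layer checkpoint (the value recorded below came from a run that passed lstm_model here)
test_loss = evaluate(lstm2_model,test_data,loss_fn)
test_loss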

3.803394985198975

In [None]:
total_bleu_score_p = 0
total_meteor_score_p = 0
total_bleu_score = 0
total_meteor_score = 0
for i in tqdm(range(len(test_set))):
  output = inference_seq2seq(lstm2_model,tokenized_test_data[i][0],eng_vocab,hindi_vocab)
  output = " ".join([eng_vocab.itos[t] for t in output])
  total_bleu_score += sentence_bleu([test_set[i][1].split(" ")], output.split(" "))
  total_bleu_score_p += sentence_bleu([test_set[i][2].split(" ")], postprocess_eng(output).split(" "))
  total_meteor_score += single_meteor_score(test_set[i][1],output)
  total_meteor_score_p += single_meteor_score(test_set[i][2],postprocess_eng(output))

l = len(test_set)
print("\nbleu score {}, bleu score with on actual {}".format(round(total_bleu_score/l,2),round(total_bleu_score_p/l,2)))
print("meteor score {}, meteor score with on actual {}".format(round(total_meteor_score/l,2),round(total_meteor_score_p/l,2)))


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
100%|██████████| 8289/8289 [00:51<00:00, 162.00it/s]


bleu score 0.03, bleu score with on actual 0.01
meteor score 0.24, meteor score with on actual 0.17





### Seq2Seq with a bidirectional GRU encoder

In [None]:
class biEncoder(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,dropout=0.5,out=None):
    super().__init__()
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.rnn = GRU(emb_size,hid_size,bidirectional=True)
    self.out = Linear(2*hid_size,out)
  
  def forward(self,input):
    embedded = self.dropout(self.embedding(input))
    outputs,h = self.rnn(embedded)
    h = self.out(torch.cat((h[-2,:,:], h[-1,:,:]), dim = 1)).unsqueeze(0)
    return h


class biDecoder(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,dropout=0.5):
    super().__init__()
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.rnn = GRU(emb_size,hid_size)
    self.out = Linear(hid_size,vocab_size)

  def forward(self,input,h):
    input = input.unsqueeze(0)
    embedded = self.dropout(self.embedding(input))
    output,h = self.rnn(embedded,h)
    output = self.out(output.squeeze(0))
    return output,h

class biseq2seq(nn.Module):
  def __init__(self,device,e_vocab_size,d_vocab_size,emb_size=256,hid_size_e=512,hid_size_d=512,dropout=0.5):
    super().__init__()
    self.d_vocab_size = d_vocab_size
    self.e_vocab_size = e_vocab_size
    self.encoder = biEncoder(e_vocab_size,emb_size,hid_size_e,dropout,out=hid_size_d)
    self.decoder = biDecoder(d_vocab_size,emb_size,hid_size_d,dropout)
    self.device = device

  def forward(self,src,target,teacher_forcing_ratio = 0.5):
    batch_size = target.shape[1]
    len = target.shape[0]

    output = torch.zeros(len,batch_size,self.d_vocab_size).to(self.device)
    h = self.encoder(src)

    input = target[0,:]
    for i in range(1,len):
      out,h = self.decoder(input,h)
      output[i] = out
      force = random.random() < teacher_forcing_ratio
      if force: input = target[i]
      else: input = out.argmax(1) 
    
    return output

In [None]:
def init_weights(model):
    for name, param in model.named_parameters():
        if 'weight' in name:
            nn.init.normal_(param.data, mean=0, std=0.01)
        else:
            nn.init.constant_(param.data, 0)

In [None]:
# the encoder reads Hindi and the decoder emits English, so the Hindi vocabulary size comes first
bigru_model = biseq2seq(device,h_vocab_size,e_vocab_size,emb_size=512,hid_size_e=512,hid_size_d=512).to(device)
bigru_model.apply(init_weights)

biseq2seq(
  (encoder): biEncoder(
    (embedding): Embedding(8196, 512)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): GRU(512, 512, bidirectional=True)
    (out): Linear(in_features=1024, out_features=512, bias=True)
  )
  (decoder): biDecoder(
    (embedding): Embedding(8196, 512)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): GRU(512, 512)
    (out): Linear(in_features=512, out_features=8196, bias=True)
  )
)

In [None]:
optimizer = optim.Adam(bigru_model.parameters())
loss_fn = nn.CrossEntropyLoss(ignore_index = pad_eng)

In [None]:
parameters_count(bigru_model)

17849860

In [None]:
EPOCHS = 15
best_val = float('inf')
for epoch in range(EPOCHS):
  start = time.time()
  train_loss = train(bigru_model, train_data, optimizer,loss_fn)
  val_loss = evaluate(bigru_model, val_data,loss_fn)  
  end = time.time()

  minutes,seconds = get_time(start,end)  # avoid shadowing the built-in min
  print("time taken by {} epoch {} min {} s".format(epoch+1,minutes,seconds))
  print("train loss: {:.3f} val loss: {:.3f}".format(train_loss,val_loss))
  # keep only the checkpoint with the lowest validation loss
  if val_loss<best_val:
    best_val = val_loss
    torch.save(bigru_model.state_dict(), 'bigru_8k_model.pt')

time taken by 1 epoch 1 min 6 s
train loss: 5.047 val loss: 4.770
time taken by 2 epoch 1 min 6 s
train loss: 4.223 val loss: 4.386
time taken by 3 epoch 1 min 6 s
train loss: 3.770 val loss: 4.191
time taken by 4 epoch 1 min 6 s
train loss: 3.424 val loss: 4.018
time taken by 5 epoch 1 min 6 s
train loss: 3.114 val loss: 3.954
time taken by 6 epoch 1 min 6 s
train loss: 2.841 val loss: 3.942
time taken by 7 epoch 1 min 6 s
train loss: 2.632 val loss: 3.949
time taken by 8 epoch 1 min 6 s
train loss: 2.453 val loss: 3.990
time taken by 9 epoch 1 min 6 s
train loss: 2.277 val loss: 4.030
time taken by 11 epoch 1 min 6 s
train loss: 2.043 val loss: 4.146
time taken by 12 epoch 1 min 6 s
train loss: 1.945 val loss: 4.173
time taken by 13 epoch 1 min 6 s
train loss: 1.855 val loss: 4.277
time taken by 14 epoch 1 min 6 s
train loss: 1.764 val loss: 4.331
time taken by 15 epoch 1 min 6 s
train loss: 1.698 val loss: 4.361


In [None]:
test_loss = evaluate(bigru_model,test_data,loss_fn)
print(test_loss)
torch.save(bigru_model.state_dict(), 'bigru_8k_model1.pt')
bigru_model.load_state_dict(torch.load('bigru_8k_model.pt'))
test_loss = evaluate(bigru_model,test_data,loss_fn)
print(test_loss)

4.372185395314143
3.9585409567906304


In [None]:
def inference_biseq2seq(model,sentence,eng_vocab,hindi_vocab,max_len=40):
  model.eval()
  sentence = sentence.unsqueeze(1).to(device)
  with torch.no_grad():
    h = model.encoder(sentence)
  output = [eng_vocab['<sos>']]
  for i in range(max_len):
    target = torch.tensor([output[-1]],dtype=torch.long).to(device)
    with torch.no_grad():
      out,h = model.decoder(target,h)
    prediction = out.argmax(1).item()
    if prediction == eng_vocab['<eos>']:
      break
    output.append(prediction)
  # drop the leading <sos>; <eos> is never appended, so nothing is trimmed from the end
  return output[1:]

In [None]:
ls gdrive/MyDrive/cs779_model/

bigru_8k_model.pt


In [None]:
bigru_model.load_state_dict(torch.load('gdrive/MyDrive/cs779_model/bigru_8k_model.pt'))
test_loss = evaluate(bigru_model,test_data,loss_fn)
print(test_loss)

3.9568456723139835


In [None]:
total_bleu_score_p = 0
total_meteor_score_p = 0
total_bleu_score = 0
total_meteor_score = 0
for i in tqdm(range(len(test_set))):
  output = inference_biseq2seq(bigru_model,tokenized_test_data[i][0],eng_vocab,hindi_vocab)
  output = " ".join([eng_vocab.itos[t] for t in output])
  total_bleu_score += sentence_bleu([test_set[i][1].split(" ")], output.split(" "))
  total_bleu_score_p += sentence_bleu([test_set[i][2].split(" ")], postprocess_eng(output).split(" "))
  total_meteor_score += single_meteor_score(test_set[i][1],output)
  total_meteor_score_p += single_meteor_score(test_set[i][2],postprocess_eng(output,remove_unk=True))

l = len(test_set)
print("\nbleu score {}, bleu score with on actual {}".format(round(total_bleu_score/l,2),round(total_bleu_score_p/l,2)))
print("meteor score {}, meteor score with on actual {}".format(round(total_meteor_score/l,2),round(total_meteor_score_p/l,2)))


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
100%|██████████| 8289/8289 [00:46<00:00, 178.94it/s]


bleu score 0.04, bleu score with on actual 0.01
meteor score 0.27, meteor score with on actual 0.19





In [None]:
sample = []
with open("hindistatements-2.csv",encoding="utf-8") as f:
  csv_reader = csv.reader(f, delimiter=',')
  next(csv_reader, None)  # skip the header row
  for r in csv_reader:
    sample.append(r[2])

In [None]:
def final_result(model,inference,sample,hindi_tokenizer,hindi_vocab,eng_vocab):
  result = []
  for s in sample:
    hindi_s = torch.tensor([hindi_vocab['<sos>']]+[hindi_vocab[t] for t in hindi_tokenizer(s)]+[hindi_vocab['<eos>']], dtype=torch.long)
    output = inference(model,hindi_s,eng_vocab,hindi_vocab)
    output = " ".join([eng_vocab.itos[t] for t in output])
    output = postprocess_eng(output)
    result.append(output)
  return result

In [None]:
bigru_result = final_result(bigru_model,inference_biseq2seq,sample,hindi_tokenizer,hindi_vocab,eng_vocab)

In [None]:
len(sample),len(bigru_result)

(5000, 5000)

In [None]:
sample[18],bigru_result[18]

('आप नीचे वहाँ आवश्यक उपकरण नहीं है.', 'You do not keep the there')

In [None]:
# the with-block closes the file even if a write fails
with open("answer.txt", "w") as f:
  for s in bigru_result:
    f.write(s+"\n")

In [None]:
class biEncoder_lstm(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,dropout=0.5,out=None):
    super().__init__()
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.rnn = LSTM(emb_size,hid_size,bidirectional=True)
    self.out_c = Linear(2*hid_size,out)
    self.out_h = Linear(2*hid_size,out)
  
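  def forward(self,input):
    embedded = self.dropout(self.embedding(input))
    outputs,(h,c) = self.rnn(embedded)
    # h[-2] and h[-1] are the final forward and backward states; concatenate
    # them and project to the decoder's hidden size (same for the cell state)
    h = self.out_h(torch.cat((h[-2,:,:], h[-1,:,:]), dim = 1)).unsqueeze(0)
    c = self.out_c(torch.cat((c[-2,:,:], c[-1,:,:]), dim = 1)).unsqueeze(0)
    return h,c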


class biDecoder_lstm(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,dropout=0.5):
    super().__init__()
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.rnn = LSTM(emb_size,hid_size)
    self.out = Linear(hid_size,vocab_size)

  def forward(self,input,h,c):
    input = input.unsqueeze(0)
    embedded = self.dropout(self.embedding(input))
    output,(h,c) = self.rnn(embedded,(h,c))
    output = self.out(output.squeeze(0))
    return output,(h,c)

class biseq2seq_lstm(nn.Module):
  def __init__(self,device,e_vocab_size,d_vocab_size,emb_size=256,hid_size_e=512,hid_size_d=512,dropout=0.5):
    super().__init__()
    self.d_vocab_size = d_vocab_size
    self.e_vocab_size = e_vocab_size
    self.encoder = biEncoder_lstm(e_vocab_size,emb_size,hid_size_e,dropout,out=hid_size_d)
    self.decoder = biDecoder_lstm(d_vocab_size,emb_size,hid_size_d,dropout)
    self.device = device

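  def forward(self,src,target,teacher_forcing_ratio = 0.5):
    batch_size = target.shape[1]
    seq_len = target.shape[0]  # avoid shadowing the built-in len

    # output[0] is left as zeros; decoding starts from target[0]
    output = torch.zeros(seq_len,batch_size,self.d_vocab_size).to(self.device)
    h,c = self.encoder(src)

    input = target[0,:]
    for i in range(1,seq_len):
      out,(h,c) = self.decoder(input,h,c)
      output[i] = out
      # feed the ground-truth token with probability teacher_forcing_ratio,
      # otherwise the model's own prediction
      force = random.random() < teacher_forcing_ratio
      if force: input = target[i]
      else: input = out.argmax(1) 
    
    return output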
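
The `forward` method above mixes teacher forcing with free-running decoding: at each step the next decoder input is the ground-truth token with probability `teacher_forcing_ratio`, and the model's own argmax otherwise. A minimal plain-Python sketch of that selection rule follows. `choose_decoder_inputs` is a hypothetical helper and `predicted` is a stand-in list rather than real model output.

```python
import random

def choose_decoder_inputs(target, predicted, teacher_forcing_ratio, rng):
    """Return the token fed to the decoder at each decoding step.

    target    -- ground-truth tokens, with target[0] being <sos>
    predicted -- stand-in for the model's argmax at steps 1..len(target)-1
    """
    inputs = [target[0]]  # the first decoder input is always <sos>
    for i in range(1, len(target) - 1):
        force = rng.random() < teacher_forcing_ratio
        # predicted[i - 1] is the prediction made at decoding step i
        inputs.append(target[i] if force else predicted[i - 1])
    return inputs

target = ["<sos>", "a", "b", "c", "<eos>"]
predicted = ["x", "y", "z", "w"]
rng = random.Random(0)

always_forced = choose_decoder_inputs(target, predicted, 1.0, rng)
never_forced = choose_decoder_inputs(target, predicted, 0.0, rng)
print(always_forced)  # ['<sos>', 'a', 'b', 'c']
print(never_forced)   # ['<sos>', 'x', 'y', 'z']
```

A ratio of 1.0 reproduces pure teacher forcing and 0.0 corresponds to what the model faces at inference time; the notebook's default of 0.5 sits between the two.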

In [None]:
bilstm_model = biseq2seq_lstm(device,e_vocab_size,h_vocab_size,emb_size=512,hid_size_e=512,hid_size_d=512).to(device)
bilstm_model.apply(init_weights)

biseq2seq_lstm(
  (encoder): biEncoder_lstm(
    (embedding): Embedding(8196, 512)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(512, 512, bidirectional=True)
    (out_c): Linear(in_features=1024, out_features=512, bias=True)
    (out_h): Linear(in_features=1024, out_features=512, bias=True)
  )
  (decoder): biDecoder_lstm(
    (embedding): Embedding(8196, 512)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(512, 512)
    (out): Linear(in_features=512, out_features=8196, bias=True)
  )
)

In [None]:
parameters_count(bilstm_model)

19950596

In [None]:
optimizer = optim.Adam(bilstm_model.parameters())
loss_fn = nn.CrossEntropyLoss(ignore_index = pad_eng)

In [None]:
EPOCHS = 10
best_val = 1000
for epoch in tqdm(range(EPOCHS)):
  start = time.time()
  train_loss = train(bilstm_model, train_data, optimizer,loss_fn)
  val_loss = evaluate(bilstm_model, val_data,loss_fn)  
  end = time.time()

  mins,s = get_time(start,end)  # avoid shadowing the built-in min
  print("time taken by {} epoch {} min {} s".format(epoch+1,mins,s))
  print("train loss: {:.3f} val loss: {:.3f}".format(train_loss,val_loss))
  if val_loss<best_val:
    best_val = val_loss
    torch.save(bilstm_model.state_dict(), 'bilstm_8k_model.pt')

 10%|█         | 1/10 [01:10<10:35, 70.63s/it]

time taken by 1 epoch 1 min 10 s
train loss: 5.196 val loss: 5.064


 20%|██        | 2/10 [02:21<09:25, 70.68s/it]

time taken by 2 epoch 1 min 10 s
train loss: 4.628 val loss: 4.822


 30%|███       | 3/10 [03:32<08:15, 70.84s/it]

time taken by 3 epoch 1 min 11 s
train loss: 4.381 val loss: 4.704


 40%|████      | 4/10 [04:43<07:05, 70.96s/it]

time taken by 4 epoch 1 min 11 s
train loss: 4.205 val loss: 4.610


 50%|█████     | 5/10 [05:55<05:55, 71.03s/it]

time taken by 5 epoch 1 min 11 s
train loss: 4.039 val loss: 4.502


 60%|██████    | 6/10 [07:06<04:44, 71.01s/it]

time taken by 6 epoch 1 min 10 s
train loss: 3.884 val loss: 4.449


 70%|███████   | 7/10 [08:16<03:32, 70.91s/it]

time taken by 7 epoch 1 min 10 s
train loss: 3.751 val loss: 4.385


 80%|████████  | 8/10 [09:27<02:21, 70.76s/it]

time taken by 8 epoch 1 min 10 s
train loss: 3.605 val loss: 4.340


 90%|█████████ | 9/10 [10:37<01:10, 70.70s/it]

time taken by 9 epoch 1 min 10 s
train loss: 3.489 val loss: 4.255


100%|██████████| 10/10 [11:48<00:00, 70.83s/it]

time taken by 10 epoch 1 min 10 s
train loss: 3.363 val loss: 4.244





In [None]:
EPOCHS = 5
for epoch in range(EPOCHS):
  start = time.time()
  train_loss = train(bilstm_model, train_data, optimizer,loss_fn)
  val_loss = evaluate(bilstm_model, val_data,loss_fn)  
  end = time.time()

  mins,s = get_time(start,end)  # avoid shadowing the built-in min
  print("time taken by {} epoch {} min {} s".format(epoch+1,mins,s))
  print("train loss: {:.3f} val loss: {:.3f}".format(train_loss,val_loss))
  if val_loss<best_val:
    best_val = val_loss
    torch.save(bilstm_model.state_dict(), 'bilstm_8k_model.pt')

time taken by 1 epoch 1 min 11 s
train loss: 2.698 val loss: 4.192
time taken by 2 epoch 1 min 10 s
train loss: 2.604 val loss: 4.217
time taken by 3 epoch 1 min 10 s
train loss: 2.517 val loss: 4.213
time taken by 4 epoch 1 min 10 s
train loss: 2.435 val loss: 4.208
time taken by 5 epoch 1 min 10 s
train loss: 2.343 val loss: 4.265


In [None]:
test_loss = evaluate(bilstm_model,test_data,loss_fn)
print(test_loss)
torch.save(bilstm_model.state_dict(), 'bilstm_8k_model1.pt')
bilstm_model.load_state_dict(torch.load('bilstm_8k_model.pt'))
test_loss = evaluate(bilstm_model,test_data,loss_fn)
print(test_loss)

4.260195258947519
4.147019466987023
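
The two test losses are easier to compare as perplexities. The sketch below assumes that `evaluate` returns a mean per-token cross-entropy in nats, which is what `nn.CrossEntropyLoss` yields by default; `evaluate` is defined elsewhere in the notebook, so this is an assumption about how it averages.

```python
import math

# test losses printed above: final-epoch weights, then best-validation checkpoint
final_loss = 4.260195258947519
best_loss = 4.147019466987023

# assumption: the loss is a mean per-token cross-entropy in nats
final_ppl = math.exp(final_loss)
best_ppl = math.exp(best_loss)
print("perplexity: final {:.1f}, best checkpoint {:.1f}".format(final_ppl, best_ppl))
```

Under that assumption, reloading the best-validation checkpoint lowers test perplexity from roughly 71 to roughly 63.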


In [None]:
def inference_biseq2seq_lstm(model,sentence,eng_vocab,hindi_vocab,max_len=50):
  model.eval()
  sentence = sentence.unsqueeze(1).to(device)
  with torch.no_grad():
    h,c = model.encoder(sentence)
  output = [eng_vocab['<sos>']]
  for i in range(max_len):
    target = torch.tensor([output[-1]],dtype=torch.long).to(device)
    with torch.no_grad():
      out,(h,c) = model.decoder(target,h,c)
    prediction = out.argmax(1).item()
    if prediction == eng_vocab['<eos>']:
      break
    output.append(prediction)
  # <eos> is never appended, so strip only the leading <sos>
  return output[1:]
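
Because the loop breaks before `<eos>` is appended, the collected list holds `<sos>` followed only by real tokens. The following minimal sketch uses the same control flow with a toy successor table standing in for the trained decoder (`greedy_decode` and `successor` are hypothetical names) to show how the two slicing choices differ.

```python
def greedy_decode(step, sos, eos, max_len=50):
    """Greedy loop with the same control flow as the inference function.

    step -- hypothetical stand-in for one decoder call: takes the previous
            token and returns the most likely next token
    """
    output = [sos]
    for _ in range(max_len):
        prediction = step(output[-1])
        if prediction == eos:
            break  # <eos> is never appended
        output.append(prediction)
    return output

# toy "model": a fixed successor table instead of a trained decoder
successor = {"<sos>": "you", "you": "are", "are": "here", "here": "<eos>"}
decoded = greedy_decode(successor.get, "<sos>", "<eos>")

print(decoded[1:])    # ['you', 'are', 'here'], only <sos> removed
print(decoded[1:-1])  # ['you', 'are'], the last real token is lost
```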

In [None]:
total_bleu_score_p = 0
total_meteor_score_p = 0
total_bleu_score = 0
total_meteor_score = 0
for i in tqdm(range(len(test_set))):
  output = inference_biseq2seq_lstm(bilstm_model,tokenized_test_data[i][0],eng_vocab,hindi_vocab)
  output = " ".join([eng_vocab.itos[t] for t in output])
  total_bleu_score += sentence_bleu([test_set[i][1].split(" ")], output.split(" "))
  total_bleu_score_p += sentence_bleu([test_set[i][2].split(" ")], postprocess_eng(output).split(" "))
  total_meteor_score += single_meteor_score(test_set[i][1],output)
  total_meteor_score_p += single_meteor_score(test_set[i][2],postprocess_eng(output,remove_unk=True))

l = len(test_set)
print("\nbleu score {}, bleu score with on actual {}".format(round(total_bleu_score/l,2),round(total_bleu_score_p/l,2)))
print("meteor score {}, meteor score with on actual {}".format(round(total_meteor_score/l,2),round(total_meteor_score_p/l,2)))


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
100%|██████████| 8289/8289 [00:48<00:00, 170.72it/s]


bleu score 0.03, bleu score with on actual 0.01
meteor score 0.25, meteor score with on actual 0.17





In [None]:
bilstm_result = final_result(bilstm_model,inference_biseq2seq_lstm,sample,hindi_tokenizer,hindi_vocab,eng_vocab)

In [None]:
# write the BiLSTM translations computed in the previous cell
with open("answer.txt", "w") as f:
  for s in bilstm_result:
    f.write(s+"\n")

In [None]:
class biEncoder(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,dropout=0.5,out=None):
    super().__init__()
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.rnn = LSTM(emb_size,hid_size,bidirectional=True)
    self.out = Linear(2*hid_size,out)
  
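  def forward(self,input):
    embedded = self.dropout(self.embedding(input))
    outputs,(h,c) = self.rnn(embedded)
    # only the hidden state is passed on: the GRU decoder has no cell state,
    # so the LSTM cell state c is discarded
    h = self.out(torch.cat((h[-2,:,:], h[-1,:,:]), dim = 1)).unsqueeze(0)
    return h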


class biDecoder(nn.Module):
  def __init__(self,vocab_size,emb_size=256,hid_size=512,dropout=0.5):
    super().__init__()
    self.vocab_size = vocab_size
    self.hid_size = hid_size
    self.embedding = Embedding(vocab_size,emb_size)
    self.dropout = nn.Dropout(dropout)
    self.rnn = GRU(emb_size,hid_size)
    self.out = Linear(hid_size,vocab_size)

  def forward(self,input,h):
    input = input.unsqueeze(0)
    embedded = self.dropout(self.embedding(input))
    output,h = self.rnn(embedded,h)
    output = self.out(output.squeeze(0))
    return output,h

class biseq2seq(nn.Module):
  def __init__(self,device,e_vocab_size,d_vocab_size,emb_size=256,hid_size_e=512,hid_size_d=512,dropout=0.5):
    super().__init__()
    self.d_vocab_size = d_vocab_size
    self.e_vocab_size = e_vocab_size
    self.encoder = biEncoder(e_vocab_size,emb_size,hid_size_e,dropout,out=hid_size_d)
    self.decoder = biDecoder(d_vocab_size,emb_size,hid_size_d,dropout)
    self.device = device

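  def forward(self,src,target,teacher_forcing_ratio = 0.5):
    batch_size = target.shape[1]
    seq_len = target.shape[0]  # avoid shadowing the built-in len

    # output[0] is left as zeros; decoding starts from target[0]
    output = torch.zeros(seq_len,batch_size,self.d_vocab_size).to(self.device)
    h = self.encoder(src)

    input = target[0,:]
    for i in range(1,seq_len):
      out,h = self.decoder(input,h)
      output[i] = out
      # feed the ground-truth token with probability teacher_forcing_ratio,
      # otherwise the model's own prediction
      force = random.random() < teacher_forcing_ratio
      if force: input = target[i]
      else: input = out.argmax(1) 
    
    return output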

In [None]:
bilstm_gru_model = biseq2seq(device,e_vocab_size,h_vocab_size,emb_size=512,hid_size_e=512,hid_size_d=512).to(device)
bilstm_gru_model.apply(init_weights)

biseq2seq(
  (encoder): biEncoder(
    (embedding): Embedding(8196, 512)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(512, 512, bidirectional=True)
    (out): Linear(in_features=1024, out_features=512, bias=True)
  )
  (decoder): biDecoder(
    (embedding): Embedding(8196, 512)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): GRU(512, 512)
    (out): Linear(in_features=512, out_features=8196, bias=True)
  )
)

In [None]:
optimizer = optim.Adam(bilstm_gru_model.parameters())
loss_fn = nn.CrossEntropyLoss(ignore_index = pad_eng)

In [None]:
parameters_count(bilstm_gru_model)

18900484
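
Both reported parameter counts (19950596 for the LSTM decoder model, 18900484 for the GRU decoder model) can be reproduced by hand from the printed module summaries. The sketch below assumes that `parameters_count` sums every trainable parameter and relies on PyTorch's recurrent layers keeping two bias vectors per gate. The helper names are hypothetical.

```python
def embedding_params(vocab, emb):
    return vocab * emb

def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights plus bias

def rnn_params(n_in, hid, gates, directions=1):
    # per gate: input weights, recurrent weights and two bias vectors
    return directions * gates * (n_in * hid + hid * hid + 2 * hid)

V, E, H = 8196, 512, 512  # sizes from the printed module summaries

# parts shared by both models: embeddings, BiLSTM encoder, output projection
encoder_common = embedding_params(V, E) + rnn_params(E, H, gates=4, directions=2)
decoder_common = embedding_params(V, E) + linear_params(H, V)

lstm_model = (encoder_common + 2 * linear_params(2 * H, H)   # out_h and out_c
              + decoder_common + rnn_params(E, H, gates=4))  # LSTM decoder
gru_model = (encoder_common + linear_params(2 * H, H)        # single out layer
             + decoder_common + rnn_params(E, H, gates=3))   # GRU decoder
print(lstm_model, gru_model)  # 19950596 18900484
```

The difference of about 1.05 million parameters comes from the GRU having three gates instead of four and from the encoder needing one projection layer instead of two.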

In [None]:
EPOCHS = 10
best_val = 10000
for epoch in tqdm(range(EPOCHS)):
  start = time.time()
  train_loss = train(bilstm_gru_model, train_data, optimizer,loss_fn)
  val_loss = evaluate(bilstm_gru_model, val_data,loss_fn)  
  end = time.time()

  mins,s = get_time(start,end)  # avoid shadowing the built-in min
  print("time taken by {} epoch {} min {} s".format(epoch+1,mins,s))
  print("train loss: {:.3f} val loss: {:.3f}".format(train_loss,val_loss))
  if val_loss<best_val:
    best_val = val_loss
    torch.save(bilstm_gru_model.state_dict(), 'bilstm_gru_8k_model.pt')

 10%|█         | 1/10 [01:09<10:26, 69.59s/it]

time taken by 1 epoch 1 min 9 s
train loss: 5.064 val loss: 4.771


 20%|██        | 2/10 [02:18<09:16, 69.52s/it]

time taken by 2 epoch 1 min 9 s
train loss: 4.274 val loss: 4.476


 30%|███       | 3/10 [03:28<08:06, 69.51s/it]

time taken by 3 epoch 1 min 9 s
train loss: 3.889 val loss: 4.288


 40%|████      | 4/10 [04:38<06:57, 69.61s/it]

time taken by 4 epoch 1 min 9 s
train loss: 3.559 val loss: 4.132


 50%|█████     | 5/10 [05:48<05:48, 69.71s/it]

time taken by 5 epoch 1 min 9 s
train loss: 3.261 val loss: 4.047


 60%|██████    | 6/10 [06:58<04:39, 69.80s/it]

time taken by 6 epoch 1 min 9 s
train loss: 3.007 val loss: 4.008


 70%|███████   | 7/10 [08:07<03:28, 69.63s/it]

time taken by 7 epoch 1 min 9 s
train loss: 2.785 val loss: 4.000


 80%|████████  | 8/10 [09:16<02:18, 69.42s/it]

time taken by 8 epoch 1 min 8 s
train loss: 2.605 val loss: 4.042


 90%|█████████ | 9/10 [10:25<01:09, 69.44s/it]

time taken by 9 epoch 1 min 9 s
train loss: 2.439 val loss: 4.065


100%|██████████| 10/10 [11:34<00:00, 69.49s/it]

time taken by 10 epoch 1 min 9 s
train loss: 2.317 val loss: 4.098
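
In the log above the validation loss bottoms out at epoch 7 while the training loss keeps falling, which points to overfitting in the later epochs. The loop already keeps the best checkpoint; a patience rule would additionally stop training early. The following is a minimal sketch of such a rule, with `best_epoch_with_patience` as a hypothetical helper and `patience=2` as an arbitrary choice, applied to the validation losses printed above.

```python
def best_epoch_with_patience(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)

# validation losses from the log above
val_losses = [4.771, 4.476, 4.288, 4.132, 4.047, 4.008, 4.000, 4.042, 4.065, 4.098]
result = best_epoch_with_patience(val_losses)
print(result)  # (7, 9)
```

With a patience of 2, training would have stopped after epoch 9 and kept the epoch-7 weights, saving one of the ten epochs in this run.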





In [None]:
test_loss = evaluate(bilstm_gru_model,test_data,loss_fn)
print(test_loss)
torch.save(bilstm_gru_model.state_dict(), 'bilstm_gru_8k_model1.pt')
bilstm_gru_model.load_state_dict(torch.load('bilstm_gru_8k_model.pt'))
test_loss = evaluate(bilstm_gru_model,test_data,loss_fn)
print(test_loss)

4.117301885898296
4.030121905987079


In [None]:
total_bleu_score_p = 0
total_meteor_score_p = 0
total_meteor_score_unk = 0
total_bleu_score = 0
total_meteor_score = 0
for i in tqdm(range(len(test_set))):
  output = inference_biseq2seq(bilstm_gru_model,tokenized_test_data[i][0],eng_vocab,hindi_vocab)
  output = " ".join([eng_vocab.itos[t] for t in output])
  total_bleu_score += sentence_bleu([test_set[i][1].split(" ")], output.split(" "))
  total_bleu_score_p += sentence_bleu([test_set[i][2].split(" ")], postprocess_eng(output).split(" "))
  total_meteor_score += single_meteor_score(test_set[i][1],output)
  total_meteor_score_p += single_meteor_score(test_set[i][2],postprocess_eng(output))
  total_meteor_score_unk += single_meteor_score(test_set[i][2],postprocess_eng(output,remove_unk=True))

l = len(test_set)
print("\nbleu score {:.4f}, bleu score with on actual {:.4f}".format(total_bleu_score/l,total_bleu_score_p/l))
print("meteor score {:.4f}, meteor score with on actual {:.4f}".format(total_meteor_score/l,total_meteor_score_p/l))
print("meteor score removing unk {:.3f}".format(total_meteor_score_unk/l))

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
100%|██████████| 8289/8289 [00:50<00:00, 162.54it/s]


bleu score 0.0356, bleu score with on actual 0.0073
meteor score 0.2689, meteor score with on actual 0.1819
meteor score removing unk 0.182



