# WITHOUT ATTENTION (Note Book #1)

---
### Declaration

In this project, I've utilized various online resources to deepen my understanding of important concepts, refine my code structure, and appropriately recognize these resources.

### Significant resources that have influenced my assignment greneral DL :

- **Lecture Slides:** 
  - Prof. Mitesh Khapra's course CS6910 - Fundamentals of Deep Learning.  
    [CS6910 - Fundamentals of Deep Learning](http://www.cse.iitm.ac.in/~miteshk/CS6910.html)
- **YouTube Lectures:** 
  - Prof. Mitesh Khapra's DeepLearning course lectures on deep learning fundamentals.  
    [DeepLearning course lectures](https://www.youtube.com/playlist?list=PLyqSpQzTE6M9gCgajvQbc68Hk_JKGBAYT)
- **Official Documentation:** 
  - Python NumPy and wandb.ai.  
    [Python Documentation](https://docs.python.org/3/)  
    [NumPy Documentation](https://numpy.org/doc/)  
    [wandb.ai Documentation](https://docs.wandb.ai/tutorials)
- **GitHub Repositories:** 
  - Open-source code repositories relevant to deep learning and neural networks.  
    [Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning)  
    [Awesome Artificial Intelligence](https://github.com/owainlewis/awesome-artificial-intelligence)
- **Academic Research:** 
  - Papers from arXiv/academic journals providing theoretical insights and recent advancements in deep learning.
    1. **"Adam: A Method for Stochastic Optimization" by Kingma and Ba (2014)**  
       [arXiv:1412.6980](https://arxiv.org/abs/1412.6980)
    2. **"On the Convergence of Adam and Beyond" by Reddi, Kale, and Kumar (2018)**  
       [arXiv:1904.09237](https://arxiv.org/abs/1904.09237)
    3. **"Averaged Stochastic Gradient Descent with Weight Dropped Convergence Rate" by Junchi Li, Fadime Sener, and Vladlen Koltun (2021)**  
       [arXiv:2106.01409](https://arxiv.org/abs/2106.01409)
- **Online Forums:** 
  - Reddit's r/MachineLearning and r/deeplearning for discussions and knowledge sharing.  
    [r/MachineLearning](https://www.reddit.com/r/MachineLearning/)  
    [r/deeplearning](https://www.reddit.com/r/deeplearning/)
- **Coursera Courses:** 
  - Andrew Ng's ML Specialization and DL Specialization on Coursera.  
    [Machine Learning Specialization](https://www.deeplearning.ai/courses/machine-learning-specialization/)  
    [Deep Learning Specialization](https://www.deeplearning.ai/courses/deep-learning-specialization/)
- **Additional Resources:** 
  - [Optimization in Deep Learning: AdaGrad, RMSProp, Adam](https://artemoppermann.com/optimization-in-deep-learning-adagrad-rmsprop-adam/)
  - [Difference between RMSprop with momentum and Adam optimizers](https://datascience.stackexchange.com/questions/26792/difference-between-rmsprop-with-momentum-and-adam-optimizers)
  - [Optimization Techniques in Deep Learning](https://blogs.brain-mentors.com/optimization-techniques-in-deep-learning/)
  - [An overview of gradient descent optimization algorithms by Sebastian Ruder](https://www.ruder.io/optimizing-gradient-descent/)

### Additionally, I've referred to the following resources for This specific topics:


- **Understanding LSTM Networks by Christopher Olah (blog post):**
  - A clear and intuitive explanation of Long Short-Term Memory (LSTM) networks, with insightful visualizations.  
    [Link](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

- **Attention Is All You Need by Vaswani et al. (research paper):**
  - This paper introduced the Transformer architecture, which revolutionized NLP by relying solely on attention mechanisms.  
    [Link](https://arxiv.org/abs/1706.03762)

- **Neural Machine Translation by Jointly Learning to Align and Translate by Bahdanau, Cho, and Bengio (research paper):**
  - Proposed the first widely-used attention mechanism for neural machine translation, significantly improving performance over previous approaches.  
    [Link](https://arxiv.org/abs/1409.0473)

- **The Illustrated Transformer by Jay Alammar (blog post):**
  - A visually engaging and easy-to-follow guide to the Transformer architecture, explaining its components and how they work together.  
    [Link](http://jalammar.github.io/illustrated-transformer/)

- **Deep Learning for Natural Language Processing (Course 5: Sequence Models) by Andrew Ng (Coursera):**
  - Course materials from Andrew Ng's Sequence Models course on Coursera, providing foundational knowledge in deep learning.  
    [Coursera](https://www.coursera.org/learn/nlp-sequence-models)


---

### Imprort library

In [1]:
####importing necessary packages#######
import pandas as pd
import numpy as np
import torch
from torch.utils.data import TensorDataset
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm.notebook import tqdm
import wandb

### Installing Wandb

- Wandb is used to keep track of various experiments performed and for efficient logging while doing hyperparameter tuning. 
- The report for this project is also created using wandb

In [2]:
# Initializing a new Weights & Biases run with the specified project name
project_name='Without attention'
wandb.login(key="116333c8f36584386af5e16706d08ce3fc4d59df")
wandb.init(project=project_name)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


[34m[1mwandb[0m: Currently logged in as: [33mahmecse[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/ge22m009/.netrc


# Training and Validation Data Preparation

**References**:

1. [ documentation on LSTM Seq2Seq](https://.io/examples/nlp/lstm_seq2seq/)
2. [Machine Learning Mastery: Define Encoder-Decoder Sequence-to-Sequence Model for Neural Machine Translation in Keras](https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-/)
3. [Stack Overflow: How can I prepare data for a Seq2Seq model?](https://stackoverflow.com/questions/59035736/how-can-i-do-prepare-data-for-a-seq2seq-model)
4. [Analytics Vidhya: A Simple Introduction to Sequence-to-Sequence Models](https://www.analyticsvidhya.com/blog/2020/08/a-simple-introduction-to-sequence-to-sequence-models/)
5. [Towards Data Science: How to Implement Seq2Seq LSTM Model in ](https://towardsdatascience.com/how-to-implement-seq2seq-lstm-model-in--shortcutnlp-6f355f3e5639)

In [3]:
train_data = pd.read_csv('/home/ge22m009/Data/aksharantar_sampled/mal/mal_train.csv', header=None, names=['English', 'Malayalam'])
val_data = pd.read_csv('/home/ge22m009/Data/aksharantar_sampled/mal/mal_valid.csv', header=None, names=['English', 'Malayalam'])
test_data = pd.read_csv('/home/ge22m009/Data/aksharantar_sampled/mal/mal_test.csv', header=None, names=['English', 'Malayalam'])


In [4]:
eng_train = train_data['English']  # Extract English training data
mal_train = train_data['Malayalam']  # Extract Malayalam training data
eng_val = val_data['English']  # Extract English validation data
mal_val = val_data['Malayalam']  # Extract Malayalam validation data
eng_test = test_data['English']  # Extract English test data
mal_test = test_data['Malayalam']  # Extract Malayalam test data
english_all = pd.concat([eng_train, eng_val, eng_test], ignore_index=True)  # Concatenate all English data
malayalam_all = pd.concat([mal_train, mal_val, mal_test], ignore_index=True)  # Concatenate all Malayalam data


In [5]:
def unique_char(lan):
    unique_cha = ''  # Initialize an empty string to store unique characters
    for word in lan:  # Iterate through each word in the list
        for char in word:  # Iterate through each character in the word
            if char not in unique_cha:  # Check if the character is not already in the unique_cha string
                unique_cha += char  # Append the unique character to the string
    return unique_cha  # Return the string containing all unique characters


In [6]:
class LANG:
    def __init__(self, lang):
        self.lang = lang  # Initialize the language data
        self.word2index = {}  # Dictionary to map characters to indices
        self.index2word = {0: 'SOS', 1: 'EOS'}  # Dictionary to map indices to characters
        self.max_length = 0  # Variable to store the length of the longest word
        self.count = 2  # Counter for indexing, starting after SOS and EOS
        self.max_word = ''  # Variable to store the longest word

    def addchar(self):
        for word in self.lang:  # Iterate through each word in the language data
            length = len(word)  # Get the length of the current word
            if length > self.max_length:  # Check if the current word is the longest so far
                self.max_length = length  # Update the maximum word length
                self.max_word = word  # Update the longest word
            for char in word:  # Iterate through each character in the word
                if char not in self.word2index:  # Check if the character is not already indexed
                    self.word2index[char] = self.count  # Assign the current count as the index
                    self.index2word[self.count] = char  # Map the current count to the character
                    self.count += 1  # Increment the counter
        return self.word2index, self.index2word, self.max_length, self.max_word  # Return the mappings and max values


In [7]:
# Initialize language objects for English and Malayalam
eng = LANG(english_all)
# Get mappings and max values for English
eng_word2index, eng_index2word, eng_maxlength, eng_word = eng.addchar()
access_eng = eng_word2index, eng_index2word, eng_maxlength, eng_word

mal = LANG(malayalam_all)
# Get mappings and max values for Malayalam
mal_word2index, mal_index2word, mal_maxlength, mal_word = mal.addchar()
access_mal = mal_word2index, mal_index2word, mal_maxlength, mal_word

In [8]:
def Tensorpair(eng, mal, access_eng, access_mal):
    # Unpack access_eng and access_mal
    eng_word2index, eng_index2word, eng_maxlength, eng_word = access_eng
    mal_word2index, mal_index2word, mal_maxlength, mal_word = access_mal
    
    n = len(eng)
    # Initialize tensors for input and target sequences
    input_ids = torch.zeros((n, eng_maxlength + 1), dtype=torch.int32)
    target_ids = torch.zeros((n, eng_maxlength + 1), dtype=torch.int32)
    
    for i, (eng_word, mal_word) in enumerate(zip(eng, mal)):
        try:
            # Convert characters to indices using word2index mappings
            input_indx = [eng_word2index[char] for char in eng_word]
            input_indx.append(1)  # Append EOS token index
            input_indx = torch.tensor(input_indx, dtype=torch.long)
            
            target_indx = [mal_word2index[char] for char in mal_word]
            target_indx.append(1)  # Append EOS token index
            target_indx = torch.tensor(target_indx, dtype=torch.long)
            
            # Update input and target tensors
            input_ids[i, :len(input_indx)] = input_indx
            target_ids[i, :len(target_indx)] = target_indx
        except Exception as e:
            print(e)  # Print any exception that occurs
    
    # Create a TensorDataset from input and target tensors
    tensor_data = TensorDataset(input_ids, target_ids)
    
    return tensor_data


In [9]:
# Generating tensor pairs for training, validation, and testing data
train_data=Tensorpair(eng_train,mal_train,access_eng,access_mal)
val_data=Tensorpair(eng_val,mal_val,access_eng,access_mal)
test_data=Tensorpair(eng_test,mal_test,access_eng,access_mal)

In [10]:
# Setting batch size and creating data loaders for training, validation, and testing
batch_size=64
train_loader=DataLoader(dataset=train_data,batch_size=batch_size,shuffle=True,drop_last=True)
val_loader=DataLoader(dataset=val_data,batch_size=batch_size,drop_last=True)
test_loader=DataLoader(dataset=test_data,batch_size=batch_size,drop_last=True)

# Cell functions

**References**

1. [LSTM Seq2Seq Example](https://keras.io/examples/nlp/lstm_seq2seq/)
2. [Define Encoder-Decoder Sequence-to-Sequence Model for Neural Machine Translation in ](https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/)
3. [How to Implement Seq2Seq LSTM Model in ](https://towardsdatascience.com/how-to-implement-seq2seq-lstm-model-in-keras-shortcutnlp-6f355f3e5639)
4. [A Practical Guide to RNN and LSTM in ](https://towardsdatascience.com/a-practical-guide-to-rnn-and-lstm-in-keras-980f176271bc)
5. [LSTM Layer Explained for Beginners with Example](https://machinelearningknowledge.ai/keras-lstm-layer-explained-for-beginners-with-example/)

For Attention:

1. [Attention Layers Documentation](https://keras.io/api/layers/attention_layers/)
2. [A Beginner's Guide to Using Attention Layer in Neural Networks](https://analyticsindiamag.com/a-beginners-guide-to-using-attention-layer-in-neural-networks/)
3. [Google Colab Notebook: Neural Machine Translation with Attention](https://colab.research.google.com/github/tensorflow/text/blob/master/docs/tutorials/nmt_with_attention.ipynb?hl=fi)
4. [Kaggle Discussion: Attention Mechanism](https://www.kaggle.com/questions-and-answers/279309)
5. [LSTM Layer Explained for Beginners with Example](https://machinelearningknowledge.ai/keras-lstm-layer-explained-for-beginners-with-example/)


In [11]:
class Encoder(nn.Module):
    def __init__(self, hidden_size=512, num_layers=2, drop_out=0.2, embedding_size=256, bidirection=False, model='LSTM'):
        super().__init__()
        # Initialize the embedding layer
        self.embedding = nn.Embedding(len(eng_index2word), embedding_size)
        # Define the parameters
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.drop_out = drop_out
        # Choose the RNN model (LSTM/GRU/RNN)
        if model == 'LSTM':
            self.model = nn.LSTM(embedding_size, hidden_size, num_layers, bidirectional=bidirection, batch_first=True)
        elif model == 'GRU':
            self.model = nn.GRU(embedding_size, hidden_size, num_layers, bidirectional=bidirection, batch_first=True)
        elif model == 'RNN':
            self.model = nn.RNN(embedding_size, hidden_size, num_layers, bidirectional=bidirection, batch_first=True)
        else:
            raise ValueError('Given model is not found')
        # Apply dropout regularization
        self.drop_out = nn.Dropout(p=drop_out)
    
    def forward(self, input):
        # Embed the input sequence
        embedded = self.drop_out(self.embedding(input))
        # Pass the embedded sequence through the RNN model
        output, hidden = self.model(embedded)
        return output, hidden


# Inferencing model

**References:**

1. [LSTM Seq2Seq Example](https://keras.io/examples/nlp/lstm_seq2seq/)
2. [Define Encoder-Decoder Sequence-to-Sequence Model for Neural Machine Translation  ](https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/)
3. [Seq2Seq Part D: Encoder-Decoder with Teacher Forcing](https://medium.com/deep-learning-with-keras/seq2seq-part-d-encoder-decoder-with-teacher-forcing-18a3a09a096)
4. [Kaggle Notebook: Seq2Seq RNN Models (Attention + Teacher Forcing)](https://www.kaggle.com/code/residentmario/seq-to-seq-rnn-models-attention-teacher-forcing/notebook)
5. [What is Teacher Forcing?](https://towardsdatascience.com/what-is-teacher-forcing-3da6217fed1c)


In [12]:
class Decoder(nn.Module):
    def __init__(self, hidden_size=512, num_layers=2, drop_out=0.2, embedding_size=256, bidirection=False, model='LSTM'):
        super().__init__()
        # Initialize the embedding layer
        self.embedding = nn.Embedding(len(mal_index2word), embedding_size)
        # Define the parameters
        self.hidden_size = hidden_size
        self.num_layer = num_layers
        self.drop_out = drop_out
        # Choose the RNN model (LSTM/GRU/RNN)
        if model == 'LSTM':
            self.model = nn.LSTM(embedding_size, hidden_size, num_layers, bidirectional=bidirection, batch_first=True)
        elif model == 'GRU':
            self.model = nn.GRU(embedding_size, hidden_size, num_layers, bidirectional=bidirection, batch_first=True)
        elif model == 'RNN':
            self.model = nn.RNN(embedding_size, hidden_size, num_layers, bidirectional=bidirection, batch_first=True)
        else:
            raise ValueError('Given model is not found')
        # Apply dropout regularization
        self.drop_out = nn.Dropout(p=drop_out)
        # Define the output layer
        if bidirection:
            self.out = nn.Linear(2 * hidden_size, len(mal_index2word))
        else:
            self.out = nn.Linear(hidden_size, len(mal_index2word))
            
    def forward(self, input, hidden):
        # Embed the input sequence
        embedded = self.drop_out(self.embedding(input))
        # Pass the embedded sequence through the RNN model
        output, hidden = self.model(embedded, hidden)
        # Predict the output
        pred = self.out(output)
        return pred, hidden


In [13]:
class seq2seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        # Initialize the encoder and decoder
        self.encoder = encoder
        self.decoder = decoder
    
    def forward(self, inputs, targets, teacher_force_ratio):
        # Pass the input sequence through the encoder
        encoder_output, encoder_hidden = self.encoder(inputs)
        # Initialize the decoder input with zeros
        decoder_input = torch.empty(targets.shape[0], 1, dtype=torch.long, device='cuda').fill_(0)
        decoder_hidden = encoder_hidden
        # Initialize the output tensor
        output = torch.zeros((targets.shape[0], targets.shape[1], len(mal_index2word)), device='cuda')
        
        for i in range(output.shape[1]):
            # Pass the decoder input and hidden state through the decoder
            pred, decoder_hidden = self.decoder(decoder_input, decoder_hidden)
            pred = torch.squeeze(pred)
            # Store the predicted output
            output[:, i, :] = pred
            # Update the decoder input for the next time step
            best_guess = torch.argmax(pred, axis=-1).view(-1, 1)
            decoder_input = best_guess if np.random.rand() > teacher_force_ratio else targets[:, i].view(-1, 1)
            # Keep the decoder hidden state unchanged
            decoder_hidden = decoder_hidden
        return output


In [14]:
def word_finder_eng(eng_):
    # Initialize an empty list to store English words
    full = []
    for eng in eng_:
        # Find the index of the end-of-sequence token (EOS)
        eng_eos = np.where(eng == 1)[0][0]
        # Convert the numerical representation to an English word
        eng_word = num_word(eng[0:eng_eos], eng_index2word)
        full.append(eng_word)
    return np.array(full)

def word_finder_mal(mal_):
    # Initialize an empty list to store Malayalam words
    full = []
    for mal in mal_:
        # Find the index of the end-of-sequence token (EOS) if present, else use the length of the sequence
        mal_eos = np.where(mal == 1)[0][0] if 1 in mal else len(mal)
        # Convert the numerical representation to a Malayalam word
        mal_word = num_word(mal[0:mal_eos], mal_index2word)
        full.append(mal_word)
    return np.array(full)

def num_word(number, converter):
    # Convert a sequence of numerical representations to a word using the provided converter
    number = np.array(number)
    word = ''.join(converter[num] for num in number)
    return word


In [24]:
import numpy as np

def word_finder_eng(eng_):
    # Initialize an empty list to store English words
    full = []
    for eng in eng_:
        # Find the index of the end-of-sequence token (EOS) if present, else use the length of the sequence
        eng_eos_indices = np.where(eng == 1)[0]
        eng_eos = eng_eos_indices[0] if len(eng_eos_indices) > 0 else len(eng)
        # Convert the numerical representation to an English word
        eng_word = num_word(eng[:eng_eos], eng_index2word)
        full.append(eng_word)
    return np.array(full)

def word_finder_mal(mal_):
    # Initialize an empty list to store Malayalam words
    full = []
    for mal in mal_:
        # Find the index of the end-of-sequence token (EOS) if present, else use the length of the sequence
        mal_eos_indices = np.where(mal == 1)[0]
        mal_eos = mal_eos_indices[0] if len(mal_eos_indices) > 0 else len(mal)
        # Convert the numerical representation to a Malayalam word
        mal_word = num_word(mal[:mal_eos], mal_index2word)
        full.append(mal_word)
    return np.array(full)

def num_word(number, converter):
    # Convert a sequence of numerical representations to a word using the provided converter
    number = np.array(number)
    word = ''.join(converter[num] for num in number)
    return word


In [25]:
def train_model(epochs=30, hidden_size=512, num_layers=3, encoder_drop_out=0.2, decoder_drop_out=0.2, embedding_size=256,
                bidirection=False, model='LSTM', lr=1e-3, optimizer_='adam', teacher_force_ratio=0.5):
    # Set hyperparameters
    epochs = epochs
    hidden_size = hidden_size
    num_layers = num_layers
    encoder_drop_out = encoder_drop_out
    decoder_drop_out = decoder_drop_out
    embedding_size = embedding_size
    bidirection = bidirection
    model = model
    lr = lr
    optimizer_ = optimizer_
    teacher_force_ratio = teacher_force_ratio

    # Initialize encoder and decoder
    encoder = Encoder(hidden_size=hidden_size, num_layers=num_layers, drop_out=encoder_drop_out, embedding_size=embedding_size,
                      bidirection=bidirection, model=model)
    decoder = Decoder(hidden_size=hidden_size, num_layers=num_layers, drop_out=decoder_drop_out, embedding_size=embedding_size,
                      bidirection=bidirection, model=model)
    # Create seq2seq model
    model = seq2seq(encoder, decoder)
    model = model.to('cuda')

    # Define loss function
    lossfun = nn.CrossEntropyLoss()

    # Choose optimizer
    if optimizer_ == 'adam':
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    elif optimizer_ == 'sgd':
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    else:
        raise ValueError('Optimizer not found')

    # Initialize lists to store training and validation metrics
    train_loss, train_acc, val_loss, val_acc = [], [], [], []

    # Main training loop
    for i in range(epochs):
        # Initialize lists to store current epoch metrics
        train_loss_curr, train_acc_curr, val_loss_curr, val_acc_curr = [], [], [], []

        # Set model to training mode
        model.train()

        # Iterate over training data
        for eng, mal in tqdm(train_loader):
            eng = eng.to('cuda')
            mal = mal.to('cuda')

            # Forward pass
            pred = model(eng, mal, teacher_force_ratio=teacher_force_ratio)
            pred_loss = pred.reshape(-1, pred.shape[2])
            mal_loss = mal.reshape(-1,).long()
            loss = lossfun(pred_loss, mal_loss)

            # Compute and store training loss
            train_loss_curr.append(loss.cpu().detach().numpy())

            # Backpropagation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Compute training accuracy
            mal_ground = word_finder_mal(mal.cpu().detach().numpy())
            pred_acc = np.argmax(pred.cpu().detach().numpy(), axis=-1).astype(np.int64)
            mal_predicted = word_finder_mal(pred_acc)
            acc = np.sum(mal_ground == mal_predicted) / len(mal_ground)
            train_acc_curr.append(acc)

        # Set model to evaluation mode
        model.eval()

        # Iterate over validation data
        for eng, mal in val_loader:
            eng = eng.to('cuda')
            mal = mal.to('cuda')

            # Forward pass (no teacher forcing)
            with torch.no_grad():
                pred = model(eng, mal, teacher_force_ratio=0)

            pred_loss = pred.reshape(-1, pred.shape[2])
            mal_loss = mal.reshape(-1,).long()

            # Compute and store validation loss
            loss = lossfun(pred_loss, mal_loss)
            val_loss_curr.append(loss.cpu().detach().numpy())

            # Compute validation accuracy
            mal_ground = word_finder_mal(mal.cpu().detach().numpy())
            pred_acc = np.argmax(pred.cpu().detach().numpy(), axis=-1)
            mal_predicted = word_finder_mal(pred_acc)
            acc = np.sum(mal_ground == mal_predicted) / len(mal_ground)
            val_acc_curr.append(acc)

        # Compute average metrics for the epoch
        train_loss.append(np.average(train_loss_curr))
        val_loss.append(np.average(val_loss_curr))
        train_acc.append(np.average(train_acc_curr))
        val_acc.append(np.average(val_acc_curr))

        # Log metrics using wandb
        wandb.log({"Train_Accuracy": np.round(train_acc[i] * 100, 2), "Train_Loss": train_loss[i],
                   "Val_Accuracy": np.round(val_acc[i] * 100, 2), "Val_Loss": val_loss[i], "Epoch": i + 1})


In [16]:
# Configuration for hyperparameter tuning using Bayesian optimization method
sweep_config = {
    'method': 'bayes',  # Bayesian optimization method
    'metric': {
        'name': 'Val_Accuracy',  # Metric to optimize
        'goal': 'maximize'  # Goal: maximize the validation accuracy
    },
}

# Define the parameters to tune
parameters_dict = {
    'epochs': {
        'values': [10, 15]  # Number of epochs
    },
    'hidden_size': {
        'values': [32, 64, 256, 512]  # Hidden size of the model
    },
    'num_layers': {
        'values': [1, 2, 3]  # Number of layers in the encoder and decoder
    },
    'encoder_drop_out': {
        'values': [0.2, 0.3, 0.5]  # Dropout probability in the encoder
    },
    'decoder_drop_out': {
        'values': [0.2, 0.3, 0.5]  # Dropout probability in the decoder
    },
    'embedding_size': {
        'values': [32, 64, 256, 512]  # Size of the embedding layer
    },
    'bidirectional': {
        'values': [True, False]  # Whether to use bidirectional RNNs or not
    },
    'model': {
        'values': ['LSTM', 'GRU', 'RNN']  # Type of RNN model
    },
    'lr': {
        'values': [1e-3, 1e-4, 1e-5]  # Learning rate
    },
    'teacher_force_ratio': {
        'values': [0.2, 0.3, 0.4, 0.5]  # Teacher forcing ratio during training
    },
    'optimizer': {
        'values': ['adam', 'sgd']  # Optimizer to use
    }
}

sweep_config['parameters'] = parameters_dict  # Add parameters to the sweep configuration


In [17]:
def train():
    # Initialize Weights and Biases
    wandb.init()
    # Retrieve hyperparameters from the configuration
    config = wandb.config
    # Set a name for the run based on hyperparameters
    wandb.run.name = "hidden_" + str(config.hidden_size) + "_layer_" + str(config.num_layers) + "_embedd_" + str(config.embedding_size) + "_bidirect_" + str(config.bidirectional) + "_model_" + str(config.model)
    
    # Train the model with the specified hyperparameters
    parameters = train_model(epochs=config.epochs, hidden_size=config.hidden_size, num_layers=config.num_layers,
                             encoder_drop_out=config.encoder_drop_out, decoder_drop_out=config.decoder_drop_out,
                             embedding_size=config.embedding_size, bidirection=config.bidirectional, model=config.model,
                             lr=config.lr, optimizer_=config.optimizer, teacher_force_ratio=config.teacher_force_ratio)
    
    # Finish the Weights and Biases run
    wandb.finish()


In [18]:
# wandb.init()
# sweep_id=wandb.sweep(sweep_config,project=project_name)
# wandb.agent(sweep_id,train)
# wandb.finish()

In [19]:
# import wandb
wandb.init()
sweep_id = 'pxmwdyhc'  # Replace with your actual sweep ID
wandb.agent(sweep_id, function=train, project=project_name)  
wandb.finish()


VBox(children=(Label(value='0.002 MB of 0.014 MB uploaded\r'), FloatProgress(value=0.13920434752316588, max=1.…

VBox(children=(Label(value='Waiting for wandb.init()...\r'), FloatProgress(value=0.011112677888660175, max=1.0…



VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>


[34m[1mwandb[0m: Agent Starting Run: paxk11ye with config:
[34m[1mwandb[0m: 	bidirectional: True
[34m[1mwandb[0m: 	decoder_drop_out: 0.2
[34m[1mwandb[0m: 	embedding_size: 256
[34m[1mwandb[0m: 	encoder_drop_out: 0.2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_size: 512
[34m[1mwandb[0m: 	lr: 0.0001
[34m[1mwandb[0m: 	model: RNN
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_force_ratio: 0.4


  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

VBox(children=(Label(value='0.002 MB of 0.006 MB uploaded\r'), FloatProgress(value=0.29687964338781575, max=1.…

0,1
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Train_Accuracy,▁▁▁▂▃▃▄▅▅▆▆▇▇██
Train_Loss,█▅▄▃▃▃▂▂▂▂▁▁▁▁▁
Val_Accuracy,▁▂▃▄▅▆▆▆▇▇▇████
Val_Loss,█▆▄▃▃▂▂▂▂▁▂▁▁▁▁

0,1
Epoch,15.0
Train_Accuracy,41.6
Train_Loss,0.19656
Val_Accuracy,35.84
Val_Loss,0.43544


[34m[1mwandb[0m: Agent Starting Run: 0bhb7mvj with config:
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	decoder_drop_out: 0.3
[34m[1mwandb[0m: 	embedding_size: 32
[34m[1mwandb[0m: 	encoder_drop_out: 0.2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_size: 32
[34m[1mwandb[0m: 	lr: 0.0001
[34m[1mwandb[0m: 	model: RNN
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: 	teacher_force_ratio: 0.2


  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Train_Accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train_Loss,██▇▆▅▄▃▂▂▁▁▁▁▁▁
Val_Accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Val_Loss,█▇▇▆▅▄▃▂▂▁▁▁▁▁▁

0,1
Epoch,15.0
Train_Accuracy,0.0
Train_Loss,2.50478
Val_Accuracy,0.0
Val_Loss,2.01726


[34m[1mwandb[0m: Agent Starting Run: 1ot1zlrj with config:
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	decoder_drop_out: 0.2
[34m[1mwandb[0m: 	embedding_size: 32
[34m[1mwandb[0m: 	encoder_drop_out: 0.2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_size: 32
[34m[1mwandb[0m: 	lr: 0.0001
[34m[1mwandb[0m: 	model: LSTM
[34m[1mwandb[0m: 	num_layers: 2
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_force_ratio: 0.4


  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Train_Accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train_Loss,█▃▂▂▂▂▂▂▂▂▁▁▁▁▁
Val_Accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Val_Loss,█▄▃▂▂▂▂▁▁▁▁▂▁▁▁

0,1
Epoch,15.0
Train_Accuracy,0.0
Train_Loss,1.33172
Val_Accuracy,0.0
Val_Loss,1.12017


[34m[1mwandb[0m: Agent Starting Run: avieagqb with config:
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	decoder_drop_out: 0.5
[34m[1mwandb[0m: 	embedding_size: 32
[34m[1mwandb[0m: 	encoder_drop_out: 0.5
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_size: 256
[34m[1mwandb[0m: 	lr: 0.0001
[34m[1mwandb[0m: 	model: GRU
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_force_ratio: 0.5


  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

  0%|          | 0/800 [00:00<?, ?it/s]

VBox(children=(Label(value='0.002 MB of 0.014 MB uploaded\r'), FloatProgress(value=0.13306236673773988, max=1.…

0,1
Epoch,▁▂▃▃▄▅▆▆▇█
Train_Accuracy,▁▁▁▁▁▁▁▃▅█
Train_Loss,█▆▅▅▄▃▂▂▁▁
Val_Accuracy,▁▁▁▁▁▂▃▄▅█
Val_Loss,█▇▆▅▄▄▃▂▂▁

0,1
Epoch,10.0
Train_Accuracy,0.88
Train_Loss,0.68474
Val_Accuracy,4.44
Val_Loss,0.70101


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.


In [27]:
correct_count=0
for eng,mal in train_data:
    eng_word=word_finder_eng(eng)
    mal_word=word_finder_mal(mal)
    print(eng_word)
    print(mal_word)
    eng=eng.unsqueeze(0).to('cuda')
    mal=mal.unsqueeze(0).to('cuda')
    
    pred=model(eng,mal,teacher_force_ratio=0)
    pred=torch.squeeze(pred).cpu().detach().numpy()
    pred=np.argmax(pred,axis=-1)
    predicted=word_finder_mal(pred)
    if predicted==mal_word:
        correct_count+=1
    count+=1
    print(predicted)
    print("")
    
    

In [31]:
def word_finder_eng(eng):
    eng_eos=np.where(eng==1)[0][0]
    eng_word=num_word(eng[0:eng_eos],eng_index2word)
    return eng_word
def word_finder_mal(mal):
    mal_eos=np.where(mal==1)[0][0]
    mal_word=num_word(mal[0:mal_eos],mal_index2word)
    return mal_word
def num_word(number,converter):
    number=np.array(number)
    word=''.join(converter[num]for num in number)
    return word

In [33]:
train_model(epochs=15,hidden_size=512,num_layers=2,encoder_drop_out=0.5,decoder_drop_out=0.2,embedding_size=32,
    bidirection=True,model='LSTM',lr=0.001,optimizer_='adam',teacher_force_ratio=0.5)

In [34]:
def train_model(epochs=30,hidden_size=512,num_layers=3,encoder_drop_out=0.2,decoder_drop_out=0.2,embedding_size=256,
                bidirection=False,model='LSTM',lr=1e-3,optimizer_='adam',teacher_force_ratio=0.5):
    epochs=epochs
    hidden_size=hidden_size
    num_layers=num_layers
    encoder_drop_out=encoder_drop_out
    decoder_drop_out=decoder_drop_out
    embedding_size=embedding_size
    bidirection=bidirection
    model=model
    lr=lr
    optimizer_=optimizer_
    teacher_force_ratio=teacher_force_ratio

    encoder=Encoder(hidden_size=hidden_size,num_layers=num_layers,drop_out=encoder_drop_out,embedding_size=embedding_size,
                    bidirection=bidirection,model=model)
    decoder=Decoder(hidden_size=hidden_size,num_layers=num_layers,drop_out=decoder_drop_out,embedding_size=embedding_size,
                    bidirection=bidirection,model=model)
    model=seq2seq(encoder,decoder)
    model=model.to('cuda')
    lossfun=nn.CrossEntropyLoss()
    if optimizer_=='adam':
        optimizer=torch.optim.Adam(model.parameters(),lr=lr)
    elif optimizer_=='sgd':
        optimizer=torch.optim.SGD(model.parameters(),lr=lr)
    else:
        raise ValueError('Optimizer not found')
    train_loss,train_acc,val_loss,val_acc=[],[],[],[]
    curr_best_acc=0
    for i in range(epochs):
        train_loss_curr,train_acc_curr,val_loss_curr,val_acc_curr=[],[],[],[]
        model.train()
        for eng,mal in tqdm(train_loader):
            eng=eng.to('cuda')
            mal=mal.to('cuda')
            pred=model(eng,mal,teacher_force_ratio=teacher_force_ratio)
            #pred=nn.Softmax(dim=-1)(pred)
            pred_loss=pred.reshape(-1,pred.shape[2])
            mal_loss=mal.reshape(-1,).long()
            loss=lossfun(pred_loss,mal_loss)

            train_loss_curr.append(loss.cpu().detach().numpy())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            mal_ground=word_finder_mal(mal.cpu().detach().numpy())
            pred_acc=np.argmax(pred.cpu().detach().numpy(),axis=-1).astype(np.int64)
            mal_predicted=word_finder_mal(pred_acc)
            acc=np.sum(mal_ground==mal_predicted)/len(mal_ground)
            train_acc_curr.append(acc)
        model.eval()
        for eng,mal in test_loader:
            eng=eng.to('cuda')
            mal=mal.to('cuda')
            with torch.no_grad():
                pred=model(eng,mal,teacher_force_ratio=0)
            #pred=nn.Softmax(dim=-1)(pred)
            pred_loss=pred.reshape(-1,pred.shape[2])
            mal_loss=mal.reshape(-1,).long()

            loss=lossfun(pred_loss,mal_loss)
            val_loss_curr.append(loss.cpu().detach().numpy())


            mal_ground=word_finder_mal(mal.cpu().detach().numpy())
            pred_acc=np.argmax(pred.cpu().detach().numpy(),axis=-1)

            mal_predicted=word_finder_mal(pred_acc)
            acc=np.sum(mal_ground==mal_predicted)/len(mal_ground)
            val_acc_curr.append(acc)
        train_loss.append(np.average(train_loss_curr))
        val_loss.append(np.average(val_loss_curr))
        train_acc.append(np.average(train_acc_curr))
        val_acc.append(np.average(val_acc_curr))
        if val_acc[i]>curr_best_acc:
            torch.save(model.state_dict(),'weight_without_attention.pth')
            curr_best_acc=val_acc[i]
            print(f'epochs{i} weights saved')
        print(f'Epochs {i} completed, train loss and accuracy ={train_loss[i],train_acc[i]}' 
          f',and val loss and accuracy ={val_loss[i],val_acc[i]} ')
        '''
        wandb.log({"Train_Accuracy":np.round(train_acc[i]*100,2),"Train_Loss":train_loss[i],
                   "Val_Accuracy":np.round(val_acc[i]*100,2),"Val_Loss":val_loss[i],"Epoch":i+1})
        '''
    