# Neural Machine Translation: Urdu to Roman Urdu
## BiLSTM Encoder-Decoder Architecture

**Assignment**: Project 1 - Neural Machine Translation (15 Abs)  
**Objective**: Build a sequence-to-sequence model with a BiLSTM encoder and an LSTM decoder that transliterates Urdu text into Roman Urdu.

**Architecture**:
- Encoder: 2-layer Bidirectional LSTM
- Decoder: 4-layer LSTM
- Custom BPE Tokenization (implemented from scratch)
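
To make the "Custom BPE Tokenization" item concrete, here is a minimal sketch of byte-pair encoding in plain Python. It is an illustration under stated assumptions, not necessarily the tokenizer built later in this notebook. The names `learn_bpe`, `merge_pair`, `encode_word`, and the `</w>` end-of-word marker are hypothetical choices made for this example.

```python
from collections import Counter
from typing import List, Tuple

END = "</w>"  # assumed end-of-word marker; any symbol absent from the text works


def merge_pair(symbols: Tuple[str, ...], pair: Tuple[str, str]) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus: List[str], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merge rules from whitespace-separated text."""
    # Start from characters: each word is a tuple of symbols with a frequency.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + (END,)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so that repeated runs give the same merge list.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode_word(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split a word into subword symbols by replaying the merges in learned order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Calling `learn_bpe(lines, num_merges)` on the training lines and then `encode_word(word, merges)` on each word yields subword tokens. Because the procedure works on arbitrary Unicode characters, the same code would apply to both the Urdu and the Roman Urdu side, each with its own merge list.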

**Dataset**: urdu_ghazals_rekhta - Classical Urdu poetry with Roman transliterations


## 1. Setup and Dependencies


In [None]:
# Install required packages
%pip install torch torchtext nltk sacrebleu editdistance streamlit
%pip install matplotlib seaborn tqdm pandas numpy


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import re
import os
import json
import pickle
from collections import Counter, defaultdict
import random
from typing import List, Tuple, Dict, Any
import warnings
warnings.filterwarnings('ignore')

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)  # seed every visible GPU, not only the current one


## 2. Clone Dataset and Download Required Files


In [None]:
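# Clone the dataset repository (skipped when a previous run already cloned it)
if not os.path.exists('urdu_ghazals_rekhta'):
    !git clone https://github.com/amir9ume/urdu_ghazals_rekhta.git

# Pick the dataset directory: Colab clone, working-directory clone, or local copy
if os.path.exists('/content/urdu_ghazals_rekhta/dataset'):
    dataset_path = '/content/urdu_ghazals_rekhta/dataset'
elif os.path.exists('urdu_ghazals_rekhta/dataset'):
    dataset_path = 'urdu_ghazals_rekhta/dataset'
else:
    dataset_path = 'dataset/dataset'  # Local path

# Report a missing directory now instead of failing later during data loading
if not os.path.isdir(dataset_path):
    print(f"Warning: dataset directory not found at {dataset_path}")

print(f"Dataset path: {dataset_path}")
print("Setup completed!")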
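
The path selection above tries a fixed list of locations in order. A compact way to express the same idea is a small helper that returns the first candidate that exists. This is only a sketch: `first_existing_dir` is a hypothetical name, and the candidate list simply repeats the three paths used in this notebook.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing_dir(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that is an existing directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


# Same search order as the cell above: Colab clone, local clone, local copy.
CANDIDATE_DIRS = [
    "/content/urdu_ghazals_rekhta/dataset",
    "urdu_ghazals_rekhta/dataset",
    "dataset/dataset",
]

found = first_existing_dir(CANDIDATE_DIRS)
if found is None:
    print("No dataset directory found in:", CANDIDATE_DIRS)
else:
    print(f"Dataset path: {found}")
```

Returning `None` rather than a default path makes a missing dataset visible at setup time, before any data loading code runs.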
