# NNLM (Neural Network Language Model)

<div align="center">
  <a href="https://colab.research.google.com/github/Coder-Starcom/NLP/blob/main/NNLM.ipynb" target="_blank">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
  </a>
</div>


## 🧠 What Are We Building?

We're building a **Neural Network Language Model (NNLM)** that learns to predict the **next word** in a sentence based on previous words. This is an example of a **causal language model**, i.e., it only looks at past context (not future words).

Example:

```
Input:  ["i", "like"]
Target: "dog"
```
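In probability terms, a fixed-window causal model uses the chain rule to approximate the probability of a sentence. Each word is conditioned only on the $n-1$ words before it; here $n - 1$ equals `n_step` = 2, and near the start of a sentence the context is simply shorter:

$$
P(w_1, \dots, w_T) \approx \prod_{t=1}^{T} P\left(w_t \mid w_{t-n+1}, \dots, w_{t-1}\right)
$$

For `"i like dog"` with $n = 3$, the factor this notebook trains on is $P(\text{dog} \mid \text{i}, \text{like})$. Only the last factor of each three-word sentence is learned.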

## 📘 1: Imports & Configuration

```python
import torch
import torch.nn as nn
import torch.optim as optim
```

### 📘 1. **Imports**


We import PyTorch’s modules to define the neural network (`nn`), train it (`optim`), and use tensors (`torch`).

## 📘 2: Data Preparation

```python
def preprocess_sentences(sentences):
    # Collect the unique words and assign each an integer index
    word_list = list(set(" ".join(sentences).split()))
    word_dict = {w: i for i, w in enumerate(word_list)}   # word -> index
    number_dict = {i: w for w, i in word_dict.items()}    # index -> word
    return word_dict, number_dict, len(word_dict)


def make_batch(sentences, word_dict, n_step):
    input_batch = []
    target_batch = []

    for sen in sentences:
        words = sen.split()
        context = [word_dict[w] for w in words[:n_step]]  # the first n_step words
        target = word_dict[words[n_step]]                 # the word that follows them
        input_batch.append(context)
        target_batch.append(target)

    return torch.LongTensor(input_batch), torch.LongTensor(target_batch)
```

### 📘 2. **Data Preprocessing**

#### `preprocess_sentences(sentences)`


* **Input:** List of sentences, e.g., `["i like dog", "i love coffee", "i hate milk"]`
* **Goal:** Create a vocabulary and mappings:

  * `word_dict`: maps word → index
  * `number_dict`: maps index → word
  * `n_class`: size of vocabulary
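Because `word_list` is built from a `set`, the word-to-index assignment can change between Python processes: string hashing is randomized per process by default, so set iteration order can vary. Indices from one run may therefore not match a model saved in another. The sketch below is a minimal deterministic variant that sorts the words instead. The name `preprocess_sentences_sorted` is hypothetical, and it assumes sorted order is acceptable for your use.

```python
def preprocess_sentences_sorted(sentences):
    """Hypothetical deterministic variant of preprocess_sentences.

    Sorting the unique words fixes the word -> index mapping across runs,
    whereas iterating over a set of strings can change order between
    Python processes (string hashing is randomized per process).
    """
    word_list = sorted(set(" ".join(sentences).split()))
    word_dict = {w: i for i, w in enumerate(word_list)}   # word -> index
    number_dict = {i: w for w, i in word_dict.items()}    # index -> word
    return word_dict, number_dict, len(word_dict)


word_dict, number_dict, n_class = preprocess_sentences_sorted(
    ["i like dog", "i love coffee", "i hate milk"]
)
print(n_class)    # 7
print(word_dict)  # {'coffee': 0, 'dog': 1, 'hate': 2, 'i': 3, 'like': 4, 'love': 5, 'milk': 6}
```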

#### `make_batch(sentences, word_dict, n_step)`

```python
def make_batch(sentences, word_dict, n_step):
    ...
```

This prepares the training data:

* For each sentence like `"i like dog"`, we split it into:

  * **Input:** `["i", "like"]` → `[index_i, index_like]`
  * **Target:** `"dog"` → `index_dog`

All inputs are stored in `input_batch` and all targets in `target_batch`. Both are converted to PyTorch `LongTensor`s, with shapes `[num_sentences, n_step]` and `[num_sentences]`.
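`make_batch` produces exactly one (context, target) pair per sentence, which works because every sentence in this notebook has three words. For longer sentences, a common approach (an assumption here, not something this notebook does) is to slide an `n_step`-word window across the sentence. `make_windows` below is a hypothetical pure-Python helper that returns words rather than indices.

```python
def make_windows(sentences, n_step):
    """Hypothetical helper: every (context, target) pair with n_step context words.

    A sentence with k words yields k - n_step pairs; sentences shorter than
    n_step + 1 words yield none.
    """
    contexts, targets = [], []
    for sen in sentences:
        words = sen.split()
        for t in range(n_step, len(words)):
            contexts.append(words[t - n_step:t])  # the n_step words before position t
            targets.append(words[t])              # the word at position t
    return contexts, targets


contexts, targets = make_windows(["i like dog", "i really like hot coffee"], n_step=2)
for c, t in zip(contexts, targets):
    print(c, "->", t)
```

Mapping each context through `[word_dict[w] for w in c]` and stacking the results into a `torch.LongTensor` would give the same `[num_pairs, n_step]` layout that `make_batch` produces.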

## 📘 3: Define the NNLM Model

```python
class NNLM(nn.Module):
    def __init__(self, n_class, m, n_hidden, n_step):
        super(NNLM, self).__init__()
        self.n_step = n_step
        self.m = m
        self.n_class = n_class

        self.C = nn.Embedding(n_class, m)
        self.H = nn.Linear(n_step * m, n_hidden, bias=False)
        self.d = nn.Parameter(torch.ones(n_hidden))
        self.U = nn.Linear(n_hidden, n_class, bias=False)
        self.W = nn.Linear(n_step * m, n_class, bias=False)
        self.b = nn.Parameter(torch.ones(n_class))

    def forward(self, X):
        X = self.C(X)                          # [batch_size, n_step, m]
        X = X.view(-1, self.n_step * self.m)   # [batch_size, n_step * m]
        tanh = torch.tanh(self.d + self.H(X))  # [batch_size, n_hidden]
        output = self.b + self.W(X) + self.U(tanh)  # [batch_size, n_class]
        return output
```

### 📘 3. **Model Definition: `NNLM`**

```python
class NNLM(nn.Module):
```

This class defines the neural network. The architecture is as follows:

#### 🔢 Inputs:

* A batch of token indices representing `(n-1)` words: shape `[batch_size, n_step]`

#### 🔍 Layers:

1. `self.C = nn.Embedding(n_class, m)`

   * Learns an **embedding vector** of size `m` for each word.
   * Output shape: `[batch_size, n_step, m]`

2. `self.H = nn.Linear(n_step * m, n_hidden, bias=False)`

   * A hidden layer (linear transform) with no bias.

3. `self.d = nn.Parameter(torch.ones(n_hidden))`

   * A learnable bias added to the output of `H` before the `tanh`.

4. `self.U = nn.Linear(n_hidden, n_class, bias=False)`

   * Projects the hidden activations to output logits.

5. `self.W = nn.Linear(n_step * m, n_class, bias=False)`

   * A direct projection from the concatenated embeddings to the logits, bypassing the hidden layer.

6. `self.b = nn.Parameter(torch.ones(n_class))`

   * A learnable bias for the output logits.
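The layer list above maps directly to a parameter count. The sketch below tallies it layer by layer. The example values assume the seven-word vocabulary of the toy sentences `["i like dog", "i love coffee", "i hate milk"]`, with `m = n_hidden = n_step = 2`. For an actual model instance, `sum(p.numel() for p in model.parameters())` should give the same total.

```python
def nnlm_param_count(n_class, m, n_hidden, n_step):
    """Count the learnable parameters of the NNLM class above, per layer."""
    counts = {
        "C (embedding)": n_class * m,
        "H (hidden, no bias)": n_step * m * n_hidden,
        "d (hidden bias)": n_hidden,
        "U (hidden -> logits)": n_hidden * n_class,
        "W (direct embeddings -> logits)": n_step * m * n_class,
        "b (output bias)": n_class,
    }
    return counts, sum(counts.values())


counts, total = nnlm_param_count(n_class=7, m=2, n_hidden=2, n_step=2)
for name, count in counts.items():
    print(f"{name:34s} {count}")
print("total:", total)  # 73
```

The direct connection `W` is the largest term here (28 of 73), because it scales with both the context width `n_step * m` and the vocabulary size.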

#### 📤 Forward Pass

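```python
X = self.C(X)                                # embeddings: [batch_size, n_step, m]
X = X.view(-1, self.n_step * self.m)         # flatten:    [batch_size, n_step * m]
tanh = torch.tanh(self.d + self.H(X))        # hidden:     [batch_size, n_hidden]
output = self.b + self.W(X) + self.U(tanh)   # logits:     [batch_size, n_class]
```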

This implements the output equation of the original NNLM paper (Bengio et al., 2003, *A Neural Probabilistic Language Model*):

```
y = b + W·x + U·tanh(H·x + d)
```

Where:

* `x` is the concatenation of the previous words' embeddings (length `n_step * m`)
* `y` is the vector of unnormalized scores (logits), one per vocabulary word
* The model does not apply a `softmax` itself: `nn.CrossEntropyLoss` applies log-softmax to the logits internally
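To make the equation concrete, here is a dependency-free sketch of the same computation for a single example. It uses plain Python lists and small made-up weights; every number is an illustrative assumption. `cross_entropy` computes the per-example quantity that `nn.CrossEntropyLoss` derives from logits: log-softmax followed by the negative log-likelihood of the target. PyTorch then averages this over the batch by default.

```python
import math


def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def nnlm_logits(x, H, d, U, W, b):
    """y = b + W x + U tanh(H x + d) for one flattened context vector x."""
    hidden = [math.tanh(hx + dx) for hx, dx in zip(matvec(H, x), d)]
    Wx = matvec(W, x)
    Uh = matvec(U, hidden)
    return [bi + wi + ui for bi, wi, ui in zip(b, Wx, Uh)]


def cross_entropy(logits, target):
    """-log softmax(logits)[target], using the log-sum-exp trick for stability."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_z - logits[target]


# Made-up sizes: n_step * m = 4, n_hidden = 2, n_class = 3
x = [0.1, -0.2, 0.3, 0.0]                          # concatenated embeddings
H = [[0.5, 0.0, -0.5, 0.0], [0.0, 1.0, 0.0, 1.0]]  # [n_hidden][n_step * m]
d = [1.0, 1.0]                                     # hidden bias
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # [n_class][n_hidden]
W = [[0.0] * 4 for _ in range(3)]                  # [n_class][n_step * m]
b = [1.0, 1.0, 1.0]                                # output bias

logits = nnlm_logits(x, H, d, U, W, b)
print(logits)
print(cross_entropy(logits, target=2))
print(cross_entropy([0.0, 0.0, 0.0], target=1))  # log(3) ≈ 1.0986
```

With equal logits the loss is $\log(\text{n\_class})$, which is roughly where training starts before the model learns anything.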

## 📘 4: Hyperparameters & Dataset

```python
# Sample sentences
sentences = ['i like chess',
 'i want art',
 'i want exercise',
 'i feel excited',
 'i prefer math',
 'i build dog',
 'i like dog',
 'i cook workout',
 'i solve sleep',
 'i like dinner',
 'i enjoy dog',
 'i solve dinner',
 'i build reading',
 'i love football',
 'i practice guitar',
 'i want tea',
 'i write workout',
 'i drink dinner',
 'i need icecream',
 'i feel happy',
 'i solve game',
 'i solve math',
 'i play water',
 'i study dog',
 'i watch puzzles',
 'i play notes',
 'i play icecream',
 'i prefer movies',
 'i try milk',
 'i write icecream',
 'i love task',
 'i try movies',
 'i eat chess',
 'i build coffee',
 'i need workout',
 'i eat tea',
 'i like water',
 'i watch code',
 'i feel bored',
 'i build notes',
 'i practice cricket',
 'i prefer code',
 'i write icecream',
 'i love smart',
 'i prefer guitar',
 'i feel bored',
 'i write cricket',
 'i feel hungry',
 'i try coffee',
 'i hate icecream',
 'i love projects',
 'i eat puzzles',
 'i like projects',
 'i build projects',
 'i practice cricket',
 'i feel sleepy',
 'i write exercise',
 'i enjoy sleep',
 'i play math',
 'i want math',
 'i drink sleep',
 'i enjoy exercise',
 'i want python',
 'i am happy',
 'i enjoy notes',
 'i prefer sleep',
 'i practice art',
 'i am tired',
 'i write movies',
 'i eat workout',
 'i need game',
 'i practice python',
 'i try sleep',
 'i drink exercise',
 'i write workout',
 'i study tea',
 'i prefer tea',
 'i need notes',
 'i cook projects',
 'i watch tea',
 'i hate movies',
 'i want code',
 'i drink dinner',
 'i try exercise',
 'i study pictures',
 'i hate workout',
 'i cook projects',
 'i eat football',
 'i enjoy python',
 'i try art',
 'i play music',
 'i love football',
 'i play reading',
 'i hate icecream',
 'i drink sleep',
 'i need pictures',
 'i am lazy',
 'i try puzzles',
 'i love workout',
 'i watch movies',
 'i solve math',
 'i eat chess',
 'i play task',
 'i drink chess',
 'i feel lazy',
 'i practice milk',
 'i feel happy',
 'i want dinner',
 'i build cricket',
 'i love notes',
 'i cook icecream',
 'i love sleep',
 'i feel confused',
 'i am tired',
 'i feel tired',
 'i play cricket',
 'i build notes',
 'i write dog',
 'i write water',
 'i play cricket',
 'i prefer tea',
 'i drink music',
 'i hate pictures',
 'i prefer water',
 'i need cricket',
 'i practice dinner',
 'i prefer pizza',
 'i hate reading',
 'i watch sleep',
 'i need game',
 'i eat cricket',
 'i prefer smart',
 'i prefer python',
 'i cook puzzles',
 'i enjoy dinner',
 'i eat reading',
 'i hate puzzles',
 'i solve cricket',
 'i want math',
 'i feel hungry',
 'i cook dinner',
 'i love smart',
 'i need milk',
 'i drink projects',
 'i hate music',
 'i play coffee',
 'i write dinner',
 'i watch workout',
 'i love pictures',
 'i like art',
 'i cook code',
 'i solve puzzles',
 'i write task',
 'i practice sleep',
 'i try notes',
 'i am bored',
 'i enjoy music',
 'i write chess',
 'i want guitar',
 'i love exercise',
 'i want task',
 'i hate projects',
 'i love milk',
 'i enjoy cats',
 'i want code',
 'i love art',
 'i enjoy task',
 'i like math',
 'i practice dinner',
 'i feel hungry',
 'i study cats',
 'i watch chess',
 'i watch milk',
 'i want guitar',
 'i love sleep',
 'i write music',
 'i need art',
 'i solve tea',
 'i write cats',
 'i like exercise',
 'i cook smart',
 'i want puzzles',
 'i like guitar',
 'i am lazy',
 'i feel hungry',
 'i love guitar',
 'i study cats',
 'i solve projects',
 'i enjoy math',
 'i feel bored',
 'i write milk',
 'i eat notes',
 'i enjoy pictures',
 'i need water',
 'i practice puzzles',
 'i practice cricket',
 'i practice workout',
 'i solve dinner',
 'i want art',
 'i feel tired',
 'i watch music',
 'i hate tea',
 'i watch chess',
 'i practice chess',
 'i love pictures',
 'i study cats',
 'i prefer reading',
 'i cook guitar',
 'i drink chess',
 'i write dinner',
 'i write pictures',
 'i want coffee',
 'i practice task',
 'i try tea',
 'i need math',
 'i drink sleep',
 'i cook icecream',
 'i eat projects',
 'i write dinner',
 'i prefer smart',
 'i write puzzles',
 'i hate movies',
 'i practice guitar',
 'i love exercise',
 'i build music',
 'i drink python',
 'i study dog',
 'i practice smart',
 'i feel energetic',
 'i am excited',
 'i want football',
 'i watch icecream',
 'i practice guitar',
 'i like math',
 'i hate notes',
 'i prefer smart',
 'i need reading',
 'i am tired',
 'i am energetic',
 'i play notes',
 'i am confused',
 'i am energetic',
 'i play coffee',
 'i try workout',
 'i feel tired',
 'i try art',
 'i practice exercise',
 'i watch math',
 'i solve music',
 'i am energetic',
 'i want art',
 'i eat exercise',
 'i play icecream',
 'i play notes',
 'i study guitar',
 'i prefer football',
 'i play dog',
 'i want cats',
 'i build milk',
 'i want task',
 'i need pictures',
 'i need code',
 'i need projects',
 'i solve smart',
 'i drink milk',
 'i am lazy',
 'i want milk',
 'i write art',
 'i eat movies',
 'i study dinner',
 'i love projects',
 'i love movies',
 'i love projects',
 'i enjoy math',
 'i practice tea',
 'i solve cats',
 'i hate cricket',
 'i feel happy',
 'i prefer task',
 'i want pictures',
 'i need dog',
 'i study tea',
 'i love notes',
 'i practice code',
 'i am tired',
 'i am energetic',
 'i solve dog',
 'i enjoy sleep',
 'i hate math',
 'i am energetic',
 'i prefer python',
 'i solve pizza',
 'i solve game',
 'i hate chess',
 'i love dog',
 'i want cricket',
 'i drink dog',
 'i hate task',
 'i cook coffee',
 'i am lazy',
 'i watch cats',
 'i watch projects',
 'i cook game',
 'i practice puzzles',
 'i feel sleepy',
 'i drink dinner',
 'i enjoy pizza',
 'i enjoy workout',
 'i try cats',
 'i solve movies',
 'i need notes',
 'i build exercise',
 'i write chess',
 'i write exercise',
 'i like task',
 'i want cricket',
 'i practice milk',
 'i study projects',
 'i am excited',
 'i feel sleepy',
 'i study music',
 'i watch puzzles',
 'i practice football',
 'i solve chess',
 'i am tired',
 'i am happy',
 'i cook tea',
 'i enjoy milk',
 'i solve puzzles',
 'i cook cats',
 'i build puzzles',
 'i study art',
 'i enjoy movies',
 'i play music',
 'i want music',
 'i love pictures',
 'i solve task',
 'i love notes',
 'i solve guitar',
 'i try python',
 'i like reading',
 'i feel excited',
 'i watch movies',
 'i hate sleep',
 'i write workout',
 'i practice notes',
 'i watch smart',
 'i like coffee',
 'i feel confused',
 'i need movies',
 'i love chess',
 'i need pictures',
 'i prefer cats',
 'i am tired',
 'i study python',
 'i try task',
 'i write chess',
 'i drink puzzles',
 'i love workout',
 'i love projects',
 'i write smart',
 'i am excited',
 'i build cricket',
 'i want math',
 'i like coffee',
 'i solve workout',
 'i practice dinner',
 'i hate math',
 'i write exercise',
 'i drink python',
 'i prefer football',
 'i hate chess',
 'i practice game',
 'i try game',
 'i build chess',
 'i play code',
 'i want projects',
 'i eat icecream',
 'i build reading',
 'i prefer water',
 'i eat projects',
 'i enjoy milk',
 'i feel confused',
 'i hate reading',
 'i build reading',
 'i build exercise',
 'i prefer smart',
 'i eat smart',
 'i prefer code',
 'i feel happy',
 'i play task',
 'i write cricket',
 'i solve workout',
 'i watch movies',
 'i play projects',
 'i love task',
 'i practice game',
 'i eat dinner',
 'i practice coffee',
 'i play task',
 'i hate exercise',
 'i love projects',
 'i prefer code',
 'i write icecream',
 'i drink movies',
 'i prefer pictures',
 'i solve projects',
 'i hate music',
 'i want water',
 'i prefer dinner',
 'i cook milk',
 'i study football',
 'i write python',
 'i prefer cricket',
 'i play water',
 'i solve cats',
 'i write art',
 'i need movies',
 'i watch reading',
 'i enjoy chess',
 'i like reading',
 'i like dog',
 'i study sleep',
 'i try water',
 'i write pizza',
 'i am tired',
 'i am energetic',
 'i need exercise',
 'i try python',
 'i am happy',
 'i feel tired',
 'i like football',
 'i need sleep',
 'i love football',
 'i like guitar',
 'i love exercise',
 'i feel bored',
 'i love task',
 'i need cats',
 'i enjoy pizza',
 'i play smart',
 'i love coffee',
 'i practice workout',
 'i cook dinner',
 'i eat coffee',
 'i watch code',
 'i need guitar',
 'i want cricket',
 'i prefer art',
 'i need workout',
 'i prefer game',
 'i watch icecream',
 'i build sleep',
 'i want movies',
 'i write game',
 'i write tea',
 'i am bored',
 'i cook football',
 'i build workout',
 'i build milk',
 'i like football',
 'i am sleepy',
 'i watch football',
 'i solve dog',
 'i eat milk',
 'i write workout',
 'i solve sleep',
 'i write pizza',
 'i solve notes',
 'i drink task',
 'i prefer workout',
 'i build projects',
 'i try milk',
 'i want code',
 'i am energetic',
 'i like chess',
 'i write art',
 'i enjoy dog',
 'i feel lazy',
 'i study football',
 'i like math',
 'i want python',
 'i eat smart',
 'i solve coffee',
 'i write exercise',
 'i build code',
 'i play milk',
 'i eat cats',
 'i eat notes',
 'i study smart',
 'i like movies',
 'i need notes',
 'i watch art',
 'i try sleep',
 'i feel confused',
 'i practice dog',
 'i eat task',
 'i enjoy cats',
 'i drink exercise',
 'i study icecream']
```

```python
# Parameters
n_step = 2        # number of steps (n-1)
n_hidden = 2      # hidden layer size
m = 2             # embedding size

# Preprocess
word_dict, number_dict, n_class = preprocess_sentences(sentences)
input_batch, target_batch = make_batch(sentences, word_dict, n_step)
```

### 📘 4. **Dataset & Hyperparameters**

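```python
n_step = 2        # how many previous words to consider (n - 1)
n_hidden = 2      # size of hidden layer
m = 2             # embedding size
```

The training data is the list above: 500 short sentences of the form `"i <verb> <word>"` (e.g. `"i like chess"`, `"i feel happy"`), many of which are repeated. Every sentence has exactly three words, so each one yields a single training pair: the first two words are the context and the third is the target. The embedding and hidden sizes are very small (2 each), which limits how much the model can fit.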

## 📘 5: Train the Model

```python
model = NNLM(n_class=n_class, m=m, n_hidden=n_hidden, n_step=n_step)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
```

```python
def train(model, input_batch, target_batch, epochs=5000, print_every=1000):
    # Full-batch training; uses the global `optimizer` and `criterion` defined above
    for epoch in range(1, epochs + 1):
        optimizer.zero_grad()                   # clear gradients from the previous step
        output = model(input_batch)             # [batch_size, n_class] logits
        loss = criterion(output, target_batch)  # mean cross-entropy over the batch
        loss.backward()                         # backpropagate
        optimizer.step()                        # update parameters

        if epoch % print_every == 0:
            print(f"Epoch {epoch:04d} | Loss: {loss.item():.6f}")

train(model, input_batch, target_batch)
```

```
Epoch 1000 | Loss: 3.186145
Epoch 2000 | Loss: 3.077170
Epoch 3000 | Loss: 3.034629
Epoch 4000 | Loss: 2.990668
Epoch 5000 | Loss: 2.973706
```


### 📘 5. **Training the Model**

```python
def train(model, input_batch, target_batch, epochs=5000, print_every=1000):
    ...
```

This function:

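* Trains the model with the **Adam** optimizer (learning rate `0.001`) and **CrossEntropyLoss**
* Uses full-batch training: each epoch is a single update computed on all training pairs at once:

  * Forward pass: compute next-word logits for every context
  * Compute the loss between these logits and the true target indices
  * Backpropagate gradients
  * Update weights

The loss decreases over time, from about 3.19 at epoch 1000 to about 2.97 at epoch 5000 in the run above, but it stays well above zero. This is expected for this dataset. The same two-word context, such as `"i like"`, is followed by many different words, so no model can assign probability 1 to every target.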
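Because the same context appears with different targets, the training loss has a floor. The sketch below computes it as the empirical conditional entropy of the target given the context, in nats to match `CrossEntropyLoss`. Like `make_batch`, it assumes every sentence has at least `n_step + 1` words. `min_achievable_loss` is a hypothetical helper name.

```python
import math
from collections import Counter, defaultdict


def min_achievable_loss(sentences, n_step):
    """Empirical conditional entropy H(target | context), in nats.

    No model can reach a lower average cross-entropy on these exact training
    pairs; the optimum predicts, for each context, the empirical distribution
    of the words that follow it.
    """
    follow = defaultdict(Counter)
    for sen in sentences:
        words = sen.split()
        follow[tuple(words[:n_step])][words[n_step]] += 1
    total = sum(sum(c.values()) for c in follow.values())
    loss = 0.0
    for counter in follow.values():
        n_ctx = sum(counter.values())
        for count in counter.values():
            loss -= count / total * math.log(count / n_ctx)
    return loss


print(min_achievable_loss(["i like dog", "i love coffee", "i hate milk"], 2))  # 0.0
print(min_achievable_loss(["i like dog", "i like tea", "i hate milk"], 2))     # (2/3)·ln 2 ≈ 0.462
```

Calling `min_achievable_loss(sentences, n_step)` on the notebook's dataset gives a lower bound that the logged loss cannot go below. For the three-sentence toy set the bound is 0, because every context has a single continuation.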

## 📘 6: Prediction & Test

```python
def predict(model, input_batch, number_dict, sentences):
    model.eval()
    with torch.no_grad():
        pred_idx = model(input_batch).argmax(dim=1)  # [batch_size] predicted word indices
    input_words = [sen.split()[:model.n_step] for sen in sentences]
    predicted_words = [number_dict[i.item()] for i in pred_idx]
    return input_words, predicted_words

inputs, outputs = predict(model, input_batch, number_dict, sentences)
print(inputs, '->', outputs)
```

```
[['i', 'like'], ['i', 'want'], ['i', 'want'], ['i', 'feel'], ['i', 'prefer'], ['i', 'build'], ['i', 'like'], ['i', 'cook'], ['i', 'solve'], ['i', 'like'], ['i', 'enjoy'], ['i', 'solve'], ['i', 'build'], ['i', 'love'], ['i', 'practice'], ['i', 'want'], ['i', 'write'], ['i', 'drink'], ['i', 'need'], ['i', 'feel'], ['i', 'solve'], ['i', 'solve'], ['i', 'play'], ['i', 'study'], ['i', 'watch'], ['i', 'play'], ['i', 'play'], ['i', 'prefer'], ['i', 'try'], ['i', 'write'], ['i', 'love'], ['i', 'try'], ['i', 'eat'], ['i', 'build'], ['i', 'need'], ['i', 'eat'], ['i', 'like'], ['i', 'watch'], ['i', 'feel'], ['i', 'build'], ['i', 'practice'], ['i', 'prefer'], ['i', 'write'], ['i', 'love'], ['i', 'prefer'], ['i', 'feel'], ['i', 'write'], ['i', 'feel'], ['i', 'try'], ['i', 'hate'], ['i', 'love'], ['i', 'eat'], ['i', 'like'], ['i', 'build'], ['i', 'practice'], ['i', 'feel'], ['i', 'write'], ['i', 'enjoy'], ['i', 'play'], ['i', 'want'], ['i', 'drink'], ['i', 'enjoy'], ['i', 'want'], ['i', 'am'], ['i',
```

### 📘 6. **Prediction & Evaluation**

```python
def predict(model, input_batch, number_dict, sentences):
    ...
```

* We take the `argmax` of the model's logits over the vocabulary dimension to get one predicted index per input.
* We then map those indices back to words using `number_dict`.
* The function returns the input pairs (like `["i", "like"]`) together with their predicted next words, and the calling cell prints them side by side.
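Besides the single `argmax`, it is often useful to see how confident the model is. The sketch below turns one row of logits into softmax probabilities and returns the `k` best words. It is pure Python, `top_k_words` is a hypothetical name, and the logits and vocabulary are made up. On tensors, `torch.softmax(output, dim=1)` followed by `torch.topk` does the same job.

```python
import heapq
import math


def top_k_words(logits, number_dict, k=3):
    """Return the k most likely next words with their softmax probabilities."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]  # shift by the max for stability
    z_sum = sum(exps)
    probs = [e / z_sum for e in exps]
    best = heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)
    return [(number_dict[i], probs[i]) for i in best]


number_dict = {0: "coffee", 1: "dog", 2: "milk"}
print(top_k_words([0.2, 2.0, -1.0], number_dict, k=2))
```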

# Output

### ✅ Example Output

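Given the three-sentence toy dataset:

```python
sentences = ["i like dog", "i love coffee", "i hate milk"]
```

every context has exactly one continuation, so the model should learn:

* `"i like"` → `"dog"`
* `"i love"` → `"coffee"`
* `"i hate"` → `"milk"`

And return:

```python
[['i', 'like'], ['i', 'love'], ['i', 'hate']] -> ['dog', 'coffee', 'milk']
```

If this is achieved, your NNLM has successfully **memorized** this toy dataset and mapped patterns to predictions. On the 500-sentence dataset trained above, most contexts (for example `"i like"`) are followed by several different words. The model therefore cannot predict every target correctly, and the best it can do is favor frequent continuations.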
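To score the predictions, you can compare `predicted_words` (the second value `predict` returns) with the true last words. The hypothetical helpers below do that. They also compute the best accuracy any predictor could reach if it outputs one word per context. That ceiling falls below 1.0 whenever a context has several continuations, as it does in the 500-sentence dataset.

```python
from collections import Counter, defaultdict


def accuracy(sentences, predicted_words, n_step):
    """Fraction of sentences whose word at position n_step was predicted exactly."""
    targets = [sen.split()[n_step] for sen in sentences]
    correct = sum(p == t for p, t in zip(predicted_words, targets))
    return correct / len(targets)


def best_possible_accuracy(sentences, n_step):
    """Ceiling for any one-word-per-context predictor: always choose the most
    frequent continuation of each context."""
    follow = defaultdict(Counter)
    for sen in sentences:
        words = sen.split()
        follow[tuple(words[:n_step])][words[n_step]] += 1
    return sum(max(c.values()) for c in follow.values()) / len(sentences)


toy = ["i like dog", "i love coffee", "i hate milk"]
print(accuracy(toy, ["dog", "coffee", "milk"], n_step=2))  # 1.0

ambiguous = ["i like dog", "i like tea", "i like dog"]
print(best_possible_accuracy(ambiguous, n_step=2))         # 2/3
```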

## 🧠 Summary of Learning

| Concept           | Description                                          |
| ----------------- | ---------------------------------------------------- |
| Word Embedding    | Maps words to continuous vector space                |
| Language Modeling | Predict next word based on context                   |
| PyTorch Model     | Custom NN architecture with embedding + hidden layer |
| Backpropagation   | Update parameters to minimize prediction loss        |
| Inference         | Predict the most likely next word for a given input  |

```python
def custom_predict(model, input_text, word_dict, number_dict, n_step):
    model.eval()
    words = input_text.strip().split()

    # Sanity check
    if len(words) != n_step:
        print(f"❌ Please enter exactly {n_step} words.")
        return

    # Convert to indices
    try:
        input_ids = [word_dict[word] for word in words]
    except KeyError as e:
        print(f"❌ Word not in vocabulary: {e}")
        return

    input_tensor = torch.LongTensor([input_ids])  # shape [1, n_step]

    with torch.no_grad():
        output = model(input_tensor)                 # [1, n_class] logits
        predicted_idx = output.argmax(dim=1).item()  # index of the highest logit
        predicted_word = number_dict[predicted_idx]

    print(f"📝 Input : {words}")
    print(f"🔮 Prediction : {predicted_word}")
```


```python
custom_predict(model, "i study", word_dict, number_dict, n_step)
```

```
📝 Input : ['i', 'study']
🔮 Prediction : cats
```
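`custom_predict` requires exactly `n_step` words. You may prefer to accept longer free text and condition only on its last `n_step` words, which is what a fixed-window model does at each position. In that case the input could be prepared as in this sketch. `text_to_context_ids` and the lowercasing step are assumptions (the notebook's vocabulary is all lowercase). The resulting indices would then go into `torch.LongTensor([ids])` exactly as in `custom_predict`.

```python
def text_to_context_ids(text, word_dict, n_step):
    """Hypothetical helper: map the last n_step words of free text to indices.

    Raises ValueError if the text is too short or a context word is out of
    vocabulary.
    """
    words = text.lower().split()  # assumption: the vocabulary is lowercase
    if len(words) < n_step:
        raise ValueError(f"need at least {n_step} words, got {len(words)}")
    context = words[-n_step:]     # keep only the last n_step words
    missing = [w for w in context if w not in word_dict]
    if missing:
        raise ValueError(f"not in vocabulary: {missing}")
    return context, [word_dict[w] for w in context]


word_dict = {"i": 0, "study": 1, "like": 2, "cats": 3}
print(text_to_context_ids("Yesterday I study", word_dict, n_step=2))
```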
