<a href="https://colab.research.google.com/github/Dhanush-adk/intro_to_dl/blob/main/Assignment_5/Assignment_5_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install ipython-autotime
%load_ext autotime

Collecting ipython-autotime
  Downloading ipython_autotime-0.3.2-py2.py3-none-any.whl (7.0 kB)
Collecting jedi>=0.16 (from ipython->ipython-autotime)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.6/1.6 MB[0m [31m6.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: jedi, ipython-autotime
Successfully installed ipython-autotime-0.3.2 jedi-0.19.1
time: 361 µs (started: 2024-04-26 18:48:19 +00:00)


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.model_selection import train_test_split
from torchsummary import summary

time: 4.56 s (started: 2024-04-26 18:48:19 +00:00)


In [3]:

text = '''Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text.

At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model.

One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks.

Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time.

Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants.

In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology.'''

time: 382 µs (started: 2024-04-26 18:48:24 +00:00)


In [4]:
# Preparing the dataset for sequence prediction
max_length = 10  # Maximum length of input sequences
sequences = [text[i:i + max_length] for i in range(len(text) - max_length)]
labels = [text[i + max_length] for i in range(len(text) - max_length)]

time: 1.22 ms (started: 2024-04-26 18:48:24 +00:00)


In [5]:
# Creating character vocabulary
chars = sorted(list(set(text)))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

time: 1.49 ms (started: 2024-04-26 18:48:24 +00:00)


In [6]:
X = torch.tensor([[char_to_ix[ch] for ch in seq] for seq in sequences], dtype = torch.long)

y = torch.tensor([char_to_ix[label] for label in labels], dtype = torch.long)

time: 38.3 ms (started: 2024-04-26 18:48:24 +00:00)


In [7]:
# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state = 100)

time: 24.1 ms (started: 2024-04-26 18:48:24 +00:00)


In [8]:
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Get the output of the last Transformer block
        return output


time: 718 µs (started: 2024-04-26 18:48:24 +00:00)


In [19]:
hidden_size = 128
num_layers = 3
nhead = 2
learning_rate = 0.01
epochs = 100

time: 481 µs (started: 2024-04-26 18:53:59 +00:00)


In [20]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

time: 603 µs (started: 2024-04-26 18:54:00 +00:00)


In [21]:
X_train = X_train.to(device)
y_train = y_train.to(device)
X_val = X_val.to(device)
y_val = y_val.to(device)

time: 584 µs (started: 2024-04-26 18:54:00 +00:00)


In [22]:
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

time: 25.6 ms (started: 2024-04-26 18:54:01 +00:00)




In [23]:
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()

    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted = torch.max(val_output, 1)
        val_accuracy = (predicted == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, train Loss: {loss.item()}, val loss: {val_loss.item()}, val accuracy: {val_accuracy.item()}')


Epoch 10, train Loss: 3.0992724895477295, val loss: 3.0189085006713867, val accuracy: 0.14255765080451965
Epoch 20, train Loss: 3.07077956199646, val loss: 2.9924402236938477, val accuracy: 0.14255765080451965
Epoch 30, train Loss: 3.0612008571624756, val loss: 2.981186628341675, val accuracy: 0.14255765080451965
Epoch 40, train Loss: 3.059032678604126, val loss: 2.979858636856079, val accuracy: 0.14255765080451965
Epoch 50, train Loss: 3.0505378246307373, val loss: 2.982666492462158, val accuracy: 0.14255765080451965
Epoch 60, train Loss: 3.0548887252807617, val loss: 2.9809279441833496, val accuracy: 0.14255765080451965
Epoch 70, train Loss: 3.056021213531494, val loss: 2.981275796890259, val accuracy: 0.14255765080451965
Epoch 80, train Loss: 3.0548524856567383, val loss: 2.9816489219665527, val accuracy: 0.14255765080451965
Epoch 90, train Loss: 3.053952217102051, val loss: 2.9817793369293213, val accuracy: 0.14255765080451965
Epoch 100, train Loss: 3.052248239517212, val loss: 2.9

In [24]:
!pip install torchinfo
import torchinfo
torchinfo.summary(model, input_data=X_train)



Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [1904, 45]                --
├─Embedding: 1-1                              [1904, 10, 128]           5,760
├─TransformerEncoder: 1-2                     [1904, 10, 128]           --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [1904, 10, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-2      [1904, 10, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-3      [1904, 10, 128]           593,024
├─Linear: 1-3                                 [1904, 45]                5,805
Total params: 1,790,637
Trainable params: 1,790,637
Non-trainable params: 0
Total mult-adds (G): 3.03
Input size (MB): 0.15
Forward/backward pass size (MB): 1131.51
Params size (MB): 6.37
Estimated Total Size (MB): 1138.03

time: 4.74 s (started: 2024-04-26 18:54:46 +00:00)


for sequence 20

In [26]:
max_length = 20
sequences = [text[i:i + max_length] for i in range(len(text) - max_length)]
labels = [text[i + max_length] for i in range(len(text) - max_length)]
chars = sorted(list(set(text)))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

X = torch.tensor([[char_to_ix[ch] for ch in seq] for seq in sequences], dtype = torch.long)
y = torch.tensor([char_to_ix[label] for label in labels], dtype = torch.long)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state = 100)

X_train = X_train.to(device)
y_train = y_train.to(device)
X_val = X_val.to(device)
y_val = y_val.to(device)

model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

time: 29.2 ms (started: 2024-04-26 18:58:46 +00:00)


In [27]:
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()

    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted = torch.max(val_output, 1)
        val_accuracy = (predicted == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, train Loss: {loss.item()}, val loss: {val_loss.item()}, val accuracy: {val_accuracy.item()}')


Epoch 10, train Loss: 3.09967041015625, val loss: 3.0496561527252197, val accuracy: 0.14526315033435822
Epoch 20, train Loss: 3.059847354888916, val loss: 3.018106460571289, val accuracy: 0.14526315033435822
Epoch 30, train Loss: 3.0574605464935303, val loss: 3.0293846130371094, val accuracy: 0.14526315033435822
Epoch 40, train Loss: 3.0512354373931885, val loss: 3.027135133743286, val accuracy: 0.14526315033435822
Epoch 50, train Loss: 3.0505423545837402, val loss: 3.0262324810028076, val accuracy: 0.14526315033435822
Epoch 60, train Loss: 3.0527420043945312, val loss: 3.022688388824463, val accuracy: 0.14526315033435822
Epoch 70, train Loss: 3.0533649921417236, val loss: 3.021456480026245, val accuracy: 0.14526315033435822
Epoch 80, train Loss: 3.0488805770874023, val loss: 3.021554946899414, val accuracy: 0.14526315033435822
Epoch 90, train Loss: 3.048494338989258, val loss: 3.0223448276519775, val accuracy: 0.14526315033435822
Epoch 100, train Loss: 3.0530264377593994, val loss: 3.

In [28]:
torchinfo.summary(model, input_data = X_train)

Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [1896, 45]                --
├─Embedding: 1-1                              [1896, 20, 128]           5,760
├─TransformerEncoder: 1-2                     [1896, 20, 128]           --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [1896, 20, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-2      [1896, 20, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-3      [1896, 20, 128]           593,024
├─Linear: 1-3                                 [1896, 45]                5,805
Total params: 1,790,637
Trainable params: 1,790,637
Non-trainable params: 0
Total mult-adds (G): 3.02
Input size (MB): 0.30
Forward/backward pass size (MB): 2252.83
Params size (MB): 6.37
Estimated Total Size (MB): 2259.50

time: 29.1 ms (started: 2024-04-26 18:59:31 +00:00)


for sequences 30

In [29]:
max_length = 30
sequences = [text[i:i + max_length] for i in range(len(text) - max_length)]
labels = [text[i + max_length] for i in range(len(text) - max_length)]
chars = sorted(list(set(text)))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

X = torch.tensor([[char_to_ix[ch] for ch in seq] for seq in sequences], dtype = torch.long)
y = torch.tensor([char_to_ix[label] for label in labels], dtype = torch.long)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state = 100)

X_train = X_train.to(device)
y_train = y_train.to(device)
X_val = X_val.to(device)
y_val = y_val.to(device)

model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

time: 60.3 ms (started: 2024-04-26 19:00:45 +00:00)


In [30]:
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()

    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted = torch.max(val_output, 1)
        val_accuracy = (predicted == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, train Loss: {loss.item()}, val loss: {val_loss.item()}, val accuracy: {val_accuracy.item()}')


Epoch 10, train Loss: 3.0752315521240234, val loss: 3.165795087814331, val accuracy: 0.12896405160427094
Epoch 20, train Loss: 3.0430662631988525, val loss: 3.134247303009033, val accuracy: 0.12896405160427094
Epoch 30, train Loss: 3.0320165157318115, val loss: 3.150113821029663, val accuracy: 0.12896405160427094
Epoch 40, train Loss: 3.0287702083587646, val loss: 3.135131597518921, val accuracy: 0.12896405160427094
Epoch 50, train Loss: 3.0228207111358643, val loss: 3.1354637145996094, val accuracy: 0.12896405160427094
Epoch 60, train Loss: 3.0268828868865967, val loss: 3.1326334476470947, val accuracy: 0.12896405160427094
Epoch 70, train Loss: 3.024582624435425, val loss: 3.1361446380615234, val accuracy: 0.12896405160427094
Epoch 80, train Loss: 3.0216739177703857, val loss: 3.135178565979004, val accuracy: 0.12896405160427094
Epoch 90, train Loss: 3.0231547355651855, val loss: 3.135715961456299, val accuracy: 0.12896405160427094
Epoch 100, train Loss: 3.021343946456909, val loss: 3

In [31]:
torchinfo.summary(model, input_data = X_train)

Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [1888, 45]                --
├─Embedding: 1-1                              [1888, 30, 128]           5,760
├─TransformerEncoder: 1-2                     [1888, 30, 128]           --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [1888, 30, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-2      [1888, 30, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-3      [1888, 30, 128]           593,024
├─Linear: 1-3                                 [1888, 45]                5,805
Total params: 1,790,637
Trainable params: 1,790,637
Non-trainable params: 0
Total mult-adds (G): 3.01
Input size (MB): 0.45
Forward/backward pass size (MB): 3364.64
Params size (MB): 6.37
Estimated Total Size (MB): 3371.47

time: 23.7 ms (started: 2024-04-26 19:02:03 +00:00)
