<a href="https://colab.research.google.com/github/dhanush852/intro_to-ml/blob/main/Homework5_1_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install ipython-autotime
%load_ext autotime

Collecting ipython-autotime
  Downloading ipython_autotime-0.3.2-py2.py3-none-any.whl (7.0 kB)
Collecting jedi>=0.16 (from ipython->ipython-autotime)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, ipython-autotime
Successfully installed ipython-autotime-0.3.2 jedi-0.19.1
time: 374 µs (started: 2024-04-27 20:24:14 +00:00)


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.model_selection import train_test_split
from torchsummary import summary

time: 4.48 s (started: 2024-04-27 20:24:14 +00:00)


In [3]:

text = '''Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text.

At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model.

One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks.

Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time.

Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants.

In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology.'''

time: 5.51 ms (started: 2024-04-27 20:24:18 +00:00)


In [4]:
# Define the maximum length for each input sequence
max_length = 10
sequences = [text[start_idx:start_idx + max_length] for start_idx in range(len(text) - max_length)]
labels = [text[start_idx + max_length] for start_idx in range(len(text) - max_length)]


time: 8.63 ms (started: 2024-04-27 20:24:18 +00:00)
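To make the windowing above concrete, here is a minimal sketch on a toy string. The string `toy_text` and the window length of 3 are illustrative assumptions, not values from this notebook; the comprehensions mirror the ones used for `sequences` and `labels`.

```python
# Toy illustration of the sliding-window construction used above.
# `toy_text` and `window` are hypothetical values chosen for readability.
toy_text = "hello"
window = 3

# Each input is `window` consecutive characters ...
toy_sequences = [toy_text[i:i + window] for i in range(len(toy_text) - window)]
# ... and its label is the character that immediately follows the window.
toy_labels = [toy_text[i + window] for i in range(len(toy_text) - window)]

for seq, label in zip(toy_sequences, toy_labels):
    print(f"{seq!r} -> {label!r}")
# prints:
# 'hel' -> 'l'
# 'ell' -> 'o'
```

A text of length `n` therefore yields `n - window` training pairs, which is why the sample count shrinks slightly as the sequence length grows.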


In [5]:
# Creating character vocabulary
chars = sorted(list(set(text)))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

time: 571 µs (started: 2024-04-27 20:24:18 +00:00)


In [6]:
X = torch.tensor([[char_to_ix[ch] for ch in seq] for seq in sequences], dtype = torch.long)

y = torch.tensor([char_to_ix[label] for label in labels], dtype = torch.long)

time: 35.9 ms (started: 2024-04-27 20:24:18 +00:00)


In [14]:
# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state = 100)

time: 42.1 ms (started: 2024-04-27 20:26:13 +00:00)


In [15]:
import torch.nn as nn

class CharTransformer(nn.Module):
    def __init__(self, vocab_size, hidden_dim, num_classes, num_layers, num_heads):
        super(CharTransformer, self).__init__()
        # Embedding layer to convert input indices to vectors
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
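        # Encoder layer for the transformer. batch_first=True is required because
        # the inputs are shaped (batch, seq_len, hidden_dim); the default layout
        # is (seq_len, batch, hidden_dim), which would mix samples in attention.
        encoder_layer = nn.TransformerEncoderLayer(hidden_dim, num_heads, batch_first=True)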
        # Transformer encoder
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        # Fully connected layer to map transformer outputs to class scores
        self.fc_out = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_indices):
        # Convert indices to embeddings
        embeddings = self.embedding(input_indices)
        # Pass embeddings through the transformer encoder
        transformer_out = self.transformer_encoder(embeddings)
        # Output the transformed last token for prediction
        final_output = self.fc_out(transformer_out[:, -1, :])
        return final_output


time: 737 µs (started: 2024-04-27 20:26:15 +00:00)


In [29]:
hidden_dim = 128
num_layers = 3       # Number of layers in the model
num_heads = 2        # Number of attention heads in the transformer model
learning_rate = 0.01  # Learning rate for training the model
epoch_count = 100     # Total number of training epochs


time: 813 µs (started: 2024-04-27 20:39:58 +00:00)


In [30]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

time: 527 µs (started: 2024-04-27 20:40:00 +00:00)


In [31]:
# Move the training and validation tensors to the selected device (GPU if available, otherwise CPU)
X_train, y_train, X_val, y_val = (tensor.to(device) for tensor in (X_train, y_train, X_val, y_val))


time: 890 µs (started: 2024-04-27 20:40:01 +00:00)


In [33]:
import torch.nn as nn
import torch.optim as optim

model = CharTransformer(len(chars),  hidden_dim, len(chars), num_layers, num_heads)

model = model.to(device)

criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # learning_rate = 0.01, defined above




time: 2.07 s (started: 2024-04-27 20:40:20 +00:00)


In [34]:
for epoch in range(epoch_count):  # Train for epoch_count (100) epochs on the full training set
    model.train()  # Set the model to training mode
    optimizer.zero_grad()  # Clear gradients before each step

    train_output = model(X_train)
    train_loss = criterion(train_output, y_train)
    train_loss.backward()  # Backpropagation
    optimizer.step()  # Update model parameters


    model.eval()
    with torch.no_grad():  # Disable gradient calculation for evaluation
        val_output = model(X_val)  # Forward pass with validation data
        val_loss = criterion(val_output, y_val)  # Calculate loss
        _, predictions = torch.max(val_output, 1)  # Predictions
        val_accuracy = (predictions == y_val).float().mean()  # Calculate accuracy

    # Log performance every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch + 1}, Train Loss: {train_loss.item()}, Validation Loss: {val_loss.item()}, Validation Accuracy: {val_accuracy.item()}')


Epoch 10, Train Loss: 3.0886178016662598, Validation Loss: 3.038475275039673, Validation Accuracy: 0.1446540802717209
Epoch 20, Train Loss: 3.0617117881774902, Validation Loss: 3.0223729610443115, Validation Accuracy: 0.1446540802717209
Epoch 30, Train Loss: 3.052079916000366, Validation Loss: 3.0179507732391357, Validation Accuracy: 0.1446540802717209
Epoch 40, Train Loss: 3.0544931888580322, Validation Loss: 3.01273250579834, Validation Accuracy: 0.1446540802717209
Epoch 50, Train Loss: 3.0529589653015137, Validation Loss: 3.013514518737793, Validation Accuracy: 0.1446540802717209
Epoch 60, Train Loss: 3.0458195209503174, Validation Loss: 3.013235569000244, Validation Accuracy: 0.1446540802717209
Epoch 70, Train Loss: 3.0453591346740723, Validation Loss: 3.0126118659973145, Validation Accuracy: 0.1446540802717209
Epoch 80, Train Loss: 3.0419070720672607, Validation Loss: 3.01328182220459, Validation Accuracy: 0.1446540802717209
Epoch 90, Train Loss: 3.0490903854370117, Validation Los
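
The validation accuracy in the log above does not change between epochs, which would be consistent with the model predicting the same character for every input. One way to check this is to compare against a majority-class baseline and against the loss of a uniform guess. The sketch below uses a small hypothetical label list (`toy_labels`) rather than the notebook's `y_val`, and both helper names are assumptions made for illustration. The value 45 is used because the model for this text has 45 output classes.

```python
import math
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return math.log(num_classes)

# Hypothetical labels in which the space character dominates.
toy_labels = list("a b c  d e")
print(majority_baseline_accuracy(toy_labels))  # 0.5
print(round(uniform_cross_entropy(45), 4))     # 3.8067
```

On the real data, `majority_baseline_accuracy(y_val.tolist())` would give the figure to compare with the logged accuracy; if the two match, the model has likely collapsed to a constant prediction.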

In [35]:
!pip install torchinfo
import torchinfo
torchinfo.summary(model, input_data=X_train)

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [1904, 45]                --
├─Embedding: 1-1                              [1904, 10, 128]           5,760
├─TransformerEncoder: 1-2                     [1904, 10, 128]           --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [1904, 10, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-2      [1904, 10, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-3      [1904, 10, 128]           593,024
├─Linear: 1-3                                 [1904, 45]                5,805
Total params: 1,790,637
Trainable params: 1,790,637
Non-trainable params: 0
Total mult-adds (G): 3.03
Input size (MB): 0.15
Forward/backward pass size (MB): 1131.51
Params size (MB): 6.37
Estimated Total Size (MB): 1138.03

time: 5.44 s (started: 2024-04-27 20:44:17 +00:00)
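
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the PyTorch defaults for `nn.TransformerEncoderLayer` (a feed-forward width of 2048, biases on every linear layer, and two LayerNorms per layer); the function names are illustrative and not part of any library.

```python
def encoder_layer_params(d_model, dim_feedforward=2048):
    """Parameter count of one nn.TransformerEncoderLayer with default settings."""
    attn_in_proj = 3 * d_model * d_model + 3 * d_model   # packed Q, K, V projections
    attn_out_proj = d_model * d_model + d_model          # attention output projection
    ff1 = d_model * dim_feedforward + dim_feedforward    # first feed-forward linear
    ff2 = dim_feedforward * d_model + d_model            # second feed-forward linear
    layer_norms = 2 * (2 * d_model)                      # weight and bias of two LayerNorms
    return attn_in_proj + attn_out_proj + ff1 + ff2 + layer_norms

def char_transformer_params(vocab_size, d_model, num_layers):
    """Embedding + encoder stack + output linear layer (with bias)."""
    embedding = vocab_size * d_model
    output_layer = d_model * vocab_size + vocab_size
    return embedding + num_layers * encoder_layer_params(d_model) + output_layer

print(encoder_layer_params(128))            # 593024
print(char_transformer_params(45, 128, 3))  # 1790637
```

The number of attention heads does not appear in the formula: the heads split the same projection matrices, so changing `num_heads` leaves the parameter count unchanged. Most of each layer's parameters sit in the two feed-forward matrices.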


Sequence length = 20

In [37]:
max_length = 20

# Extract sequences and corresponding labels from the text
sequences = [text[i:i + max_length] for i in range(len(text) - max_length)]
labels = [text[i + max_length] for i in range(len(text) - max_length)]

# Generate a unique set of characters and map each character to an index
chars = sorted(list(set(text)))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# Convert sequences and labels into tensors of indices
X = torch.tensor([[char_to_ix[ch] for ch in seq] for seq in sequences], dtype=torch.long)
y = torch.tensor([char_to_ix[label] for label in labels], dtype=torch.long)

# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=100)

# Move tensors to the specified device
X_train = X_train.to(device)
y_train = y_train.to(device)
X_val = X_val.to(device)
y_val = y_val.to(device)

# Initialize the CharTransformer model with predefined settings
model = CharTransformer(len(chars),  hidden_dim, len(chars), num_layers, num_heads)
model = model.to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # learning_rate = 0.01

time: 38.3 ms (started: 2024-04-27 20:48:16 +00:00)




In [38]:
for epoch in range(epoch_count):  # epoch_count = 100
    model.train()
    optimizer.zero_grad()

    train_output = model(X_train)
    train_loss = criterion(train_output, y_train)
    train_loss.backward()  # Backpropagation
    optimizer.step()  # Update model parameters

    model.eval()
    with torch.no_grad():  # Disable gradient calculation for evaluation
        val_output = model(X_val)  # Forward pass with validation data
        val_loss = criterion(val_output, y_val)  # Calculate loss
        _, predictions = torch.max(val_output, 1)  # Predictions
        val_accuracy = (predictions == y_val).float().mean()  # Calculate accuracy

    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch + 1}, Train Loss: {train_loss.item()}, Validation Loss: {val_loss.item()}, Validation Accuracy: {val_accuracy.item()}')


Epoch 10, Train Loss: 3.086601495742798, Validation Loss: 3.0331029891967773, Validation Accuracy: 0.14526315033435822
Epoch 20, Train Loss: 3.0611352920532227, Validation Loss: 3.0283470153808594, Validation Accuracy: 0.14526315033435822
Epoch 30, Train Loss: 3.0613150596618652, Validation Loss: 3.025937080383301, Validation Accuracy: 0.14526315033435822
Epoch 40, Train Loss: 3.0593910217285156, Validation Loss: 3.0231242179870605, Validation Accuracy: 0.14526315033435822
Epoch 50, Train Loss: 3.057526111602783, Validation Loss: 3.022125244140625, Validation Accuracy: 0.14526315033435822
Epoch 60, Train Loss: 3.05175518989563, Validation Loss: 3.023834705352783, Validation Accuracy: 0.14526315033435822
Epoch 70, Train Loss: 3.0567452907562256, Validation Loss: 3.024379253387451, Validation Accuracy: 0.14526315033435822
Epoch 80, Train Loss: 3.050891637802124, Validation Loss: 3.023371934890747, Validation Accuracy: 0.14526315033435822
Epoch 90, Train Loss: 3.0479085445404053, Validati

In [39]:
torchinfo.summary(model, input_data = X_train)

Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [1896, 45]                --
├─Embedding: 1-1                              [1896, 20, 128]           5,760
├─TransformerEncoder: 1-2                     [1896, 20, 128]           --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [1896, 20, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-2      [1896, 20, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-3      [1896, 20, 128]           593,024
├─Linear: 1-3                                 [1896, 45]                5,805
Total params: 1,790,637
Trainable params: 1,790,637
Non-trainable params: 0
Total mult-adds (G): 3.02
Input size (MB): 0.30
Forward/backward pass size (MB): 2252.83
Params size (MB): 6.37
Estimated Total Size (MB): 2259.50

time: 13.2 ms (started: 2024-04-27 20:52:14 +00:00)


Sequence length = 30

In [41]:
max_length = 30


sequences = [text[i:i + max_length] for i in range(len(text) - max_length)]
labels = [text[i + max_length] for i in range(len(text) - max_length)]


chars = sorted(list(set(text)))
char_to_ix = {ch: i for i, ch in enumerate(chars)}


X = torch.tensor([[char_to_ix[ch] for ch in seq] for seq in sequences], dtype=torch.long)
y = torch.tensor([char_to_ix[label] for label in labels], dtype=torch.long)

# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=100)

X_train = X_train.to(device)
y_train = y_train.to(device)
X_val = X_val.to(device)
y_val = y_val.to(device)

# Initialize the CharTransformer model with the specified configuration
model = CharTransformer(len(chars),  hidden_dim, len(chars), num_layers, num_heads)
model = model.to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # learning_rate = 0.01

time: 178 ms (started: 2024-04-27 20:54:38 +00:00)




In [42]:
for epoch in range(epoch_count):  # epoch_count = 100
    model.train()  # Set the model to training mode
    optimizer.zero_grad()  # Clear gradients before each step

    train_output = model(X_train)
    train_loss = criterion(train_output, y_train)
    train_loss.backward()  # Backpropagation
    optimizer.step()  # Update model parameters


    model.eval()
    with torch.no_grad():  # Disable gradient calculation for evaluation
        val_output = model(X_val)  # Forward pass with validation data
        val_loss = criterion(val_output, y_val)  # Calculate loss
        _, predictions = torch.max(val_output, 1)  # Get predictions from the max logit
        val_accuracy = (predictions == y_val).float().mean()  # Calculate accuracy


    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch + 1}, Train Loss: {train_loss.item()}, Validation Loss: {val_loss.item()}, Validation Accuracy: {val_accuracy.item()}')


Epoch 10, Train Loss: 3.06044602394104, Validation Loss: 3.1337132453918457, Validation Accuracy: 0.12896405160427094
Epoch 20, Train Loss: 3.0313990116119385, Validation Loss: 3.1407930850982666, Validation Accuracy: 0.12896405160427094
Epoch 30, Train Loss: 3.0332605838775635, Validation Loss: 3.1422150135040283, Validation Accuracy: 0.12896405160427094
Epoch 40, Train Loss: 3.0287342071533203, Validation Loss: 3.1328587532043457, Validation Accuracy: 0.12896405160427094
Epoch 50, Train Loss: 3.023965835571289, Validation Loss: 3.136594295501709, Validation Accuracy: 0.12896405160427094
Epoch 60, Train Loss: 3.0283520221710205, Validation Loss: 3.136806011199951, Validation Accuracy: 0.12896405160427094
Epoch 70, Train Loss: 3.0271029472351074, Validation Loss: 3.1333842277526855, Validation Accuracy: 0.12896405160427094
Epoch 80, Train Loss: 3.0184733867645264, Validation Loss: 3.1359405517578125, Validation Accuracy: 0.12896405160427094
Epoch 90, Train Loss: 3.020108699798584, Vali

In [43]:
torchinfo.summary(model, input_data = X_train)

Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [1888, 45]                --
├─Embedding: 1-1                              [1888, 30, 128]           5,760
├─TransformerEncoder: 1-2                     [1888, 30, 128]           --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [1888, 30, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-2      [1888, 30, 128]           593,024
│    │    └─TransformerEncoderLayer: 3-3      [1888, 30, 128]           593,024
├─Linear: 1-3                                 [1888, 45]                5,805
Total params: 1,790,637
Trainable params: 1,790,637
Non-trainable params: 0
Total mult-adds (G): 3.01
Input size (MB): 0.45
Forward/backward pass size (MB): 3364.64
Params size (MB): 6.37
Estimated Total Size (MB): 3371.47

time: 12.6 ms (started: 2024-04-27 20:57:01 +00:00)


Question 2 (1)

In [46]:
import requests
import torch
from torch.utils.data import Dataset, DataLoader, random_split

url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
text = response.text


sequence_length = 20

chars = sorted(list(set(text)))
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for i, ch in enumerate(chars)}

encoded_text = [char_to_int[ch] for ch in text]


sequences = [encoded_text[i:i + sequence_length] for i in range(len(encoded_text) - sequence_length)]
targets = [encoded_text[i + sequence_length] for i in range(len(encoded_text) - sequence_length)]

sequences_tensor = torch.tensor(sequences, dtype=torch.long)
targets_tensor = torch.tensor(targets, dtype=torch.long)

class CharDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.targets[index]

dataset = CharDataset(sequences_tensor, targets_tensor)

batch_size = 128
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)


time: 5 s (started: 2024-04-27 21:05:13 +00:00)


In [47]:
import torch.nn as nn

class CharTransformer(nn.Module):
    def __init__(self, vocab_size, emb_dim, num_classes, layers_count, heads_count):
        super(CharTransformer, self).__init__()

        self.embedding = nn.Embedding(vocab_size, emb_dim)
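        # batch_first=True because the DataLoader yields (batch, seq_len) inputs;
        # the default layout is (seq_len, batch, emb_dim).
        encoder_layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=heads_count, batch_first=True)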
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers_count)
        self.output_layer = nn.Linear(emb_dim, num_classes)

    def forward(self, inputs):
        embeddings = self.embedding(inputs)
        encoder_outputs = self.transformer_encoder(embeddings)
        final_output = self.output_layer(encoder_outputs[:, -1])
        return final_output


time: 1.12 ms (started: 2024-04-27 21:08:31 +00:00)


In [48]:
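# Only `device` is read by the training cell that follows. That cell hard-codes
# emb_dim=128, layers_count=3, heads_count=2, lr=0.01 and 10 epochs, so
# hidden_size, num_layers, learning_rate and epochs below have no effect on the run.
hidden_size = 512
num_layers = 2
num_heads = 2
learning_rate = 0.001
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
epochs = 15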

time: 883 µs (started: 2024-04-27 21:08:45 +00:00)


In [53]:
import torch.optim as optim
import torch.nn as nn

# Instantiate the model with specific configurations and send to compute device
model = CharTransformer(vocab_size=len(chars), emb_dim=128, num_classes=len(chars),
                        layers_count=3, heads_count=2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training and validation loop
for epoch in range(10):
    model.train()  # Set model to training mode
    total_train_loss = 0

    # Training phase
    for batch_inputs, batch_targets in train_loader:
        batch_inputs, batch_targets = batch_inputs.to(device), batch_targets.to(device)
        optimizer.zero_grad()
        train_outputs = model(batch_inputs)
        train_loss = criterion(train_outputs, batch_targets)
        train_loss.backward()
        optimizer.step()
        total_train_loss += train_loss.item() * batch_inputs.size(0)

    average_train_loss = total_train_loss / len(train_loader.dataset)

    # Validation phase
    model.eval()  # Set model to evaluation mode
    total_val_loss = 0
    correct_predictions = 0
    total_predictions = 0

    with torch.no_grad():
        for batch_inputs, batch_targets in test_loader:
            batch_inputs, batch_targets = batch_inputs.to(device), batch_targets.to(device)
            val_outputs = model(batch_inputs)
            val_loss = criterion(val_outputs, batch_targets)
            total_val_loss += val_loss.item() * batch_inputs.size(0)
            _, predictions = torch.max(val_outputs, 1)
            total_predictions += batch_targets.size(0)
            correct_predictions += (predictions == batch_targets).sum().item()

    average_val_loss = total_val_loss / len(test_loader.dataset)
    validation_accuracy = correct_predictions / total_predictions

    # Output training and validation results
    if (epoch + 1) % 1 == 0:
        print(f'Epoch {epoch + 1}, Train Loss: {average_train_loss:.4f}, '
              f'Validation Loss: {average_val_loss:.4f}, Validation Accuracy: {validation_accuracy:.4f}')


Epoch 1, Train Loss: 3.3241, Validation Loss: 3.3182, Validation Accuracy: 0.1525
Epoch 2, Train Loss: 3.3186, Validation Loss: 3.3204, Validation Accuracy: 0.1525
Epoch 3, Train Loss: 3.3182, Validation Loss: 3.3193, Validation Accuracy: 0.1525
Epoch 4, Train Loss: 3.3181, Validation Loss: 3.3189, Validation Accuracy: 0.1525
Epoch 5, Train Loss: 3.3181, Validation Loss: 3.3185, Validation Accuracy: 0.1525
Epoch 6, Train Loss: 3.3182, Validation Loss: 3.3148, Validation Accuracy: 0.1525
Epoch 7, Train Loss: 3.3184, Validation Loss: 3.3204, Validation Accuracy: 0.1525
Epoch 8, Train Loss: 3.3184, Validation Loss: 3.3164, Validation Accuracy: 0.1525
Epoch 9, Train Loss: 3.3184, Validation Loss: 3.3188, Validation Accuracy: 0.1525
Epoch 10, Train Loss: 3.3183, Validation Loss: 3.3163, Validation Accuracy: 0.1525
time: 17min 40s (started: 2024-04-27 21:49:59 +00:00)
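
The training loop multiplies each batch loss by the batch size before summing because `nn.CrossEntropyLoss` returns a per-batch mean by default, and the last batch of an epoch is usually smaller than the rest. A minimal sketch with made-up numbers (the batch losses and sizes below are hypothetical) shows how this differs from naively averaging the batch means:

```python
# Hypothetical per-batch mean losses and batch sizes; the last batch is smaller.
batch_losses = [2.0, 2.0, 5.0]
batch_sizes = [128, 128, 4]

# Naive: average of the batch means, which over-weights the small last batch.
naive_mean = sum(batch_losses) / len(batch_losses)

# Weighted: sum of (mean loss * batch size) divided by the number of samples,
# which is what the loop accumulates in total_train_loss.
weighted_mean = sum(loss * n for loss, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(naive_mean)               # 3.0
print(round(weighted_mean, 4))  # 2.0462
```

The weighted value is the true per-sample mean, so it stays comparable across runs even when the batch size changes.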


In [55]:
!pip install torchinfo
import torchinfo

time: 5.12 s (started: 2024-04-27 22:07:39 +00:00)


In [56]:

dataiter = iter(train_loader)
inputs, labels = next(dataiter)  # Get one batch of data

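# Move the sample batch to the model's device instead of moving the model to the CPU
inputs = inputs.to(device)
model_summary = torchinfo.summary(model, input_data=(inputs,))
print(model_summary)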



Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [128, 65]                 --
├─Embedding: 1-1                              [128, 20, 128]            8,320
├─TransformerEncoder: 1-2                     [128, 20, 128]            --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [128, 20, 128]            593,024
│    │    └─TransformerEncoderLayer: 3-2      [128, 20, 128]            593,024
│    │    └─TransformerEncoderLayer: 3-3      [128, 20, 128]            593,024
├─Linear: 1-3                                 [128, 65]                 8,385
Total params: 1,795,777
Trainable params: 1,795,777
Non-trainable params: 0
Total mult-adds (M): 204.50
Input size (MB): 0.02
Forward/backward pass size (MB): 152.11
Params size (MB): 6.39
Estimated Total Size (MB): 158.52
time: 360 ms (started: 2024-04-27 22:07:44 +00:00)


Sequence length = 30

In [57]:
import requests
import torch
from torch.utils.data import Dataset, DataLoader, random_split

url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
text = response.text


sequence_length = 30

chars = sorted(list(set(text)))
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for i, ch in enumerate(chars)}

encoded_text = [char_to_int[ch] for ch in text]


sequences = [encoded_text[i:i + sequence_length] for i in range(len(encoded_text) - sequence_length)]
targets = [encoded_text[i + sequence_length] for i in range(len(encoded_text) - sequence_length)]

sequences_tensor = torch.tensor(sequences, dtype=torch.long)
targets_tensor = torch.tensor(targets, dtype=torch.long)

class CharDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.targets[index]

dataset = CharDataset(sequences_tensor, targets_tensor)

batch_size = 128
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)


time: 6.31 s (started: 2024-04-27 22:07:45 +00:00)


In [54]:
import torch.nn as nn

class CharTransformer(nn.Module):
    def __init__(self, vocab_size, emb_dim, num_classes, layers_count, heads_count):
        super(CharTransformer, self).__init__()

        self.embedding = nn.Embedding(vocab_size, emb_dim)
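        # batch_first=True because the DataLoader yields (batch, seq_len) inputs;
        # the default layout is (seq_len, batch, emb_dim).
        encoder_layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=heads_count, batch_first=True)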
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers_count)
        self.output_layer = nn.Linear(emb_dim, num_classes)

    def forward(self, inputs):
        embeddings = self.embedding(inputs)
        encoder_outputs = self.transformer_encoder(embeddings)
        final_output = self.output_layer(encoder_outputs[:, -1])
        return final_output


time: 699 µs (started: 2024-04-27 22:07:39 +00:00)


In [58]:
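# Only `device` is read by the training cell that follows. That cell hard-codes
# emb_dim=128, layers_count=3, heads_count=2, lr=0.01 and 10 epochs, so
# hidden_size, num_layers, learning_rate and epochs below have no effect on the run.
hidden_size = 512
num_layers = 2
num_heads = 2
learning_rate = 0.001
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
epochs = 15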

time: 754 µs (started: 2024-04-27 22:07:51 +00:00)


In [59]:
import torch.optim as optim
import torch.nn as nn

# Instantiate the model with specific configurations and send to compute device
model = CharTransformer(vocab_size=len(chars), emb_dim=128, num_classes=len(chars),
                        layers_count=3, heads_count=2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training and validation loop
for epoch in range(10):
    model.train()  # Set model to training mode
    total_train_loss = 0

    # Training phase
    for batch_inputs, batch_targets in train_loader:
        batch_inputs, batch_targets = batch_inputs.to(device), batch_targets.to(device)
        optimizer.zero_grad()
        train_outputs = model(batch_inputs)
        train_loss = criterion(train_outputs, batch_targets)
        train_loss.backward()
        optimizer.step()
        total_train_loss += train_loss.item() * batch_inputs.size(0)

    average_train_loss = total_train_loss / len(train_loader.dataset)

    # Validation phase
    model.eval()  # Set model to evaluation mode
    total_val_loss = 0
    correct_predictions = 0
    total_predictions = 0

    with torch.no_grad():
        for batch_inputs, batch_targets in test_loader:
            batch_inputs, batch_targets = batch_inputs.to(device), batch_targets.to(device)
            val_outputs = model(batch_inputs)
            val_loss = criterion(val_outputs, batch_targets)
            total_val_loss += val_loss.item() * batch_inputs.size(0)
            _, predictions = torch.max(val_outputs, 1)
            total_predictions += batch_targets.size(0)
            correct_predictions += (predictions == batch_targets).sum().item()

    average_val_loss = total_val_loss / len(test_loader.dataset)
    validation_accuracy = correct_predictions / total_predictions

    # Output training and validation results
    if (epoch + 1) % 1 == 0:
        print(f'Epoch {epoch + 1}, Train Loss: {average_train_loss:.4f}, '
              f'Validation Loss: {average_val_loss:.4f}, Validation Accuracy: {validation_accuracy:.4f}')


Epoch 1, Train Loss: 3.3233, Validation Loss: 3.3200, Validation Accuracy: 0.1522
Epoch 2, Train Loss: 3.3178, Validation Loss: 3.3251, Validation Accuracy: 0.1522
Epoch 3, Train Loss: 3.3176, Validation Loss: 3.3212, Validation Accuracy: 0.1522
Epoch 4, Train Loss: 3.3175, Validation Loss: 3.3207, Validation Accuracy: 0.1522
Epoch 5, Train Loss: 3.3176, Validation Loss: 3.3239, Validation Accuracy: 0.1522
Epoch 6, Train Loss: 3.3176, Validation Loss: 3.3205, Validation Accuracy: 0.1522
Epoch 7, Train Loss: 3.3176, Validation Loss: 3.3228, Validation Accuracy: 0.1522
Epoch 8, Train Loss: 3.3177, Validation Loss: 3.3218, Validation Accuracy: 0.1522
Epoch 9, Train Loss: 3.3177, Validation Loss: 3.3220, Validation Accuracy: 0.1522
Epoch 10, Train Loss: 3.3177, Validation Loss: 3.3196, Validation Accuracy: 0.1522
time: 24min 38s (started: 2024-04-27 22:07:51 +00:00)


In [60]:

!pip install torchinfo
import torchinfo


time: 4.92 s (started: 2024-04-27 22:32:29 +00:00)


In [61]:

dataiter = iter(train_loader)
inputs, labels = next(dataiter)  # Get one batch of data

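# Move the sample batch to the model's device instead of moving the model to the CPU
inputs = inputs.to(device)
model_summary = torchinfo.summary(model, input_data=(inputs,))
print(model_summary)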


Layer (type:depth-idx)                        Output Shape              Param #
CharTransformer                               [128, 65]                 --
├─Embedding: 1-1                              [128, 30, 128]            8,320
├─TransformerEncoder: 1-2                     [128, 30, 128]            --
│    └─ModuleList: 2-1                        --                        --
│    │    └─TransformerEncoderLayer: 3-1      [128, 30, 128]            593,024
│    │    └─TransformerEncoderLayer: 3-2      [128, 30, 128]            593,024
│    │    └─TransformerEncoderLayer: 3-3      [128, 30, 128]            593,024
├─Linear: 1-3                                 [128, 65]                 8,385
Total params: 1,795,777
Trainable params: 1,795,777
Non-trainable params: 0
Total mult-adds (M): 204.50
Input size (MB): 0.03
Forward/backward pass size (MB): 228.13
Params size (MB): 6.39
Estimated Total Size (MB): 234.55
time: 487 ms (started: 2024-04-27 22:32:34 +00:00)
