In [1]:
!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O shakespeare.txt

--2024-09-16 13:47:05--  https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1115394 (1.1M) [text/plain]
Saving to: ‘shakespeare.txt’


2024-09-16 13:47:05 (27.3 MB/s) - ‘shakespeare.txt’ saved [1115394/1115394]



# **Step 1: Dataset Preparation**
We use the Tiny Shakespeare corpus (about 1.1 MB of plain text, downloaded above as `shakespeare.txt`) to train a character-level model.

**Code to load and preprocess the data:**

In [2]:
import torch
import torch.nn as nn
import numpy as np
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Load dataset (Shakespeare corpus)
with open("shakespeare.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Create a character-to-index and index-to-character mapping
chars = sorted(list(set(text)))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}

# Convert the text into integer indices
text_as_int = np.array([char_to_idx[c] for c in text])

# Set sequence length for training
SEQ_LENGTH = 100
BATCH_SIZE = 64

# Create input-output sequences
def create_sequences(text, seq_length):
    inputs = []
    targets = []
    for i in range(0, len(text) - seq_length):
        inputs.append(text[i:i+seq_length])
        targets.append(text[i+seq_length])
    return np.array(inputs), np.array(targets)

inputs, targets = create_sequences(text_as_int, SEQ_LENGTH)

# Create a PyTorch dataset
class TextDataset(Dataset):
    def __init__(self, inputs, targets):
        self.inputs = torch.tensor(inputs, dtype=torch.long)
        self.targets = torch.tensor(targets, dtype=torch.long)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

dataset = TextDataset(inputs, targets)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
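
To make the windowing concrete, here is a small sketch of the same sliding-window scheme applied to a short string instead of the full corpus. The helper name `make_windows` and the toy string are illustrative and not part of the notebook. The second half estimates how many (input, target) pairs the full corpus produces and roughly how much memory they take, assuming one character per byte of the 1,115,394-byte download and 8-byte integer indices (NumPy's usual default on 64-bit Linux).

```python
import math

def make_windows(seq, seq_length):
    """Pair every window of `seq_length` items with the single item that follows it."""
    inputs, targets = [], []
    for i in range(len(seq) - seq_length):
        inputs.append(seq[i:i + seq_length])
        targets.append(seq[i + seq_length])
    return inputs, targets

# Toy example: 4-character windows over a short string
toy_inputs, toy_targets = make_windows("hello world", 4)
for window, target in zip(toy_inputs, toy_targets):
    print(repr(window), "->", repr(target))

# Rough size estimate for the full corpus (assumptions stated in the lead-in)
n_chars = 1_115_394                 # file size reported by wget; assumes 1 byte per character
seq_length, batch_size = 100, 64    # SEQ_LENGTH and BATCH_SIZE from the notebook
n_pairs = n_chars - seq_length
n_batches = math.ceil(n_pairs / batch_size)
approx_gb = n_pairs * seq_length * 8 / 1e9   # 8 bytes per stored index
print(n_pairs, n_batches, round(approx_gb, 2))
```

Because consecutive windows overlap in all but one character, storing every window multiplies the corpus size by roughly `seq_length`. Slicing windows on demand inside `__getitem__` would avoid that, at the cost of a slightly more involved dataset class.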

# **Step 2: Model Definition (Transformer)**
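We define a character-level Transformer: each character index is embedded, a learned positional embedding is added, the sequence passes through `nn.Transformer` (the same embedded sequence is fed to both the encoder and the decoder), and a linear layer maps the output at the last position to scores over the vocabulary for the next character.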

**Code to define the Transformer-based text generation model:**

In [3]:
class TransformerModel(nn.Module):
    def __init__(self, vocab_size, d_model, num_heads, num_layers, dropout=0.1):
        super(TransformerModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoder = nn.Embedding(SEQ_LENGTH, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=num_heads,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dropout=dropout,
        )
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, src):
        # Add positional encoding to the input
        seq_length = src.shape[1]
        pos = torch.arange(0, seq_length).unsqueeze(0).repeat(src.size(0), 1).to(src.device)
        embedded = self.embedding(src) + self.pos_encoder(pos)

        # Transformer expects (sequence_length, batch_size, embedding_dim)
        embedded = embedded.transpose(0, 1)
        transformer_output = self.transformer(embedded, embedded)
        output = self.fc(transformer_output[-1])  # Take the output of the last token
        return output
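
Multi-head attention splits the `d_model`-wide embedding evenly across the heads, so `d_model` must be divisible by `num_heads`. A quick sanity check, using the values set in Step 3 (`D_MODEL = 128`, `NUM_HEADS = 8`), might look like this. The helper `head_dim` is a hypothetical name introduced here for illustration.

```python
def head_dim(d_model, num_heads):
    """Return the per-head width, or raise if the embedding cannot be split evenly."""
    if d_model % num_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by num_heads={num_heads}"
        )
    return d_model // num_heads

# Values taken from the notebook's Step 3 hyperparameters
print(head_dim(128, 8))
```

Running a check like this before building the model gives a clearer error message than the one raised deep inside the attention layer.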

# **Step 3: Training Loop**
We'll now define the training loop to train the model on the dataset.

**Training code:**

In [4]:
# Model parameters
VOCAB_SIZE = len(chars)
D_MODEL = 128
NUM_HEADS = 8
NUM_LAYERS = 4
EPOCHS = 1

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TransformerModel(VOCAB_SIZE, D_MODEL, NUM_HEADS, NUM_LAYERS).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

def train(model, dataloader, optimizer, loss_fn, epochs):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for batch, (x, y) in enumerate(dataloader):
            x, y = x.to(device), y.to(device)

            optimizer.zero_grad()
            output = model(x)
            loss = loss_fn(output, y)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {total_loss / len(dataloader)}")

# Train the model
train(model, dataloader, optimizer, loss_fn, EPOCHS)

# Save model checkpoints
torch.save(model.state_dict(), "transformer_text_gen.pth")



Epoch 1, Loss: 3.3200660777829385
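
To put the reported loss in context, it helps to compare it with the cross-entropy of guessing every character with equal probability, which is `ln(vocab_size)` because `nn.CrossEntropyLoss` uses the natural logarithm. The sketch below assumes a vocabulary of 65 characters; the notebook never prints `VOCAB_SIZE`, so that number is an assumption. The function names are illustrative.

```python
import math

def uniform_baseline(vocab_size):
    """Cross-entropy (in nats) of guessing every character with equal probability."""
    return math.log(vocab_size)

def perplexity(loss):
    """Effective number of equally likely choices implied by a cross-entropy loss."""
    return math.exp(loss)

reported_loss = 3.3200660777829385   # from the training output above
assumed_vocab = 65                   # assumption: VOCAB_SIZE is not printed in the notebook

print(f"uniform baseline: {uniform_baseline(assumed_vocab):.3f}")
print(f"perplexity:       {perplexity(reported_loss):.1f}")
```

A loss only modestly below the uniform baseline suggests the model has learned little beyond rough character frequencies, which is consistent with a single epoch of training.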


# **Step 4: Text Generation**
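To generate text, we feed the trained model a prompt, apply softmax to its output scores, and sample the next character from the resulting probability distribution with `torch.multinomial`. Sampling does not always pick the most likely character, which keeps the output varied.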
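
The difference between sampling from the softmax distribution and always taking the most likely character can be sketched in plain Python. The scores below are made up for a three-character vocabulary, and the `temperature` parameter is an addition for illustration that the notebook's generation code does not use.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Always return the index of the most likely item."""
    return max(range(len(probs)), key=probs.__getitem__)

def sample_pick(probs, rng):
    """Draw one index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]        # made-up scores, not model output
rng = random.Random(0)          # seeded for repeatable draws
probs = softmax(logits)

print(greedy_pick(probs))                            # same index every time
print([sample_pick(probs, rng) for _ in range(10)])  # varies according to probs
print(softmax(logits, temperature=0.5))              # sharper than probs
```

Greedy selection tends to produce repetitive text, while sampling trades some coherence for variety; a temperature below 1 sits between the two.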

**Text generation code:**

In [5]:
def generate_text(model, start_text, gen_length):
    model.eval()
    input_eval = [char_to_idx[s] for s in start_text]
    input_eval = torch.tensor(input_eval, dtype=torch.long).unsqueeze(0).to(device)

    generated_text = start_text
    for _ in range(gen_length):
        with torch.no_grad():
            output = model(input_eval)
            prediction = torch.softmax(output, dim=-1)
            next_char_idx = torch.multinomial(prediction, num_samples=1).item()
            next_char = idx_to_char[next_char_idx]
            generated_text += next_char

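            # Append the new character and keep at most the last SEQ_LENGTH
            # characters, the window size used in training (and the size of
            # the positional embedding table)
            next_token = torch.tensor([[next_char_idx]], dtype=torch.long, device=device)
            input_eval = torch.cat([input_eval, next_token], dim=1)[:, -SEQ_LENGTH:]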

    return generated_text

# Generate a text sample
start_text = "To be, or not to be, "
generated = generate_text(model, start_text, 500)
print(generated)

To be, or not to be, dla
IhfS
le cAh h.:soth lhm!u IhbheesnsarysoVempwfCrei gt,np neipitS
vse Vetil gatpvrYy tmaua  etYctytonR srte 
kuar eb
ttwa o uoH hrw,fDtcwlr 
tr, cumie  Fo  ethnuteshekteb oRt :s
 tnucw
nnleeh,Waeymd, eHrnpI meta asa Elne zLCAt . t aukheWafl eb Rouydielot,Npe i,hcnib.Les

oueheM,,nwo   f,taacunttFslto   
hoyy i
,eA  rcohslh,l hfrydn tdsuaeehdcscwieehog dwirosmarroedeasl u wO iike,ieb!M nDe'L'fe Heyaasmmus nBoIft gth asy
b sr 
oesshm eeet, PsUscuo
 ? a;tl:v,gietsn et iode a ra
teuadsbI 
hri
  s


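Generation quality should improve with more training epochs, but each additional epoch requires substantial compute. After the single epoch run here (loss of about 3.32), the sample is still close to random characters.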