In [1]:
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
import numpy as np

2025-10-07 14:47:57.340271: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

This cell imports required libraries:

- `tensorflow` for building and training the model.
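- `pad_sequences` and `to_categorical` from Keras for data preprocessing. `to_categorical` one-hot encodes the targets; `pad_sequences` is imported but not used in the cells that follow.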
- `numpy` for numerical operations and array handling.

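Expectations: these are standard imports, and TensorFlow must be installed in the environment. A GPU is optional; in the run recorded here TensorFlow found no CUDA drivers and fell back to the CPU. On your own machine, `nvidia-smi` and TensorFlow's startup logs show whether a GPU is available.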

In [2]:
path_to_file = tf.keras.utils.get_file("shakespeare.txt", "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt")

with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')

This cell downloads a sample text file (tiny Shakespeare) and reads it into memory as a single string `text`.

Notes:
- `tf.keras.utils.get_file` caches the download so repeated runs are faster.
- The file is opened in binary mode and the bytes are decoded as UTF-8 explicitly, so the result does not depend on the platform's default encoding.
- `text` will be used to create character-level sequences for training.

In [3]:
chars = sorted(set(text))                       # unique characters, in sorted order
char2idx = {c: i for i, c in enumerate(chars)}  # character -> integer index
idx2char = np.array(chars)                      # integer index -> character

This cell builds character-level mappings:

- `chars` is the sorted list of unique characters present in `text`.
- `char2idx` maps each character to a unique integer index (used for encoding).
- `idx2char` is the reverse mapping (array indexed by integer to retrieve the character).

These mappings convert the raw text into integer sequences that can be fed into the model.
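To make the round trip concrete, here is a minimal sketch on a toy string. `sample_text` is a hypothetical stand-in for the notebook's `text`, and a plain list replaces the NumPy array for `idx2char` so the snippet needs no imports.

```python
# Toy stand-in for the notebook's `text`; any string works here.
sample_text = "hello world"

chars = sorted(set(sample_text))                # unique characters, sorted
char2idx = {c: i for i, c in enumerate(chars)}  # character -> integer index
idx2char = list(chars)                          # integer index -> character

# Encode the string to indices, then decode it back.
encoded = [char2idx[c] for c in sample_text]
decoded = "".join(idx2char[i] for i in encoded)

print(chars)                   # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(encoded)                 # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print(decoded == sample_text)  # True
```

Because `chars` is sorted, the index assigned to each character is reproducible across runs on the same text.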

In [4]:
seq_length = 100
sequences = []
for i in range(len(text) - seq_length):
    sequences.append([char2idx[c] for c in text[i:i+seq_length]])

This cell creates training sequences of length `seq_length` characters:

- For each position `i`, it takes the slice `text[i:i+seq_length]` and converts it to integer indices using `char2idx`.
- Each sequence is a fixed-length list of `seq_length` indices representing one character window.
- `sequences` is used next to build input-target pairs: the first `seq_length-1` indices of each window are the input and the last index is the target.
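The windowing is easier to see on a tiny input. The sketch below reuses the same loop on a hypothetical eight-character string with a window of 4; the `toy_*` names are illustrative and not part of the notebook.

```python
# Hypothetical toy inputs; the notebook uses the full text and seq_length = 100.
toy_text = "abcdefgh"
toy_seq_length = 4
toy_char2idx = {c: i for i, c in enumerate(sorted(set(toy_text)))}

# Same sliding-window loop as the cell above, stepping one character at a time.
windows = []
for i in range(len(toy_text) - toy_seq_length):
    windows.append([toy_char2idx[c] for c in toy_text[i:i + toy_seq_length]])

# Show each window as (input indices) -> (target index).
for w in windows:
    print(w[:-1], "->", w[-1])

print(len(windows))  # 4
```

Consecutive windows overlap in all but one character, so the data grows to roughly `seq_length` times the size of the text. The loop bound `len(text) - seq_length` also stops one window short of the end: here the final window `"efgh"` is never produced.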

In [5]:
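X = np.array([seq[:-1] for seq in sequences])   # inputs: first seq_length-1 indices
y = np.array([seq[-1] for seq in sequences])    # targets: last index of each window
y = to_categorical(y, num_classes=len(chars))   # one-hot encode the targets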

This cell converts `sequences` into model inputs `X` and targets `y`:

- `X` contains all sequences except the last character of each window (shape: `[num_sequences, seq_length-1]`).
- `y` contains the last character index for each sequence (shape: `[num_sequences]`).
- `to_categorical` turns `y` into a one-hot encoded matrix of shape `[num_sequences, len(chars)]`, suitable for categorical cross-entropy loss.
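The shapes can be checked on a handful of toy windows. This sketch uses NumPy only: indexing rows of `np.eye` stands in for `to_categorical` and is an assumption made so the snippet runs without TensorFlow. The `toy_sequences` values are made up.

```python
import numpy as np

# Three made-up windows of length 4 over a 5-character vocabulary.
toy_sequences = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
num_classes = 5

X_toy = np.array([seq[:-1] for seq in toy_sequences])  # inputs
y_idx = np.array([seq[-1] for seq in toy_sequences])   # integer targets

# Stand-in for to_categorical: pick row k of the identity matrix for class k.
y_toy = np.eye(num_classes, dtype="float32")[y_idx]

print(X_toy.shape)  # (3, 3)
print(y_toy.shape)  # (3, 5)
print(y_toy[0])     # one-hot row with a 1 at index 3
```

Each row of `y_toy` contains a single 1 at the target index, which is the layout categorical cross-entropy expects.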

In [6]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(chars), 64),
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(len(chars), activation='softmax')
])

E0000 00:00:1759828696.711798    8148 cuda_executor.cc:1228] INTERNAL: CUDA Runtime error: Failed call to cudaGetRuntimeVersion: Error loading CUDA libraries. GPU will not be used.: Error loading CUDA libraries. GPU will not be used.
W0000 00:00:1759828696.716671    8148 gpu_device.cc:2341] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


This cell constructs the Keras model:

- `Embedding(len(chars), 64)` maps character indices (0..len(chars)-1) to 64-dimensional vectors. This layer learns character embeddings during training.
- Two stacked `LSTM(256)` layers model temporal dependencies; the first returns sequences to feed the second.
- The final `Dense(len(chars), activation='softmax')` outputs a probability distribution over the next character.

Notes:
- `Embedding` is created without `input_length`, which avoids the deprecation warning that recent Keras versions emit for that argument. The layer expects integer index sequences as input.
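As a rough size check, the trainable parameter count of this stack can be worked out by hand. The sketch below assumes a vocabulary of 65 characters (the notebook never prints `len(chars)`, so substitute your own value) and the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel and one bias vector. `lstm_params` is a hypothetical helper, not a Keras function.

```python
vocab_size = 65   # assumption: replace with len(chars) from your run
embed_dim = 64
units = 256

def lstm_params(input_dim, units):
    """Parameters of one LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * ((input_dim + units) * units + units)

embedding = vocab_size * embed_dim        # one 64-d vector per character
lstm_1 = lstm_params(embed_dim, units)    # first LSTM reads the embeddings
lstm_2 = lstm_params(units, units)        # second LSTM reads the first one's outputs
dense = units * vocab_size + vocab_size   # weights plus biases

total = embedding + lstm_1 + lstm_2 + dense
print(embedding, lstm_1, lstm_2, dense)   # 4160 328704 525312 16705
print(total)                              # 874881
```

Under these assumptions almost all of the capacity sits in the two LSTM layers. The figures can be compared against `model.summary()` once the model has been built.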

In [8]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=1, batch_size=256)

2025-10-07 14:48:32.135961: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 883312848 exceeds 10% of free system memory.
2025-10-07 14:48:32.997997: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 289976440 exceeds 10% of free system memory.


4357/4357 ━━━━━━━━━━━━━━━━━━━━ 2556s 586ms/step - accuracy: 0.3675 - loss: 2.2280


<keras.src.callbacks.history.History at 0x76fbcee995e0>

This cell compiles and trains the model:

- `loss='categorical_crossentropy'` matches the one-hot encoded targets (`y`).
- `optimizer='adam'` provides adaptive learning rates.
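- `metrics=['accuracy']` reports next-character accuracy during training. After one epoch the run above reaches about 0.37 accuracy with a loss of 2.23; accuracy tends to stay modest for character prediction, so the loss is the more informative signal.
- `model.fit(...)` trains for the specified number of epochs and batch size. In this CPU-only run a single epoch took 2556 s (about 586 ms per step over 4357 steps); a GPU would typically shorten this.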
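The two allocation warnings and the step count in the output above can be cross-checked with a little arithmetic. The sketch assumes that the first allocation is `X` stored as 8-byte integers and the second is `y` stored as 4-byte floats; both are inferences from the numbers, not something the log states.

```python
import math

x_bytes = 883_312_848   # first allocation warning in the training output
y_bytes = 289_976_440   # second allocation warning
seq_length = 100
batch_size = 256

# Assumption: X is int64, i.e. 8 bytes per index, with seq_length-1 columns.
num_sequences, x_rem = divmod(x_bytes, (seq_length - 1) * 8)

# Assumption: y is float32, i.e. 4 bytes per one-hot entry.
vocab_size, y_rem = divmod(y_bytes, num_sequences * 4)

steps_per_epoch = math.ceil(num_sequences / batch_size)

print(num_sequences, x_rem)   # 1115294 0
print(vocab_size, y_rem)      # 65 0
print(steps_per_epoch)        # 4357
```

Both divisions come out exact and the step count matches the `4357/4357` progress line, which supports the assumed dtypes. It also shows that `X` and `y` together occupy more than 1 GB of memory, which is what triggers the warnings.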

In [9]:
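def generate_text(seed_text, num_chars):
    """Extend `seed_text` by `num_chars` greedily predicted characters."""
    for _ in range(num_chars):
        # Encode at most the last seq_length-1 characters, matching the training input width.
        seed_seq = np.array([char2idx[c] for c in seed_text[-(seq_length - 1):]])
        seed_seq = seed_seq.reshape(1, -1)      # add a batch dimension
        pred = model.predict(seed_seq)          # shape: (1, len(chars))
        next_char = idx2char[np.argmax(pred)]   # greedy choice: most likely character
        seed_text += next_char
    return seed_text

print(generate_text("To be, or not to be, that is the question:", 100))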


This cell defines a simple text generation helper:

- `generate_text(seed_text, num_chars)` iteratively predicts the next character and appends it to the input seed.
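- It converts up to the last `seq_length-1` characters of the current text to indices and reshapes them to `(1, n)`, where `n` is at most `seq_length-1`. The example seed is shorter than that, so the first predictions use less context than the model saw during training.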
- `model.predict` returns a probability vector over characters; `np.argmax` selects the most likely next character.
- The function appends each predicted character to `seed_text` and returns the generated string.

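Limitations: this is greedy decoding (always the argmax), so the output is deterministic for a given seed and tends to fall into repetitive loops. For more varied text, sample from the predicted distribution, for example with a temperature parameter, instead of taking the argmax. The seed must also contain only characters that occur in `text`; any other character raises a `KeyError` in `char2idx`.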
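One possible way to add temperature sampling is sketched below. `sample_with_temperature` is a hypothetical helper, not part of the notebook or of Keras; it takes a probability vector such as one row of `model.predict` output and returns a sampled index. The toy `probs` vector is made up for the demonstration.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw an index from `probs` after rescaling it by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64).ravel()
    # Rescale in log space; the small constant avoids log(0).
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()            # numerical stability before exp
    scaled = np.exp(logits)
    scaled /= scaled.sum()            # renormalise to a probability vector
    return int(rng.choice(len(scaled), p=scaled))

# Made-up distribution over three characters.
probs = np.array([0.1, 0.6, 0.3])
rng = np.random.default_rng(0)

cold = [sample_with_temperature(probs, 0.01, rng) for _ in range(20)]
warm = [sample_with_temperature(probs, 1.5, rng) for _ in range(20)]
print(cold)   # very low temperature behaves like argmax: index 1 every time
print(warm)   # higher temperature mixes in the other indices
```

In `generate_text`, the greedy line could then become something like `next_char = idx2char[sample_with_temperature(pred[0], temperature=0.8)]`, where 0.8 is an arbitrary starting value. Passing `verbose=0` to `model.predict` also silences the per-character progress bars.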

## Notebook summary and next steps

This notebook demonstrates a simple character-level language model using LSTM:

- Data: tiny Shakespeare dataset loaded and converted to character-index sequences.
- Model: Embedding -> LSTM -> LSTM -> Dense(softmax) to predict next character.
- Training: categorical cross-entropy with one-hot targets.
- Generation: greedy sampling using argmax; consider temperature sampling for variety.

Next steps:
- Train for more epochs (this run used one) and with more data for more coherent output.
- Use temperature-based sampling for richer generation.
- Save and load model weights for repeated experiments.