# GPT text generation from scratch with KerasHub

**Author:** [Jesse Chan](https://github.com/jessechancy)<br>
**Date created:** 2022/07/25<br>
**Last modified:** 2022/07/25<br>
**Description:** Using KerasHub to train a mini-GPT model for text generation.

## Introduction

In this example, we will use KerasHub to build a scaled-down Generative
Pre-trained Transformer (GPT) model. GPT is a Transformer-based model that generates
sophisticated text from a prompt.

We will train the model on the [simplebooks-92](https://arxiv.org/abs/1911.12391) corpus,
which is a dataset made from several novels. It is a good dataset for this example since
it has a small vocabulary and high word frequency, which is beneficial when training a
model with few parameters.

This example combines concepts from
[Text generation with a miniature GPT](https://keras.io/examples/generative/text_generation_with_miniature_gpt/)
with KerasHub abstractions. We will demonstrate how KerasHub tokenization, layers, and
metrics simplify the training process, and then show how to generate output text using
the KerasHub sampling utilities.

Note: If you are running this example on Colab,
make sure to enable a GPU runtime for faster training.

This example requires KerasHub. You can install it via the following command:
`pip install keras-hub`

## Setup

In [1]:
!pip install -q --upgrade keras-hub
!pip install -q --upgrade keras  # Upgrade to Keras 3.

In [2]:
import os
import keras_hub
import keras

import tensorflow.data as tf_data
import tensorflow.strings as tf_strings

## Settings & hyperparameters

In [3]:
# Data
BATCH_SIZE = 64
MIN_STRING_LEN = 512  # Strings shorter than this will be discarded
SEQ_LEN = 128  # Length of training sequences, in tokens

# Model
EMBED_DIM = 256
FEED_FORWARD_DIM = 128
NUM_HEADS = 3
NUM_LAYERS = 2
VOCAB_SIZE = 5000  # Limits parameters in model.

# Training
EPOCHS = 5

# Inference
NUM_TOKENS_TO_GENERATE = 80

## Load the data

Now, let's download the dataset! The SimpleBooks dataset consists of 1,573 Gutenberg books, and has
one of the smallest ratios of vocabulary size to word-level tokens. Its vocabulary of ~98k words is
about a third of WikiText-103's, with around the same number of tokens (~100M). This makes it easy to fit a small model.

In [4]:
keras.utils.get_file(
    origin="https://dldata-public.s3.us-east-2.amazonaws.com/simplebooks.zip",
    extract=True,
)
dir = os.path.expanduser("~/.keras/datasets/simplebooks/")

# Load simplebooks-92 train set and filter out short lines.
raw_train_ds = (
    tf_data.TextLineDataset(dir + "simplebooks-92-raw/train.txt")
    .filter(lambda x: tf_strings.length(x) > MIN_STRING_LEN)
    .batch(BATCH_SIZE)
    .shuffle(buffer_size=256)
)

# Load simplebooks-92 validation set and filter out short lines.
raw_val_ds = (
    tf_data.TextLineDataset(dir + "simplebooks-92-raw/valid.txt")
    .filter(lambda x: tf_strings.length(x) > MIN_STRING_LEN)
    .batch(BATCH_SIZE)
)

Downloading data from https://dldata-public.s3.us-east-2.amazonaws.com/simplebooks.zip
282386239/282386239 ━━━━━━━━━━━━━━━━━━━━ 7s 0us/step


## Train the tokenizer

We train the tokenizer on the training dataset with a vocabulary size of `VOCAB_SIZE`,
which is a tuned hyperparameter. We want to limit the vocabulary as much as possible
since, as we will see later on, it has a large effect on the number of model parameters.
We also don't want to include *too few* vocabulary terms, or there would be too many
out-of-vocabulary (OOV) sub-words. In addition, three tokens are reserved in the
vocabulary:

- `"[PAD]"` for padding sequences to `SEQ_LEN`. This token has index 0 in both
`reserved_tokens` and `vocab`, since `WordPieceTokenizer` (and other layers) consider
`0`/`vocab[0]` as the default padding.
- `"[UNK]"` for OOV sub-words, which should match the default `oov_token="[UNK]"` in
`WordPieceTokenizer`.
- `"[BOS]"` stands for beginning of sentence, but here technically it is a token
representing the beginning of each line of training data.
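
To make the role of the reserved tokens concrete, here is a minimal pure-Python sketch of greedy longest-match sub-word splitting with `"[UNK]"` fallback and `"[PAD]"` padding. It is an illustration only, not the `WordPieceTokenizer` implementation: the tiny vocabulary, the `wordpiece_word` and `tokenize` helpers, and the whitespace splitting are all assumptions made for this example.

```python
# Toy vocabulary: reserved tokens come first, so "[PAD]" gets index 0.
# The word pieces after the reserved tokens are made up for illustration.
vocab = ["[PAD]", "[UNK]", "[BOS]", "the", "play", "##ing", "##ed", "cat"]
token_to_id = {token: i for i, token in enumerate(vocab)}


def wordpiece_word(word):
    """Greedy longest-match-first split of one word into sub-word ids."""
    ids, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a "##" prefix
            if piece in token_to_id:
                match = token_to_id[piece]
                break
            end -= 1
        if match is None:
            # No piece matched: the whole word maps to the OOV token.
            return [token_to_id["[UNK]"]]
        ids.append(match)
        start = end
    return ids


def tokenize(text, sequence_length):
    """Lowercase, split on whitespace, then truncate or pad with "[PAD]"."""
    ids = [i for word in text.lower().split() for i in wordpiece_word(word)]
    ids = ids[:sequence_length]
    return ids + [token_to_id["[PAD]"]] * (sequence_length - len(ids))


print(tokenize("The cat played", sequence_length=6))
print(tokenize("The dog playing", sequence_length=6))
```

In this toy setup, "played" splits into `play` + `##ed`, while "dog" has no matching pieces and collapses to the single `"[UNK]"` id.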

In [5]:
import os
import zipfile
import keras

# Get the current working directory.
cwd = os.getcwd()
# Download the archive without extracting it. With `cache_dir=cwd`, the file is
# cached under the `datasets/` subdirectory of the current working directory.
file_path = keras.utils.get_file(
    fname="simplebooks.zip",
    origin="https://dldata-public.s3.us-east-2.amazonaws.com/simplebooks.zip",
    extract=False,
    cache_dir=cwd,
)
# Extract the zip file manually into the current working directory.
with zipfile.ZipFile(file_path, "r") as zip_ref:
    zip_ref.extractall(cwd)
# Point the dataset directory at the extracted folder.
dir = os.path.join(cwd, "simplebooks/")
# Load simplebooks-92 train set and filter out short lines.
raw_train_ds = (
    tf_data.TextLineDataset(dir + "simplebooks-92-raw/train.txt")
    .filter(lambda x: tf_strings.length(x) > MIN_STRING_LEN)
    .batch(BATCH_SIZE)
    .shuffle(buffer_size=256)
)
# Load simplebooks-92 validation set and filter out short lines.
raw_val_ds = (
    tf_data.TextLineDataset(dir + "simplebooks-92-raw/valid.txt")
    .filter(lambda x: tf_strings.length(x) > MIN_STRING_LEN)
    .batch(BATCH_SIZE)
)
print(f"Dataset extracted to: {dir}")

Downloading data from https://dldata-public.s3.us-east-2.amazonaws.com/simplebooks.zip
282386239/282386239 ━━━━━━━━━━━━━━━━━━━━ 6s 0us/step
Dataset extracted to: /content/simplebooks/


In [6]:
# Train tokenizer vocabulary
vocab = keras_hub.tokenizers.compute_word_piece_vocabulary(
    raw_train_ds,
    vocabulary_size=VOCAB_SIZE,
    lowercase=True,
    reserved_tokens=["[PAD]", "[UNK]", "[BOS]"],
)

## Load tokenizer

We use the vocabulary data to initialize
`keras_hub.tokenizers.WordPieceTokenizer`. WordPieceTokenizer is an efficient
implementation of the WordPiece algorithm used by BERT and other models. It will strip,
lower-case and do other irreversible preprocessing operations.

In [7]:
tokenizer = keras_hub.tokenizers.WordPieceTokenizer(
    vocabulary=vocab,
    sequence_length=SEQ_LEN,
    lowercase=True,
)

## Tokenize data

We preprocess the dataset by tokenizing and splitting it into `features` and `labels`.

In [8]:
# packer adds a start token
start_packer = keras_hub.layers.StartEndPacker(
    sequence_length=SEQ_LEN,
    start_value=tokenizer.token_to_id("[BOS]"),
)


def preprocess(inputs):
    outputs = tokenizer(inputs)
    features = start_packer(outputs)
    labels = outputs
    return features, labels


# Tokenize and split into train and label sequences.
train_ds = raw_train_ds.map(preprocess, num_parallel_calls=tf_data.AUTOTUNE).prefetch(
    tf_data.AUTOTUNE
)
val_ds = raw_val_ds.map(preprocess, num_parallel_calls=tf_data.AUTOTUNE).prefetch(
    tf_data.AUTOTUNE
)
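
The relationship between `features` and `labels` can be sketched with plain lists. This is a simplified stand-in for the tokenizer and packer above, assuming `"[PAD]"` has id 0 and `"[BOS]"` has id 2; the `make_example` helper and the shortened sequence length are hypothetical and exist only for this illustration.

```python
PAD, BOS = 0, 2  # ids assumed to follow the reserved-token order
SEQ_LEN = 6      # shortened from 128 so the lists are easy to read


def make_example(token_ids, seq_len=SEQ_LEN):
    # Labels: the tokenized line, truncated or padded to seq_len.
    labels = (token_ids + [PAD] * seq_len)[:seq_len]
    # Features: the same line with [BOS] prepended, then truncated or padded.
    features = ([BOS] + token_ids + [PAD] * seq_len)[:seq_len]
    return features, labels


features, labels = make_example([11, 12, 13, 14])
print("features:", features)
print("labels:  ", labels)
```

At every position, the label is the token that follows the feature prefix, so the model learns to predict the next token given everything before it.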

## Build the model

We create our scaled down GPT model with the following layers:

- One `keras_hub.layers.TokenAndPositionEmbedding` layer, which combines the embedding
for the token and its position.
- Multiple `keras_hub.layers.TransformerDecoder` layers, with the default causal masking.
The layer has no cross-attention when run with decoder sequence only.
- One final dense linear layer

In [9]:
inputs = keras.layers.Input(shape=(None,), dtype="int32")
# Embedding.
embedding_layer = keras_hub.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=SEQ_LEN,
    embedding_dim=EMBED_DIM,
    mask_zero=True,
)
x = embedding_layer(inputs)
# Transformer decoders.
for _ in range(NUM_LAYERS):
    decoder_layer = keras_hub.layers.TransformerDecoder(
        num_heads=NUM_HEADS,
        intermediate_dim=FEED_FORWARD_DIM,
    )
    x = decoder_layer(x)  # Giving one argument only skips cross-attention.
# Output.
outputs = keras.layers.Dense(VOCAB_SIZE)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
perplexity = keras_hub.metrics.Perplexity(from_logits=True, mask_token_id=0)
model.compile(optimizer="adam", loss=loss_fn, metrics=[perplexity])

Let's take a look at our model summary - a large majority of the
parameters are in the `token_and_position_embedding` and the output `dense` layer!
This means that the vocabulary size (`VOCAB_SIZE`) has a large effect on the size of the model,
while the number of Transformer decoder layers (`NUM_LAYERS`) doesn't affect it as much.

In [10]:
model.summary()
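
As a rough cross-check of where the parameters live, the counts can be estimated by hand from the hyperparameters. The embedding and output-layer terms follow directly from the layer shapes. The decoder estimate assumes a per-head size of `EMBED_DIM // NUM_HEADS`, biases on every projection, and two layer normalizations per decoder layer, so treat it as a back-of-the-envelope sketch and compare it against the `model.summary()` output.

```python
VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 5000, 128, 256
FEED_FORWARD_DIM, NUM_HEADS, NUM_LAYERS = 128, 3, 2

# Token embedding table plus position embedding table.
embedding_params = VOCAB_SIZE * EMBED_DIM + SEQ_LEN * EMBED_DIM
# Output Dense layer: kernel plus bias.
dense_params = EMBED_DIM * VOCAB_SIZE + VOCAB_SIZE

# Assumption: per-head size is EMBED_DIM // NUM_HEADS, with biases everywhere.
head_dim = EMBED_DIM // NUM_HEADS
qkv = 3 * (EMBED_DIM * NUM_HEADS * head_dim + NUM_HEADS * head_dim)
attn_out = NUM_HEADS * head_dim * EMBED_DIM + EMBED_DIM
feed_forward = (EMBED_DIM * FEED_FORWARD_DIM + FEED_FORWARD_DIM) + (
    FEED_FORWARD_DIM * EMBED_DIM + EMBED_DIM
)
layer_norms = 2 * (2 * EMBED_DIM)  # two layer norms, each with scale and offset
decoder_params = NUM_LAYERS * (qkv + attn_out + feed_forward + layer_norms)

total = embedding_params + dense_params + decoder_params
print(f"embedding: {embedding_params:,}")
print(f"decoders:  {decoder_params:,}")
print(f"dense:     {dense_params:,}")
print(f"total:     {total:,}")
print(f"embedding + dense share: {(embedding_params + dense_params) / total:.0%}")
```

Both the token embedding and the output layer scale linearly with `VOCAB_SIZE`, which is why shrinking the vocabulary shrinks the model so effectively.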

## Training

Now that we have our model, let's train it with the `fit()` method.

In [11]:
model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)

Epoch 1/5
2445/2445 ━━━━━━━━━━━━━━━━━━━━ 162s 60ms/step - loss: 4.9993 - perplexity: 179.0399 - val_loss: 4.2581 - val_perplexity: 70.8194
Epoch 2/5
2445/2445 ━━━━━━━━━━━━━━━━━━━━ 144s 55ms/step - loss: 4.1725 - perplexity: 64.9378 - val_loss: 4.0726 - val_perplexity: 58.8025
Epoch 3/5
2445/2445 ━━━━━━━━━━━━━━━━━━━━ 133s 52ms/step - loss: 4.0359 - perplexity: 56.6336 - val_loss: 4.0471 - val_perplexity: 57.3419
Epoch 4/5
2445/2445 ━━━━━━━━━━━━━━━━━━━━ 141s 52ms/step - loss: 3.9650 - perplexity: 52.7517 - val_loss: 4.0162 - val_perplexity: 55.5328
Epoch 5/5
2445/2445 ━━━━━━━━━━━━━━━━━━━━ 141s 51ms/step - loss: 3.9163 - perplexity: 50.2445 - val_loss: 4.0287 - val_perplexity: 56.2439


<keras.src.callbacks.history.History at 0x7f370e590940>

## Inference

With our trained model, we can test it out to gauge its performance. To do this
we can seed our model with an input sequence starting with the `"[BOS]"` token,
and progressively sample the model by making predictions for each subsequent
token in a loop.

To start, let's build a prompt with the same shape as our model inputs, containing
only the `"[BOS]"` token.

In [12]:
# The "packer" layer adds the [BOS] token for us.
prompt_tokens = start_packer(tokenizer([""]))
prompt_tokens

<tf.Tensor: shape=(1, 128), dtype=int32, numpy=
array([[2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32)>

We will use the `keras_hub.samplers` module for inference, which requires a
callback function wrapping the model we just trained. This wrapper calls
the model and returns the logit predictions for the current token we are
generating.

Note: There are two pieces of more advanced functionality available when
defining your callback. The first is the ability to take in a `cache` of states
computed in previous generation steps, which can be used to speed up generation.
The second is the ability to output the final dense "hidden state" of each
generated token. This is used by `keras_hub.samplers.ContrastiveSampler`, which
avoids repetition by penalizing repeated hidden states. Both are optional, and
we will ignore them for now.

In [13]:

def next(prompt, cache, index):
    logits = model(prompt)[:, index - 1, :]
    # Ignore hidden states for now; only needed for contrastive search.
    hidden_states = None
    return logits, hidden_states, cache


Creating the wrapper function is the most complex part of using these functions. Now that
it's done, let's test out the different utilities, starting with greedy search.

### Greedy search

We greedily pick the most probable token at each timestep. In other words, we get the
argmax of the model output.

In [14]:
sampler = keras_hub.samplers.GreedySampler()
output_tokens = sampler(
    next=next,
    prompt=prompt_tokens,
    index=1,  # Start sampling immediately after the [BOS] token.
)
txt = tokenizer.detokenize(output_tokens)
print(f"Greedy search generated text: \n{txt}\n")

Greedy search generated text: 
['[BOS] " i have been thinking of the matter , " he said , " but i have been a good deal more than i have been , and i have been a good deal more than i have been in the habit of doing it . i have been a good deal more comfortable , and i have been a good deal more comfortable , and i have been a good deal more comfortable , and i have been a good deal more comfortable than i have been . i have been a good deal worse than i have been , but i have been a good deal worse than i have been , and i have been very much disappointed , and i have been very']



As you can see, greedy search starts out making some sense, but quickly starts repeating
itself. This is a common problem with text generation that can be fixed by some of the
probabilistic text generation utilities shown later on!
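
The repetition is easy to reproduce with a toy example. The sketch below is not the `GreedySampler` implementation: it uses a hypothetical bigram "model" (a hand-written logit table keyed by the previous token) and a small `greedy_generate` helper to show how always taking the argmax can lock generation into a loop.

```python
# Hypothetical bigram "model": logits for the next token given the last token.
VOCAB = ["[BOS]", "i", "have", "been", "good"]
LOGITS = {
    0: [0.0, 3.0, 1.0, 0.5, 0.2],  # after "[BOS]", "i" scores highest
    1: [0.0, 0.1, 2.5, 0.3, 1.0],  # after "i", "have" scores highest
    2: [0.0, 0.2, 0.1, 2.0, 1.5],  # after "have", "been" scores highest
    3: [0.0, 2.2, 0.3, 0.1, 2.0],  # after "been", "i" narrowly beats "good"
    4: [0.0, 1.0, 0.4, 0.3, 0.1],  # after "good", "i" scores highest
}


def greedy_generate(num_tokens):
    tokens = [0]  # start from the "[BOS]" id
    for _ in range(num_tokens):
        logits = LOGITS[tokens[-1]]
        # Greedy step: take the index of the largest logit.
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


print(" ".join(VOCAB[t] for t in greedy_generate(7)))
```

Because "i" narrowly wins after "been", the sequence cycles through "i have been" forever, even though "good" was almost as likely.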

### Beam search

At a high-level, beam search keeps track of the `num_beams` most probable sequences at
each timestep, and predicts the best next token from all sequences. It is an improvement
over greedy search since it stores more possibilities. However, it is less efficient than
greedy search since it has to compute and store multiple potential sequences.

**Note:** beam search with `num_beams=1` is identical to greedy search.

In [15]:
sampler = keras_hub.samplers.BeamSampler(num_beams=10)
output_tokens = sampler(
    next=next,
    prompt=prompt_tokens,
    index=1,
)
txt = tokenizer.detokenize(output_tokens)
print(f"Beam search generated text: \n{txt}\n")

Beam search generated text: 
['[BOS] " i don \' t know , " he said , " but i don \' t know what it means , and i don \' t know what it means , but i don \' t know what it means . i don \' t think it \' s better , but i don \' t think it \' s better . i don \' t think it \' s better , but i don \' t think it \' s better , but i don \' t think it \' s better . i don \' t think it \' s better , but i don \' t think it \' s likely enough , but i \' m not sure , but i \'']



Similar to greedy search, beam search quickly starts repeating itself, since it is still
a deterministic method.
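
For intuition, here is a minimal pure-Python beam search over a hypothetical three-token model. It is a sketch under simplifying assumptions (fixed-length generation, no length normalization, a hand-written `TABLE` of logits), and the details of `BeamSampler` may differ. The table is chosen so that the greedy first step leads to a worse overall sequence.

```python
import math


def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


# Hypothetical model: next-token logits depend only on the last token.
TABLE = {
    0: [-9.0, math.log(0.6), math.log(0.4)],
    1: [-9.0, math.log(0.55), math.log(0.45)],
    2: [-9.0, math.log(0.9), math.log(0.1)],
}


def next_logits(tokens):
    return TABLE[tokens[-1]]


def beam_search(next_logits, start, num_tokens, num_beams):
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(num_tokens):
        candidates = []
        for tokens, score in beams:
            for token, logp in enumerate(log_softmax(next_logits(tokens))):
                candidates.append((tokens + [token], score + logp))
        # Keep only the num_beams highest-scoring sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]


print("num_beams=1:", beam_search(next_logits, 0, num_tokens=2, num_beams=1))
print("num_beams=2:", beam_search(next_logits, 0, num_tokens=2, num_beams=2))
```

With one beam, the search commits to token 1 first (probability about 0.6) and ends with a sequence probability of about 0.33. With two beams, it keeps token 2 alive and finds the sequence with probability about 0.36.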

### Random search

Random search is our first probabilistic method. At each time step, it samples the next
token using the softmax probabilities provided by the model.

In [16]:
sampler = keras_hub.samplers.RandomSampler()
output_tokens = sampler(
    next=next,
    prompt=prompt_tokens,
    index=1,
)
txt = tokenizer.detokenize(output_tokens)
print(f"Random search generated text: \n{txt}\n")

Random search generated text: 
["[BOS] guessed wonder that his affection of his person could not see every fellow - - but he was certain that he would or he turned into farmer brown ' s boy , but they seemed to have thought of his time to reach the tree . make him happy . he thought the animals would be away from the cliffs ( you gave him ) ; he was socks they for whom he had only wanted to know , and the matter didn ' t seem as it usually too in the way of ever , but he could know better . when it was there , the exertions that is all the world , or you may have brought up enough"]



Voilà, no repetitions! However, with random search, we may see some nonsensical words
appearing since any word in the vocabulary has a chance of appearing with this sampling
method. This is fixed by our next search utility, top-k search.
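
A single random-sampling step can be sketched in a few lines of standard-library Python. The `softmax` and `sample_next` helpers and the example logits are assumptions for illustration, not the `RandomSampler` code. The point is that even the least likely token is drawn occasionally.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, rng):
    # Draw one token id with probability proportional to its softmax weight.
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
logits = [2.0, 1.0, 0.0, -1.0]
counts = [0] * len(logits)
for _ in range(10_000):
    counts[sample_next(logits, rng)] += 1

print("softmax probabilities:", [round(p, 3) for p in softmax(logits)])
print("empirical frequencies:", [c / 10_000 for c in counts])
```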

### Top-K search

Similar to random search, we sample the next token from the probability distribution
provided by the model. The only difference is that here, we keep only the top `k` most
probable tokens and redistribute the probability mass over them before sampling. This way,
we never sample from low-probability tokens, so we should see fewer nonsensical
words!

In [17]:
sampler = keras_hub.samplers.TopKSampler(k=10)
output_tokens = sampler(
    next=next,
    prompt=prompt_tokens,
    index=1,
)
txt = tokenizer.detokenize(output_tokens)
print(f"Top-K search generated text: \n{txt}\n")

Top-K search generated text: 
['[BOS] it was a beautiful morning , and a few hours passed by , the little girl had been in the morning of the palace . she had taken up the lamp to find the room , and then she saw them going to a palace , which had been built . the palace was very beautiful . she was dressed in the center and the courtyard . it was a beautiful picture - - a picture . it was a beautiful sight ; the beautiful blue and gold screen into the palace . in the center of the court palace , the palace was beautiful , white and yellow , and gold . there was a lovely blue silk gown , and a']
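
The filtering step can be sketched in pure Python. This is an illustrative version with a hypothetical `top_k_sample` helper, not the `TopKSampler` implementation: it keeps the `k` largest logits, renormalizes over them, and samples from that reduced set.

```python
import math
import random


def top_k_sample(logits, k, rng):
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only, so their probabilities sum to 1.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.0, -1.0]
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print("token ids drawn with k=2:", sorted(set(samples)))
```

Setting `k=1` reduces this to greedy search, while `k` equal to the vocabulary size reduces it to plain random sampling.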



### Top-P search

Even with top-k search, there is room for improvement. The number `k` is fixed, which
means it selects the same number of tokens for any probability distribution. Consider
two scenarios: one where the probability mass is concentrated over 2 words, and another
where it is spread evenly across 10. Should we choose `k=2` or `k=10`? No single `k`
fits both cases.

This is where top-p search comes in! Instead of choosing a `k`, we choose a probability
`p` that we want the probabilities of the top tokens to sum up to. This way, the
effective `k` adjusts dynamically to the probability distribution. With `p=0.9`, if
90% of the probability mass is concentrated on the top 2 tokens, we sample from only
those 2 tokens. If instead the 90% is spread over 10 tokens, we sample from those
10 tokens.

In [18]:
sampler = keras_hub.samplers.TopPSampler(p=0.5)
output_tokens = sampler(
    next=next,
    prompt=prompt_tokens,
    index=1,
)
txt = tokenizer.detokenize(output_tokens)
print(f"Top-P search generated text: \n{txt}\n")

Top-P search generated text: 
['[BOS] " i will give you to the court of the court . i have not heard this in a large room . i shall have no idea of having been profitable , and have taken your steps . i shall see you again , and , at any time i shall not have to be able to understand what it is necessary . i shall have to say that the king \' s orders are to be sent to my master , and i shall have to give you the exploits to the end of the court . i shall have to be found , for the court ladies , and i shall have to carry my orders to the']
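
The two scenarios described above can be checked with a small sketch. The `top_p_candidates` and `top_p_sample` helpers and both example distributions are hypothetical, and `TopPSampler` may handle ties and boundaries differently. The sketch keeps the smallest set of most probable tokens whose cumulative probability reaches `p`.

```python
import random


def top_p_candidates(probs, p):
    """Smallest prefix of tokens, sorted by probability, whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return kept


def top_p_sample(probs, p, rng):
    kept = top_p_candidates(probs, p)
    # rng.choices renormalizes the weights of the kept tokens for us.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


peaked = [0.5, 0.42, 0.04, 0.04]     # mass concentrated on 2 tokens
flat = [0.095] * 10 + [0.025] * 2    # mass spread over 10 tokens

print("peaked keeps", len(top_p_candidates(peaked, p=0.9)), "tokens")
print("flat keeps  ", len(top_p_candidates(flat, p=0.9)), "tokens")
print("sample:", top_p_sample(peaked, p=0.9, rng=random.Random(0)))
```

The same `p` yields a different number of candidates for each distribution, which is exactly the adaptivity a fixed `k` lacks.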



### Using callbacks for text generation

We can also wrap the utilities in a callback, which allows you to print out a prediction
sequence for every epoch of the model! Here is an example of a callback for top-k search:

In [19]:

class TopKTextGenerator(keras.callbacks.Callback):
    """A callback to generate text from a trained model using top-k."""

    def __init__(self, k):
        super().__init__()
        self.sampler = keras_hub.samplers.TopKSampler(k)

    def on_epoch_end(self, epoch, logs=None):
        output_tokens = self.sampler(
            next=next,
            prompt=prompt_tokens,
            index=1,
        )
        txt = tokenizer.detokenize(output_tokens)
        print(f"Top-K search generated text: \n{txt}\n")


text_generation_callback = TopKTextGenerator(k=10)
# Dummy training loop to demonstrate callback.
model.fit(train_ds.take(1), verbose=2, epochs=2, callbacks=[text_generation_callback])

Epoch 1/2
Top-K search generated text: 
['[BOS] " i shall be very well , " sir archie said , " sir knight , sir archie . " and sir gareth rode down the hall , and sir gareth of sir gareth of his knights and sir sir john . he said so sir knight sir knightly and sir gareth sir tristram with sir gareth sir arthur , for sir lath sir tristram , sir lath sir gareth sir knight . and sir tristram said sir la sir , saying sir knight sir launcelot , sir gareth sir la sir gareth sir launcelot , and sir sir sir gareth sir knight sir gare']

1/1 - 13s - 13s/step - loss: 3.9650 - perplexity: 52.7177
Epoch 2/2
Top-K search generated text: 
['[BOS] he had a very singular idea that they were very beautiful , in the land of the land and sea , and the sun had risen to be leveled , and that it was all right to the westward ; and , he made a beautiful maiden , who stood with his hands and arms , in his arms and legs , while the sun was so bright and the sun was shining , and he perceived that it was so dar

<keras.src.callbacks.history.History at 0x7f36c9478970>

## Conclusion

To recap, in this example, we use KerasHub layers to train a sub-word vocabulary,
tokenize training data, create a miniature GPT model, and perform inference with the
text generation library.

If you would like to understand how Transformers work, or learn more about training the
full GPT model, here are some further readings:

- Attention Is All You Need [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)
- GPT-3 Paper [Brown et al., 2020](https://arxiv.org/abs/2005.14165)

In [20]:
!pip install transformers





In [21]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2")




The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [22]:
import torch

# Fine-tuning is out of scope for this demo, so it is only outlined here.
# `GPT2LMHeadModel` is a PyTorch module and has no Keras-style `fit()` method;
# fine-tuning it means writing a PyTorch training loop (or using the
# `transformers.Trainer` API) over a data loader of tokenized text.
# gpt2_model.train()  # would switch the model to training mode

# Text generation function
def generate_text(prompt, max_length=50):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")  # Convert prompt to input IDs
    output_ids = gpt2_model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example prompt
prompt = "Once upon a time in a land far away"
generated_text = generate_text(prompt)
print("Generated Text:", generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Generated Text: Once upon a time in a land far away, the sun was shining, and the moon was shining. The sun was shining, and the moon was shining. The sun was shining, and the moon was shining. The sun was shining, and the


In [22]:
# PART 2


In [4]:
!pip install git+https://github.com/keras-team/keras-hub.git -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [5]:
import os

os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" or "torch"

import keras_hub
import keras
import tensorflow as tf
import time

keras.mixed_precision.set_global_policy("mixed_float16")

In [6]:
# To speed up training and generation, we use a preprocessor with a sequence
# length of 128 instead of the full length of 1024.
preprocessor = keras_hub.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=128,
)
gpt2_lm = keras_hub.models.GPT2CausalLM.from_preset(
    "gpt2_base_en", preprocessor=preprocessor
)

Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/config.json...


100%|██████████| 484/484 [00:00<00:00, 298kB/s]


Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/tokenizer.json...


100%|██████████| 448/448 [00:00<00:00, 589kB/s]


Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/assets/tokenizer/vocabulary.json...


100%|██████████| 0.99M/0.99M [00:00<00:00, 2.21MB/s]


Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/assets/tokenizer/merges.txt...


100%|██████████| 446k/446k [00:00<00:00, 1.48MB/s]


Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/model.weights.h5...


100%|██████████| 475M/475M [00:10<00:00, 47.6MB/s]


In [7]:
start = time.time()

output = gpt2_lm.generate("My trip to Yosemite was", max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")


GPT-2 output:
My trip to Yosemite was the first I ever took. It took me about three hours to walk to Yosemite National Park. The trail was very steep. I was not sure how long I was going to take but it was worth it. The hike up to the top was a little steep.

The hike was a little longer than I expected but I was very happy with how it went.

The trail was pretty straight and pretty flat.

The view from the top of the canyon to the top of the canyon was pretty cool and I had a blast.

The hike up to the top was a little bit harder than I expected but I was able to make it through.

I had a great time on the climb down.

The views from the top of the canyon to the top of the canyon were a little bit different than I expected.

The views on the top of the canyon to the top of the canyon were a little different than I
TOTAL TIME ELAPSED: 13.79s


In [8]:
start = time.time()

output = gpt2_lm.generate("That Italian restaurant is", max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")


GPT-2 output:
That Italian restaurant is a little different from the rest of the world, but it has a very good reputation for being the best. It's a small restaurant that serves Italian food and it is open from 8 am to 5 pm. It is a great place for family and friends to come and enjoy Italian food.

I am really impressed with the service at this restaurant. It is very friendly and very knowledgeable about Italian food.

I've been to a lot of Italian restaurants before but I've never been to a restaurant like this one. The service was very good. I'm very impressed by how the staff are friendly and knowledgeable. The service was good too. The staff was really attentive and knowledgeable. I really enjoyed the service.

This place is a great place to get your Italian food and I'm very glad that they have opened up. The food was great and I love the place. I've had the chicken soup for a while and I've always been a fan
TOTAL TIME ELAPSED: 2.23s


In [11]:
# BERT Example

import keras
import keras_hub
import numpy as np


In [12]:
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.BertClassifier.from_preset(
    "bert_base_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)


1/1 ━━━━━━━━━━━━━━━━━━━━ 24s 24s/step - loss: 1.4449 - sparse_categorical_accuracy: 0.0000e+00
1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 8s 8s/step - loss: 1.4651 - sparse_categorical_accuracy: 0.0000e+00


<keras.src.callbacks.history.History at 0x7b4ee2300760>

In [13]:
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.BertClassifier.from_preset(
    "bert_base_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)


1/1 ━━━━━━━━━━━━━━━━━━━━ 27s 27s/step - loss: 1.5023 - sparse_categorical_accuracy: 0.0000e+00


<keras.src.callbacks.history.History at 0x7b4ee22cbc10>

In [None]:
# YOLOV8 Example

In [8]:
! pip install keras_cv

Collecting keras_cv
  Downloading keras_cv-0.9.0-py3-none-any.whl.metadata (12 kB)
Collecting keras-core (from keras_cv)
  Downloading keras_core-0.1.7-py3-none-any.whl.metadata (4.3 kB)
Downloading keras_cv-0.9.0-py3-none-any.whl (650 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 650.7/650.7 kB 13.9 MB/s eta 0:00:00
Downloading keras_core-0.1.7-py3-none-any.whl (950 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 950.8/950.8 kB 44.3 MB/s eta 0:00:00
Installing collected packages: keras-core, keras_cv
Successfully installed keras-core-0.1.7 keras_cv-0.9.0


In [9]:
import keras_cv
import keras_core as keras
import tensorflow as tf


Using TensorFlow backend


In [10]:
input_data = tf.ones(shape=(8, 224, 224, 3))

# Pretrained backbone
model = keras_cv.models.YOLOV8Backbone.from_preset(
    "yolo_v8_l_backbone"
)
output = model(input_data)

# Randomly initialized backbone with a custom config
model = keras_cv.models.YOLOV8Backbone(
    stackwise_channels=[128, 256, 512, 1024],
    stackwise_depth=[3, 9, 9, 3],
    include_rescaling=False,
)
output = model(input_data)


Downloading from https://www.kaggle.com/api/v1/models/keras/yolov8/keras/yolo_v8_l_backbone/2/download/config.json...


100%|██████████| 633/633 [00:00<00:00, 395kB/s]


InvalidArgumentError: Exception encountered when calling Conv2D.call().

{{function_node __wrapped__Conv2D_device_/job:localhost/replica:0/task:0/device:GPU:0}} No DNN in stream executor. [Op:Conv2D]

Arguments received by Conv2D.call():
  • inputs=tf.Tensor(shape=(8, 226, 226, 3), dtype=float32)

In [13]:
import keras_hub
import numpy as np  # used by the SAM example in the next cell

In [15]:
image_size = 1024
batch_size = 2
input_data = {
    "images": np.ones(
        (batch_size, image_size, image_size, 3),
        dtype="float32",
    ),
    "points": np.ones((batch_size, 1, 2), dtype="float32"),
    "labels": np.ones((batch_size, 1), dtype="float32"),
    "boxes": np.ones((batch_size, 1, 2, 2), dtype="float32"),
    "masks": np.zeros(
        (batch_size, 0, image_size, image_size, 1)
    ),
}
sam = keras_hub.models.SAMImageSegmenter.from_preset('sam_base_sa1b')
outputs = sam.predict(input_data)
masks, iou_pred = outputs["masks"], outputs["iou_pred"]


FailedPreconditionError: Graph execution error:

Detected at node StatefulPartitionedCall defined at (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main

  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code

  File "/usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py", line 37, in <module>

  File "/usr/local/lib/python3.10/dist-packages/traitlets/config/application.py", line 992, in launch_instance

  File "/usr/local/lib/python3.10/dist-packages/ipykernel/kernelapp.py", line 619, in start

  File "/usr/local/lib/python3.10/dist-packages/tornado/platform/asyncio.py", line 195, in start

  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever

  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once

  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run

  File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 685, in <lambda>

  File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 738, in _run_callback

  File "/usr/local/lib/python3.10/dist-packages/tornado/gen.py", line 825, in inner

  File "/usr/local/lib/python3.10/dist-packages/tornado/gen.py", line 786, in run

  File "/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 361, in process_one

  File "/usr/local/lib/python3.10/dist-packages/tornado/gen.py", line 234, in wrapper

  File "/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 261, in dispatch_shell

  File "/usr/local/lib/python3.10/dist-packages/tornado/gen.py", line 234, in wrapper

  File "/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 539, in execute_request

  File "/usr/local/lib/python3.10/dist-packages/tornado/gen.py", line 234, in wrapper

  File "/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py", line 302, in do_execute

  File "/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py", line 539, in run_cell

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 2975, in run_cell

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3257, in run_cell_async

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code

  File "<ipython-input-15-3449713ba5d4>", line 16, in <cell line: 16>

  File "/usr/local/lib/python3.10/dist-packages/keras_hub/src/utils/pipeline_model.py", line 220, in predict

  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler

  File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/trainer.py", line 510, in predict

  File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/trainer.py", line 208, in one_step_on_data_distributed

DNN library initialization failed. Look at the errors above for more details.
	 [[{{node StatefulPartitionedCall}}]] [Op:__inference_one_step_on_data_distributed_20637]