## Install KerasNLP, Choose Backend and Import Dependencies

This example uses [Keras Core](https://keras.io/keras_core/) to work in any of
`"tensorflow"`, `"jax"` or `"torch"`. Support for Keras Core is baked into
KerasNLP: simply set the `"KERAS_BACKEND"` environment variable before importing
the library to select the backend of your choice. We select the JAX backend below.

In [1]:
!pip install git+https://github.com/keras-team/keras-nlp.git -q



In [2]:
import os

os.environ["KERAS_BACKEND"] = "jax"

import keras_nlp
import tensorflow as tf
import keras_core as keras
import time

Using JAX backend.


## Pretrained Model

In [3]:
# To speed up training and generation, use a preprocessor with sequence length 128.
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=128,
)
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en", preprocessor=preprocessor
)

Downloading data from https://storage.googleapis.com/keras-nlp/models/gpt2_base_en/v1/vocab.json
1042301/1042301 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/keras-nlp/models/gpt2_base_en/v1/merges.txt
456318/456318 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/keras-nlp/models/gpt2_base_en/v1/model.h5
497986112/497986112 ━━━━━━━━━━━━━━━━━━━━ 5s 0us/step


In [4]:
output = gpt2_lm.generate("My Assignments are still pending for the submission because", max_length=200)
print("\nGPT-2 output:")
print(output)


GPT-2 output:
My Assignments are still pending for the submission because I am not sure if I will get my assignments done. So, I am going to go over all the information that I have available on the website to give you a better understanding about where I am going and how I am doing this. This information may be a little confusing, but I hope this information will help you to get to know me better.

My name is Daniel and I have a Bachelor of Science Degree.

This is my job as a Software Engineer.

I am also a Software Engineer in the Department of Engineering and Computer Science.

I am also a Software Engineer in the Department of Computer Science.

I am also a Software Engineer in the Department of Computer Science.

I am also a Software Engineer in the Department of Engineering and Computer Science.

I am also a Software Engineer in the Department of Computer Science.

I am also Software Engineer in the Department of Computer


In [5]:
output = gpt2_lm.generate("That Italian restaurant is", max_length=200)
print("\nGPT-2 output:")
print(output)


GPT-2 output:
That Italian restaurant is a little bit more interesting than I expected, but it's still a great one. The menu features the same menu items as it does in other Italian restaurants (and even the menu in the U.S.), but the menu has more of a modern twist on a classic Italian dish. I like the fact that there are two sides, one for the main course and one for the dessert. There are also three desserts in this menu that have a different texture from the dessert menu.

There was one thing that really stood out. It wasn't a big deal when it came to the dessert menu. The desserts are a bit different from the other menu items. The dessert menu was a little bit more of a mix of a traditional dessert and a dessert that I had never tried. I think this is because the dessert menu was more like the traditional one. There is a lot of different desserts in the dessert menu, but I think that this is because the dessert menu was


## More Pretrained Models in KerasNLP

To explore more pretrained models, see the
[KerasNLP models API](https://keras.io/api/keras_nlp/models/).

## Understanding How GPT2 Works


The code of GPT2 can be found
[here](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/gpt2/).
Conceptually, `GPT2CausalLM` can be broken down hierarchically into several
modules in KerasNLP, all of which have a *from_preset()* function that loads a
pretrained model:

- `keras_nlp.models.GPT2Tokenizer`: the tokenizer used by the GPT2 model, which is a
    [byte-pair encoder](https://huggingface.co/course/chapter6/5?fw=pt).
- `keras_nlp.models.GPT2CausalLMPreprocessor`: the preprocessor used for GPT2
    causal LM training. It tokenizes the input and does other preprocessing
    work, such as creating the labels and appending the end token.
- `keras_nlp.models.GPT2Backbone`: the GPT2 model, which is a stack of
    `keras_nlp.layers.TransformerDecoder` layers. This is usually just referred
    to as `GPT2`.
- `keras_nlp.models.GPT2CausalLM`: wraps `GPT2Backbone` and multiplies the
    output of `GPT2Backbone` by the embedding matrix to generate logits over
    vocabulary tokens.
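
To make the preprocessor and `GPT2CausalLM` bullets more concrete, here is a minimal sketch in plain Python. It is not the KerasNLP implementation: the helper names `make_causal_lm_example` and `logits_from_hidden`, the token ids, and the tiny matrices are all hypothetical, and the real preprocessor handles details that are skipped here.

```python
def make_causal_lm_example(token_ids, sequence_length, end_id, pad_id):
    """Build (inputs, labels, sample_weights) for next-token prediction.

    Simplified assumption: append the end token, truncate or pad to
    sequence_length + 1, then shift by one position to create the labels.
    """
    ids = (list(token_ids) + [end_id])[: sequence_length + 1]
    mask = [1] * len(ids)
    padding = sequence_length + 1 - len(ids)
    ids += [pad_id] * padding
    mask += [0] * padding
    inputs = ids[:-1]    # what the model sees at each position
    labels = ids[1:]     # the token that should follow each position
    weights = mask[1:]   # padded label positions get zero weight
    return inputs, labels, weights


def logits_from_hidden(hidden, embedding):
    """Project hidden states onto the token embedding matrix.

    hidden:    [sequence][hidden_dim] outputs of the backbone
    embedding: [vocab][hidden_dim] token embedding matrix
    returns:   [sequence][vocab] logits (one score per vocabulary token)
    """
    return [
        [sum(h * e for h, e in zip(h_vec, e_vec)) for e_vec in embedding]
        for h_vec in hidden
    ]


# Hypothetical ids: three text tokens, end token 99, padding id 0.
inputs, labels, weights = make_causal_lm_example(
    [5, 6, 7], sequence_length=5, end_id=99, pad_id=0
)
print(inputs, labels, weights)

# One position, hidden size 2, vocabulary of 3 tokens.
print(logits_from_hidden([[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

The shift by one position is what turns plain text into a supervised task: at every position the label is simply the next token of the same sequence.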

In [6]:
import tensorflow_datasets as tfds

reddit_ds = tfds.load("reddit_tifu", split="train", as_supervised=True)

Downloading and preparing dataset 639.54 MiB (download: 639.54 MiB, generated: 141.46 MiB, total: 781.00 MiB) to /root/tensorflow_datasets/reddit_tifu/short/1.1.2...



Dataset reddit_tifu downloaded and prepared to /root/tensorflow_datasets/reddit_tifu/short/1.1.2. Subsequent calls will reuse this data.


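## About the Reddit TIFU Dataset

To learn more, see the
[dataset catalog entry](https://www.tensorflow.org/datasets/catalog/reddit_tifu).
With `as_supervised=True`, each example is a `(document, title)` pair:

- `document`: the text of the post.
- `title`: the title of the post.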

In [7]:
for document, title in reddit_ds:
    print(document.numpy())
    print(title.numpy())
    break

b"me and a friend decided to go to the beach last sunday. we loaded up and headed out. we were about half way there when i decided that i was not leaving till i had seafood. \n\nnow i'm not talking about red lobster. no friends i'm talking about a low country boil. i found the restaurant and got directions. i don't know if any of you have heard about the crab shack on tybee island but let me tell you it's worth it. \n\nwe arrived and was seated quickly. we decided to get a seafood sampler for two and split it. the waitress bought it out on separate platters for us. the amount of food was staggering. two types of crab, shrimp, mussels, crawfish, andouille sausage, red potatoes, and corn on the cob. i managed to finish it and some of my friends crawfish and mussels. it was a day to be a fat ass. we finished paid for our food and headed to the beach. \n\nfunny thing about seafood. it runs through me faster than a kenyan \n\nwe arrived and walked around a bit. it was about 45min since we a

The `.numpy()` method converts each TensorFlow tensor to a NumPy value (here a `bytes` string) so that it can be printed easily. Because of the `break`, only the first element is printed.
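
As a small illustration, a `bytes` value like the one printed above can be turned into a regular Python string with `decode`. The snippet uses a shortened, hard-coded copy of the first sentence so that it runs without TensorFlow; UTF-8 is assumed as the encoding.

```python
# Hard-coded stand-in for `document.numpy()`; assumes UTF-8 encoded text.
raw = b"me and a friend decided to go to the beach last sunday."

text = raw.decode("utf-8")  # bytes -> str
print(type(raw).__name__, "->", type(text).__name__)
print(text)
```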


In [8]:
train_ds = (
    reddit_ds.map(lambda document, _: document)
    .batch(32)
    .cache()
    .prefetch(tf.data.AUTOTUNE)
)

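Caching speeds up data access during training: after the first pass, `cache()` serves the batched dataset from memory instead of re-reading and re-processing it from the source. `prefetch(tf.data.AUTOTUNE)` prepares upcoming batches while the current one is being used.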

Now you can finetune the model using the familiar *fit()* function. Note that
the `preprocessor` is called automatically inside the `fit` method, since
`GPT2CausalLM` is a `keras_nlp.models.Task` instance.

In [9]:
train_ds = train_ds.take(500)
num_epochs = 1

# Linearly decaying learning rate.
learning_rate = keras.optimizers.schedules.PolynomialDecay(
    5e-5,
    decay_steps=train_ds.cardinality() * num_epochs,
    end_learning_rate=0.0,
)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
gpt2_lm.compile(
    optimizer=keras.optimizers.Adam(learning_rate),
    loss=loss,
    weighted_metrics=["accuracy"],
)

gpt2_lm.fit(train_ds, epochs=num_epochs)

500/500 ━━━━━━━━━━━━━━━━━━━━ 509s 953ms/step - accuracy: 0.3192 - loss: 3.3599


<keras_core.src.callbacks.history.History at 0x796bb031a4d0>
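
The linearly decaying learning rate configured in the training cell can also be written out by hand. The sketch below follows the formula documented for `PolynomialDecay` with its default `power=1.0` and no cycling. The function name `polynomial_decay` is ours, and the 500 decay steps assume `train_ds.take(500)` with one epoch, as above.

```python
def polynomial_decay(step, initial_lr, decay_steps, end_lr=0.0, power=1.0):
    """Learning rate at `step`; with power=1.0 the decay is a straight line."""
    step = min(step, decay_steps)  # the rate stays at end_lr after decay_steps
    fraction_left = 1 - step / decay_steps
    return (initial_lr - end_lr) * fraction_left**power + end_lr


# Values mirror the training cell: 5e-5 decaying to 0.0 over 500 steps.
for step in (0, 125, 250, 375, 500):
    print(step, polynomial_decay(step, initial_lr=5e-5, decay_steps=500))
```

Because the rate reaches zero exactly at the last step, changing `take(500)` or `num_epochs` also changes how quickly the rate falls.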

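The loss is sparse categorical cross-entropy, which is commonly used for classification tasks. Here, each position is treated as a classification over the vocabulary, with the next token as the integer label. The `from_logits=True` argument indicates that the model's output consists of logits (raw, unnormalized scores) rather than probabilities.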
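
For a single position, the loss can be sketched in plain Python as follows. This is an illustration of the formula, not the Keras implementation; the function name and the example logits are hypothetical.

```python
import math


def sparse_crossentropy_from_logits(logits, label):
    """Cross-entropy for one position: -log(softmax(logits)[label]).

    `label` is an integer class index ("sparse"), not a one-hot vector.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Hypothetical scores over a 4-token vocabulary; the correct next token is 1.
logits = [0.5, 2.0, -1.0, 0.1]
print(sparse_crossentropy_from_logits(logits, label=1))

# With identical logits, the model is guessing uniformly: loss = log(vocab size).
print(sparse_crossentropy_from_logits([0.0, 0.0, 0.0, 0.0], label=2), math.log(4))
```

The loss is small when the logit of the correct token dominates, and it grows as probability mass moves to other tokens.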

In [10]:
output = gpt2_lm.generate("I like basketball", max_length=200)
print("\nGPT-2 output:")
print(output)


GPT-2 output:
I like basketball, but i don't know how to handle the game. i've been a big fan of it since i was little and i'm still a big fan. my dad was a big fan and my mom was a little bit of a fan. i was really into basketball at the time and was always going out and getting a good deal.

so this summer i went to a basketball tournament with my brother and i got a scholarship. we were supposed to play basketball at the end of the summer, so i went home to go watch the game. my dad was a little bit worried about my dad being too close with my family and i didn't want him to be too close with the other kids


## Into the Sampling Method

In KerasNLP, we offer a few sampling methods, e.g., contrastive search,
top-k sampling and beam search. By default, our `GPT2CausalLM` uses top-k
sampling, but you can choose your own sampling method.
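
To give a feel for how sampling strategies differ, here is a minimal plain-Python sketch of greedy, top-k and random sampling for a single decoding step. It is not the KerasNLP sampler code: the function names and the example logits are hypothetical, and real samplers also handle batching, caching and stopping conditions.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_sample(logits):
    """Always pick the single most likely token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def random_sample(logits, rng):
    """Draw from the full softmax distribution over the vocabulary."""
    return rng.choices(list(range(len(logits))), weights=softmax(logits), k=1)[0]


def top_k_sample(logits, k, rng):
    """Keep the k highest-scoring tokens, renormalize, then draw from them."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs draw the same tokens
logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical scores for a 4-token vocabulary

print("greedy:", greedy_sample(logits))
print("top-k :", [top_k_sample(logits, k=2, rng=rng) for _ in range(8)])
print("random:", [random_sample(logits, rng) for _ in range(8)])
```

Greedy decoding is deterministic, which is why it tends to loop. Top-k adds variety while excluding unlikely tokens, and fully random sampling can pick any token, including very unlikely ones.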

Much like optimizers and activations, there are two ways to specify a custom
sampler:

- Use a string identifier, such as `"greedy"`. This uses the sampler's default
configuration.
- Pass a `keras_nlp.samplers.Sampler` instance. This lets you use a custom
configuration.

In [11]:
gpt2_lm.compile(sampler="top_k")
output = gpt2_lm.generate("I like basketball", max_length=200)
print("\nGPT-2 output:")
print(output)


GPT-2 output:
I like basketball, and it's fun, but the game isn't as fun. 

so i was playing in the morning, and i was in the middle of the court, and i saw a guy with a ball. i didn't notice him, but i saw him with a baseball bat and the ball in his hand. 

he looked at me, then he looked at me, but i just looked away, like i was trying to be funny, and he was just trying to get me to stop laughing, so i was laughing. 

so now i'm like "

GPT-2 output:
I like basketball, but i don't really like the game. 

so i was playing basketball at my local high school, and i was playing with my friends. 

i was playing with my friends, and i was playing with my brother, who was playing basketball with his brother. 

so i was playing with my brother, and he was playing with his brother's brother. 

so i was playing with my brother, and he was playing with his brother's brother's brother. 

so i was playing with my brother, and he was playing with his brother's brother's brother's brother's broth

In [None]:
# Use a `Sampler` instance. `GreedySampler` tends to repeat itself,
# because it always picks the token with the highest probability as the next token.
greedy_sampler = keras_nlp.samplers.GreedySampler()
gpt2_lm.compile(sampler=greedy_sampler)

output = gpt2_lm.generate("I like basketball", max_length=200)
print("\nGPT-2 output:")
print(output)

In [16]:
random_sampler = keras_nlp.samplers.RandomSampler()
gpt2_lm.compile(sampler=random_sampler)

output = gpt2_lm.generate("I like basketball", max_length=200)
print("\nGPT-2 output:")
print(output)


GPT-2 output:
I like basketball and love intersections between directions. even though my mom recently moved here, i'm still feeling a little steamy alot during the game (well about triple that new hubby referred to by his nickname). 
i send plenty of text via text and a technology indicating the expected traffic lights.  the other important thing is that my mom and pa were pitch perfect for a verbal spar with this guy who looked exactly like hadnbiel.  he was talking gibberish and generally came into ms.

as anyone who calls the guy "apparently." i listen to research. i'm clueless and my wife has a group of an evening on ed
