# GPT

The code is adapted from the [GPT tutorial](https://keras.io/examples/generative/text_generation_with_miniature_gpt/) created by Apoorv Nandan.

Ref.: David Foster. Generative Deep Learning. O'Reilly Media; 2nd ed. 2023.

Link to the original: https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/blob/main/notebooks/09_transformer/gpt/gpt.ipynb

In [None]:
%load_ext autoreload
%autoreload 2
import numpy as np
import json
import re
import string
from IPython.display import display, HTML

import tensorflow as tf
from tensorflow.keras import layers, models, losses, callbacks

## 0. Parameters <a name="parameters"></a>

In [None]:
VOCAB_SIZE = 10000
MAX_LEN = 80
EMBEDDING_DIM = 256
KEY_DIM = 256
N_HEADS = 2
FEED_FORWARD_DIM = 256
VALIDATION_SPLIT = 0.2
SEED = 42
LOAD_MODEL = False
BATCH_SIZE = 32
EPOCHS = 5

## 1. Load the data <a name="load"></a>

In [None]:
# Load the full dataset
with open("winemag-data-130k-v2.json") as json_data:
    wine_data = json.load(json_data)

In [None]:
len(wine_data)

129971

In [None]:
wine_data[10]

{'points': '87',
 'title': 'Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley)',
 'description': 'Soft, supple plum envelopes an oaky structure in this Cabernet, supported by 15% Merlot. Coffee and chocolate complete the picture, finishing strong at the end, resulting in a value-priced wine of attractive flavor and immediate accessibility.',
 'taster_name': 'Virginie Boone',
 'taster_twitter_handle': '@vboone',
 'price': 19,
 'designation': 'Mountain Cuvée',
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Kirkland Signature'}

In [None]:
# Filter the dataset
filtered_data = [
    "wine review : "
    + x["country"]
    + " : "
    + x["province"]
    + " : "
    + x["variety"]
    + " : "
    + x["description"]
    for x in wine_data
    if x["country"] is not None
    and x["province"] is not None
    and x["variety"] is not None
    and x["description"] is not None
]

In [None]:
# Count the wine reviews
n_wines = len(filtered_data)
print(f"{n_wines} wine reviews loaded")

129907 wine reviews loaded


In [None]:
example = filtered_data[25]
print(example)

wine review : US : California : Pinot Noir : Oak and earth intermingle around robust aromas of wet forest floor in this vineyard-designated Pinot that hails from a high-elevation site. Small in production, it offers intense, full-bodied raspberry and blackberry steeped in smoky spice and smooth texture.



## 2. Tokenize the data <a name="tokenize"></a>

In [None]:
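# Pad the punctuation, to treat them as separate 'words'
def pad_punctuation(s):
    # Surround every punctuation mark (and newline) with spaces...
    s = re.sub(f"([{re.escape(string.punctuation)}\n])", r" \1 ", s)
    # ...then collapse the resulting runs of spaces into one
    s = re.sub(" +", " ", s)
    return s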


text_data = [pad_punctuation(x) for x in filtered_data]

In [None]:
# Display an example of a wine review
example_data = text_data[25]
example_data

'wine review : US : California : Pinot Noir : Oak and earth intermingle around robust aromas of wet forest floor in this vineyard - designated Pinot that hails from a high - elevation site . Small in production , it offers intense , full - bodied raspberry and blackberry steeped in smoky spice and smooth texture . '

In [None]:
# Convert to a Tensorflow Dataset
text_ds = (
    tf.data.Dataset.from_tensor_slices(text_data)
    .batch(BATCH_SIZE)
    .shuffle(1000)
)

In [None]:
# Create a vectorisation layer
vectorize_layer = layers.TextVectorization(
    standardize="lower",
    max_tokens=VOCAB_SIZE,
    output_mode="int",
    output_sequence_length=MAX_LEN + 1,
)

In [None]:
# Adapt the layer to the training set
vectorize_layer.adapt(text_ds)
vocab = vectorize_layer.get_vocabulary()

In [None]:
# Display some token:word mappings
for i, word in enumerate(vocab[:10]):
    print(f"{i}: {word}")

0: 
1: [UNK]
2: :
3: ,
4: .
5: and
6: the
7: wine
8: a
9: of


In [None]:
# Display the same example converted to ints
example_tokenised = vectorize_layer(example_data)
print(example_tokenised.numpy())

[   7   10    2   20    2   29    2   43   62    2   55    5  243 4145
  453  634   26    9  497  499  667   17   12  142   14 2214   43   25
 2484   32    8  223   14 2213  948    4  594   17  987    3   15   75
  237    3   64   14   82   97    5   74 2633   17  198   49    5  125
   77    4    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0]


## 3. Create the Training Set <a name="create"></a>

In [None]:
# Create the training set of wine reviews and the same text shifted by one token
def prepare_inputs(text):
    text = tf.expand_dims(text, -1)
    tokenized_sentences = vectorize_layer(text)
    x = tokenized_sentences[:, :-1]
    y = tokenized_sentences[:, 1:]
    return x, y


train_ds = text_ds.map(prepare_inputs)

In [None]:
example_input_output = train_ds.take(1).get_single_element()

In [None]:
# Example Input
example_input_output[0][0]

<tf.Tensor: shape=(80,), dtype=int64, numpy=
array([   7,   10,    2,   40,    2,  404,   40,    2,   53,   27,    2,
        128,   11,  114, 6494,    5,  119,  294,    3,   12,   72,    7,
         75,   26,    9,   38,  240,    3,  257,   69,    5,  688,  288,
          4,    6,   28,  211,  114, 1110,   22,    3,  151,  277,    5,
         86,    4,   15, 1174,   23,    8,  141,    9, 1360,  217,    4,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0])>

In [None]:
# Example Output (shifted by one token)
example_input_output[1][0]

<tf.Tensor: shape=(80,), dtype=int64, numpy=
array([  10,    2,   40,    2,  404,   40,    2,   53,   27,    2,  128,
         11,  114, 6494,    5,  119,  294,    3,   12,   72,    7,   75,
         26,    9,   38,  240,    3,  257,   69,    5,  688,  288,    4,
          6,   28,  211,  114, 1110,   22,    3,  151,  277,    5,   86,
          4,   15, 1174,   23,    8,  141,    9, 1360,  217,    4,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0])>
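The pair above is one review offset by one position: `y[t]` is the token that comes right after `x[t]`, so at every position the model is trained to predict the next token. A minimal pure-Python sketch of the slicing that `prepare_inputs` applies to each batch, using a hypothetical helper `shift_for_next_token` and the first six ids of the tokenised example from section 2 (`wine review : us : california`):

```python
def shift_for_next_token(token_ids):
    """Split one tokenised sequence into (input, target) for next-token prediction.

    Same slicing as prepare_inputs (x = tokens[:, :-1], y = tokens[:, 1:]),
    but applied to a single Python list instead of a batch of tensors.
    """
    x = token_ids[:-1]  # drop the last token
    y = token_ids[1:]   # drop the first token
    return x, y


# First six ids of the tokenised example: "wine review : us : california"
tokens = [7, 10, 2, 20, 2, 29]
x, y = shift_for_next_token(tokens)
for inp, tgt in zip(x, y):
    print(f"input token {inp:>2} -> target {tgt:>2}")
```

Because the vectorisation layer pads or truncates every review to `MAX_LEN + 1` tokens, dropping one token at each end leaves the `MAX_LEN` = 80 tokens seen in the tensors above.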

## 5. Create the causal attention mask function <a name="causal"></a>

In [None]:
def causal_attention_mask(batch_size, n_dest, n_src, dtype):
    i = tf.range(n_dest)[:, None]
    j = tf.range(n_src)
    m = i >= j - n_src + n_dest
    mask = tf.cast(m, dtype)
    mask = tf.reshape(mask, [1, n_dest, n_src])
    mult = tf.concat(
        [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)], 0
    )
    return tf.tile(mask, mult)


np.transpose(causal_attention_mask(1, 10, 10, dtype=tf.int32)[0])

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=int32)
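For square masks (`n_dest == n_src`, as in the transformer block below), the condition `i >= j - n_src + n_dest` reduces to `i >= j`: query position `i` may attend only to key positions `j <= i`. The mask returned by `causal_attention_mask` is therefore lower-triangular; the matrix printed above looks upper-triangular only because the cell transposes it for display. A pure-Python sketch of the same mask (the helper name `causal_mask` is hypothetical):

```python
def causal_mask(n):
    """Return an n x n causal mask as nested lists of 0/1.

    mask[i][j] == 1 means query position i may attend to key position j,
    which is allowed only when j <= i (no looking ahead).
    """
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


for row in causal_mask(4):
    print(row)
```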

## 6. Create a Transformer Block layer <a name="transformer"></a>

In [None]:
class TransformerBlock(layers.Layer):
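    def __init__(
        self, num_heads, key_dim, embed_dim, ff_dim, dropout_rate=0.1, **kwargs
    ):
        # Forward standard Layer kwargs (name, trainable, dtype, ...) so that
        # the layer can be rebuilt from get_config() when the model is reloaded
        super().__init__(**kwargs)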
        self.num_heads = num_heads
        self.key_dim = key_dim
        self.embed_dim = embed_dim
        self.ff_dim = ff_dim
        self.dropout_rate = dropout_rate
        self.attn = layers.MultiHeadAttention(
            num_heads, key_dim, output_shape=embed_dim
        )
        self.dropout_1 = layers.Dropout(self.dropout_rate)
        self.ln_1 = layers.LayerNormalization(epsilon=1e-6)
        self.ffn_1 = layers.Dense(self.ff_dim, activation="relu")
        self.ffn_2 = layers.Dense(self.embed_dim)
        self.dropout_2 = layers.Dropout(self.dropout_rate)
        self.ln_2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]
        causal_mask = causal_attention_mask(
            batch_size, seq_len, seq_len, tf.bool
        )
        attention_output, attention_scores = self.attn(
            inputs,
            inputs,
            attention_mask=causal_mask,
            return_attention_scores=True,
        )
        attention_output = self.dropout_1(attention_output)
        out1 = self.ln_1(inputs + attention_output)
        ffn_1 = self.ffn_1(out1)
        ffn_2 = self.ffn_2(ffn_1)
        ffn_output = self.dropout_2(ffn_2)
        return (self.ln_2(out1 + ffn_output), attention_scores)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "key_dim": self.key_dim,
                "embed_dim": self.embed_dim,
                "num_heads": self.num_heads,
                "ff_dim": self.ff_dim,
                "dropout_rate": self.dropout_rate,
            }
        )
        return config
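Inside `call`, the causal mask is passed to `layers.MultiHeadAttention` as `attention_mask`, so masked (future) positions receive effectively zero weight after the softmax. The sketch below shows that computation for a single head in plain Python. It is a simplification under stated assumptions: it omits the learned query/key/value and output projections that `MultiHeadAttention` applies, uses the inputs directly as queries, keys and values, and the helper name `causal_self_attention` is hypothetical.

```python
import math


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries, keys, values: lists of float vectors, one per position.
    Returns (outputs, weights): weights[i][j] is how much position i attends
    to position j (zero for every future position j > i).
    """
    n, d = len(queries), len(queries[0])
    weights, outputs = [], []
    for i in range(n):
        # Score only the visible positions j <= i; future positions are masked out
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        top = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
        # Output i is the attention-weighted average of the value vectors
        outputs.append(
            [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` corresponds to one head's slice of the `attention_scores` the block returns; the first position can only attend to itself, so its output is simply its own value vector.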

## 7. Create the Token and Position Embedding <a name="embedder"></a>

In [None]:
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, max_len, vocab_size, embed_dim, **kwargs):
        # Forward standard Layer kwargs so the layer can be rebuilt from get_config()
        super().__init__(**kwargs)
        self.max_len = max_len
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.token_emb = layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.pos_emb = layers.Embedding(input_dim=max_len, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "max_len": self.max_len,
                "vocab_size": self.vocab_size,
                "embed_dim": self.embed_dim,
            }
        )
        return config
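The layer gives each position the sum of two learned vectors: one looked up by token id and one looked up by position index, so the same word at different positions produces different inputs to the transformer block. A plain-Python sketch with tiny hypothetical lookup tables (a vocabulary of 3, `max_len` of 3, embedding dimension 2) standing in for the trained `Embedding` weights:

```python
def token_and_position_embedding(token_ids, token_table, position_table):
    """Return one vector per position: token_table[token] + position_table[pos]."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]


# Hypothetical weights: 3 tokens x 2 dims, and 3 positions x 2 dims
token_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
position_table = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]

emb = token_and_position_embedding([1, 1, 2], token_table, position_table)
print(emb)  # token 1 appears twice but gets two different vectors
```

Because the position table has only `max_len` rows, the model cannot embed a sequence longer than `MAX_LEN` tokens, which is why generation below is capped at `max_tokens=80`, the same value as `MAX_LEN`.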

## 8. Build the Transformer model <a name="transformer_decoder"></a>

In [None]:
inputs = layers.Input(shape=(None,), dtype=tf.int32)
x = TokenAndPositionEmbedding(MAX_LEN, VOCAB_SIZE, EMBEDDING_DIM)(inputs)
x, attention_scores = TransformerBlock(
    N_HEADS, KEY_DIM, EMBEDDING_DIM, FEED_FORWARD_DIM
)(x)
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)
gpt = models.Model(inputs=inputs, outputs=[outputs, attention_scores])
gpt.compile("adam", loss=[losses.SparseCategoricalCrossentropy(), None])

In [None]:
gpt.summary()

In [None]:
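if LOAD_MODEL:
    # Keras 3 loads .keras files saved by gpt.save(); custom layers go in custom_objects
    custom = {"TransformerBlock": TransformerBlock,
              "TokenAndPositionEmbedding": TokenAndPositionEmbedding}
    gpt = models.load_model("gpt.keras", custom_objects=custom, compile=True)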

## 9. Train the Transformer <a name="train"></a>

In [None]:
# Create a TextGenerator callback
class TextGenerator(callbacks.Callback):
    def __init__(self, index_to_word):
        super().__init__()
        self.index_to_word = index_to_word
        # Reverse lookup from word to token index (unknown words map to 1, [UNK])
        self.word_to_index = {
            word: index for index, word in enumerate(index_to_word)
        }

    def sample_from(self, probs, temperature):
        probs = probs ** (1 / temperature)
        probs = probs / np.sum(probs)
        return np.random.choice(len(probs), p=probs), probs

    def generate(self, start_prompt, max_tokens, temperature):
        start_tokens = [
            self.word_to_index.get(x, 1) for x in start_prompt.split()
        ]
        sample_token = None
        info = []
        while len(start_tokens) < max_tokens and sample_token != 0:
            x = np.array([start_tokens])
            y, att = self.model.predict(x, verbose=0)
            sample_token, probs = self.sample_from(y[0][-1], temperature)
            info.append(
                {
                    "prompt": start_prompt,
                    "word_probs": probs,
                    "atts": att[0, :, -1, :],
                }
            )
            start_tokens.append(sample_token)
            start_prompt = start_prompt + " " + self.index_to_word[sample_token]
        print(f"\ngenerated text:\n{start_prompt}\n")
        return info

    def on_epoch_end(self, epoch, logs=None):
        self.generate("wine review", max_tokens=80, temperature=1.0)

In [None]:
# Create a model save checkpoint
model_checkpoint_callback = callbacks.ModelCheckpoint(
    filepath="./checkpoint/checkpoint.weights.h5",
    save_weights_only=True,
    save_freq="epoch",
    verbose=0,
)

tensorboard_callback = callbacks.TensorBoard(log_dir="./logs")

# Create the callback that generates a sample review at the end of each epoch
text_generator = TextGenerator(vocab)

In [None]:
gpt.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint_callback, tensorboard_callback, text_generator],
)

Epoch 1/5
4058/4060 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 2.5890
generated text:
wine review : us : california : cabernet sauvignon : this is a blend to cabernet sauvignon , so marked in acidity , and you can ' t open it . ripe and fruity , powerful and almost sweet , with brisk acidity that gives lusciousness . there is balance that it ' s some oaky , jammy blackberry and black currant flavors best things . 

4060/4060 ━━━━━━━━━━━━━━━━━━━━ 169s 38ms/step - loss: 2.5887
Epoch 2/5
4059/4060 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 1.9746
generated text:
wine review : italy : piedmont : dolcetto : simple and tight with a luminous ruby color , this is a wine and tannic with a mineral composition and delicate mouthfeel . the wine ' s personality is ready to drink . 

4060/4060 ━━━━━━━━━━━━━━━━━━━━ 125s 22ms/step - loss: 1.9746
Epoch 3/5
4060/4060

<keras.src.callbacks.history.History at 0x7f5e44d25210>

In [None]:
# Save the final model
gpt.save("gpt.keras")

## 10. Generate text using the Transformer

In [None]:
def print_probs(info, vocab, top_k=5):
    for step in info:
        # Average the attention weights over heads: one weight per prompt token
        mean_att = np.mean(step["atts"], axis=0)
        max_att = np.max(mean_att)
        highlighted_text = []
        for word, att_score in zip(step["prompt"].split(), mean_att):
            # Shade each prompt word by its attention weight relative to the maximum
            highlighted_text.append(
                '<span style="background-color:rgba(135,206,250,'
                + str(att_score / max_att)
                + ');">'
                + word
                + "</span>"
            )
        display(HTML(" ".join(highlighted_text)))

        # Print the top_k most probable next tokens
        word_probs = step["word_probs"]
        for idx in np.argsort(word_probs)[::-1][:top_k]:
            print(f"{vocab[idx]}:   \t{np.round(100 * word_probs[idx], 2)}%")
        print("--------\n")

**Generating text**

We can generate new text by applying the following process:

* Feed the network an existing sequence of words and ask it to predict the next word.

* Append this word to the existing sequence and repeat.

The network outputs a probability distribution over the whole vocabulary, from which we can sample the next word.
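The loop can be sketched without TensorFlow. In the hypothetical example below, `TOY_MODEL` stands in for the trained GPT and only looks at the last token (a real GPT conditions on the whole sequence). As in `TextGenerator.generate`, a word is sampled from the predicted distribution, appended, and the loop repeats until a stop token or the length limit is reached; the function name `generate_text` and the `<end>` token are assumptions for illustration.

```python
import random

# Hypothetical toy "model": maps the last token to next-token probabilities.
TOY_MODEL = {
    "wine": {"review": 1.0},
    "review": {":": 1.0},
    ":": {"us": 0.5, "italy": 0.5},
    "us": {"<end>": 1.0},
    "italy": {"<end>": 1.0},
}


def generate_text(prompt, max_tokens, rng):
    tokens = prompt.split()
    while len(tokens) < max_tokens:
        probs = TOY_MODEL[tokens[-1]]               # 1. predict the next-token distribution
        words, weights = zip(*probs.items())
        next_word = rng.choices(words, weights=weights)[0]  # 2. sample one word
        if next_word == "<end>":                    # stop token (plays the role of index 0)
            break
        tokens.append(next_word)                    # 3. append it and repeat
    return " ".join(tokens)


print(generate_text("wine", max_tokens=10, rng=random.Random(0)))
```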

In [None]:
info = text_generator.generate(
    "wine review : us", max_tokens=80, temperature=1.0
)


generated text:
wine review : us : new york : saperavi : while tart blackberry , cherry and black plum shine with savory overtones and savory olive tones permeate this full - bodied cabernet franc . the power - like sheen on the palate , it ' s beautifully melding a fine - grained but penetrating , persistent persistent tannins and will finish across sunny linger long on the finish . 



In [None]:
info = text_generator.generate(
    "wine review : italy", max_tokens=80, temperature=0.5
)


generated text:
wine review : italy : tuscany : sangiovese : this is a bold , jammy wine with loads of ripe fruit intensity and a bright , candied personality . it ' s a very satisfying wine with a soft , silky texture and a lingering finish . 



Both resemble a wine review from the original training set.

Both open with the country, region and grape variety, and the style of wine stays consistent throughout the passage (for example, a red wine does not turn into a white one halfway through).

The text generated at temperature 1.0 is more adventurous, and therefore less accurate, than the example at temperature 0.5: it opens with a Saperavi but later describes the wine as a cabernet franc.

Generating multiple samples at temperature 1.0 will therefore produce more variety, because the model samples from a flatter (higher-entropy) probability distribution.
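`TextGenerator.sample_from` applies temperature by raising each probability to the power `1 / temperature` and renormalising, which is equivalent to dividing the logits by the temperature before the softmax. The sketch below applies the same rescaling to a hypothetical three-word distribution and measures its entropy: temperatures below 1 sharpen the distribution, and temperatures above 1 flatten it.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a distribution like TextGenerator.sample_from: p ** (1 / T), renormalised."""
    scaled = [p ** (1 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, less predictable distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


probs = [0.6, 0.3, 0.1]  # hypothetical next-token probabilities
for t in (0.5, 1.0, 2.0):
    q = apply_temperature(probs, t)
    print(t, [round(x, 3) for x in q], round(entropy(q), 3))
```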

In [None]:
info = text_generator.generate(
    "wine review : germany", max_tokens=80, temperature=0.5
)
print_probs(info, vocab)


generated text:
wine review : germany : mosel : riesling : an apt name for this reserve eiswein , an opulent auslese , a luscious , sweet gold color , its honey and apricot flavors touched by saffron and honey . soft on the palate , spice integrated in the mouth , finishing long and gloriously long . begin to drink through 2025 . 



::   	100.0%
grosso:   	0.0%
zealand:   	0.0%
-:   	0.0%
africa:   	0.0%
--------



mosel:   	73.78%
rheinhessen:   	9.98%
rheingau:   	8.37%
baden:   	2.52%
pfalz:   	1.45%
--------



::   	99.2%
-:   	0.73%
grosso:   	0.04%
blanc:   	0.02%
laurent:   	0.0%
--------



riesling:   	98.12%
weissburgunder:   	0.28%
mosel:   	0.19%
pinot:   	0.15%
grüner:   	0.11%
--------



::   	99.84%
-:   	0.07%
grosso:   	0.05%
blanc:   	0.03%
neagra:   	0.0%
--------



a:   	8.02%
while:   	5.55%
this:   	3.97%
dusty:   	3.06%
the:   	2.57%
--------



intensely:   	14.39%
earthen:   	7.18%
earthy:   	3.52%
ethereally:   	3.13%
off:   	2.93%
--------



word:   	24.23%
name:   	18.14%
moniker:   	8.25%
wine:   	5.07%
comparison:   	2.74%
--------



for:   	82.03%
to:   	4.27%
,:   	1.81%
and:   	1.81%
of:   	1.38%
--------



this:   	70.53%
a:   	8.09%
[UNK]:   	3.43%
the:   	2.82%
an:   	2.12%
--------



wine:   	19.46%
auslese:   	5.89%
[UNK]:   	5.53%
intensely:   	4.46%
riesling:   	2.74%
--------



riesling:   	31.94%
,:   	25.33%
auslese:   	6.71%
wine:   	5.02%
is:   	3.86%
--------



,:   	53.39%
is:   	16.84%
.:   	6.92%
(:   	4.66%
that:   	1.51%
--------



this:   	12.3%
but:   	10.75%
with:   	9.21%
it:   	6.3%
is:   	5.61%
--------



[UNK]:   	8.96%
auslese:   	6.9%
supersweet:   	5.58%
impressive:   	3.07%
ethereally:   	2.91%
--------



auslese:   	38.27%
tba:   	17.59%
,:   	13.32%
wine:   	8.78%
riesling:   	3.74%
--------



,:   	56.77%
.:   	17.18%
that:   	6.9%
of:   	1.48%
with:   	1.04%
--------



but:   	12.55%
with:   	7.84%
weighing:   	5.86%
it:   	4.85%
dripping:   	4.19%
--------



powerhouse:   	16.6%
[UNK]:   	4.5%
wine:   	2.04%
bit:   	1.86%
tba:   	1.63%
--------



,:   	43.38%
mouthfeel:   	3.4%
mix:   	3.02%
wine:   	3.01%
haze:   	2.36%
--------



unctuous:   	9.18%
opulent:   	6.67%
concentrated:   	6.61%
honeyed:   	6.48%
almost:   	3.8%
--------



wine:   	20.82%
-:   	7.16%
,:   	6.18%
riesling:   	5.87%
botrytized:   	5.52%
--------



color:   	49.54%
-:   	12.84%
,:   	4.06%
and:   	3.79%
icewine:   	3.51%
--------



.:   	35.85%
and:   	23.03%
,:   	17.73%
that:   	3.43%
of:   	2.68%
--------



with:   	9.01%
and:   	7.06%
but:   	5.02%
it:   	4.51%
this:   	3.38%
--------



honey:   	15.65%
honeyed:   	4.3%
[UNK]:   	4.21%
apricot:   	3.08%
flavors:   	2.7%
--------



and:   	54.75%
,:   	25.62%
-:   	10.19%
[UNK]:   	0.97%
aromas:   	0.63%
--------



marmalade:   	16.78%
apricot:   	6.96%
honey:   	5.05%
botrytis:   	4.27%
peach:   	3.85%
--------



jam:   	26.68%
nectar:   	17.3%
flavors:   	12.52%
preserves:   	12.37%
aromas:   	5.46%
--------



.:   	16.62%
are:   	14.49%
,:   	11.16%
[UNK]:   	3.82%
glazed:   	2.95%
--------



by:   	39.4%
with:   	35.99%
up:   	14.66%
off:   	0.69%
through:   	0.64%
--------



honey:   	32.6%
hints:   	3.43%
a:   	3.36%
sweet:   	3.28%
botrytis:   	3.22%
--------



and:   	48.74%
.:   	19.2%
,:   	17.92%
notes:   	2.05%
spice:   	1.38%
--------



honey:   	31.46%
spice:   	12.89%
caramel:   	3.89%
spices:   	3.56%
marmalade:   	1.94%
--------



.:   	53.62%
,:   	19.46%
notes:   	3.45%
[UNK]:   	1.75%
on:   	1.74%
--------



it:   	22.47%
the:   	13.61%
there:   	4.05%
a:   	2.49%
honeyed:   	1.95%
--------



,:   	36.28%
and:   	13.04%
on:   	8.96%
spoken:   	7.46%
in:   	3.52%
--------



the:   	95.3%
its:   	2.41%
a:   	0.68%
it:   	0.16%
sweet:   	0.08%
--------



palate:   	40.59%
finish:   	26.87%
midpalate:   	10.93%
attack:   	2.92%
nose:   	2.77%
--------



,:   	83.78%
and:   	2.93%
with:   	1.85%
are:   	1.4%
is:   	1.18%
--------



it:   	22.81%
but:   	9.34%
yet:   	6.54%
with:   	4.61%
there:   	2.25%
--------



and:   	50.86%
,:   	19.79%
notes:   	4.7%
tones:   	3.42%
-:   	3.16%
--------



and:   	25.78%
with:   	19.27%
into:   	11.3%
,:   	9.5%
neatly:   	3.92%
--------



the:   	35.66%
terms:   	6.64%
a:   	5.88%
sugar:   	4.62%
oak:   	2.74%
--------



midpalate:   	27.3%
mouth:   	19.32%
background:   	12.31%
form:   	5.27%
finish:   	4.61%
--------



,:   	45.74%
and:   	25.51%
.:   	12.08%
with:   	7.52%
yet:   	1.41%
--------



it:   	17.42%
with:   	10.59%
finishing:   	7.85%
but:   	5.32%
yet:   	3.98%
--------



long:   	68.06%
with:   	16.02%
on:   	1.72%
sweet:   	1.67%
dry:   	1.41%
--------



and:   	63.37%
,:   	16.52%
.:   	7.82%
with:   	6.65%
on:   	2.25%
--------



spicy:   	9.87%
long:   	5.83%
dry:   	3.36%
lingering:   	3.09%
rich:   	2.88%
--------



long:   	79.11%
lingering:   	1.92%
honeyed:   	1.9%
ripe:   	1.31%
sweet:   	1.06%
--------



.:   	79.22%
,:   	9.03%
and:   	5.77%
on:   	2.94%
with:   	0.61%
--------



:   	41.29%
drink:   	23.9%
it:   	5.77%
a:   	1.81%
the:   	1.59%
--------



drinking:   	31.61%
to:   	24.63%
with:   	9.36%
a:   	4.1%
,:   	2.94%
--------



throw:   	12.14%
describe:   	8.72%
drink:   	7.82%
open:   	7.67%
be:   	6.65%
--------



now:   	62.37%
up:   	5.49%
.:   	2.49%
from:   	2.47%
over:   	2.45%
--------



2019:   	13.11%
2025:   	12.99%
2018:   	11.9%
2021:   	9.31%
2027:   	8.73%
--------



.:   	96.7%
and:   	0.86%
,:   	0.8%
or:   	0.58%
to:   	0.32%
--------



:   	99.12%
imported:   	0.3%
drink:   	0.06%
it:   	0.05%
.:   	0.03%
--------


