# Wine Review GPT

In this notebook, we'll walk through the steps required to train your own GPT model on the wine review dataset.

The code is adapted from the recommended textbook [Generative Deep Learning](https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/blob/main/notebooks/09_transformer/gpt/gpt.ipynb), 2nd Edition.

<font color='red'>Change the runtime to GPU in the upper right corner for this exercise. </font>

## 0. Parameters <a name="parameters"></a>

In [None]:
%load_ext autoreload
%autoreload 2
import numpy as np
import json
import re
import string
from IPython.display import display, HTML

import tensorflow as tf
from tensorflow.keras import layers, models, losses, callbacks

In [None]:
VOCAB_SIZE = 10000
MAX_LEN = 80
EMBEDDING_DIM = 256 # dimensionality of the word embeddings
KEY_DIM = 256 # dimension of key/query in attention heads
N_HEADS = 2  # total number of heads in each transformer block
FEED_FORWARD_DIM = 256
VALIDATION_SPLIT = 0.2
SEED = 42
LOAD_MODEL = False # train model from scratch
BATCH_SIZE = 32 # number of samples processed together in one batch
EPOCHS = 5 # number of passes over the entire dataset

## 1. Load the data <a name="load"></a>

In [None]:
# mount Google Drive

from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


<font color='red'> Instruction: replace the path with the correct one on your Google Drive. </font>

In [None]:
# Load the full dataset
with open("/content/winemag-data-130k-v2.json") as json_data:
  wine_data = json.load(json_data)

In [None]:
# print the first 3 records
wine_data[0:3]

[{'points': '87',
  'title': 'Nicosia 2013 Vulkà Bianco  (Etna)',
  'description': "Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",
  'taster_name': 'Kerin O’Keefe',
  'taster_twitter_handle': '@kerinokeefe',
  'price': None,
  'designation': 'Vulkà Bianco',
  'variety': 'White Blend',
  'region_1': 'Etna',
  'region_2': None,
  'province': 'Sicily & Sardinia',
  'country': 'Italy',
  'winery': 'Nicosia'},
 {'points': '87',
  'title': 'Quinta dos Avidagos 2011 Avidagos Red (Douro)',
  'description': "This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled out with juicy red berry fruits and freshened with acidity. It's  already drinkable, although it will certainly be better from 2016.",
  'taster_name': 'Roger Voss',
  'taster_twitter_handle': '@vossroger',
  'price': 15,
  'designation': 'Avidagos',
  'variety': 'Portugu

In [None]:
# Filter the dataset: keep only entries where country, province, variety and description are all present,
# then join those fields into a single string per review
filtered_data = [
    "wine review : "
    + x["country"]
    + " : "
    + x["province"]
    + " : "
    + x["variety"]
    + " : "
    + x["description"]
    for x in wine_data
    if x["country"] is not None
    and x["province"] is not None
    and x["variety"] is not None
    and x["description"] is not None
]

In [None]:
# Count the wine reviews
n_wines = len(filtered_data)
print(f"{n_wines} wine reviews loaded")

129907 wine reviews loaded


In [None]:
example = filtered_data[0]
print(example)

wine review : Italy : Sicily & Sardinia : White Blend : Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.


## 2. Tokenize the data <a name="tokenize"></a>

In [None]:
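# Pad the punctuation so that each punctuation mark is treated as a separate 'word'
def pad_punctuation(s):
    # re.escape keeps characters such as ] and \ literal inside the character class
    s = re.sub(f"([{re.escape(string.punctuation)}\n])", r" \1 ", s)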
    s = re.sub(" +", " ", s)
    return s


text_data = [pad_punctuation(x) for x in filtered_data]

In [None]:
# Display an example of a wine review
example_data = text_data[0]
example_data

"wine review : Italy : Sicily & Sardinia : White Blend : Aromas include tropical fruit , broom , brimstone and dried herb . The palate isn ' t overly expressive , offering unripened apple , citrus and dried sage alongside brisk acidity . "

In [None]:
# Convert to a TensorFlow Dataset
text_ds = (
    tf.data.Dataset.from_tensor_slices(text_data)
    .batch(BATCH_SIZE)
    .shuffle(1000)
)

In [None]:
# Create a vectorization layer using https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization
vectorize_layer = layers.TextVectorization(
    standardize="lower",
    max_tokens=VOCAB_SIZE,
    output_mode="int",
    output_sequence_length=MAX_LEN + 1,
)

In [None]:
# Adapt the layer to the training set
vectorize_layer.adapt(text_ds)
vocab = vectorize_layer.get_vocabulary()

In [None]:
# Display some token:word mappings
for i, word in enumerate(vocab[:10]):
    print(f"{i}: {word}")

0: 
1: [UNK]
2: :
3: ,
4: .
5: and
6: the
7: wine
8: a
9: of


In [None]:
# Display the same example converted to ints
example_tokenised = vectorize_layer(example_data)
print(example_tokenised.numpy())

[   7   10    2   40    2  440  442  461    2   53   27    2   26  944
  236   22    3 2094    3 3774    5  114  130    4    6   28  982   18
  228 1109 1003    3  355    1   69    3   81    5  114  472  167  408
   30    4    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0]
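
To make the integer encoding above easier to read, here is a minimal sketch that maps words to token ids and back. It uses a tiny hand-made vocabulary (`toy_vocab`, `encode` and `decode` are hypothetical names, not part of the notebook) but follows the same convention as the adapted layer: index 0 is padding and index 1 is `[UNK]`.

```python
# Toy vocabulary laid out like the adapted one: 0 = padding, 1 = unknown word.
toy_vocab = ["", "[UNK]", ":", ",", ".", "and", "the", "wine", "review"]
word_to_index = {word: index for index, word in enumerate(toy_vocab)}


def encode(text, seq_len):
    """Lowercase, split on whitespace, map to ids, then truncate or pad."""
    ids = [word_to_index.get(w, 1) for w in text.lower().split()]
    ids = ids[:seq_len]
    return ids + [0] * (seq_len - len(ids))


def decode(ids):
    """Map ids back to words, dropping the padding token."""
    return " ".join(toy_vocab[i] for i in ids if i != 0)


toy_ids = encode("Wine review : the Riesling .", seq_len=8)
print(toy_ids)
print(decode(toy_ids))
```

The word "Riesling" is not in the toy vocabulary, so it comes back as `[UNK]`. The same happens in the real example above: token id 1 marks a word that did not make it into the 10,000-word vocabulary.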


## 3. Create the Training Set <a name="create"></a>

In [None]:
# Create the training set of wine reviews and the same text shifted by one word,
# such that the predictions (y) for position i can depend only on the known outputs (x) at positions less than i
def prepare_inputs(text):
    text = tf.expand_dims(text, -1)
    tokenized_sentences = vectorize_layer(text)
    x = tokenized_sentences[:, :-1]
    y = tokenized_sentences[:, 1:]
    return x, y


train_ds = text_ds.map(prepare_inputs)

In [None]:
example_input_output = train_ds.take(1).get_single_element()

In [None]:
# Example Input
example_input_output[0][0]

<tf.Tensor: shape=(80,), dtype=int64, numpy=
array([   7,   10,    2,  264,    2, 1327,    2,  119,    2,  148, 1045,
         33, 5527,   68,    3,   15,   41,    8, 1737, 8738,   25, 8541,
        347,    6,  570,   30,    5,    6,  596,    9, 1022,  262,    4,
       1001,    9, 1776,  368,    5, 1029,  468,   19,    6, 4946,    3,
        185,  113, 6069,   11,  158,    4,  754,   14, 8679,   33,   89,
       3249,    5,  134,    4,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0])>

In [None]:
# Example Output (shifted by one token)
example_input_output[1][0]

<tf.Tensor: shape=(80,), dtype=int64, numpy=
array([  10,    2,  264,    2, 1327,    2,  119,    2,  148, 1045,   33,
       5527,   68,    3,   15,   41,    8, 1737, 8738,   25, 8541,  347,
          6,  570,   30,    5,    6,  596,    9, 1022,  262,    4, 1001,
          9, 1776,  368,    5, 1029,  468,   19,    6, 4946,    3,  185,
        113, 6069,   11,  158,    4,  754,   14, 8679,   33,   89, 3249,
          5,  134,    4,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0])>
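
The shift between input and target can be seen more easily on a short sequence. The following is a small sketch with made-up token ids (plain Python lists rather than tensors) showing which context the model sees at each position and which token it is asked to predict.

```python
# A toy padded sequence of token ids (0 = padding), standing in for MAX_LEN + 1 tokens.
tokens = [7, 10, 2, 40, 4, 0]

x = tokens[:-1]  # model input: every token except the last
y = tokens[1:]   # training target: every token except the first

# At position i the model may use x[0..i] and must predict y[i].
pairs = [(x[: i + 1], y[i]) for i in range(len(x))]
for context, target in pairs:
    print(f"context={context} -> target={target}")
```

This is why the vectorization layer was configured with `output_sequence_length=MAX_LEN + 1`: dropping one token from each end leaves inputs and targets of length `MAX_LEN`.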

## 4. Create the causal attention mask function <a name="causal"></a>

In [None]:
def causal_attention_mask(batch_size, n_dest, n_src, dtype):
    i = tf.range(n_dest)[:, None]
    j = tf.range(n_src)
    m = i >= j - n_src + n_dest
    mask = tf.cast(m, dtype)
    mask = tf.reshape(mask, [1, n_dest, n_src])
    mult = tf.concat(
        [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)], 0
    )
    return tf.tile(mask, mult)


np.transpose(causal_attention_mask(1, 10, 10, dtype=tf.int32)[0])

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=int32)
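
For intuition, the same comparison can be written without TensorFlow. This is a minimal pure-Python sketch (the name `causal_mask` is hypothetical) of the rule `i >= j - n_src + n_dest` used above, without the batch dimension.

```python
def causal_mask(n_dest, n_src):
    """mask[i][j] == 1 when query position i may attend to key position j."""
    return [
        [1 if i >= j - n_src + n_dest else 0 for j in range(n_src)]
        for i in range(n_dest)
    ]


mask = causal_mask(4, 4)
for row in mask:
    print(row)
```

Each row is a query position, and it can only see itself and earlier positions. Note that the array printed above is the transpose of this layout, so there each column corresponds to a query position.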

## 5. Create a Transformer Block layer <a name="transformer"></a>

In [None]:
# MultiHeadAttention layer: https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention
# Dropout layer and Dense layer (feed-forward layer), as covered in the AI Essentials course
# LayerNormalization: https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization

class TransformerBlock(layers.Layer):
    def __init__(self, num_heads, key_dim, embed_dim, ff_dim, dropout_rate=0.1):
        super(TransformerBlock, self).__init__()
        self.num_heads = num_heads
        self.key_dim = key_dim
        self.embed_dim = embed_dim
        self.ff_dim = ff_dim
        self.dropout_rate = dropout_rate
        self.attn = layers.MultiHeadAttention(
            num_heads, key_dim, output_shape=embed_dim
        )
        self.dropout_1 = layers.Dropout(self.dropout_rate)
        self.ln_1 = layers.LayerNormalization(epsilon=1e-6)
        self.ffn_1 = layers.Dense(self.ff_dim, activation="relu")
        self.ffn_2 = layers.Dense(self.embed_dim)
        self.dropout_2 = layers.Dropout(self.dropout_rate)
        self.ln_2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]
        causal_mask = causal_attention_mask(
            batch_size, seq_len, seq_len, tf.bool
        )
        attention_output, attention_scores = self.attn(
            inputs,
            inputs,
            attention_mask=causal_mask,
            return_attention_scores=True,
        )
        attention_output = self.dropout_1(attention_output)
        out1 = self.ln_1(inputs + attention_output)
        ffn_1 = self.ffn_1(out1)
        ffn_2 = self.ffn_2(ffn_1)
        ffn_output = self.dropout_2(ffn_2)
        return (self.ln_2(out1 + ffn_output), attention_scores)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "key_dim": self.key_dim,
                "embed_dim": self.embed_dim,
                "num_heads": self.num_heads,
                "ff_dim": self.ff_dim,
                "dropout_rate": self.dropout_rate,
            }
        )
        return config
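
To illustrate what the attention step inside the block computes, here is a simplified single-head sketch in pure Python. It is not the Keras implementation: `MultiHeadAttention` additionally applies learned query, key, value and output projections and runs several heads in parallel, all of which are omitted here. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k and v are lists of seq_len vectors (lists of floats).
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    dim = len(q[0])
    outputs, weights = [], []
    for i, query in enumerate(q):
        # Causal mask: position i only scores keys at positions 0..i.
        scores = [
            sum(a * b for a, b in zip(query, k[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        visible = softmax(scores)
        # Masked (future) positions get exactly zero weight.
        row = visible + [0.0] * (len(k) - len(visible))
        weights.append(row)
        outputs.append(
            [sum(row[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


# Toy input: 3 positions, 2 dimensions; q, k and v all reuse the same vectors.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_out, toy_weights = causal_self_attention(toy, toy, toy)
for row in toy_weights:
    print([round(w, 3) for w in row])
```

The first position can only attend to itself, so its weight row is all on position 0. The `attention_scores` returned by `TransformerBlock` have the same meaning, with one such matrix per head.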

## 6. Create the Token and Position Embedding <a name="embedder"></a>

In [None]:
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, max_len, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.max_len = max_len
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.token_emb = layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.pos_emb = layers.Embedding(input_dim=max_len, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "max_len": self.max_len,
                "vocab_size": self.vocab_size,
                "embed_dim": self.embed_dim,
            }
        )
        return config
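
The effect of adding a position embedding can be sketched with two small lookup tables. The example below uses toy sizes and random values (in the real layer both tables are learned during training), and the names `token_table`, `pos_table` and `embed` are hypothetical.

```python
import random

random.seed(0)
VOCAB, SEQ_MAX, DIM = 6, 4, 3  # toy sizes, far smaller than the notebook's

# Randomly initialised lookup tables standing in for the two Embedding layers.
token_table = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
pos_table = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(SEQ_MAX)]


def embed(token_ids):
    """Add the token vector and the position vector element-wise."""
    return [
        [t + p for t, p in zip(token_table[tok], pos_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]


embedded = embed([2, 5, 2])
print(embedded[0])
print(embedded[2])
```

Token 2 appears at positions 0 and 2 but ends up with two different vectors, which is how the model can tell identical words apart by where they occur.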

## 7. Build the Transformer model <a name="transformer_decoder"></a>

In [None]:
inputs = layers.Input(shape=(None,), dtype=tf.int32)
x = TokenAndPositionEmbedding(MAX_LEN, VOCAB_SIZE, EMBEDDING_DIM)(inputs)
x, attention_scores = TransformerBlock(
    N_HEADS, KEY_DIM, EMBEDDING_DIM, FEED_FORWARD_DIM
)(x)
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)
gpt = models.Model(inputs=inputs, outputs=[outputs, attention_scores])
gpt.compile("adam", loss=[losses.SparseCategoricalCrossentropy(), None])

In [None]:
gpt.summary()
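
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand from the constants in section 0. The sketch below assumes Keras defaults (biases enabled in every projection and dense layer, and `value_dim` equal to `key_dim` in `MultiHeadAttention`); compare the result with what `gpt.summary()` reports in your environment.

```python
VOCAB_SIZE, MAX_LEN = 10000, 80
EMBEDDING_DIM, KEY_DIM, N_HEADS, FEED_FORWARD_DIM = 256, 256, 2, 256

# Token and position embedding tables.
embeddings = VOCAB_SIZE * EMBEDDING_DIM + MAX_LEN * EMBEDDING_DIM

# Query, key and value projections: a kernel and a bias for every head.
qkv = 3 * (EMBEDDING_DIM * N_HEADS * KEY_DIM + N_HEADS * KEY_DIM)
# Output projection maps the concatenated heads back to EMBEDDING_DIM.
attn_out = N_HEADS * KEY_DIM * EMBEDDING_DIM + EMBEDDING_DIM
# Two LayerNormalization layers, each with a scale and an offset vector.
layer_norms = 2 * (2 * EMBEDDING_DIM)
# Two dense layers of the feed-forward network.
ffn = (EMBEDDING_DIM * FEED_FORWARD_DIM + FEED_FORWARD_DIM) + (
    FEED_FORWARD_DIM * EMBEDDING_DIM + EMBEDDING_DIM
)
transformer_block = qkv + attn_out + layer_norms + ffn

# Final dense layer that produces one probability per vocabulary entry.
output_head = EMBEDDING_DIM * VOCAB_SIZE + VOCAB_SIZE

total = embeddings + transformer_block + output_head
print(f"embeddings:        {embeddings:>9,}")
print(f"transformer block: {transformer_block:>9,}")
print(f"output head:       {output_head:>9,}")
print(f"total:             {total:>9,}")
```

Most of the parameters sit in the token embedding and the output layer, both of which scale with `VOCAB_SIZE`, while the single transformer block is comparatively small.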

## 8. Train the Transformer <a name="train"></a>

In [None]:
# Create a TextGenerator callback that samples a wine review from the model at the end of each epoch,
# so you can watch the generated text improve as training progresses.
class TextGenerator(callbacks.Callback):
    def __init__(self, index_to_word, top_k=10):
        super().__init__()
        self.index_to_word = index_to_word
        self.word_to_index = {
            word: index for index, word in enumerate(index_to_word)
        }

    def sample_from(self, probs, temperature):
        probs = probs ** (1 / temperature)
        probs = probs / np.sum(probs)
        return np.random.choice(len(probs), p=probs), probs

    def generate(self, start_prompt, max_tokens, temperature):
        start_tokens = [
            self.word_to_index.get(x, 1) for x in start_prompt.split()
        ]
        sample_token = None
        info = []
        while len(start_tokens) < max_tokens and sample_token != 0:
            x = np.array([start_tokens])
            y, att = self.model.predict(x, verbose=0)
            sample_token, probs = self.sample_from(y[0][-1], temperature)
            info.append(
                {
                    "prompt": start_prompt,
                    "word_probs": probs,
                    "atts": att[0, :, -1, :],
                }
            )
            start_tokens.append(sample_token)
            start_prompt = start_prompt + " " + self.index_to_word[sample_token]
        print(f"\ngenerated text:\n{start_prompt}\n")
        return info

    def on_epoch_end(self, epoch, logs=None):
        self.generate("wine review", max_tokens=80, temperature=1.0)

In [None]:
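# Create a model checkpoint callback: a checkpoint is a saved state of the model
# (its weights and biases) at a given point during training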
model_checkpoint_callback = callbacks.ModelCheckpoint(
    filepath="./checkpoint/checkpoint.weights.h5",
    save_weights_only=True,
    save_freq="epoch",
    verbose=0,
)

tensorboard_callback = callbacks.TensorBoard(log_dir="./logs")

# Create the text generator callback from the vocabulary
text_generator = TextGenerator(vocab)

In [None]:
gpt.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint_callback, tensorboard_callback, text_generator],
)

Epoch 1/5
4060/4060 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step - loss: 2.5849
generated text:
wine review : spain : northern spain : tempranillo : raisin , raisin and berry aromas feed into a woody , acidic palate that pave of sophistication and firm , with a candied finish . 

4060/4060 ━━━━━━━━━━━━━━━━━━━━ 150s 35ms/step - loss: 2.5848
Epoch 2/5
4058/4060 ━━━━━━━━━━━━━━━━━━━━ 0s 24ms/step - loss: 1.9812
generated text:
wine review : us : california : tannat - cabernet sauvignon : this mingled with raisin and prune aromas last with time to know about time . after half of eight years , it ' s not fancy or tannic opened now , and instead it does not quite use many along the lines of astringency you can notice the cellar , and it will need time to develop . it won ' t find : impart a more time

4060/4060 ━━━━━━━━━━━━━━━━━━━━ 208s 38ms/step - loss: 1.9812
Epoch 3/

<keras.src.callbacks.history.History at 0x7f5d8e517190>

<font color='red'>Instruction: do not forget to replace the path.</font>

In [None]:
# Save the final model
gpt.save("/gpt.keras")

## 9. Generate text using the Transformer

In [None]:
def print_probs(info, vocab, top_k=5):
    for i in info:
        highlighted_text = []
        for word, att_score in zip(
            i["prompt"].split(), np.mean(i["atts"], axis=0)
        ):
            highlighted_text.append(
                '<span style="background-color:rgba(135,206,250,'
                + str(att_score / max(np.mean(i["atts"], axis=0)))
                + ');">'
                + word
                + "</span>"
            )
        highlighted_text = " ".join(highlighted_text)
        display(HTML(highlighted_text))

        word_probs = i["word_probs"]
        p_sorted = np.sort(word_probs)[::-1][:top_k]
        i_sorted = np.argsort(word_probs)[::-1][:top_k]
        # Use a separate name for the token index so it does not shadow the outer loop variable `i`
        for p, idx in zip(p_sorted, i_sorted):
            print(f"{vocab[idx]}:   \t{np.round(100*p,2)}%")
        print("--------\n")

In [None]:
info = text_generator.generate(
    "wine review : us", max_tokens=80, temperature=0.5
)


generated text:
wine review : us : california : chardonnay : this is a rich , simple chardonnay . it ' s an easy - drinking chardonnay , with simple pineapple and peach flavors , with a hint of oak . 



<font color='red'>Instruction: Now try it yourself with a different starting context, number of tokens, and temperature. </font>

In [None]:
info = text_generator.generate(
    "wine review : germany", max_tokens=50, temperature=0.5
)
print_probs(info, vocab)


generated text:
wine review : germany : mosel : riesling : waxy lemon - lime and lime flavors are accented by a lacy haze of acidity and a hint of sweetness , this off - dry riesling . it ' s a straightforward wine with a juicy , easy - drinking finish



::   	100.0%
grosso:   	0.0%
zealand:   	0.0%
-:   	0.0%
africa:   	0.0%
--------



mosel:   	77.69000244140625%
rheinhessen:   	13.609999656677246%
rheingau:   	4.869999885559082%
pfalz:   	3.2699999809265137%
nahe:   	0.49000000953674316%
--------



::   	100.0%
-:   	0.0%
grosso:   	0.0%
notes:   	0.0%
blend:   	0.0%
--------



riesling:   	99.98999786376953%
pinot:   	0.0%
weissburgunder:   	0.0%
grüner:   	0.0%
white:   	0.0%
--------



::   	100.0%
-:   	0.0%
grosso:   	0.0%
blend:   	0.0%
blanc:   	0.0%
--------



a:   	19.190000534057617%
this:   	15.989999771118164%
dusty:   	10.229999542236328%
whiffs:   	9.220000267028809%
while:   	8.520000457763672%
--------



lemon:   	68.47000122070312%
lanolin:   	8.430000305175781%
apple:   	4.920000076293945%
floral:   	3.9700000286102295%
peach:   	2.5299999713897705%
--------



and:   	94.19000244140625%
-:   	3.6600000858306885%
,:   	0.6800000071525574%
notes:   	0.3799999952316284%
zest:   	0.2800000011920929%
--------



lime:   	99.12000274658203%
blossom:   	0.4300000071525574%
pith:   	0.1899999976158142%
rind:   	0.05999999865889549%
zest:   	0.03999999910593033%
--------



and:   	91.97000122070312%
,:   	2.2899999618530273%
notes:   	2.009999990463257%
zest:   	1.4199999570846558%
acidity:   	0.9200000166893005%
--------



lemon:   	27.010000228881836%
tangerine:   	22.139999389648438%
grapefruit:   	14.899999618530273%
pink:   	5.309999942779541%
lime:   	4.789999961853027%
--------



flavors:   	55.06999969482422%
notes:   	26.229999542236328%
zest:   	5.179999828338623%
acidity:   	2.950000047683716%
tones:   	2.0299999713897705%
--------



are:   	89.66999816894531%
lend:   	5.260000228881836%
reverberate:   	1.840000033378601%
abound:   	1.159999966621399%
penetrate:   	0.7300000190734863%
--------



intensely:   	16.360000610351562%
accented:   	12.479999542236328%
offset:   	12.279999732971191%
concentrated:   	8.420000076293945%
delicately:   	5.559999942779541%
--------



by:   	89.69999694824219%
with:   	10.289999961853027%
in:   	0.009999999776482582%
this:   	0.0%
on:   	0.0%
--------



a:   	67.48999786376953%
saffron:   	10.5%
notes:   	4.349999904632568%
hints:   	3.490000009536743%
zesty:   	2.559999942779541%
--------



streak:   	47.0099983215332%
hint:   	9.109999656677246%
razor:   	5.940000057220459%
crush:   	5.360000133514404%
touch:   	4.489999771118164%
--------



haze:   	93.94999694824219%
sweetness:   	5.869999885559082%
veil:   	0.07999999821186066%
floral:   	0.019999999552965164%
,:   	0.009999999776482582%
--------



of:   	100.0%
.:   	0.0%
to:   	0.0%
,:   	0.0%
that:   	0.0%
--------



honey:   	24.510000228881836%
sugar:   	20.6200008392334%
sweetness:   	9.3100004196167%
saffron:   	6.949999809265137%
caramelized:   	4.03000020980835%
--------



in:   	59.619998931884766%
and:   	30.530000686645508%
on:   	3.509999990463257%
.:   	2.9700000286102295%
,:   	2.259999990463257%
--------



a:   	80.37000274658203%
minerality:   	6.039999961853027%
sweetness:   	1.2100000381469727%
savory:   	1.0700000524520874%
delicate:   	1.0299999713897705%
--------



crush:   	19.40999984741211%
hint:   	9.359999656677246%
lacy:   	6.940000057220459%
slightly:   	6.090000152587891%
whisper:   	5.340000152587891%
--------



of:   	100.0%
that:   	0.0%
.:   	0.0%
in:   	0.0%
to:   	0.0%
--------



sweetness:   	40.77000045776367%
honey:   	37.75%
saffron:   	6.679999828338623%
petrol:   	3.0%
minerality:   	1.3899999856948853%
--------



in:   	70.0%
.:   	14.050000190734863%
on:   	7.980000019073486%
,:   	5.340000152587891%
from:   	1.0800000429153442%
--------



this:   	84.56999969482422%
but:   	11.050000190734863%
and:   	1.3700000047683716%
the:   	0.6399999856948853%
which:   	0.6000000238418579%
--------



is:   	63.0099983215332%
wine:   	15.510000228881836%
off:   	12.010000228881836%
dry:   	6.949999809265137%
riesling:   	0.47999998927116394%
--------



-:   	100.0%
dry:   	0.0%
,:   	0.0%
to:   	0.0%
a:   	0.0%
--------



dry:   	100.0%
sweet:   	0.0%
smackingly:   	0.0%
[UNK]:   	0.0%
dryness:   	0.0%
--------



riesling:   	94.38999938964844%
wine:   	3.190000057220459%
,:   	1.8600000143051147%
-:   	0.3700000047683716%
spätlese:   	0.10999999940395355%
--------



.:   	99.83000183105469%
is:   	0.10999999940395355%
that:   	0.029999999329447746%
,:   	0.019999999552965164%
from:   	0.009999999776482582%
--------



it:   	93.41999816894531%
the:   	4.659999847412109%
:   	0.6800000071525574%
drink:   	0.4099999964237213%
a:   	0.1899999976158142%
--------



':   	99.7300033569336%
finishes:   	0.25%
is:   	0.019999999552965164%
should:   	0.0%
has:   	0.0%
--------



s:   	100.0%
ll:   	0.0%
[UNK]:   	0.0%
d:   	0.0%
drinks:   	0.0%
--------



a:   	74.6500015258789%
straightforward:   	3.640000104904175%
easy:   	2.1700000762939453%
not:   	1.9600000381469727%
fresh:   	1.7799999713897705%
--------



straightforward:   	51.779998779296875%
bit:   	10.1899995803833%
deeply:   	6.789999961853027%
refreshing:   	2.990000009536743%
refreshingly:   	2.940000057220459%
--------



,:   	63.47999954223633%
wine:   	22.530000686645508%
quaffer:   	9.760000228881836%
yet:   	2.119999885559082%
and:   	0.9300000071525574%
--------



for:   	27.0%
to:   	26.479999542236328%
with:   	19.219999313354492%
,:   	18.059999465942383%
that:   	8.199999809265137%
--------



a:   	97.16999816894531%
easy:   	0.7699999809265137%
an:   	0.5799999833106995%
just:   	0.3100000023841858%
its:   	0.28999999165534973%
--------



long:   	12.319999694824219%
brisk:   	8.149999618530273%
lacy:   	7.019999980926514%
lip:   	5.809999942779541%
touch:   	4.5%
--------



,:   	91.81999969482422%
finish:   	4.110000133514404%
lemon:   	0.6200000047683716%
mouthfeel:   	0.3799999952316284%
kick:   	0.3799999952316284%
--------



easy:   	17.149999618530273%
fresh:   	10.220000267028809%
juicy:   	8.989999771118164%
fruity:   	6.489999771118164%
sunny:   	6.179999828338623%
--------



-:   	85.76000213623047%
drinking:   	11.449999809265137%
finish:   	1.5299999713897705%
demeanor:   	0.9599999785423279%
,:   	0.1599999964237213%
--------



drinking:   	99.94999694824219%
quaffing:   	0.03999999910593033%
to:   	0.009999999776482582%
drink:   	0.0%
sipping:   	0.0%
--------



finish:   	64.33000183105469%
demeanor:   	11.039999961853027%
style:   	7.25%
riesling:   	5.179999828338623%
appeal:   	3.1700000762939453%
--------



## Homework Questions
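1. Generate a wine review starting with 'wine review : italy', using 20 tokens and a temperature of 1.0. Print the probability distribution of the words.
2. Generate another wine review starting with 'wine review : italy', using 20 tokens and a temperature of 0.1. Print the probability distribution of the words.

Keep the spaces around the colon and write the country in lowercase: the prompt is split on whitespace and each word is looked up in a lowercase vocabulary, so anything else is mapped to `[UNK]`.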

In a Word document, analyze the first five generation steps at both temperatures:
1. What is the next-token distribution?
2. Which context words receive higher attention when generating this token?
3. How do different temperatures impact the next-token distribution?






In [None]:
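# Note: generate() splits the prompt on whitespace and looks each word up in the
# lowercase vocabulary, so the capitalised "Italy" is mapped to [UNK] here;
# use "italy" to condition on the country.
info = text_generator.generate(
    "wine review : Italy", max_tokens=20, temperature=1.0
)
print_probs(info, vocab)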


generated text:
wine review : Italy : compromised : white blend : a tart but disjointed at first , then take the



::   	99.95999908447266%
hills:   	0.009999999776482582%
-:   	0.0%
africa:   	0.0%
,:   	0.0%
--------



[UNK]:   	24.219999313354492%
switzerland:   	1.3600000143051147%
france:   	1.25%
cyprus:   	1.0700000524520874%
central:   	1.0299999713897705%
--------



::   	33.75%
by:   	17.399999618530273%
other:   	2.630000114440918%
islands:   	2.390000104904175%
[UNK]:   	1.6100000143051147%
--------



red:   	23.450000762939453%
pinot:   	9.670000076293945%
cabernet:   	8.40999984741211%
chardonnay:   	6.630000114440918%
rosé:   	5.679999828338623%
--------



blend:   	99.88999938964844%
kitchen:   	0.009999999776482582%
pepper:   	0.009999999776482582%
grape:   	0.009999999776482582%
white:   	0.009999999776482582%
--------



::   	100.0%
-:   	0.0%
of:   	0.0%
blanc:   	0.0%
grosso:   	0.0%
--------



this:   	34.36000061035156%
a:   	15.920000076293945%
the:   	7.420000076293945%
an:   	2.559999942779541%
made:   	2.3499999046325684%
--------



vegetal:   	7.730000019073486%
simple:   	5.809999942779541%
strange:   	3.450000047683716%
[UNK]:   	3.319999933242798%
blend:   	3.2699999809265137%
--------



,:   	57.70000076293945%
and:   	5.849999904632568%
wine:   	4.369999885559082%
blend:   	3.130000114440918%
apple:   	1.690000057220459%
--------



simple:   	10.460000038146973%
rather:   	2.9700000286102295%
somewhat:   	2.630000114440918%
challenged:   	2.2899999618530273%
dilute:   	2.119999885559082%
--------



wine:   	45.95000076293945%
,:   	16.549999237060547%
nose:   	10.5600004196167%
palate:   	4.260000228881836%
bouquet:   	1.3700000047683716%
--------



the:   	52.4900016784668%
first:   	19.860000610351562%
this:   	11.670000076293945%
[UNK]:   	1.6200000047683716%
present:   	1.0499999523162842%
--------



,:   	53.20000076293945%
sip:   	10.5600004196167%
impression:   	4.329999923706055%
and:   	2.180000066757202%
on:   	1.7899999618530273%
--------



this:   	35.79999923706055%
with:   	7.809999942779541%
the:   	5.53000020980835%
then:   	3.809999942779541%
and:   	3.7100000381469727%
--------



the:   	14.1899995803833%
opens:   	4.010000228881836%
[UNK]:   	3.190000057220459%
settles:   	3.0999999046325684%
turns:   	2.880000114440918%
--------



on:   	23.790000915527344%
over:   	19.489999771118164%
the:   	14.8100004196167%
a:   	11.399999618530273%
time:   	8.75%
--------



In [None]:
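# Note: generate() splits the prompt on whitespace and looks each word up in the
# lowercase vocabulary, so the capitalised "Italy" is mapped to [UNK] here;
# use "italy" to condition on the country.
info = text_generator.generate(
    "wine review : Italy", max_tokens=20, temperature=0.1
)
print_probs(info, vocab)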


generated text:
wine review : Italy : [UNK] : red blend : this is a blend of 50 % cabernet sauvignon and



::   	100.0%
hills:   	0.0%
-:   	0.0%
africa:   	0.0%
celebrated:   	0.0%
--------



[UNK]:   	100.0%
switzerland:   	0.0%
france:   	0.0%
cyprus:   	0.0%
central:   	0.0%
--------



::   	100.0%
[UNK]:   	0.0%
hills:   	0.0%
de:   	0.0%
,:   	0.0%
--------



red:   	99.94999694824219%
pinot:   	0.029999999329447746%
cabernet:   	0.019999999552965164%
riesling:   	0.0%
merlot:   	0.0%
--------



blend:   	100.0%
red:   	0.0%
::   	0.0%
wine:   	0.0%
[UNK]:   	0.0%
--------



::   	100.0%
celebrated:   	0.0%
coiled:   	0.0%
company:   	0.0%
confirms:   	0.0%
--------



this:   	99.97000122070312%
a:   	0.029999999329447746%
the:   	0.0%
made:   	0.0%
[UNK]:   	0.0%
--------



is:   	96.86000061035156%
blend:   	2.5799999237060547%
wine:   	0.5600000023841858%
[UNK]:   	0.0%
has:   	0.0%
--------



a:   	100.0%
an:   	0.0%
the:   	0.0%
one:   	0.0%
[UNK]:   	0.0%
--------



blend:   	100.0%
bright:   	0.0%
ripe:   	0.0%
soft:   	0.0%
full:   	0.0%
--------



of:   	100.0%
that:   	0.0%
with:   	0.0%
dominated:   	0.0%
based:   	0.0%
--------



50:   	99.94999694824219%
cabernet:   	0.05000000074505806%
80:   	0.0%
85:   	0.0%
70:   	0.0%
--------



%:   	100.0%
-:   	0.0%
/:   	0.0%
merlot:   	0.0%
cabernet:   	0.0%
--------



cabernet:   	100.0%
merlot:   	0.0%
syrah:   	0.0%
sangiovese:   	0.0%
grenache:   	0.0%
--------



sauvignon:   	100.0%
franc:   	0.0%
and:   	0.0%
,:   	0.0%
-:   	0.0%
--------



and:   	62.45000076293945%
,:   	37.54999923706055%
with:   	0.0%
(:   	0.0%
-:   	0.0%
--------

