# 10. IMDB embeddings
_Exercise: In this exercise you will download a dataset, split it, create a `tf.data.Dataset` to load it and preprocess it efficiently, then build and train a binary classification model containing an `Embedding` layer._

In [129]:
import os
import tarfile
import urllib.request

# Hide TensorFlow INFO logs; this must be set before TensorFlow is imported
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

import tensorflow as tf
import tensorflow_datasets as tfds
from keras.layers import Dense, Embedding, Input, Lambda, TextVectorization
from keras.models import Sequential

tf.random.set_seed(42)

## a.
_Exercise: Download the [Large Movie Review Dataset](https://homl.info/imdb), which contains 50,000 movie reviews from the [Internet Movie Database](https://imdb.com/). The data is organized in two directories, `train` and `test`, each containing a `pos` subdirectory with 12,500 positive reviews and a `neg` subdirectory with 12,500 negative reviews. Each review is stored in a separate text file. There are other files and folders (including preprocessed bag-of-words), but we will ignore them in this exercise._

In [None]:
# URL of the dataset
url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
filename = "aclImdb_v1.tar.gz"

# Download the dataset
urllib.request.urlretrieve(url, filename)

# Unzip the dataset
with tarfile.open(filename, "r:gz") as tar:
    tar.extractall()
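
Re-running the cell above downloads the archive again every time. A minimal sketch of a guard that skips the download when the file is already on disk follows. The helper name `fetch_and_extract` and its parameters are illustrative and not part of the original notebook.

```python
import os
import tarfile
import urllib.request


def fetch_and_extract(url, archive_path, extract_dir="."):
    """Download `url` to `archive_path` unless it already exists, then extract it."""
    if not os.path.exists(archive_path):
        # Only hit the network when the archive is missing
        urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(extract_dir)
    return extract_dir
```

With the variables defined above, the call would be `fetch_and_extract(url, filename)`.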

In [105]:
path_test_positive = os.path.join("aclImdb", "test", "pos", "*.txt")
path_test_negative = os.path.join("aclImdb", "test", "neg", "*.txt")
path_train_positive = os.path.join("aclImdb", "train", "pos", "*.txt")
path_train_negative = os.path.join("aclImdb", "train", "neg", "*.txt")

filepath_dataset_test_positive = tf.data.Dataset.list_files(path_test_positive, seed=42)
filepath_dataset_test_negative = tf.data.Dataset.list_files(path_test_negative, seed=42)
filepath_dataset_train_positive = tf.data.Dataset.list_files(path_train_positive, seed=42)
filepath_dataset_train_negative = tf.data.Dataset.list_files(path_train_negative, seed=42)

In [None]:
print(len(list(filepath_dataset_test_positive)))
print(len(list(filepath_dataset_test_negative)))
print(len(list(filepath_dataset_train_positive)))
print(len(list(filepath_dataset_train_negative)))

12500
12500
12500
12500


In [None]:
textline_dataset_test_positive = filepath_dataset_test_positive.interleave(
    lambda filepath: tf.data.TextLineDataset(filepath),
    cycle_length=5,
    num_parallel_calls=5,
).map(lambda textline: (textline, 1))

textline_dataset_test_negative = filepath_dataset_test_negative.interleave(
    lambda filepath: tf.data.TextLineDataset(filepath),
    cycle_length=5,
    num_parallel_calls=5,
).map(lambda textline: (textline, 0))

textline_dataset_train_positive = filepath_dataset_train_positive.interleave(
    lambda filepath: tf.data.TextLineDataset(filepath),
    cycle_length=5,
    num_parallel_calls=5,
).map(lambda textline: (textline, 1))

textline_dataset_train_negative = filepath_dataset_train_negative.interleave(
    lambda filepath: tf.data.TextLineDataset(filepath),
    cycle_length=5,
    num_parallel_calls=5,
).map(lambda textline: (textline, 0))
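
The four pipelines above differ only in the file pattern and the label, so they can be built by one function. The sketch below is one way to do that. `labeled_reviews` and its parameters are hypothetical names, and it relies on each review file holding a single line of text, which the counts above suggest.

```python
import tensorflow as tf


def labeled_reviews(file_pattern, label, n_readers=5, seed=42):
    """Return a dataset of (review_text, label) pairs for files matching the pattern."""
    filepaths = tf.data.Dataset.list_files(file_pattern, seed=seed)
    lines = filepaths.interleave(
        tf.data.TextLineDataset,  # one dataset of text lines per file
        cycle_length=n_readers,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return lines.map(lambda text: (text, label))
```

For example, `labeled_reviews(path_train_positive, 1)` would play the role of `textline_dataset_train_positive`.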

In [45]:
for X, y in textline_dataset_test_positive.take(1):
    print(X)
    print(y)

tf.Tensor(b'For a long time, this was my favorite of the Batman films. It had the best cinematography and an edgy feel to it with two wild characters - Catwomen and The Penguin - along with the always-interesting Christopher Walken. However, after the last viewing it finally slipped in my ratings and, frankly, I now prefer the last Batman: Batman Begins, with Christian Bale.<br /><br />THE GOOD - Nonetheless, this is still the most intriguing of the five latter-day Batman films. The stylish cinematography in here is the best of any of the Batman movies. Director Tim Burton is known for his films which feature stunning visuals, as this is a great example. The three characters listed above are all very different and very interesting, almost fascinating. Of the villains, I preferred Catwomen, finding her the most fun to watch before and after she changed. Violence is not overdone here as it was in several of the other Batman stories but one is never bored watching this. As he did in the f

In [140]:
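# Shuffle the test set in one fixed order (reshuffle_each_iteration=False) so that
# the take()/skip() split in part b yields disjoint validation and test sets.
textline_dataset_test = textline_dataset_test_positive.concatenate(textline_dataset_test_negative).shuffle(
    25_000, seed=42, reshuffle_each_iteration=False
)
textline_dataset_train = textline_dataset_train_positive.concatenate(textline_dataset_train_negative).shuffle(25_000, seed=42)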

In [58]:
print(len(list(textline_dataset_test)))
print(len(list(textline_dataset_train)))

25000
25000


## b.
_Exercise: Split the test set into a validation set (15,000) and a test set (10,000)._

In [141]:
textline_dataset_valid = textline_dataset_test.take(15_000)
textline_dataset_test = textline_dataset_test.skip(15_000)
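
`take()` and `skip()` each iterate over the upstream dataset independently. If an upstream `shuffle()` produces a new order on every iteration (its default behavior), the two passes can see different orders, and the validation and test sets may then share reviews. The toy sketch below, which uses a range of integers as a stand-in for the reviews, shows a split that stays disjoint because the shuffle order is fixed.

```python
import tensorflow as tf

# Toy stand-in for the 25,000 test reviews
toy = tf.data.Dataset.range(10).shuffle(10, seed=42, reshuffle_each_iteration=False)

toy_valid = toy.take(6)
toy_test = toy.skip(6)

valid_items = {int(x) for x in toy_valid.as_numpy_iterator()}
test_items = {int(x) for x in toy_test.as_numpy_iterator()}

print(valid_items & test_items)       # elements present in both subsets
print(len(valid_items | test_items))  # number of distinct elements overall
```

With a fixed shuffle order the intersection is empty and the union covers all ten elements.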

In [55]:
print(len(list(textline_dataset_valid)))
print(len(list(textline_dataset_test)))

15000
10000


## c.
_Exercise: Use tf.data to create an efficient dataset for each set._

In [138]:
textline_dataset_train = textline_dataset_train.cache().batch(32).prefetch(1)
textline_dataset_valid = textline_dataset_valid.cache().batch(32).prefetch(1)
textline_dataset_test = textline_dataset_test.cache().batch(32).prefetch(1)
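
The order of the pipeline steps matters. A sketch of one common arrangement: cache first, so the files are read only once; shuffle after the cache, so the training set gets a new order every epoch; then batch and prefetch, letting tf.data pick the prefetch buffer size. `make_pipeline` is a hypothetical helper, and the sketch assumes the cached reviews fit in memory.

```python
import tensorflow as tf


def make_pipeline(dataset, batch_size=32, shuffle_buffer=None, seed=42):
    """Cache, optionally shuffle, batch, and prefetch a dataset."""
    dataset = dataset.cache()  # keep the elements in memory after the first pass
    if shuffle_buffer:
        dataset = dataset.shuffle(shuffle_buffer, seed=seed)  # new order each epoch
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)


toy = tf.data.Dataset.range(100)
train_like = make_pipeline(toy, shuffle_buffer=100)  # training: shuffled
eval_like = make_pipeline(toy)                       # validation/test: fixed order
```

Shuffling is left out for the validation and test sets, since their order does not affect the metrics.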

In [142]:
%timeit -r1 for X, y in textline_dataset_train.repeat(10): pass

1min 44s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


## d.
_Exercise: Create a binary classification model, using a `TextVectorization` layer to preprocess each review._

In [63]:
text_vectorization = TextVectorization(
    max_tokens=1000, output_mode="tf_idf"
)
text_vectorization.adapt(textline_dataset_train.map(lambda X, y: X))

In [64]:
text_vectorization.get_vocabulary()[:10]

['[UNK]', 'the', 'and', 'a', 'of', 'to', 'is', 'in', 'it', 'i']
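
With `output_mode="tf_idf"`, each review becomes a fixed-length vector with one weight per vocabulary entry: words that are frequent in the review but rare across the corpus get the largest weights. The pure-Python sketch below illustrates the idea with made-up counts. It uses a smoothed inverse document frequency, `log(1 + N / (1 + df))`; treat the exact formula as an assumption, since the layer's internal weighting is not shown in this notebook.

```python
import math


def tf_idf_vector(tokens, vocabulary, document_counts, n_documents):
    """Return one TF-IDF weight per vocabulary entry for a tokenized document."""
    weights = []
    for word in vocabulary:
        term_frequency = tokens.count(word)
        # Smoothed inverse document frequency (assumed variant)
        idf = math.log(1 + n_documents / (1 + document_counts[word]))
        weights.append(term_frequency * idf)
    return weights


vocabulary = ["the", "movie", "dreadful"]
document_counts = {"the": 990, "movie": 600, "dreadful": 20}  # made-up counts
review = "the movie was dreadful the end".split()
weights = tf_idf_vector(review, vocabulary, document_counts, n_documents=1000)
for word, weight in zip(vocabulary, weights):
    print(f"{word}: {weight:.3f}")
```

Even though "the" occurs twice in the toy review, the rare word "dreadful" receives the larger weight.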

In [110]:
model = Sequential()
model.add(Input(shape=(1,), dtype=tf.string))
model.add(text_vectorization)
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

In [111]:
model.compile(optimizer="nadam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

In [112]:
model.fit(textline_dataset_train, validation_data=textline_dataset_valid, epochs=5)

Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 29s 25ms/step - accuracy: 0.7781 - loss: 0.4917 - val_accuracy: 0.8014 - val_loss: 0.4919
Epoch 2/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 26s 25ms/step - accuracy: 0.8437 - loss: 0.3797 - val_accuracy: 0.8397 - val_loss: 0.3861
Epoch 3/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 30s 28ms/step - accuracy: 0.8601 - loss: 0.3354 - val_accuracy: 0.8271 - val_loss: 0.4124
Epoch 4/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 32s 29ms/step - accuracy: 0.8861 - loss: 0.2870 - val_accuracy: 0.8353 - val_loss: 0.3950
Epoch 5/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 31s 28ms/step - accuracy: 0.9062 - loss: 0.2394 - val_accuracy: 0.8301 - val_loss: 0.4201

<keras.src.callbacks.history.History at 0x7f908c38cce0>

## e.
_Exercise: Add an `Embedding` layer and compute the mean embedding for each review, multiplied by the square root of the number of words (see Chapter 16). This rescaled mean embedding can then be passed to the rest of your model._

In [113]:
max_tokens = 1000

text_vectorization = TextVectorization(max_tokens=max_tokens, output_mode="int")
text_vectorization.adapt(textline_dataset_train.map(lambda X, y: X))

In [123]:
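def compute_mean_embedding(inputs):
    """Return the sum of the word embeddings divided by sqrt(number of words).

    This equals the mean embedding multiplied by sqrt(number of words).
    A position is counted as a word only if its embedding is not all zeros.
    """
    not_pad = tf.math.count_nonzero(inputs, axis=-1)
    n_words = tf.math.count_nonzero(not_pad, axis=-1, keepdims=True)
    sqrt_n_words = tf.math.sqrt(tf.cast(n_words, tf.float32))
    return tf.reduce_sum(inputs, axis=1) / sqrt_n_words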
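
The function divides the sum of the embeddings by √N rather than multiplying the mean by √N. The two are the same quantity, because (sum / N) × √N = sum / √N. A small worked example with made-up numbers:

```python
import math

# Three made-up 2-dimensional word embeddings for one review (no padding)
embeddings = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
n_words = len(embeddings)

summed = [sum(column) for column in zip(*embeddings)]  # per-dimension sum
mean = [value / n_words for value in summed]           # per-dimension mean

scaled_mean = [value * math.sqrt(n_words) for value in mean]
sum_over_sqrt = [value / math.sqrt(n_words) for value in summed]

print(scaled_mean)
print(sum_over_sqrt)
```

Both lists hold the same values up to floating-point rounding.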

In [125]:
model = Sequential()
model.add(Input(shape=(1,), dtype=tf.string))
model.add(text_vectorization)
model.add(
    Embedding(
        input_dim=max_tokens,
        output_dim=20,
        mask_zero=True,  # token ID 0 (<pad>) is masked; its embedding vector is not zeroed automatically
    )
)
model.add(Lambda(compute_mean_embedding))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
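
Note that `mask_zero=True` makes the `Embedding` layer emit a padding mask, but the vectors it outputs for padding positions are ordinary learned embeddings, not zeros. A more defensive variant derives the mask from the token IDs themselves. The sketch below uses plain TensorFlow ops; `masked_scaled_mean` is a hypothetical helper, and wiring it into the model would require access to both the token IDs and the embeddings, for example through the functional API.

```python
import tensorflow as tf


def masked_scaled_mean(token_ids, embeddings):
    """Sum the embeddings of non-padding tokens and divide by sqrt(word count).

    token_ids:  int tensor of shape (batch, seq_len), where 0 means padding
    embeddings: float tensor of shape (batch, seq_len, embedding_dim)
    """
    mask = tf.cast(tf.not_equal(token_ids, 0), embeddings.dtype)
    n_words = tf.reduce_sum(mask, axis=1, keepdims=True)  # shape (batch, 1)
    summed = tf.reduce_sum(embeddings * mask[:, :, tf.newaxis], axis=1)
    # Guard against empty reviews to avoid dividing by zero
    return summed / tf.sqrt(tf.maximum(n_words, 1.0))


token_ids = tf.constant([[5, 7, 0, 0]])  # two real tokens, two padding tokens
embeddings = tf.ones((1, 4, 3))          # made-up embeddings, all ones
result = masked_scaled_mean(token_ids, embeddings)
print(result)
```

Only the two real tokens contribute here, so each output component is 2 / √2.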



## f.
_Exercise: Train the model and see what accuracy you get. Try to optimize your pipelines to make training as fast as possible._

In [126]:
model.compile(optimizer="nadam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

In [128]:
model.fit(textline_dataset_train, validation_data=textline_dataset_valid, epochs=5)

Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 28s 26ms/step - accuracy: 0.7488 - loss: 0.5174 - val_accuracy: 0.8047 - val_loss: 0.4296
Epoch 2/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 33s 32ms/step - accuracy: 0.7876 - loss: 0.4681 - val_accuracy: 0.7448 - val_loss: 0.4755
Epoch 3/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 34s 32ms/step - accuracy: 0.8021 - loss: 0.4322 - val_accuracy: 0.7406 - val_loss: 0.4877
Epoch 4/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 42s 42ms/step - accuracy: 0.8214 - loss: 0.4043 - val_accuracy: 0.8150 - val_loss: 0.3954
Epoch 5/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 47s 41ms/step - accuracy: 0.8251 - loss: 0.4026 - val_accuracy: 0.8172 - val_loss: 0.3914

<keras.src.callbacks.history.History at 0x7f9018475af0>

## g.
_Exercise: Use TFDS to load the same dataset more easily: `tfds.load("imdb_reviews")`._

In [133]:
dataset = tfds.load("imdb_reviews", as_supervised=True)
train_dataset, test_dataset = dataset["train"], dataset["test"]

In [134]:
for X, y in train_dataset.take(2):
    print(X)
    print(y)
    print()

tf.Tensor(b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.", shape=(), dtype=string)
tf.Tensor(0, shape=(), dtype=int64)

tf.Tensor(b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on 

