<a href="https://colab.research.google.com/github/MassimoCiaffoni/Progetto-CER/blob/master/IMDb_dataset_benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <img valign="middle" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/IMDB_Logo_2016.svg/1200px-IMDB_Logo_2016.svg.png" alt="IMDb" width="80">  dataset training benchmark

Benchmark based on the famous [IMDb dataset](https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews).<br>
The dataset contains 50K movie reviews, each labeled as "positive" or "negative".<br>
The aim of this test is to compare the computational power of **TPU, GPU and CPU** by training the same sentiment-classification model on each of them.



<h2> How to run the benchmark</h2>

You need to run the notebook once for each type of hardware you want to test, because it is not possible to test them all at once.<br>
First, log in to your Google account and connect the notebook to a Colab runtime.

*   **Bench TPU performance**  <a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a><br>
> Runtime -> Change runtime type -> select **TPU** <br>

*   **Bench GPU performance** <br>
> Runtime -> Change runtime type -> select **GPU** <br>

*   **Bench CPU performance** <br>
> Runtime -> Change runtime type -> select **None** <br>


Then run each cell with its play icon, or simply select Runtime -> Run all.



<h3>Our test results are available <a href= "https://github.com/MassimoCiaffoni/Progetto-CER/wiki/TPU-Performance-and-Test">HERE</a> </h3>

In [0]:
import tensorflow as tf
from tensorflow.keras import layers, datasets, preprocessing, models
import tensorflow_datasets as tfds
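print("TensorFlow version:", tf.__version__)

max_len = 200          # every review is padded/truncated to 200 tokens
n_words = 10000        # vocabulary size: only the 10,000 most frequent words are kept
dim_embedding = 256    # length of each word-embedding vector
EPOCHS = 20            # full passes over the training set
BATCH_SIZE = 20        # examples per training step (global batch size)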

TensorFlow version at the time this notebook was created: **2.2.0**

In [0]:
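def load_data():
    # Load the reviews as lists of word indices, keeping only the n_words most
    # frequent words (rarer words are replaced by an out-of-vocabulary index).
    (X_train, y_train), (X_test, y_test) = datasets.imdb.load_data(num_words=n_words)
    # Pad (or truncate) every review to exactly max_len tokens.
    X_train = preprocessing.sequence.pad_sequences(X_train, maxlen=max_len)
    X_test = preprocessing.sequence.pad_sequences(X_test, maxlen=max_len)
    return (X_train, y_train), (X_test, y_test)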
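To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`). The helper name `pad_pre` is hypothetical and only mimics the Keras behaviour for illustration; the notebook itself keeps using the Keras function.

```python
def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper mimicking Keras pad_sequences defaults
    (padding='pre', truncating='pre')."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        # truncating='pre': drop tokens from the start, keep the LAST maxlen tokens
        seq = seq[max(len(seq) - maxlen, 0):]
        # padding='pre': add the padding value on the left
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

print(pad_pre([[1, 14, 22], [1, 5, 6, 7, 8, 9]], maxlen=4))
# [[0, 1, 14, 22], [6, 7, 8, 9]]
```

Because index 0 is reserved for padding, the embedding layer can still tell real tokens apart from filler, and long reviews keep their final sentences rather than their openings.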

In [0]:
def build_model():
    model = models.Sequential()
    # Input: Embedding layer.
    # The model takes as input an integer matrix of shape (batch, input_length).
    # The layer outputs a tensor of shape (batch, input_length, dim_embedding).
    # Every integer in the input must be smaller than n_words (the vocabulary size).
    model.add(layers.Embedding(n_words, dim_embedding, input_length=max_len))
    model.add(layers.Dropout(0.3))
    # For each of the dim_embedding features, take the maximum value over all
    # max_len positions, producing one vector of shape (batch, dim_embedding).
    model.add(layers.GlobalMaxPooling1D())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation='sigmoid'))
    return model
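The shapes described in the comments above can be checked with a small NumPy sketch of the same architecture at inference time (dropout does nothing at inference). The weights here are random placeholders, not the trained Keras weights and not Keras' own initializers; the sketch only illustrates the tensor shapes and the parameter count that `model.summary()` should report.

```python
import numpy as np

# Same hyperparameters as the notebook
max_len, n_words, dim_embedding = 200, 10000, 256
rng = np.random.default_rng(0)

# Random placeholder weights (illustration only, not trained values)
E = rng.normal(scale=0.05, size=(n_words, dim_embedding))      # Embedding table
W1, b1 = rng.normal(scale=0.05, size=(dim_embedding, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 1)), np.zeros(1)

def forward(x):
    """Inference-time forward pass mirroring build_model()."""
    h = E[x]                                  # Embedding -> (batch, max_len, dim_embedding)
    h = h.max(axis=1)                         # GlobalMaxPooling1D -> (batch, dim_embedding)
    h = np.maximum(h @ W1 + b1, 0.0)          # Dense(128, relu) -> (batch, 128)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # Dense(1, sigmoid) -> (batch, 1)

x = rng.integers(0, n_words, size=(2, max_len))  # two fake padded reviews
print(forward(x).shape)  # (2, 1)

# Trainable parameters: 10000*256 + (256*128 + 128) + (128*1 + 1)
n_params = E.size + W1.size + b1.size + W2.size + b2.size
print(n_params)          # 2593025
```

Almost all of the 2,593,025 parameters sit in the embedding table (2,560,000), so the benchmark mostly stresses embedding lookups and their gradient updates rather than the small dense head.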

In [0]:
# Detect the available hardware and return the appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:  # raised when no TPU is attached to the runtime
    tpu = None

gpus = tf.config.experimental.list_logical_devices("GPU")

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
elif len(gpus) > 1:  # multiple GPUs in one VM
    strategy = tf.distribute.MirroredStrategy()  # replicates across all visible GPUs
else:  # default strategy that works on CPU and on a single GPU
    strategy = tf.distribute.get_strategy()

print("REPLICAS: ", strategy.num_replicas_in_sync)
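Under a distribution strategy, the `batch_size` given to `fit()` is the global batch size, which is split across the replicas printed above (a Colab TPU typically reports 8). With `BATCH_SIZE = 20` each TPU core therefore sees only a handful of examples per step, which can leave the accelerator under-used. A common pattern, shown here as an assumption rather than what this notebook does, is to fix a per-replica batch and multiply it by `strategy.num_replicas_in_sync`. The helper `batching` is hypothetical and just does the arithmetic:

```python
import math

TRAIN_SIZE = 25000  # size of the keras.datasets.imdb training split

def batching(per_replica_batch, num_replicas, n_examples=TRAIN_SIZE):
    """Hypothetical helper: global batch size and optimizer steps per epoch."""
    global_batch = per_replica_batch * num_replicas
    steps = math.ceil(n_examples / global_batch)
    return global_batch, steps

print(batching(20, 1))  # (20, 1250)  CPU or single GPU
print(batching(20, 8))  # (160, 157)  8 TPU cores
```

In the notebook this would correspond to something like `BATCH_SIZE = 20 * strategy.num_replicas_in_sync`. Note that changing the global batch size also changes the number of optimizer steps per epoch, so accuracy results are no longer strictly comparable across hardware.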

In [0]:
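(X_train, y_train), (X_test, y_test) = load_data()

# Build and compile the model inside the strategy scope, so that its variables
# are created on (and replicated across) the detected TPU/GPU devices.
with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
model.summary()

# fit() returns a History object holding the per-epoch loss and accuracy.
history = model.fit(X_train, y_train,
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    validation_data=(X_test, y_test))

# evaluate() returns [loss, accuracy] in the order given to compile().
score = model.evaluate(X_test, y_test, batch_size=BATCH_SIZE)
print("\nTest loss:", score[0])
print("Test accuracy:", score[1])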
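Since the goal is to compare hardware, it helps to record how long training takes. A minimal sketch, assuming wall-clock time of the whole `fit()` call is an acceptable metric; the wrapper `timed` is a hypothetical helper built only on the standard library:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Self-contained check with a dummy workload standing in for model.fit(...)
total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.4f} s")
```

In the notebook it could wrap the training call, e.g. `history, train_seconds = timed(model.fit, X_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_data=(X_test, y_test))`. The first epoch usually also includes graph tracing and, on TPU, compilation and initialization overhead, so comparing per-epoch times after the first epoch typically gives a fairer picture.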
