This example notebook shows how to train an image classification model, as described [here](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb),
and store it as a TileDB array. First, let's import what we need.

In [12]:
import tensorflow as tf
import tiledb
import os
import json

from models.tensorflow_keras_models import TensorflowTileDB

Load the MNIST dataset from Keras datasets and scale the pixel values from [0, 255] to [0, 1].

In [14]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

We can then define a function that creates a basic digit classifier for the MNIST dataset.

In [17]:
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10)
    ])

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    model.compile(optimizer='adam',
                  loss=loss_fn,
                  metrics=['accuracy'])

    return model
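
The last Dense(10) layer has no activation, so the model outputs raw logits. That is why the loss is built with from_logits=True. The following is a rough stdlib-only sketch, not Keras' implementation, of the per-sample sparse categorical cross-entropy it computes: logsumexp(logits) minus the logit of the true class. Keras then averages this over the batch. The function name sparse_ce_from_logits is hypothetical.

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one sample with an integer label, from raw logits:
    loss = logsumexp(logits) - logits[label]  (hypothetical helper)."""
    m = max(logits)  # shift by the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Ten equal logits: the model is maximally unsure, so the loss is log(10)
print(sparse_ce_from_logits([0.0] * 10, 3))
# A confident, correct prediction gives a loss close to 0
print(sparse_ce_from_logits([10.0, 0.0, 0.0], 0))
```

This is also why an untrained 10-class classifier typically starts with a loss around log(10) ≈ 2.3.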

We can then train a model using some of our data. Let's assume that we initially train with the first 30000
observations from our dataset.

In [18]:
model = create_model()
model.fit(x_train[:30000], y_train[:30000], epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x1716654a8>

We can now save the trained model as a TileDB array. If we want to train the model further at a later time, we can also save
the optimizer's state in the TileDB array. If we will use the model only for inference, we don't need the optimizer's state
and can keep only the model's weights. We first declare a TileDB-Keras model object (with the corresponding URI) and then
save the model as a TileDB array.
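
To get a feel for what include_optimizer adds, here is a back-of-the-envelope count for the model built by create_model(). It assumes that Keras' Adam (with its default amsgrad=False) stores one first-moment and one second-moment slot per trainable scalar, plus a single iteration counter. It also ignores the configuration and metadata, so treat the numbers as an estimate rather than the exact on-disk size. stored_scalars is a hypothetical helper.

```python
# Trainable scalars in create_model(): Flatten and Dropout have no weights,
# and each Dense layer has (inputs * units) kernel entries plus `units` biases.
n_weights = (28 * 28 * 128 + 128) + (128 * 10 + 10)

# Assumption: Adam keeps two slot variables ("m" and "v") per trainable
# scalar, plus one iteration counter.
n_optimizer = 2 * n_weights + 1

def stored_scalars(include_optimizer):
    """Approximate number of scalars persisted (hypothetical helper)."""
    return n_weights + (n_optimizer if include_optimizer else 0)

print(stored_scalars(False))  # 101770
print(stored_scalars(True))   # 305311
```

Under these assumptions, saving the optimizer roughly triples the amount of numeric data stored.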

In [19]:
tiledb_model_1 = TensorflowTileDB(uri='tiledb-keras-mnist-sequential-1')

tiledb_model_1.save(model=model,
                    include_optimizer=True,
                    update=False)

The above step creates a TileDB array in your working directory. For information about how a dense TileDB array is laid
out as files on disk, please take a look [here](https://docs.tiledb.com/main/basic-concepts/data-format).
Let's open our TileDB model array and check its metadata. Metadata values of type list, dict or tuple are JSON-serialized
while saving, so we need json.loads to deserialize them.
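
The serialization rule described above can be sketched with the standard json module. encode_meta_value and decode_meta_value are hypothetical helpers, not part of TensorflowTileDB. The sketch assumes serialized values come back from the array as bytes, which is what the isinstance(value, bytes) check in the next cell expects.

```python
import json

def encode_meta_value(value):
    """Hypothetical encoder: JSON-serialize list/dict/tuple values (as UTF-8
    bytes) and leave everything else (str, int, float) unchanged."""
    if isinstance(value, (list, dict, tuple)):
        return json.dumps(value).encode("utf-8")
    return value

def decode_meta_value(value):
    """Same check as the notebook cell: only bytes values are JSON-decoded."""
    if isinstance(value, bytes):
        return json.loads(value)  # json.loads accepts bytes (Python 3.6+)
    return value

meta = {"backend": "tensorflow", "loss": [0.38, 0.19], "shape": (28, 28)}
stored = {k: encode_meta_value(v) for k, v in meta.items()}
restored = {k: decode_meta_value(v) for k, v in stored.items()}
print(restored)
# {'backend': 'tensorflow', 'loss': [0.38, 0.19], 'shape': [28, 28]}
```

A tuple does not survive the round trip. JSON has no tuple type, so the value is read back as a list.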

In [20]:
# Open the array in read mode (the default) to inspect its metadata
with tiledb.open('tiledb-keras-mnist-sequential-1') as model_array_1:
    for key, value in model_array_1.meta.items():
        # JSON-serialized values come back as bytes, so decode them
        if isinstance(value, bytes):
            value = json.loads(value)
        print("Key: {}, Value: {}".format(key, value))

Key: backend, Value: tensorflow
Key: keras_version, Value: 2.4.0
Key: model_config, Value: {'class_name': 'Sequential', 'config': {'name': 'sequential_3', 'layers': [{'class_name': 'InputLayer', 'config': {'batch_input_shape': [None, 28, 28], 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'flatten_3_input'}}, {'class_name': 'Flatten', 'config': {'name': 'flatten_3', 'trainable': True, 'batch_input_shape': [None, 28, 28], 'dtype': 'float32', 'data_format': 'channels_last'}}, {'class_name': 'Dense', 'config': {'name': 'dense_7', 'trainable': True, 'dtype': 'float32', 'units': 128, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_3', 'trainable': True, 'dty

As we can see, by default the array's metadata contains information about the backend used for training, the Keras
version, the Python version, the model configuration and the training configuration. We can load and inspect any of these
without loading the entire model into memory. Moreover, we can add any extra information, such as model accuracy, model
version or deployment status, to the model's metadata, either while saving the model, by passing a dictionary, or later,
by opening the TileDB array and adding new keys. Both cases are shown below.
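
The first approach, in the next cell, writes both the full per-epoch curves and the last epoch's values. A small hypothetical helper, history_to_meta, could build those entries from the dict stored in History.history. It is only a sketch and assumes the model was compiled with metrics=['accuracy'], so that both the 'loss' and 'accuracy' keys exist.

```python
import json

def history_to_meta(history):
    """Hypothetical helper: build JSON-serialized metadata entries from a
    Keras History.history dict (full per-epoch curves plus last-epoch values)."""
    meta = {}
    for name in ("loss", "accuracy"):
        meta[name] = json.dumps(history[name])
        meta["last_epoch_" + name] = json.dumps(history[name][-1])
    return meta

# Usage with TileDB (not executed here):
#   with tiledb.Array('tiledb-keras-mnist-sequential-1', "w") as A:
#       for key, value in history_to_meta(model.history.history).items():
#           A.meta[key] = value

# Toy history with two epochs
example = history_to_meta({"loss": [0.5, 0.25], "accuracy": [0.8, 0.9]})
print(example["last_epoch_loss"])  # 0.25
```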

In [21]:
# Open the array in write mode
with tiledb.Array('tiledb-keras-mnist-sequential-1', "w") as A:
    # Keep all history
    A.meta['loss'] = json.dumps(model.history.history['loss'])
    A.meta['accuracy'] = json.dumps(model.history.history['accuracy'])

    # Or keep last epoch's loss and accuracy
    A.meta['last_epoch_loss'] = json.dumps(model.history.history['loss'][-1])
    A.meta['last_epoch_accuracy'] = json.dumps(model.history.history['accuracy'][-1])

# Check that everything is there (read mode is the default)
with tiledb.open('tiledb-keras-mnist-sequential-1') as model_array_1:
    for key, value in model_array_1.meta.items():
        if isinstance(value, bytes):
            value = json.loads(value)
        print("Key: {}, Value: {}".format(key, value))

Key: accuracy, Value: [0.890333354473114, 0.9450333118438721, 0.9590333104133606, 0.9666000008583069, 0.9712666869163513]
Key: backend, Value: tensorflow
Key: keras_version, Value: 2.4.0
Key: last_epoch_accuracy, Value: 0.9712666869163513
Key: last_epoch_loss, Value: 0.09391548484563828
Key: loss, Value: [0.3824021816253662, 0.18665964901447296, 0.13953274488449097, 0.1090812087059021, 0.09391548484563828]
Key: model_config, Value: {'class_name': 'Sequential', 'config': {'name': 'sequential_3', 'layers': [{'class_name': 'InputLayer', 'config': {'batch_input_shape': [None, 28, 28], 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'flatten_3_input'}}, {'class_name': 'Flatten', 'config': {'name': 'flatten_3', 'trainable': True, 'batch_input_shape': [None, 28, 28], 'dtype': 'float32', 'data_format': 'channels_last'}}, {'class_name': 'Dense', 'config': {'name': 'dense_7', 'trainable': True, 'dtype': 'float32', 'units': 128, 'activation': 'relu', 'use_bias': True, 'kernel_initia

We can also pass any metadata while saving the model as a TileDB array, which avoids reopening the array afterwards.

In [22]:
model = create_model()
model.fit(x_train[:30000], y_train[:30000], epochs=5)

tiledb_model_2 = TensorflowTileDB(uri='tiledb-keras-mnist-sequential-2')

tiledb_model_2.save(model=model,
                    include_optimizer=True,
                    update=False,
                    meta={"accuracy": model.history.history['accuracy'],
                          "loss": model.history.history['loss'],
                          "version": '0.0.1',
                          "status": 'experimental'})

# Check that everything is there (read mode is the default)
with tiledb.open('tiledb-keras-mnist-sequential-2') as model_array_2:
    for key, value in model_array_2.meta.items():
        if isinstance(value, bytes):
            value = json.loads(value)
        print("Key: {}, Value: {}".format(key, value))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Key: accuracy, Value: [0.8869333267211914, 0.9460999965667725, 0.9593333601951599, 0.9661666750907898, 0.9733333587646484]
Key: backend, Value: tensorflow
Key: keras_version, Value: 2.4.0
Key: loss, Value: [0.3882191777229309, 0.19028112292289734, 0.1363549530506134, 0.1107335016131401, 0.09010602533817291]
Key: model_config, Value: {'class_name': 'Sequential', 'config': {'name': 'sequential_4', 'layers': [{'class_name': 'InputLayer', 'config': {'batch_input_shape': [None, 28, 28], 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'flatten_4_input'}}, {'class_name': 'Flatten', 'config': {'name': 'flatten_4', 'trainable': True, 'batch_input_shape': [None, 28, 28], 'dtype': 'float32', 'data_format': 'channels_last'}}, {'class_name': 'Dense', 'config': {'name': 'dense_9', 'trainable': True, 'dtype': 'float32', 'units': 128, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {

Moving on, we can load the trained models for prediction or evaluation, as usual with TensorFlow Keras models.
Evaluation requires a compiled model, so in that case we load with compile_model=True.
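
The printed predictions are logits rather than probabilities, because the final Dense(10) layer has no softmax. Below is a stdlib-only sketch of turning logits into predicted digits and an accuracy score. The logits are made up for illustration (3 classes instead of 10), and softmax, argmax and accuracy are hypothetical helper names.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def accuracy(logit_rows, labels):
    # Softmax is monotonic, so the argmax of the logits is the predicted class
    preds = [argmax(row) for row in logit_rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical 3-class logits for two samples (not taken from the output above)
logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]
print([argmax(r) for r in logits])  # [1, 0]
print(accuracy(logits, [1, 2]))     # 0.5
```

With the real NumPy output, loaded_model_1.predict(x_test).argmax(axis=1) gives the predicted classes directly.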

In [23]:
loaded_model_1 = tiledb_model_1.load()
loaded_model_2 = tiledb_model_2.load()

# Make some predictions
print(loaded_model_1.predict(x_test))
print(loaded_model_2.predict(x_test))

# Evaluate models
loaded_model_1 = tiledb_model_1.load(compile_model=True)
loaded_model_2 = tiledb_model_2.load(compile_model=True)
loaded_model_1.evaluate(x_test, y_test)
loaded_model_2.evaluate(x_test, y_test)

[[ -3.1279027   -6.568434    -0.4252091  ...  11.660195    -3.3526886
   -2.766497  ]
 [ -3.7407515    2.6952996   11.920709   ... -14.252106     0.44303626
  -12.41744   ]
 [ -4.9525685    6.792746     0.26711312 ...  -0.45758492  -0.30579057
   -4.1699696 ]
 ...
 [ -6.6460347  -10.294316    -9.749106   ...   1.1787943    1.3536193
    2.4386663 ]
 [ -1.0649565   -5.223176    -9.355335   ...  -1.4681871    1.7254432
   -1.9841626 ]
 [ -1.7826611   -9.365487     0.14322725 ...  -7.296979    -5.360878
   -7.2144213 ]]
[[ -4.6274433   -9.597495    -0.10862601 ...   9.1714      -2.376289
   -4.1253734 ]
 [ -5.3148026    2.5041955   11.574888   ... -10.034176    -1.2081295
   -9.236549  ]
 [ -6.4843216    6.324581     0.0203052  ...   0.02500642  -1.5317293
   -4.378875  ]
 ...
 [ -9.435321   -11.73144     -8.514579   ...  -1.1290907   -1.9690759
    2.2376351 ]
 [ -5.9645257   -5.699082    -8.42025    ...  -2.9483845    2.9252856
   -6.2881393 ]
 [ -0.82890916  -7.8822355   -0.46922368 ..

[0.09490542113780975, 0.9710000157356262]

A particularly useful property of saving models as TileDB arrays is native versioning based on fragments, as described
[here](https://docs.tiledb.com/main/basic-concepts/data-format#immutable-fragments). We can load a model, retrain it
with new data and update the existing TileDB model array with the new model and metadata. All information, old
and new, remains stored and accessible. This is extremely useful when trying many different architectures for the same
problem, since you can keep track of all your experiments without having to store separate model instances. In our case,
let's continue training model_1 on the rest of our dataset for 5 more epochs. After the model is saved again, you will
notice the extra directories and files (fragments) added to the tiledb-keras-mnist-sequential-1 array directory,
which keep all versions of the model.
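
The toy model below illustrates the idea behind fragment-based versioning. Each write adds a new immutable fragment with a timestamp, and a read "as of" some timestamp sees the newest fragment written at or before it. This is a conceptual sketch only, assuming each update rewrites the whole model so that the newest fragment wins. It is not how TileDB stores or merges fragments. In TileDB-Py itself, an older version of an array can be read by passing a timestamp to tiledb.open. Whether TensorflowTileDB.load exposes this is not shown here.

```python
import bisect

class FragmentLog:
    """Toy model of immutable fragments: every write appends a new
    (timestamp, value) pair and nothing is overwritten in place."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def write(self, timestamp, value):
        # Assumes writes arrive with increasing timestamps
        self._timestamps.append(timestamp)
        self._values.append(value)

    def read(self, timestamp=None):
        # Without a timestamp, return the latest version
        if timestamp is None:
            return self._values[-1]
        # Otherwise return the newest fragment written at or before `timestamp`
        i = bisect.bisect_right(self._timestamps, timestamp)
        if i == 0:
            raise KeyError("no fragment at or before timestamp %r" % timestamp)
        return self._values[i - 1]

log = FragmentLog()
log.write(100, "model v1 weights")  # initial save (update=False)
log.write(200, "model v2 weights")  # retrained, saved with update=True
print(log.read())     # model v2 weights
print(log.read(150))  # model v1 weights
```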

In [24]:
loaded_model_1 = tiledb_model_1.load(compile_model=True)
loaded_model_1.fit(x_train[30000:], y_train[30000:], epochs=5)

# Update the existing array, recording the history of this training run
# (`model` now refers to a different model, so use loaded_model_1's history)
tiledb_model_1.save(model=loaded_model_1,
                    include_optimizer=True,
                    update=True,
                    meta={"accuracy": loaded_model_1.history.history['accuracy'],
                          "loss": loaded_model_1.history.history['loss'],
                          "version": '0.0.1',
                          "status": 'experimental'})

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Finally, a TileDB feature that is particularly useful for machine learning models is groups, described
[here](https://docs.tiledb.com/main/basic-concepts/data-format#groups) and [here](https://docs.tiledb.com/main/solutions/tiledb-embedded/api-usage/object-management#creating-tiledb-groups).
Suppose we want to solve the MNIST problem and try many different architectures. We can save each architecture as a
separate TileDB array, natively versioned each time it is retrained, and then organise all models that solve the same
problem (MNIST) in a TileDB group. Let's first define a new model architecture.

In [25]:
def create_deeper_model():
    # For the sake of simplicity we just add an extra dense layer to the previous architecture.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10)
    ])

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    model.compile(optimizer='adam',
                  loss=loss_fn,
                  metrics=['accuracy'])

    return model
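
To quantify how much bigger the deeper architecture is, here is a quick count of trainable parameters for both models. count_params is a hypothetical helper that assumes the Flatten, Dense, ..., Dense(10) layout used in both functions, where Flatten and Dropout add no weights. Calling model.summary() on the actual Keras models should report the same totals.

```python
def dense_params(n_inputs, units):
    # Kernel (n_inputs x units) plus one bias per unit
    return n_inputs * units + units

def count_params(input_size, hidden_units, n_classes=10):
    """Trainable parameters of Flatten -> Dense(...) -> Dropout -> Dense(n_classes)
    (hypothetical helper; Flatten and Dropout contribute no weights)."""
    total, n_in = 0, input_size
    for units in list(hidden_units) + [n_classes]:
        total += dense_params(n_in, units)
        n_in = units
    return total

print(count_params(28 * 28, [128]))       # create_model():        101770
print(count_params(28 * 28, [256, 128]))  # create_deeper_model(): 235146
```

The extra 256-unit layer more than doubles the parameter count, which is one reason to keep each architecture's versions in its own array.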

Then train it and save it as a new TileDB array.

In [26]:
model = create_deeper_model()
model.fit(x_train, y_train, epochs=5)

tiledb_deeper_model = TensorflowTileDB(uri='tiledb-keras-mnist-sequential-deeper')

tiledb_deeper_model.save(model=model,
                         include_optimizer=True,
                         update=False,
                         meta={"accuracy": model.history.history['accuracy'],
                               "loss": model.history.history['loss'],
                               "version": '0.0.1',
                               "status": 'experimental'})

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Now we can create a TileDB group and organise (even in hierarchies, e.g., sophisticated vs less sophisticated) all our
MNIST models as follows.
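
For the hierarchical case, the nested groups have to be created parent-first before any array is moved into them. The sketch below only plans the URIs with the standard library. plan_group_layout and the subgroup names 'simple' and 'deep' are hypothetical. Each planned group could then be passed to tiledb.group_create and each move to tiledb.move, keeping in mind that tiledb.group_create raises an error if the group already exists.

```python
import posixpath

def plan_group_layout(root, assignments):
    """Hypothetical helper: given {array_name: subgroup_path}, return the group
    URIs to create (parents before children) and the (src, dst) moves."""
    groups = [root]
    moves = []
    for array_name, subgroup in assignments.items():
        current = root
        # Walk down the subgroup path, registering every intermediate group once
        for part in subgroup.split("/"):
            current = posixpath.join(current, part)
            if current not in groups:
                groups.append(current)
        moves.append((array_name, posixpath.join(current, array_name)))
    return groups, moves

groups, moves = plan_group_layout("MNIST_Group", {
    "tiledb-keras-mnist-sequential-1": "simple",
    "tiledb-keras-mnist-sequential-2": "simple",
    "tiledb-keras-mnist-sequential-deeper": "deep",
})
print(groups)  # ['MNIST_Group', 'MNIST_Group/simple', 'MNIST_Group/deep']
```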

In [29]:
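# Create the group only if it doesn't exist yet, so the cell can be re-run
if tiledb.object_type('MNIST_Group') != 'group':
    tiledb.group_create('MNIST_Group')
# Move each model array into the group directory
for uri in ['tiledb-keras-mnist-sequential-1', 'tiledb-keras-mnist-sequential-2',
            'tiledb-keras-mnist-sequential-deeper']:
    tiledb.move(uri, os.path.join('MNIST_Group', uri))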


At any time, we can then check which models exist for a specific problem.

In [None]:
tiledb.ls('MNIST_Group', lambda obj_path, obj_type: print(obj_path, obj_type))
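
Instead of printing, the callback passed to tiledb.ls can collect results for later use, for example to list only the model arrays in the group. make_ls_collector is a hypothetical helper. It assumes tiledb.ls passes the object type as a string such as 'array' or 'group'. The demo calls the callback by hand with made-up paths so that the sketch runs without TileDB.

```python
def make_ls_collector(wanted_type=None):
    """Return (results, callback). The callback has the (obj_path, obj_type)
    signature used with tiledb.ls and records paths instead of printing them."""
    results = []

    def callback(obj_path, obj_type):
        # Keep everything, or only objects of the requested type
        if wanted_type is None or obj_type == wanted_type:
            results.append(obj_path)

    return results, callback

# Usage with TileDB (not executed here):
#   arrays, cb = make_ls_collector("array")
#   tiledb.ls('MNIST_Group', cb)

arrays, cb = make_ls_collector("array")
cb("file:///tmp/MNIST_Group/tiledb-keras-mnist-sequential-1", "array")
cb("file:///tmp/MNIST_Group/simple", "group")
print(arrays)
```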