In this notebook, we show how to train a model with Scikit-learn and save it as a TileDB array on TileDB-Cloud.
First, let's import the modules we need.

In [1]:
import os

import numpy as np
import tiledb.cloud
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

from tiledb.ml.models.sklearn import SklearnTileDBModel

Next, we create a TileDB-Cloud context, log in to TileDB-Cloud with our API token (a username/password login also works), and retrieve our username, which we use as our namespace.

In [2]:
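# Context used for the TileDB operations in this notebook
ctx = tiledb.cloud.Ctx()
# Log in with the API token stored in the TILEDB_API_TOKEN environment variable
tiledb.cloud.login(token=os.getenv('TILEDB_API_TOKEN'))
# Our username doubles as the namespace under which the model is registered
namespace = tiledb.cloud.client.default_user().username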
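`os.getenv` returns `None` when `TILEDB_API_TOKEN` is unset, so the login call above would silently receive `token=None` instead of failing with a clear message. A minimal sketch of an explicit check, assuming the token is kept in that environment variable; the helper name `require_env` is hypothetical and not part of `tiledb.cloud`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, raising if it is unset or empty.

    Hypothetical helper for illustration; not part of tiledb.cloud.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Assumed usage with the login call from the cell above:
# tiledb.cloud.login(token=require_env("TILEDB_API_TOKEN"))
```

Failing early here makes a missing token show up as a readable error before any TileDB-Cloud request is made.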

Next, we train a Scikit-learn logistic regression model on random data.

In [3]:
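# Random data: 784 features per sample and integer labels 0-8 (9 classes)
X_train = np.random.random((1000, 784))
y_train = np.random.randint(9, size=1000)

X_test = np.random.random((500, 784))
y_test = np.random.randint(9, size=500)

# Standardize features using statistics computed on the training set only
scaler = preprocessing.StandardScaler().fit(X_train)

scaled_X_train = scaler.transform(X_train)
scaled_X_test = scaler.transform(X_test)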

print("Model fit...")
# LogisticRegression uses an L2 penalty by default
model = LogisticRegression(random_state=0).fit(scaled_X_train, y_train)

print("Model score...")
# Percentage of coefficients that are exactly zero
sparsity = np.mean(model.coef_ == 0) * 100
score = model.score(scaled_X_test, y_test)

print("Sparsity with L2 penalty: %.2f%%" % sparsity)
print("Test score with L2 penalty: %.4f" % score)

Model fit...
Model score...
Sparsity with L2 penalty: 0.00%
Test score with L2 penalty: 0.1260
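The printed sparsity is 0.00% because `LogisticRegression` applies an L2 penalty by default (`penalty='l2'`), which shrinks coefficients without driving them exactly to zero; an L1 penalty (for example `penalty='l1'` with `solver='saga'`) is what tends to produce exact zeros. The NumPy sketch below illustrates the difference using the proximal operators of the two penalties on a fixed, made-up weight vector. It is a toy illustration of why L1 yields sparsity, not what scikit-learn runs internally.

```python
import numpy as np


def prox_l1(w: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||w||_1 (soft-thresholding): small weights become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def prox_l2(w: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of (lam / 2) * ||w||_2^2: every weight is scaled down, none reach 0."""
    return w / (1.0 + lam)


def sparsity_percent(coef: np.ndarray) -> float:
    """Same metric as in the cell above: percentage of coefficients that are exactly zero."""
    return float(np.mean(coef == 0) * 100)


# Made-up weights and penalty strength, chosen only for illustration
w = np.array([0.05, -0.2, 0.8, -0.01, 1.5])
lam = 0.1

print("L1:", prox_l1(w, lam), "sparsity %.2f%%" % sparsity_percent(prox_l1(w, lam)))
print("L2:", prox_l2(w, lam), "sparsity %.2f%%" % sparsity_percent(prox_l2(w, lam)))
```

Soft-thresholding zeroes the two weights whose magnitude is below `lam` (40% sparsity), while the L2 step only rescales all five weights, so its sparsity stays at 0%.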


Next, we wrap the trained model in a `SklearnTileDBModel` and use its `save` method to store it directly in our S3 bucket (configured via the AWS credentials in our TileDB-Cloud account) and register it on TileDB-Cloud.

In [4]:
print('Defining SklearnTileDBModel model...')
# To save the model on S3 and register it on TileDB-Cloud, we pass our namespace and TileDB context.
tiledb_model = SklearnTileDBModel(uri='tiledb-sklearn-model', namespace=namespace, ctx=ctx, model=model)

# We need the URI created by the model class, which follows the pattern
# tiledb://my_username/s3://my_bucket/my_array,
# to interact with our model on TileDB-Cloud.
tiledb_cloud_model_uri = tiledb_model.uri

print('Saving model on S3 and registering on TileDB-Cloud...')
# Store the evaluation results alongside the model as metadata
tiledb_model.save(meta={"sparsity": sparsity, "score": score})


Defining SklearnTileDBModel model...
Saving model on S3 and registering on TileDB-Cloud...
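The model URI combines the namespace and the physical storage location, following the `tiledb://my_username/s3://my_bucket/my_array` pattern given in the comment of the previous cell. A minimal sketch of splitting such a URI into its parts, assuming that pattern holds; `split_tiledb_uri` and `TileDBCloudURI` are hypothetical names, not part of the TileDB API:

```python
import posixpath
from typing import NamedTuple


class TileDBCloudURI(NamedTuple):
    namespace: str
    storage_uri: str
    array_name: str


def split_tiledb_uri(uri: str) -> TileDBCloudURI:
    """Split 'tiledb://<namespace>/<storage_uri>' into namespace, storage URI and array name.

    Hypothetical helper; assumes the URI pattern described in the notebook comment.
    """
    prefix = "tiledb://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a tiledb:// URI: {uri!r}")
    # Everything up to the first '/' is the namespace; the rest is the storage URI
    namespace, sep, storage_uri = uri[len(prefix):].partition("/")
    if not sep or not namespace or not storage_uri:
        raise ValueError(f"expected tiledb://<namespace>/<uri>, got {uri!r}")
    # The array name is the last path component of the storage URI
    array_name = posixpath.basename(storage_uri.rstrip("/"))
    return TileDBCloudURI(namespace, storage_uri, array_name)


print(split_tiledb_uri("tiledb://my_username/s3://my_bucket/my_array"))
```

The `array_name` field corresponds to what the next cell obtains with `os.path.basename(tiledb_cloud_model_uri)` when loading the model.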


Finally, we can use the TileDB-Cloud API, as described in our [cloud documentation](https://docs.tiledb.com/cloud/), to list our models, get information about them, load them back for inference, and deregister them.

In [5]:
# List all our models. All machine learning model TileDB arrays are of type 'ml_model'
print(tiledb.cloud.client.list_arrays(file_type=['ml_model'], namespace=namespace))

# Get the model's info
print(tiledb.cloud.array.info(tiledb_cloud_model_uri))

# Load our model for inference; the array name is the last component of the URI
loaded_tiledb_model = SklearnTileDBModel(
    uri=os.path.basename(tiledb_cloud_model_uri), namespace=namespace, ctx=ctx
).load()

# The loaded model should reproduce the original test score exactly
assert score == loaded_tiledb_model.score(scaled_X_test, y_test)

{'arrays': [{'access_credentials_name': 'gsk',
             'allowed_actions': ['read_array_logs',
                                 'read_array_info',
                                 'read_array_schema',
                                 'write',
                                 'edit',
                                 'read'],
             'description': None,
             'file_properties': None,
             'file_type': 'ml_model',
             'id': '3c6ecc49-56b4-4e7b-83a5-2de20a6e260b',
             'is_favorite': None,
             'last_accessed': datetime.datetime(2022, 8, 10, 18, 21, 39, tzinfo=tzutc()),
             'license_id': None,
             'license_text': None,
             'logo': None,
             'name': 'tiledb-sklearn-model',
             'namespace': 'george.sakkis',
             'namespace_subscribed': None,
             'pricing': None,
             'public_share': False,
             'read_only': False,
             'share_count': None,
             'size
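The listing printed above is long, and usually only a few fields matter. Assuming the response has been converted to a plain dictionary with the shape printed above (an `'arrays'` list whose entries carry `'name'`, `'namespace'` and `'file_type'` keys), a hypothetical helper `model_names` can reduce it to `namespace/name` strings. The sample dictionary keeps only those three fields from the printed output.

```python
from typing import Any, Dict, List


def model_names(listing: Dict[str, Any], file_type: str = "ml_model") -> List[str]:
    """Return 'namespace/name' for every array of the given file_type in a listing dict.

    Hypothetical helper; assumes the dictionary shape shown in the printed output.
    """
    return [
        f"{entry['namespace']}/{entry['name']}"
        for entry in listing.get("arrays") or []
        if entry.get("file_type") == file_type
    ]


# Trimmed to the fields the helper reads, copied from the printed output
sample = {
    "arrays": [
        {"name": "tiledb-sklearn-model", "namespace": "george.sakkis", "file_type": "ml_model"},
    ]
}
print(model_names(sample))  # ['george.sakkis/tiledb-sklearn-model']
```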

In [6]:
# Deregister the model and physically delete the array from the S3 bucket
tiledb.cloud.array.delete_array(tiledb_cloud_model_uri)