In this notebook, we train a model with scikit-learn and save it as a TileDB array on TileDB-Cloud.
First, let's import what we need.

In [None]:
import numpy as np
import tiledb
import tiledb.cloud  # needed for tiledb.cloud.Ctx, login and the array API used below
import os

from sklearn.linear_model import LogisticRegression
from tiledb.ml.models.sklearn import SklearnTileDB

Next, we load our TileDB-Cloud credentials from environment variables, which have to be exported beforehand. TileDB-Cloud also accepts a token instead of a username and password.
You also have to set up your AWS credentials in your TileDB-Cloud account.

In [None]:
# This is also our namespace on TileDB-Cloud.
TILEDB_USER_NAME = os.environ.get('TILEDB_USER_NAME')
TILEDB_PASSWD = os.environ.get('TILEDB_PASSWD')

We then create a TileDB-Cloud context and set up our communication with TileDB-Cloud.

In [None]:
ctx = tiledb.cloud.Ctx()
tiledb.cloud.login(username=TILEDB_USER_NAME, password=TILEDB_PASSWD)
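
As mentioned earlier, a token can be used instead of a username and password. The sketch below shows one way to choose between the two and to fail early when nothing is set. The helper name `login_kwargs` is hypothetical, and `TILEDB_REST_TOKEN` is an assumed name for the environment variable holding the token; the other two variable names are the ones used in this notebook.

```python
import os


def login_kwargs(env=None):
    """Build keyword arguments for a login call from environment variables.

    TILEDB_REST_TOKEN is an assumed variable name for the API token;
    TILEDB_USER_NAME and TILEDB_PASSWD are the names used in this notebook.
    """
    env = os.environ if env is None else env
    token = env.get("TILEDB_REST_TOKEN")
    if token:
        # Prefer the token when one is available.
        return {"token": token}
    username = env.get("TILEDB_USER_NAME")
    password = env.get("TILEDB_PASSWD")
    if not username or not password:
        # Fail here rather than passing None values on to the login call.
        raise RuntimeError(
            "Set TILEDB_REST_TOKEN, or both TILEDB_USER_NAME and TILEDB_PASSWD."
        )
    return {"username": username, "password": password}


# Example with an explicit mapping instead of the real environment.
print(login_kwargs({"TILEDB_USER_NAME": "my_username", "TILEDB_PASSWD": "secret"}))
```

The returned dictionary could then be passed on as `tiledb.cloud.login(**login_kwargs())`. Note that the namespace used later in this notebook still comes from `TILEDB_USER_NAME`, so that variable is needed even when logging in with a token.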

Next, we train a scikit-learn model on some random data.

In [None]:
X_train = np.random.random((1000, 784))
y_train = np.random.randint(9, size=1000)

X_test = np.random.random((500, 784))
y_test = np.random.randint(9, size=500)

print("Model fit...")
clf = LogisticRegression(random_state=0).fit(X_train, y_train)

print("Model score...")
sparsity = np.mean(clf.coef_ == 0) * 100
score = clf.score(X_test, y_test)

# LogisticRegression uses an L2 penalty by default, so no L1 penalty is applied here.
print("Sparsity: %.2f%%" % sparsity)
print("Test score: %.4f" % score)
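
The sparsity value above is the percentage of coefficients that are exactly zero. A minimal sketch of that computation on a small, made-up coefficient matrix (standing in for `clf.coef_`) may make it easier to follow:

```python
import numpy as np

# A made-up 2x4 coefficient matrix standing in for clf.coef_.
coef = np.array([[0.0, 1.5, 0.0, -2.0],
                 [0.0, 0.0, 0.3, 0.0]])

# (coef == 0) is a boolean array; its mean is the fraction of exact zeros.
sparsity = np.mean(coef == 0) * 100
print("Sparsity: %.2f%%" % sparsity)  # prints "Sparsity: 62.50%"
```

Exact zeros mostly appear with L1 regularization. With the default settings of `LogisticRegression`, which apply an L2 penalty, the reported sparsity is usually close to zero.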

Next, we define a TileDB Sklearn model and use its save method to store the trained model directly in
our S3 bucket (configured through the AWS credentials in your TileDB-Cloud account) and register it on TileDB-Cloud.

In [None]:
# Define array model uri.
uri = "tiledb-sklearn-model"

print('Defining SklearnTileDB model...')
# In order to save our model on S3 and register it on TileDB-Cloud we have to pass our Namespace and TileDB Context.
tiledb_model = SklearnTileDB(uri=uri, namespace=TILEDB_USER_NAME, ctx=ctx)

print(tiledb_model.uri)

# We will need the uri that was created from our model class
# (and follows pattern tiledb://my_username/s3://my_bucket/my_array),
# in order to interact with our model on TileDB-Cloud.
tiledb_cloud_model_uri = tiledb_model.uri

print('Saving model on TileDB-Cloud')
tiledb_model.save(
    model=clf, meta={"sparsity": sparsity, "score": score}
)
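
The URI pattern mentioned in the comments above, `tiledb://my_username/s3://my_bucket/my_array`, combines a namespace with a storage location. The sketch below only illustrates that pattern with plain string handling. The helpers `build_cloud_uri` and `split_cloud_uri` are hypothetical and are not part of the TileDB API, and this is not necessarily how `SklearnTileDB` builds its URI internally.

```python
def build_cloud_uri(namespace, storage_uri):
    """Join a namespace and a storage URI into tiledb://<namespace>/<storage_uri>."""
    return "tiledb://{}/{}".format(namespace, storage_uri)


def split_cloud_uri(cloud_uri):
    """Split tiledb://<namespace>/<storage_uri> back into its two parts."""
    prefix = "tiledb://"
    if not cloud_uri.startswith(prefix):
        raise ValueError("Not a tiledb:// URI: %r" % cloud_uri)
    # Everything up to the first "/" is the namespace; the rest is the storage URI.
    namespace, _, storage_uri = cloud_uri[len(prefix):].partition("/")
    return namespace, storage_uri


example_uri = build_cloud_uri("my_username", "s3://my_bucket/my_array")
print(example_uri)
print(split_cloud_uri(example_uri))
```

Splitting the URI this way can be handy when you want to check which namespace and bucket a registered model belongs to.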


Finally, we can use the TileDB-Cloud API, as described in our [cloud documentation](https://docs.tiledb.com/cloud/), to
list our models, get information about them, load them for inference and deregister them.

In [None]:
# List all our models. Here, we filter with file_type=['ml_model'], because all
# machine learning model TileDB arrays are of type 'ml_model'.
print(
    tiledb.cloud.client.list_arrays(
        file_type=['ml_model'],
        namespace=TILEDB_USER_NAME,
    )
)

# Get model's info
print(tiledb.cloud.array.info(tiledb_cloud_model_uri))

# Load our model for inference
loaded_tiledb_model = SklearnTileDB(uri=tiledb_cloud_model_uri, ctx=ctx).load()

# The loaded model should reproduce the test score of the original model, so this prints True.
print(score == loaded_tiledb_model.score(X_test, y_test))

# Deregister model
tiledb.cloud.deregister_array(tiledb_cloud_model_uri)
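
If one of the calls above raises an exception, the final deregistration never runs and the model stays registered. One way to guard against that is a small context manager that always runs the cleanup step. This is a minimal sketch; the helper `registered_model` is hypothetical, and the example uses a stand-in callable instead of a real TileDB-Cloud call.

```python
from contextlib import contextmanager


@contextmanager
def registered_model(uri, deregister):
    """Yield uri, then call deregister(uri) even if the body raises."""
    try:
        yield uri
    finally:
        deregister(uri)


# Stand-in for a real deregistration call; it only records the URI.
deregistered = []

example_model_uri = "tiledb://my_username/s3://my_bucket/my_array"
with registered_model(example_model_uri, deregistered.append) as model_uri:
    print("working with", model_uri)

print(deregistered)
```

In this notebook, `tiledb.cloud.deregister_array` could be passed as the `deregister` argument, with the listing, info and loading steps placed inside the `with` block.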