# Databricks Integration Using the aisquared DatabricksClient

This notebook shows how to set up the `aisquared` package, connect it to a Databricks workspace, and create a job that trains a model. We then deploy that model as a REST endpoint in the same workspace.

## Installation

First, let's install all the required packages.

In [None]:
# Install the aisquared package and scikit-learn

! pip install --upgrade 'aisquared[full]' scikit-learn

In [None]:
from sklearn.datasets import load_iris
import aisquared
import requests

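# Workspace directory that holds the uploaded training script
USER_DIRECTORY = '/Users/jacob.renn@squared.ai'
# Name of the model to serve (also used as the serving endpoint name below)
MODEL_NAME = 'IRIS_DECISION_TREE'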

## Authenticate with Databricks

Next, we create an `aisquared` `DatabricksClient` and connect it to our workspace.

In [None]:
# Create the client
client = aisquared.platform.DatabricksClient()

# Authenticate with the Databricks workspace
# (left commented out here; uncomment it if the client has not yet been authenticated)
#client.login()

## Upload the Training Script to the Workspace

Now we upload the training script, `train_iris.py` (located in this directory), to the connected Databricks workspace so that a job can run it.

In [None]:
client.upload_to_workspace('train_iris.py')

## Create and Run a Job with the Uploaded Script

Now that the script is uploaded into the workspace, we will create a job to allow us to run the script.

In [None]:
# Create the job
client.create_job(
    job_name = 'train_iris_job_aisquared_client',
    tasks = [{'train_iris': f'{USER_DIRECTORY}/train_iris.py'}],
    libraries = ['mlflow', 'scikit-learn'],
    compute_name = 'train_iris_aisquared_client_job_compute',
    spark_version = '13.3.x-scala2.12',
    node_type_id = 'Standard_DS3_v2'
)

# Get the job ID for the job
jobs = client.list_jobs()
job_id = str(jobs[jobs['settings.name'] == 'train_iris_job_aisquared_client'].job_id.iloc[0])

# Run the job
client.run_job(job_id)

## Deploy the New Model to a Serving Endpoint

Once we've kicked off the job, we have to wait for it to finish. In testing during development, the job typically took about five minutes to complete.
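
Rather than waiting by hand, the run can be polled until it reaches a terminal state. The following is a minimal sketch, not part of the `aisquared` API: `latest_run_state` and `wait_for_run` are hypothetical helpers, and the payload shape (`runs[0].state.life_cycle_state` and `result_state`) is assumed to match what the Databricks Jobs `runs/list` REST endpoint returns.

```python
import time

# Life-cycle states after which a Databricks job run will not change again
# (assumed from the Jobs REST API; adjust if your workspace reports others).
TERMINAL_STATES = {'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'}


def latest_run_state(payload):
    """Return (life_cycle_state, result_state) for the first run in a runs/list payload."""
    runs = payload.get('runs', [])
    if not runs:
        return None, None
    state = runs[0].get('state', {})
    return state.get('life_cycle_state'), state.get('result_state')


def wait_for_run(fetch_payload, poll_seconds=30, timeout_seconds=900, sleep=time.sleep):
    """Call fetch_payload() until the latest run is terminal, then return its result_state.

    The result_state is None when the payload does not report one.
    """
    waited = 0
    while True:
        life_cycle, result = latest_run_state(fetch_payload())
        if life_cycle in TERMINAL_STATES:
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(f'Run still {life_cycle} after {waited} seconds')
        sleep(poll_seconds)
        waited += poll_seconds


# In this notebook, fetch_payload could wrap a Jobs API call (assumed URL), e.g.:
#   fetch = lambda: requests.get(
#       f'{client.base_url}/api/2.1/jobs/runs/list',
#       headers=client.headers,
#       params={'job_id': job_id, 'limit': 1},
#   ).json()
#   wait_for_run(fetch)
```

Passing the fetch step in as a callable keeps the polling loop independent of how the workspace is queried, so it can be exercised without a live cluster.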

Once the job has completed its run, we can deploy the model it produced to a live serving endpoint.

In [None]:
client.create_served_model(
    MODEL_NAME,  # name of the model to serve
    '1',         # model version
    'Small'      # workload size of the endpoint
)

## Use the Model for Inference

Once the model serving endpoint has been created, we can query the endpoint with live data.
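
Endpoint creation in Databricks is asynchronous, so the endpoint may need a few minutes before it accepts requests. As a hedged sketch, the hypothetical helper below checks a serving-endpoint description for readiness; it assumes the description has the shape returned by the Databricks serving-endpoints REST API, with a `state.ready` field equal to `'READY'` once the endpoint is live.

```python
def endpoint_is_ready(payload):
    """Return True when a serving-endpoint description reports state.ready == 'READY'."""
    return payload.get('state', {}).get('ready') == 'READY'


# In this notebook, the description could be fetched like this (assumed URL):
#   description = requests.get(
#       f'{client.base_url}/api/2.0/serving-endpoints/{MODEL_NAME}',
#       headers=client.headers,
#   ).json()
#   endpoint_is_ready(description)
```

Re-running the check every minute or so until it returns `True` avoids sending inference requests to an endpoint that is still starting.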

In [None]:
dataset = load_iris()
data = dataset['data']

with requests.Session() as sess:
    resp = sess.post(
        url = f'{client.base_url}/serving-endpoints/{MODEL_NAME}/invocations',
        headers = client.headers,
        json = {
            'inputs' : data.tolist()
        }
    )

print(resp.json())
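
To sanity-check the endpoint, the returned predictions can be compared with the known Iris labels. This is a minimal sketch under an assumption: the response body is taken to be a dictionary with a `'predictions'` list holding one class index per input row, which should be confirmed against the printed output above. `prediction_accuracy` is a hypothetical helper, not part of `aisquared`.

```python
def prediction_accuracy(response_body, targets):
    """Return the fraction of predictions that match the true labels."""
    # Assumed response shape: {'predictions': [label_0, label_1, ...]}
    predictions = response_body['predictions']
    if len(predictions) != len(targets):
        raise ValueError('Number of predictions does not match number of targets')
    if len(targets) == 0:
        raise ValueError('No targets to compare against')
    correct = sum(int(p) == int(t) for p, t in zip(predictions, targets))
    return correct / len(targets)


# In this notebook:
#   prediction_accuracy(resp.json(), dataset['target'])
```

Because the request sends the same data the model was presumably trained on, a high score here only confirms that the endpoint is serving the expected model; it is not a measure of generalization.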