# Scikit template
**Note**: This notebook runs on Python 3.6 and uses UbiOps Client Library 3.3.0.

In this notebook we will show you the following:
- how to make a training pipeline in UbiOps that preprocesses the data and trains and tests a model using scikit-learn
- how to make a production pipeline in UbiOps that takes in new data, processes it and feeds it to a trained model for prediction/classification

For this example we will use a diabetes dataset from Kaggle to create a KNN classifier that predicts whether someone will have diabetes. Link to the original dataset: https://www.kaggle.com/uciml/pima-indians-diabetes-database

If you run this entire notebook after filling in your access token, the two pipelines and all the necessary models will be deployed to your UbiOps environment. You can thus check your environment after running to explore. You can also check the individual steps in this notebook to see what we did exactly and how you can adapt it to your own use case.

We recommend running the cells one by one, as some cells can take a few minutes to finish. You can also run everything in one go and it will work; just allow a few minutes for the individual deployments to build.

## Establishing a connection with your UbiOps environment
Add your API token and project name below. We also set the name and version of the first deployment, and then initialize the client library, which we use to deploy the two pipelines to your environment.

In [None]:
API_TOKEN = "<INSERT YOUR TOKEN HERE>" # Make sure this is in the format "Token token-code"
PROJECT_NAME = "<INSERT PROJECT NAME>"
DEPLOYMENT_NAME = 'data-preprocessor'
DEPLOYMENT_VERSION = 'v1'

# Import all necessary libraries
import shutil
import os
import ubiops

client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}, 
                                               host='https://api.ubiops.com/v2.1'))
api = ubiops.CoreApi(client)
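The comment above asks for the token in the form `Token <token-code>`. If you paste only the token code, a small guard such as the hypothetical `normalize_api_token` below (not part of the UbiOps client library) can add the prefix before the value is passed to `ubiops.Configuration`. It is a sketch that assumes the prefix is exactly `Token ` followed by the code.

```python
def normalize_api_token(token: str) -> str:
    """Hypothetical helper: make sure the token starts with 'Token '.

    Assumes the expected format is 'Token <token-code>', as noted above.
    """
    token = token.strip()
    if not token:
        raise ValueError("API token is empty")
    if token.startswith("Token "):
        # Already in the expected format
        return token
    # Bare token code: add the prefix
    return "Token " + token


print(normalize_api_token("abc123"))        # Token abc123
print(normalize_api_token("Token abc123"))  # Token abc123
```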

## Making a training pipeline

Our training pipeline will consist of two steps: preprocessing the data, and training a model.
For each of these two steps we will create a separate deployment in UbiOps. This way the preprocessing step can be reused later in the production pipeline (or in other pipelines), and each block can be scaled independently.

### Preprocessing the data
In the cell below, the deployment.py of the preprocessing block is loaded. Its request function cleans up the data for further use and outputs the result as two CSV files.
The deployment has the following inputs:
- data: a CSV file with either the training data or new data to predict on
- training: a boolean indicating whether the data is used for training. If it is set to true, the target outcome is split off from the training data.

The use of the boolean input "training" allows us to reuse this block later in a production pipeline. 

In [None]:
%load preprocessing_package/deployment.py

Now we create a deployment and a deployment version for the package in the cell above. 

In [None]:
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Clean up data',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='training',
            data_type='bool',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='cleaned_data',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='target_data',
            data_type='blob'
        )
    ],
    labels={'demo': 'scikit-deployment'}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.6',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800, # = 30 minutes
    request_retention_mode='none' # we don't need request storage in this example
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# Zip the deployment package
shutil.make_archive('preprocessing_package', 'zip', '.', 'preprocessing_package')

# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='preprocessing_package.zip'
)

The first deployment has now been created in your UbiOps environment. Take a look at the Deployments tab in the UI to see it for yourself; it may still be building.
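The `training` switch described for the preprocessing block can be illustrated with a small standard-library sketch. It is not the contents of `preprocessing_package/deployment.py` (which is not reproduced here and may well use pandas and perform more cleaning); the function `split_features_and_target` and the two toy rows are hypothetical, and the only assumption taken from the dataset is that the target column is called `Outcome`.

```python
import csv
import io


def split_features_and_target(csv_text, training, target_column="Outcome"):
    """Hypothetical sketch of the 'training' switch in the preprocessor.

    Returns (feature_rows, target_values). When training is False the data
    is assumed to have no target column, so target_values is None.
    """
    # Convert each row to a plain dict so the output looks the same on all Python 3 versions
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    if not training:
        return rows, None
    # Remove the target column from every row and collect it separately
    targets = [row.pop(target_column) for row in rows]
    return rows, targets


# Illustrative input with a subset of the columns
sample = "Glucose,BMI,Outcome\n148,33.6,1\n85,26.6,0\n"
features, target = split_features_and_target(sample, training=True)
print(features)  # [{'Glucose': '148', 'BMI': '33.6'}, {'Glucose': '85', 'BMI': '26.6'}]
print(target)    # ['1', '0']
```

Because the switch is an input rather than a separate code path in another deployment, the same block can serve both pipelines, which is the reuse argument made above.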


### Training and testing

Now that we have the preprocessing deployment in UbiOps, we need a deployment that can take the output of the preprocessing step and train a KNN model on it. The code for this is in the "training_package" directory and can be seen in the next cell. We are going to perform the same steps as above to deploy this code in UbiOps.

In [None]:
%load training_package/deployment.py
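The training package presumably relies on scikit-learn, since its output is a `knn.joblib` file. To show what a k-nearest-neighbours classifier does under the hood, here is a dependency-free sketch with Euclidean distance and a majority vote, plus the accuracy on a held-out set, which is what scikit-learn's `KNeighborsClassifier.score` reports. Whether `model_score` is exactly this accuracy depends on the actual deployment code, so treat that as an assumption; the toy points are illustrative and not taken from the diabetes dataset.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Toy data (illustrative only): two features, label 1 = diabetes, 0 = no diabetes
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 0.9), (5.1, 5.0)]
test_y = [0, 1]

print(knn_predict(train_X, train_y, (4.8, 5.2)))   # 1
print(accuracy(train_X, train_y, test_X, test_y))  # 1.0
```

In scikit-learn the equivalent steps are `KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)` followed by `.score(X_test, y_test)`.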

Time to deploy this step to UbiOps.

In [None]:
deployment_template_t = ubiops.DeploymentCreate(
    name='model-training',
    description='Trains a KNN model',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='cleaned_data',
            data_type='blob',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='target_data',
            data_type='blob',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='trained_model',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='model_score',
            data_type='double'
        )
    ],
    labels={'demo': 'scikit-deployment'}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template_t
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.6',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800, # = 30 minutes
    request_retention_mode='none' # we don't need request storage in this example
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name='model-training',
    data=version_template
)

# Zip the deployment package
shutil.make_archive('training_package', 'zip', '.', 'training_package')

# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name='model-training',
    version=DEPLOYMENT_VERSION,
    file='training_package.zip'
)

Check whether both deployments, preprocessing and training, are available for further use. We can only use them in a pipeline once they have finished building.

In [None]:
from time import sleep

# Poll both deployment versions until each one is available or has failed
status1 = 'building'
status2 = 'building'
while (status1 != 'available' and 'failed' not in status1) or \
        (status2 != 'available' and 'failed' not in status2):
    version_status1 = api.deployment_versions_get(
        project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME,
        version=DEPLOYMENT_VERSION
    )
    status1 = version_status1.status
    version_status2 = api.deployment_versions_get(
        project_name=PROJECT_NAME,
        deployment_name='model-training',
        version=DEPLOYMENT_VERSION
    )
    status2 = version_status2.status
    sleep(1)

print(status1)
print(status2)
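The loop above polls forever if a version gets stuck in a state that is neither `available` nor failed. A variant with a timeout could look like the sketch below. `wait_until_ready` is a hypothetical helper that takes any zero-argument function returning a status string; the `sleep` and `clock` parameters exist only so it can be exercised without a live UbiOps project.

```python
import time


def wait_until_ready(get_status, timeout=600, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 'available' or a status containing 'failed'.

    Raises TimeoutError if neither happens within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == 'available' or 'failed' in status:
            return status
        if clock() >= deadline:
            raise TimeoutError("still '%s' after %s seconds" % (status, timeout))
        sleep(interval)


# With the notebook's client this could be called as:
# wait_until_ready(lambda: api.deployment_versions_get(
#     project_name=PROJECT_NAME, deployment_name='model-training',
#     version=DEPLOYMENT_VERSION).status)

# Offline demonstration with a fake status sequence
statuses = iter(['building', 'building', 'available'])
print(wait_until_ready(lambda: next(statuses), sleep=lambda s: None))  # available
```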

## Creating a training pipeline

So right now we have two deployments: one cleaning up the input data and one using that data for training a model. We want to tie these two blocks together to create a workflow. We can use pipelines for that. Let's create a pipeline that takes the same input as the preprocessing block.

In [None]:
training_pipeline_name = "training-pipeline"

pipeline_template = ubiops.PipelineCreate(
    name=training_pipeline_name,
    description='A simple pipeline that cleans up data and trains a KNN model on it.',
    input_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='training',
            data_type='bool',
        )
    ],
    output_type='structured',
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='trained_model',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='model_score',
            data_type='double'
        )
    ],
    labels={'demo': 'scikit-deployment'}
)

api.pipelines_create(
    project_name=PROJECT_NAME,
    data=pipeline_template
)

In [None]:
training_pipeline_version = DEPLOYMENT_VERSION

pipeline_template = ubiops.PipelineVersionCreate(
    version=training_pipeline_version,
    request_retention_mode='none' # we don't need request storage for this example
)

api.pipeline_versions_create(
    project_name=PROJECT_NAME, pipeline_name=training_pipeline_name, data=pipeline_template
)

We have a pipeline; now we just need to add our two components to it and connect them.

**IMPORTANT**: If you get an error like "error":"Version is not available: The version is currently in the building stage", your deployment is not yet available and is still building.
Check in the UI whether it is ready, then rerun the cell below.

In [None]:
# Adding the preprocessing deployment
object_template = ubiops.PipelineVersionObjectCreate(
    name=DEPLOYMENT_NAME,
    reference_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION
)
api.pipeline_version_objects_create(
    project_name=PROJECT_NAME,
    pipeline_name=training_pipeline_name,
    version=training_pipeline_version,
    data=object_template
)

# Adding the training deployment
object_template2 = ubiops.PipelineVersionObjectCreate(
    name='model-training',
    reference_name='model-training',
    version=DEPLOYMENT_VERSION
)
api.pipeline_version_objects_create(
    project_name=PROJECT_NAME,
    pipeline_name=training_pipeline_name,
    version=training_pipeline_version,
    data=object_template2
)
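Instead of rerunning the cell by hand when the "building stage" error appears, the call could be wrapped in a retry loop. The sketch below is a hypothetical helper, not part of the UbiOps client. It assumes the error message contains the text "building stage" (taken from the error quoted above) and catches any exception, because the exact exception class raised by the client is not shown in this notebook.

```python
import time


def call_with_retry(func, retries=10, delay=30.0, should_retry=None, sleep=time.sleep):
    """Call func(); if it raises an error accepted by should_retry, wait and try again.

    The last error is re-raised once `retries` attempts have been used up.
    """
    if should_retry is None:
        # Assumption: the 'still building' error text contains 'building stage'
        should_retry = lambda exc: 'building stage' in str(exc)
    for attempt in range(retries):
        try:
            return func()
        except Exception as exc:
            if attempt == retries - 1 or not should_retry(exc):
                raise
            sleep(delay)


# With the notebook's client this could wrap, for example:
# call_with_retry(lambda: api.pipeline_version_objects_create(
#     project_name=PROJECT_NAME, pipeline_name=training_pipeline_name,
#     version=training_pipeline_version, data=object_template))

# Offline demonstration: fails twice with the building error, then succeeds
attempts = {'n': 0}


def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise RuntimeError('Version is not available: The version is currently in the building stage')
    return 'created'


result = call_with_retry(flaky, delay=0)
print(result)  # created
```

Errors that do not mention the building stage are re-raised immediately, so genuine mistakes are not hidden behind retries.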

In [None]:
# Connecting the components

# First connecting start --> preprocessor
attachment_template1 = ubiops.AttachmentsCreate(
    destination_name=DEPLOYMENT_NAME,
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name='pipeline_start',
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name='data',
                    destination_field_name='data'
                ),
                ubiops.AttachmentFieldsCreate(
                    source_field_name='training',
                    destination_field_name='training'
                )]
        )]
)

api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=training_pipeline_name, 
    version=training_pipeline_version,
    data=attachment_template1
)

# Connection preprocessor --> model-training
attachment_template2 = ubiops.AttachmentsCreate(
    destination_name='model-training',
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name=DEPLOYMENT_NAME,
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name='cleaned_data',
                    destination_field_name='cleaned_data'
                ),
                ubiops.AttachmentFieldsCreate(
                    source_field_name='target_data',
                    destination_field_name='target_data'
                )]
        )]
)


api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=training_pipeline_name,
    version=training_pipeline_version, 
    data=attachment_template2
)

# Connection model-training --> pipeline end
attachment_template3 = ubiops.AttachmentsCreate(
    destination_name='pipeline_end',
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name='model-training',
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name='trained_model',
                    destination_field_name='trained_model'
                ),
                ubiops.AttachmentFieldsCreate(
                    source_field_name='model_score',
                    destination_field_name='model_score'
                )]
        )]
)


api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME,
    pipeline_name=training_pipeline_name,
    version=training_pipeline_version,
    data=attachment_template3
)
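A wiring mistake, such as a misspelled object name, only surfaces when the API call fails or the pipeline misbehaves. A hypothetical pre-flight check, sketched below, compares each attachment against the declared input and output fields before anything is sent to UbiOps. Treating `pipeline_start` as an object whose outputs are the pipeline inputs, and `pipeline_end` as one whose inputs are the pipeline outputs, is a modelling assumption of this sketch.

```python
def check_attachments(objects, attachments):
    """Return a list of problems found in a pipeline wiring spec.

    objects: {object_name: {'inputs': set_of_fields, 'outputs': set_of_fields}}
    attachments: [(source, destination, [(source_field, destination_field), ...]), ...]
    """
    problems = []
    for source, destination, mapping in attachments:
        if source not in objects:
            problems.append("unknown source object %r" % source)
            continue
        if destination not in objects:
            problems.append("unknown destination object %r" % destination)
            continue
        for src_field, dst_field in mapping:
            if src_field not in objects[source]['outputs']:
                problems.append("%s has no output field %r" % (source, src_field))
            if dst_field not in objects[destination]['inputs']:
                problems.append("%s has no input field %r" % (destination, dst_field))
    return problems


# The training pipeline as defined in this notebook
training_objects = {
    'pipeline_start': {'inputs': set(), 'outputs': {'data', 'training'}},
    'data-preprocessor': {'inputs': {'data', 'training'},
                          'outputs': {'cleaned_data', 'target_data'}},
    'model-training': {'inputs': {'cleaned_data', 'target_data'},
                       'outputs': {'trained_model', 'model_score'}},
    'pipeline_end': {'inputs': {'trained_model', 'model_score'}, 'outputs': set()},
}
training_attachments = [
    ('pipeline_start', 'data-preprocessor', [('data', 'data'), ('training', 'training')]),
    ('data-preprocessor', 'model-training',
     [('cleaned_data', 'cleaned_data'), ('target_data', 'target_data')]),
    ('model-training', 'pipeline_end',
     [('trained_model', 'trained_model'), ('model_score', 'model_score')]),
]
print(check_attachments(training_objects, training_attachments))  # []

# A misspelled object name is reported up front
bad = [('model-training', 'pipeline-end', [('model_score', 'model_score')])]
print(check_attachments(training_objects, bad))  # ["unknown destination object 'pipeline-end'"]
```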

## Training pipeline done!
If you check your UbiOps account under Pipelines, you will find a training-pipeline with our components in it, connected to each other. Let's make a request to it. You can also make a request in the UI with the "Create direct request" button.

This might take a while, since the deployments need a cold start: they have never been used before.

In [None]:
training_pipeline_name = "training-pipeline"

blob = api.blobs_create(project_name=PROJECT_NAME, file='diabetes.csv', blob_ttl=1000)

data = {'data': blob.id, 'training': True}
pipeline_result = api.pipeline_version_requests_create(
    project_name=PROJECT_NAME,
    pipeline_name=training_pipeline_name,
    version=training_pipeline_version,
    data=data
)

print(pipeline_result)

In [None]:
# Keep the blob ID of the trained model for further use:
# we check the most recent blobs in our environment and look for the trained model,
# which is stored in a joblib file called knn.joblib
blobs_list = api.blobs_list(project_name=PROJECT_NAME)
trained_model_blob = None
for blob in blobs_list:
    if blob.filename == 'knn.joblib':
        trained_model_blob = str(blob.id)

if trained_model_blob is None:
    raise RuntimeError("No knn.joblib blob found; check that the training pipeline request succeeded")
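The loop above keeps whichever `knn.joblib` blob it sees last, which is only the newest one if the list happens to be ordered oldest first. If several training runs have produced a `knn.joblib`, selecting by creation time is more explicit. This sketch assumes each blob object exposes `id`, `filename` and `creation_date` attributes (check the client's blob model before relying on this); a namedtuple stands in for the real objects, and `latest_blob_id` is a hypothetical helper. If `creation_date` is returned as an ISO 8601 string, `max` still orders it correctly as long as all values share the same format and time zone.

```python
from collections import namedtuple
from datetime import datetime

# Stand-in for the blob objects returned by api.blobs_list
# (assumption: they have 'id', 'filename' and 'creation_date' attributes)
Blob = namedtuple('Blob', ['id', 'filename', 'creation_date'])


def latest_blob_id(blobs, filename):
    """Return the id of the most recently created blob with this filename, or None."""
    matches = [b for b in blobs if b.filename == filename]
    if not matches:
        return None
    return str(max(matches, key=lambda b: b.creation_date).id)


# Illustrative blobs, not real UbiOps output
blobs = [
    Blob('a1', 'knn.joblib', datetime(2021, 3, 1, 10, 0)),
    Blob('b2', 'diabetes.csv', datetime(2021, 3, 1, 10, 5)),
    Blob('c3', 'knn.joblib', datetime(2021, 3, 1, 11, 0)),
]
print(latest_blob_id(blobs, 'knn.joblib'))  # c3
```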

## Predicting with the trained model

Our model is trained and ready. Now we still need to deploy a predictor to UbiOps that uses this model for predicting. 

The code and requirements for the predictor are already in the predictor_package folder, but the joblib file with the trained model is still missing. Rather than downloading the training pipeline's output manually, we download it automatically into the predictor's deployment package. After that we can zip up the folder and push it to UbiOps, just like the previous two packages.

In [None]:
%load predictor_package/deployment.py

In [None]:
# Download the trained model joblib into the predictor package directory
base_directory = os.getcwd()  # the notebook's working directory

with api.blobs_get(project_name=PROJECT_NAME, blob_id=trained_model_blob) as response:
    output_path = os.path.join(base_directory, "predictor_package", response.getfilename())
    with open(output_path, 'wb') as f:
        f.write(response.read())


In [None]:
# Now we need to zip the deployment package
shutil.make_archive('predictor_package', 'zip', '.', 'predictor_package')
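`shutil.make_archive('predictor_package', 'zip', '.', 'predictor_package')` uses `.` as `root_dir` and `predictor_package` as `base_dir`, so every entry in the zip is stored under a top-level `predictor_package/` folder, and the downloaded `knn.joblib` is included only because it was written into that folder before zipping. The self-contained check below rebuilds the same layout with placeholder files in a temporary directory and lists the archive contents; the file contents are dummies.

```python
import os
import shutil
import tempfile
import zipfile

# Build a throwaway copy of the package layout in a temporary directory
workdir = tempfile.mkdtemp()
package_dir = os.path.join(workdir, 'predictor_package')
os.makedirs(package_dir)
for name in ('deployment.py', 'requirements.txt', 'knn.joblib'):
    with open(os.path.join(package_dir, name), 'w') as f:
        f.write('placeholder\n')

# Same call pattern as the notebook: root_dir is the parent, base_dir the package folder
archive = shutil.make_archive(
    os.path.join(workdir, 'predictor_package'), 'zip', workdir, 'predictor_package'
)

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()

# Print file entries only (directory entries end with '/')
print(sorted(n for n in names if not n.endswith('/')))
# ['predictor_package/deployment.py', 'predictor_package/knn.joblib', 'predictor_package/requirements.txt']

# Clean up the temporary files
shutil.rmtree(workdir)
```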

## Deploying the KNN model
The folder is ready; now we create a deployment for it in UbiOps, just like before.

In [None]:
deployment_template = ubiops.DeploymentCreate(
    name='knn-model',
    description='KNN model for diabetes prediction',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='data_cleaning_artefact',
            data_type='blob',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='prediction',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='predicted_diabetes_instances',
            data_type='int'
        )
    ],
    labels={'demo': 'scikit-deployment'}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.6',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800, # = 30 minutes
    request_retention_mode='none' # we don't need request storage in this example
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    data=version_template
)

# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    version=DEPLOYMENT_VERSION,
    file='predictor_package.zip'
)

Check whether the deployment is ready for use.

In [None]:
from time import sleep  # repeated here so this cell also runs on its own

# Poll until the version is available or has failed
status = 'building'
while status != 'available' and 'failed' not in status:
    version_status = api.deployment_versions_get(
        project_name=PROJECT_NAME,
        deployment_name='knn-model',
        version=DEPLOYMENT_VERSION
    )
    status = version_status.status
    sleep(1)
print(status)

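## Creating the production pipeline

The production pipeline takes in new data, runs it through the preprocessing deployment and feeds the result to the KNN model for prediction.
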
In [None]:
prod_pipeline_name = "production-pipeline"

pipeline_template = ubiops.PipelineCreate(
    name=prod_pipeline_name,
    description="A simple pipeline that cleans up data and lets a KNN model predict on it.",
    input_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='training',
            data_type='bool',
        )
    ],
    output_type='structured',
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='prediction',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='predicted_diabetes_instances',
            data_type='int'
        )
    ],
    labels={'demo': 'scikit-deployment'}
)

api.pipelines_create(
    project_name=PROJECT_NAME,
    data=pipeline_template
)

In [None]:
prod_pipeline_version = DEPLOYMENT_VERSION

pipeline_template = ubiops.PipelineVersionCreate(
    version=prod_pipeline_version,
    request_retention_mode='none'  # we don't need request storage in this example
)

api.pipeline_versions_create(project_name=PROJECT_NAME, pipeline_name=prod_pipeline_name, data=pipeline_template)

Now we add the preprocessing and the prediction components and connect them.

**IMPORTANT**: If you get an error like "error":"Version is not available: The version is currently in the building stage", your deployment is not yet available and is still building.
Check in the UI whether it is ready, then rerun the cell below.

In [None]:
# Adding the preprocessing deployment
object_template = ubiops.PipelineVersionObjectCreate(
    name=DEPLOYMENT_NAME,
    reference_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION
)
api.pipeline_version_objects_create(
    project_name=PROJECT_NAME,
    pipeline_name=prod_pipeline_name,
    version=prod_pipeline_version,
    data=object_template
)

# Adding the KNN deployment
object_template2 = ubiops.PipelineVersionObjectCreate(
    name='knn-model',
    reference_name='knn-model',
    version=DEPLOYMENT_VERSION
)
api.pipeline_version_objects_create(
    project_name=PROJECT_NAME,
    pipeline_name=prod_pipeline_name,
    version=prod_pipeline_version,
    data=object_template2
)

# Connecting the components
# First connecting start --> preprocessor
attachment_template1 = ubiops.AttachmentsCreate(
    destination_name=DEPLOYMENT_NAME,
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name='pipeline_start',
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name='data',
                    destination_field_name='data'
                ),
                ubiops.AttachmentFieldsCreate(
                    source_field_name='training',
                    destination_field_name='training'
                )]
        )]
)

api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=prod_pipeline_name, 
    version=prod_pipeline_version,
    data=attachment_template1
)

# Connection preprocessor --> KNN model
attachment_template2 = ubiops.AttachmentsCreate(
    destination_name='knn-model',
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name=DEPLOYMENT_NAME,
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name='cleaned_data',
                    destination_field_name='data'
                ),
                ubiops.AttachmentFieldsCreate(
                    source_field_name='target_data',
                    destination_field_name='data_cleaning_artefact'
                )]
        )]
)


api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=prod_pipeline_name, 
    version=prod_pipeline_version,
    data=attachment_template2
)

# Connection KNN model -> pipeline end
attachment_template3 = ubiops.AttachmentsCreate(
    destination_name='pipeline_end',
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name='knn-model',
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name='prediction',
                    destination_field_name='prediction'
                ),
                ubiops.AttachmentFieldsCreate(
                    source_field_name='predicted_diabetes_instances',
                    destination_field_name='predicted_diabetes_instances'
                )]
        )]
)


api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=prod_pipeline_name, 
    version=prod_pipeline_version,
    data=attachment_template3
)
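In the production pipeline the preprocessor's outputs are renamed on their way into the KNN model: `cleaned_data` becomes `data` and `target_data` becomes `data_cleaning_artefact`. The hypothetical `route` helper below mimics what such a field mapping does to the data of a single request; the blob IDs are placeholders, and a mapping that names a missing source field raises `KeyError`.

```python
def route(outputs, mapping):
    """Build a destination's input dict from a source's outputs.

    mapping is a list of (source_field, destination_field) pairs, mirroring
    the AttachmentFieldsCreate entries used above.
    """
    return {dst: outputs[src] for src, dst in mapping}


# Placeholder values standing in for blob IDs produced by the preprocessor
preprocessor_outputs = {'cleaned_data': 'blob-id-cleaned', 'target_data': 'blob-id-target'}

knn_inputs = route(
    preprocessor_outputs,
    [('cleaned_data', 'data'), ('target_data', 'data_cleaning_artefact')],
)
print(knn_inputs)
# {'data': 'blob-id-cleaned', 'data_cleaning_artefact': 'blob-id-target'}
```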

## Making a request and exploring further
You can now go to the Web App and take a look at what you have just built in the user interface. If you want, you can create a request to the production pipeline using "dummy_data_for_predicting.csv" and setting the "training" input to "False". The dummy data is simply the diabetes data with the Outcome column chopped off.
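The dummy file is described as the diabetes data without its `Outcome` column. If you want to regenerate it, or build a similar file from your own data, a standard-library sketch could look like this; `drop_column` is a hypothetical helper and the two rows are illustrative.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return csv_text with one column removed; all other columns keep their order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction='ignore' skips the dropped column present in each row dict
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = "Glucose,BMI,Outcome\n148,33.6,1\n85,26.6,0\n"
dummy = drop_column(sample, 'Outcome')
print(dummy)
```

To apply it to the real files, read `diabetes.csv` into a string (for example with `open('diabetes.csv', newline='').read()`) and write the result to `dummy_data_for_predicting.csv`.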

So there we have it! We have made a training pipeline and a production pipeline using the scikit-learn library. You can use this notebook as a basis for your own pipelines: adapt the code in the deployment packages and change the input and output fields as needed, and you should be good to go.

For any questions, feel free to reach out to us via the customer service portal: https://ubiops.atlassian.net/servicedesk/customer/portals