# Azure Data Factory integration
**Note**: This notebook runs on Python 3.6.

In this notebook we will show you how to create a pipeline that consists of two deployments: one that preprocesses the input data and another that uses a KNN classifier to predict whether someone has diabetes.


If you run this entire notebook after filling in your API token and project name, the pipeline will be deployed to your UbiOps environment, where you can explore it afterwards. You can also go through the individual steps in this notebook to see exactly what we did and how you can adapt it to your own use case.

We recommend running the cells step by step, as some cells can take a few minutes to finish. Running everything in one go also works; just allow a few minutes for the deployments to build.

## Establishing a connection with your UbiOps environment
Add your API token and the name of your project. Afterwards we initialize the client library, which we use to deploy the two deployments and the pipeline to your environment.
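The comment in the next cell expects the token in the form "Token token-code". Below is a minimal sketch of a hypothetical helper, `format_api_token`, which is not part of the ubiops library. It accepts either the bare token code or the full string and returns the prefixed form, so a missing prefix is fixed before the client is created.

```python
def format_api_token(raw_token: str) -> str:
    """Return the token as "Token <token-code>" for the Authorization header.

    Hypothetical helper for illustration; not part of the ubiops package.
    """
    token = raw_token.strip()
    if not token:
        raise ValueError("API token is empty")
    if token.startswith("Token "):
        # Already in the expected format
        return token
    return "Token " + token


print(format_api_token("abc123"))        # Token abc123
print(format_api_token("Token abc123"))  # Token abc123
```

In the next cell you could then pass `api_key={'Authorization': format_api_token(API_TOKEN)}`.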

In [None]:
API_TOKEN = '<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>' # Make sure this is in the format "Token token-code"
PROJECT_NAME = '<INSERT PROJECT NAME IN YOUR ACCOUNT>'

# Import all necessary libraries
import shutil
import ubiops

client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}, 
                                               host='https://api.ubiops.com/v2.1'))
api = ubiops.CoreApi(client)

## Create the preprocessing deployment

First of all, we will create the deployment that preprocesses the CSV files before passing the result to the next deployment, which contains the KNN classifier.

The deployment has the following input:
- data: a CSV file with either training data or test data
- training: a boolean indicating whether the data is used for training. When this boolean is set to true, the target outcome is split off from the training data.

The use of the boolean input "training" allows us to reuse this block later in a production pipeline. 
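This notebook does not show the contents of `preprocessing_package`. Below is a minimal, hypothetical sketch of what its `deployment.py` could look like. It rests on several assumptions:

- It follows the UbiOps convention of a `Deployment` class with `__init__(self, base_directory, context)` and `request(self, data)`, where `request` returns a dict keyed by output field names.
- Blob outputs are returned as local file paths.
- The `data` string holds the CSV contents themselves.
- The target column is called `Outcome`, the column the dummy prediction data lacks.

The cleaning step, which drops rows with an empty value, is only a placeholder.

```python
import csv
import io
import os
import tempfile

TARGET_COLUMN = "Outcome"  # assumption: label column of the diabetes data


class Deployment:
    """Hypothetical sketch of preprocessing_package/deployment.py."""

    def __init__(self, base_directory, context):
        # Nothing to load for preprocessing; kept for the expected signature
        self.base_directory = base_directory

    def request(self, data):
        csv_text = data["data"]  # assumption: CSV content as a string
        training = data["training"]

        rows = list(csv.DictReader(io.StringIO(csv_text)))
        # Placeholder cleaning step: drop rows that contain an empty value
        rows = [row for row in rows if all(value for value in row.values())]

        targets = []
        if training:
            # Split the target column off the training data
            targets = [row.pop(TARGET_COLUMN) for row in rows]

        out_dir = tempfile.mkdtemp()
        cleaned_path = os.path.join(out_dir, "cleaned_data.csv")
        target_path = os.path.join(out_dir, "target_data.csv")

        fieldnames = list(rows[0].keys()) if rows else []
        with open(cleaned_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        with open(target_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([TARGET_COLUMN])
            writer.writerows([[target] for target in targets])

        # Blob output fields are returned as paths to local files
        return {"cleaned_data": cleaned_path, "target_data": target_path}
```

For example, you could call `Deployment(".", {}).request({"data": "Glucose,BMI,Outcome\n148,33.6,1\n", "training": True})` on a made-up one-row sample. It returns the paths of two CSV files: one without the `Outcome` column and one containing only that column.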

In [None]:
DEPLOYMENT_NAME = 'data-preprocessor'
DEPLOYMENT_VERSION = 'v1'

deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Pre-process incoming csv file',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='string',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='training',
            data_type='bool',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='cleaned_data',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='target_data',
            data_type='blob'
        )
    ]
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.VersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.6',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800 # = 30 minutes
)

api.versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# Zip the deployment package
shutil.make_archive('preprocessing_package', 'zip', '.', 'preprocessing_package')

# Upload the zipped deployment package
file_upload_result = api.versions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='preprocessing_package.zip'
)
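`shutil.make_archive('preprocessing_package', 'zip', '.', 'preprocessing_package')` archives the folder relative to the working directory, so the folder name is kept as the top-level entry inside the zip. The self-contained sketch below builds a throwaway package in a temporary directory with the same call pattern and lists the archive entries. This is a quick way to check the layout before uploading. The helper `build_and_list` and the dummy file names are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile


def build_and_list(package_name):
    """Zip a dummy package folder the way the notebook does and
    return the sorted entry names stored in the resulting archive."""
    workdir = tempfile.mkdtemp()
    package_dir = os.path.join(workdir, package_name)
    os.makedirs(package_dir)
    # Hypothetical package contents, for illustration only
    for filename in ("deployment.py", "requirements.txt"):
        with open(os.path.join(package_dir, filename), "w") as f:
            f.write("")

    # root_dir is the working directory and base_dir is the package folder,
    # so every path inside the zip starts with the package folder name
    archive = shutil.make_archive(os.path.join(workdir, package_name), "zip",
                                  workdir, package_name)
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())


print(build_and_list("preprocessing_package"))
```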

## Preprocessing deployment created and deployed!

Now that the preprocessing deployment has been successfully created and deployed, we can move to the next step.

In the next step, we will create the deployment that uses an already trained KNN classifier to predict whether someone has diabetes or not.
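The trained model in `predictor_package` is not shown in this notebook. The sketch below illustrates only the technique, not the deployment's actual model. It is a plain-Python k-nearest-neighbours classifier that labels a point by majority vote among the `k` closest training points (Euclidean distance). It also counts positive predictions, in the spirit of the `predicted_diabetes_instances` output. The feature values are made up, and a real model would typically also scale its features.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, point, k=3):
    """Predict a label for `point` by majority vote of its k nearest neighbours."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-feature data (e.g. glucose, BMI); 1 = diabetes, 0 = no diabetes
train_X = [[85, 26.6], [89, 28.1], [90, 25.0], [148, 33.6], [183, 23.3], [160, 35.0]]
train_y = [0, 0, 0, 1, 1, 1]

new_points = [[95, 27.0], [170, 30.0]]
predictions = [knn_predict(train_X, train_y, p, k=3) for p in new_points]
predicted_diabetes_instances = sum(1 for p in predictions if p == 1)
print(predictions, predicted_diabetes_instances)  # [0, 1] 1
```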

## Creating the KNN classifier deployment

In [None]:
deployment_template = ubiops.DeploymentCreate(
    name='knn-model',
    description='KNN model for diabetes prediction',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='data_cleaning_artefact',
            data_type='blob',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='prediction',
            data_type='blob'
        ),
        ubiops.DeploymentOutputFieldCreate(
            name='predicted_diabetes_instances',
            data_type='int'
        )
    ]
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.VersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.6',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800 # = 30 minutes
)

api.versions_create(
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    data=version_template
)

# Zip the deployment package
shutil.make_archive('predictor_package', 'zip', '.', 'predictor_package')

# Upload the zipped deployment package
file_upload_result = api.versions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    version=DEPLOYMENT_VERSION,
    file='predictor_package.zip'
)

## KNN classifier deployment successfully created and deployed!

In the next step, we will continuously monitor the preprocessing and KNN deployments to find out whether they have passed the building stage successfully.
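The next cell polls both versions once per second until each is "available" or has a status containing "failed", with no upper limit on the waiting time. Below is a sketch of a reusable variant that adds a timeout. It takes a zero-argument callable returning the current status string, so the UbiOps call is passed in rather than hard-coded. `wait_for_status`, `timeout` and `interval` are hypothetical names chosen for this illustration.

```python
import time


def wait_for_status(get_status, timeout=1800, interval=5):
    """Call get_status() until it returns 'available' or a status containing
    'failed', then return that status. Raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "available" or "failed" in status:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("still '{}' after {} seconds".format(status, timeout))
        time.sleep(interval)
```

In this notebook it could be called as `wait_for_status(lambda: api.versions_get(project_name=PROJECT_NAME, deployment_name='knn-model', version=DEPLOYMENT_VERSION).status)`. You would repeat the call with `DEPLOYMENT_NAME` for the preprocessing deployment.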

In [None]:
from time import sleep

# Poll both versions every second until each one is either available or failed
status1 = 'building'
status2 = 'building'
while (status1 != 'available' and 'failed' not in status1) or \
        (status2 != 'available' and 'failed' not in status2):
    status1 = api.versions_get(
        project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME,
        version=DEPLOYMENT_VERSION
    ).status
    status2 = api.versions_get(
        project_name=PROJECT_NAME,
        deployment_name='knn-model',
        version=DEPLOYMENT_VERSION
    ).status
    sleep(1)

print(status1)
print(status2)

## Deployments passed the building stage

If the output of the previous cell is "available" for each of the deployments, the deployments have been successfully created, deployed and built.

In the next step, we will create a pipeline that uses the preprocessing and KNN deployments we created in the previous cells of this notebook.
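The pipeline in the next cell is wired with two attachments. Their field mappings must line up with the input and output fields declared earlier. The sketch below mirrors those declarations as plain dictionaries, copied by hand from this notebook rather than read from the UbiOps API. The pipeline's own input fields act as the outputs of `pipeline_start`. The hypothetical `check_attachments` helper checks that every mapped field exists and that every deployment input receives a value.

```python
# Fields as declared in this notebook (copied by hand, not fetched from UbiOps)
OUTPUTS = {
    "pipeline_start": {"data", "training"},
    "data-preprocessor": {"cleaned_data", "target_data"},
    "knn-model": {"prediction", "predicted_diabetes_instances"},
}
INPUTS = {
    "data-preprocessor": {"data", "training"},
    "knn-model": {"data", "data_cleaning_artefact"},
}

# (source object, destination object, [(source field, destination field), ...])
ATTACHMENTS = [
    ("pipeline_start", "data-preprocessor", [("data", "data"), ("training", "training")]),
    ("data-preprocessor", "knn-model", [("cleaned_data", "data"),
                                        ("target_data", "data_cleaning_artefact")]),
]


def check_attachments(outputs, inputs, attachments):
    """Return a list of problems; an empty list means the wiring is consistent."""
    problems = []
    fed = {name: set() for name in inputs}
    for source, destination, mapping in attachments:
        for source_field, destination_field in mapping:
            if source_field not in outputs.get(source, set()):
                problems.append("{} has no output '{}'".format(source, source_field))
            if destination_field not in inputs.get(destination, set()):
                problems.append("{} has no input '{}'".format(destination, destination_field))
            fed.setdefault(destination, set()).add(destination_field)
    # Every declared input of every deployment should be connected
    for name, fields in inputs.items():
        for missing in sorted(fields - fed[name]):
            problems.append("{} input '{}' is not connected".format(name, missing))
    return problems


print(check_attachments(OUTPUTS, INPUTS, ATTACHMENTS))  # []
```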

In [None]:
PIPELINE_NAME = "example-pipeline"

pipeline_template = ubiops.PipelineCreate(
    name=PIPELINE_NAME,
    description="A simple pipeline that cleans up data and lets a KNN model predict on it.",
    input_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='string',
        ),
        ubiops.DeploymentInputFieldCreate(
            name='training',
            data_type='bool',
        )
    ]
)

api.pipelines_create(
    project_name=PROJECT_NAME,
    data=pipeline_template
)

# Adding the preprocessing deployment
object_template = ubiops.PipelineObjectCreate(
    name=DEPLOYMENT_NAME,
    reference_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION
)
api.pipeline_objects_create(project_name=PROJECT_NAME, pipeline_name=PIPELINE_NAME, data=object_template)

# Adding the KNN deployment
object_template2 = ubiops.PipelineObjectCreate(
    name='knn-model',
    reference_name='knn-model',
    version=DEPLOYMENT_VERSION
)
api.pipeline_objects_create(project_name=PROJECT_NAME, pipeline_name=PIPELINE_NAME, data=object_template2)

# Connecting the components

# First connecting start --> preprocessor
connection_template1 = ubiops.AttachmentsCreate(
    source_name='pipeline_start', 
    destination_name=DEPLOYMENT_NAME,
    mapping=[
        ubiops.AttachmentFieldsCreate(
            source_field_name='data',
            destination_field_name='data'
        ),
        ubiops.AttachmentFieldsCreate(
            source_field_name='training',
            destination_field_name='training'
        )
    ]
)

api.pipeline_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=PIPELINE_NAME, 
    data=connection_template1
)

# Connection preprocessor --> KNN model
connection_template2 = ubiops.AttachmentsCreate(
    source_name=DEPLOYMENT_NAME, 
    destination_name='knn-model',
    mapping=[
        ubiops.AttachmentFieldsCreate(
            source_field_name='cleaned_data',
            destination_field_name='data'
        ),
        ubiops.AttachmentFieldsCreate(
            source_field_name='target_data',
            destination_field_name='data_cleaning_artefact'
        )
    ]
)

api.pipeline_object_attachments_create(
    project_name=PROJECT_NAME, 
    pipeline_name=PIPELINE_NAME, 
    data=connection_template2
)

**IMPORTANT**: If you get an error like `"error":"Version is not available: The version is currently in the building stage"`, your deployment is not yet available and is still building.
Check in the UI whether your deployment versions are ready, then rerun the calls in the previous cell that failed. Calls that already succeeded do not need to be repeated.
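Instead of waiting and rerunning by hand, you can retry a call automatically when it fails only because a version is still building. Below is a sketch of a hypothetical `retry_while_building` helper. It repeats a zero-argument callable while the raised exception's message contains "building", the phrase in the error above. Catching every exception type and matching on message text are simplifying assumptions. In practice you would narrow this to the exception class your ubiops client version raises.

```python
import time


def retry_while_building(call, attempts=30, interval=10):
    """Run call(); if it fails with an error mentioning 'building', wait and retry.
    Any other error, or running out of attempts, re-raises the exception."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as error:  # assumption: match on message text
            if "building" not in str(error) or attempt == attempts:
                raise
            time.sleep(interval)
```

For example, you could wrap the second attachment call as `retry_while_building(lambda: api.pipeline_object_attachments_create(project_name=PROJECT_NAME, pipeline_name=PIPELINE_NAME, data=connection_template2))`.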

## Pipeline successfully created!

## Making a request and exploring further
You can now go to the Web App and explore what you have just built in the user interface. If you want, you can create a request to the pipeline using the data from "dummy_data_for_predicting.csv" with the "training" input set to "False". The dummy data is simply the diabetes data without the Outcome column.
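Below is a sketch of preparing such a request payload. It assumes the pipeline's `data` field carries the CSV contents as text; the notebook declares the field as a string but does not show what the string holds. The sketch checks that the `Outcome` column is absent from prediction data and builds a dict keyed by the pipeline's input fields. `build_request_payload` is a hypothetical helper. You would read "dummy_data_for_predicting.csv" into a string, pass it in, and submit the resulting dict as the request data.

```python
import csv
import io


def build_request_payload(csv_text, training=False):
    """Build the input for the pipeline's 'data' and 'training' fields."""
    # Read only the header row to check which columns are present
    header = next(csv.reader(io.StringIO(csv_text)), [])
    if not training and "Outcome" in header:
        raise ValueError("prediction data should not contain the Outcome column")
    return {"data": csv_text, "training": training}
```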

So there we have it! We have made a pipeline in UbiOps that can be connected to Azure Data Factory. Be sure to follow the remaining steps of this cookbook, as described in the README, to set up the integration with Azure Data Factory.

For any questions, feel free to reach out to us via the customer service portal: https://ubiops.atlassian.net/servicedesk/customer/portals