# Setting Up Certifai Pro to Scan Models Available as a Service

# Typical flow for a Data Scientist and ML Engineer during Development
- Detailed analysis
- Setup Baseline
- Trend analysis


## Overview

This notebook-tutorial takes you through the following processes:

1. Installing Certifai Pro in an Azure cloud environment and configuring your Certifai Pro instance with storage parameters for a pre-existing container in an Azure Storage Account

2. Downloading the Cortex Certifai Toolkit from your Certifai Pro VM

3. Obtaining credentials for Azure-hosted model endpoint services for sklearn models by following the ["Azure ML Certifai Scan" notebook](https://github.com/CognitiveScale/cortex-certifai-examples/blob/master/notebooks/azureml_model_headers_demo/german_credit_azure_ml_demo.ipynb)

4. Testing the deployed hosted model endpoint services (webservices)

5. Installing Cortex Certifai Python packages

6. Configuring the Certifai CLI with connection details for the Certifai Pro VM

7. Constructing Certifai scan definitions for a binary classification model

8. Uploading the required datasets for the scan into the Azure storage account container configured for Certifai Pro (in Step 1)

9. Submitting a remote scan job to the Certifai Pro instance through the Certifai CLI using the scan definition (constructed in Step 7)

## 1. Install Certifai Pro VM from the Azure Marketplace

You can find and install a personal instance of Cortex Certifai Pro in the [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/cognitive-scale.cortex-certifai-pro?tab=Overview). Please follow the instructions from the official Certifai docs for [Azure setup](https://cognitivescale.github.io/cortex-certifai/docs/platforms/azure/azure-setup).

The installation process includes:

- Completing the Certifai Pro Azure Marketplace subscription and VM instantiation.
- Configuring your Certifai Pro instance with blob storage containers and credentials for an Azure Storage account.
- (Optionally) Installing sample reports for a variety of use cases in Finance, Healthcare, and Insurance to understand the AI Trust Index scores generated by Certifai.
- Configuring custom SSL certificates (if needed).

### Configure Storage

Follow the [Certifai Console Storage Setup](https://cognitivescale.github.io/cortex-certifai/docs/platforms/azure/azure-setup#certifai-console-storage-setup) instructions provided in the official Certifai docs.

When you configure the storage parameters for your Azure Certifai Pro instance, make note of the **Scan Directory field (Blob Container Name)**, which is where scan reports generated in this tutorial are stored.

On the Certifai Console Storage Settings page:

- You may uncheck or check the `Install Sample Reports` option
- You MUST check the `Download Kubeconfig` option

## 2. Download the Cortex Certifai Toolkit from Your Certifai Pro VM

After you have finished initial setup for your Certifai Pro VM, open the Certifai Console. Click the Help icon (top right) and select `Download Toolkit`. A zip file containing the Cortex Certifai Toolkit is downloaded to your computer.

If you're running an Azure-hosted notebook, upload the Certifai Toolkit zip file to the hosted notebook, and make note of its path.

### Prerequisites - Notebook Dependencies

If you are using an [Azure Machine Learning Notebook VM](https://docs.microsoft.com/en-us/azure/machine-learning/studio/create-workspace), you are all set. Otherwise, make sure you go through the [configuration notebook](https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/configuration.ipynb) to create an Azure workspace. Creating local and remote environments and dependencies is covered in the `configuration` notebook.

**NOTE**: To step through the `configuration` notebook, make sure you have necessary dependencies installed locally:

- python>=3.6.2,<3.7
- ipython
- matplotlib
- jupyter

You may also use Conda to create the local environment using the `certifai_azure_model_env.yml` file provided in the [cortex-certifai-examples repo](https://github.com/CognitiveScale/cortex-certifai-examples/blob/master/notebooks/azureml_model_headers_demo/certifai_azure_model_env.yml).

Open a terminal and `cd` into the folder where the `configuration` notebook is located and run `jupyter-notebook` to launch a jupyter notebook session.

Update the `certifai_toolkit_path` variable to point to where you uploaded the Certifai Toolkit (in the Azure-hosted notebook). The Cortex Certifai Python packages are installed from this location.

**NOTE**: Installing Cortex Certifai packages is covered separately later.

In [1]:
from pathlib import Path
pwd = !pwd
my_path = str(Path(pwd[0]).parents[0])
certifai_toolkit_path = my_path  + '/certifai_toolkit_1.3.2'
certifai_toolkit_path

'/mnt/batch/tasks/shared/LS_root/mounts/clusters/sko-compute-1-3-2i/code/users/skottaram/msft-workshop/certifai_toolkit_1.3.2'
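The `!pwd` shell escape in the cell above only works inside IPython or Jupyter. Outside a notebook, the same path computation could be sketched with `pathlib` as follows. The helper name `resolve_toolkit_path` and its parameters are hypothetical, and the toolkit folder name is assumed to match the one used above.

```python
from pathlib import Path


def resolve_toolkit_path(base_dir=None, toolkit_dir_name="certifai_toolkit_1.3.2"):
    """Return the toolkit folder that sits beside base_dir (hypothetical helper)."""
    # Default to the current working directory, as the notebook cell does.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # Mirror the notebook cell: go one level up, then append the toolkit folder name.
    return str(base.parent / toolkit_dir_name)
```

Calling `resolve_toolkit_path()` with no arguments should give the same value as the notebook cell when run from the same working directory.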

## 3. Obtain credentials for Azure-hosted Model Endpoint Services

### Prerequisites - Hosted Azure German Credit Models as described in part2_certifai_workshop_scan_registered_models.ipynb

The values that the script obtains are:

- The Service URI and Key for accessing an Azure-hosted SVM model service endpoint
- The Service URI and Key for accessing an Azure-hosted Logistic Regression model service endpoint


## 4. Test the Obtained Azure ML Service URIs and Keys for Inference
The following cells load sample data and test the Service Keys and Endpoints obtained from `part2_certifai_workshop_scan_registered_models.ipynb`

In [2]:
service_logistic_uri  = 'REDACTED'
service_logistic_key  = 'REDACTED'

service_svm_uri       = 'REDACTED'
service_svm_key       = 'REDACTED'
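Hard-coding service keys in a notebook makes them easy to leak when the notebook is shared. One option, sketched below, is to read them from environment variables instead. The helper `load_service_credentials` and the `<PREFIX>_URI` / `<PREFIX>_KEY` variable naming are assumptions made for illustration; they are not part of Certifai or Azure ML.

```python
import os


def load_service_credentials(prefix, env=None):
    """Read '<prefix>_URI' and '<prefix>_KEY' from the environment (hypothetical naming)."""
    env = os.environ if env is None else env
    names = (f"{prefix}_URI", f"{prefix}_KEY")
    # Fail early with a clear message if either value is missing or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return env[names[0]], env[names[1]]
```

With the variables exported before launching the notebook, a call such as `service_svm_uri, service_svm_key = load_service_credentials("SVM_SERVICE")` could replace the literal assignments.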

In [3]:
# create json test data sample(from csv)

import json
sample_input = json.dumps({
"payload": {
    "instances": [
        [
            "... < 0 DM",
            6,
            "critical account/ other credits existing (not at this bank)",
            "radio/television",
            1169,
            "unknown/ no savings account",
            ".. >= 7 years",
            4,
            "male : single",
            "others - none",
            4,
            "real estate",
            "> 25 years",
            "none",
            "own",
            2,
            "skilled employee / official",
            1,
            "phone - yes, registered under the customers name",
            "foreign - yes"
        ]
    ]
}
})
sample_input

'{"payload": {"instances": [["... < 0 DM", 6, "critical account/ other credits existing (not at this bank)", "radio/television", 1169, "unknown/ no savings account", ".. >= 7 years", 4, "male : single", "others - none", 4, "real estate", "> 25 years", "none", "own", 2, "skilled employee / official", 1, "phone - yes, registered under the customers name", "foreign - yes"]]}}'
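The sample above was typed by hand. As a rough sketch, the same payload shape could be built from a row of the evaluation CSV. The helper `build_payload` is hypothetical, the ground-truth column is assumed to be named `outcome` (the name used for the dataset schema in this tutorial), and treating every all-digit field as an integer is a simplification.

```python
import csv
import io
import json


def build_payload(csv_text, outcome_column="outcome", row_index=0):
    """Build a JSON request body from one CSV row (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    row = rows[row_index]
    # Drop the ground-truth column and convert all-digit strings to integers;
    # every other value is passed through as a string.
    instance = [
        int(value) if value.lstrip("-").isdigit() else value
        for name, value in row.items()
        if name != outcome_column
    ]
    return json.dumps({"payload": {"instances": [instance]}})
```

For a file on disk, the text can be read first, for example `build_payload(open("data/german_credit_eval.csv").read())`.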

In [4]:
import requests
import json

headers_svm = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {service_svm_key}'          
          }
headers_logistic = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {service_logistic_key}'          
          }

response = requests.post(
    service_svm_uri, data=sample_input, headers=headers_svm)
print('SVM Model Endpoint Inference Test')
print(response.status_code)
print(response.elapsed)
print(response.json())

print('Logistic Regression Model Endpoint Inference Test')
response = requests.post(
    service_logistic_uri, data=sample_input, headers=headers_logistic)
print(response.status_code)
print(response.elapsed)
print(response.json())

SVM Model Endpoint Inference Test
200
0:00:00.500979
{'payload': {'predictions': [1]}}
Logistic Regression Model Endpoint Inference Test
200
0:00:00.544394
{'payload': {'predictions': [1]}}
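Checking the status code by eye works for a one-off test. A small validation helper, sketched below, can make the check explicit. It assumes the response shape shown in the output above (`{'payload': {'predictions': [...]}}`) and the outcome values 1 and 2 used in this tutorial; the function name `extract_predictions` is hypothetical.

```python
def extract_predictions(body, allowed=(1, 2)):
    """Return the predictions list from a parsed response body, validating its values."""
    predictions = body["payload"]["predictions"]
    # Reject any value outside the outcomes the scan definition will declare.
    unexpected = [p for p in predictions if p not in allowed]
    if unexpected:
        raise ValueError(f"Unexpected prediction values: {unexpected}")
    return predictions
```

In the notebook, this could be applied as `extract_predictions(response.json())` after each `requests.post` call.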


## 5. Installing Cortex Certifai Python Packages

You must install the following Python packages before you can initiate a Cortex Certifai scan.

`required-packages`

- cortex-certifai-scanner
- cortex-certifai-engine
- cortex-certifai-common

`optional-packages`

- cortex-certifai-client
- cortex-certifai-console

The following steps make use of the `certifai_toolkit_path` variable configured earlier to point to the Toolkit downloaded from your Certifai Pro VM

In [5]:
!find $certifai_toolkit_path/packages/all -type f   -name "*common-*"                      | xargs -I % sh -c 'pip install % ' ;
!find $certifai_toolkit_path/packages/python3.6/ -type f   -name "*engine-*"               | xargs -I % sh -c 'pip install % ' ;
!find $certifai_toolkit_path/packages/all -type f   -name "*client-*"                      | xargs -I % sh -c 'pip install % ' ;
!find $certifai_toolkit_path/packages/all -type f   -name "*scanner-*"                     | xargs -I % sh -c 'pip install % ' ;

Processing /Users/pkandarpa/Downloads/toolkit/packages/all/cortex-certifai-common-1.2.14-115-g96dc7f43.zip
Processing /Users/pkandarpa/Library/Caches/pip/wheels/e1/8b/65/3294e5b727440250bda09e8c0153b7ba19d328f661605cb151/toolz-0.10.0-cp36-none-any.whl
Collecting typing_extensions<4.0,>=3.6.6
  Using cached typing_extensions-3.7.4.2-py3-none-any.whl (22 kB)
Building wheels for collected packages: cortex-certifai-common
  Building wheel for cortex-certifai-common (setup.py) ... [?25ldone
[?25h  Created wheel for cortex-certifai-common: filename=cortex_certifai_common-1.2.14-py3-none-any.whl size=69271 sha256=b351cfbd9b8e81210ad53dde4d5924660c1e9a8903ed6bc22394ad2aedd40c69
  Stored in directory: /Users/pkandarpa/Library/Caches/pip/wheels/2c/d3/b3/ca04f8b25817cd4b106cfd27f36078bb497a86429a431700b2
Successfully built cortex-certifai-common
Installing collected packages: toolz, typing-extensions, cortex-certifai-common
Successfully installed cortex-certifai-common-1.2.14 toolz-0.10.0 typin
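The `find | xargs` pipeline above depends on a Unix shell. A portable alternative can be sketched in Python with `pathlib` and `subprocess`. The helper names are hypothetical, and the folder layout (`packages/all`, `packages/python3.6`) is taken from the commands above.

```python
import subprocess
import sys
from pathlib import Path

# (sub-folder, filename pattern) pairs, in the same order as the shell commands above.
PACKAGE_PATTERNS = [
    ("packages/all", "*common-*"),
    ("packages/python3.6", "*engine-*"),
    ("packages/all", "*client-*"),
    ("packages/all", "*scanner-*"),
]


def find_toolkit_packages(toolkit_path):
    """Return the package archives to install, in installation order."""
    root = Path(toolkit_path)
    found = []
    for subdir, pattern in PACKAGE_PATTERNS:
        # rglob searches recursively, like `find`; keep regular files only.
        found.extend(sorted(p for p in (root / subdir).rglob(pattern) if p.is_file()))
    return found


def install_toolkit_packages(toolkit_path):
    """Install each archive with the pip that belongs to the running interpreter."""
    for package in find_toolkit_packages(toolkit_path):
        subprocess.run([sys.executable, "-m", "pip", "install", str(package)], check=True)
```

Using `sys.executable -m pip` keeps the installation tied to the interpreter (and kernel) that runs the notebook, which avoids installing into a different environment by accident.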

## 6. Use Cortex Certifai Client CLI to Configure Remote Certifai Pro VM
Use the `certifai` CLI tool to configure remote access to the Certifai Pro VM.

#### CLI commands

```
# outside a Jupyter notebook cell, replace {remote_alias} with the alias itself (no curly braces)
certifai remote config --file certifai-kubeconfig.json --alias {remote_alias} 
```

In [7]:
remote_alias = 'cpro-az'
!certifai remote config --file certifai-kubeconfig.json --alias {remote_alias}


Checking for access to Kubernetes cluster with context - certifai-pro
Connection to cluster succeeded, found API - v1
Scanner image found - cortex-certifai-scanner:local
Updating alias - cpro-az

Configuration updated from - certifai-kubeconfig.json


## 7. Construct Certifai Scan Definitions for Binary Classification models

Use the `certifai-scanner` python package to build a scan definition for the SVM and Logistic Regression models (via the Service Endpoints created earlier).

Mandatory scan definition parameters include:

1. Prediction Task Outcomes and Values
2. Model Details (names, endpoints and more)
3. Datasets to evaluate the models on

Configure optional scan definition parameters that depend on the desired evaluation reports. Evaluation types include:

1. Fairness
2. Robustness
3. Explainability

In [8]:
# make sure certifai package was installed correctly
!certifai --version

Certifai version: 1.2.14
Scanner build: 1.2.14-115-g96dc7f43


In [9]:
# necessary imports for creating a scan

from certifai.scanner.builder import (CertifaiScanBuilder, CertifaiModel, CertifaiModelMetric,
                                      CertifaiDataset, CertifaiGroupingFeature, CertifaiDatasetSource,
                                      CertifaiPredictionTask, CertifaiTaskOutcomes, CertifaiOutcomeValue)
from certifai.scanner.report_utils import scores, construct_scores_dataframe


### Define Cortex Certifai Task Type

- `CertifaiTaskOutcomes`: Cortex Certifai supports classification as well as regression models. The task type in this tutorial is binary classification (for example, predicting whether a loan should be granted).
- `CertifaiOutcomeValue`: Defines the possible outcomes of the model predictions. These models predict either 1 (loan granted) or 2 (loan denied).

**NOTE**: Refer to the [Certifai API docs](https://cognitivescale.github.io/cortex-certifai/certifai-api-ref/certifai.scanner.builder.html) for more details.

In [11]:
# Create the scan object from scratch using the ScanBuilder class with tasks and outcomes

# First define the possible prediction outcomes
task = CertifaiPredictionTask(CertifaiTaskOutcomes.classification(
    [
        CertifaiOutcomeValue(1, name='Loan granted', favorable=True),
        CertifaiOutcomeValue(2, name='Loan denied')
    ]),
    prediction_description='Determine whether a loan should be granted')

#  create a certifai scan object and add the certifai task created above
scan = CertifaiScanBuilder.create('model_auth_demo',
                                  prediction_task=task)

scan

<certifai.scanner.builder.CertifaiScanBuilder at 0x1a1e041a58>

### Add Logistic and SVM Models to the Scan definition

Additional parameters that may be passed to the `CertifaiModel` class are described in the [API Reference for CertifaiModel](https://cognitivescale.github.io/cortex-certifai/certifai-api-ref-1.2.14/certifai.scanner.builder.html#certifai.scanner.builder.CertifaiModel), or can be viewed in a notebook with `?CertifaiModel`.

In [12]:
# Create a Certifai Model Object using the web service (from earlier) by passing the deployed web service url
first_model = CertifaiModel('SVM',
                            predict_endpoint=service_svm_uri)
scan.add_model(first_model)

second_model = CertifaiModel('logistic',
                            predict_endpoint=service_logistic_uri)
scan.add_model(second_model)

# Add corresponding model headers for service authentication and content-type

# add the default headers applicable to all models
scan.add_model_header(header_name='Content-Type',header_value='application/json')

# add defined headers corresponding to auth keys for respective model services
scan.add_model_header(header_name='Authorization', header_value=f'Bearer {service_svm_key}', model_id='SVM')
scan.add_model_header(header_name='Authorization', header_value=f'Bearer {service_logistic_key}', model_id='logistic')
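The intended effect of the calls above is that each model receives the default headers plus its own `Authorization` header. The plain-Python sketch below illustrates that effect only; how Certifai merges headers internally is an assumption here, and `resolve_headers` is a hypothetical name.

```python
def resolve_headers(model_id, default_headers, defined_headers):
    """Combine default headers with the headers defined for one model (illustration only)."""
    headers = dict(default_headers)
    # Model-specific headers are added on top of the defaults and win on conflict.
    headers.update(defined_headers.get(model_id, {}))
    return headers


default_headers = {"Content-Type": "application/json"}
defined_headers = {
    "SVM": {"Authorization": "Bearer <svm-key>"},
    "logistic": {"Authorization": "Bearer <logistic-key>"},
}
svm_headers = resolve_headers("SVM", default_headers, defined_headers)
```

Here `svm_headers` contains both the `Content-Type` header and the SVM service's `Authorization` header, matching the `headers_svm` dictionary used in the inference test.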



## 8. Uploading Required Datasets into the Azure Storage Account Container

Use the `azure-storage-blob` Python package to upload the evaluation dataset for the Certifai scan to the Azure storage account blob container that you configured for Certifai Pro in Step 1.

### Upload the Dataset to Azure Blob Storage

Get the connection string for the storage account that holds the container named by the `az_container_name` variable in the cell below. You can obtain the connection string by following the [Azure guide here](https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string).

Ensure that the storage account and container used here match the values you used at the beginning of this notebook to set up your Certifai Pro instance.

In [13]:
!pip install azure-storage-blob

Collecting azure-storage-blob
  Using cached azure_storage_blob-12.3.1-py2.py3-none-any.whl (279 kB)
Installing collected packages: azure-storage-blob
Successfully installed azure-storage-blob-12.3.1


You can get storage blob credentials by visiting Access keys -> Connection String from your Storage Account's page on the Azure Portal.

In [25]:
# upload our eval dataset to the blob storage container
az_container_name = 'pkandarpa' # this container should already exist. You can create one from the Azure Portal

import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

# set credentials for azure storage account
az_credentials = 'REDACTED'


client = BlobServiceClient.from_connection_string(az_credentials)
german_credit_eval_data_file = "data/german_credit_eval.csv"
az_german_credit_blob_name = 'az-pro-example/german_credit_eval.csv'

# upload our evaluation dataset to an Azure Blob Storage Account Container.
blob_client = client.get_blob_client(container=az_container_name, blob=az_german_credit_blob_name)
with open(german_credit_eval_data_file, 'rb') as f:
    blob_client.upload_blob(f)
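By default, `upload_blob` raises an error if the blob already exists, so re-running the cell above fails on the second attempt. One way to make the upload repeatable is the `overwrite=True` keyword of `BlobClient.upload_blob` in azure-storage-blob 12.x. The sketch below wraps that call; `upload_file` is a hypothetical helper that accepts any object with a compatible `upload_blob` method.

```python
def upload_file(blob_client, local_path, overwrite=True):
    """Upload a local file through a blob client, replacing any existing blob by default."""
    with open(local_path, "rb") as data:
        # overwrite=True replaces an existing blob instead of raising an error.
        return blob_client.upload_blob(data, overwrite=overwrite)
```

In the notebook, this would be called as `upload_file(blob_client, german_credit_eval_data_file)`.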

### Add the Evaluation Dataset to Scan Definition

Add the evaluation dataset that Cortex Certifai uses to evaluate the models to the scan definition.

In [15]:
# create an evaluation object and pass the evaluation dataset(csv) here 
eval_dataset = CertifaiDataset('evaluation',
                               CertifaiDatasetSource.csv(url=f'abfs://{az_container_name}/{az_german_credit_blob_name}'))
scan.add_dataset(eval_dataset)

### Configure Model Fairness Evaluation

- Add `fairness` as an evaluation type to the scan object
- Set `evaluation_dataset_id` to refer to the evaluation dataset added above

In [16]:
# Setup an evaluation for fairness on the above dataset using the model
# We'll look at disparity between groups defined by marital status and age
scan.add_fairness_grouping_feature(CertifaiGroupingFeature('age'))
scan.add_fairness_grouping_feature(CertifaiGroupingFeature('status'))
scan.add_evaluation_type('fairness')
scan.evaluation_dataset_id = 'evaluation'

In [17]:
# Because the dataset contains a ground truth outcome column which the model does not
# expect to receive as input we need to state that in the dataset schema (since it cannot
# be inferred from the CSV)
scan.dataset_schema.outcome_feature_name = 'outcome'

### Add Authorization Parameters to the Models
Use the following code block to update the scan definition that you constructed in the steps above with the authorization headers needed to invoke your Azure-hosted model endpoints.

In [18]:
local_scan_definition_file = 'data/german_credit_scan_definition.yaml'
model_headers_template = f"""
model_headers:
  default:
  - name: Content-Type
    value: application/json
  - name: accept
    value: application/json
  defined:
  - model_id: SVM
    name: Authorization
    value: Bearer {service_svm_key}
  - model_id: logistic
    name: Authorization
    value: Bearer {service_logistic_key}
"""

with open(local_scan_definition_file, 'w') as f:
    scan.save(f)
# we also need to add the model headers section separately
with open(local_scan_definition_file, 'a') as f:
    f.write(model_headers_template)
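The template above hard-codes two models. For a different number of models, the same section could be generated from dictionaries, as in the sketch below. The function `build_model_headers_yaml` is hypothetical; it reproduces the layout of the template above and assumes the header values need no YAML quoting.

```python
def build_model_headers_yaml(default_headers, auth_keys):
    """Render a model_headers section in the same layout as the template above."""
    lines = ["model_headers:", "  default:"]
    for name, value in default_headers.items():
        lines += [f"  - name: {name}", f"    value: {value}"]
    lines.append("  defined:")
    # One Authorization entry per model, keyed by the model id used in the scan.
    for model_id, key in auth_keys.items():
        lines += [
            f"  - model_id: {model_id}",
            "    name: Authorization",
            f"    value: Bearer {key}",
        ]
    return "\n".join(lines) + "\n"
```

Because the generated file contains the service keys in plain text, it is worth keeping it out of version control.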

## 9. Run a Remote Scan on Certifai Pro

In [19]:
reports_folder = f'abfs://{az_container_name}/az-pro-example/reports'
# run a remote scan
!certifai remote scan --alias cpro-az --definition-file data/german_credit_scan_definition.yaml --output {reports_folder}


Created job - certifai-scanner-09c105e4


### Get Logs of the Remote Scan

In [22]:
!certifai remote logs -a cpro-az -n $(certifai remote list -a cpro-az | head -2 | tail -1 | cut -d' ' -f1)


Printing logs for: certifai-scanner-09c105e4-ss62r

2020-06-11 14:47:27,752 - root - INFO - Validating license...
2020-06-11 14:47:27,752 - root - INFO - License is valid - expires: n/a
2020-06-11 14:47:27,765 - root - INFO - Generated unique scan id: bd155e196454
2020-06-11 14:47:27,766 - root - INFO - Validating input data...
2020-06-11 14:47:27,766 - root - INFO - Creating dataset with id: evaluation
2020-06-11 14:47:27,795 - azure.storage.common.storageclient - INFO - Client-Request-ID=785c01ea-abf2-11ea-a199-46c5297ec334 Outgoing request: Method=GET, Path=/pkandarpa, Query={'restype': 'container', 'comp': 'list', 'prefix': 'az-pro-example', 'delimiter': '/', 'marker': None, 'maxresults': None, 'include': None, 'timeout': None}, Headers={'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.8; Linux 5.3.0-1020-azure)', 'x-ms-client-request-id': '785c01ea-abf2-11ea-a199-46c5297ec334', 'x-ms-date': 'Thu, 11 Jun 2020 14:47:27 GMT', 'Authorization'

### List the Scan Jobs on the Certifai Pro Remote Instance
Use the following CLI command to list scan jobs for the configured `remote_alias`:
```
certifai remote list -a <remote_alias>
```

In [23]:
# Check the status of the triggered remote scan job
!certifai remote list -a cpro-az

NAME                        COMPLETIONS   DURATION      AGE           
certifai-scanner-09c105e4   0/1           1m            1m            
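The table above can also be checked programmatically, for example to wait until a job reports `1/1` completions. The sketch below parses text in the format shown above. It assumes that columns are separated by whitespace and that no value contains spaces; `parse_remote_list` and `is_complete` are hypothetical helper names.

```python
def parse_remote_list(output):
    """Parse the table printed by `certifai remote list` into a list of dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each whitespace-separated value with its column name.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def is_complete(job):
    """Return True when the COMPLETIONS field reads n/n, for example 1/1."""
    done, total = job["COMPLETIONS"].split("/")
    return done == total
```

In a notebook, the command output can be captured with `output = !certifai remote list -a cpro-az` and joined with `"\n".join(output)` before parsing.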


## View Reports from the Remote Scan

Once the remote scan's `COMPLETIONS` field says `1/1`, you can configure the Certifai Console to view the reports.

Go to the Certifai Console URL for the Certifai Pro VM instance you created earlier in this tutorial. 

1. Click the `User Icon` on the top right and select `Storage Settings` from the dropdown.
2. Update the `Scan Directory` field to the `reports_folder` variable configured in the previous cell. Omit `abfs://` while pasting this variable's value in the `Scan Directory` field. 
3. Save your settings and wait while the page reloads and loads reports from the remote scan. The scan will be available under the name `model_auth_demo`.
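To avoid copy-and-paste mistakes in step 2, the value for the `Scan Directory` field can be derived from `reports_folder` by stripping the `abfs://` scheme. The following is a minimal sketch; the helper name `scan_directory_from_url` is hypothetical.

```python
def scan_directory_from_url(url, scheme="abfs://"):
    """Strip the storage scheme so the value can be pasted into the Scan Directory field."""
    return url[len(scheme):] if url.startswith(scheme) else url


# Example with the container and folder names used in this tutorial.
scan_directory = scan_directory_from_url("abfs://pkandarpa/az-pro-example/reports")
```

In this example, `scan_directory` is `pkandarpa/az-pro-example/reports`.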


## Resource Cleanup


Delete the Certifai Pro VM instance when you are done using it.

Azure ML prerequisites cleanup:
 - Delete the `logistic_model_azure` and `svm_model_azure` models that were created and registered in your Azure workspace. The exact names are available in the output of the `azure_ml_service_keys.py` script that you ran earlier.
 - Delete the `german-credit-logistic-service` and `german-credit-svm-service` ACI (Azure Container Instance) webservices. The exact names are available in the output of the same script.

- Once the Cortex Certifai evaluation is complete, make sure to clear all Azure resources to avoid costs associated with running VMs and their associated resources on Azure.
- Follow the [Azure ML resource cleanup docs](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-train#clean-up-resources).