# Running Certifai Pro scans on Azure ML models

## Introduction

Cortex Certifai Pro is a single-user version of Cortex Certifai that runs on an Azure VM and can be installed from the Azure Marketplace. Certifai enables data scientists to define, scan, and analyze ML models to evaluate their fairness, robustness, and explainability.

Certifai Pro VMs run scans on ML models with the aid of the Certifai Toolkit, a downloadable set of Python packages and CLI tools.

You can find more details about Certifai on the official [documentation site](https://cognitivescale.github.io/cortex-certifai/).

This tutorial helps users of Azure Machine Learning resources (hosted notebooks, models, and endpoints) set up their models and prepare them for scanning with Certifai Pro.

In addition, it walks you through using the Certifai Python API to create scan definitions that can be passed to a Certifai Pro instance along with datasets and secrets (if needed).

## Overview

This notebook tutorial takes you through the following steps:

1. Installing Certifai Pro in an Azure cloud environment and configuring your Certifai Pro instance with storage parameters for a pre-existing container in an Azure Storage account

2. Uploading the datasets required for the scan to the Azure Storage account container configured for Certifai Pro (in Step 1)

3. Updating the scan definition YAML file to point to the uploaded dataset paths (from Step 2)

4. Configuring the Certifai CLI with connection details for the Certifai Pro VM

5. Submitting a remote scan job to the Certifai Pro instance through the Certifai CLI using the scan definition (constructed in part 2 and part 3 of the notebook)

## 1. Install Certifai Pro VM from the Azure Marketplace

You can find and install a personal instance of Cortex Certifai Pro in the [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/cognitive-scale.cortex-certifai-pro?tab=Overview). Please follow the instructions from the official Certifai docs for [Azure setup](https://cognitivescale.github.io/cortex-certifai/docs/platforms/azure/azure-setup).

The installation process includes:

- Completing the Certifai Pro Azure Marketplace subscription and VM instantiation.
- Configuring your Certifai Pro instance with blob storage containers and credentials for an Azure Storage account.
- (Optionally) Installing sample reports for a variety of use cases in finance, healthcare, and insurance to help you understand the AI Trust Index scores generated by Certifai.
- Configuring custom SSL certificates (if needed).

### Configure Storage

Follow the [Certifai Console Storage Setup](https://cognitivescale.github.io/cortex-certifai/docs/platforms/azure/azure-setup#certifai-console-storage-setup) instructions provided in the official Certifai docs.

When you configure the storage parameters for your Azure Certifai Pro instance, make note of the **Scan Directory field (Blob Container Name)**, which is where scan reports generated in this tutorial are stored.

On the Certifai Console Storage Settings page:

- You may uncheck or check the `Install Sample Reports` option
- You MUST check the `Download Kubeconfig` option

## 2. Upload the evaluation dataset with `azure-storage-blob`

Use the `azure-storage-blob` Python package to upload the evaluation dataset for the Certifai scan to the Azure storage account blob container configured in Step 1 of the Certifai Pro VM setup process.

In [None]:
# install azure blob store package to upload evaluation dataset
!pip install azure-storage-blob==12.3.1

### Upload the Dataset to Azure Blob Storage

Get the connection string for the storage account that holds the container named by `az_container_name` in the cell below. You can obtain the connection string by following the [Azure guide here](https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string).

Ensure that the storage account and container used here match the values you used at the beginning of this notebook to set up your Certifai Pro instance.
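Because the connection string is a secret, it may be preferable to read it from the environment instead of pasting it into the notebook. The following is a minimal sketch under that assumption. The helper `load_connection_string` is hypothetical, and the variable name `AZURE_STORAGE_CONNECTION_STRING` is only a common convention, not something Certifai requires.

```python
import os

# Assumed variable name; use whatever name your environment actually defines.
CONNECTION_STRING_ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"


def load_connection_string(env_var: str = CONNECTION_STRING_ENV_VAR) -> str:
    """Return the storage connection string from the environment, or raise if unset."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set {env_var} to your storage account connection string")
    return value
```

With this in place, the upload cell could assign `az_credentials = load_connection_string()` instead of a literal string.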

In [14]:
import os
from azure.storage.blob import BlobServiceClient

az_container_name = 'target-encoding-multiclass'

# set credentials (the connection string) for the Azure storage account
az_credentials = 'REDACTED'
client = BlobServiceClient.from_connection_string(az_credentials)

base_path = '../..'
all_data_file = f"{base_path}/datasets/german_credit_eval_multiclass_encoded.csv"
expln_mini_data_file = f"{base_path}/datasets/german_credit_eval_multiclass_encoded_expln.csv"

# upload the evaluation and explanation datasets to the Azure Blob Storage container
for file in [all_data_file, expln_mini_data_file]:
    # store each file under the az-pro-example/ prefix, keeping its file name
    blob = f"az-pro-example/{os.path.basename(file)}"
    print(f'uploading file {blob}')
    with open(file, 'rb') as f:
        client.get_blob_client(container=az_container_name, blob=blob).upload_blob(f)

uploading file az-pro-example/german_credit_eval_multiclass_encoded.csv
uploading file az-pro-example/german_credit_eval_multiclass_encoded_expln.csv
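Re-running the upload cell fails if the blobs already exist, because `upload_blob` does not overwrite by default. A minimal sketch of a re-runnable variant follows. The helper name `upload_files` is hypothetical, and `client` is assumed to behave like the `BlobServiceClient` created above.

```python
import os


def upload_files(client, container, paths, prefix="az-pro-example", overwrite=True):
    """Upload each local file to `<prefix>/<file name>` in `container`.

    Returns the list of blob names that were written.
    """
    uploaded = []
    for path in paths:
        blob_name = f"{prefix}/{os.path.basename(path)}"
        blob_client = client.get_blob_client(container=container, blob=blob_name)
        with open(path, "rb") as data:
            # overwrite=True replaces an existing blob instead of raising an error
            blob_client.upload_blob(data, overwrite=overwrite)
        uploaded.append(blob_name)
    return uploaded
```

For example, `upload_files(client, az_container_name, [all_data_file, expln_mini_data_file])` would write the same two blobs as the cell above.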


## 3. Update Scan Definition

- Update the `url` of the `evaluation` and `explanation` datasets to the uploaded dataset paths (from Step 2).

In [17]:
# load the scan definition
import yaml
local_scan_definition_file = 'target_encoded_gcredit_multiclass_scan_def.yaml'

with open(local_scan_definition_file) as file:
    # safe_load is sufficient for a plain-data scan definition
    scan_def = yaml.safe_load(file)

In [37]:
# point each dataset entry at the blob uploaded in Step 2
for ds in scan_def['datasets']:
    if ds['dataset_id'] == 'evaluation':
        ds['url'] = f'abfs://{az_container_name}/az-pro-example/{all_data_file.split("/")[-1]}'
    elif ds['dataset_id'] == 'explanation':
        ds['url'] = f'abfs://{az_container_name}/az-pro-example/{expln_mini_data_file.split("/")[-1]}'
    else:
        print(f"unsupported dataset_id {ds['dataset_id']!r}; url left unchanged")
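The URL pattern used above is `abfs://<container>/<blob prefix>/<file name>`. The same update can be sketched as a small function driven by a mapping from `dataset_id` to local file. The helper name `set_dataset_urls` and its parameters are hypothetical, and the sketch assumes the scan definition has the `datasets` list structure shown in this notebook.

```python
import posixpath


def set_dataset_urls(scan_def, container, local_files, prefix="az-pro-example"):
    """Point each known dataset at its uploaded blob and return unknown dataset ids.

    `local_files` maps a dataset_id (for example 'evaluation') to the local
    path that was uploaded; only the file name is used in the URL.
    """
    unknown = []
    for ds in scan_def.get("datasets", []):
        local_path = local_files.get(ds["dataset_id"])
        if local_path is None:
            # leave datasets without a mapping untouched and report them
            unknown.append(ds["dataset_id"])
            continue
        file_name = posixpath.basename(local_path)
        ds["url"] = f"abfs://{container}/{prefix}/{file_name}"
    return unknown
```

Returning the unmatched ids lets the caller decide whether an unexpected dataset should stop the run or only be logged.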

In [38]:
# save yaml to disk
with open(local_scan_definition_file, 'w') as file:
    yaml.dump(scan_def, file)    

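## 4. Use Cortex Certifai Client CLI to Configure Remote Certifai Pro VM

Use the `certifai` CLI tool to configure remote access to the Certifai Pro VM. The command below reads the connection details from `certifai-kubeconfig.json` and stores them under an alias that later commands refer to.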

#### CLI commands

```
# outside a Jupyter notebook cell, replace {remote_alias} (including the curly braces) with the alias itself
certifai remote config --file certifai-kubeconfig.json --alias {remote_alias}
```
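Outside a notebook, the same command can be issued from a plain Python script. The sketch below only uses the `--file` and `--alias` flags shown above. The function names are hypothetical, and it assumes the `certifai` executable is on your `PATH`.

```python
import subprocess


def remote_config_command(kubeconfig_file, alias):
    """Build the argument list for `certifai remote config`."""
    return ["certifai", "remote", "config", "--file", kubeconfig_file, "--alias", alias]


def configure_remote(kubeconfig_file, alias):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(remote_config_command(kubeconfig_file, alias), check=True)
```

Passing the arguments as a list avoids shell quoting issues if the kubeconfig path contains spaces.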

In [40]:
remote_alias = 'cpro-az'
!certifai remote config --file certifai-kubeconfig.json --alias {remote_alias}


Checking for access to Kubernetes cluster with context - certifai-pro
Connection to cluster succeeded, found API - v1
Updating alias - cpro-az
Configuration updated from - certifai-kubeconfig.json


## 5. Run a Remote Scan on Certifai Pro

In [41]:
reports_folder = f'abfs://{az_container_name}/az-pro-example/reports'
# run a remote scan; reports are written under reports_folder
!certifai remote scan --alias {remote_alias} --definition-file {local_scan_definition_file} --output {reports_folder}

Created certifai scan ID 787b2ff6b9a4


In [46]:
# show logs for the scan whose ID is the first field on the second line of `certifai remote list`
!certifai remote logs -a cpro-az  $(certifai remote list -a cpro-az | head -2 | tail -1 | cut -d' ' -f1)

In [45]:
!certifai remote describe 787b2ff6b9a4 --alias cpro-az

Scan ID        Total Reports   Completed Reports   Failed Reports   Active Reports   Progress   State      Scan Duration   
787b2ff6b9a4   6               0                   0                6                0          Scanning   --Scanning--    
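To check progress from a script, the table printed by `certifai remote describe` can be parsed into a dictionary. The sketch below is not part of the Certifai API. It assumes the output keeps the layout shown above (one header row, one data row, columns separated by two or more spaces), and the completion rule in `is_finished` is likewise an assumption.

```python
import re


def parse_describe_output(text):
    """Parse the header row and first data row of the table into a dict."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and at least one data row")
    # columns are separated by runs of two or more spaces; single spaces stay inside a cell
    headers = re.split(r"\s{2,}", lines[0].strip())
    values = re.split(r"\s{2,}", lines[1].strip())
    if len(headers) != len(values):
        raise ValueError("header and data rows have different column counts")
    return dict(zip(headers, values))


def is_finished(row):
    """Assumed rule: the scan is done once every report has completed or failed."""
    done = int(row["Completed Reports"]) + int(row["Failed Reports"])
    return done >= int(row["Total Reports"])
```

The captured output of the describe cell could be passed to `parse_describe_output` and the result checked with `is_finished` before looking for reports in `reports_folder`.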
