# Quickstart

Get started here. We will assess a payment default prediction model for gender fairness using Lens, in about five minutes. More in-depth information can be found in the [lens FAQ](https://credoai-lens.readthedocs.io/en/latest/notebooks/lens_faq.html#How-can-I-choose-which-assessments-to-run?).

## Setup

Setup instructions can be found on [readthedocs](https://credoai-lens.readthedocs.io/en/stable/setup.html).

**Find the code**

This notebook can be found on [GitHub](https://github.com/credo-ai/credoai_lens/blob/develop/docs/notebooks/quickstart.ipynb).

**Data + Model Preparation (before Lens)**

Some quick setup. This script stands in for all of the data science work you do before assessment and integration with Credo AI.

Here we have a gradient boosted classifier trained on the UCI Credit Card Default Dataset.

In [1]:
# model and df are defined by this script
%run training_script.py

### Imports

In [2]:
# Base Lens imports
import credoai.lens as cl
# set default format for image displays. Change to 'png' if 'svg' is failing
%config InlineBackend.figure_formats = ['svg']

print(cl.__version__)

0.2.0


## Lens in 5 minutes

Below is a basic example where our goal is to evaluate the above model. We will rely on Lens defaults for this analysis, which will automatically determine the assessments that can be run.

We'll break these down [below](#Breaking-Down-The-Steps).

In [3]:
# set up model and data artifacts
credo_model = cl.CredoModel(name='credit_default_classifier',
                            model=model)

credo_data = cl.CredoData(name='UCI-credit-default',
                          X=X_test,
                          y=y_test,
                          sensitive_features=sensitive_features_test
                          )

# specify the metrics that will be used by the Fairness assessment
metrics = ['precision_score', 'recall_score', 'equal_opportunity']
assessment_plan = {'Fairness': {'metrics': metrics}}
# run lens
lens = cl.Lens(model=credo_model,
               data=credo_data,
               assessment_plan=assessment_plan)
               
# note that we use method chaining to make code more readable

# first we run the assessments and get the results into a dictionary
results = lens.run_assessments().get_results()
# we then display them
lens.display_results()

INFO:absl:Initializing Assessment (DatasetEquity)
INFO:absl:Initializing Assessment (DatasetFairness)
INFO:absl:Initializing Assessment (DatasetProfiling)
INFO:absl:Initializing Assessment (Fairness) with kwargs: {'metrics': ['precision_score', 'recall_score', 'equal_opportunity']}
INFO:absl:Initializing Assessment (ModelEquity)
INFO:absl:Initializing Assessment (Performance)
INFO:absl:Running assessment: DatasetEquity
INFO:absl:Running assessment: DatasetFairness
INFO:absl:Running assessment: DatasetProfiling


Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]

INFO:absl:Running assessment: Fairness
INFO:absl:Running assessment: ModelEquity
INFO:absl:Running assessment: Performance
INFO:absl:Reporters initialized for assessment-DatasetEquity
INFO:absl:Reporters initialized for assessment-DatasetFairness
INFO:absl:Reporters initialized for assessment-DatasetProfiling
INFO:absl:Reporters initialized for assessment-Fairness
INFO:absl:Reporters initialized for assessment-ModelEquity
INFO:absl:Reporters initialized for assessment-Performance


| | 0 |
| --- | --- |
| summary | count mean std min 25% ... |
| sensitive_feature | SEX |
| highest_group | female |
| lowest_group | male |
| demographic_parity_difference | 0.027886 |
| demographic_parity_ratio | 0.880573 |






Unnamed: 0,0
equity_test,"{'test_type': 'chisquared_contingency', 'stati..."
significant_posthoc_tests,"[{'test_type': 'chisquared_contingency', 'comp..."






| | SEX | count | percentage |
| --- | --- | --- | --- |
| 0 | female | 3045 | 40.6 |
| 1 | male | 4455 | 59.4 |






| | SEX | target | count |
| --- | --- | --- | --- |
| 0 | female | 0 | 2334 |
| 1 | female | 1 | 711 |
| 2 | male | 0 | 3539 |
| 3 | male | 1 | 916 |






Unnamed: 0,target,value
0,1,0.027886






Unnamed: 0,target,value
0,1,0.880573






Unnamed: 0,0
0,0.183368






Unnamed: 0,0
BILL_AMT1,0.055906
BILL_AMT2,0.054109
BILL_AMT3,0.052104
BILL_AMT6,0.051735
BILL_AMT4,0.051436
...,...
PAY_2_5.0,0.000011
PAY_1_6.0,0.000008
PAY_4_1.0,0.000004
AGE_79.0,0.000003






Unnamed: 0,AGE,BILL_AMT1,BILL_AMT2,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,EDUCATION,LIMIT_BAL,MARRIAGE,...,PAY_3,PAY_4,PAY_5,PAY_6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6
value,0.011463,0.006657,0.007323,0.025036,0.002138,0.016289,0.017218,0.00238,0.02392,0.001081,...,0.003884,0.002,0.001991,0.000386,0.003599,0.006385,0.002134,0.014347,0.011267,0.007438
feature_type,categorical,continuous,continuous,continuous,continuous,continuous,continuous,categorical,categorical,categorical,...,categorical,categorical,categorical,categorical,continuous,continuous,continuous,continuous,continuous,continuous






Unnamed: 0,0
0,0.025036






Unnamed: 0,female-male
AGE,0.156972
BILL_AMT1,0.039869
BILL_AMT2,0.03653
BILL_AMT3,0.022809
BILL_AMT4,0.022489
BILL_AMT5,0.009267
BILL_AMT6,0.008714
LIMIT_BAL,-0.062292
PAY_AMT1,-0.037023
PAY_AMT2,0.012984






<Figure size 750x750 with 3 Axes>

<Figure size 750x525 with 1 Axes>

<Figure size 750x525 with 1 Axes>


| SEX | precision_score | recall_score | subtype |
| --- | --- | --- | --- |
| female | 0.608374 | 0.347398 | disaggregated_performance |
| male | 0.639194 | 0.381004 | disaggregated_performance |






| metric_type | value | sensitive_feature | subtype |
| --- | --- | --- | --- |
| equal_opportunity | 0.033606 | SEX | fairness |
| precision_score | 0.03082 | SEX | parity |
| recall_score | 0.033606 | SEX | parity |






<Figure size 450x450 with 1 Axes>

<Figure size 450x450 with 1 Axes>

<Figure size 1350x270 with 4 Axes>

<Figure size 1350x270 with 4 Axes>

<Figure size 1350x270 with 4 Axes>

| | 0 |
| --- | --- |
| summary | count mean std min 25% ... |
| sensitive_feature | SEX |
| highest_group | female |
| lowest_group | male |
| demographic_parity_difference | 0.010121 |
| demographic_parity_ratio | 0.924773 |






| | equity_test |
| --- | --- |
| pvalue | 0.568088 |
| statistic | 0.325891 |
| test_type | chisquared_contingency |






| | value | subtype |
| --- | --- | --- |
| accuracy_score | 0.815067 | overall_performance |






<Figure size 1350x270 with 4 Axes>

<credoai.lens.Lens at 0x10dba1a60>

In [4]:
# import the assessment classes used in the next cell
from credoai.assessment import FairnessAssessment, SecurityAssessment

In [21]:
# set up model and data artifacts
credo_model = cl.CredoModel(name='credit_default_classifier',
                            model=model)

credo_data = cl.CredoData(name='UCI-credit-default',
                          X=X,
                          y=y,
                          sensitive_features=sensitive_features
                          )

credo_training_data = cl.CredoData(name='training_UCI-credit-default',
                          X=X,
                          y=y,
                          sensitive_features=sensitive_features
                          )

# specify the metrics that will be used by the Fairness assessment
metrics = ['precision_score', 'recall_score', 'equal_opportunity']
assessment_plan = {'Fairness': {'metrics': metrics}}
# run lens
lens = cl.Lens(model=credo_model,
               data=credo_data,
               training_data=credo_training_data,
               assessments=[FairnessAssessment, SecurityAssessment],
               assessment_plan=assessment_plan)
               
# note that we use method chaining to make code more readable

# first we run the assessments and get the results into a dictionary
results = lens.run_assessments().get_results()
# we then display them
lens.display_results()

INFO:absl:Initializing Assessment (Fairness) with kwargs: {'metrics': ['precision_score', 'recall_score', 'equal_opportunity']}
INFO:absl:Initializing Assessment (Security)
INFO:absl:Running assessment: Fairness
INFO:absl:Running assessment: Security


Train on 15000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


HopSkipJump:   0%|          | 0/10 [00:00<?, ?it/s]

INFO:absl:Reporters initialized for assessment-Fairness
INFO:absl:No reporters found for assessment-Security


| SEX | precision_score | recall_score | subtype |
| --- | --- | --- | --- |
| female | 0.936526 | 0.837104 | disaggregated_performance |
| male | 0.94069 | 0.847196 | disaggregated_performance |






| metric_type | value | sensitive_feature | subtype |
| --- | --- | --- | --- |
| equal_opportunity | 0.010092 | SEX | fairness |
| precision_score | 0.004164 | SEX | parity |
| recall_score | 0.010092 | SEX | parity |






<Figure size 450x450 with 1 Axes>

<Figure size 450x450 with 1 Axes>

<Figure size 1350x270 with 4 Axes>

<Figure size 1350x270 with 4 Axes>

<Figure size 1350x270 with 4 Axes>

<credoai.lens.Lens at 0x2a6e54370>

## In-Depth Overview

CredoAI Lens is the assessment framework component of the broader CredoAI suite.
It is usable as a standalone gateway to assessments or in combination
with CredoAI's Governance App. 

Understanding how your AI systems operate is the most important step before intervening on them. From technically complicated questions about improving a system to business-relevant questions about whether to deploy it, everything is founded on good observability. Lens strives to make assessment comprehensive, easy, and adaptable. The primary outputs from Lens are **assessment results** in the form of various metrics. Lens can also visualize some of these results.

### Assessments

CredoAI Lens is an entry point to assessments developed by CredoAI, as well as the broader ecosystem of open-source assessments. Custom analytics can also be folded in easily (see the `lens customization` notebook).

AI system assessment ranges from verifying standard performance metrics to an evolving set of assessments falling under the banner of *Responsible AI*. A non-exhaustive list includes:

* Fairness
* Explainability
* Performance
* Robustness

These different categories of assessment differ substantially based on whether one is 
evaluating datasets or models, what kind of model (e.g., tabular, NLP, computer vision), and the use-case. As the ecosystem develops, Lens will support assessing a broader range of AI systems. Currently, we are focused on Fairness.

### Governance

While Lens is a stand-alone assessment framework, its value is increased when combined with the CredoAI Governance App. The app supports multi-stakeholder `Alignment` on how to assess your AI systems (e.g., what does good look like for this system?). It also supports translating assessment results into a Risk perspective that is scalable across your organization and understandable to diverse stakeholders.

Check out the [Connecting with Governance App](https://credoai-lens.readthedocs.io/en/latest/notebooks/governance_integration.html) tutorial for more information.



## Breaking Down The Steps

### Preparing artifacts

Lens interacts with Credo Artifacts, which wrap models and datasets and standardize them for use by different assessments.
Below we create a `CredoModel` object, which automatically infers that the "model" object is from scikit-learn. We also create a `CredoData` object, which stores X, y, and sensitive features. Both of these objects are customizable. See `lens_customization.ipynb` for more information.


In [22]:
credo_model = cl.CredoModel(name='credit_default_classifier',
                            model=model)

credo_data = cl.CredoData(name='UCI-credit-default',
                          X=X_test,
                          y=y_test,
                          sensitive_features=sensitive_features_test
                          )

#### CredoModel

CredoModel serves as an adapter between arbitrary models and the assessments in CredoLens. Assessments depend on CredoModel instantiating certain methods. In turn, the methods an instance of CredoModel defines inform Lens which assessments can be run automatically.

A CredoModel works by defining a "config" dictionary that outlines the model's functionality.

Above, the CredoModel functionality was inferred from the fact that the model (GradientBoostingClassifier) is a scikit-learn model. Under the hood, all that happened was that it defined a `config`.




In [23]:
# the config was inferred from the model passed to CredoModel
credo_model.config

{'predict': <bound method ForestClassifier.predict of RandomForestClassifier()>,
 'predict_proba': <function credoai.artifacts.CredoModel._sklearn_style_config.<locals>.predict_proba(X)>}
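
The adapter idea can be illustrated without Lens itself. The snippet below is a minimal sketch, assuming a config is simply a dictionary from capability names to callables and that each assessment declares the capabilities it needs. `ToyModel`, `infer_config`, `REQUIREMENTS`, `runnable_assessments`, and the `ThresholdTuning` entry are hypothetical names made up for this illustration, not part of the Lens API.

```python
class ToyModel:
    """Stand-in for a scikit-learn style classifier (hypothetical)."""

    def predict(self, X):
        # Predict the positive class when the first feature exceeds 0.5.
        return [1 if row[0] > 0.5 else 0 for row in X]

    def predict_proba(self, X):
        # Treat the first feature (assumed to lie in [0, 1]) as P(class 1).
        return [[1.0 - row[0], row[0]] for row in X]


def infer_config(model, capability_names=("predict", "predict_proba")):
    """Collect the callables a model exposes into a config dictionary."""
    return {
        name: getattr(model, name)
        for name in capability_names
        if callable(getattr(model, name, None))
    }


# Assumed requirements: which capabilities each assessment needs.
REQUIREMENTS = {
    "Fairness": {"predict"},
    "Performance": {"predict"},
    "ThresholdTuning": {"predict_proba", "decision_threshold"},  # made-up example
}


def runnable_assessments(config, requirements=REQUIREMENTS):
    """Return the assessments whose required capabilities are all in the config."""
    return sorted(
        name for name, needed in requirements.items() if needed <= set(config)
    )


config = infer_config(ToyModel())
print(sorted(config))                # ['predict', 'predict_proba']
print(runnable_assessments(config))  # ['Fairness', 'Performance']
```

In this sketch, supporting a non-scikit-learn model only requires supplying a dictionary with the same keys, which is the kind of flexibility the config approach is meant to give.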

#### CredoData

Just as CredoModel is an adapter between arbitrary models and the Lens assessment framework, CredoData serves as an adapter between tabular datasets and the assessments in CredoLens.

When you pass a dataframe to CredoData, CredoData separates it into an "X", "y", and, if applicable, "sensitive_features".

You can pass CredoData to Lens as a training dataset or a validation dataset. If the former, it will not be used to assess the model. Instead, dataset assessments will be performed on the dataset (e.g., fairness assessment). The validation dataset will be assessed in the same way, but _also_ used to assess the model, if provided.
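
To make the dataset fairness assessment more concrete, the demographic parity figures in the quickstart output can be reproduced from the group counts printed there. The sketch below assumes the reported values are the highest-minus-lowest difference and the lowest-over-highest ratio of the positive-label rates per group; that assumption matches the numbers in the output, but the code is an illustration rather than Lens's own implementation, and `positive_rates` is a hypothetical helper.

```python
# Counts copied from the quickstart output: (group, target) -> count
counts = {
    ("female", 0): 2334,
    ("female", 1): 711,
    ("male", 0): 3539,
    ("male", 1): 916,
}


def positive_rates(counts):
    """Fraction of rows with target == 1 within each group."""
    groups = sorted({group for group, _ in counts})
    rates = {}
    for group in groups:
        total = counts[(group, 0)] + counts[(group, 1)]
        rates[group] = counts[(group, 1)] / total
    return rates


rates = positive_rates(counts)
highest = max(rates, key=rates.get)
lowest = min(rates, key=rates.get)

parity_difference = rates[highest] - rates[lowest]
parity_ratio = rates[lowest] / rates[highest]

print(highest, lowest)              # female male
print(round(parity_difference, 6))  # 0.027886
print(round(parity_ratio, 6))       # 0.880573
```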

In [24]:
credo_data = cl.CredoData(name='UCI-credit-default',
                          X=X_test,
                          y=y_test,
                          sensitive_features=sensitive_features_test
                          )
credo_data.X.head(3)

Unnamed: 0,LIMIT_BAL,EDUCATION,MARRIAGE,AGE,PAY_1,PAY_2,PAY_3,PAY_4,PAY_5,PAY_6,...,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6
2308,2,2,2,4,2,2,2,2,2,2,...,11581.0,12580.0,13716.0,14828.0,1500.0,2000.0,1500.0,1500.0,1500.0,2000.0
22404,14,1,2,5,2,2,2,2,2,2,...,116684.0,101581.0,77741.0,77264.0,4486.0,4235.0,3161.0,2647.0,2669.0,2669.0
23397,6,3,1,11,2,2,2,2,2,2,...,68530.0,69753.0,70111.0,70212.0,2431.0,3112.0,3000.0,2438.0,2500.0,2554.0


### Assessments 

Lens uses the functionality of the above artifacts to automatically determine which assessments can be run. In this case, the dataset assessments and the Fairness assessment can be run. You can see which assessments are runnable with the `get_usable_assessments` function.

Assessments can be chosen, rather than inferred. See the [lens FAQ](https://credoai-lens.readthedocs.io/en/latest/notebooks/lens_faq.html#How-can-I-choose-which-assessments-to-run?) for this functionality, and other information about assessments.

### Assessment Plan

The Assessment Plan describes how the assessments should be run. Think of it as the *parameterization* of the assessments Lens will run.

If you use the Credo AI Governance App, the Assessment Plan is a principal artifact determined during the *Alignment Phase*. It is the output of multi-stakeholder collaboration. Lens will automatically download the Assessment Plan associated with your governance credentials (which uses another artifact: `CredoGovernance`).

You can also define the plan in code. Anything defined in the `assessment_plan` parameter will take precedence over the Assessment Plan retrieved from the Governance App.
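
The precedence rule can be pictured as overlaying one dictionary on another. The sketch below is a minimal illustration under the assumption that the overlay happens per assessment and per parameter; Lens may instead replace whole entries, and `resolve_plan` and the sample plans are hypothetical.

```python
def resolve_plan(governance_plan, code_plan):
    """Overlay code-defined parameters on a governance-defined plan."""
    # Copy each parameter dictionary so the inputs are left untouched.
    resolved = {name: dict(params) for name, params in governance_plan.items()}
    for name, params in code_plan.items():
        # Parameters defined in code win over those from governance.
        resolved.setdefault(name, {}).update(params)
    return resolved


# Hypothetical plan as it might arrive from the Governance App.
governance_plan = {
    "Fairness": {"metrics": ["recall_score"]},
    "Performance": {"metrics": ["accuracy_score"]},
}
# Plan passed in code through the assessment_plan parameter.
code_plan = {"Fairness": {"metrics": ["precision_score"]}}

resolved = resolve_plan(governance_plan, code_plan)
print(resolved["Fairness"])     # {'metrics': ['precision_score']}
print(resolved["Performance"])  # {'metrics': ['accuracy_score']}
```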

**Setting up the Plan**

The Assessment Plan is a set of {assessment_name: parameters} pairs. The assessment name must be the name of one of the assessments, as returned by `get_usable_assessments` (above). In general, the name is the name of the assessment class without the trailing "Assessment". For example, FairnessAssessment -> "Fairness". `get_assessment_names` will tell you the names you need.

The plan's parameters are passed to each assessment's `init_module` function.

Not all assessments *require* a plan, though many can be customized. In the case of "Performance" and "Fairness", a plan defining a list of metrics should be supplied, though default metrics are defined for regressors and classifiers.
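
The naming rule lends itself to a quick sanity check before a plan is handed to Lens. The snippet below is a sketch of that rule only: `plan_key` and `validate_plan` are hypothetical helpers, and apart from `FairnessAssessment` the class names in `available` are assumed from the assessment names that appear in the log output.

```python
def plan_key(assessment_class_name):
    """Strip a trailing 'Assessment', e.g. FairnessAssessment -> Fairness."""
    suffix = "Assessment"
    if (assessment_class_name.endswith(suffix)
            and len(assessment_class_name) > len(suffix)):
        return assessment_class_name[: -len(suffix)]
    return assessment_class_name


def validate_plan(plan, available_class_names):
    """Return the plan keys that do not match any available assessment."""
    valid = {plan_key(name) for name in available_class_names}
    return sorted(key for key in plan if key not in valid)


# Assumed class names, for illustration only.
available = ["FairnessAssessment", "PerformanceAssessment",
             "DatasetFairnessAssessment"]

plan = {
    "Fairness": {"metrics": ["precision_score"]},
    "Perfomance": {"metrics": ["precision_score"]},  # deliberate typo
}

print(plan_key("FairnessAssessment"))   # Fairness
print(validate_plan(plan, available))   # ['Perfomance']
```

A check like this catches misspelled keys early instead of leaving a plan entry unmatched.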

In [25]:
# specify the metrics that will be used by the Fairness and Performance assessments
assessment_plan = {
    'Fairness': {'metrics': ['precision_score']},
    'Performance': {'metrics': ['precision_score']}
}

### Run Lens

Once we have the model and data artifacts, as well as the assessment plan, we can run Lens. By default it will automatically infer which assessments to run, just as we manually did above.

In [26]:
lens = cl.Lens(model=credo_model,
               data=credo_data,
               assessment_plan=assessment_plan)

INFO:absl:Initializing Assessment (DatasetEquity)
INFO:absl:Initializing Assessment (DatasetFairness)
INFO:absl:Initializing Assessment (DatasetProfiling)
INFO:absl:Initializing Assessment (Fairness) with kwargs: {'metrics': ['precision_score']}
INFO:absl:Initializing Assessment (ModelEquity)
INFO:absl:Initializing Assessment (Performance) with kwargs: {'metrics': ['precision_score']}


**Getting Assessment Results**

To run the assessments with Lens, call `run_assessments`.

`run_assessments` can be chained with `get_results`, which returns the results as a dictionary that can be used for further processing. You can also export the data to JSON or straight to Credo AI's Governance App by calling `lens.export()`.

In [27]:
results = lens.run_assessments().get_results()
results['validation'].keys()

INFO:absl:Running assessment: DatasetEquity
INFO:absl:Running assessment: DatasetFairness
INFO:absl:Running assessment: DatasetProfiling


Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]

INFO:absl:Running assessment: Fairness
INFO:absl:Running assessment: ModelEquity
INFO:absl:Running assessment: Performance


dict_keys(['DatasetEquity', 'DatasetFairness', 'DatasetProfiling'])
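
Because the results are a nested dictionary, a small helper can list what is available before you index into it. The sketch below runs on a toy dictionary: the `validation_model` path mirrors the lookup used in this notebook, while the nesting under `validation` is an assumption, and `result_paths` is a hypothetical helper rather than a Lens function.

```python
def result_paths(results, prefix=()):
    """Yield the key path to every non-dictionary leaf in a nested dictionary."""
    for key, value in results.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from result_paths(value, path)
        else:
            yield path


# Toy stand-in for the dictionary returned by get_results().
toy_results = {
    "validation": {
        "DatasetFairness": {"SEX": {"demographic_parity_difference": 0.027886}},
    },
    "validation_model": {
        "Fairness": {"SEX": {"fairness": "table placeholder"}},
    },
}

for path in result_paths(toy_results):
    print(" -> ".join(path))
# validation -> DatasetFairness -> SEX -> demographic_parity_difference
# validation_model -> Fairness -> SEX -> fairness
```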

In [28]:
# get the fairness results, from the Fairness assessment, run on the validation dataset
results['validation_model']['Fairness']['SEX']['fairness']

| metric_type | value | sensitive_feature | subtype |
| --- | --- | --- | --- |
| precision_score | 0.020565 | SEX | parity |
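
A parity value of this kind can be read as the gap between per-group scores. The sketch below assumes parity is the largest between-group difference of a metric, which agrees with the quickstart output (male recall 0.381004 minus female recall 0.347398 gives the reported 0.033606). `recall_by_group` and `parity_gap` are hypothetical helpers and the labels are toy data, not the UCI dataset.

```python
def recall_by_group(y_true, y_pred, groups):
    """Recall (true positive rate) computed separately for each group."""
    true_pos, actual_pos = {}, {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] = actual_pos.get(group, 0) + 1
            true_pos[group] = true_pos.get(group, 0) + (1 if pred == 1 else 0)
    return {group: true_pos[group] / actual_pos[group] for group in actual_pos}


def parity_gap(scores):
    """Difference between the best and worst per-group score."""
    return max(scores.values()) - min(scores.values())


# Toy labels and predictions for two groups of five rows each.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
groups = ["female"] * 5 + ["male"] * 5

recalls = recall_by_group(y_true, y_pred, groups)
print(recalls)              # {'female': 0.5, 'male': 0.75}
print(parity_gap(recalls))  # 0.25

# Per-group recall values copied from the quickstart output.
reported = {"female": 0.347398, "male": 0.381004}
print(round(parity_gap(reported), 6))  # 0.033606
```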


### Visualizing assessments

Assessments aren't much if you can't visualize them. Lens allows you to visualize your results easily.

**Displaying Plots**

If you'd like to display the plots in your active Jupyter notebook, call `lens.display_results()` after running the assessments. That's what we did at the top of this notebook.

**Exporting assessments to Credo AI's Governance App**

Finally, the assessments can also be exported to Credo AI's Governance App. Check out the [Connecting with Governance App](https://credoai-lens.readthedocs.io/en/latest/notebooks/governance_integration.html) tutorial for directions.
