# RAI Dashboard Tutorial - Getting Started

GitHub repo: https://github.com/cisco-open/ResponsibleAI.git

## Introduction

This notebook is a simple demo of how RAI can be used without the dashboard to calculate and report metrics for a machine learning task.

## Install Dependencies and RAI Using Pip

The full list of dependencies can be found in [requirements.txt](https://github.com/cisco-open/ResponsibleAI/blob/main/requirements.txt).

The project requires Redis. On Windows, Redis can be installed using the .msi file at: https://github.com/microsoftarchive/redis/releases/tag/win-3.2.100.

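First install the required packages listed in requirements.txt by running ```pip install -r requirements.txt```.

RAI can then be installed in editable mode using ```pip install --editable``` followed by the path to the cloned repository (for example `.` when the command is run from the repository root).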
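
Before continuing, it can help to confirm that a Redis server is actually running. The sketch below is not part of RAI: `redis_is_reachable` is a hypothetical helper, and it assumes Redis listens on localhost at its default port 6379. It only checks that the TCP port accepts connections, not that the server behind it is healthy.

```python
import socket


def redis_is_reachable(host="localhost", port=6379, timeout=1.0):
    """Hypothetical helper: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print("Redis reachable:", redis_is_reachable())
```

If this prints `False`, start the Redis service (or adjust the host and port to match your setup) before running the rest of the notebook.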


In [None]:
import os
import sys
import inspect

# Add the parent directory to the import path before importing RAI
current_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parent_dir = os.path.dirname(current_dir)
sys.path.insert(0, parent_dir)

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from RAI.AISystem import AISystem, Model
from RAI.Analysis import AnalysisManager
from RAI.dataset import NumpyData, Dataset
from RAI.utils import df_to_RAI

# Seed NumPy's global random state for reproducibility
np.random.seed(50)

The cell above imports the necessary libraries, adds the parent directory to the import path, and fixes the random seed.

## Get Dataset

In [None]:
data_path = "../data/adult/"
train_data = pd.read_csv(data_path + "train.csv", header=0, skipinitialspace=True, na_values="?")
test_data = pd.read_csv(data_path + "test.csv", header=0, skipinitialspace=True, na_values="?")
all_data = pd.concat([train_data, test_data], ignore_index=True)

This reads train.csv and test.csv into two pandas DataFrames, skipping spaces after delimiters and treating "?" entries as missing values.

The two DataFrames are then concatenated into a single DataFrame named all_data.
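
To see what the `skipinitialspace`, `na_values`, and `ignore_index` arguments do, here is a small sketch on an in-memory CSV. The rows and values are made up for illustration and are not taken from the adult dataset files.

```python
import io

import pandas as pd

# Made-up sample: ", " separators and "?" standing in for an unknown value.
raw = "age, workclass, income-per-year\n39, State-gov, <=50K\n54, ?, >50K\n"

df = pd.read_csv(io.StringIO(raw), header=0, skipinitialspace=True, na_values="?")

print(list(df.columns))              # spaces after the commas are dropped
print(df["workclass"].isna().sum())  # the "?" entry was read as missing

# ignore_index=True renumbers the rows of the combined frame from 0.
combined = pd.concat([df, df], ignore_index=True)
print(combined.index.tolist())
```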

## Get X and y data, as well as RAI Meta information from the Dataframe

In [None]:
rai_meta_information, X, y, rai_output_feature = df_to_RAI(all_data,
                                                           target_column="income-per-year",
                                                           normalize="Scalar")

df_to_RAI returns four values rather than a single DataFrame.

X is the input data for the model, and y holds the target values taken from the income-per-year column.

rai_meta_information carries the RAI meta information about the dataset, and rai_output_feature describes the output feature that the model predicts.

## Create Data Splits and pass them to RAI

In [None]:
xTrain, xTest, yTrain, yTest = train_test_split(X, y, random_state=1, stratify=y)
dataset = Dataset({"train": NumpyData(xTrain, yTrain), "test": NumpyData(xTest, yTest)})

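train_test_split divides X and y into a training part (xTrain, yTrain) and a test part (xTest, yTest). Since no test size is given, scikit-learn keeps its default of 25% of the rows for testing.

Passing stratify=y keeps the class proportions the same in both parts, and random_state=1 makes the split reproducible.

Finally, both parts are wrapped in NumpyData objects and stored in a Dataset instance under the keys "train" and "test".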
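
The effect of `stratify` is easiest to see on imbalanced labels. The sketch below uses synthetic data (80 negatives and 20 positives, not the adult dataset) with the same `train_test_split` arguments as the cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80 zeros and 20 ones.
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1, stratify=y_demo)

print(len(X_tr), len(X_te))     # the default split keeps 25% of the rows for testing
print(y_tr.mean(), y_te.mean())  # both parts keep the 20% positive rate
```

Without `stratify`, the share of positives in the test part would vary with the random seed, which can distort metrics computed per class or per group.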

## Create Model and RAI's representation of it

In [None]:
clf = RandomForestClassifier(n_estimators=4, max_depth=6)
model = Model(agent=clf, output_features=rai_output_feature,
              name="cisco_income_ai", predict_fun=clf.predict,
              predict_prob_fun=clf.predict_proba,
              description="Income Prediction AI", model_class="RFC")

This creates a Random Forest Classifier with 4 trees (n_estimators=4), each limited to a maximum depth of 6 (max_depth=6).

The RAI Model is then created, with agent set to clf, output_features set to rai_output_feature, name “cisco_income_ai”, predict function set to clf.predict, predict probability function set to clf.predict_proba, description “Income Prediction AI”, and model class set to RFC (Random Forest Classifier).

The classifier is not trained at this point. It is fitted in a later step and then used to predict the income class of each individual.
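
The difference between `n_estimators` and `max_depth`, and between `predict` and `predict_proba`, can be checked on a small example. This sketch uses synthetic data and adds `random_state=0` for repeatability, which the notebook cell does not set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X_demo = rng.rand(200, 5)                                 # synthetic features
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)  # synthetic binary target

forest = RandomForestClassifier(n_estimators=4, max_depth=6, random_state=0)
forest.fit(X_demo, y_demo)

n_trees = len(forest.estimators_)                             # one entry per tree
deepest = max(tree.get_depth() for tree in forest.estimators_)  # never exceeds max_depth
labels = forest.predict(X_demo[:3])                           # one class label per row
probabilities = forest.predict_proba(X_demo[:3])              # one probability per class per row

print(n_trees, deepest)
print(labels.shape, probabilities.shape)
```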

## Create RAI AISystem to pass all relevant data to RAI

In [None]:
ai = AISystem(name="income_classification",  task='binary_classification',
             meta_database=rai_meta_information,
             dataset=dataset, model=model)

configuration = {"fairness": {"priv_group": {"race": {"privileged": 1, "unprivileged": 0}},
                "protected_attributes": ["race"], "positive_label": 1},
                "time_complexity": "polynomial"}
ai.initialize(user_config=configuration)


This creates a new AISystem named “income_classification”.

The task is set to binary classification, and the meta database is set to rai_meta_information. The dataset and the model are passed in as well.

Next, a configuration dictionary is created. Its fairness entry has three keys: priv_group, which for race marks the value 1 as privileged and the value 0 as unprivileged; protected_attributes, which lists race as the protected attribute; and positive_label, which sets 1 as the positive label.

The configuration also sets time_complexity to “polynomial”. Finally, ai.initialize applies this configuration to the AISystem.
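
Because the fairness settings are a nested dictionary, a typo in a key is easy to miss. The following sketch shows one way to sanity-check the structure before passing it on. `check_fairness_config` is a hypothetical helper written for this tutorial, not part of RAI, and it only encodes the key layout seen in the cell above.

```python
def check_fairness_config(config):
    """Hypothetical check: return a list of problems found in the fairness settings."""
    fairness = config.get("fairness", {})
    problems = []
    if "positive_label" not in fairness:
        problems.append("missing positive_label")
    for attr in fairness.get("protected_attributes", []):
        group = fairness.get("priv_group", {}).get(attr)
        if group is None:
            problems.append(f"no priv_group entry for '{attr}'")
        elif not {"privileged", "unprivileged"} <= set(group):
            problems.append(f"'{attr}' needs 'privileged' and 'unprivileged' values")
    return problems


# Same structure as the configuration used in this notebook.
demo_configuration = {
    "fairness": {
        "priv_group": {"race": {"privileged": 1, "unprivileged": 0}},
        "protected_attributes": ["race"],
        "positive_label": 1,
    },
    "time_complexity": "polynomial",
}

print(check_fairness_config(demo_configuration))
```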

## Train the model, generate predictions

In [None]:
clf.fit(xTrain, yTrain)
test_predictions = clf.predict(xTest)

This trains the classifier on the training split and generates predictions for the test split.

## Pass predictions to RAI

In [None]:
ai.compute({"test": {"predict": test_predictions}}, tag='model')

The generated predictions are passed to RAI as a nested dictionary.

The outer key, test, names the dataset split that the predictions belong to.

The inner key, predict, holds the array of predicted values, one for each row in xTest. The tag argument labels this computation as 'model'.

## View results computed by RAI

In [None]:
ai.display_metric_values(display_detailed=True)

analysis = AnalysisManager()
result = analysis.run_analysis(ai, "test", "FairnessAnalysis")
print(result["FairnessAnalysis"].to_string())

Analysis created
==== Group Fairness Analysis Results ====
1 of 4 tests passed.

Statistical Parity Difference Test:
This metric is The difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.
The idea value is 0.0.
It's value of -0.11160752641979553 is not between between 0.1 and -0.1 indicating that there is unfairness.

Equal Opportunity Difference Test:
This metric is The difference of true positive rates between the unprivileged
and the privileged groups.
The true positive rate is the ratio of true positives to the total number of actual positives for a given group.
The ideal value is 0. A value of < 0 implies higher benefit for the privileged group and a value > 0 implies higher benefit for the unprivileged group.
It's value of -0.12121212121212122 is not between between 0.1 and -0.1 indicating that there is unfairness.

Average Odds Difference Test:
This metric is The average difference of false positive rate (false positives
/ negatives) and true positive rate (true positives / positives) between unprivileged and privileged groups.
The ideal value is 0.  A value of < 0 implies higher benefit for the privileged group and a value > 0 implies higher benefit for the unprivileged group..
It's value of -0.08017127799736495 is between between 0.1 and -0.1 indicating
that there is fairness.

Disparate Impact Ratio Test:
This metric is The ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.
The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value > 1 implies a higher benefit for the unprivileged group.
It's value of 0.2581382067390062 is not between between 1.25 and 0.8 indicating that there is unfairness.
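
The four metrics in the report above can be reproduced by hand from the definitions it quotes. The sketch below is plain Python on a tiny made-up example and is not RAI's implementation. The function name `group_fairness`, the toy labels, and the toy group assignments are assumptions for illustration only.

```python
def rate(values):
    """Fraction of 1s in a list (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def group_fairness(y_true, y_pred, group, privileged=1, positive=1):
    """Compute four group fairness metrics, comparing unprivileged to privileged."""
    stats = {}
    for name, is_priv in (("priv", True), ("unpriv", False)):
        idx = [i for i, g in enumerate(group) if (g == privileged) == is_priv]
        favorable = rate([int(y_pred[i] == positive) for i in idx])
        tpr = rate([int(y_pred[i] == positive) for i in idx if y_true[i] == positive])
        fpr = rate([int(y_pred[i] == positive) for i in idx if y_true[i] != positive])
        stats[name] = (favorable, tpr, fpr)
    (p_fav, p_tpr, p_fpr), (u_fav, u_tpr, u_fpr) = stats["priv"], stats["unpriv"]
    return {
        "statistical_parity_difference": u_fav - p_fav,
        "equal_opportunity_difference": u_tpr - p_tpr,
        "average_odds_difference": 0.5 * ((u_fpr - p_fpr) + (u_tpr - p_tpr)),
        "disparate_impact_ratio": u_fav / p_fav if p_fav else float("nan"),
    }


# Toy data: first four rows are the privileged group, last four the unprivileged group.
toy_group = [1, 1, 1, 1, 0, 0, 0, 0]
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = group_fairness(toy_true, toy_pred, toy_group)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy example the privileged group receives the favorable prediction in 3 of 4 rows and the unprivileged group in 1 of 4, so every metric falls outside the ranges used in the report (-0.1 to 0.1 for the differences, 0.8 to 1.25 for the ratio).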