<a href="https://colab.research.google.com/github/anastasiia-mozg/RUWA-PersonlNameDict/blob/main/Virny_lab_user_study.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Responsible AI Lab 5: Virny

In this example, we apply [Virny](https://github.com/DataResponsibly/Virny) to profile the performance of three models trained on the [ACS Income dataset](https://github.com/zykls/folktables). The demonstration shows how to create input arguments for Virny, how to compute overall and disparity metrics with a metric computation interface, and how to build static and interactive visualizations based on the calculated metrics.

The structure of this notebook is the following:
* **Step 1**: Create a _config yaml_ for metric computation.
* **Step 2**: Preprocess a dataset and construct a _BaseFlowDataset_ object.
* **Step 3**: Tune models and create a _models config_.
* **Step 4**: Run a metric computation interface from Virny.
* **Step 5**: Compose disparity metrics using _Metric Composer_.
* **Step 6**: Create static visualizations using _Metric Static Visualizer_.

## Install necessary packages and import dependencies

In [130]:
# Install Virny from PyPI. The library supports Python 3.8 and 3.9.
!pip install virny
# !pip install xgboost==1.7.2



In [8]:

import os
import warnings
warnings.filterwarnings('ignore')
os.environ["PYTHONWARNINGS"] = "ignore"

In [32]:
from pprint import pprint
from datetime import datetime, timezone

import pandas as pd

from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

from virny.utils.custom_initializers import create_config_obj, read_model_metric_dfs
from virny.user_interfaces.multiple_models_api import compute_metrics_with_config
from virny.preprocessing.basic_preprocessing import preprocess_dataset
from virny.custom_classes.metrics_interactive_visualizer import MetricsInteractiveVisualizer
from virny.custom_classes.metrics_visualizer import MetricsVisualizer
from virny.custom_classes.metrics_composer import MetricsComposer
from virny.utils.model_tuning_utils import tune_ML_models

## **Step 1**: Create a _config yaml_ for metrics computation.

First, we need to create a _config yaml_, which includes the following parameters for metrics computation:

* **dataset_name**: str, a name of your dataset; it will be used to name files with metrics.

* **bootstrap_fraction**: float, the fraction of the training set, in the range [0.0, 1.0], that is sampled to fit each model in the bootstrap (usually more than 0.5).

* **random_state**: int, a seed to control the randomness of the whole model evaluation pipeline.

* **n_estimators**: int, the number of estimators in the bootstrap that is used to compute stability and uncertainty metrics.

* **sensitive_attributes_dct**: dict, a dictionary where keys are sensitive attribute names (including intersectional attributes), and values are disadvantaged values for these attributes. Intersectional attributes must include '&' between sensitive attributes. You do not need to specify disadvantaged values for intersectional groups since they will be derived from disadvantaged values in sensitive_attributes_dct for each separate sensitive attribute in this intersectional pair.
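
To make the intersectional rule concrete, here is a minimal sketch of how a group label for `'SEX & RAC1P'` could be derived from the per-attribute disadvantaged values. It is an illustration, not Virny's implementation: the helper names are hypothetical, and it assumes that a row is disadvantaged only if it is disadvantaged on every component attribute, privileged only if it is privileged on every component, and otherwise belongs to neither group.

```python
def is_disadvantaged(row, attr, sensitive_attributes_dct):
    """Return True if `row` holds a disadvantaged value for a single attribute."""
    dis_values = sensitive_attributes_dct[attr]
    if not isinstance(dis_values, (list, tuple, set)):
        dis_values = [dis_values]
    # Compare as strings because the config stores values as strings
    return str(row[attr]) in {str(v) for v in dis_values}


def intersectional_group(row, intersectional_attr, sensitive_attributes_dct):
    """Classify a row as 'dis', 'priv', or None for an attribute like 'SEX & RAC1P'.

    Assumption (not confirmed by the Virny docs quoted here): 'dis' means
    disadvantaged on every component, 'priv' means privileged on every
    component, and mixed rows fall into neither group.
    """
    parts = [p.strip() for p in intersectional_attr.split('&')]
    flags = [is_disadvantaged(row, p, sensitive_attributes_dct) for p in parts]
    if all(flags):
        return 'dis'
    if not any(flags):
        return 'priv'
    return None


sensitive_attributes_dct = {
    'SEX': '2',
    'RAC1P': ['2', '3', '4', '5', '6', '7', '8', '9'],
    'SEX & RAC1P': None,  # derived from the two entries above
}
example_rows = [
    {'SEX': 2, 'RAC1P': 2},
    {'SEX': 1, 'RAC1P': 1},
    {'SEX': 2, 'RAC1P': 1},
]
groups = [intersectional_group(r, 'SEX & RAC1P', sensitive_attributes_dct) for r in example_rows]
print(groups)  # ['dis', 'priv', None]
```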

In [10]:
config_yaml_path = os.path.join('.', 'experiment_config.yaml')
config_yaml_content = """
dataset_name: ACS_Income
bootstrap_fraction: 0.8
random_state: 42
n_estimators: 20  # Use more than 100 estimators in practice; 20 is only for this demonstration
sensitive_attributes_dct: {'SEX': '2', 'RAC1P': ['2', '3', '4', '5', '6', '7', '8', '9'], 'SEX & RAC1P': None}
"""

with open(config_yaml_path, 'w', encoding='utf-8') as f:
    f.write(config_yaml_content)

In [11]:
config = create_config_obj(config_yaml_path=config_yaml_path)
SAVE_RESULTS_DIR_PATH = os.path.join('.', 'results', f'{config.dataset_name}_Metrics_{datetime.now(timezone.utc).strftime("%Y%m%d__%H%M%S")}')

## **Step 2**: Preprocess a dataset and construct a _BaseFlowDataset_ object.

Second, we need to import a dataset and preprocess it.

Note that if you get an error when importing the ACS Income dataset, the dataset is not available in your geographic location. In this case, you can use a VPN to work around the issue.

Import the dataset.

In [12]:
from virny.datasets import ACSIncomeDataset

data_loader = ACSIncomeDataset(state=['GA'], year=2018, with_nulls=False,
                               subsample_size=15_000, subsample_seed=42)
data_loader.full_df.head()

Downloading data for 2018 1-Year person survey for GA...


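|   | SCHL | COW | MAR | OCCP | POBP | RELP | SEX | RAC1P | AGEP | WKHP | PINCP |
|---|------|-----|-----|------|------|------|-----|-------|------|------|-------|
| 0 | 23 | 7 | 3 | 230 | 36 | 0 | 1 | 1 | 55 | 55.0 | 1 |
| 1 | 16 | 1 | 5 | 4110 | 13 | 2 | 2 | 1 | 20 | 35.0 | 0 |
| 2 | 16 | 4 | 3 | 4130 | 51 | 0 | 2 | 1 | 59 | 30.0 | 0 |
| 3 | 18 | 4 | 1 | 4020 | 13 | 0 | 1 | 2 | 43 | 40.0 | 0 |
| 4 | 14 | 1 | 1 | 8300 | 20 | 1 | 2 | 2 | 33 | 20.0 | 0 |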


Define preprocessing steps and initialize a column transformer.

In [13]:
column_transformer = ColumnTransformer(transformers=[
    ('categorical_features', OneHotEncoder(handle_unknown='ignore', sparse_output=False), data_loader.categorical_columns),
    ('numerical_features', StandardScaler(), data_loader.numerical_columns),
])

Construct a BaseFlowDataset object.

In [14]:
DATASET_SPLIT_SEED = 42
MODELS_TUNING_SEED = 42
TEST_SET_FRACTION = 0.2

base_flow_dataset = preprocess_dataset(data_loader=data_loader,
                                       column_transformer=column_transformer,
                                       sensitive_attributes_dct=config.sensitive_attributes_dct,
                                       test_set_fraction=TEST_SET_FRACTION,
                                       dataset_split_seed=DATASET_SPLIT_SEED)
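
As a rough illustration of what `test_set_fraction` and `dataset_split_seed` control, the sketch below performs a seeded shuffle-and-split over row indices using only the standard library. This is not how `preprocess_dataset` is implemented; `split_indices` is a hypothetical helper that only shows why fixing the seed makes the split reproducible.

```python
import random


def split_indices(n_rows, test_set_fraction, seed):
    """Shuffle row indices with a fixed seed and cut off the test portion."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # a dedicated RNG keeps the split reproducible
    n_test = int(round(n_rows * test_set_fraction))
    return indices[n_test:], indices[:n_test]


# 15,000 subsampled rows with a 20% test set, as in this notebook
train_idx, test_idx = split_indices(n_rows=15_000, test_set_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 12000 3000
```

Calling `split_indices` again with the same seed returns the same partition, which is what makes results comparable across runs.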

## **Step 3**: Tune models and create a _models config_.

Next, we need to construct a _models config_ that includes initialized models you want to profile with Virny. For that, the models should be tuned using the _tune_ML_models()_ function from Virny or in any other convenient way.

Note that the model name convention for the _models config_ is '<MODEL_TYPE>' or '<MODEL_TYPE>__<ANY_SUFFIX>' (with two underscores), for example, 'KNeighborsClassifier' or 'KNeighborsClassifier\_\_'. This convention is needed to create appropriate bar charts for model selection in the interactive web app.
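
The naming convention can be read as "everything before the first double underscore is the model type". A minimal sketch of that parsing rule follows; `model_type_from_name` is a hypothetical helper and the suffix `k5` is an invented example, neither comes from Virny.

```python
def model_type_from_name(model_name):
    """Return the <MODEL_TYPE> part of '<MODEL_TYPE>' or '<MODEL_TYPE>__<ANY_SUFFIX>'."""
    return model_name.split('__', 1)[0]


print(model_type_from_name('KNeighborsClassifier'))      # KNeighborsClassifier
print(model_type_from_name('KNeighborsClassifier__k5'))  # KNeighborsClassifier
```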

Define models and hyper-parameters to tune using GridSearchCV from sklearn.

In [15]:
models_params_for_tuning = {
    'LogisticRegression': {
        'model': LogisticRegression(random_state=MODELS_TUNING_SEED),
        'params': {
            'penalty': ['l2'],
            'C' : [0.0001, 0.1, 1, 100],
            'solver': ['newton-cg', 'lbfgs'],
            'max_iter': [250],
        }
    },
    'KNeighborsClassifier': {
        'model': KNeighborsClassifier(),
        'params': {
            'weights' : ['uniform'],
            'algorithm' : ['auto'],
            'n_neighbors' : [3, 4, 5],
        }
    },
    'XGBClassifier': {
        'model': XGBClassifier(random_state=MODELS_TUNING_SEED, verbosity=0),
        'params': {
            'learning_rate': [0.1],
            'n_estimators': [50],
            'max_depth': [5, 7],
            'lambda':  [10, 100]
        }
    }
}
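
To get a feel for the size of the search, the sketch below expands the LogisticRegression grid into its individual candidates with `itertools.product`. A grid search evaluates every combination, so the cost grows multiplicatively with each list in `params`; with k-fold cross-validation, each candidate is additionally fitted once per fold. `expand_grid` is a hypothetical helper for illustration, not part of Virny or scikit-learn.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of hyper-parameter values as a list of dicts."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


logistic_regression_grid = {
    'penalty': ['l2'],
    'C': [0.0001, 0.1, 1, 100],
    'solver': ['newton-cg', 'lbfgs'],
    'max_iter': [250],
}
candidates = expand_grid(logistic_regression_grid)
print(len(candidates))  # 8 candidates: 1 * 4 * 2 * 1
print(candidates[0])
```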

Tune models using the _tune_ML_models()_ function from Virny and create the _models config_.

In [16]:
tuned_params_df, models_config = tune_ML_models(models_params_for_tuning, base_flow_dataset, dataset_name='ACS_Income', n_folds=3)
tuned_params_df

2025/10/27, 10:23:15: Tuning LogisticRegression...
2025/10/27, 10:23:43: Tuning for LogisticRegression is finished [F1 score = 0.7929772404392671, Accuracy = 0.8144999999999999]

2025/10/27, 10:23:43: Tuning KNeighborsClassifier...
2025/10/27, 10:24:00: Tuning for KNeighborsClassifier is finished [F1 score = 0.750755075501199, Accuracy = 0.7731666666666667]

2025/10/27, 10:24:00: Tuning XGBClassifier...
2025/10/27, 10:24:17: Tuning for XGBClassifier is finished [F1 score = 0.7776203499390398, Accuracy = 0.8015833333333333]



|   | Dataset_Name | Model_Name | F1_Score | Accuracy_Score | Model_Best_Params |
|---|--------------|------------|----------|----------------|-------------------|
| 0 | ACS_Income | LogisticRegression | 0.792977 | 0.8145 | {'C': 1, 'max_iter': 250, 'penalty': 'l2', 'so... |
| 1 | ACS_Income | KNeighborsClassifier | 0.750755 | 0.773167 | {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... |
| 2 | ACS_Income | XGBClassifier | 0.77762 | 0.801583 | {'lambda': 10, 'learning_rate': 0.1, 'max_dept... |


In [17]:
pprint(models_config)

{'KNeighborsClassifier': KNeighborsClassifier(),
 'LogisticRegression': LogisticRegression(C=1, max_iter=250, random_state=42, solver='newton-cg'),
 'XGBClassifier': XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              feature_weights=None, gamma=None, grow_policy=None,
              importance_type=None, interaction_constraints=None, lambda=10,
              learning_rate=0.1, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=7,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, multi_strategy=None, n_estimators=50,
              n_jobs=None, ...)}


# **TODO 1:** Based on these accuracy metrics, which model would you choose for this classification task? Can you say anything about who is harmed when the model makes an error?

***Briefly write your answer here***

## **Step 4**: Run a metric computation interface from Virny.

Next, we pass the _BaseFlowDataset_ object, the _models config_, and the _config yaml_ to a metric computation interface and execute it. The interface uses subgroup analyzers to compute different sets of metrics for each privileged and disadvantaged group. Currently, our library supports the **Subgroup Variance Analyzer** and the **Subgroup Error Analyzer**, and it can easily be extended with other analyzers. When the variance and error analyzers complete metric computation, their metrics are combined, returned in a matrix format, and stored in a file if a results directory is provided.
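
The variance-related metrics rest on a bootstrap: several copies of a model are fitted on random samples of the training set, and their predictions for the same test point are compared. The sketch below illustrates the idea with a toy one-dimensional "model" (a threshold at the sample mean) and a label-stability score defined as the normalized absolute difference between positive and negative votes. Both the toy model and this exact formula are assumptions made for illustration; Virny's analyzers may compute these quantities differently.

```python
import random


def bootstrap_sample(train_rows, bootstrap_fraction, rng):
    """Draw a sample with replacement whose size is a fraction of the training set."""
    n = int(len(train_rows) * bootstrap_fraction)
    return [rng.choice(train_rows) for _ in range(n)]


def label_stability(labels):
    """1.0 when all estimators agree, 0.0 when the votes are split evenly."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return abs(n_pos - n_neg) / len(labels)


rng = random.Random(42)
train_values = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_points = [-2.0, 0.0, 2.0]
n_estimators, bootstrap_fraction = 20, 0.8

# "Fit" each toy estimator: its decision threshold is the mean of its bootstrap sample
thresholds = []
for _ in range(n_estimators):
    sample = bootstrap_sample(train_values, bootstrap_fraction, rng)
    thresholds.append(sum(sample) / len(sample))

# Collect the votes of all estimators for each test point
stabilities = {}
for x in test_points:
    labels = [int(x > t) for t in thresholds]
    stabilities[x] = label_stability(labels)

print(stabilities)
```

Points far from the decision boundary receive the same label from every estimator, while a point near the boundary can flip between estimators, which lowers its stability.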

In [18]:
metrics_dct = compute_metrics_with_config(base_flow_dataset, config, models_config, SAVE_RESULTS_DIR_PATH, notebook_logs_stdout=True)

Analyze multiple models:   0%|          | 0/3 [00:00<?, ?it/s]

Classifiers testing by bootstrap:   0%|          | 0/20 [00:00<?, ?it/s]

Classifiers testing by bootstrap:   0%|          | 0/20 [00:00<?, ?it/s]

Classifiers testing by bootstrap:   0%|          | 0/20 [00:00<?, ?it/s]

# **TODO 2:** Fill in the index of your chosen model below, using the table in Step 3, and run the cells to see more comprehensive metrics for the protected classes in the dataset.

In [19]:
model_index =

View the computed metrics for one model.

In [20]:
sample_model_metrics_df = metrics_dct[list(models_config.keys())[model_index]]
# Keep only the group-specific error metrics
sample_model_metrics_df[sample_model_metrics_df['Metric'].isin(['TPR', 'TNR', 'PPV', 'FPR', 'FNR'])]

|    | Metric | overall | SEX_priv | SEX_dis | RAC1P_priv | RAC1P_dis | SEX&RAC1P_priv | SEX&RAC1P_dis | Model_Name | Virny_Random_State | Model_Params | Runtime_in_Mins |
|----|--------|---------|----------|---------|------------|-----------|----------------|---------------|------------|--------------------|--------------|-----------------|
| 9 | TPR | 0.638835 | 0.701893 | 0.537879 | 0.697165 | 0.46063 | 0.668852 | 0.4 | KNeighborsClassifier | 42 | {'algorithm': 'auto', 'leaf_size': 30, 'metric... | 0.772617 |
| 10 | TNR | 0.831472 | 0.755556 | 0.895327 | 0.798246 | 0.889665 | 0.809796 | 0.917085 | KNeighborsClassifier | 42 | {'algorithm': 'auto', 'leaf_size': 30, 'metric... | 0.772617 |
| 11 | PPV | 0.664646 | 0.669173 | 0.655385 | 0.68136 | 0.596939 | 0.671789 | 0.582278 | KNeighborsClassifier | 42 | {'algorithm': 'auto', 'leaf_size': 30, 'metric... | 0.772617 |
| 12 | FNR | 0.361165 | 0.298107 | 0.462121 | 0.302835 | 0.53937 | 0.331148 | 0.6 | KNeighborsClassifier | 42 | {'algorithm': 'auto', 'leaf_size': 30, 'metric... | 0.772617 |
| 13 | FPR | 0.168528 | 0.244444 | 0.104673 | 0.201754 | 0.110335 | 0.190204 | 0.082915 | KNeighborsClassifier | 42 | {'algorithm': 'auto', 'leaf_size': 30, 'metric... | 0.772617 |
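
For reference, the five metrics in this table follow the standard confusion-matrix definitions, which is why TPR and FNR sum to 1 in every column, as do TNR and FPR. The sketch below computes them from raw counts; the counts are hypothetical and are not taken from the ACS Income results.

```python
def group_error_metrics(tp, fp, tn, fn):
    """Compute rate metrics for one group from its confusion-matrix counts."""
    return {
        'TPR': tp / (tp + fn),  # true positive rate (recall)
        'TNR': tn / (tn + fp),  # true negative rate (specificity)
        'PPV': tp / (tp + fp),  # positive predictive value (precision)
        'FNR': fn / (tp + fn),  # false negative rate = 1 - TPR
        'FPR': fp / (tn + fp),  # false positive rate = 1 - TNR
    }


# Hypothetical counts for a single group
metrics = group_error_metrics(tp=60, fp=20, tn=80, fn=40)
print(metrics)
```

Computing the same dictionary separately for a privileged and a disadvantaged group gives the `_priv` and `_dis` columns that the disparity analysis compares.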


# **TODO 3:** Consider your chosen model's error rates (FPR, FNR) for the protected classes. Does your model further disadvantage the already disadvantaged group? (Note that `_priv` denotes the privileged group and `_dis` denotes the disadvantaged group for each protected class.)

***Briefly write your answer here***

# **TODO 4:** Select one additional model. Fill in the model names to compare the additional model's metrics for the protected classes against those of your original model. Would you change your original choice? Why or why not?

In [None]:
chosen_model, additional_model = 'originally_chosen_model_name', 'additional_model_name'

# Stack the metrics of both models; the Model_Name column already identifies each row
df_concat = pd.concat([metrics_dct[chosen_model], metrics_dct[additional_model]], ignore_index=True)

df_concat.pivot(index='Metric', columns='Model_Name', values=[ 'overall', 'SEX_priv', 'SEX_dis', 'RAC1P_priv', 'RAC1P_dis', 'SEX&RAC1P_priv', 'SEX&RAC1P_dis'])

***Briefly write your answer here***

## **Step 5**: Compose disparity metrics using _Metric Composer_.

To compose disparity metrics, apply the _Metric Composer_. **Metric Composer** is responsible for the second stage of the model audit. Currently, it computes our custom error disparity, stability disparity, and uncertainty disparity metrics, and it is simple to extend with new disparity metrics. Many disparity metrics have appeared during the last decade, but most of them are based on the same group-specific metrics. Separating the computation of group-specific metrics from the computation of disparity metrics therefore lets us experiment with different combinations of group-specific metrics without recomputing them for each new set of disparity metrics.
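
As an illustration of how disparity metrics can be built from group-specific ones, the sketch below derives three of them from the accuracy and selection rate of a disadvantaged and a privileged group. It assumes the common conventions "difference = disadvantaged minus privileged" and "ratio = disadvantaged divided by privileged"; the function name and input values are hypothetical, and Virny's own sign conventions should be checked in its documentation.

```python
def compose_disparities(dis, priv):
    """Combine group-specific metrics of two groups into disparity metrics."""
    return {
        'Accuracy_Difference': dis['Accuracy'] - priv['Accuracy'],
        'Statistical_Parity_Difference': dis['Selection-Rate'] - priv['Selection-Rate'],
        'Disparate_Impact': dis['Selection-Rate'] / priv['Selection-Rate'],
    }


# Hypothetical group-specific metrics
dis_group = {'Accuracy': 0.80, 'Selection-Rate': 0.25}
priv_group = {'Accuracy': 0.78, 'Selection-Rate': 0.50}

disparities = compose_disparities(dis_group, priv_group)
print(disparities)
```

Under these conventions, a Statistical_Parity_Difference below 0 and a Disparate_Impact below 1 both indicate that the disadvantaged group receives positive predictions less often than the privileged group.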

Read the computed metrics from a file created by the metric computation interface.

In [22]:
models_metrics_dct = read_model_metric_dfs(SAVE_RESULTS_DIR_PATH, model_names=list(models_config.keys()))

Compose disparity metrics using _MetricsComposer_.

In [23]:
metrics_composer = MetricsComposer(models_metrics_dct, config.sensitive_attributes_dct)
models_composed_metrics_df = metrics_composer.compose_metrics()

# **TODO 5:** Fill in the model name variable with the model you chose in TODO 4 and calculate the composite metrics. Do the composite metrics affirm your choice of model? Why or why not?

In [24]:
model_name = ''  # TODO: fill in the name of the model you chose in TODO 4

# Filter into a new variable so that the full models_composed_metrics_df stays available for later cells
selected_model_composed_metrics_df = models_composed_metrics_df[models_composed_metrics_df['Model_Name'] == model_name]
selected_model_composed_metrics_df[selected_model_composed_metrics_df['Metric'].isin(['Accuracy_Difference', 'Disparate_Impact', 'Statistical_Parity_Difference'])]

|    | Metric | SEX | RAC1P | SEX&RAC1P | Model_Name |
|----|--------|-----|-------|-----------|------------|
| 0 | Accuracy_Difference | 0.047213 | 0.003469 | 0.02156 | LogisticRegression |
| 13 | Statistical_Parity_Difference | -0.180905 | -0.156503 | -0.193594 | LogisticRegression |
| 14 | Disparate_Impact | 0.557404 | 0.578086 | 0.452255 | LogisticRegression |


***Briefly write your answer here***

## **Step 6**: Create static visualizations using _Metric Static Visualizer_.

**Metric Static Visualizer** builds static visualizations for the computed metrics. It unifies the preprocessing of the computed metrics and creates the data formats required for the visualizations. Hence, users can simply call methods of the _MetricsVisualizer_ class to get custom plots for different kinds of metric analysis.

# **TODO 6:** Run the visualization with the predefined metrics (you can add more from the list below).

Metrics that can be visualized: `'Overall_Uncertainty'`, `'Std'`, `'Aleatoric_Uncertainty'`, `'Mean_Prediction'`, `'IQR'`, `'Statistical_Bias'`, `'Epistemic_Uncertainty'`, `'Label_Stability'`, `'Jitter'`, `'TPR'`, `'TNR'`, `'PPV'`, `'FNR'`, `'FPR'`, `'Accuracy'`, `'F1'`, `'Selection-Rate'`.

In [25]:
metrics_to_visualize = [
 'TPR',
 'TNR',
 'PPV',
 'FNR',
 'FPR']

In [26]:
visualizer = MetricsVisualizer(models_metrics_dct, models_composed_metrics_df, config.dataset_name,
                               model_names=list(models_config.keys()),
                               sensitive_attributes_dct=config.sensitive_attributes_dct)

In [27]:
visualizer.create_overall_metrics_bar_char(
    metric_names=metrics_to_visualize,
    plot_title="Overall Metrics"
)