R K Sharath Kumar edited this page Jan 15, 2021 · 4 revisions

Use this file to gather the content required for the pattern overview. Copy this draft-pattern-template.md file, replace with your own content for each of the sections below, and attach your file to the GitHub tracking issue for your pattern.

For full details on requirements for each section, see "Write a code pattern overview" on w3 Developer: https://w3.ibm.com/developer/docs/content/write-overview/

Short title

Bias-mitigation-of-machine-learning-models-using-aif360

Long title

Build a predictive model that eliminates bias to produce fair, unbiased results.

Author

Provide names and IBM email addresses.

URLs

Github repo

"Get the code": https://github.com/IBM/bias-mitigation-of-machine-learning-models-using-aif360

  • GitHub URL

Other URLs

"View the demo": Provide the link to YouTube video of a recorded demo of the pattern. This is STRONGLY recommended. If you have other videos of demos or running apps, describe them here and add the URL below.

  • Video URL - NA
  • Demo URL -- NA

Summary

How do we remove bias from machine learning models and ensure that the predictions are fair? What are the three stages in which the bias mitigation solution can be applied? This code pattern answers these questions and more to help developers, data scientists, and stakeholders make informed decisions by consuming the results of predictive models.

Technologies

Description

Fairness in data and machine learning algorithms is critical to building safe and responsible AI systems from the ground up, by design. Both technical and business AI stakeholders are in constant pursuit of fairness to ensure they meaningfully address problems like AI bias. While accuracy is one metric for evaluating a machine learning model, fairness gives us a way to understand the practical implications of deploying the model in a real-world situation.

Fairness is the process of understanding bias introduced by your data, and ensuring your model provides equitable predictions across all demographic groups. Rather than thinking of fairness as a separate initiative, it’s important to apply fairness analysis throughout your entire ML process, making sure to continuously reevaluate your models from the perspective of fairness and inclusion. This is especially important when AI is deployed in critical business processes, like credit application reviews and fraud detection, that affect a wide range of end users.
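To make the idea of "equitable predictions across all demographic groups" concrete, the sketch below computes two common group-fairness metrics by hand on a toy dataset. The numbers and the 0/1 group encoding are invented for illustration; AIF360 provides the same metrics out of the box.

```python
# Illustrative fairness metrics computed by hand on a toy dataset.
# 'group': 1 = privileged, 0 = unprivileged; 'outcome': 1 = favorable.
records = [
    (1, 1), (1, 1), (1, 1), (1, 0),   # privileged: 3 of 4 favorable
    (0, 1), (0, 0), (0, 0), (0, 0),   # unprivileged: 1 of 4 favorable
]

def favorable_rate(records, group):
    """Fraction of favorable outcomes within one group."""
    outcomes = [outcome for g, outcome in records if g == group]
    return sum(outcomes) / len(outcomes)

p_priv = favorable_rate(records, 1)    # 0.75
p_unpriv = favorable_rate(records, 0)  # 0.25

# Statistical parity difference: 0 means parity; a negative value
# means the unprivileged group receives fewer favorable outcomes.
spd = p_unpriv - p_priv                # -0.5

# Disparate impact: ratio of favorable rates; 1.0 means parity.
di = p_unpriv / p_priv

print(spd, di)
```

Tracking metrics like these throughout the ML lifecycle, not just once before deployment, is what continuous fairness reevaluation looks like in practice.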

How does the fairness algorithm work?

The bias mitigation algorithm can be applied at three different stages of model building: pre-processing, in-processing, and post-processing. The diagram below demonstrates how it works.

Machine learning models are increasingly used to inform high-stakes decisions about people. Although machine learning, by its very nature, is always a form of statistical discrimination, the discrimination becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage. Bias in training data, due to either prejudice in labels or under-/over-sampling, yields models with unwanted bias.

The AIF360 Python package contains nine different algorithms, developed by the broader algorithmic fairness research community, to mitigate that unwanted bias. They can all be called in a standard way, very similar to scikit-learn's fit/predict paradigm. In this way, we hope that the package is not only a way to bring researchers together, but also a way to translate our collective research results to data scientists, data engineers, and developers deploying solutions in a variety of industries. You can learn more about AIF360 here.
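One of AIF360's pre-processing algorithms, Reweighing, assigns each (group, label) combination the weight P(group) × P(label) / P(group, label), which makes group membership and outcome statistically independent under the weighted data. The following is a hand-rolled sketch of that idea on made-up counts, not a call into the AIF360 API itself:

```python
from collections import Counter

# Toy data: (group, label); 1 = privileged / favorable outcome.
# The privileged group gets the favorable label far more often.
data = [(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 6

n = len(data)
group_counts = Counter(g for g, _ in data)
label_counts = Counter(y for _, y in data)
pair_counts = Counter(data)

# Reweighing: w(g, y) = P(g) * P(y) / P(g, y)
weights = {
    (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (pair_counts[(g, y)] / n)
    for (g, y) in pair_counts
}

def weighted_rate(group):
    """Weighted favorable-outcome rate for one group."""
    num = sum(weights[(g, y)] for g, y in data if g == group and y == 1)
    den = sum(weights[(g, y)] for g, y in data if g == group)
    return num / den

# After reweighing, the weighted favorable rate is ~0.5 for both groups.
print(weighted_rate(1), weighted_rate(0))
```

A model trained with these instance weights no longer sees a systematic association between the protected attribute and the label, which is exactly what the pre-processing stage is meant to achieve.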

Flow

  1. Log in to Watson Studio powered by Spark, initiate Cloud Object Storage, and create a project.
  2. Upload the .csv data file to Object Storage.
  3. Load the data file in a Watson Studio notebook.
  4. Install the AIF360 toolkit in the Watson Studio notebook.
  5. Analyze the results after applying the bias mitigation algorithm during the pre-processing, in-processing, and post-processing stages.

Instructions

Steps using AIF 360 on Watson Studio

  1. Create an account with IBM Cloud
  2. Create a new Watson Studio project
  3. Add Data
  4. Create the notebook
  5. Insert the data as dataframe
  6. Run the notebook
  7. Analyze the results

1. Create an account with IBM Cloud

Sign up for IBM Cloud. By clicking Create a free account, you get a 30-day trial account.

2. Create a new Watson Studio project

Sign up for IBM's Watson Studio.

Click on New Project and make the selections shown below.

Define the project by giving it a Name and hit Create.

3. Add Data

Clone this repo. Navigate to data/assets and save the file named fraud_data.csv to disk. The zip file Pipeline_LabelEncoder-0.1.zip also needs to be saved to disk.

Click on Assets and select Browse and add the csv file from your file system. Repeat the step and add the zip file as an asset.

4. Create the notebook

After the notebooks are imported, click Not Trusted and select Yes to trust the source of the notebooks.

This notebook has been created to demonstrate the steps for building the model on the Watson Studio platform for the fraud prediction use case. For other use cases, the notebook has to be modified to read the new dataset; the same steps can then be executed.

5. Insert the data as dataframe

Click the 0010 (data) icon at the top right, which brings up the data assets tab.

Click the Insert to code dropdown and select the option Insert Pandas DataFrame.
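The generated cell reads the CSV from Cloud Object Storage into a pandas DataFrame (typically named something like df_data_1). If you are experimenting outside Watson Studio, an equivalent cell is just a plain read_csv of the file you saved earlier; the inline CSV and column names below are an invented stand-in, not the real fraud dataset:

```python
import io
import pandas as pd

# Locally you would simply do:
#   df_data_1 = pd.read_csv('fraud_data.csv')
# Here we read from an in-memory buffer so the sketch is self-contained.
sample_csv = io.StringIO(
    "amount,age,is_fraud\n"
    "120.5,34,0\n"
    "980.0,22,1\n"
)
df_data_1 = pd.read_csv(sample_csv)
print(df_data_1.shape)  # (2, 3)
```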

6. Run the notebook

When a notebook is executed, what is actually happening is that each code cell in the notebook is executed, in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • A blank: the cell has never been executed.
  • A number: the relative order in which that code cell was executed.
  • An asterisk (*): the cell is currently executing.

There are several ways to execute the code cells in your notebook:

  • One cell at a time.
    • Select the cell, and then press the Play button in the toolbar.
  • Batch mode, in sequential order.
    • From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues through all cells that follow.

7. Analyze the results

After we run all cells in the notebook, the results are displayed at the end of each notebook as shown below.

Pre-processing results

Before pre-processing

We can observe that the privileged group had a 37% higher chance of getting a favorable outcome because of the bias in the dataset.

After pre-processing

We can observe that, after applying the bias mitigation algorithm, there is no unfair advantage between the privileged and unprivileged groups.

In-processing results

We can observe that, after applying the bias mitigation algorithm during training, the equal opportunity difference has reduced from 17% to just 3%, and the average odds difference has reduced from 22% to 13%, making the model unbiased to a good extent. The metrics show a clear reduction after de-biasing.
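For reference, the two metrics quoted above have simple definitions in terms of per-group true positive rates (TPR) and false positive rates (FPR). The sketch below spells them out; the group rates plugged in are made-up numbers chosen to roughly reproduce the pre-mitigation 17% and 22% figures, not the pattern's actual results.

```python
def equal_opportunity_difference(tpr_unpriv, tpr_priv):
    """TPR of the unprivileged group minus TPR of the privileged group.
    0 means equal opportunity; negative favors the privileged group."""
    return tpr_unpriv - tpr_priv

def average_odds_difference(fpr_unpriv, fpr_priv, tpr_unpriv, tpr_priv):
    """Mean of the FPR difference and the TPR difference between groups."""
    return 0.5 * ((fpr_unpriv - fpr_priv) + (tpr_unpriv - tpr_priv))

# Hypothetical pre-mitigation rates:
eod = equal_opportunity_difference(tpr_unpriv=0.62, tpr_priv=0.79)  # ~ -0.17
aod = average_odds_difference(0.10, 0.37, 0.62, 0.79)               # ~ -0.22
print(eod, aod)
```

Values closer to 0 after in-processing mitigation indicate that the model gives both groups comparable error rates, which is what the reported drop from 17% to 3% (and 22% to 13%) reflects.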

Post-processing results

We can observe that, after applying the bias mitigation algorithm on the predicted labels, there is a change in balanced accuracy and equal opportunity difference, indicating fairer results.

Components and services

  • IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

  • IBM AI Fairness 360 toolkit: AI Fairness 360 (AIF360), a comprehensive open-source toolkit of metrics to check for unwanted bias in datasets and machine learning models, and state-of-the-art algorithms to mitigate such bias.

  • IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market. This code pattern uses Cloud Object Storage.

Runtimes

Indicate languages or environments your pattern code requires, if applicable. (java, javascript/node, .net, swift, go, php, python, ruby, etc.)

  • Python

Related IBM Developer content

List any IBM Developer resources that are closely related to this pattern, such as other patterns, blog posts, tutorials, etc..

Related links

  • Fraud Detection Code Pattern: This code pattern discusses building a system for creating predictions that can be used in different scenarios. It focuses on predicting fraudulent transactions, which can reduce monetary loss and mitigate risk. But you can use the same approach for predicting customer churn, demand and supply forecasts, and more.