Add methods for dealing with fairness in rankings #461

Merged
hoffmansc merged 34 commits into Trusted-AI:master on Nov 10, 2023

Conversation

andrewklayk
Contributor

We've implemented a set of post-processing algorithms for dealing with fairness in rankings, as described in https://arxiv.org/abs/1905.01989 (KDD 2019), complete with a demo notebook and tests.
Along with the algorithms, we've added two useful metrics for evaluating rankings that were introduced or referenced in the paper (Infeasibility Index and Discounted Cumulative Gain). Further details are available in the demo notebook.

closes #455
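
For readers unfamiliar with the second metric mentioned above, here is a minimal sketch of Discounted Cumulative Gain, assuming the standard log2 position discount. The function names `dcg` and `ndcg` are hypothetical and are not taken from this PR's code; the aif360 implementation may differ in signature and normalization, so treat the demo notebook as the reference.

```python
import math
from typing import Sequence


def dcg(scores: Sequence[float]) -> float:
    """Discounted Cumulative Gain of items listed in ranked order.

    Assumes the common definition: the item at 1-based rank i
    contributes score_i / log2(i + 1).
    """
    return sum(
        score / math.log2(rank + 1)
        for rank, score in enumerate(scores, start=1)
    )


def ndcg(scores: Sequence[float]) -> float:
    """DCG normalized by the DCG of the score-sorted (ideal) ordering."""
    ideal = dcg(sorted(scores, reverse=True))
    # Guard against an all-zero score list.
    return dcg(scores) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    original = [0.9, 0.8, 0.7, 0.6]   # scores in the original ranked order
    reranked = [0.9, 0.7, 0.8, 0.6]   # same items after a hypothetical fair re-ranking
    print(f"NDCG original: {ndcg(original):.4f}")
    print(f"NDCG reranked: {ndcg(reranked):.4f}")
```

Comparing the NDCG of a re-ranked list against the original ordering gives a rough sense of how much utility a fairness-aware re-ranking gives up.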

andrewklayk changed the title from "Main rerank" to "Add methods for dealing with fairness in rankings" on Sep 8, 2023
andrewklayk and others added 20 commits September 8, 2023 13:00
Signed-off-by: Andrii Kliachkin <andrew.klyachkin@gmail.com>
Signed-off-by: andrewklayk <andrew.klyachkin@gmail.com>
…ive and relaxed

…site

Signed-off-by: Illia Kryvoviaz <illiakryvoviaz@gmail.com>
Collaborator

@hoffmansc hoffmansc left a comment

thanks @andrewklayk, this looks great! Other than the small comments, I think we can merge this.

I would love to include this in the aif360.sklearn subpackage as well. There is a slightly different API for those algorithms/metrics but it should actually be easier to use since you rely on DataFrames anyway. That can be a separate PR if you prefer or you can add it to this one.

examples/demo_deterministic_reranking.ipynb (outdated, resolved)
Comment on lines 368 to 384
"import tempfile\n",
"import requests\n",
"import zipfile\n",
"import os\n",
"import pandas as pd\n",
"\n",
"with tempfile.TemporaryDirectory() as temp_dir:\n",
"    response = requests.get(\"http://www.seaphe.org/databases/LSAC/LSAC_SAS.zip\")\n",
"    temp_file_name = os.path.join(temp_dir, \"LSAC_SAS.zip\")\n",
"    with open(temp_file_name, \"wb\") as temp_file:\n",
"        temp_file.write(response.content)\n",
"    with zipfile.ZipFile(temp_file_name, 'r') as zip_ref:\n",
"        zip_ref.extractall(temp_dir)\n",
"    data = pd.read_sas(os.path.join(temp_dir, \"lsac.sas7bdat\"))\n",
"    data = data.assign(gender=(data[\"gender\"] == b\"male\") * 1)\n",
"    data['race'] = data['race1']\n",
"    data = data[['race', 'gender', 'lsat', 'ugpa', 'zfygpa']]\n"
Collaborator

can we not use the built-in LawSchoolGPADataset class here?

Contributor Author

That was the initial idea, but I couldn't get the tempeh import to work, despite having installed the .[LawSchoolGPA] option of the package. I've tried installing tempeh separately, reinstalling AIF360 manually and so on, but to no avail. Without the import, neither LawSchoolGPADataset nor fetch_lawschool_gpa works.

Contributor Author

@andrewklayk andrewklayk Oct 5, 2023

An update on this: it seems like the SEAPHE website is no more (domain expired), which means that neither this code nor the AIF360 functions work. Could you suggest some other dataset in the library that would work as an example here? Or, alternatively, the LawSchoolGPA dataset is still available from the Internet Archive backup of the SEAPHE website.

Collaborator

that is unfortunate... I've reached out to SEAPHE and they're investigating. Let's give it a week and see if they can get the website back up.

tests/test_demo_deterministic_reranking.py (outdated, resolved)
aif360/metrics/regression_metric.py (outdated, resolved)
aif360/metrics/__init__.py (outdated, resolved)
examples/demo_deterministic_reranking.ipynb (outdated, resolved)
Contributor Author

andrewklayk commented Oct 25, 2023

Hello @hoffmansc,
Thank you for your remarks. I've addressed the issues, except for the usage of the SEAPHE dataset. The site is still down, and I've been unable to find a free-use dataset from a reputable source that would fit here as an example.
In my opinion, the "toy example" in the notebook is enough to explain the API, so perhaps we should drop the dataset part altogether?
As for aif360.sklearn, I think it's better to handle that in a separate issue.

@rahulnair23

Hi @hoffmansc - seems like the Law School dependency has been removed from the demo notebook. Is this good to merge?

@hoffmansc hoffmansc merged commit 01b77d4 into Trusted-AI:master Nov 10, 2023
5 of 9 checks passed
4 participants