Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download LawSchool dataset directly from SEAPHE #492

Closed
wants to merge 3 commits into from

Conversation

EktaBhaskar
Copy link

@EktaBhaskar EktaBhaskar commented Sep 22, 2023

Signed-off-by: Ekta Bhaskar bhaskar.ekta@ufl.edu

Closes #359

Signed-off-by: Ekta Bhaskar <bhaskar.ekta@ufl.edu
Signed-off-by: Ekta <ekta12bhaskar@gmail.com>
"from sklearn.metrics import mean_absolute_error\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"current_script_path=os.path.abspath('demo_grid_search_reduction_regression_sklearn.ipynb')\n",
"print(current_script_path)\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we'd like to clean this part up for review 👍

Also, it looks like there are a few warnings noticed in https://github.com/EktaBhaskar/AIF360/blob/master/examples/sklearn/demo_grid_search_reduction_regression_sklearn.ipynb, do you also see those warnings locally as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yehee I'm not sure why but the notebook was giving error to me on simply importing like:-

from aif360.sklearn.datasets.seaphe_datasets import load_lawschool_data from aif360.sklearn.inprocessing import GridSearchReduction from aif360.sklearn.metrics import difference

I assumed it might be because of some dependancy issues.
So I had to include sys.path.
Do you want me to write import statements as above? It might work on proper setup, just won't work on mine.

Yes, I can see warnings but not sure why it's there as I did not make any changes related to AdversarialDebiasing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you replace the line in __init__.py with the seaphe_datasets file, it should import normally. The warning is fine but there's also an error later. It looks like you commented out

from aif360.sklearn.metrics import difference

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried putting in init.py of sklearn folder as well. But it gives me error:-

this is init.py:-


from aif360.sklearn.datasets.utils import standardize_dataset, NumericConversionWarning
from aif360.sklearn.datasets.openml_datasets import fetch_adult, fetch_german, fetch_bank
from aif360.sklearn.datasets.compas_dataset import fetch_compas
from aif360.sklearn.datasets.meps_datasets import fetch_meps
from aif360.sklearn.datasets.tempeh_datasets import fetch_lawschool_gpa
from aif360.sklearn.datasets.seaphe_datasets import load_lawschool_data

error:-

ImportError: cannot import name 'load_lawschool_data' from 'aif360.sklearn.datasets' (/opt/homebrew/lib/python3.10/site-packages/aif360/sklearn/datasets/__init__.py)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to comment :-
from aif360.sklearn.metrics import difference

Because it was giving me error even before I made any changes and was just trying to run this notebook to see how it works. And later on also, did not make any changes in this method/class:-

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In [28], line 14
     12 from aif360.sklearn.inprocessing import GridSearchReduction
     13 # from aif360.sklearn.datasets import load_lawschool_data
---> 14 from aif360.sklearn.metrics import difference

File /opt/homebrew/lib/python3.10/site-packages/aif360/sklearn/metrics/__init__.py:7
      1 """
      2 ``aif360.sklearn`` implements a number of fairness metrics for group fairness
      3 and individual fairness. For guidance on which metric to use for a given
      4 application, see our
      5 `Guidance <http://aif360.mybluemix.net/resources#guidance>`_ page.
      6 """
----> 7 from aif360.sklearn.metrics.metrics import *

File /opt/homebrew/lib/python3.10/site-packages/aif360/sklearn/metrics/metrics.py:14
     11 from sklearn.utils.deprecation import deprecated
     13 from aif360.sklearn.utils import check_inputs, check_groups
---> 14 from aif360.detectors.mdss.ScoringFunctions import BerkJones, Bernoulli
     15 from aif360.detectors.mdss.MDSS import MDSS
     17 __all__ = [
     18     # meta-metrics
     19     'difference', 'ratio', 'intersection', 'one_vs_rest',
   (...)
     38     'false_positive_rate_error'
     39 ]

File /opt/homebrew/lib/python3.10/site-packages/aif360/detectors/mdss/ScoringFunctions/__init__.py:5
      3 from aif360.detectors.mdss.ScoringFunctions.Poisson import Poisson
      4 from aif360.detectors.mdss.ScoringFunctions.BerkJones import BerkJones
----> 5 from aif360.detectors.mdss.ScoringFunctions.Gaussian import Gaussian

File /opt/homebrew/lib/python3.10/site-packages/aif360/detectors/mdss/ScoringFunctions/Gaussian.py:1
----> 1 from turtle import pen
      2 from aif360.detectors.mdss.ScoringFunctions.ScoringFunction import ScoringFunction
      3 from aif360.detectors.mdss.ScoringFunctions import optim

File /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/turtle.py:107
    103 _ver = "turtle 1.1b- - for Python 3.1   -  4. 5. 2009"
    105 # print(_ver)
--> 107 import tkinter as TK
    108 import types
    109 import math

File /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/tkinter/__init__.py:37
     34 import sys
     35 import types
---> 37 import _tkinter # If this fails your Python may not be configured for Tk
     38 TclError = _tkinter.TclError
     39 from tkinter.constants import *

ModuleNotFoundError: No module named '_tkinter'


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to install the code with pip install --editable . from the project root

Copy link
Collaborator

@hoffmansc hoffmansc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the SEAPHE website is currently down so we can't download the data but they're looking into it. I'll comment here when it's resolved.

Once you change the name to fetch_lawschool_gpa, I think the test case might fail since it checks if it exactly matches the version from tempeh so maybe you can double check that all the processing is the same.

from sklearn.model_selection import train_test_split


def load_lawschool_data(target, subset="all", usecols=None, dropcols=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep this named fetch_lawschool_gpa and have it replace the existing function in aif360/sklearn/datasets/tempeh_datasets.py?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified it's name to "fetch_lawschool_gpa" and have placed in "aif360/sklearn/datasets/tempeh_datasets.py" file. However, there's some issue in imports(which I have explained below) so, it's still referring to older "fetch_lawschool_gpa" function format and definition.

I tested multiple times in my local and "fetch_lawschool_gpa" always redircts to the updated method but when I run the notebook, it gives an error that "target is not a defined positional argument" which means it is referring to older "fetch_lawschool_gpa" method as older one didn't have "target" argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant keep the tempeh_datasets.py file as is just rename the function in seaphe_datasets.py as fetch_lawschool_gpa then change the __init__.py to import from seaphe_datasets instead of tempeh_datasets. Once we know it's working, you can delete the tempeh_datasets.py file.

Did you test this after installing with the --editable flag?

Copy link
Author

@EktaBhaskar EktaBhaskar Oct 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh Sorry! I have reverted changes in" tempeh_datasets.py" and renamed the method in "seaphe_datasets" and changed the init.py file to import the method from "seaphe_datasets".

It is importing "fetch_lawschool_gpa" fine now, but I think the "seaphe_datasets" download issue is still not fixed. It is not returning zip file as expected in this call :-
requests.get("http://www.seaphe.org/databases/LSAC/LSAC_SAS.zip")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did "pip install --editable ." at root AIF360. And then launched jupyter-notebook from root.

But these 2 imports :-

from aif360.sklearn.metrics import difference
from aif360.sklearn.inprocessing import GridSearchReduction

are still giving this error:_
ModuleNotFoundError: No module named 'inFairness'

"from sklearn.metrics import mean_absolute_error\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"current_script_path=os.path.abspath('demo_grid_search_reduction_regression_sklearn.ipynb')\n",
"print(current_script_path)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you replace the line in __init__.py with the seaphe_datasets file, it should import normally. The warning is fine but there's also an error later. It looks like you commented out

from aif360.sklearn.metrics import difference

"source": [
"Datasets are formatted as separate `X` (# samples x # features) and `y` (# samples x # labels) DataFrames. The index of each DataFrame contains protected attribute values per sample. Datasets may also load a `sample_weight` object to be used with certain algorithms/metrics. All of this makes it so that aif360 is compatible with scikit-learn objects.\n",
"\n",
"For example, we can easily load the law school gpa dataset from tempeh with the following line:"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change the text to reflect that we're fetching it from SEAPHE now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

aif360/sklearn/datasets/seaphe_datasets.py Outdated Show resolved Hide resolved
@@ -0,0 +1,66 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a line explaining where the code is copied from and which parts this covers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the code from :-
https://github.com/microsoft/tempeh/blob/main/tempeh/datasets/seaphe_datasets.py

This was mentioned on the issue assigned to me :-

#359

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant write it in the source file :)

I just want it to be clear which parts are covered by that copyright and which are written by us

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Mentioned it in "seaphe_datasets" before the "fetch_lawschool_gpa" method.

@@ -144,12 +123,12 @@
"source": [
"Datasets are formatted as separate `X` (# samples x # features) and `y` (# samples x # labels) DataFrames. The index of each DataFrame contains protected attribute values per sample. Datasets may also load a `sample_weight` object to be used with certain algorithms/metrics. All of this makes it so that aif360 is compatible with scikit-learn objects.\n",
"\n",
"For example, we can easily load the law school gpa dataset from tempeh with the following line:"
"For example, we can easily download and use the law school gpa dataset from SEAPHE direclty with the following line:"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"For example, we can easily download and use the law school gpa dataset from SEAPHE direclty with the following line:"
"For example, we can easily download and use the Law School GPA dataset from SEAPHE directly with the following line:"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

from sklearn.model_selection import train_test_split


def load_lawschool_data(target, subset="all", usecols=None, dropcols=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant keep the tempeh_datasets.py file as is just rename the function in seaphe_datasets.py as fetch_lawschool_gpa then change the __init__.py to import from seaphe_datasets instead of tempeh_datasets. Once we know it's working, you can delete the tempeh_datasets.py file.

Did you test this after installing with the --editable flag?

@hoffmansc
Copy link
Collaborator

closing in favor of #510

@hoffmansc hoffmansc closed this Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Download LawSchool dataset directly from SEAPHE
3 participants