
FACTS method bug: feature_weights parameter of the method is not propagated correctly #532

Closed
phantom-duck opened this issue May 20, 2024 · 0 comments

@phantom-duck (Contributor)

Inside the fit method of the FACTS detector, the function calc_costs is called to calculate the costs of the recourses, but the params argument (which includes the feature_weights) is not passed. As a result, the defaults are always used, which assign a weight of 1 to every feature.
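To make the failure mode concrete, here is a small self-contained toy sketch (not the actual aif360 code; the function and variable names are made up for illustration): a costing helper that accepts per-feature weights, and a caller that forgets to forward them, so the default weight of 1 is silently used.

# toy illustration of the bug, not the real FACTS implementation
def calc_costs_toy(changed_features, feature_weights=None):
    # cost of an action = sum of the weights of the features it changes,
    # falling back to a weight of 1 when no weights are supplied
    weights = feature_weights or {}
    return sum(weights.get(f, 1) for f in changed_features)

def fit_buggy(changed_features, feature_weights):
    # feature_weights is accepted but never forwarded, mirroring the reported bug
    return calc_costs_toy(changed_features)

def fit_fixed(changed_features, feature_weights):
    # forwarding the weights gives the expected behaviour
    return calc_costs_toy(changed_features, feature_weights=feature_weights)

weights = {"age": 10, "hours-per-week": 10}
print(fit_buggy(["age", "hours-per-week"], weights))   # 2  (weights ignored)
print(fit_fixed(["age", "hours-per-week"], weights))   # 20 (weights respected)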

Example to reproduce:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

from aif360.sklearn.datasets.openml_datasets import fetch_adult
from aif360.sklearn.detectors.facts.clean import clean_dataset
from aif360.sklearn.detectors.facts import FACTS
from aif360.sklearn.detectors.facts.predicate import Predicate
import pandas as pd

random_seed = 131313 # to produce the expected if-then clause

# load the adult dataset and perform some simple preprocessing steps
# See output for a glimpse of the final dataset's characteristics
X, y, sample_weight = fetch_adult()
data = clean_dataset(X.assign(income=y), "adult")

# split into train-test data
y = data['income']
X = data.drop('income', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=random_seed, stratify=y)

#### here, we incrementally build the example model. It consists of one preprocessing step,
#### which is to turn categorical features into the respective one-hot encodings, and
#### a simple scikit-learn logistic regressor.
categorical_features = X.select_dtypes(include=["object", "category"]).columns.to_list()
categorical_features_onehot_transformer = ColumnTransformer(
    transformers=[
        ("one-hot-encoder", OneHotEncoder(), categorical_features)
    ],
    remainder="passthrough"
)
model = Pipeline([
    ("one-hot-encoder", categorical_features_onehot_transformer),
    ("clf", LogisticRegression(max_iter=1500))
])

#### train the model
model = model.fit(X_train, y_train)

detector = FACTS(
    clf=model,
    prot_attr="sex",
    freq_itemset_min_supp=0.08,
    feature_weights={f: 10 for f in X.columns},
    feats_not_allowed_to_change=[]
)

detector = detector.fit(X_test)

print(detector.rules_by_if[Predicate.from_dict({"age": pd.Interval(26., 34.), "hours-per-week": "FullTime"})]["Female"][1][0])

The output of the final command is (Predicate(features=['age', 'hours-per-week'], values=[Interval(41.0, 50.0, closed='right'), 'OverTime']), 0.07728337236533955, 2.0). The last element shows that the action of changing the features "age" and "hours-per-week" (both of which are categorical after preprocessing) is counted as having a cost of 2.0, i.e. 1 per feature. The correct value, however, would be 20.0, since we assigned a weight of 10 to every feature.
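As a quick check, continuing from the example above and assuming the tuple layout shown in the output (then-clause, effectiveness, cost), the discrepancy can be verified directly:

# the recourse tuple unpacks into the then-clause, its effectiveness, and its cost
ifclause = Predicate.from_dict({"age": pd.Interval(26., 34.), "hours-per-week": "FullTime"})
action, effectiveness, cost = detector.rules_by_if[ifclause]["Female"][1][0]
print(cost)                        # 2.0 with the current code (default weight of 1 per feature)
print(10 * len(action.features))   # 20.0, the value expected given feature_weights of 10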
