FLARE pull request #38

Merged
merged 59 commits into from
Feb 5, 2024
Commits
04687c4
Added FDT parameters to Explainer kwargs
Kaysera Jun 20, 2022
4c88162
Fixed LoreNeighborhood instance_membership
Kaysera Jun 20, 2022
480668b
Added fidelity metric
Kaysera Jul 6, 2022
400395d
Fixed coverage to take all rules into account
Kaysera Jul 6, 2022
2e2ae2b
Added default branch for fuzzy tree
Kaysera Jul 6, 2022
acac467
Added smoothing in the neighborhood to remove outliers
Kaysera Jul 6, 2022
067d1b8
Added counterfactual metrics
Kaysera Jul 6, 2022
5f0c390
Added similarity mapping function for rules
Kaysera Jul 6, 2022
079144a
Restructured lore_neighborhood
Kaysera Jul 7, 2022
b66057e
Merge branch 'flore' of https://github.com/Kaysera/teacher-private in…
Kaysera Jul 7, 2022
af20419
Added simmilarity for fuzzy discrete sets
Kaysera Jul 7, 2022
cc0fdd4
Merge branch 'flore' of https://github.com/Kaysera/teacher-private in…
Kaysera Jul 7, 2022
0ddd234
Added Sampling neighborhood and d_counterfactual
Kaysera Jul 11, 2022
466072e
Merge branch 'flore' of https://github.com/Kaysera/teacher-private in…
Kaysera Jul 11, 2022
8c70e32
Added normalization for the datasets
Kaysera Jul 12, 2022
f87731c
Forced casting to int when decoding
Kaysera Jul 12, 2022
2f8343e
Fixed typo
Kaysera Jul 12, 2022
6c5708d
Added hashing to rule
Kaysera Jul 21, 2022
cf9e82c
Added mixed distance for counterfactuals from Guidotti review
Kaysera Jul 21, 2022
eb717fe
Added fast neighborhood generation
Kaysera Jul 22, 2022
916a367
Added new datasets and changes to discretization
Kaysera Oct 18, 2022
4315f29
Deprecated median_absolute_deviation, now median_abs_deviation
Kaysera Oct 18, 2022
7ff656c
added multiclass search_counterfactual
Kaysera Oct 25, 2022
6bf7ec0
Modified fuzzy discretization to enhance fuzzy entropy speed
Kaysera Oct 29, 2022
9fa76ad
Merge branch 'flore' of https://github.com/Kaysera/teacher-private in…
Kaysera Oct 29, 2022
429e777
Fixed division by zero in distance
Kaysera Jan 10, 2023
647b5b3
Added cf_dist parameter and fixed exp_value bug
Kaysera Jan 10, 2023
a3bd657
Added check to compute instance membership
Kaysera Jan 10, 2023
ee1d6f5
Compute instance membership after generating neighborhood
Kaysera Jan 10, 2023
be23262
Added array of distances to have multiple methods
Kaysera Jan 10, 2023
7d99763
Added lighter threshold for fuzzy discretization
Kaysera Jan 10, 2023
32263e0
FDT Explainer hit bug fixed
Kaysera Jan 31, 2023
098860b
Added threshold to entropy for fuzzy neighborhood
Kaysera Jan 31, 2023
50e5666
Added neighborhood range
Kaysera Jan 31, 2023
d86e931
Added fuzzy binary decision tree
Kaysera May 22, 2023
45a4c0a
Added fuzzy binary tree explainer
Kaysera May 23, 2023
adbb8aa
Added jaccard similarity
Kaysera Jun 8, 2023
101ca8b
added iris and wine to datasets
Kaysera Oct 25, 2023
0df2a8c
Merge branch 'flore' of https://github.com/Kaysera/teacher-private in…
Kaysera Oct 25, 2023
e947dda
Sort columns of new datasets
Kaysera Feb 5, 2024
314a5a3
Added f1_score for multiclass
Kaysera Feb 5, 2024
4864d30
Added max tries to avoid neighborhood getting stuck
Kaysera Feb 5, 2024
b985d33
Merge branch 'main' into flore
Kaysera Feb 5, 2024
af8c476
Fixed lint issues
Kaysera Feb 5, 2024
5f5d802
Clean up sampling neighborhood
Kaysera Feb 5, 2024
e43dcea
Changed bare exception in _lore_neighborhood
Kaysera Feb 5, 2024
717cfbe
Clean up FDT explainer
Kaysera Feb 5, 2024
51e3d90
Clean up FBDT explainer (experimental)
Kaysera Feb 5, 2024
5974660
Clean up and write comments for _counterfactual
Kaysera Feb 5, 2024
2715d17
Added docs for base datasets
Kaysera Feb 5, 2024
de0c5d0
Removed unnecessary parameter from FBDT explainer
Kaysera Feb 5, 2024
d7f794a
Removed spaces from sign in function declaration and unused code
Kaysera Feb 5, 2024
e62cae1
Added comments and docs
Kaysera Feb 5, 2024
a39f4ef
Extended docs for tree simplification
Kaysera Feb 5, 2024
9c9a712
Merge pull request #1 from Kaysera/flore
Kaysera Feb 5, 2024
e2592d3
Added datasets to installation
Kaysera Feb 5, 2024
85f5494
Skipped LoreNeighborhood tests to deprecate it
Kaysera Feb 5, 2024
25311b2
Merge pull request #2 from Kaysera/flore
Kaysera Feb 5, 2024
ccf01bc
Added FLARE to readme
Kaysera Feb 5, 2024
5 changes: 4 additions & 1 deletion .gitignore
Expand Up @@ -135,4 +135,7 @@ dmypy.json
.pre-commit-config.yaml

# DS files
.DS_Store
.DS_Store

flore-experiments/
knowledge-extraction-experiments/
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,5 +1,5 @@
recursive-include docs *
recursive-include teacher/datasets/data *.csv
recursive-include src/teacher/datasets/data *.csv

include LICENSE
include MANIFEST.in
4 changes: 3 additions & 1 deletion README.md
Expand Up @@ -52,6 +52,7 @@ For detailed instructions on how to use teacher, please refer to the [API Refere
The following list summarizes the models and explainers currently supported
- **Fuzzy Factuals and Counterfactuals**: Explainer obtained from a fuzzy tree that can be used for global or local explanations
- **LORE**: Local explainer generated from a neighborhood
- **FLARE**: Fuzzy local explainer generated from a neighborhood

## Metrics

Expand All @@ -73,4 +74,5 @@ The following list summarizes the metrics and scores that can be extracted from
- Documentation <https://xai-teacher.readthedocs.io/en/latest/>
- Experiments: <https://github.com/Kaysera/teacher-experiments>
- LORE ([Guidotti et al., 2018](https://doi.org/10.1109/MIS.2019.2957223))
- Documentation and examples: <https://doi.org/10.1109/MIS.2019.2957223>
- Documentation and examples: <https://doi.org/10.1109/MIS.2019.2957223>
- FLARE ([Fernandez et al., 2023 preprint](https://dsi.uclm.es/descargas/technicalreports/DIAB-24-02-1/FLARE_Tech_Rep.pdf))
20 changes: 18 additions & 2 deletions src/teacher/datasets/__init__.py
Expand Up @@ -109,7 +109,18 @@
# =============================================================================

# Local application
from ._base import load_german, load_adult, load_compas, load_heloc, load_beer, load_pima, load_breast
from ._base import (load_german,
load_adult,
load_compas,
load_heloc,
load_beer,
load_pima,
load_breast,
load_basket,
load_phishing,
load_flavia,
load_iris,
load_wine)


# =============================================================================
Expand All @@ -120,10 +131,15 @@
# from the module teacher.datasets
__all__ = [
"load_adult",
"load_basket",
"load_beer",
"load_breast",
"load_compas",
"load_german",
"load_heloc",
"load_pima"
"load_pima",
"load_phishing",
"load_flavia",
"load_iris",
"load_wine"
]
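The comma added after `"load_pima"` in this hunk matters: Python joins adjacent string literals, so appending the new names without it would have silently merged two entries into one. A minimal sketch of the pitfall, using shortened illustrative lists rather than the module's real `__all__`:

```python
# Illustrative lists only; not the actual contents of teacher.datasets.__all__.
missing_comma = [
    "load_heloc",
    "load_pima"       # no comma: the next literal is glued onto this one
    "load_phishing",
]

with_comma = [
    "load_heloc",
    "load_pima",
    "load_phishing",
]

print(missing_comma)  # ['load_heloc', 'load_pimaload_phishing']
print(with_comma)     # ['load_heloc', 'load_pima', 'load_phishing']
```

With the merged entry, `from teacher.datasets import *` would fail because no attribute with the concatenated name exists.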
176 changes: 160 additions & 16 deletions src/teacher/datasets/_base.py
Expand Up @@ -14,6 +14,8 @@
# Third party
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

# Local application
from teacher.utils import recognize_features_type, set_discrete_continuous, label_encode
Expand All @@ -30,7 +32,7 @@
# Functions
# =============================================================================

def generate_dataset(df, columns, class_name, discrete, name):
def generate_dataset(df, columns, class_name, discrete, name, normalize=False):
"""Generate the dataset suitable for LORE usage

Parameters
Expand All @@ -45,6 +47,8 @@ def generate_dataset(df, columns, class_name, discrete, name):
List with all the columns to be considered to have discrete values
name : str
Name of the dataset
normalize : bool
Whether to normalize the continuous features or not

Returns
-------
Expand All @@ -63,6 +67,7 @@ def generate_dataset(df, columns, class_name, discrete, name):
label_encoder : label encoder for the discrete values
X : NumPy array with all the columns except for the class
y : NumPy array with the class column
normalize_scaler : scaler used to normalize the continuous features
"""
possible_outcomes = list(df[class_name].unique())

Expand All @@ -72,7 +77,12 @@ def generate_dataset(df, columns, class_name, discrete, name):
columns_tmp = list(columns)
columns_tmp.remove(class_name)
idx_features = {i: col for i, col in enumerate(columns_tmp)}

if normalize:
scaler = StandardScaler()
df[continuous] = scaler.fit_transform(df[continuous])
else:
scaler = None
# Dataset Preparation for Scikit Algorithms
df_le, label_encoder = label_encode(df, discrete)
X = df_le.loc[:, df_le.columns != class_name].values
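The `normalize` branch in this hunk standardizes every continuous column to zero mean and unit variance and keeps the fitted scaler so that values can be mapped back later. The sketch below reproduces that arithmetic with the standard library only. `standardize` and `restore` are hypothetical helpers, not part of teacher or scikit-learn, and the population standard deviation is used because that is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return (scaled values, mean, std) using the population std."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        std = 1.0  # constant column: avoid dividing by zero
    return [(v - mean) / std for v in values], mean, std


def restore(scaled, mean, std):
    """Undo standardize(), analogous to a scaler's inverse transform."""
    return [s * std + mean for s in scaled]


# Made-up sample column, used only for illustration.
ages = [20.0, 30.0, 40.0]
scaled, mean, std = standardize(ages)
recovered = restore(scaled, mean, std)
```

Keeping `mean` and `std` around plays the same role as the `normalize_scaler` entry of the dataset dict: without them, an explanation expressed on normalized features could not be translated back to the original units.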
Expand All @@ -90,14 +100,15 @@ def generate_dataset(df, columns, class_name, discrete, name):
'continuous': continuous,
'idx_features': idx_features,
'label_encoder': label_encoder,
'normalize_scaler': scaler,
'X': X,
'y': y,
}

return dataset


def load_german():
def load_german(normalize=False):
"""
Load and return the german credit dataset.

Expand All @@ -115,10 +126,10 @@ def load_german():

discrete = ['installment_as_income_perc', 'present_res_since', 'credits_this_bank', 'people_under_maintenance']

return generate_dataset(df, columns, class_name, discrete, 'german_credit')
return generate_dataset(df, columns, class_name, discrete, 'german_credit', normalize)


def load_adult():
def load_adult(normalize=False):
"""
Load and return the adult dataset.

Expand All @@ -145,10 +156,10 @@ def load_adult():
class_name = 'class'

discrete = []
return generate_dataset(df, columns, class_name, discrete, 'adult')
return generate_dataset(df, columns, class_name, discrete, 'adult', normalize)


def load_compas():
def load_compas(normalize=False):
"""
Load and return the COMPAS scores dataset.

Expand Down Expand Up @@ -196,10 +207,10 @@ def get_class(x):
class_name = 'class'
discrete = ['is_recid', 'is_violent_recid', 'two_year_recid']

return generate_dataset(df, columns, class_name, discrete, 'compas-scores-two-years')
return generate_dataset(df, columns, class_name, discrete, 'compas-scores-two-years', normalize)


def load_heloc():
def load_heloc(normalize=False):
"""
Load and return the HELOC dataset.

Expand All @@ -215,10 +226,10 @@ def load_heloc():
class_name = 'RiskPerformance'

discrete = []
return generate_dataset(df, columns, class_name, discrete, 'heloc_dataset_v1')
return generate_dataset(df, columns, class_name, discrete, 'heloc_dataset_v1', normalize)


def load_beer():
def load_beer(normalize=False):
"""
Load and return the beer dataset.

Expand All @@ -238,10 +249,10 @@ def load_beer():

discrete = []
columns = df.columns
return generate_dataset(df, columns, class_name, discrete, 'beer')
return generate_dataset(df, columns, class_name, discrete, 'beer', normalize)


def load_pima():
def load_pima(normalize=False):
"""
Load and return the pima indians dataset.

Expand All @@ -261,10 +272,117 @@ def load_pima():

discrete = []
columns = df.columns
return generate_dataset(df, columns, class_name, discrete, 'pima')
return generate_dataset(df, columns, class_name, discrete, 'pima', normalize)


def load_flavia(normalize=False):
"""
Load and return the FLAVIA dataset.

def load_breast():
Returns
-------
dataset : dict
"""
# Read Dataset
df = pd.read_csv(MODULE_PATH + '/data/FLAVIA3.csv', delimiter=',')

# Features Categorization
class_name = 'Class'
df_cols = list(df.columns)
df_cols.remove(class_name)
new_cols = [class_name] + df_cols
df = df[new_cols]

discrete = []
columns = df.columns
return generate_dataset(df, columns, class_name, discrete, 'flavia', normalize)


def load_phishing(normalize=False):
"""
Load and return the phishing dataset.

Returns
-------
dataset : dict
"""
# Read Dataset
df = pd.read_csv(MODULE_PATH + '/data/phishing.csv', delimiter=',')
del df['id']
del df['PctExtResourceUrls']
del df['PctNullSelfRedirectHyperlinks']
del df['SubdomainLevelRT']
del df['UrlLengthRT']
del df['PctExtResourceUrlsRT']
del df['AbnormalExtFormActionR']
del df['ExtMetaScriptLinkRT']
del df['PctExtNullSelfRedirectHyperlinksRT']

# Features Categorization
class_name = 'CLASS_LABEL'
df_cols = list(df.columns)
df_cols.remove(class_name)
new_cols = [class_name] + df_cols
df = df[new_cols]

discrete = []
columns = df.columns
return generate_dataset(df, columns, class_name, discrete, 'phishing', normalize)


def load_iris(normalize=False):
"""
Load and return the iris dataset.

Returns
-------
dataset : dict
"""
# Read Dataset
iris = datasets.load_iris(as_frame=True)
df = iris.frame

# Features Categorization
columns = df.columns
class_name = columns[-1]

df_cols = list(df.columns)
df_cols.remove(class_name)
new_cols = [class_name] + df_cols
df = df[new_cols]

discrete = []
columns = df.columns
return generate_dataset(df, columns, class_name, discrete, 'iris', normalize)


def load_wine(normalize=False):
"""
Load and return the wine dataset.

Returns
-------
dataset : dict
"""
# Read Dataset
wine = datasets.load_wine(as_frame=True)
df = wine.frame

# Features Categorization
columns = df.columns
class_name = columns[-1]

df_cols = list(df.columns)
df_cols.remove(class_name)
new_cols = [class_name] + df_cols
df = df[new_cols]

discrete = []
columns = df.columns
return generate_dataset(df, columns, class_name, discrete, 'wine', normalize)


def load_breast(normalize=False):
"""
Load and return the breast cancer dataset.

Expand All @@ -281,4 +399,30 @@ def load_breast():
class_name = 'diagnosis'

discrete = []
return generate_dataset(df, columns, class_name, discrete, 'breast')
return generate_dataset(df, columns, class_name, discrete, 'breast', normalize)


def load_basket(normalize=False, reduced=False):
"""
Load and return the basket dataset.

Returns
-------
dataset : dict
"""
# Read Dataset
if reduced:
df = pd.read_csv(MODULE_PATH + '/data/small_basket.csv', delimiter=',')
else:
df = pd.read_csv(MODULE_PATH + '/data/basket.csv', delimiter=',')

# Features Categorization
columns = df.columns
class_name = 'Position'
df_cols = list(df.columns)
df_cols.remove(class_name)
new_cols = [class_name] + df_cols
df = df[new_cols]

discrete = []
return generate_dataset(df, columns, class_name, discrete, 'basket', normalize)
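The new loaders (`load_flavia`, `load_phishing`, `load_iris`, `load_wine`, `load_basket`) repeat the same steps to move the class column to the front before calling `generate_dataset`. One possible refactoring is sketched here on plain column names. `class_first` and `index_features` are hypothetical helpers that do not exist in this PR; with a DataFrame the first would be applied as `df = df[class_first(df.columns, class_name)]`.

```python
def class_first(columns, class_name):
    """Return the column names with the class column moved to position 0."""
    cols = list(columns)
    cols.remove(class_name)  # raises ValueError if the class column is missing
    return [class_name] + cols


def index_features(columns, class_name):
    """Map feature position to name, mirroring idx_features in generate_dataset."""
    features = list(columns)
    features.remove(class_name)
    return dict(enumerate(features))


# Column names of the scikit-learn iris frame; 'target' is the last column.
iris_columns = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "target",
]

ordered = class_first(iris_columns, "target")
idx_features = index_features(ordered, "target")
```

Because only the class column moves, the relative order of the features is unchanged, so the feature index comes out the same whether it is built from the original or the reordered column list.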