Implement individual fairness metrics #48

Open
saleiro opened this issue Sep 26, 2018 · 4 comments
saleiro commented Sep 26, 2018

This issue is about creating a new class, perhaps named "Individual", that implements individual notions of fairness based on label differences (impurities) among similar individuals. Each method of the class just needs a list of dataframes as input (let's consider that in the future we might want to compare the labels of multiple train/test sets); it finds similar data points and then looks at the label distribution of the pair/cluster.

  1. Cynthia Dwork's notion of individual fairness (the Lipschitz condition).
    sub-methods:
  • create a pairwise distance metric in feature space
  • create a pairwise distance metric in output space
  • some sort of aggregator
    e.g. count the number of times the Lipschitz condition is not met for each point, normalize, and average (see the sketch after this list)
  2. Matching methods to find similar data points and then calculate label purity.
    sub-methods:
  • create clusters (start with k-means)
  • calculate a purity metric for the labels within each cluster (output k metrics)
  • visualize the clusters (if not 2-d, use principal components?)
  • visualize the purity metric per cluster
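To make both directions concrete, here is a minimal sketch of what the two aggregators could look like. Everything here is an assumption about the eventual API: the function names, the fixed Lipschitz constant, the Euclidean distance metric, and the use of plain numpy arrays as input are all placeholders, not part of aequitas.

```python
# Hypothetical sketch only: names, metrics, and the Lipschitz constant
# are placeholders for whatever we settle on.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances


def lipschitz_violation_rates(X, y_hat, lipschitz_constant=1.0):
    """Per-point fraction of pairs violating d_out <= L * d_feat, plus the mean."""
    d_feat = pairwise_distances(X)                                 # distances in feature space
    d_out = pairwise_distances(np.asarray(y_hat).reshape(-1, 1))   # distances in output space
    violations = d_out > lipschitz_constant * d_feat               # pairwise violation mask
    np.fill_diagonal(violations, False)                            # ignore self-pairs
    per_point = violations.mean(axis=1)                            # normalize per point
    return per_point, per_point.mean()                             # ...and average


def cluster_label_purity(X, labels, k=10, random_state=0):
    """k-means the feature space; within each cluster, report the majority-label share."""
    assignments = KMeans(n_clusters=k, random_state=random_state).fit_predict(X)
    purities = {}
    for c in range(k):
        cluster_labels = np.asarray(labels)[assignments == c]
        if cluster_labels.size == 0:
            continue
        _, counts = np.unique(cluster_labels, return_counts=True)
        purities[c] = counts.max() / counts.sum()                  # purity of cluster c
    return purities
```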
saleiro commented Sep 26, 2018

The sample datasets can be used for implementing this.

anisfeld added a commit that referenced this issue Oct 1, 2018
anisfeld commented Oct 1, 2018

I'm unclear on what the output should be.

For example, Dwork et al. focus on using the Lipschitz condition as a constraint in the optimization problem. We want to take the predictions from an unconstrained algorithm and determine how far they are from satisfying the Lipschitz condition.

As a first step, would we want two matrices: one with distance measured in the feature space and a second with distance measured in the outcome space (from which we could derive a score such as the fraction of pairs of points that fail the condition)?

Similarly, would the input be the whole feature space for a prediction? This seems necessary, but undesirable, since we would require all the data.
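For concreteness, that first step might look something like the following. This is only a sketch: the Euclidean metric and the Lipschitz constant of 1.0 are placeholders, and the function names are made up for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def lipschitz_matrices(X, y_hat):
    """The two matrices: pairwise distances in feature space and in outcome space."""
    d_feat = pairwise_distances(X)
    d_out = pairwise_distances(np.asarray(y_hat).reshape(-1, 1))
    return d_feat, d_out


def fraction_of_failing_pairs(d_feat, d_out, lipschitz_constant=1.0):
    """Fraction of distinct pairs with d_out > L * d_feat."""
    iu = np.triu_indices_from(d_feat, k=1)   # each unordered pair counted once
    return float(np.mean(d_out[iu] > lipschitz_constant * d_feat[iu]))
```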

anisfeld commented
@saleiro I've outlined how I think this should work in the issue ticket. Does that outline of functions seem reasonable? Shall I start with the clustering?

saleiro commented Oct 12, 2018

@anisfeld I suggest that you create a README file within the individual module and outline exactly what you are going to implement. Let's abstract away whether the features are passed in the dataframe or not; it's up to the end user to decide what representation she wants to use. Users can even pass different dataframes based on different representations, train/test splits over time, etc.
