Implement individual fairness metrics #48
Comments
The sample datasets can be used for implementing this.
I'm unclear on what the output should be. For example, Dwork et al. use the Lipschitz condition as a constraint on the optimization function. We want to take the predictions from an unconstrained algorithm and determine how far they are from satisfying the Lipschitz condition. As a first step, would we want two matrices, one with distances measured in the feature space and a second with distances measured in the outcome space (from which we could compute a score such as the fraction of pairs of points that fail the condition)? Similarly, would the input be the whole feature space for a prediction? This seems necessary but undesirable, since it would require all the data.
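A minimal sketch of the two-matrix idea, assuming Euclidean distance in both spaces, model scores as the outcomes, and a hypothetical Lipschitz constant `lipschitz_const` (default 1). A pair (x, y) fails the condition when |f(x) - f(y)| > L · d(x, y). The function names are illustrative, not an agreed API.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X (an n x n matrix)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input (e.g. scores) as a single column
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def lipschitz_violation_fraction(features, scores, lipschitz_const=1.0):
    """Fraction of unordered pairs whose outcome distance exceeds
    lipschitz_const times their feature distance (assumed metrics)."""
    d_feat = pairwise_distances(features)  # distances in the feature space
    d_out = pairwise_distances(scores)     # distances in the outcome space
    violated = d_out > lipschitz_const * d_feat
    iu = np.triu_indices(len(d_feat), k=1)  # each pair once, diagonal excluded
    return float(violated[iu].mean())

# Three points on a line; the third gets a much higher score than its neighbours.
print(lipschitz_violation_fraction([[0], [1], [2]], [0.0, 0.5, 3.0]))
```

Note that both matrices are n x n, which makes concrete the concern above: the check needs all the data at once and grows quadratically with the number of rows.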
@saleiro I've outlined how I think this should work in the issue ticket. Does that outline of functions seem reasonable? Should I start with the clustering?
@anisfeld I suggest that you create a README file within the individual module and outline exactly what you are going to implement. Let's abstract away whether or not the features are passed in the df. It's up to the end user to decide what she wants to use as a representation. She can even pass different dfs based on different representations, train/test splits over time, etc.
This issue is about creating a new class, perhaps named "Individual", that implements individual notions of fairness based on label differences (impurities) among similar individuals. Each method of the class just needs a list of dataframes as input (bear in mind that in the future we might want to compare the labels of multiple train/test sets), finds similar data points, and then looks at the label distribution of each pair/cluster.
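One way the proposed class could look, as a hypothetical sketch: the `Individual` name comes from the comment above, but the constructor arguments (`feature_cols`, `label_col`), the method names, the use of Euclidean nearest neighbours for "pairs", and Gini impurity for "clusters" are all assumptions. Cluster assignments are read from a caller-supplied column so that the choice of representation stays with the end user.

```python
import numpy as np
import pandas as pd

class Individual:
    """Hypothetical individual-fairness auditor; every method takes a list of dfs."""

    def __init__(self, feature_cols, label_col):
        self.feature_cols = list(feature_cols)
        self.label_col = label_col

    def nn_label_disagreement(self, dfs):
        """Per df: fraction of rows whose nearest other row (Euclidean distance
        on feature_cols, an assumed metric) has a different label."""
        scores = []
        for df in dfs:
            X = df[self.feature_cols].to_numpy(dtype=float)
            y = df[self.label_col].to_numpy()
            d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
            np.fill_diagonal(d, np.inf)  # a row cannot be its own neighbour
            nn = d.argmin(axis=1)        # index of each row's nearest neighbour
            scores.append(float((y != y[nn]).mean()))
        return scores

    @staticmethod
    def _gini(labels):
        """Gini impurity 1 - sum(p_k^2) of a Series of labels."""
        p = labels.value_counts(normalize=True)
        return 1.0 - float((p ** 2).sum())

    def cluster_impurity(self, dfs, cluster_col):
        """Per df: size-weighted mean Gini impurity of labels within the
        clusters in cluster_col (clustering itself is done by the caller)."""
        scores = []
        for df in dfs:
            per_cluster = df.groupby(cluster_col)[self.label_col].apply(self._gini)
            weights = df.groupby(cluster_col).size() / len(df)
            scores.append(float((per_cluster * weights).sum()))
        return scores

df = pd.DataFrame({"x": [0.0, 0.1, 5.0, 5.1],
                   "label": [0, 1, 1, 1],
                   "cluster": ["a", "a", "b", "b"]})
auditor = Individual(["x"], "label")
print(auditor.nn_label_disagreement([df]), auditor.cluster_impurity([df], "cluster"))
```

Passing several dataframes (for example a train set and a test set) returns one score per dataframe, which leaves room for the train/test comparisons mentioned above.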
sub methods:
e.g., count the number of times the Lipschitz condition is not met for each point, then normalize and average?
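The count-normalize-average idea could be sketched as follows, assuming Euclidean feature distance, absolute score difference as the outcome distance, a hypothetical constant `lipschitz_const`, and normalization by the n - 1 other points. The function name is illustrative only.

```python
import numpy as np

def per_point_violation_rates(features, scores, lipschitz_const=1.0):
    """Return (counts, rates, mean_rate): for each point, how many other points
    it violates the Lipschitz condition with, that count divided by n - 1,
    and the average of those rates. Requires at least two points."""
    X = np.asarray(features, dtype=float).reshape(len(features), -1)
    s = np.asarray(scores, dtype=float)
    d_feat = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    d_out = np.abs(s[:, None] - s[None, :])
    # Diagonal compares 0 > 0, so a point never counts as violating with itself.
    violated = d_out > lipschitz_const * d_feat
    counts = violated.sum(axis=1)
    rates = counts / (len(s) - 1)
    return counts, rates, float(rates.mean())

counts, rates, mean_rate = per_point_violation_rates([0, 1, 2], [0.0, 0.5, 3.0])
print(counts, rates, mean_rate)
```

Because each violating pair is counted once for each of its two points, the average of the normalized per-point rates equals the fraction of violating pairs; the per-point rates add the ability to see which individuals are treated most inconsistently.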
sub methods: