# Data Science Ethics Checklist

## Team Composition
------
 - [ ] Does our team reflect diversity of opinions, backgrounds, and kinds of thought?

## Data Collection
------
 - [ ] If there are human subjects, have they given informed consent, where subjects clearly understand what they are consenting to and there was a mechanism in place for gathering consent?
 - [ ] Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [ ] Have we ensured that data collection does not collect personally identifiable information (PII) that is not relevant to the analysis?
 - [ ] Has the data been appropriately anonymized before being stored?
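
As one illustration of the PII and anonymization items above, the sketch below drops direct identifiers that the analysis does not need and replaces a linking identifier with a keyed hash before storage. The field names (`name`, `email`, `user_id`, `age_band`) and the key are hypothetical. Note that keyed hashing is pseudonymization rather than full anonymization: remaining quasi-identifiers may still allow re-identification, so this is a starting point, not a guarantee.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to the actual schema.
DROP_FIELDS = {"name", "email"}      # direct identifiers not needed for the analysis
PSEUDONYMIZE_FIELDS = {"user_id"}    # identifiers needed only to link records


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of an identifier as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict, key: bytes) -> dict:
    """Drop unneeded PII and replace linking identifiers with keyed digests."""
    cleaned = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # never store fields the analysis does not need
        if field in PSEUDONYMIZE_FIELDS:
            cleaned[field] = pseudonymize(str(value), key)
        else:
            cleaned[field] = value
    return cleaned


# Placeholder key for illustration only; a real key would come from a secrets store.
key = b"replace-with-a-secret-key"
raw = {"user_id": "u-123", "name": "Ada", "email": "ada@example.com", "age_band": "30-39"}
scrubbed = scrub_record(raw, key)
```

Using a keyed hash instead of a plain hash means someone without the key cannot rebuild the mapping by hashing guessed identifiers.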

## Data Storage
------
 - [ ] Do we have a plan to protect and secure data (e.g. encryption, up-to-date software)?
 - [ ] Have we set up appropriate access controls and means to monitor access?
 - [ ] Do we have a mechanism through which an individual can request their personal information be removed?
 - [ ] Is there a schedule or plan to delete the data after it is no longer needed?

## Exploratory Analysis
------
 - [ ] Have we studied and understood possible sources of bias in our data?
 - [ ] Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
 - [ ] Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [ ] Is the process of generating the analysis auditable if we discover issues in the future?

## Modeling
------
 - [ ] Have we sought to address blind spots in the analysis through engagement with relevant stakeholders (e.g. affected community and subject matter experts)?
 - [ ] Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [ ] Have we tested model results for fairness with respect to different affected groups (e.g. tested for disparate error rates)?
 - [ ] Have we considered the effects of the assumptions built into our model and the effects of optimizing for our defined metrics?
 - [ ] Can we explain in understandable terms a decision the model made in cases where a justification is needed?
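
The fairness item above mentions testing for disparate error rates. A minimal sketch of one way to do that for a binary classifier follows: it computes false positive and false negative rates per group so they can be compared side by side. The function name, the group labels, and the toy data are hypothetical, and which gaps count as unacceptable is a judgment this code does not make.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels (0 or 1)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1 and pred == 1:
            counts[group]["tp"] += 1
        elif truth == 1 and pred == 0:
            counts[group]["fn"] += 1
        elif truth == 0 and pred == 1:
            counts[group]["fp"] += 1
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        rates[group] = {
            # None marks a rate that is undefined because the group has no such cases.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Toy data for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = error_rates_by_group(y_true, y_pred, groups)
# rates["a"] == {"fpr": 0.0, "fnr": 0.5}
# rates["b"] == {"fpr": 1.0, "fnr": 0.5}
```

In this toy data both groups have the same false negative rate, but every negative case in group `b` is misclassified, which is the kind of gap the checklist item asks the team to look for.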

## Deployment
------
 - [ ] Is there a way to turn off or roll back the model in production if necessary?
 - [ ] Do we have a mechanism for redress if users are harmed by the results?
 - [ ] Do we test and monitor for model drift to ensure the model remains fair over time?
 - [ ] Have we taken steps to identify and prevent abuse, unintended uses, and malicious attacks on the model?
 - [ ] Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?
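
For the drift-monitoring item above, one common approach is to compare the distribution of a feature or model score in recent production data against a baseline. The sketch below uses the Population Stability Index (PSI) as an example metric; the checklist does not prescribe it. The bin edges, sample values, and `ALERT_THRESHOLD` are assumptions chosen for illustration. PSI only flags distribution shift, so checking fairness over time would also mean computing it, or group-level error metrics, separately for each affected group.

```python
import math


def bin_fractions(values, bin_edges, eps=1e-6):
    """Fraction of values in each bin, floored at eps to avoid log(0)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Bins are half-open [lo, hi), except the last, which includes its upper edge.
            if bin_edges[i] <= v < bin_edges[i + 1] or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
        # Values outside the bin edges are ignored in this sketch.
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [max(c / total, eps) for c in counts]


def population_stability_index(baseline, recent, bin_edges):
    """PSI = sum over bins of (recent - baseline) * ln(recent / baseline)."""
    base = bin_fractions(baseline, bin_edges)
    new = bin_fractions(recent, bin_edges)
    return sum((n - b) * math.log(n / b) for b, n in zip(base, new))


# Hypothetical model scores and threshold, for illustration only.
ALERT_THRESHOLD = 0.2
edges = [0.0, 0.5, 1.0]
baseline_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
recent_scores = [0.2, 0.6, 0.7, 0.8, 0.9, 0.95]

unchanged = population_stability_index(baseline_scores, baseline_scores, edges)
drifted = population_stability_index(baseline_scores, recent_scores, edges)
needs_review = drifted > ALERT_THRESHOLD
```

A check like this could run on a schedule, with a value above the chosen threshold prompting the team to review the model and, if needed, use the rollback mechanism.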


