# Data Science Ethics Checklist

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

## A. Data Collection
 - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

*The project collected raw data from published literature reporting wet-lab experiments on pharmaceutical removal from wastewater using biochar. No human subjects were involved, so informed consent was not needed.*
 - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

*Sources of bias were reduced during the data collection phase by drawing literature from four key databases of journal articles: Google Scholar, Scopus, PubMed, and Web of Science.*
 - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?

*No personally identifiable information (PII) was collected for this project and so there is no risk of PII exposure.*
 - [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

*Since the project didn't collect any data on human subjects, downstream bias mitigation was not relevant.*

## B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

*The data will remain on each team member's personal drive and in the GitHub repository for wider use. To keep the knowledge generated by this project accessible, there is currently no plan to add protection or access restrictions.*
 - [ ] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?

*This checklist item is not relevant, because the project didn't collect information on any individuals.*
 - [ ] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

*There is no such plan, because the data may be needed for future work.*

## C. Analysis
 - [ ] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?

*Engagement with team members and I-GUIDE experts was crucial for this project. It helped address missing perspectives and blind spots in the analysis of the collected data.*
 - [ ] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

*The data were examined during the data collection and analysis steps. In the collection phase, multiple databases were used. During Exploratory Data Analysis (EDA) and preprocessing, a few variables were omitted because they were irrelevant to this project.*
 - [ ] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?

*Yes, the visualizations, summary statistics and reports are designed to honestly represent the underlying data.*
 - [ ] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?

*This checklist item is not relevant for the project, since it does not collect PII on human subjects.*
 - [ ] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

*The process of generating the analysis is well documented in the GitHub repository and on the I-GUIDE platform, in case issues arise in the future.*

## D. Modeling
 - [ ] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?

*The variables have been carefully chosen so that the model doesn't rely on any variables, or proxies for variables, that are unfairly discriminatory.*
 - [ ] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?

*Model results have been tested on a held-out test dataset, treated here as a different affected group, to ensure fairness. Model evaluation has also been performed to ensure fairness across groups.*
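
A check for disparate error rates, as named in D.2, could be sketched as follows. This is a minimal illustration, not the project's evaluation code: the function name `error_by_group`, the group labels, and the removal-efficiency values are all hypothetical and made up for the example. In practice the groups might be any category recorded in the dataset.

```python
from collections import defaultdict
from statistics import mean


def error_by_group(groups, actual, predicted):
    """Return the mean absolute error for each group label."""
    errors = defaultdict(list)
    for group, a, p in zip(groups, actual, predicted):
        errors[group].append(abs(a - p))
    return {group: mean(values) for group, values in errors.items()}


# Hypothetical group labels and removal efficiencies (%), for illustration only.
groups = ["A", "A", "B", "B"]
actual = [90.0, 80.0, 70.0, 60.0]
predicted = [88.0, 84.0, 60.0, 75.0]

per_group = error_by_group(groups, actual, predicted)
print(per_group)  # {'A': 3.0, 'B': 12.5}
```

A large gap between groups, as with the made-up values here, would suggest the model serves some conditions noticeably worse than others.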
 - [ ] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?

*The effects of optimizing defined metrics and additional metrics have been considered throughout the project.*
 - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?

*Explainability of the model has been ensured with detailed notes in the Jupyter Notebook submitted for the project. In addition, a Graphical User Interface (GUI) has been developed to clearly present model and project decisions to a broader audience.*
 - [ ] **D.5 Communicate bias**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

*The shortcomings, limitations, and biases of the model have been discussed in detail with the project team. These matters will be communicated to stakeholders in later stages of the project, particularly during scientific manuscript development.*

## E. Deployment
 - [ ] **E.1 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?

*The project dataset comes from the scientific literature and helps decision makers find the best set of conditions for pharmaceutical removal using biochar. Users can make decisions based on the efficacy of the model. However, no such model can predict with 100% accuracy, so whether to adopt the model's suggestions is solely the user's choice. The data science team cannot be held responsible if the model's predictions turn out to be less effective in practice.*
 - [ ] **E.2 Roll back**: Is there a way to turn off or roll back the model in production if necessary?

*Yes, turning off or rolling back the model will be possible if necessary.*
 - [ ] **E.3 Concept drift**: Do we test and monitor for concept drift to ensure the model remains fair over time?

*To keep the model fair over time, the raw dataset needs to be updated regularly with newly published research articles. At present, there is no plan to revise the dataset frequently. The model might therefore become less relevant after several years if the dataset is not expanded with new literature.*
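
If monitoring were added later, one simple option is to compare the model's error on newly published records against its error at training time. The sketch below assumes a plain error-threshold check rather than a formal statistical drift test. The names `drift_detected` and `tolerance`, the baseline error, and the sample values are hypothetical and not part of the project.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def drift_detected(baseline_mae, new_actual, new_predicted, tolerance=1.5):
    """Flag drift when error on new data exceeds tolerance * baseline error.

    Returns a (flag, new_mae) tuple. The 1.5 default is an arbitrary
    illustrative threshold, not a recommended value.
    """
    new_mae = mean_absolute_error(new_actual, new_predicted)
    return new_mae > tolerance * baseline_mae, new_mae


# Hypothetical removal efficiencies (%) from newly published articles.
flag, new_mae = drift_detected(
    baseline_mae=4.0,
    new_actual=[80.0, 60.0, 95.0],
    new_predicted=[70.0, 75.0, 85.0],
)
print(flag)  # True: the error on new records is well above the baseline
```

A flagged result would be a prompt to retrain on the updated dataset, not proof that the underlying relationship has changed.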
 - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?

*One of the key purposes of this project team is to make scientific knowledge available to a wider audience. Therefore, no steps have been planned to identify and prevent unintended uses and abuse of the model. The dataset and the model will be made publicly available in the GitHub repository to assist users in their decision-making process.*

*Data Science Ethics Checklist generated with [deon](http://deon.drivendata.org).*