A website to help explain fair machine learning. See this video for a brief overview and demonstration:
(Please note, these instructions are a work in progress.)
- Python >= 3.5 (Assumed environment: Anaconda)
- Postgres
Ensure you have Python 3.5 or greater installed
- Recommended: Install Anaconda, which includes almost all of the packages F4A needs to run.
- Optional intermediate step: set up a virtual environment.
- Use "pip install -r requirements.txt" or "conda install --file requirements.txt" to install all other necessary packages.
Download and install Postgres 13
- Set up your username and password
- Load your username/password into the database.ini file
- If necessary, modify the database name to the name of your choosing (recommended - "F4A")
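
As a rough illustration of how the credentials in database.ini might be read, here is a minimal sketch using Python's standard configparser module. The section name "postgresql", the key names, and the helper load_db_config are assumptions made for this example; check the database.ini shipped with the repository for the layout F4A actually expects.

```python
import configparser
import os
import tempfile


def load_db_config(path, section="postgresql"):
    """Read one section of an INI file into a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError("could not read {}".format(path))
    if not parser.has_section(section):
        raise KeyError("section [{}] not found in {}".format(section, path))
    return dict(parser.items(section))


# Assumed layout, for illustration only; the real file may use other keys.
SAMPLE_INI = """[postgresql]
host = localhost
database = F4A
user = your_username
password = your_password
"""

# Write the sample to a temporary folder so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    ini_path = os.path.join(tmp, "database.ini")
    with open(ini_path, "w") as fh:
        fh.write(SAMPLE_INI)
    config = load_db_config(ini_path)

print(config["database"])
```

Keeping the credentials in a separate file means they can differ per machine without touching the application code.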
Confirm that your Python 3.5+ installation is the one in use (either globally via the PATH variable or through an activated virtual environment) by opening Python in a command prompt and checking that the version shown at the top is correct.
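
The same check can be done from inside the interpreter. The following is a small sketch, not part of F4A; the helper name python_is_supported is made up for this example.

```python
import sys

MIN_VERSION = (3, 5)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Print the full version string and whether it satisfies the requirement.
print(sys.version)
print("supported:", python_is_supported())
```

If this reports an unsupported version, the wrong interpreter is probably first on PATH, or the virtual environment is not activated.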
1.) Clone this repository, or download the code into a single folder.
2.) Download necessary packages for frontend and put them into the "static/" folder.
- Tabulator v4.9 (older/newer versions may work, but 4.9 certainly works)
- Tooltipster master branch
- Plotly min js file
3.) Modify the database.ini file to match your database's credentials
4.) Download data and load the CSV files into the "datasets/" folder. Suggested source, which also serves as a template for the data format: http://okray.ml/data (currently available: Credit Default dataset, COMPAS dataset 2).
Each dataset requires:
- 1.) The dataset itself (as a CSV file for now)
- 2.) r rows of training index sets, where the number of columns is the number of instances n in the training set and each entry is the index of a training sample.
- e.g. for a 5 sample dataset, with a 60/40 training/testing split and 3 rows of training sample indices:

      1,2,3
      0,2,4
      0,1,4

  meaning the internal code will set the testing sample indices as:

      0,4
      1,3
      2,3
Why do this?
Static, initially randomly generated training/testing sets are necessary for two reasons:
1.) To ensure that different feature/hyperparameter combinations are compared against one another fairly
2.) To ensure that the results of the algorithm are static, so we can load them into/out of the database for faster lookup times. This is especially important for more complex algorithms, where recomputing results is slow.
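
One way to produce such static index sets is to draw them once with a fixed random seed and save the result. The sketch below assumes a comma-separated output with one training set per row; the function names, the seed value, and the exact file layout are assumptions for illustration, not the project's actual tooling.

```python
import csv
import io
import random


def make_training_index_rows(n_samples, train_fraction, n_rows, seed):
    """Draw n_rows random training index sets, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    n_train = int(round(n_samples * train_fraction))
    rows = []
    for _ in range(n_rows):
        # Sample without replacement, then sort for readability.
        rows.append(sorted(rng.sample(range(n_samples), n_train)))
    return rows


def rows_to_csv(rows):
    """Serialize the index rows as CSV text, one training set per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = make_training_index_rows(n_samples=5, train_fraction=0.6, n_rows=3, seed=0)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Because the seed is fixed, rerunning the script yields the same rows, so every feature/hyperparameter combination is evaluated on identical splits and cached results stay valid.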
5.) Set up the database
- Open the db_init folder and run the "create.sql" file in the database of your choosing, either in PGAdmin or with an application like DBeaver. This will make the necessary database tables.
- Optionally: Run the "load.sql" file, either unmodified or with changes of your choosing.
The project is currently only set up to run in development environments. See the Flask quickstart at https://flask.palletsprojects.com/en/1.1.x/quickstart/ for basic instructions. All that's really needed is to run the command "flask run" in the cloned directory.
Currently, the project is in a stable state and can be deployed locally. However, changes are still being pushed rapidly, and it's recommended you run the db_init/ scripts on every pull. The roadmap for this project is tracked through a Notion project here: https://www.notion.so/F4A-645d588e7b194366b05855778bce17ea
Thank you for considering contributing! If you're relatively new to programming, machine learning, etc., I would love to help you get into the project. As of now the development team is two people, so we have no need for a Slack, Discord, etc. Please email me and we can discuss any questions, and in the future a more centralized platform may be set up depending on the number of contributors.
If you're an experienced programmer, we'd love to have you as well! Feel free to contact me with any questions, comments, or concerns; otherwise I look forward to your PRs!