CRN: 62868
lecture: 1:40-3 pm TR
discussion: 4:10-5 pm R
4 units
Norm Matloff
Dept. of Computer Science
University of California, Davis
matloff@cs.ucdavis.edu
(my bio)
Are machine learning (ML) algorithms biased against minorities and women?
-
A 2016 ProPublica article investigated COMPAS, an ML algorithm designed to predict recidivism by those convicted of crimes. The article found the tool to be racially biased, a major concern since judges were using it as an aid in sentencing.
-
Actually, racial, gender and other biases in ML are commonplace.
-
What tools have been developed to detect and remedy bias?
-
Students both from within and from outside computer science are encouraged to enroll.
-
Since students will work in groups, the strengths of some group members may complement those of others.
-
However, everyone is assumed to have some minimal background in:
-
Coding: Loops, functions, if-else. We will use R; see my quick tutorial. (A short R sketch of the expected level appears just after this list.)
-
Statistics: Bayes' Rule for probabilities; confidence intervals. Some exposure to linear regression models would be helpful.
-
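To give a sense of the expected level, here is a small R sketch using only loops, functions, if-else and a basic confidence interval; the data are simulated, purely for illustration.

   # simulated data, purely to illustrate the expected coding/statistics level
   countAbove <- function(x, cutoff) {
      n <- 0
      for (xi in x) {
         if (xi > cutoff) n <- n + 1   # if-else-level control flow
      }
      n
   }

   set.seed(9999)
   x <- rnorm(100, mean = 50, sd = 10)
   countAbove(x, 60)   # how many values exceed 60?

   # approximate 95% confidence interval for the population mean
   mean(x) + c(-1.96, 1.96) * sd(x) / sqrt(length(x))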
A firm prereq is a common-sense understanding of proportions, e.g. the difference between the proportion of x among y and the proportion of y among x. Amazingly, many people lack this; for example, many serious conceptual errors have been made regarding Covid-19.
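As a quick illustration (with made-up numbers of my own): if 10 of a firm's 200 engineers are women, and the firm employs 40 women in all, then the proportion of women among engineers is 10/200 = 5%, while the proportion of engineers among women is 10/40 = 25%. In R:

   # made-up counts, solely to illustrate the distinction
   womenEngineers <- 10; engineers <- 200; women <- 40
   womenEngineers / engineers   # proportion of women among engineers: 0.05
   womenEngineers / women       # proportion of engineers among women: 0.25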
-
Previous background in ML is not required; an overview will be provided for those without it. If you have had an ML course, that is fine, but I guarantee that you will still learn a lot of new things about its practical application.
We will begin with the ProPublica investigation:
These articles involve various technical methods and concepts, such as the logistic model and predictive parity. We will explain these terms as we reach them, and will also treat all of the above COMPAS sources as motivating examples for the more general material that follows. That material will be drawn from sources such as the online book by Barocas et al., the notes by Fraenkel, and miscellaneous papers in the fair ML area.
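For concreteness, here is a minimal sketch of what checking predictive parity with a logistic model might look like in R. The data are simulated and the variable names (priorsCount, twoYearRecid, etc.) are placeholders, not the actual COMPAS fields; this shows the shape of the analysis, not the method we will ultimately use.

   # simulated placeholder data, NOT the real COMPAS file
   set.seed(168)
   n <- 1000
   d <- data.frame(
      age = sample(18:60, n, replace = TRUE),
      priorsCount = rpois(n, 2),
      race = sample(c("groupA", "groupB"), n, replace = TRUE)
   )
   d$twoYearRecid <- rbinom(n, 1, plogis(-1 + 0.5 * d$priorsCount))

   # logistic model predicting two-year recidivism
   fit <- glm(twoYearRecid ~ age + priorsCount, data = d, family = binomial)

   # call a case "high risk" if the estimated probability exceeds 0.5
   highRisk <- predict(fit, type = "response") > 0.5

   # predictive parity: among cases labeled high risk, is the observed
   # recidivism rate roughly the same across groups?
   tapply(d$twoYearRecid[highRisk], d$race[highRisk], mean)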
We will also need some supporting materials now and then, such as the excellent paper on Simpson's Paradox, Good for Men, Good for Women, Bad for People, which we will cover in part.
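To preview the idea with made-up numbers of my own (not those in the paper): a treatment can look far worse in the aggregate even though it is better within every subgroup, when group membership is related to both the treatment and the outcome.

   # made-up counts, only to illustrate the reversal
   mild   <- matrix(c(90, 100, 800, 1000), nrow = 2,
                    dimnames = list(c("successes", "patients"), c("A", "B")))
   severe <- matrix(c(300, 1000, 20, 100), nrow = 2,
                    dimnames = list(c("successes", "patients"), c("A", "B")))

   mild["successes", ] / mild["patients", ]       # A: 0.90, B: 0.80
   severe["successes", ] / severe["patients", ]   # A: 0.30, B: 0.20

   # aggregated over the two groups, the comparison reverses
   total <- mild + severe
   total["successes", ] / total["patients", ]     # A: ~0.35, B: ~0.75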
Letter grade.
Tentative breakdown:
-
Quizzes (6-7) (individual): 30%
-
Written and data analysis assignments (group): 30%
-
Presentation on a special topic (group): 20%
-
Term project (group): 20%
So, most graded work will be done on a group basis. Choose your own group if you know people in the class; otherwise, the TA will assign you to one.
There are a number of methods for fair ML, and several R packages, such as this one by Scutari. They all come with included datasets; Scutari's package includes the COMPAS data, the German Credit data (is there bias against women in lending?), and so on. I also have a number of datasets of my own, and my own packages, such as this one.
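As a hint of how a first look at the lending question might start (with simulated placeholder data and made-up column names, not the actual German Credit fields):

   # simulated placeholder data; the real German Credit columns differ
   set.seed(2022)
   n <- 500
   credit <- data.frame(
      gender = sample(c("female", "male"), n, replace = TRUE),
      goodRisk = rbinom(n, 1, 0.7)
   )

   # proportion rated a good credit risk, by gender (a raw comparison only;
   # the course covers methods that go well beyond this)
   with(credit, prop.table(table(gender, goodRisk), margin = 1))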
One of the group assignments will probably involve writing an R package implementing a new fair ML idea.
-
A strong appreciation for ethical issues in ML.
-
A solid understanding of the practical aspects of ML (improved understanding even if you've taken ML courses).
-
An improved understanding of the practical aspects of statistics (even if you are a stat major).