# <div align="center"> SPECIAL TOPICS III </div>
## <div align="center"> Data Science for Social Scientists  </div>
### <div align="center"> ECO 4199 </div>
#### <div align="center">Class 11 - Social Biases and Prediction</div>
<div align="center"> Jonathan Holmes, (he/him)</div>

## Brief Review 

$$y = f(x) + \varepsilon$$

Machine learning is: 
1. Different functions that approximate $f(x)$
2. Methods to estimate $\hat{f}(x)$ 
3. Measures to identify how close $\hat{f}$ got to the "true" $f$



## Machine Learning Methods in One Expression

$$\min \underbrace{\sum_{i=1}^n L(f(x_i), y_i)}_{\text{in-sample loss}} 
     + \underbrace{\lambda \sum_{k=1}^K P(\beta_k)}_{\text{complexity restriction}}$$
     
Equivalently (for the mathematically inclined): 

$$\min \underbrace{\sum_{i=1}^n L(f(x_i), y_i)}_{\text{in-sample loss}} \text{ over } \underbrace{f \in F}_{\text{function class}} \text{ subject to } \underbrace{R(f) \leq c}_{\text{complexity restriction}}$$
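To make the objective concrete, here is a minimal sketch that evaluates it for a given coefficient vector. It assumes a linear $f(x) = X\beta$ with no intercept and a squared-error loss $L$, and shows two common choices of $P(\beta_k)$ (absolute value and square). The function name `penalized_objective` is hypothetical, chosen for illustration.

```python
import numpy as np

def penalized_objective(beta, X, y, lam, penalty="lasso"):
    """In-sample squared-error loss plus lam times a complexity penalty.

    Assumes a linear f(x) = X @ beta with no intercept and squared-error loss.
    """
    residuals = X @ beta - y
    in_sample_loss = np.sum(residuals ** 2)
    if penalty == "lasso":
        complexity = np.sum(np.abs(beta))   # P(beta_k) = |beta_k|
    elif penalty == "ridge":
        complexity = np.sum(beta ** 2)      # P(beta_k) = beta_k^2
    else:
        complexity = 0.0                    # no restriction, as in OLS
    return in_sample_loss + lam * complexity

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])   # fits these three points exactly
print(penalized_objective(beta, X, y, lam=0.5, penalty="lasso"))  # 0 + 0.5 * 3 = 1.5
print(penalized_objective(beta, X, y, lam=0.5, penalty="ridge"))  # 0 + 0.5 * 5 = 2.5
```

Even a perfect in-sample fit pays a price for large coefficients; minimizing this sum trades off fit against complexity, with $\lambda$ setting the exchange rate.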


## Machine Learning Methods in One Expression

$$\min \underbrace{\sum_{i=1}^n L(f(x_i), y_i)}_{\text{in-sample loss}} 
     + \underbrace{\lambda \sum_{k=1}^K P(\beta_k)}_{\text{complexity restriction}}$$

### The choice of functions $f$ 
- Linear models ($f(X) = \beta_0 + \beta_1 X_1 + \dots + \beta_K X_K$)
- Logistic models
- Neural networks

Other models we have not covered in depth:
- Tree models
- Random forest
- K-nearest neighbour
- Support vector machines




## Machine Learning Methods in One Expression
$$\min \underbrace{\sum_{i=1}^n L(f(x_i), y_i)}_{\text{in-sample loss}} 
     + \underbrace{\lambda \sum_{k=1}^K P(\beta_k)}_{\text{complexity restriction}}$$


### The set of loss functions
- Mean-squared error: $L(\hat{f}(x_i), y_i) = (\hat{f}(x_i) - y_i)^2$, averaged as $\frac{1}{n} \sum_{i=1}^n (\hat{f}(x_i) - y_i)^2$
- Mean absolute error: $L(\hat{f}(x_i), y_i) = |\hat{f}(x_i) - y_i|$, averaged as $\frac{1}{n} \sum_{i=1}^n |\hat{f}(x_i) - y_i|$

Other loss functions we have not covered:
- Log Loss
- Huber Loss
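The two core loss functions can be computed in a few lines. This is a minimal NumPy sketch; the helper names `mse` and `mae` are our own. The toy example has one large error to show that squaring punishes big mistakes much more than taking absolute values does.

```python
import numpy as np

def mse(y_hat, y):
    """Mean-squared error: average of squared prediction errors."""
    return np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2)

def mae(y_hat, y):
    """Mean absolute error: average of absolute prediction errors."""
    return np.mean(np.abs(np.asarray(y_hat) - np.asarray(y)))

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.0, 2.0, 3.0, 8.0])  # a single error of size 4
print(mse(y_hat, y))  # 16 / 4 = 4.0
print(mae(y_hat, y))  # 4 / 4 = 1.0
```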




## Machine Learning Methods in One Expression
$$\min \underbrace{\sum_{i=1}^n L(f(x_i), y_i)}_{\text{in-sample loss}} 
     + \underbrace{\lambda \sum_{k=1}^K P(\beta_k)}_{\text{regularization}}$$
     
Complexity restriction: 
1. Nothing (OLS, basic neural networks, etc.)
2. Lasso: $P(\beta_k) = |\beta_k|$
3. Ridge: $P(\beta_k) = (\beta_k)^2$

We call these restrictions __REGULARIZATION__
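Ridge regression with squared-error loss happens to have a closed-form solution, $\hat{\beta} = (X'X + \lambda I)^{-1} X'y$, which makes the effect of regularization easy to see. The sketch below assumes centred data (no intercept) and uses simulated data; the lasso has no closed form in general and is usually solved iteratively, so it is not shown here.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y.

    lam = 0 gives ordinary least squares (when X'X is invertible).
    No intercept is fitted; assume X and y are already centred.
    """
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Simulated data with known coefficients (2, -1, 0.5)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

for lam in [0.0, 10.0, 1000.0]:
    print(lam, np.round(ridge_coefficients(X, y, lam), 2))
```

As $\lambda$ grows, the estimated coefficients shrink toward zero: the model is restricted to be "simpler".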

     

## Mix-and-match

$$\min \underbrace{\sum_{i=1}^n L(f(x_i), y_i)}_{\text{in-sample loss}} 
     + \underbrace{\lambda \sum_{k=1}^K P(\beta_k)}_{\text{regularization}}$$


We have already seen one combination:
- Linear regression + Lasso Regularization = Lasso Regression

But you can also mix-and-match: 
- Neural network + Lasso regularization 
- Logistic regression + Ridge regularization
- Etc. 
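As an illustration of mixing and matching, here is a minimal sketch of a logistic model combined with a ridge penalty, fitted by plain gradient descent on simulated data. The function names, learning rate and number of iterations are our own choices for illustration; in practice a library would do this (for example, scikit-learn's `LogisticRegression` applies an L2, i.e. ridge, penalty by default).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_ridge(X, y, lam, lr=0.5, n_iter=2000):
    """Logistic model + ridge penalty, fitted by plain gradient descent.

    Objective: sum_i log-loss(sigmoid(x_i @ beta), y_i) + lam * sum_k beta_k^2.
    The gradient is divided by n only to keep the step size stable.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)                    # predicted probabilities
        grad = X.T @ (p - y) + 2 * lam * beta    # gradient of log-loss + penalty
        beta -= lr * grad / n
    return beta

# Simulated binary outcomes with true coefficients (2, -2)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (rng.random(200) < sigmoid(X @ np.array([2.0, -2.0]))).astype(float)

print(np.round(fit_logistic_ridge(X, y, lam=0.0), 2))    # unpenalized logistic regression
print(np.round(fit_logistic_ridge(X, y, lam=100.0), 2))  # coefficients shrunk toward zero
```

Only the penalty term in the gradient changes between the two fits; swapping it for a lasso penalty would give "logistic regression + lasso".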

## Accuracy of the model

Measures: 
- Core methods: Mean-squared error, mean absolute error
- We have not seen: Log loss, Huber loss, etc
- Corrected statistics: Adjusted $R^2$, AIC, BIC
- Cross-validation

Again, you can use any of these statistics depending on your needs! 
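Cross-validation estimates out-of-sample accuracy by repeatedly holding out part of the data. Below is a minimal k-fold sketch written from scratch in NumPy, used to compare polynomial models of different degrees on simulated data where the true $f$ is cubic. The helpers `k_fold_mse` and `poly_fitter` are hypothetical names for illustration.

```python
import numpy as np

def k_fold_mse(x, y, fit, predict, k=5, seed=0):
    """Average out-of-sample MSE across k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    fold_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)       # every row not in this fold
        model = fit(x[train_idx], y[train_idx])       # estimate on k-1 folds
        y_hat = predict(model, x[test_idx])           # predict the held-out fold
        fold_errors.append(np.mean((y_hat - y[test_idx]) ** 2))
    return np.mean(fold_errors)

def poly_fitter(degree):
    """Return a fit function for a polynomial of the given degree."""
    return lambda x, y: np.polyfit(x, y, degree)

# Simulated data: true f(x) = x^3 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = x ** 3 + rng.normal(scale=0.5, size=200)

for degree in [1, 3, 6]:
    print(degree, k_fold_mse(x, y, poly_fitter(degree), np.polyval))
```

The same loop works for any pair of `fit`/`predict` functions, which is the idea behind trying several function classes and keeping the one with the lowest cross-validated error.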



## Function Classes
- Recall that no single function class performs best on every predictive task
- If you want to know which one performs best on your task, a good approach is to try several and compare them
- It turns out that [PyCaret](https://pycaret.org/) can do this for you
- I will briefly cover their [tutorial](https://github.com/pycaret/pycaret/tree/master/tutorials)

# Concerns About Machine Learning

Examples: 
- Deepfakes, disinformation, spam 
- Re-creating art: 
    - Fewer jobs/career prospects in some industries
    - What will happen to the economy? 
    - Plagiarism? 
- Algorithms running amok (self-driving car crashes?)
    - Who is legally responsible when a self-driving car crashes? 
- Biased models! (Racism, sexism, etc)
    - The model reflects the data, easy to exclude some ideas/populations/etc. 
- AI Singularity, AIs taking over 
- We don't understand why AIs do what they do, unintended consequences? 
   


# Concerns About Machine Learning

1. ML models can be _biased_ (algorithmic bias)
2. We may not understand why ML models do what they do. 
3. We may lose control of machine learning models (robot takeover)
4. ML will replace humans (loss of jobs)
5. ML may centralize economic power in a small number of companies
6. ML models copy the work of people who are uncompensated


## This class: ML and data-induced biases

1. Sample-induced bias
2. Machines replicating society's bias
3. Incomplete feature sets



![](https://4.img-dpreview.com/files/p/E~TS590x0~articles/4871415337/googlebrain.jpeg)

## ML and Upsampling
- The way to do this is beyond the scope of this lecture
- But the method isn't very different from what we have learned
- The upsampling in the previous example takes a matrix X of dimension 8x8
- It outputs a matrix Y of dimension 32x32
$$\mathbf{Y} = f(\mathbf{X})$$
- Each dimension grows by a factor of 4, so every input pixel corresponds to a 4x4 block of 16 output pixels
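Going from an 8x8 to a 32x32 matrix means each input pixel maps to a 4x4 block of 16 output pixels. The simplest possible "upsampler" just copies each pixel into its block, as in this NumPy sketch on a random stand-in image. This is not what the neural network does (it predicts new detail inside each block); it only illustrates the dimensions of $\mathbf{Y} = f(\mathbf{X})$.

```python
import numpy as np

def nearest_neighbour_upsample(X, factor=4):
    """Copy each pixel into a factor x factor block (no learning involved)."""
    return np.kron(X, np.ones((factor, factor)))

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(8, 8))   # stand-in 8x8 greyscale image
Y = nearest_neighbour_upsample(X)
print(Y.shape)                          # (32, 32)
```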

## ML training
- A deep learning neural network was trained to predict a higher-resolution picture
- To do this, we feed the NN a low-resolution picture as input and use the corresponding high-resolution picture as the target
- From this data, the NN learns which pixels it should output given a low-resolution picture
- How good the network is depends on how close its outputs get to the targets in the training data
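One common way to build such training pairs, assumed here for illustration rather than taken from the original project, is to start from high-resolution images and create the low-resolution inputs by averaging each 4x4 block. The sketch below shows that data preparation step; the helper names are hypothetical.

```python
import numpy as np

def downsample(Y, factor=4):
    """Average each factor x factor block: a 32x32 image becomes 8x8."""
    h, w = Y.shape
    return Y.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def make_training_pairs(high_res_images, factor=4):
    """Input = low-resolution copy, target = original high-resolution image."""
    X = np.stack([downsample(img, factor) for img in high_res_images])
    Y = np.stack(high_res_images)
    return X, Y

images = [np.arange(1024, dtype=float).reshape(32, 32)]  # toy 32x32 "image"
X, Y = make_training_pairs(images)
print(X.shape, Y.shape)  # (1, 8, 8) (1, 32, 32)
```

Whatever faces are in `high_res_images` are the only faces the network ever sees as targets, which matters for the bias discussion that follows.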

Who is this guy?

![](obama.png)

## ML test
- As should be clear by now, ML algorithms are only as good as their out-of-sample predictions
- Seeing the picture from the previous slide, it is obvious to us that this low-resolution picture represents Obama
- We can therefore reconstruct the image from our memory of Obama's features
- The trained NN cannot do this: it can only output a new matrix Y based on the weights that best fit its training data

![](https://cdn.vox-cdn.com/thumbor/MXX-mZqWLQZW8Fdx1ilcFEHR8Wk=/55x85:768x536/1820x1213/filters:focal(336x236:464x364):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/66972412/face_depixelizer_obama.0.jpg)

## ML and out of sample prediction
- Did the algorithm fail?
- If you were shown only the right picture and knew it was generated by a computer, you would probably be very impressed
- Compared to the left picture, it is outrageously wrong
- But is this enough to speak of bias?
- Here are a few other out of sample predictions

![](https://pbs.twimg.com/media/Ea-8T2NXkAEfH6y?format=png&name=900x900)

![](https://pbs.twimg.com/media/Ea_AGceXYAYg4KT?format=jpg&name=medium)

## Sample induced bias
- The model has learned to mimic the data on which it is based
- If training data is full of white faces, then the model will re-create white faces

### Solution: 
- Train the model on a sample that is _representative_ of the population at large
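When collecting new data is not possible, one stopgap is to resample the existing training data so each group's share matches its share in the population. The sketch below assumes group labels are known for every training example and that target population shares are given; the function name is hypothetical. Resampling with replacement only duplicates existing examples, so it cannot add variety the data never contained; a genuinely representative sample remains the better fix.

```python
import numpy as np

def resample_to_population(groups, population_shares, n, seed=0):
    """Draw n row indices so each group's share matches population_shares.

    groups: array of group labels, one per training example.
    population_shares: dict {group: target share}; shares should sum to 1.
    Sampling is with replacement, so rare groups get duplicated rows.
    """
    rng = np.random.default_rng(seed)
    indices = []
    for group, share in population_shares.items():
        members = np.flatnonzero(groups == group)
        n_group = int(round(share * n))
        indices.append(rng.choice(members, size=n_group, replace=True))
    return np.concatenate(indices)

groups = np.array(["A"] * 90 + ["B"] * 10)      # training data is 90% group A
idx = resample_to_population(groups, {"A": 0.5, "B": 0.5}, n=100)
print(np.mean(groups[idx] == "B"))              # now half of the rows are group B
```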

## Biased Training Data -> Biased Models
- Amazon used ML to select good resumes
- Data: Past people hired and not hired
- Algorithm: Predict who was hired based on resume
- Problem: Most past hires were white men

Result: Model gave higher grades to white men

## Biased Training Data -> Biased Models
- A health insurance company wants to identify the sickest patients to give them access to a new program
- Data: Past health insurance patients
- Algorithm: Predict who had high health costs (as a proxy for patient health)
- Problem: Black patients historically had lower spending, and were less likely to be diagnosed with health conditions

Result: Model selected mostly non-black patients for the new program



## Biased Training Data -> Biased Models

- Chatbots are trained on massive datasets of text from the internet
- Problem: Many internet posts are: 
    - Biased towards specific groups (racist, sexist, etc.)
    - Rude, insulting, or similar

Result: Chatbots can also be racist/sexist/rude, etc. 





## Biased Training Data -> Biased Models
- The model learns to mimic the data on which it is based. 


### Solutions: 
- This is a hard problem! 
- Can remove biased training data (if possible)
- Can censor biased output (eg: ChatGPT) 
- Can analyze predictions to test for bias; tweak biased models
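A simple first test for bias is to compare how often the model selects members of each group. This NumPy sketch uses toy, made-up predictions; the function name is our own. A common rule of thumb in US hiring (the "four-fifths rule") flags a ratio of selection rates below 0.8, though a low ratio alone does not prove the model is unfair.

```python
import numpy as np

def selection_rates(predictions, groups):
    """Share of positive predictions (e.g. 'hire', 'enrol') within each group."""
    return {g: predictions[groups == g].mean() for g in np.unique(groups)}

preds = np.array([1, 1, 1, 0, 1, 0, 0, 0])           # toy model decisions
grps = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
rates = selection_rates(preds, grps)                 # A -> 0.75, B -> 0.25
ratio = min(rates.values()) / max(rates.values())    # about 0.33, well below 0.8
```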

# Case study: Recidivism in the U.S. 
- Many US states now use machine learning to predict a defendant's risk of future crime
- The goal is to remove judges' biases and to predict "objectively", from data, who is likely to commit another crime in the future
- A classification task

## Example continued
- When performing classification tasks, one can use a confusion matrix

|Prediction/Reality| FALSE | TRUE |
| ---| --- | --- |
|__FALSE__| True Negative | False Negative | 
|__TRUE__| False Positive | True Positive | 
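The four cells of the table can be counted directly from vectors of predictions and outcomes. A minimal NumPy sketch with toy data (the function name is our own):

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Counts for each cell of the table: prediction vs. reality."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return {
        "TN": int(np.sum(~y_pred & ~y_true)),   # predicted FALSE, really FALSE
        "FN": int(np.sum(~y_pred & y_true)),    # predicted FALSE, really TRUE
        "FP": int(np.sum(y_pred & ~y_true)),    # predicted TRUE, really FALSE
        "TP": int(np.sum(y_pred & y_true)),     # predicted TRUE, really TRUE
    }

print(confusion_counts([1, 0, 1, 1, 0, 0], [1, 1, 0, 1, 0, 0]))
# {'TN': 2, 'FN': 1, 'FP': 1, 'TP': 2}
```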


## Example continued
- An important [research project](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) looked at the algorithm created by the for-profit company Northpointe.
    - The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate of white defendants.
    - White defendants were mislabeled as low risk more often than black defendants.
    - The algorithm was somewhat more accurate than a coin flip. Of those deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.
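Each of these findings is a statistic computed from the confusion matrix separately for each group: "falsely flagged" is the false positive rate, "mislabeled as low risk" is the false negative rate, and "of those deemed likely to re-offend, 61 percent were arrested" is the precision (positive predictive value). The sketch below uses toy, hypothetical numbers, not the ProPublica data; a group with no cases in a denominator would produce `nan`.

```python
import numpy as np

def error_rates_by_group(y_true, y_pred, groups):
    """False positive rate, false negative rate and precision for each group."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    groups = np.asarray(groups)
    out = {}
    for g in np.unique(groups):
        t, p = y_true[groups == g], y_pred[groups == g]
        tp, fp = np.sum(p & t), np.sum(p & ~t)
        fn, tn = np.sum(~p & t), np.sum(~p & ~t)
        out[g] = {
            "FPR": fp / (fp + tn),  # did not re-offend, but flagged high risk
            "FNR": fn / (fn + tp),  # re-offended, but labelled low risk
            "PPV": tp / (tp + fp),  # of those flagged, share who re-offended
        }
    return out

groups = ["A"] * 4 + ["B"] * 4
y_true = [0, 0, 1, 1, 0, 0, 1, 1]   # 1 = re-offended (toy data)
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]   # 1 = flagged high risk (toy data)
rates = error_rates_by_group(y_true, y_pred, groups)
```

In this toy example group A has the higher false positive rate and group B the higher false negative rate, the same pattern ProPublica reported.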

## Example continued

Note: Race was not a variable used in the Northpointe analysis. 

- Many underlying variables can be correlated with race! 
- The model can therefore use factors correlated with race to predict recidivism, even though race itself is not an input
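A small simulation shows how this can happen. In this hypothetical setup, the protected attribute is never given to the "model"; the risk score depends only on a proxy feature that is correlated with group membership (for instance, something neighbourhood-based). The scores still differ systematically by group. All variables and coefficients here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)               # protected attribute, never used by the score
proxy = group + rng.normal(scale=0.5, size=n)    # hypothetical feature correlated with group
score = 0.8 * proxy                              # hypothetical risk score built from the proxy only

corr = np.corrcoef(group, proxy)[0, 1]           # roughly 0.7 in this setup
gap = score[group == 1].mean() - score[group == 0].mean()
print(corr, gap)
```

Dropping the protected variable is therefore not enough; one has to check the model's outputs by group.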




## Eliminating bias in ML models

Area of current research: 
- Econometrics + Machine Learning: approaches combining causal and predictive analysis
- Researchers providing guidance ([example](https://www.chicagobooth.edu/-/media/project/chicago-booth/centers/caai/docs/algorithmic-bias-playbook-june-2021) )
