# Bias - An Introduction


**Topic:** Bias introduction

**Goal:** You will learn the meaning of biases and discrimination, and what they mean in a data science context. You will be able to explain why it is important to be aware of biases, and you will get a sneak peek into laws and protected variables.

**Requirements:** We recommend watching the Lecture video XXX.

**Attention:** To start the notebook, click on Run > Run All Cells.

Have fun!

In [24]:
%%capture 
# to hide output, tool: ipython-extensions
#!pip install import-ipynb
#!pip install jupyterquiz
#!pip install md2json
#!pip install hide_code
#!pip install nbconvert
#!pip install ipython-extensions
#!pip install jupytercards
#!pip install watermark

# import

from ipywidgets import TwoByTwoLayout

from jupyterquiz import display_quiz 
# Load quiz colors stored by a different notebook
%store -r color_dict

from jupytercards import display_flashcards

import ipywidgets as widgets
import md2json
import json
from IPython.display import HTML
import random
import numpy as np
import matplotlib.pyplot as plt
plt.ion()
import import_ipynb

#%watermark -p numpy,matplotlib,seaborn

## What is bias?

Bias is a word with many meanings. You may think of prejudice, or of terms we have already discussed, such as fairness.

You may also think of cognitive biases when you hear the term bias. As Wikipedia's [long list of cognitive biases](https://en.wikipedia.org/wiki/List_of_cognitive_biases#cite_note-1) shows, a large number of biases have already been described. As described by Arnott (2006):
>*"Cognitive biases are cognitions or mental behaviours that prejudice decision quality in a significant number of decisions for a significant number of people; they are inherent in human reasoning. Cognitive biases are often called decision biases or judgement biases. One way of viewing cognitive biases is as predictable deviations from rationality. A rational choice is one based on the decision-maker’s current assets and the possible consequences of the choice [...]."*

Arnott (2006) lists 37 biases, grouped into *"memory, statistical, confidence, adjustment, presentation and situation biases"*. On the more psychological or social side is the confidence bias, which *"act[s] to increase a person’s confidence in his or her prowess as a decision-maker"*. Of more interest to data scientists may be the statistical bias, which concerns the *"tendency of humans to process information contrary to the normative principles of probability theory"* (Arnott, 2006).

These biases certainly influence our thinking, behaviour and decision-making processes. Hence, they influence you, me and humans in general. Being aware of how we think helps us reflect on our reasoning and act more logically.
We recommend skimming the [Wikipedia list of cognitive biases](https://en.wikipedia.org/wiki/List_of_cognitive_biases#cite_note-1) or [Cognitive biases and decision support systems development: a design science approach](https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1365-2575.2006.00208.x) by Arnott (2006) if you are interested. Most university students in Germany have access to such articles, usually on campus or via the university VPN.
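
To make the idea of a statistical bias more tangible, here is a minimal sketch of one commonly cited example, base-rate neglect: people tend to overestimate how likely a rare property is after a positive test result. The numbers below are invented for illustration and are not taken from Arnott (2006); the function name `posterior` is our own.

```python
# Illustrative numbers only (assumed for this sketch, not from Arnott, 2006).
prevalence = 0.01           # P(property): 1% of people have the rare property
sensitivity = 0.90          # P(positive test | property present)
false_positive_rate = 0.05  # P(positive test | property absent)


def posterior(prior, tpr, fpr):
    """Bayes' rule: probability of the property given a positive test."""
    evidence = tpr * prior + fpr * (1 - prior)
    return tpr * prior / evidence


p = posterior(prevalence, sensitivity, false_positive_rate)
print(f"P(property | positive test) = {p:.3f}")  # prints 0.154
```

Intuition often suggests a value close to the 90% sensitivity. The normative answer under these assumed numbers is only about 15%, because the property is rare to begin with.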


<div class="alert alert-block alert-danger">
TO DO:
Quiz format, but with different questions, for clicking through a few biases -> copy sentences from Arnott for an overview
</div>

## What is bias in this course?

In our course we have a very specific understanding of bias, as we relate it to data science practice and to the places where bias plays a role in that practice. Here we are referring to specific kinds of bias, which may come not only in the form of cognitive biases, but also as biases in data or algorithms.
If you watched our lecture video about bias, you already know about the data science cycle and that bias can play a role in every step. This visualisation should be familiar to you:

![DS_cycle_biases](Attachments/DS_cycle_biases.png)

To further explore what bias means in our context, let's dive deeper into terminology.

Bias and discrimination can be seen as interconnected concepts.
Let's explore the difference between these concepts to explain what we mean by *bias*!

In [25]:
with open('21_Quizzes/q_4_diff.json') as f:
    data = json.load(f)
display_quiz(data, border_radius=10, colors=color_dict)


If we understand discrimination simply as a distinction, we could think of treating groups of people differently, which may be wanted or even needed. For example, people with disabilities may need additional health and financial resources to live in dignity. In this unit we do not refer to such a wanted distinction as discrimination. To clarify that, the definition above specifies that we are talking about an *unfair* discrimination with an unwanted outcome.
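
A difference in treatment or outcome between groups is something we can measure in data. The following is a minimal sketch, assuming a hypothetical toy dataset of `(group, outcome)` pairs; the records, the group labels and the function name `positive_rate_by_group` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (group label, received a positive outcome?)
records = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", False), ("B", True), ("B", False), ("B", False),
]


def positive_rate_by_group(rows):
    """Return the share of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in rows:
        totals[group] += 1
        positives[group] += int(outcome)
    return {group: positives[group] / totals[group] for group in totals}


rates = positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates)  # {'A': 0.75, 'B': 0.25}
print(gap)    # 0.5
```

Note that such a gap only shows that groups are treated differently. Whether the difference is a wanted distinction or an unfair discrimination cannot be read off the number; it has to be judged in context.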

## Biased based on what?

So we are speaking about "discrimination of certain individuals or groups".
The next question you may ask yourself is: discriminated based on what?

Features/Attributes that might lead to discrimination or bias are often called sensitive features/attributes.

In the following quiz, mark what you think may be sensitive variables in a tabular data set you may have to work with.
Imagine the data in the context of people applying for a data science job. (We recommend not scrolling further, so that you do not spoil the answers for yourself.)

In [26]:
with open('21_Quizzes/q_4_bias_sensitive_vars.json') as f:
    data = json.load(f)
display_quiz(data, border_radius=10, colors=color_dict)


Maybe you disagree with how we categorized some examples. There may be cases where, for example, treating people differently based on whether they smoke is unfair discrimination. One example is the debate about adjusting health insurance contributions based on smoking, exercise or similar behaviour. Therefore, our categorizations are not set in stone.

What we can learn from this is that **context matters**. When handling data, we should take context into consideration.
Besides reflecting on the consequences of the data we use, there is a legal framework we have to consider. Some variables and attributes are protected by law: these are called protected variables.

In Germany, some personal attributes are protected by Article 3 of the Basic Law for the Federal Republic of Germany.
Below is the same quiz as above, but this time, please choose the attributes you believe are protected by German law.

In [27]:
with open('21_Quizzes/q_4_bias_protected_vars.json') as f:
    data = json.load(f)
display_quiz(data, border_radius=10, colors=color_dict)


Some answers may surprise you. Why are some sensitive variables, which seem likely to be used for unfair discrimination, not protected by Article 3?

Besides Art. 3, Art. 9 GDPR (DSGVO in German) also applies in Germany and protects, for example, biometric data. Other articles also come into play. Keep in mind that this only refers to Germany; other countries may have vastly different values and laws regarding protected attributes. Also, this course does not cover German law in detail. If this topic is important to you, we recommend taking additional courses.
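
When you receive a tabular data set, a simple first step is to check which columns correspond to legally protected attributes. The following is a minimal sketch under stated assumptions: the attribute set paraphrases Art. 3(3) of the Basic Law in English, while the column names, the mapping `COLUMN_TO_ATTRIBUTE` and the function `flag_protected_columns` are hypothetical and would need to be adapted to your data.

```python
# Attributes named in Art. 3(3) of the German Basic Law (paraphrased in English).
ART3_ATTRIBUTES = {
    "sex", "parentage", "race", "language", "homeland_and_origin",
    "faith", "religious_opinions", "political_opinions", "disability",
}

# Hypothetical mapping from column names in *your* data to those attributes.
COLUMN_TO_ATTRIBUTE = {
    "gender": "sex",
    "native_language": "language",
    "religion": "faith",
    "disability_status": "disability",
}


def flag_protected_columns(columns):
    """Return {column: attribute} for columns mapped to an Art. 3 attribute."""
    flagged = {}
    for column in columns:
        attribute = COLUMN_TO_ATTRIBUTE.get(column.lower())
        if attribute in ART3_ATTRIBUTES:
            flagged[column] = attribute
    return flagged


# Hypothetical columns of a job-application data set.
applicant_columns = ["name", "gender", "years_experience", "religion", "smoker"]
flagged = flag_protected_columns(applicant_columns)
print(flagged)  # {'gender': 'sex', 'religion': 'faith'}
```

A name-based check like this is only a starting point. It does not detect attributes protected by other laws, sensitive attributes that are not legally protected (such as `smoker` here), or columns that indirectly reveal a protected attribute.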

To understand why even the law is involved, we want to show you some examples.

In [30]:
with open('21_Quizzes/q_4_bias_favouring.json') as f:
    data = json.load(f)
display_quiz(data, border_radius=10, colors=color_dict)


## Negative consequences

## Example: Toeslagenaffaire or the Dutch childcare benefits scandal

📚 Please read the following [English article](https://www.politico.eu/article/dutch-scandal-serves-as-a-warning-for-europe-over-risks-of-using-algorithms/) explaining the Toeslagenaffaire (a PDF is also attached in Materials).

Please think about and write down answers to the following questions:

1. What was the sensitive attribute here?
2. Was the attribute / variable protected? (You can refer to German law here.)
3. Who or what group was discriminated against?
4. What could have been done to prevent this?

<div class="alert alert-block alert-danger">
TO DO: We can refer back to this in the Zoo of biases. Which bias occurred here, and at which step of the DS cycle?
</div>

<div class="alert alert-block alert-warning">
TO DO: State in the assignments that written answers should be recorded in a TXT file or similar. Keep the LU itself more readable for everyone.
</div>

# Read further

1. Arnott, D. (2006). Cognitive biases and decision support systems development: a design science approach. Information Systems Journal, 16(1), 55-78.

2. Di Noia, T., Tintarev, N., Fatourou, P., & Schedl, M. (2022). Recommender systems under European AI regulations. Communications of the ACM, 65(4), 69-73.

3. Palma Pagano, T., Bessa Loureiro, R., Vitória Nascimento Lisboa, F., Oliveira Ramos Cruz, G., Matos Peixoto, R., Aragão de Sousa Guimarães, G., ... & Giovani Sperandio Nascimento, E. (2022). Bias and unfairness in machine learning models: a systematic literature review. arXiv e-prints, arXiv-2202. https://arxiv.org/pdf/2202.08176.pdf