# Class 8 Notebook – Bias and Ethics in AI/ML Basics

This notebook introduces **bias and ethics** in AI and machine learning.

As ML systems are deployed in hiring, lending, healthcare, and justice, understanding bias and fairness becomes essential:
- **Data bias** – Training data reflects historical inequities
- **Algorithmic bias** – Models can amplify or introduce new bias
- **Fairness** – Different definitions (e.g., demographic parity, equalized odds) and trade-offs
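
The two fairness definitions named in the last bullet can be written compactly. As a sketch, assume a binary prediction $\hat{Y}$, a true label $Y$, and a protected attribute $A$ with two groups $a$ and $b$. This notation is introduced here for illustration and does not appear elsewhere in the notebook.

$$P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b) \quad \text{(demographic parity)}$$

$$P(\hat{Y}=1 \mid Y=y,\, A=a) = P(\hat{Y}=1 \mid Y=y,\, A=b), \quad y \in \{0, 1\} \quad \text{(equalized odds)}$$

Demographic parity compares how often each group receives a positive prediction. Equalized odds compares true positive and false positive rates, so it also depends on the true labels.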

**Objective**: Set up the environment and explore core concepts for thinking critically about bias and responsible AI.

**Key ideas**:
- Bias can enter at data collection, labeling, feature selection, and model design
- Fairness metrics can conflict; there is no single "correct" definition
- Responsible AI includes transparency, interpretability, and ongoing monitoring

Run the first code cell to confirm your environment works.

## Run in the browser (no local setup)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adzuci/ai-fundamentals/blob/class-8-bias-and-ethics/class-8-bias-and-ethics/01_class_8_bias_and_ethics_basics.ipynb)

> Tip: This notebook assumes you're comfortable with basic Python, pandas, and scikit-learn from earlier classes.

## STEP 1: Environment check and imports

Verify that NumPy, pandas, and scikit-learn are available for building simple classifiers and analyzing predictions.

In [1]:
# Concept: Environment sanity check for bias/ethics notebook
import platform
import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

print(f"Python: {platform.python_version()}")
print(f"OS: {platform.system()} {platform.release()}")
print(f"NumPy: {np.__version__}")
print(f"pandas: {pd.__version__}")
print(f"scikit-learn: {sklearn.__version__}")
print("All libraries imported successfully!")

Python: 3.10.14
OS: Darwin 25.2.0
NumPy: 1.26.4
pandas: 2.3.3
scikit-learn: 1.7.2
All libraries imported successfully!


## What is bias in ML?

**Bias** in machine learning refers to systematic errors or unfairness in model behavior, often favoring or disadvantaging particular groups.

- **Data bias**: Training data underrepresents groups, reflects historical discrimination, or has labeling errors that correlate with protected attributes.
- **Algorithmic bias**: The model itself (e.g., regularization, threshold choices) produces different error rates or outcomes across groups.
- **Feedback loops**: Deployed models influence future data (e.g., predictive policing), reinforcing existing bias.

In later cells, we'll add examples and simple fairness checks.
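
To make "different error rates or outcomes across groups" concrete, here is a minimal sketch of a per-group check. The helper name `group_rates` and the tiny label arrays are hypothetical, invented for illustration; they are not part of the notebook's dataset or of any library.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Return selection rate, TPR, and FPR for each group value.

    Hypothetical helper for illustration. A group with no positive
    (or no negative) true labels would yield NaN for TPR (or FPR).
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        in_group = group == g
        yt, yp = y_true[in_group], y_pred[in_group]
        rates[str(g)] = {
            "selection_rate": float(yp.mean()),   # share predicted positive
            "tpr": float(yp[yt == 1].mean()),     # true positive rate
            "fpr": float(yp[yt == 0].mean()),     # false positive rate
        }
    return rates

# Made-up labels and predictions for two groups, A and B
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, group)
for name, r in rates.items():
    print(name, r)
```

In this made-up example both groups have the same share of truly positive cases, yet group A is selected more often and has both a higher true positive rate and a higher false positive rate than group B.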

## STEP 2: Create a toy dataset for bias exploration

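We create a simple dataset with three columns: `exp` (experience), `test_score`, and `gender`. These are typical features in hiring or performance contexts. Every column is drawn independently at random, so by construction gender is unrelated to qualifications. We'll use the dataset to explore how models behave across groups and to discuss fairness.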

In [2]:
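# Concept: Create dataset for bias exploration
# Imports repeated from Step 1 so this cell can run on its own
import numpy as np
import pandas as pd

# Fix the seed so the random dataset is reproducible
np.random.seed(42)

# 100 synthetic candidates; each column is drawn independently at random
mydata = pd.DataFrame({
    "exp": np.random.randint(0, 10, 100),            # experience: integers 0-9
    "test_score": np.random.randint(50, 100, 100),   # test score: integers 50-99
    "gender": np.random.choice(["Male", "Female"], 100)
})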

mydata.head(10)

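| | exp | test_score | gender |
|---|---|---|---|
| 0 | 6 | 61 | Male |
| 1 | 3 | 83 | Female |
| 2 | 7 | 82 | Male |
| 3 | 4 | 97 | Male |
| 4 | 6 | 72 | Male |
| 5 | 9 | 73 | Female |
| 6 | 2 | 86 | Female |
| 7 | 6 | 84 | Male |
| 8 | 7 | 93 | Female |
| 9 | 4 | 89 | Female |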
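
Before adding any labels, it can help to check how the groups are represented and whether their features differ. The following is a minimal sketch that rebuilds the same toy dataset so it runs on its own; the variable names `group_counts` and `group_means` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the toy dataset with the same seed and the same draw order
np.random.seed(42)
mydata = pd.DataFrame({
    "exp": np.random.randint(0, 10, 100),
    "test_score": np.random.randint(50, 100, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
})

# How many rows does each group contribute?
group_counts = mydata["gender"].value_counts()

# Do the groups differ in average experience or test score?
group_means = mydata.groupby("gender")[["exp", "test_score"]].mean()

print(group_counts)
print(group_means.round(2))
```

Because the columns are generated independently, any gap between the group means here comes from sampling noise rather than a real difference in qualifications.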


## STEP 3: Introduce bias into the dataset

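We create a **hired** column in two steps. First, we label every candidate on merit: hired only if `exp > 5` and `test_score > 70`. Then we **manually add bias** by overwriting the label for every male row with a random draw that marks the candidate as hired with probability 0.7, regardless of qualifications. Female rows keep the merit-based label. This simulates the kind of historical bias that can appear in real hiring data.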

In [3]:
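# Concept: Introduce bias (males have higher chance of being hired)
# Merit-based label: hired only if exp > 5 and test_score > 70
mydata["hired"] = ((mydata["exp"] > 5) & (mydata["test_score"] > 70)).astype(int)

# Add bias manually: overwrite the label for every male row with a random
# draw that is 1 about 70% of the time, ignoring exp and test_score
is_male = mydata["gender"] == "Male"
mydata.loc[is_male, "hired"] = np.where(np.random.rand(is_male.sum()) > 0.3, 1, 0)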

In [4]:
mydata.head(10)

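| | exp | test_score | gender | hired |
|---|---|---|---|---|
| 0 | 6 | 61 | Male | 1 |
| 1 | 3 | 83 | Female | 0 |
| 2 | 7 | 82 | Male | 1 |
| 3 | 4 | 97 | Male | 0 |
| 4 | 6 | 72 | Male | 0 |
| 5 | 9 | 73 | Female | 1 |
| 6 | 2 | 86 | Female | 0 |
| 7 | 6 | 84 | Male | 1 |
| 8 | 7 | 93 | Female | 1 |
| 9 | 4 | 89 | Female | 0 |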
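
One simple way to see the injected bias is to compare hire rates by gender. The sketch below rebuilds the dataset and labels so it runs on its own, then computes two common summaries: the difference in hire rates (a demographic parity gap) and the ratio of the lower to the higher rate. The names `hire_rate`, `dp_diff`, and `di_ratio` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset and the biased labels with the same seed and draw order
np.random.seed(42)
mydata = pd.DataFrame({
    "exp": np.random.randint(0, 10, 100),
    "test_score": np.random.randint(50, 100, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
})
mydata["hired"] = ((mydata["exp"] > 5) & (mydata["test_score"] > 70)).astype(int)
is_male = mydata["gender"] == "Male"
mydata.loc[is_male, "hired"] = np.where(
    np.random.rand(int(is_male.sum())) > 0.3, 1, 0
)

# Share of each group labeled as hired
hire_rate = mydata.groupby("gender")["hired"].mean()

# Demographic parity gap: 0 would mean equal hire rates
dp_diff = hire_rate["Male"] - hire_rate["Female"]

# Ratio of female to male hire rate: 1 would mean equal hire rates
di_ratio = hire_rate["Female"] / hire_rate["Male"]

print(hire_rate.round(3))
print(f"Hire-rate difference (Male - Female): {dp_diff:.3f}")
print(f"Hire-rate ratio (Female / Male): {di_ratio:.3f}")
```

A ratio below 0.8 is often flagged under the "four-fifths" rule of thumb used in US employment contexts. It is a screening heuristic, not a complete definition of fairness.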
