# Big Idea 5 – Computing Bias  
*Mar 17, 2025 • Avika, Gabi, Zoe*

---

## What is Computing Bias?

**Bias**: A prejudice in favor of or against a person or group in a way that is usually unfair.  

**Computing Bias** occurs when algorithms or systems produce results that disadvantage certain groups. It often arises from:
- Biased or incomplete data
- Flawed design
- Unintended consequences of programming choices

---

## Example: Netflix Recommendation Bias

Netflix uses algorithms to recommend content, but those algorithms can introduce bias by:

### Majority Preference Bias
- Favors shows that are already popular, so niche or diverse titles rarely surface.

### Filtering Bias
- Filters out content based on limited viewing history.
- If you mostly watch rom-coms, you may never see documentaries or foreign films.
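
To make majority preference bias concrete, here is a minimal sketch of a popularity-only recommender. It is not how Netflix actually works; the function name `recommend_by_popularity`, the watch log, and the titles are all made up for illustration.

```python
from collections import Counter

def recommend_by_popularity(watch_log, k=2):
    """Return the k most-watched titles across all users (hypothetical helper)."""
    counts = Counter(title for _, title in watch_log)
    return [title for title, _ in counts.most_common(k)]

# Hypothetical watch log of (user, title) pairs.
watch_log = [
    ("ana", "Hit Drama"), ("ben", "Hit Drama"), ("cy", "Hit Drama"),
    ("ana", "Big Comedy"), ("ben", "Big Comedy"),
    ("cy", "Indie Documentary"),
]

print(recommend_by_popularity(watch_log))  # ['Hit Drama', 'Big Comedy']
```

Because the ranking only counts total views, "Indie Documentary" never makes the list, even for the one user who watched it.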

---

## How Does Computing Bias Happen?

1. **Unrepresentative or Incomplete Data**  
   - Models trained on limited datasets don’t reflect real-world diversity.

2. **Flawed or Biased Data**  
   - If existing data includes prejudice (e.g., historical hiring patterns), the system learns and repeats those biases.

3. **Biased Data Labeling**  
   - Human annotators may unconsciously inject cultural or personal bias during labeling.
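
One way to spot unrepresentative data is to compare each group's share of the dataset with its share of the population the system is meant to serve. The sketch below assumes made-up group labels, reference shares, and a 10-point tolerance; none of these come from a real dataset.

```python
from collections import Counter

def group_shares(labels):
    """Fraction of the dataset that belongs to each group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}

def underrepresented(labels, reference, tolerance=0.10):
    """Groups whose dataset share falls more than `tolerance` below the reference share."""
    shares = group_shares(labels)
    return sorted(
        group for group, expected in reference.items()
        if shares.get(group, 0.0) < expected - tolerance
    )

# Hypothetical training sample and hypothetical population shares.
sample = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
reference = {"A": 0.5, "B": 0.3, "C": 0.2}

print(underrepresented(sample, reference))  # ['B', 'C']
```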

---

## Explicit vs. Implicit Data

| Type            | Definition                               | Netflix Example                                         |
|-----------------|-------------------------------------------|----------------------------------------------------------|
| **Explicit Data** | Data directly provided by users           | Entering your name, age, or rating a movie              |
| **Implicit Data** | Data inferred from user behavior          | Viewing history, time spent watching, click patterns     |

### Why It Matters:
- Implicit data can reinforce user habits, creating feedback loops that limit discovery.
- Explicit data may still be biased if limited by design or user understanding.
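
The feedback loop can be sketched with a toy simulation. It assumes a deliberately simplistic recommender that always suggests the user's most-watched genre, and a user who always watches the suggestion; both are assumptions for illustration, not a description of any real service.

```python
def simulate_feedback_loop(history, rounds):
    """Each round, recommend the most-watched genre and assume the user watches it."""
    counts = dict(history)  # copy so the starting history is not modified
    for _ in range(rounds):
        top_genre = max(counts, key=counts.get)
        counts[top_genre] += 1  # implicit data: one more view recorded
    return counts

# Hypothetical starting view counts (implicit data).
start = {"rom-com": 3, "documentary": 2, "foreign film": 1}

print(simulate_feedback_loop(start, rounds=10))
# {'rom-com': 13, 'documentary': 2, 'foreign film': 1}
```

A small initial lead for rom-coms turns into a large one, while the other genres never get another chance to be watched.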

---

## Popcorn Hack #1

**Question:** What is an example of Explicit Data?  
**Options:**  
A) Netflix recommends shows based on your viewing history  
B) You provide your name, age, and preferences when creating a Netflix account  
C) Netflix tracks the time you spend watching certain genres  

**Answer:** **B** – This is explicit data, because it's provided directly by the user.

---

## Types of Bias

### Algorithmic Bias
- Comes from faulty system logic or design that reproduces existing discrimination.  
**Example:** Amazon's hiring tool favored men because it was trained on past hiring data that was male-dominated.

### Data Bias
- Arises when training data is incomplete or unbalanced.  
**Example:** A health AI system underestimates disease risk for underrepresented groups.

### Cognitive Bias
- Introduced by researchers or developers due to personal assumptions.  
**Example:** A researcher only selects data supporting their belief about screen time affecting grades.

---

## Popcorn Hack #2

**Question:** What is an example of Data Bias?  
**Options:**  
A) A hiring algorithm favors men due to biased past resumes  
B) A dataset underrepresents people with darker skin tones  
C) A researcher selects data that supports their screen time theory  

**Answer:** **B** – Underrepresentation in data leads to performance issues for certain groups.

---

## Intentional vs. Unintentional Bias

### Intentional Bias
- Purposefully embedding prejudice to favor one group.  
**Example:** A hiring algorithm is designed to rank resumes from certain schools or companies higher, favoring specific demographics.

### Unintentional Bias
- Occurs accidentally, usually because of flawed or unbalanced datasets.  
**Example:** A facial recognition tool trained on mostly light-skinned faces struggles to recognize darker skin tones, not because of intent but because the training data lacked variety.
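
Unintentional bias often only shows up when accuracy is measured separately for each group. The sketch below uses invented test results (group, predicted match, actual match) purely to show the calculation; the numbers are not from any real facial recognition system.

```python
def accuracy_by_group(results):
    """Fraction of correct predictions for each group."""
    totals, correct = {}, {}
    for group, predicted, actual in results:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (1 if predicted == actual else 0)
    return {group: correct[group] / totals[group] for group in totals}

# Hypothetical results: 9 of 10 correct for one group, 6 of 10 for the other.
results = (
    [("lighter", True, True)] * 9 + [("lighter", False, True)] * 1
    + [("darker", True, True)] * 6 + [("darker", False, True)] * 4
)

print(accuracy_by_group(results))  # {'lighter': 0.9, 'darker': 0.6}
```

An overall accuracy of 75% would hide the gap; splitting by group makes it visible.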

---

## Popcorn Hack #3

**Activity:** Describe a biased scenario. Have classmates guess: was it intentional or unintentional?

---

## Mitigation Strategies

To reduce bias in algorithms, apply these techniques at every phase:

### 1. Pre-processing (Planning & Data Collection)
- Check for data diversity and completeness
- Remove irrelevant or biased variables

**Goal:** Prepare balanced data to avoid bias in training.
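
A minimal sketch of the two pre-processing checks above, assuming made-up applicant records and field names. Note that dropping a sensitive field does not remove bias on its own, since other fields can still act as stand-ins for it.

```python
# Hypothetical applicant records; field names and values are invented.
applicants = [
    {"years_experience": 5, "test_score": 88, "gender": "F"},
    {"years_experience": 3, "test_score": None, "gender": "M"},
    {"years_experience": 7, "test_score": 92, "gender": "M"},
]

def missing_counts(records, fields):
    """Completeness check: count records with no value for each field."""
    return {f: sum(1 for r in records if r.get(f) is None) for f in fields}

def drop_fields(records, fields_to_drop):
    """Return copies of the records without the listed fields."""
    return [{k: v for k, v in r.items() if k not in fields_to_drop} for r in records]

print(missing_counts(applicants, ["years_experience", "test_score"]))
# {'years_experience': 0, 'test_score': 1}

cleaned = drop_fields(applicants, {"gender"})
print(cleaned[0])  # {'years_experience': 5, 'test_score': 88}
```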

---

### 2. In-processing (Training & Validation)
- Use cross-validation
- Add synthetic data so underrepresented groups are better represented

**Goal:** Ensure fairness during model development.
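
As a rough stand-in for adding synthetic data, the sketch below uses random oversampling: it duplicates rows from smaller groups until every group is the same size. This is a simpler technique than generating truly new synthetic samples, and the function name, rows, and `group` key are assumptions for illustration.

```python
import random
from collections import Counter

def oversample(rows, group_key, seed=0):
    """Duplicate randomly chosen rows from smaller groups until all groups match the largest."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        extra = target - len(members)
        balanced.extend(rng.choice(members) for _ in range(extra))
    return balanced

# Hypothetical training rows: group A has 8 rows, group B has only 2.
rows = [{"group": "A"}] * 8 + [{"group": "B"}] * 2
balanced = oversample(rows, "group")

counts = Counter(row["group"] for row in balanced)
print(counts["A"], counts["B"])  # 8 8
```

Duplicated rows add no new information, so this balances group sizes without fixing gaps in what the data actually covers.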

---

### 3. Post-processing (Deployment & Real-World Use)
- Monitor system performance
- Adjust output if unfair results appear

**Goal:** Maintain equity as the model operates in real settings.
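
Monitoring can be as simple as comparing how often each group receives a positive outcome. The sketch below assumes invented decisions and an arbitrary alert threshold of 0.2; a real system would pick its own metric and threshold.

```python
def selection_rates(decisions):
    """Fraction of positive outcomes for each group."""
    totals, positives = {}, {}
    for group, approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if approved else 0)
    return {group: positives[group] / totals[group] for group in totals}

def rate_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())

# Hypothetical logged decisions: (group, approved?)
decisions = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 2 + [("B", False)] * 8
)

rates = selection_rates(decisions)
gap = rate_gap(rates)
ALERT_THRESHOLD = 0.2  # arbitrary value chosen for this example

print(rates)  # {'A': 0.6, 'B': 0.2}
if gap > ALERT_THRESHOLD:
    print(f"Review needed: selection rates differ by {gap:.2f}")
```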

---

## Homework Questions

### Multiple Choice  
*(Each worth 0.1 points)*

1. Which phase includes inserting synthetic samples?  
2. What is an example of cognitive bias?  
3. What’s the key difference between implicit and explicit data?  
4. Which type of bias occurs due to flawed system logic?

*(The remaining questions are provided in class or online.)*

---

### Short-Answer

**Prompt:**  
Explain the difference between implicit and explicit data. Give an example of each.

**Scoring Rubric (Total: 1.0 point):**

| Criteria                      | Description                               | Points |
|------------------------------|-------------------------------------------|--------|
| Multiple-Choice (7 total)    | 0.1 point each                            | 0.7    |
| Short-Answer - Clarity       | Clear explanation                         | 0.15   |
| Short-Answer - Examples      | Two accurate examples provided            | 0.15   |

---
