# Tutorial 7-2: Privacy by Design – "Automated Anonymization"

**Course:** CSEN 342: Deep Learning  
**Topic:** AI Ethics, Privacy, Face Detection, and Data Anonymization

## Objective
As discussed in the lecture, large-scale datasets often contain images of people who did not consent to be included. Slide 13 recommends that researchers *"Blur out or otherwise disguise recognizable individuals"* to protect privacy.

In this tutorial, we will build a **Privacy Pipeline**. We will:
1.  **Load a Face Detector:** Use a lightweight computer vision model to locate faces in an image.
2.  **Implement Anonymization Filters:** Write functions to apply **Gaussian Blur** and **Pixelation** to specific regions of interest (ROIs).
3.  **Process a Crowd:** Apply this pipeline to a group photo to automatically redact identities.

---

## Part 1: Setup and Detection

We will use OpenCV's Haar Cascade classifier. Although it predates deep-learning detectors and is less reliable on profile, occluded, or very small faces, it runs quickly on a CPU and is sufficient for simple frontal-face detection.

First, we download the XML file that contains the pretrained cascade.

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import os
import sys

# Import our util to get the XML file
sys.path.append(os.path.abspath(os.path.join('..')))
from utils import download_haarcascade

download_haarcascade()

# Load the detector
face_cascade = cv2.CascadeClassifier('../data/haarcascade_frontalface_default.xml')

# Helper to display images in Jupyter
def show_img(img, title="Image"):
    # Convert BGR (OpenCV standard) to RGB (Matplotlib standard)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.figure(figsize=(10, 6))
    plt.imshow(img_rgb)
    plt.title(title)
    plt.axis('off')
    plt.show()

print("Face Detector Loaded.")

### 1.1 Load a Crowd Image
We need an image with multiple people. We will download a stock photo of a crowd.

In [None]:
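# Download a sample crowd image
img_path = '../data/crowd.jpg'
if not os.path.exists(img_path):
    # URL of a sample crowd photo (hosted on citizensandtech.org)
    !wget -O {img_path} https://citizensandtech.org/wp-content/uploads/2019/11/1024px-CivilServants_2019.jpg

img = cv2.imread(img_path)
# cv2.imread returns None instead of raising when the file is missing or unreadable
if img is None:
    raise FileNotFoundError(f"Could not read image at {img_path}")
show_img(img, "Original Crowd")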

---

## Part 2: The Anonymizer Functions

We need two strategies to hide identity:
1.  **Blurring:** Applying a strong Gaussian filter.
2.  **Pixelation:** Downscaling the region to a tiny size (removing high-frequency information) and then upscaling it back.
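
To make the "loss of information" concrete, here is a minimal NumPy-only sketch of pixelation by block averaging. It is not the `cv2.resize` approach used in this tutorial, only an illustration of the same idea: after pixelating with an 8 x 8 grid, the region can contain at most 64 distinct values, however many pixels it started with. The function name `pixelate_numpy` and the random stand-in "face" are assumptions made for the example.

```python
import numpy as np

def pixelate_numpy(region, blocks=8):
    """Pixelate a 2-D grayscale array by averaging over a blocks x blocks grid.

    Assumes each side of `region` has at least `blocks` pixels.
    """
    h, w = region.shape
    # Split rows and columns into `blocks` nearly equal bands
    row_edges = np.linspace(0, h, blocks + 1).astype(int)
    col_edges = np.linspace(0, w, blocks + 1).astype(int)
    out = np.empty_like(region)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            # Every pixel in the cell is replaced by the cell's mean value
            out[r0:r1, c0:c1] = region[r0:r1, c0:c1].mean()
    return out

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in for a face ROI
pixelated = pixelate_numpy(face, blocks=8)
print(face.shape, "->", len(np.unique(pixelated)), "distinct values (at most", 8 * 8, ")")
```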

Let's implement these as helper functions.

In [None]:
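def anonymize_blur(image, factor=3.0):
    """
    Applies a Gaussian blur to the image chunk.
    The kernel is roughly 1/factor of the chunk's width and height,
    so a smaller factor gives a larger kernel and a stronger blur.
    """
    (h, w) = image.shape[:2]
    # Clamp to 1 so tiny ROIs never produce a zero or negative kernel size
    kW = max(1, int(w / factor))
    kH = max(1, int(h / factor))

    # GaussianBlur requires odd kernel dimensions
    if kW % 2 == 0: kW -= 1
    if kH % 2 == 0: kH -= 1

    # sigma=0 lets OpenCV derive the standard deviation from the kernel size
    return cv2.GaussianBlur(image, (kW, kH), 0)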

def anonymize_pixelate(image, blocks=10):
    """
    Pixelates the image chunk by resizing it down to blocks x blocks pixels
    and back up. Fewer blocks means coarser pixelation.
    """
    (h, w) = image.shape[:2]
    
    # Resize small (loss of information)
    small = cv2.resize(image, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
    
    # Resize back to original size (nearest neighbor to keep blocky look)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

print("Anonymization functions defined.")
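
The kernel-size arithmetic in `anonymize_blur` can be checked in isolation. The sketch below uses a hypothetical helper, `odd_kernel_size`, that mirrors the calculation for a single dimension, with a floor of 1 added as a guard for very small regions.

```python
def odd_kernel_size(length, factor=3.0):
    """Largest odd integer <= length / factor, but never smaller than 1."""
    k = max(1, int(length / factor))
    # cv2.GaussianBlur needs odd kernel dimensions, so step even values down by one
    return k if k % 2 == 1 else k - 1

for side in (12, 45, 90, 200):
    print(f"face side {side:>3}px -> kernel {odd_kernel_size(side)}px")
```

With the default `factor=3.0`, a 90-pixel face gets a 29-pixel kernel and a 200-pixel face gets a 65-pixel kernel, so the blur strength scales with the size of the detected face.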

---

## Part 3: The Privacy Pipeline

Now we combine detection and anonymization:
1.  Convert image to Grayscale (needed for Haar detection).
2.  Detect faces ($x, y, w, h$).
3.  Loop through every face, extract the ROI (Region of Interest), apply the filter, and paste it back.

In [None]:
def protect_privacy(image, method='blur'):
    # Reject unknown methods up front so no face is silently left unfiltered
    if method not in ('blur', 'pixelate'):
        raise ValueError(f"Unknown method: {method!r}")

    # Work on a copy
    result = image.copy()
    gray = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)

    # Detect faces
    # scaleFactor=1.1 and minNeighbors=5 are common starting values;
    # a lower minNeighbors finds more faces at the cost of more false positives
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    print(f"Detected {len(faces)} faces.")

    for (x, y, w, h) in faces:
        # Extract ROI
        face_roi = result[y:y+h, x:x+w]

        # Apply Filter
        if method == 'blur':
            anonymized_roi = anonymize_blur(face_roi, factor=3.0)
        else:  # method == 'pixelate'
            anonymized_roi = anonymize_pixelate(face_roi, blocks=8)

        # Replace the face area with the anonymized version
        result[y:y+h, x:x+w] = anonymized_roi

        # Optional: draw a green rectangle to show what was detected
        cv2.rectangle(result, (x, y), (x+w, y+h), (0, 255, 0), 2)

    return result

# Run Pipeline
blurred_crowd = protect_privacy(img, method='blur')
pixelated_crowd = protect_privacy(img, method='pixelate')

# Visualization
plt.figure(figsize=(15, 10))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(blurred_crowd, cv2.COLOR_BGR2RGB))
plt.title("Method: Gaussian Blur")
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(pixelated_crowd, cv2.COLOR_BGR2RGB))
plt.title("Method: Pixelation")
plt.axis('off')

plt.show()
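
A detector's bounding box may not cover the whole head (hair, ears, or chin can fall outside it), which can leave identifying detail unfiltered. One possible refinement, sketched here under that assumption, is to enlarge each box by a margin and clamp it to the image bounds before extracting the ROI. The helper `expand_box` and its `margin` parameter are hypothetical and not part of the pipeline above.

```python
def expand_box(x, y, w, h, img_w, img_h, margin=0.2):
    """Grow a box by `margin` of its size on every side, clamped to the image."""
    dx, dy = int(w * margin), int(h * margin)
    # Clamp the top-left corner at 0 and the bottom-right corner at the image size
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0

print(expand_box(100, 50, 40, 40, img_w=640, img_h=480))  # box well inside the image
print(expand_box(5, 5, 40, 40, img_w=640, img_h=480))     # box near the top-left corner
```

Inside the detection loop this could be applied as `x, y, w, h = expand_box(x, y, w, h, image.shape[1], image.shape[0])` before slicing the ROI.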

### Conclusion
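We have anonymized every face the detector found in this photo. Faces it missed (for example, profile views or small, distant faces) remain visible, so the reported detection count should be checked against the image.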

**Ethical Consideration:** 
While this protects identity from casual observation, is it foolproof? 
In **Slide 34**, we saw the "PULSE" model, which attempts to upsample pixelated faces. While PULSE often hallucinates *new* faces rather than recovering the original, the arms race between **Anonymization** and **De-anonymization (re-identification)** is a core topic in modern AI privacy research.