---
title: "🎨 Selection Bias & Missing Data Challenge"

author: "Sumeet Tayal"
date: 2025-11-26
format:
  html:
    toc: true
    toc-depth: 2
execute:
  echo: true
  eval: true
---

## Visualizing Selection Bias

#| label: step1-prepare
#| echo: false
#| fig-cap: Statistics meme demonstrating selection bias

import numpy as np
import matplotlib.pyplot as plt
from step1_prepare_image import prepare_image

# Load and prepare the image
# CHANGE THIS to use your own image!
img_path = 'my-picture.jpeg'  # Example image - replace with your own image
gray_image = prepare_image(img_path, max_size=512)

#| label: step2-stipple
#| echo: false
#| fig-cap: "Blue noise stippling pattern"
#| message: false
#| warning: false

from step2_create_stipple import create_stipple

# Create stippled image
stipple_pattern, samples = create_stipple(
    gray_image,
    percentage=0.08,  # 8% of pixels will be stippled
    sigma=0.9,  # Repulsion radius
    content_bias=0.9,  # Strongly follow importance map
    noise_scale_factor=0.1,  # Moderate exploration
    extreme_downweight=0.5,  # Moderate downweighting of extremes
    extreme_threshold_low=0.2,  # Downweight tones below 0.2
    extreme_threshold_high=0.8,  # Downweight tones above 0.8
    extreme_sigma=0.1  # Smooth transition width
)

#| label: step3-tonal
#| echo: false
#| fig-cap: "Box-averaged tonal analysis showing brightness distribution"

from step3_create_tonal import create_tonal
import matplotlib.pyplot as plt

# Create tonal analysis with a 16×12 grid
grid_rows = 16
grid_cols = 12
tonal_image, average_tones, tonal_stats = create_tonal(
    gray_image,
    grid_rows=grid_rows,
    grid_cols=grid_cols,
    return_full_image=True
)
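
The box-averaged tones computed above can be pictured with a small NumPy sketch. This is not the implementation of `create_tonal`, whose internals are not shown here. `box_average` is a hypothetical helper that assumes a 2-D grayscale array at least as large as the grid, splits it into `grid_rows` × `grid_cols` cells, and takes the mean of each cell.

```python
import numpy as np

def box_average(gray, grid_rows, grid_cols):
    """Mean tone of each cell in a grid_rows x grid_cols partition (hypothetical helper)."""
    h, w = gray.shape
    # Cell boundaries; linspace also copes with sizes not divisible by the grid
    row_edges = np.linspace(0, h, grid_rows + 1).astype(int)
    col_edges = np.linspace(0, w, grid_cols + 1).astype(int)
    tones = np.empty((grid_rows, grid_cols))
    for i in range(grid_rows):
        for j in range(grid_cols):
            cell = gray[row_edges[i]:row_edges[i + 1],
                        col_edges[j]:col_edges[j + 1]]
            tones[i, j] = cell.mean()
    return tones

# Tiny 4x4 gradient standing in for a real grayscale image
demo = np.arange(16, dtype=float).reshape(4, 4) / 15.0
tones = box_average(demo, 2, 2)
```

Each entry of `tones` summarizes one box, so a 16×12 grid would reduce the whole image to 192 average brightness values.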

#| label: step4-block-letter
#| echo: false
#| fig-cap: "Block letter S representing selection bias"
#| eval: false


from step4_create_block_letter import create_block_letter_s

# Get image dimensions
h, w = gray_image.shape

# Create block letter S
block_letter = create_block_letter_s(h, w, letter="S", font_size_ratio=0.9)


#| label: step5-masked
#| echo: false
#| fig-cap: "Masked stippled image showing selection bias effect"
#| eval: false

from step5_create_masked import create_masked_stipple

# Create masked stippled image
masked_stipple = create_masked_stipple(
    stipple_pattern,
    block_letter,
    threshold=0.5  # Pixels below 0.5 are considered part of the mask
)
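
As a rough illustration of the masking step, the sketch below erases stipple dots wherever the mask falls below the threshold. It is a guess at the idea, not the code inside `create_masked_stipple`. `mask_stipple` is a hypothetical name, and the sketch assumes both arrays hold values in [0, 1] with 1.0 as white background and 0.0 as a dot.

```python
import numpy as np

def mask_stipple(stipple, mask, threshold=0.5):
    """Erase dots where the mask is dark (hypothetical helper, assumed conventions)."""
    in_mask = mask < threshold   # pixels treated as part of the letter
    out = stipple.copy()         # leave the input untouched
    out[in_mask] = 1.0           # assumed: 1.0 = white background, so the dot vanishes
    return out

# Toy 2x3 example: 0.0 marks a dot, 1.0 marks background
stipple = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0]])
mask = np.array([[0.0, 0.0, 1.0],
                 [1.0, 0.2, 1.0]])
masked = mask_stipple(stipple, mask)

dots_before = int((stipple == 0.0).sum())
dots_after = int((masked == 0.0).sum())
```

Dots that sit under the letter disappear from `masked`, which is the visual analogue of observations that never make it into the sample.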
 

#| label: create-final-meme
#| echo: false
#| eval: false

from create_meme import create_statistics_meme

# Create the final meme
create_statistics_meme(
    original_img=gray_image,
    stipple_img=stipple_pattern,
    block_letter_img=block_letter,
    masked_stipple_img=masked_stipple,
    output_path="my_statistics_meme.png",
    dpi=150,
    background_color="white"  # or "pink", "lightgray", etc.
)

#| label: final-meme
#| echo: false
#| eval: false
#| fig-cap: "Statistics meme demonstrating selection bias"

### Selection Bias Visualization

from IPython.display import Image, display
display(Image("my_statistics_meme.png"))

::: {.callout-warning}
### Selection Bias Explanation

This four-panel visualization shows how a systematic missing-data mechanism biases statistical estimates.

- **Reality** (original grayscale image): the true population, which inference targets but never observes in full.
- **Your Model** (blue noise stippled image): the observed sample, where each stipple point corresponds to one observation drawn from the population.
- **Selection Bias** (block letter "S" mask): a systematic, non-random missingness mechanism, under which the probability of being observed depends on unobserved characteristics.
- **Estimate** (masked stippled image): the resulting biased estimate. Under a missing not at random (MNAR) mechanism, the observed data form a non-representative subsample, so the empirical distribution deviates systematically from the true population distribution.

That deviation carries through to parameter estimates and undermines the validity of any inference built on them. The metaphor illustrates a basic principle of missing data theory: when missingness is informative (that is, dependent on the unobserved values themselves), standard methods that assume missing at random (MAR) or missing completely at random (MCAR) generally produce biased and inconsistent estimates.
:::
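
The same point can be checked numerically. The sketch below is a minimal simulation under invented assumptions: a normal population with mean 50, a 50% MCAR observation rate, and a logistic MNAR rule that makes large values less likely to be observed. None of these numbers come from the image pipeline above; they only illustrate the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 100,000 values centred on 50
population = rng.normal(loc=50.0, scale=10.0, size=100_000)

# MCAR: every value has the same 50% chance of being observed
mcar_observed = population[rng.random(population.size) < 0.5]

# MNAR: the chance of being observed depends on the value itself;
# larger values are less likely to be recorded (assumed logistic rule)
p_observe = 1.0 / (1.0 + np.exp((population - 50.0) / 5.0))
mnar_observed = population[rng.random(population.size) < p_observe]

true_mean = population.mean()
mcar_mean = mcar_observed.mean()
mnar_mean = mnar_observed.mean()

print(f"true mean: {true_mean:.2f}")
print(f"MCAR mean: {mcar_mean:.2f}")
print(f"MNAR mean: {mnar_mean:.2f}")
```

The MCAR sample mean should land close to the population mean, while the MNAR sample mean falls clearly below it, because the values most likely to go missing are the large ones.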