# CHAPTER 11: Visualizing the Invisible

**Pages:** TBD  
**Word Count:** ~4,600 words  
**Figures:** 4

---

## Chapter Overview

**Key Learning Objectives:**
- Understand the power of data visualization to reveal patterns
- Learn the four principles of effective visualization: Clarity, Honesty, Efficiency, Beauty
- Recognize how visualizations can mislead or deceive
- Practice iterative design to improve visual communication
- Apply visualization skills to real community problems

**Historical Context:**
- John Snow's 1854 cholera map that changed medical history
- Florence Nightingale's pioneering rose diagrams
- The evolution of statistical graphics

**Real-World Application:**
- Creating flood risk maps for village planning
- Presenting to panchayat (village council)
- Communicating uncertainty to non-technical audiences

---

*"Numbers alone don't convince people. Humans are visual creatures. We think in pictures, patterns, shapes. The same data can tell completely different stories depending on how you show it."* - Professor Mishra

---

## Setup: Python Libraries

Let's import all the tools we'll need for this chapter's visualizations.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# For better visualizations
import matplotlib.patches as mpatches
from matplotlib.patches import Rectangle, Circle
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

# For reproducibility
np.random.seed(42)

print("✅ Libraries loaded successfully!")
print("Ready to visualize the invisible...")

---

## Part 1: The Story Begins

### Two Weeks After the Newspaper Article

Ananya's phone buzzed with notifications all morning. The *Sambalpur Times* had run their story with the headline "Local Teen Uses Math to Help Farmers" and included a photo of her and Uncle Bikram standing by his paddy fields.

But Professor Mishra had been quieter than usual. He'd congratulated her on the article, but then grown thoughtful.

*"You've learned to build models. That's excellent. But now you need to learn something equally important: how to communicate them."*

That's why they were at his house on Saturday afternoon, for what he called "a critical lesson."

Professor Mishra had transformed his living room into a makeshift museum. Papers covered the walls: graphs, charts, maps, and diagrams, ranging from faded historical reproductions to glossy modern prints.

*"Welcome to the brief history of data visualization."*

---

## Part 2: Historical Lesson - John Snow's Cholera Map (1854)

### The Power of a Single Map

In 1854, London was being ravaged by a cholera outbreak. The prevailing theory was that cholera spread through "bad air" (miasma theory). Dr. John Snow suspected contaminated water was the culprit.

He created a simple map: the streets of Soho, with a mark for each cholera death and the location of each water pump.

**The pattern was unmistakable:** Deaths clustered densely around the Broad Street pump.

Let's recreate a simplified version:

In [None]:
# Simplified recreation of John Snow's cholera map
# Creating synthetic data that mimics the pattern

np.random.seed(1854)  # Year of the original map!

# Broad Street pump location (contaminated)
broad_street_pump = np.array([0.5, 0.5])

# Other pump locations (clean water)
other_pumps = np.array([
    [0.1, 0.2],
    [0.8, 0.1],
    [0.9, 0.8],
    [0.2, 0.9]
])

# Generate death locations - clustered around Broad Street pump
n_deaths = 150
death_locations = []

for i in range(n_deaths):
    # 80% of deaths near Broad Street pump
    if np.random.random() < 0.8:
        # Deaths cluster around contaminated pump
        offset = np.random.normal(0, 0.08, 2)
        location = broad_street_pump + offset
    else:
        # Some deaths elsewhere (random noise)
        location = np.random.random(2)
    
    death_locations.append(location)

death_locations = np.array(death_locations)

# Create the map
fig, ax = plt.subplots(figsize=(12, 10))

# Plot deaths as black dots
ax.scatter(death_locations[:, 0], death_locations[:, 1], 
           c='black', s=30, alpha=0.6, marker='o', label='Cholera Deaths')

# Plot Broad Street pump (contaminated) - larger, red
ax.scatter(broad_street_pump[0], broad_street_pump[1], 
           c='darkred', s=400, marker='s', edgecolors='black', linewidths=2,
           label='Broad Street Pump (Contaminated)', zorder=5)

# Plot other pumps (clean) - blue
ax.scatter(other_pumps[:, 0], other_pumps[:, 1], 
           c='steelblue', s=300, marker='s', edgecolors='black', linewidths=2,
           label='Other Pumps (Clean Water)', zorder=5)

# Add street grid for context
for i in np.arange(0, 1.1, 0.1):
    ax.axvline(i, color='gray', linewidth=0.5, alpha=0.3)
    ax.axhline(i, color='gray', linewidth=0.5, alpha=0.3)

# Labels and formatting
ax.set_xlim(-0.05, 1.05)
ax.set_ylim(-0.05, 1.05)
ax.set_xlabel('Distance (arbitrary units)', fontsize=12)
ax.set_ylabel('Distance (arbitrary units)', fontsize=12)
ax.set_title('John Snow\'s Cholera Map (1854) - Simplified Recreation\n"One map, clearer than a thousand words"', 
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right', fontsize=11, framealpha=0.9)
ax.set_aspect('equal')

# Add annotation
ax.annotate('Deaths cluster around\ncontaminated pump', 
            xy=(broad_street_pump[0], broad_street_pump[1]), 
            xytext=(0.7, 0.4),
            fontsize=11, 
            bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.7),
            arrowprops=dict(arrowstyle='->', lw=2, color='darkred'))

plt.tight_layout()
plt.show()

print("\n📊 Historical Significance:")
print("- Snow's map revealed the pattern that words couldn't")
print("- Led to removal of Broad Street pump handle")
print("- Helped establish germ theory of disease")
print("- One of the first uses of spatial analysis in epidemiology")

### What Made This Map Powerful?

**Professor Mishra explained:**

*"The prevailing theory was miasma, or bad air. Everyone was looking for atmospheric patterns. But Snow looked at geography. He plotted deaths as dots, pumps as landmarks. The pattern screamed at you. Deaths clustered around one pump."*

*"Without the map, he had numbers: '150 deaths in Broad Street area.' With the map, he had proof: 'look, they're all near this one pump.' He convinced the local council to remove the pump handle, and the outbreak soon ended."*

**Key Lesson:** Sometimes the right visualization is more convincing than any statistical test.
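
The visual impression can also be backed by a simple count. The sketch below assigns each death to its nearest pump by straight-line distance and tallies the result. The pump names other than Broad Street and the five sample points are illustrative assumptions, not historical data, and a real street map would call for walking distance rather than straight lines.

```python
import math
from collections import Counter

def nearest_pump_counts(deaths, pumps):
    """Count how many death locations lie closest to each pump.

    deaths: iterable of (x, y) points
    pumps:  dict mapping pump name -> (x, y) location
    """
    counts = Counter({name: 0 for name in pumps})
    for point in deaths:
        # Straight-line (Euclidean) distance to every pump; keep the closest
        nearest = min(pumps, key=lambda name: math.dist(point, pumps[name]))
        counts[nearest] += 1
    return counts

# Pump coordinates copied from the synthetic map above; names are made up
pumps = {
    "Broad Street": (0.5, 0.5),
    "Pump 2": (0.1, 0.2),
    "Pump 3": (0.8, 0.1),
    "Pump 4": (0.9, 0.8),
    "Pump 5": (0.2, 0.9),
}

# A few hand-picked points for illustration only
sample_deaths = [(0.52, 0.48), (0.45, 0.55), (0.50, 0.60), (0.12, 0.25), (0.85, 0.75)]

counts = nearest_pump_counts(sample_deaths, pumps)
for name, n in counts.most_common():
    print(f"{name}: {n} of {len(sample_deaths)} deaths")
```

In the notebook, calling `nearest_pump_counts(death_locations.tolist(), pumps)` should apply the same tally to the synthetic map data.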

---

## Part 3: Florence Nightingale's Rose Diagrams (1858)

### The Lady with the Lamp... and the Graph

Florence Nightingale is famous for being a pioneering nurse, but she was also a brilliant statistician. During the Crimean War (1853-1856), she noticed that more soldiers were dying from preventable diseases than from battle wounds.

To convince the British military establishment, she created revolutionary "coxcomb" or "rose" diagrams showing causes of death by month.

Let's recreate a simplified version:

In [None]:
# Florence Nightingale's Rose Diagram (Simplified Recreation)

# Data: Monthly deaths by cause (simplified for illustration)
months = ['Apr 1854', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan 1855', 'Feb', 'Mar']

# Deaths from different causes (scaled for visualization)
disease_deaths = np.array([150, 310, 395, 520, 480, 490, 245, 160, 130, 85, 75, 70])
battle_deaths = np.array([10, 15, 30, 50, 45, 40, 35, 25, 20, 15, 10, 12])
other_deaths = np.array([8, 12, 18, 22, 20, 18, 15, 12, 10, 8, 7, 6])

# Create polar plot (rose diagram)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7), subplot_kw=dict(projection='polar'))

# Calculate angles for 12 months
angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
width = 2 * np.pi / 12

# First subplot: Disease deaths (the shocking revelation)
bars1 = ax1.bar(angles, disease_deaths, width=width, 
                color='steelblue', alpha=0.8, edgecolor='navy', linewidth=2)

ax1.set_theta_zero_location('N')
ax1.set_theta_direction(-1)
ax1.set_xticks(angles)
ax1.set_xticklabels(months, fontsize=9)
ax1.set_ylim(0, max(disease_deaths) * 1.1)
ax1.set_title('Deaths from Preventable Diseases\n(The Hidden Crisis)', 
              fontsize=13, fontweight='bold', pad=20)
ax1.grid(True, alpha=0.3)

# Second subplot: All causes combined (the full story)
bars2_disease = ax2.bar(angles, disease_deaths, width=width, 
                        color='steelblue', alpha=0.8, edgecolor='navy', 
                        linewidth=1.5, label='Disease (Preventable)')
bars2_battle = ax2.bar(angles, battle_deaths, width=width, bottom=disease_deaths,
                       color='darkred', alpha=0.8, edgecolor='maroon', 
                       linewidth=1.5, label='Battle Wounds')
bars2_other = ax2.bar(angles, other_deaths, width=width, 
                      bottom=disease_deaths + battle_deaths,
                      color='pink', alpha=0.7, edgecolor='darkred', 
                      linewidth=1.5, label='Other Causes')

ax2.set_theta_zero_location('N')
ax2.set_theta_direction(-1)
ax2.set_xticks(angles)
ax2.set_xticklabels(months, fontsize=9)
ax2.set_ylim(0, max(disease_deaths + battle_deaths + other_deaths) * 1.1)
ax2.set_title('All Causes of Death Combined\n(Blue = Disease, Red = Battle, Pink = Other)', 
              fontsize=13, fontweight='bold', pad=20)
ax2.legend(loc='upper left', bbox_to_anchor=(1.15, 1), fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 The Shocking Truth:")
print(f"- Total disease deaths: {disease_deaths.sum():,}")
print(f"- Total battle deaths: {battle_deaths.sum():,}")
print(f"- Disease killed {disease_deaths.sum() / battle_deaths.sum():.1f}x more soldiers than combat")
print("\n💡 Nightingale's Insight:")
print("- The blue area (disease) dwarfs the red (battle)")
print("- Most deaths were PREVENTABLE with better sanitation")
print("- This graph changed military medicine forever")

### The Power of Visual Rhetoric

**Professor Mishra's explanation:**

*"Nightingale could have written a report: 'Dear War Office, disease kills 10 times more soldiers than combat.' They'd have filed it away. Instead, she made these rose diagrams. Look at them: the blue wedges for disease deaths dwarf the tiny red wedges for battle deaths. One glance, and you understand the tragedy."*

*"She used these graphs to advocate for hospital reform, proper sanitation, better ventilation. She saved countless lives, not with medicine alone, but with data and its visualization."*

**Key Lesson:** Good visualization can change policy and save lives.
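
One design detail is worth checking against the honesty principle. In Nightingale's diagrams the count is encoded by the *area* of each wedge, whereas a plain polar bar chart, like the simplified recreation above, makes the *radius* proportional to the count. Since a wedge's area grows with the square of its radius, the radius-based version exaggerates large months. The sketch below compares the two encodings for two months of the simplified data; the helper names are hypothetical.

```python
import math

def wedge_radius(count, n_wedges=12):
    """Radius of an equal-angle wedge whose AREA equals `count`."""
    angle = 2 * math.pi / n_wedges        # angular width of one wedge
    return math.sqrt(2 * count / angle)   # from area = (angle / 2) * r**2

def wedge_area(radius, n_wedges=12):
    """Area of an equal-angle wedge with the given radius."""
    angle = 2 * math.pi / n_wedges
    return 0.5 * angle * radius ** 2

# Two monthly disease counts taken from the simplified data above
july, march = 520, 70

# Encoding 1: radius proportional to count (what a plain polar bar does)
linear_area_ratio = wedge_area(july) / wedge_area(march)

# Encoding 2: radius from the square root, so area is proportional to count
sqrt_area_ratio = wedge_area(wedge_radius(july)) / wedge_area(wedge_radius(march))

print(f"True ratio of counts:       {july / march:.1f}")
print(f"Area ratio, radius = count: {linear_area_ratio:.1f}")
print(f"Area ratio, radius = sqrt:  {sqrt_area_ratio:.1f}")
```

With radius proportional to count, a month with about 7 times the deaths is drawn with about 55 times the area. One possible adjustment in the notebook is to pass `np.sqrt(disease_deaths)` as the bar heights and adjust the radial limit to match.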

---

## Part 4: The Four Principles of Good Visualization

Professor Mishra shared his framework for creating effective visualizations:

### 1. CLARITY: Can your grandmother understand it?
- Remove clutter
- Label everything clearly
- Use intuitive colors and shapes
- One main message per graph

### 2. HONESTY: Does it tell the truth?
- Start y-axis at zero (usually)
- Don't cherry-pick data
- Show uncertainty when it exists
- Avoid misleading scales

### 3. EFFICIENCY: Does it use ink wisely?
- Maximize the data-ink ratio
- Remove unnecessary decorations
- Choose the right chart type
- Make comparisons easy

### 4. BEAUTY: Does it invite exploration?
- Thoughtful color choices
- Balanced composition
- Professional appearance
- Accessible to colorblind viewers

**Professor's motto:** *"Clarity first. Then honesty. Then efficiency. Beauty comes last, but it matters."*
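
The last bullet under Beauty, accessibility to colorblind viewers, is easy to act on. A minimal sketch, assuming the widely used Okabe-Ito palette is an acceptable choice: the hex codes below are the published palette values, while the helper `assign_colors` and the category names are hypothetical.

```python
# Okabe-Ito palette: eight colors chosen to remain distinguishable
# under the common forms of color vision deficiency
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}

# Preferred order for chart categories (an arbitrary choice for this sketch)
DEFAULT_ORDER = ("blue", "vermillion", "orange",
                 "bluish green", "sky blue", "reddish purple")

def assign_colors(categories, color_names=DEFAULT_ORDER):
    """Map each category to one palette color, in order."""
    if len(categories) > len(color_names):
        raise ValueError("More categories than available colors")
    return {cat: OKABE_ITO[name] for cat, name in zip(categories, color_names)}

colors = assign_colors(["Disease", "Battle", "Other"])
for category, hex_code in colors.items():
    print(f"{category}: {hex_code}")
```

The resulting hex strings can be passed to any Matplotlib `color=` argument in place of names such as `'darkred'` or `'green'`.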

---

## Part 5: Same Data, Different Stories

### Demonstration: How Visualization Choice Affects Perception

Let's use simulated rainfall data for Western Odisha to show how the same data can tell different stories depending on how we visualize it.

In [None]:
# Generate realistic rainfall data for Western Odisha
np.random.seed(2024)
years = np.arange(1974, 2024)
n_years = len(years)

# Simulate rainfall with slight declining trend and high variability
base_rainfall = 1100  # mm per year
trend = -1.5  # mm decline per year
noise = 180  # high year-to-year variability

rainfall = base_rainfall + trend * np.arange(n_years) + np.random.normal(0, noise, n_years)
rainfall = np.maximum(rainfall, 600)  # Floor at 600 mm to avoid implausibly dry years

# Create dataframe
df_rainfall = pd.DataFrame({
    'Year': years,
    'Rainfall_mm': rainfall
})

# Add decade column
df_rainfall['Decade'] = (df_rainfall['Year'] // 10) * 10

print("Sample of rainfall data:")
print(df_rainfall.head(10))
print(f"\nData range: {df_rainfall['Year'].min()} to {df_rainfall['Year'].max()}")
print(f"Mean rainfall: {df_rainfall['Rainfall_mm'].mean():.1f} mm")
print(f"Std deviation: {df_rainfall['Rainfall_mm'].std():.1f} mm")

In [None]:
# Now let's create FOUR DIFFERENT visualizations of the SAME data

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. HISTOGRAM - Shows Distribution
ax1 = axes[0, 0]
ax1.hist(df_rainfall['Rainfall_mm'], bins=15, color='steelblue', 
         edgecolor='navy', alpha=0.7)
ax1.axvline(df_rainfall['Rainfall_mm'].mean(), color='darkred', 
            linestyle='--', linewidth=2, label=f'Mean: {df_rainfall["Rainfall_mm"].mean():.0f} mm')
ax1.set_xlabel('Annual Rainfall (mm)', fontsize=12)
ax1.set_ylabel('Frequency (number of years)', fontsize=12)
ax1.set_title('1. Histogram: What\'s the Distribution Shape?\n(Reveals: Roughly normal, high variability)', 
              fontsize=12, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

# 2. SCATTER PLOT - Shows Temporal Trend
ax2 = axes[0, 1]
ax2.scatter(df_rainfall['Year'], df_rainfall['Rainfall_mm'], 
            color='steelblue', alpha=0.6, s=50)
# Add trend line
z = np.polyfit(df_rainfall['Year'], df_rainfall['Rainfall_mm'], 1)
p = np.poly1d(z)
ax2.plot(df_rainfall['Year'], p(df_rainfall['Year']), 
         "r--", linewidth=2, label=f'Trend: {z[0]:.2f} mm/year')
ax2.set_xlabel('Year', fontsize=12)
ax2.set_ylabel('Annual Rainfall (mm)', fontsize=12)
ax2.set_title(f'2. Scatter Plot: Is There a Trend Over Time?\n(Fitted change: {z[0] * len(df_rainfall):+.0f} mm over {len(df_rainfall)} years)', 
              fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

# 3. LINE GRAPH - Shows Temporal Pattern
ax3 = axes[1, 0]
ax3.plot(df_rainfall['Year'], df_rainfall['Rainfall_mm'], 
         color='steelblue', linewidth=2, marker='o', markersize=4)
ax3.axhline(df_rainfall['Rainfall_mm'].mean(), color='darkred', 
            linestyle='--', linewidth=2, alpha=0.7, label='Overall Mean')
ax3.fill_between(df_rainfall['Year'], 
                  df_rainfall['Rainfall_mm'].mean() - df_rainfall['Rainfall_mm'].std(),
                  df_rainfall['Rainfall_mm'].mean() + df_rainfall['Rainfall_mm'].std(),
                  color='red', alpha=0.1, label='±1σ range')
ax3.set_xlabel('Year', fontsize=12)
ax3.set_ylabel('Annual Rainfall (mm)', fontsize=12)
ax3.set_title('3. Time Series: How Does Rainfall Vary Year to Year?\n(Reveals: High fluctuation, some extreme years)', 
              fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)

# 4. BOX PLOT BY DECADE - Shows Changes Over Time
ax4 = axes[1, 1]
decade_order = sorted(df_rainfall['Decade'].unique())
bp = ax4.boxplot([df_rainfall[df_rainfall['Decade'] == d]['Rainfall_mm'].values 
                   for d in decade_order],
                  labels=[f"{d}s" for d in decade_order],
                  patch_artist=True,
                  medianprops=dict(color='darkred', linewidth=2),
                  boxprops=dict(facecolor='steelblue', alpha=0.6))
ax4.set_xlabel('Decade', fontsize=12)
ax4.set_ylabel('Annual Rainfall (mm)', fontsize=12)
ax4.set_title('4. Box Plot by Decade: How Do Different Decades Compare?\n(Reveals: Median declining, variability consistent)', 
              fontsize=12, fontweight='bold')
ax4.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n🎯 Key Insight:")
print("Same data, four different insights:")
print("1. Histogram → Shows overall distribution is roughly normal")
print("2. Scatter → Reveals slight declining trend over decades")
print("3. Time series → Shows extreme year-to-year variability")
print("4. Box plots → Confirms medians declining by decade")
print("\n💡 Professor's Wisdom: Choose visualization to answer YOUR question!")

### Ananya's Realization

Looking at these four graphs, Ananya understood the Professor's point:

*"If I want to show Uncle Bikram's year was unusual, I use the time series with the ±1σ band. If I want to show the insurance company that rainfall is declining over decades, I use box plots. If I want to show farmers the overall pattern, I use the histogram."*

**Professor nodded:** *"Exactly. Same data, different questions, different visualizations. That's the art of it."*
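
Before a trend line goes in front of an insurer, it is worth asking how firmly the data support it. The sketch below computes a least-squares slope together with its standard error, using only the standard library. The series is a fresh synthetic one built with the same recipe as the chapter's data (a decline of 1.5 mm/year with noise of 180 mm), so its values will differ from the NumPy-generated series above. With 50 yearly points and that much noise, the standard error comes out at roughly 1.8 mm/year, about the same size as the trend itself.

```python
import math
import random

def slope_with_standard_error(x, y):
    """Least-squares slope of y on x, and the standard error of that slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return slope, std_err

# Synthetic series: same recipe as the chapter's data, different random draws
rng = random.Random(2024)  # arbitrary seed, for reproducibility only
sample_years = list(range(1974, 2024))
sample_rain = [1100 - 1.5 * i + rng.gauss(0, 180) for i in range(len(sample_years))]

slope, se = slope_with_standard_error(sample_years, sample_rain)
print(f"Trend: {slope:.2f} mm/year, standard error: {se:.2f} mm/year")
print(f"Rough 95% range: {slope - 2 * se:.2f} to {slope + 2 * se:.2f} mm/year")
```

If the rough range includes zero, an honest chart title would say "no clear trend" rather than "declining", or would show the uncertainty alongside the line.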

---

## Part 6: Good vs. Misleading Visualizations

### The Ethics of Visualization

**Professor's warning:** *"With power comes responsibility. You can use visualization to clarify, or to deceive. Let me show you the dark side."*

### Common Ways Graphs Lie:

1. **Truncated Y-axis** - Makes small differences look huge
2. **Cherry-picked time periods** - Shows only the convenient data
3. **Inappropriate scales** - Logarithmic when linear is honest
4. **3D effects** - Distort perception of relative sizes
5. **Missing error bars** - Hides uncertainty
6. **Misleading colors** - Suggests differences that don't exist

Let's demonstrate:

In [None]:
# Demonstration: How to Lie with Statistics (and Graphics)

# Scenario: Two crops with very similar yields
crops = ['Crop A', 'Crop B']
yields = [1000, 1050]  # Only 5% difference!

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# MISLEADING VERSION - Truncated Y-axis
ax1.bar(crops, yields, color=['red', 'green'], alpha=0.7, edgecolor='black', linewidth=2)
ax1.set_ylim(950, 1100)  # Truncated! Makes difference look huge
ax1.set_ylabel('Yield (kg/hectare)', fontsize=12)
ax1.set_title('❌ MISLEADING: "Crop B Yields 50% More!"\n(Truncated Y-axis exaggerates difference)', 
              fontsize=12, fontweight='bold', color='darkred')
ax1.grid(True, alpha=0.3, axis='y')

# Add deceptive annotation
ax1.annotate('', xy=('Crop B', 1050), xytext=('Crop B', 1000),
            arrowprops=dict(arrowstyle='<->', color='red', lw=2))
ax1.text(1.5, 1025, 'Look at this\nHUGE\ndifference!', 
         ha='center', fontsize=11, color='red', fontweight='bold')

# HONEST VERSION - Full Y-axis starting at zero
ax2.bar(crops, yields, color=['steelblue', 'steelblue'], alpha=0.7, edgecolor='black', linewidth=2)
ax2.set_ylim(0, 1200)  # Starts at zero - honest scale
ax2.set_ylabel('Yield (kg/hectare)', fontsize=12)
ax2.set_title('✅ HONEST: "Crop B Yields 5% More"\n(Full Y-axis shows true relative difference)', 
              fontsize=12, fontweight='bold', color='darkgreen')
ax2.grid(True, alpha=0.3, axis='y')

# Add honest annotation
ax2.text(1.5, 600, 'Actually, the\ndifference is\nquite small', 
         ha='center', fontsize=11, color='darkgreen', fontweight='bold')

plt.tight_layout()
plt.show()

print("\n⚠️  The Danger of Truncated Axes:")
print(f"- Actual difference: {yields[1] - yields[0]} kg/hectare")
print(f"- Percentage difference: {((yields[1]/yields[0]) - 1) * 100:.1f}%")
print("- But the left graph draws Crop B's bar twice as tall as Crop A's!")
print("\n💡 Rule: For bar charts, usually start Y-axis at zero")
print("   (Exception: Time series can have focused Y-range if clearly labeled)")
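
The amount of exaggeration can be put into a single number. Edward Tufte's "lie factor" divides the size of the effect shown in the graphic by the size of the effect in the data; a value near 1 is honest. The sketch below applies it to the two-bar crop example. The function name and its two-bar restriction are simplifications made for this illustration.

```python
def lie_factor(values, axis_start):
    """Lie factor for a two-bar chart: (effect shown) / (effect in data).

    values:     (low, high) data values for the two bars
    axis_start: value at which the drawn Y-axis begins
    """
    low, high = values
    data_effect = (high - low) / low
    # Drawn bar heights are measured from the axis start, not from zero
    shown_low = low - axis_start
    shown_high = high - axis_start
    shown_effect = (shown_high - shown_low) / shown_low
    return shown_effect / data_effect

crop_yields = (1000, 1050)
truncated = lie_factor(crop_yields, axis_start=950)
honest = lie_factor(crop_yields, axis_start=0)

print(f"Truncated axis (starts at 950): lie factor = {truncated:.1f}")
print(f"Full axis (starts at 0):        lie factor = {honest:.1f}")
```

With the axis starting at 950, the drawn bars are 50 and 100 units tall, so a 5% difference is shown as a 100% difference.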

In [None]:
# Another Example: Cherry-Picking Time Periods

# Simulate a volatile stock or commodity price
np.random.seed(123)
time = np.arange(0, 100)
# Create price with overall growth but volatility
price = 100 + 0.5 * time + 10 * np.sin(time/8) + np.random.normal(0, 3, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# MISLEADING VERSION - Cherry-picked period showing only growth
# (periods 40-60 sit on a rising stretch of the sine cycle; 20-40 would show a dip)
cherry_pick_start = 40
cherry_pick_end = 60
ax1.plot(time[cherry_pick_start:cherry_pick_end], 
         price[cherry_pick_start:cherry_pick_end], 
         color='green', linewidth=3, marker='o', markersize=5)
ax1.fill_between(time[cherry_pick_start:cherry_pick_end], 
                  price[cherry_pick_start:cherry_pick_end], 
                  alpha=0.3, color='green')
ax1.set_xlabel('Time Period', fontsize=12)
ax1.set_ylabel('Price', fontsize=12)
ax1.set_title('❌ MISLEADING: "Steady Growth!"\n(Cherry-picked period showing only upward trend)', 
              fontsize=12, fontweight='bold', color='darkred')
ax1.grid(True, alpha=0.3)
ax1.text(41, 128, '"Invest now!\nPure growth!"', 
         fontsize=11, color='green', fontweight='bold',
         bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.7))

# HONEST VERSION - Full time series showing volatility
ax2.plot(time, price, color='steelblue', linewidth=2, alpha=0.7)
# Highlight the cherry-picked region
ax2.axvspan(cherry_pick_start, cherry_pick_end, 
            alpha=0.3, color='yellow', label='Cherry-picked period')
ax2.plot(time[cherry_pick_start:cherry_pick_end], 
         price[cherry_pick_start:cherry_pick_end], 
         color='red', linewidth=3, marker='o', markersize=4)
ax2.set_xlabel('Time Period', fontsize=12)
ax2.set_ylabel('Price', fontsize=12)
ax2.set_title('✅ HONEST: "Growth with Volatility"\n(Full timeline shows ups AND downs)', 
              fontsize=12, fontweight='bold', color='darkgreen')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.text(70, 120, 'Reality: volatile\nwith overall trend', 
         fontsize=11, color='darkgreen', fontweight='bold',
         bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))

plt.tight_layout()
plt.show()

print("\n⚠️  The Danger of Cherry-Picking:")
print(f"- Left graph shows only periods {cherry_pick_start}-{cherry_pick_end} (growth phase)")
print("- Right graph shows full data (0-100) with volatility visible")
print("- Same data source, completely different impression!")
print("\n💡 Rule: Always show sufficient historical context")
print("   Don't hide inconvenient data points")
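
To see how much the story depends on the chosen window, one can scan every window of a fixed width and compare the net change in each. The sketch below does this on a noise-free version of the price curve, so the result is reproducible; the helper name `window_changes` and the 20-step width are choices made for this illustration.

```python
import math

def window_changes(series, width):
    """Net change over every window of `width` steps, as (start_index, change)."""
    return [(start, series[start + width] - series[start])
            for start in range(len(series) - width)]

# Noise-free version of the price curve used in the demonstration above
smooth_price = [100 + 0.5 * t + 10 * math.sin(t / 8) for t in range(100)]

changes = window_changes(smooth_price, width=20)
best_start, best_change = max(changes, key=lambda pair: pair[1])
worst_start, worst_change = min(changes, key=lambda pair: pair[1])
full_change = smooth_price[-1] - smooth_price[0]

print(f"Best 20-step window starts at {best_start}: change {best_change:+.1f}")
print(f"Worst 20-step window starts at {worst_start}: change {worst_change:+.1f}")
print(f"Full series (0-99): change {full_change:+.1f}")
```

The same series supports a "strong growth" story and a "falling prices" story, depending only on where the window starts. Reporting the full-series change next to any highlighted window is one way to keep the picture honest.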

### Professor Mishra's Ethics Lesson

*"You've seen how easy it is to mislead with graphs. Truncate an axis, cherry-pick a time period, use 3D effects, and suddenly a 5% difference looks like 50%. This isn't just dishonest. It's dangerous."*

*"Insurance companies do this. Politicians do this. Companies do this. Your job, as pattern seekers, is to spot it. And more importantly, never do it yourselves."*

**Ananya's pledge:** *"I promise. Clarity first. Honesty always."*

---

## Part 7: Iterative Design - The Flood Risk Map

### Ananya's Real Challenge: Communicating to the Village Panchayat

The village panchayat needed a flood risk map for development planning. Ananya had the data and the model, but creating a visualization that village elders could understand and use required iteration.

Let's recreate her three attempts:

In [None]:
# Generate synthetic flood risk data for villages
np.random.seed(42)

# Village locations (simplified)
villages = [
    'Barpali', 'Sohela', 'Jamankira', 'Maneswar', 'Bhatli',
    'Dhankauda', 'Katarbaga', 'Naktideul', 'Budharaja', 'Ainthapali'
]

# Generate flood risk scores based on elevation, distance to river, historical data
# Low score = high risk, High score = low risk (for realism)
risk_scores = np.random.uniform(2, 9, len(villages))
# Make a few villages clearly high-risk
risk_scores[1] = 2.1  # Sohela - very high risk
risk_scores[4] = 2.5  # Bhatli - very high risk
risk_scores[7] = 3.2  # Naktideul - high risk

# Population
population = np.random.randint(800, 5000, len(villages))

df_villages = pd.DataFrame({
    'Village': villages,
    'Risk_Score': risk_scores,
    'Population': population
})

# Calculate risk categories
df_villages['Risk_Category'] = pd.cut(df_villages['Risk_Score'], 
                                       bins=[0, 3, 5, 7, 10],
                                       labels=['Very High Risk', 'High Risk', 
                                              'Moderate Risk', 'Low Risk'])

print("Village flood risk data:")
print(df_villages.sort_values('Risk_Score'))
print(f"\n🚨 Villages at Very High/High Risk: {len(df_villages[df_villages['Risk_Score'] < 5])}")
print(f"👥 Population in high-risk zones: {df_villages[df_villages['Risk_Score'] < 5]['Population'].sum():,}")
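
The binning step can be hard to follow inside a `pd.cut` call, so here is the same logic as a plain-Python sketch. By default `pd.cut` closes each interval on the right, so a score of exactly 3 falls in the "Very High Risk" bin; `bisect_left` reproduces that behavior. The function names are hypothetical, and `danger_level` simply flips the scale so that a bigger number means more danger.

```python
from bisect import bisect_left

# Interior cut points of the bins [0, 3, 5, 7, 10] used in the pd.cut call
BIN_EDGES = [3, 5, 7]
LABELS = ["Very High Risk", "High Risk", "Moderate Risk", "Low Risk"]

def risk_category(score):
    """Label a score in (0, 10]; intervals are closed on the right."""
    if not 0 < score <= 10:
        raise ValueError("score must be in the interval (0, 10]")
    return LABELS[bisect_left(BIN_EDGES, score)]

def danger_level(score):
    """Flip the scale so that higher values mean more danger."""
    return 10 - score

# Scores assigned to three villages in the cell above
for village, score in [("Sohela", 2.1), ("Bhatli", 2.5), ("Naktideul", 3.2)]:
    print(f"{village}: {risk_category(score)}, danger level {danger_level(score):.1f}")
```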

In [None]:
# Three attempts at the same chart, shown side by side

fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Attempt 1: Technical scatter plot
ax1 = axes[0]
scatter = ax1.scatter(df_villages['Risk_Score'], df_villages['Population'], 
                      c=df_villages['Risk_Score'], cmap='RdYlGn', 
                      s=200, alpha=0.6, edgecolors='black')
ax1.set_xlabel('Flood Risk Score', fontsize=11)
ax1.set_ylabel('Population', fontsize=11)
ax1.set_title('Attempt 1: Too Technical\n(Village elders: "What is this?")', 
              fontsize=12, fontweight='bold')
ax1.grid(True, alpha=0.3)
plt.colorbar(scatter, ax=ax1, label='Risk Score')
ax1.text(5, 4500, '❌ Problem:\nNo village names\nAbstract axes\nNot actionable', 
         fontsize=10, bbox=dict(boxstyle='round', facecolor='pink', alpha=0.7))

# Attempt 2: Better but still not ideal
ax2 = axes[1]
colors = ['darkred' if x < 3 else 'orange' if x < 5 else 'yellow' if x < 7 else 'green' 
          for x in df_villages['Risk_Score']]
bars = ax2.barh(df_villages['Village'], df_villages['Risk_Score'], 
                color=colors, alpha=0.7, edgecolor='black')
ax2.set_xlabel('Risk Score (Lower = More Dangerous)', fontsize=11)
ax2.set_title('Attempt 2: Better\n(But still confusing: lower = worse?)', 
              fontsize=12, fontweight='bold')
ax2.invert_yaxis()
ax2.grid(True, alpha=0.3, axis='x')
ax2.text(7, 2, '⚠️  Problem:\nCounter-intuitive\nLeft = danger\nNot memorable', 
         fontsize=10, bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.7))

# Attempt 3: THE WINNING VERSION - Simple, Clear, Actionable
ax3 = axes[2]
df_sorted = df_villages.sort_values('Risk_Score')

# Color by category
color_map = {'Very High Risk': 'darkred', 'High Risk': 'orange', 
             'Moderate Risk': 'gold', 'Low Risk': 'green'}
colors_final = [color_map[cat] for cat in df_sorted['Risk_Category']]

bars = ax3.barh(df_sorted['Village'], 
                [10 - x for x in df_sorted['Risk_Score']],  # Invert so higher = more risk
                color=colors_final, alpha=0.8, edgecolor='black', linewidth=1.5)

ax3.set_xlabel('Flood Danger Level', fontsize=11)
ax3.set_title('Attempt 3: SUCCESS! ✅\n"Simple, Clear, Actionable"', 
              fontsize=12, fontweight='bold', color='darkgreen')
ax3.invert_yaxis()
ax3.set_xlim(0, 10)
ax3.grid(True, alpha=0.3, axis='x')

# Shade risk zones so they match the category colors of the bars
ax3.axvspan(0, 3, alpha=0.1, color='green', label='Low Risk')
ax3.axvspan(3, 5, alpha=0.1, color='gold', label='Moderate Risk')
ax3.axvspan(5, 7, alpha=0.1, color='orange', label='High Risk')
ax3.axvspan(7, 10, alpha=0.1, color='red', label='Very High Risk')

# Add population annotations for high-risk villages
for i, row in df_sorted[df_sorted['Risk_Score'] < 5].iterrows():
    idx = list(df_sorted['Village']).index(row['Village'])
    ax3.text(10 - row['Risk_Score'] + 0.3, idx, 
             f"{row['Population']:,} people", 
             fontsize=9, va='center')

ax3.text(2, 8, '✅ Success:\n• Village names clear\n• Red = danger (intuitive)\n• Population shown\n• Actionable priorities', 
         fontsize=10, bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.7))

plt.tight_layout()
plt.show()

print("\n📊 The Iterative Design Process:")
print("\nAttempt 1: Too abstract - no one could use it")
print("Attempt 2: Better but counter-intuitive scale")
print("Attempt 3: Clear, actionable, memorable")
print("\n💡 Key improvements in final version:")
print("  ✅ Village names prominently displayed")
print("  ✅ Intuitive color coding (red = danger)")
print("  ✅ Sorted by risk (most urgent at top)")
print("  ✅ Population data for prioritization")
print("  ✅ Simple enough for village elders to understand at a glance")

### The Village Panchayat Presentation

When Ananya presented the final map to the village panchayat, she spoke in Odia:

*"Sarpanch saheb, look here. Red means great danger. Sohela and Bhatli villages are at the highest risk."*

The sarpanch studied the chart, then nodded: *"This is clear. We'll plan for these villages first."*

**Professor Mishra later explained:** *"That's the power of good visualization. You didn't need to explain confidence intervals or p-values. One glance, and they understood the priority. That's clarity. That's success."*

---

## Part 8: Try This! - Your Turn to Visualize

### Exercise 1: Compare Two Distributions

You have rainfall data for two regions. Create visualizations to compare them and answer: Which region is riskier for farming?

In [None]:
# Exercise Data: Two regions with different rainfall patterns
np.random.seed(100)

# Region A: High mean, low variability (stable)
region_a_rainfall = np.random.normal(1200, 150, 50)

# Region B: Similar mean, high variability (risky)
region_b_rainfall = np.random.normal(1200, 350, 50)

# Create your visualizations here!
# Suggestions:
# 1. Overlapping histograms
# 2. Box plots side by side
# 3. Violin plots

# TODO: Your code here!
# Hint: Use the patterns from earlier in this notebook

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 5))

# 1. Overlapping Histograms
ax1.hist(region_a_rainfall, bins=15, alpha=0.6, color='steelblue', 
         label='Region A', edgecolor='navy')
ax1.hist(region_b_rainfall, bins=15, alpha=0.6, color='coral', 
         label='Region B', edgecolor='darkred')
ax1.set_xlabel('Annual Rainfall (mm)')
ax1.set_ylabel('Frequency')
ax1.set_title('Histogram Comparison')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Box Plots
bp = ax2.boxplot([region_a_rainfall, region_b_rainfall],
                  labels=['Region A', 'Region B'],
                  patch_artist=True,
                  medianprops=dict(color='darkred', linewidth=2))
bp['boxes'][0].set_facecolor('steelblue')
bp['boxes'][1].set_facecolor('coral')
ax2.set_ylabel('Annual Rainfall (mm)')
ax2.set_title('Box Plot Comparison')
ax2.grid(True, alpha=0.3, axis='y')

# 3. Statistics Table as Bar Chart
metrics = ['Mean', 'Std Dev', 'CV%']
region_a_stats = [region_a_rainfall.mean(), region_a_rainfall.std(), 
                  (region_a_rainfall.std()/region_a_rainfall.mean())*100]
region_b_stats = [region_b_rainfall.mean(), region_b_rainfall.std(), 
                  (region_b_rainfall.std()/region_b_rainfall.mean())*100]

x = np.arange(len(metrics))
width = 0.35
ax3.bar(x - width/2, region_a_stats, width, label='Region A', 
        color='steelblue', alpha=0.8)
ax3.bar(x + width/2, region_b_stats, width, label='Region B', 
        color='coral', alpha=0.8)
ax3.set_ylabel('Value')
ax3.set_title('Statistics Comparison')
ax3.set_xticks(x)
ax3.set_xticklabels(metrics)
ax3.legend()
ax3.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n📊 What the data tells us:")
print(f"\nRegion A: Mean = {region_a_rainfall.mean():.0f} mm, Std = {region_a_rainfall.std():.0f} mm")
print(f"Region B: Mean = {region_b_rainfall.mean():.0f} mm, Std = {region_b_rainfall.std():.0f} mm")
print(f"\nCoefficient of Variation:")
print(f"Region A: {(region_a_rainfall.std()/region_a_rainfall.mean())*100:.1f}% (more predictable)")
print(f"Region B: {(region_b_rainfall.std()/region_b_rainfall.mean())*100:.1f}% (more variable)")
print("\n💡 Conclusion: Region B is riskier due to high variability!")
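
For a farmer, "more variable" matters because it means more bad years. One way to make that concrete is to ask how often rainfall would fall below a drought threshold under each region's distribution. The sketch below uses the same means and standard deviations as the exercise data and assumes rainfall is normally distributed. The 800 mm threshold is a hypothetical cutoff chosen for illustration, not a figure from the chapter.

```python
from statistics import NormalDist

DROUGHT_THRESHOLD_MM = 800  # hypothetical cutoff for a "drought year"

# Same parameters used to generate the exercise data
region_a = NormalDist(mu=1200, sigma=150)
region_b = NormalDist(mu=1200, sigma=350)

# Probability that a given year's rainfall falls below the threshold
p_a = region_a.cdf(DROUGHT_THRESHOLD_MM)
p_b = region_b.cdf(DROUGHT_THRESHOLD_MM)

print(f"Region A: {p_a:.1%} chance of a drought year")
print(f"Region B: {p_b:.1%} chance of a drought year")
print(f"Region B is about {p_b / p_a:.0f}x more likely to see one")
```

Both regions have the same average rainfall, yet under these assumptions Region B faces a drought year far more often. A statement like "about one year in eight" tends to land better with a non-technical audience than a coefficient of variation.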

### Exercise 2: Spot the Deception

Below are three graphs. Two are misleading, one is honest. Can you identify which and why?

In [None]:
# Exercise: Spot the Deception

# Scenario: Company revenue over 5 years
years = [2019, 2020, 2021, 2022, 2023]
revenue = [100, 102, 105, 108, 110]  # Slow steady growth

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 5))

# Graph A: Truncated Y-axis
ax1.plot(years, revenue, marker='o', linewidth=3, markersize=10, color='green')
ax1.fill_between(years, revenue, alpha=0.3, color='green')
ax1.set_ylim(95, 115)
ax1.set_xlabel('Year')
ax1.set_ylabel('Revenue ($ millions)')
ax1.set_title('Graph A: "Explosive Growth!"')
ax1.grid(True, alpha=0.3)

# Graph B: Honest scale
ax2.plot(years, revenue, marker='o', linewidth=2, markersize=8, color='steelblue')
ax2.set_ylim(0, 130)
ax2.set_xlabel('Year')
ax2.set_ylabel('Revenue ($ millions)')
ax2.set_title('Graph B: "Steady Progress"')
ax2.grid(True, alpha=0.3)

# Graph C: Misleading 3D effect (simulated)
colors_3d = plt.cm.Greens(np.linspace(0.4, 0.9, len(years)))
bars = ax3.bar(years, revenue, color=colors_3d, edgecolor='black', linewidth=2)
# Add fake 3D perspective
for i, bar in enumerate(bars):
    height = bar.get_height()
    # Add pseudo-3D top
    ax3.plot([years[i]-0.3, years[i]+0.3], [height, height], 
             color='darkgreen', linewidth=4)
    # Add pseudo-3D side
    ax3.plot([years[i]+0.3, years[i]+0.3], [0, height], 
             color='darkgreen', linewidth=2, alpha=0.5)
ax3.set_ylim(95, 115)
ax3.set_xlabel('Year')
ax3.set_ylabel('Revenue ($ millions)')
ax3.set_title('Graph C: "Dynamic Growth!" (3D effect)')
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n❓ YOUR TASK:")
print("Which graph is honest? Which ones are misleading? Why?")
print("\nThink about:")
print("- Y-axis ranges")
print("- Visual effects")
print("- What impression each gives")
print("\n--- ANSWER BELOW (don't peek first!) ---\n")
print("Graph A: ❌ MISLEADING - Truncated Y-axis makes 10% growth look huge")
print("Graph B: ✅ HONEST - Full Y-axis shows true scale of growth")
print("Graph C: ❌ MISLEADING - 3D effect + truncated axis = double deception")
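
One way to put a number on the distortion in Graphs A and C is Edward Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below is not part of the exercise. The helper name `lie_factor` is hypothetical, and it assumes the viewer judges growth by the height of each value above the bottom of the y-axis.

```python
def lie_factor(values, axis_floor):
    """Tufte-style lie factor for the change from the first to the last value.

    Assumes the eye reads each value as its height above `axis_floor`,
    the lowest value drawn on the y-axis.
    """
    first, last = values[0], values[-1]
    data_effect = (last - first) / first  # true relative change
    shown_effect = ((last - axis_floor) - (first - axis_floor)) / (first - axis_floor)
    return shown_effect / data_effect

revenue = [100, 102, 105, 108, 110]  # same figures as the exercise

truncated = lie_factor(revenue, axis_floor=95)  # Graphs A and C start at 95
honest = lie_factor(revenue, axis_floor=0)      # Graph B starts at 0

print(f"Axis starting at 95: lie factor = {truncated:.1f}")
print(f"Axis starting at 0:  lie factor = {honest:.1f}")
```

With the axis starting at 95, the plotted height triples (from 5 to 15 units) while revenue grows by 10%, giving a lie factor of about 20. With a zero baseline the factor is 1.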

---

## Part 9: Real-World Application - Kabir and Priya's Projects

### Kabir's Cricket Analytics Dashboard

Inspired by Ananya's work, Kabir created a dashboard comparing IPL player performances.

In [None]:
# Kabir's IPL Player Comparison Dashboard

# Synthetic player data
players = ['Player A', 'Player B', 'Player C', 'Player D']
batting_avg = [45.2, 38.5, 52.1, 41.8]
strike_rate = [135, 142, 128, 155]
consistency = [0.15, 0.25, 0.12, 0.30]  # Lower = more consistent (CV)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Batting Average
ax1 = axes[0, 0]
bars1 = ax1.barh(players, batting_avg, color='steelblue', alpha=0.7, edgecolor='navy')
ax1.set_xlabel('Batting Average')
ax1.set_title('Batting Average Comparison', fontweight='bold')
ax1.grid(True, alpha=0.3, axis='x')
for i, v in enumerate(batting_avg):
    ax1.text(v + 1, i, f"{v:.1f}", va='center', fontsize=10)

# 2. Strike Rate
ax2 = axes[0, 1]
bars2 = ax2.barh(players, strike_rate, color='coral', alpha=0.7, edgecolor='darkred')
ax2.set_xlabel('Strike Rate')
ax2.set_title('Strike Rate Comparison', fontweight='bold')
ax2.grid(True, alpha=0.3, axis='x')
for i, v in enumerate(strike_rate):
    ax2.text(v + 2, i, f"{v}", va='center', fontsize=10)

# 3. Consistency (CV)
ax3 = axes[1, 0]
# Traffic-light colors: green below 0.20, orange up to and including 0.25, red above
colors_consistency = ['green' if x < 0.2 else 'orange' if x <= 0.25 else 'red' 
                      for x in consistency]
bars3 = ax3.barh(players, consistency, color=colors_consistency, alpha=0.7, edgecolor='black')
ax3.set_xlabel('Coefficient of Variation (lower = more consistent)')
ax3.set_title('Consistency Comparison', fontweight='bold')
ax3.grid(True, alpha=0.3, axis='x')
for i, v in enumerate(consistency):
    ax3.text(v + 0.01, i, f"{v:.2f}", va='center', fontsize=10)

# 4. Combined Score (normalized)
ax4 = axes[1, 1]
# Normalize and combine (higher is better)
norm_avg = np.array(batting_avg) / max(batting_avg)
norm_sr = np.array(strike_rate) / max(strike_rate)
norm_cons = 1 - (np.array(consistency) / max(consistency))  # Invert so higher = better
combined_score = (norm_avg + norm_sr + norm_cons) / 3 * 100

bars4 = ax4.barh(players, combined_score, color='gold', alpha=0.7, edgecolor='darkorange')
ax4.set_xlabel('Combined Performance Score (0-100)')
ax4.set_title('Overall Rating', fontweight='bold')
ax4.grid(True, alpha=0.3, axis='x')
for i, v in enumerate(combined_score):
    ax4.text(v + 1, i, f"{v:.1f}", va='center', fontsize=10)

plt.suptitle('Kabir\'s IPL Player Analytics Dashboard', 
             fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()

print("\n🏏 Kabir's Insight:")
# Derive each winner from the data so the summary stays correct if the numbers change
print(f"\nBest batsman (average): {players[np.argmax(batting_avg)]}")
print(f"Most aggressive (strike rate): {players[np.argmax(strike_rate)]}")
print(f"Most consistent: {players[np.argmin(consistency)]}")
print(f"\nOverall best performer: {players[np.argmax(combined_score)]} (score: {max(combined_score):.1f})")
print("\n💡 Kabir realized: Multiple metrics are needed; no single stat tells the full story!")
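
The combined score gives batting average, strike rate, and consistency equal weight, which is itself a design choice. As a rough sketch, the same normalization can be rerun with different weights to see how sensitive the ranking is. The function `weighted_score` and the alternative weights are illustrative assumptions, not Kabir's method.

```python
import numpy as np

players = ['Player A', 'Player B', 'Player C', 'Player D']
batting_avg = np.array([45.2, 38.5, 52.1, 41.8])
strike_rate = np.array([135, 142, 128, 155])
consistency = np.array([0.15, 0.25, 0.12, 0.30])  # CV: lower = more consistent

def weighted_score(weights):
    """Combine max-normalized metrics into a 0-100 score using the given weights."""
    w_avg, w_sr, w_cons = np.array(weights) / np.sum(weights)  # rescale to sum to 1
    norm_avg = batting_avg / batting_avg.max()
    norm_sr = strike_rate / strike_rate.max()
    norm_cons = 1 - consistency / consistency.max()  # invert so higher = better
    return (w_avg * norm_avg + w_sr * norm_sr + w_cons * norm_cons) * 100

equal = weighted_score([1, 1, 1])          # matches the dashboard's simple average
power_hitting = weighted_score([1, 8, 1])  # hypothetical: strike rate dominates

best_equal = players[int(np.argmax(equal))]
best_power = players[int(np.argmax(power_hitting))]
print(f"Equal weights: {best_equal}")
print(f"Strike-rate-heavy weights: {best_power}")
```

With equal weights the top score goes to Player C. With strike rate weighted eight times as heavily it goes to Player D, so the "overall best" label depends on the weights chosen.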

### Priya's Medical Statistics Project

Priya used visualization to communicate vaccine effectiveness to her biology class.

In [None]:
# Priya's Vaccine Effectiveness Visualization

# Scenario: Disease incidence before and after vaccination campaign
years_medical = np.arange(2010, 2024)
vaccination_start = 2017

# Cases before vaccination (high and variable)
cases_before = np.random.poisson(500, vaccination_start - 2010)

# Cases after vaccination (declining trend)
years_after = 2024 - vaccination_start
cases_after = []
for i in range(years_after):
    # Exponential decline with noise
    expected = 500 * np.exp(-0.3 * i)
    cases_after.append(max(np.random.poisson(expected), 10))

cases_all = np.concatenate([cases_before, cases_after])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Left: Time series with intervention marked
ax1.plot(years_medical, cases_all, marker='o', linewidth=2, 
         markersize=8, color='steelblue', label='Reported Cases')

# Mark vaccination campaign start
ax1.axvline(vaccination_start, color='green', linestyle='--', 
            linewidth=3, label='Vaccination Campaign Started', alpha=0.7)
ax1.axvspan(2010, vaccination_start, alpha=0.2, color='red', label='Before Vaccination')
ax1.axvspan(vaccination_start, 2023, alpha=0.2, color='green', label='After Vaccination')

ax1.set_xlabel('Year', fontsize=12)
ax1.set_ylabel('Disease Cases', fontsize=12)
ax1.set_title('Impact of Vaccination Campaign Over Time', fontsize=13, fontweight='bold')
ax1.legend(loc='upper right', fontsize=10)
ax1.grid(True, alpha=0.3)

# Add annotations
ax1.annotate('High case load\nbefore intervention', 
             xy=(2014, 500), xytext=(2012, 700),
             fontsize=10, 
             bbox=dict(boxstyle='round', facecolor='pink', alpha=0.7),
             arrowprops=dict(arrowstyle='->', color='red', lw=2))

ax1.annotate('Dramatic decline\nafter vaccination', 
             xy=(2020, 100), xytext=(2019, 300),
             fontsize=10, 
             bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.7),
             arrowprops=dict(arrowstyle='->', color='green', lw=2))

# Right: Before/After comparison
ax2.bar(['Before\nVaccination\n(2010-2016)', 'After\nVaccination\n(2017-2023)'],
        [cases_before.mean(), np.array(cases_after).mean()],
        color=['red', 'green'], alpha=0.7, edgecolor='black', linewidth=2)

ax2.set_ylabel('Average Annual Cases', fontsize=12)
ax2.set_title('Before vs. After: Average Impact', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3, axis='y')

# Add percentage reduction
reduction = (1 - np.array(cases_after).mean() / cases_before.mean()) * 100
ax2.text(1, np.array(cases_after).mean() + 50, 
         f"{reduction:.0f}% reduction!", 
         fontsize=12, fontweight='bold', color='green',
         bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8))

# Add values on bars
ax2.text(0, cases_before.mean() + 20, f"{cases_before.mean():.0f} cases", 
         ha='center', fontsize=11, fontweight='bold')
ax2.text(1, np.array(cases_after).mean() + 20, f"{np.array(cases_after).mean():.0f} cases", 
         ha='center', fontsize=11, fontweight='bold')

plt.suptitle('Priya\'s Medical Statistics: Vaccine Effectiveness Visualization', 
             fontsize=16, fontweight='bold', y=0.98)
plt.tight_layout()
plt.show()

print("\n💉 Priya's Presentation to Biology Class:")
print(f"\nAverage cases BEFORE vaccination: {cases_before.mean():.0f} per year")
print(f"Average cases AFTER vaccination: {np.array(cases_after).mean():.0f} per year")
print(f"\nReduction: {reduction:.1f}%")
# The data counts cases, not deaths, so report cases averted rather than lives saved
print(f"Cases potentially averted: {(cases_before.mean() - np.array(cases_after).mean()) * 7:.0f} over 7 years")
print("\n🎯 Priya's Conclusion: 'One graph can show what a thousand words cannot.'")
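
Because the case counts are simulated with random noise, the printed reduction depends on the random seed. A minimal sketch of the noise-free arithmetic shows what reduction the model builds in. It uses the same parameters as the simulation above: a baseline of 500 cases and a decay rate of 0.3 per year. The variable names are illustrative.

```python
import numpy as np

baseline = 500     # expected annual cases before vaccination
decay_rate = 0.3   # exponential decline per year after the campaign starts
years_after = 7    # 2017 through 2023

# Expected (noise-free) cases for each post-vaccination year
expected_after = baseline * np.exp(-decay_rate * np.arange(years_after))

mean_after = expected_after.mean()
expected_reduction = (1 - mean_after / baseline) * 100

print(f"Expected average after vaccination: {mean_after:.0f} cases per year")
print(f"Expected reduction: {expected_reduction:.1f}%")
```

The expected reduction comes out at roughly 52%. A simulated result near that value reflects the assumptions built into the synthetic data, not a measurement from a real vaccination campaign.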

---

## Chapter Summary: What We Learned

### Historical Lessons
1. **John Snow's cholera map (1854)** - Spatial visualization can reveal causal patterns
2. **Florence Nightingale's rose diagrams (1858)** - Visual rhetoric can change policy

### The Four Principles
1. **CLARITY** - Can your grandmother understand it?
2. **HONESTY** - Does it tell the truth?
3. **EFFICIENCY** - Does it use ink wisely?
4. **BEAUTY** - Does it invite exploration?

### Key Insights
- Same data can tell different stories depending on visualization choice
- Choose visualization to answer YOUR specific question
- Visualizations can clarify OR deceive - use power responsibly
- Iterative design improves communication
- Good visualization serves the audience, not the creator

### Skills Practiced
✅ Creating multiple visualization types  
✅ Recognizing misleading graphics  
✅ Iterative design for clarity  
✅ Communicating to non-technical audiences  
✅ Ethical visualization practices  

---

### Professor Mishra's Final Words

*"You've learned to see patterns in data. You've learned to build models. Now you can make the invisible visible. This is power. Use it wisely. Make graphs that clarify, not confuse. That reveal, not conceal. That serve truth, not agenda."*

*"And remember: The best visualization is the one your audience can understand and act upon. Always ask: Does this help or does this hide?"*

---

## References and Further Reading

### Books
- Tufte, E. R. (2001). *The Visual Display of Quantitative Information* (2nd ed.). Graphics Press.
- Cairo, A. (2013). *The Functional Art: An Introduction to Information Graphics and Visualization*. New Riders.
- Huff, D. (1954). *How to Lie with Statistics*. W. W. Norton & Company.
- Few, S. (2012). *Show Me the Numbers: Designing Tables and Graphs to Enlighten* (2nd ed.). Analytics Press.

### Historical References
- Snow, J. (1855). *On the Mode of Communication of Cholera* (2nd ed.). John Churchill.
- Nightingale, F. (1858). *Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army*. Harrison and Sons.

### Online Resources
- [D3.js Gallery](https://observablehq.com/@d3/gallery) - Interactive visualizations
- [Information is Beautiful](https://informationisbeautiful.net/) - Award-winning data viz
- [FlowingData](https://flowingdata.com/) - Statistics and visualization blog

---

## What's Next?

**Chapter 12: The Pattern Seekers**

One year after Uncle Bikram's insurance denial, Ananya reflects on how her thinking has changed. She has not just learned probability and statistics; she has become a pattern seeker.

The final chapter explores:
- What changes when you think probabilistically?
- How do you balance understanding with exam preparation?
- Where does the journey go from here?

*"What's changed in you? Not what you've learned, but how you think?"* - Professor Mishra

---

**END OF CHAPTER 11 NOTEBOOK**