# Bills of Mortality - Parish and Geographic Analysis

This notebook focuses on spatial and parish-level patterns in the Bills of Mortality data, including:
- Parish-level mortality patterns
- Comparative analysis across parishes
- Geographic clustering and patterns
- Urban vs. suburban differences (if identifiable)
- Parish vulnerability and resilience

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (15, 8)
warnings.filterwarnings('ignore')

# Data directory
DATA_DIR = Path('../data')

print("🏛️ Bills of Mortality - Parish and Geographic Analysis")
print("="*55)

🏛️ Bills of Mortality - Parish and Geographic Analysis


## Load and Prepare Data

In [5]:
# Load datasets
bills = pd.read_csv(DATA_DIR / 'all_bills.csv')
parishes = pd.read_csv(DATA_DIR / 'parishes.csv')
years = pd.read_csv(DATA_DIR / 'years.csv')

print(f"📊 Loaded {len(bills):,} bill records")
print(f"🏛️ {len(parishes)} parishes in dataset")
print(f"📅 Time range: {bills['year'].min()} - {bills['year'].max()}")

# Create parish name mapping
parish_map = parishes.set_index('id')['parish_name'].to_dict()
bills['parish_name'] = bills['parish_id'].map(parish_map)

# Calculate parish-level statistics
parish_stats = bills.groupby(['parish_id', 'parish_name']).agg({
    'count': ['sum', 'mean', 'std', 'count'],
    'year': ['min', 'max'],
    'week_id': 'nunique'
}).round(2)

# Flatten column names
parish_stats.columns = ['_'.join(col).strip() for col in parish_stats.columns]
parish_stats = parish_stats.reset_index()

# Calculate additional metrics
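# total_deaths sums `count` over every row; if the data holds separate 'buried'
# and 'plague' rows for the same week, plague burials may be counted twice here
parish_stats['total_deaths'] = parish_stats['count_sum']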
parish_stats['avg_deaths_per_record'] = parish_stats['count_mean']
parish_stats['years_active'] = parish_stats['year_max'] - parish_stats['year_min'] + 1
parish_stats['records_per_year'] = parish_stats['count_count'] / parish_stats['years_active']
parish_stats['weeks_covered'] = parish_stats['week_id_nunique']

print(f"\n✓ Parish statistics calculated for {len(parish_stats)} parishes")

📊 Loaded 1,292,566 bill records
🏛️ 156 parishes in dataset
📅 Time range: 1635 - 1752

✓ Parish statistics calculated for 149 parishes
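The aggregation above relies on two pandas behaviors. First, passing lists of functions to `agg` produces two-level column labels such as `('count', 'sum')`. Second, joining those levels produces the flat names (`count_sum`, `week_id_nunique`) used in the metric columns. The sketch below runs this on a made-up five-row table, with invented parish names, counts and week IDs, to show the resulting columns and the inclusive `years_active` span:

```python
import pandas as pd

# Invented long-format records: one row per parish per week
toy = pd.DataFrame({
    "parish_id": [1, 1, 1, 2, 2],
    "parish_name": ["A", "A", "A", "B", "B"],
    "count": [3, 5, 4, 10, 12],
    "year": [1665, 1665, 1666, 1665, 1666],
    "week_id": ["w1", "w2", "w3", "w1", "w2"],
})

stats = toy.groupby(["parish_id", "parish_name"]).agg({
    "count": ["sum", "mean", "count"],
    "year": ["min", "max"],
    "week_id": "nunique",
})

# agg with lists of functions yields MultiIndex columns like ('count', 'sum');
# joining the two levels gives flat names like 'count_sum'
stats.columns = ["_".join(col) for col in stats.columns]
stats = stats.reset_index()

# Inclusive span: records in 1665 and 1666 mean 2 active years
stats["years_active"] = stats["year_max"] - stats["year_min"] + 1
stats["records_per_year"] = stats["count_count"] / stats["years_active"]
print(stats)
```

By default, `groupby` drops rows whose key is NaN (`dropna=True`). Any `parish_id` without a match in `parishes.csv` therefore falls out of `parish_stats`. This is one possible reason, alongside parishes that simply have no records, why only 149 of the 156 parishes appear.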


## Parish Mortality Overview

In [3]:
# Parish mortality rankings
print("🏆 Top 10 Parishes by Total Deaths:")
top_parishes = parish_stats.nlargest(10, 'total_deaths')[['parish_name', 'total_deaths', 'count_count', 'years_active']]
for i, (_, parish) in enumerate(top_parishes.iterrows(), 1):
    print(f"  {i:2d}. {parish['parish_name'][:40]:40} - {parish['total_deaths']:,} deaths ({parish['count_count']:,} records, {parish['years_active']} years)")

print(f"\n📊 Parish Statistics Summary:")
print(f"Total deaths across all parishes: {parish_stats['total_deaths'].sum():,}")
print(f"Average deaths per parish: {parish_stats['total_deaths'].mean():.0f}")
print(f"Median deaths per parish: {parish_stats['total_deaths'].median():.0f}")
print(f"Most active parish: {parish_stats.loc[parish_stats['count_count'].idxmax(), 'parish_name']} ({parish_stats['count_count'].max():,} records)")
print(f"Longest active parish: {parish_stats.loc[parish_stats['years_active'].idxmax(), 'parish_name']} ({parish_stats['years_active'].max()} years)")

🏆 Top 10 Parishes by Total Deaths:
   1. Stepney Parish                           - 209,163 deaths (8,215 records, 94 years)
   2. St Martin In The Fields                  - 189,107 deaths (9,391 records, 118 years)
   3. St Giles Cripplegate                     - 161,180 deaths (9,391 records, 118 years)
   4. St Giles In The Field                    - 157,313 deaths (9,391 records, 118 years)
   5. St Andrew Holborn                        - 131,962 deaths (9,391 records, 118 years)
   6. St Mary Whitechappel                     - 127,960 deaths (9,391 records, 118 years)
   7. St Margaret Westminster                  - 119,899 deaths (8,215 records, 94 years)
   8. St James In Westminster                  - 105,481 deaths (6,125 records, 91 years)
   9. St Botolph Aldgate                       - 102,058 deaths (9,391 records, 118 years)
  10. St Olave Southwark                       - 95,940 deaths (9,391 records, 118 years)

📊 Parish Statistics Summary:
Total deaths across all paris
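The summary compares mean and median deaths per parish. A mean well above the median points to a right-skewed distribution in which a few large parishes account for much of the total. One way to quantify this is the share of all deaths recorded in the top *n* parishes. Here is a sketch with a hypothetical helper, `top_n_share`, applied to invented totals:

```python
import pandas as pd

def top_n_share(totals: pd.Series, n: int) -> float:
    """Fraction of all deaths recorded in the n parishes with the most deaths."""
    ranked = totals.sort_values(ascending=False)
    return ranked.head(n).sum() / ranked.sum()

# Invented per-parish totals, purely illustrative
totals = pd.Series({"A": 500, "B": 300, "C": 100, "D": 60, "E": 40})
print(f"Top 2 share: {top_n_share(totals, 2):.0%}")  # 800 / 1000
print(f"Mean {totals.mean():.0f} vs median {totals.median():.0f}")
```

For the notebook's own table, the call would be `top_n_share(parish_stats.set_index('parish_name')['total_deaths'], 10)`.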

## Distribution Visualizations

In [None]:
# Distribution plots
fig, axes = plt.subplots(3, 2, figsize=(16, 18))
fig.suptitle('Parish Mortality Distributions', fontsize=16, fontweight='bold')

# Total deaths per parish (log-scaled y-axis: number of parishes)
axes[0,0].hist(parish_stats['total_deaths'], bins=30, alpha=0.7, color='darkred', edgecolor='black')
axes[0,0].set_xlabel('Total Deaths')
axes[0,0].set_ylabel('Number of Parishes')
axes[0,0].set_title('Distribution of Total Deaths per Parish')
axes[0,0].set_yscale('log')

# Average deaths per record
axes[0,1].hist(parish_stats['avg_deaths_per_record'], bins=30, alpha=0.7, color='steelblue', edgecolor='black')
axes[0,1].set_xlabel('Average Deaths per Record')
axes[0,1].set_ylabel('Number of Parishes')
axes[0,1].set_title('Distribution of Average Deaths per Record')

# Years active
axes[1,0].hist(parish_stats['years_active'], bins=20, alpha=0.7, color='green', edgecolor='black')
axes[1,0].set_xlabel('Years Active')
axes[1,0].set_ylabel('Number of Parishes')
axes[1,0].set_title('Distribution of Years Active')

# Records per year (data completeness)
axes[1,1].hist(parish_stats['records_per_year'], bins=25, alpha=0.7, color='orange', edgecolor='black')
axes[1,1].set_xlabel('Records per Year')
axes[1,1].set_ylabel('Number of Parishes')
axes[1,1].set_title('Distribution of Records per Year (Completeness)')

# Buried vs Plague per parish.
# Assumption: `bills` has a 'count_type' column whose values are 'buried' and
# 'plague' (any capitalization); rename here if the dataset uses another label.
count_type_by_parish = (
    bills.assign(count_type=bills['count_type'].str.lower())
         .pivot_table(index='parish_name', columns='count_type', values='count',
                      aggfunc='sum', fill_value=0)
         .reset_index()
         .sort_values('buried', ascending=False)
         .reset_index(drop=True)
)
# Plague deaths as a percentage of burials (NaN where a parish has no burials)
count_type_by_parish['plague_pct'] = (
    100 * count_type_by_parish['plague'] / count_type_by_parish['buried'].replace(0, np.nan)
)

# Top 15 parishes by burials. Bars are overlaid rather than stacked, assuming plague
# burials are a subset of all burials, as in the printed weekly bills.
top_15_parishes = count_type_by_parish.head(15)
x_pos = range(len(top_15_parishes))
axes[2,0].bar(x_pos, top_15_parishes['buried'], label='Buried', alpha=0.8, color='steelblue')
axes[2,0].bar(x_pos, top_15_parishes['plague'], label='Plague', alpha=0.8, color='darkred')
axes[2,0].set_xlabel('Parish')
axes[2,0].set_ylabel('Deaths')
axes[2,0].set_title('Top 15 Parishes by Burials: Buried vs Plague Deaths')
axes[2,0].set_xticks(x_pos)
axes[2,0].set_xticklabels([name[:15] + '...' if len(name) > 15 else name 
                          for name in top_15_parishes['parish_name']], 
                         rotation=45, ha='right')
axes[2,0].legend()

# Plague percentage distribution (only for parishes with plague deaths)
plague_affected_for_hist = count_type_by_parish[count_type_by_parish['plague'] > 0]
if len(plague_affected_for_hist) > 0:
    # dropna() guards against parishes with plague rows but zero recorded burials
    axes[2,1].hist(plague_affected_for_hist['plague_pct'].dropna(), bins=20, alpha=0.7, 
                   color='darkred', edgecolor='black')
    axes[2,1].set_xlabel('Plague Percentage (%)')
    axes[2,1].set_ylabel('Number of Parishes')
    axes[2,1].set_title(f'Distribution of Plague Percentage\n(Among {len(plague_affected_for_hist)} parishes with plague deaths)')
    axes[2,1].axvline(x=plague_affected_for_hist['plague_pct'].mean(), color='black', 
                      linestyle='--', linewidth=2, label=f'Mean: {plague_affected_for_hist["plague_pct"].mean():.1f}%')
    axes[2,1].legend()
else:
    axes[2,1].text(0.5, 0.5, 'No parishes with\nplague deaths found', 
                   transform=axes[2,1].transAxes, ha='center', va='center', fontsize=12)
    axes[2,1].set_title('Plague Percentage Distribution')

plt.tight_layout()
plt.show()

# Scatter plot: buried vs plague deaths per parish, colored by plague share
plt.figure(figsize=(12, 8))
plt.scatter(count_type_by_parish['buried'], count_type_by_parish['plague'], 
           alpha=0.6, s=60, c=count_type_by_parish['plague_pct'], cmap='Reds')
plt.xlabel('Buried Deaths')
plt.ylabel('Plague Deaths')
plt.title('Parish Mortality: Buried vs Plague Deaths')
plt.grid(True, alpha=0.3)

# Colorbar for the plague percentage encoded by point color
cbar = plt.colorbar()
cbar.set_label('Plague Percentage (%)')

# Reference diagonal where plague deaths equal burials
max_val = max(count_type_by_parish['buried'].max(), count_type_by_parish['plague'].max())
plt.plot([0, max_val], [0, max_val], 'k--', alpha=0.5, label='Plague = buried')

# Label the five parishes with the most burials (the table is sorted by 'buried')
for _, parish in count_type_by_parish.head(5).iterrows():
    plt.annotate(parish['parish_name'][:20], 
                (parish['buried'], parish['plague']),
                xytext=(5, 5), textcoords='offset points', fontsize=8, alpha=0.8)

plt.legend()
plt.tight_layout()
plt.show()

print("📊 Visualization Notes:")
print("  • Overlaid bars show buried (blue) and plague (red) deaths for the 15 parishes with the most burials")
print("  • Histogram shows the distribution of plague percentages among parishes with any plague deaths")
print("  • Scatter plot compares buried and plague deaths per parish against the plague = buried line")
print("  • Color intensity in the scatter plot indicates plague percentage")
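The Buried vs Plague panels depend on the `count_type_by_parish` table, which has one row per parish with `buried`, `plague` and `plague_pct` columns. The sketch below shows how such a table can be built from long-format rows (one per parish, week and count type). It also adds a sanity check for the assumption that plague burials never exceed total burials. The `count_type` column name and its values are assumptions about the dataset, and all numbers are invented:

```python
import pandas as pd

# Invented long-format rows; 'count_type' is an assumed column name
rows = pd.DataFrame({
    "parish_name": ["A", "A", "A", "B", "B"],
    "count_type": ["Buried", "Plague", "Buried", "Buried", "Plague"],
    "count": [40, 10, 20, 30, 0],
})

# Lower-case the type labels, then sum counts into one column per type
by_type = rows.assign(count_type=rows["count_type"].str.lower()).pivot_table(
    index="parish_name", columns="count_type", values="count",
    aggfunc="sum", fill_value=0,
)
# Plague share of burials, assuming plague is a subset of 'buried'
by_type["plague_pct"] = 100 * by_type["plague"] / by_type["buried"]

# Sanity check for the subset assumption: plague should never exceed burials
violations = by_type[by_type["plague"] > by_type["buried"]]
print(by_type)
print(f"{len(violations)} parish(es) violate plague <= buried")
```

Any parish that shows up in `violations` suggests the two count types are not nested. In that case, both `plague_pct` and the choice between stacked and overlaid bars should be revisited.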

## Summary and Insights

This notebook provides a parish-level analysis of the Bills of Mortality data. It covers:

1. **Parish Rankings**: Identification of the highest-mortality parishes
2. **Distribution Analysis**: How total and per-record deaths are spread across parishes
3. **Temporal Coverage**: The first and last year of records, and years active, for each parish
4. **Data Quality**: Completeness by parish, measured as records per active year
5. **Buried vs. Plague**: Plague deaths as a share of burials in each parish
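`records_per_year` counts rows. The output above shows parishes with 9,391 records over 118 years, or about 80 per year, which is more than one row per weekly bill. The measure therefore mixes coverage with the number of count types recorded. A hedged alternative is the share of expected weekly bills in a parish's active span that appear at all. The sketch below assumes about 52 bills per year and that each `week_id` identifies one distinct week. `weekly_coverage` is a hypothetical helper, and the table is invented:

```python
import pandas as pd

WEEKS_PER_YEAR = 52  # assumption: roughly one weekly bill per week of the year

def weekly_coverage(stats: pd.DataFrame) -> pd.Series:
    """Share of expected weekly bills with at least one record, capped at 1."""
    expected = stats["years_active"] * WEEKS_PER_YEAR
    return (stats["weeks_covered"] / expected).clip(upper=1.0)

# Invented table with the same column names as parish_stats
toy_stats = pd.DataFrame({
    "parish_name": ["A", "B"],
    "years_active": [2, 10],
    "weeks_covered": [104, 260],
})
toy_stats["coverage"] = weekly_coverage(toy_stats)
print(toy_stats)
```

For the notebook's table, `weekly_coverage(parish_stats)` would use the existing `years_active` and `weeks_covered` columns. The cap at 1 absorbs years with 53 weekly bills.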

### Next Steps:
- Add clustering analysis for parish groupings
- Implement vulnerability assessment during crisis periods
- Create correlation analysis between parishes
- Add geographic mapping if coordinates are available
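For the correlation step, one possible starting point is to reshape weekly counts into a table with one column per parish and correlate the columns. With the real data, it would probably make sense to filter to a single count type first (for example burials only) so that plague rows are not mixed in. The sketch below uses invented weekly counts for three parishes:

```python
import pandas as pd

# Invented weekly counts in the same long shape as the bills
weekly = pd.DataFrame({
    "week_id": [1, 2, 3, 4] * 3,
    "parish_name": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "count": [10, 20, 30, 40, 5, 10, 15, 20, 40, 30, 20, 10],
})

# One column per parish, one row per week; weeks missing for a parish stay NaN
wide = weekly.pivot_table(index="week_id", columns="parish_name",
                          values="count", aggfunc="sum")

# Pairwise Pearson correlation of weekly counts between parishes;
# pandas excludes NaN pairwise, so parishes with gaps still get a value
corr = wide.corr()
print(corr.round(2))
```

Plague-year spikes can dominate a Pearson correlation. `wide.corr(method="spearman")` uses ranks instead and is less sensitive to them, and the `min_periods` argument can require a minimum number of shared weeks per pair.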