# 🔍 Aadhaar Data Analysis - Novel EDA
## Discovering Unique Identity Lifecycle Patterns at Micro-Geographic Level

**Datasets:**
- Biometric Updates (1.86M records)
- Demographic Updates (2.07M records)  
- Enrolment Data (1M records)

**Novel Approach:** Analyzing identity system behavior at pincode granularity to discover:
1. Identity Lifecycle Velocity (how fast identity data changes)
2. Biometric Stress Patterns (regions with high biometric update needs)
3. Demographic Volatility Index (address/name change patterns)
4. Age-cohort specific identity behaviors


In [None]:
# Core Imports
import pandas as pd
import numpy as np
import glob
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)


## 1. Data Loading & Initial Exploration


In [None]:
# Load all datasets
print("Loading datasets...")

def load_csv_folder(pattern):
    """Read every CSV matching `pattern` and stack them into one DataFrame."""
    files = sorted(glob.glob(pattern))  # sorted for a reproducible row order
    if not files:
        # pd.concat on an empty list fails with an unhelpful ValueError
        raise FileNotFoundError(f"No CSV files match {pattern!r}")
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

# Biometric, demographic and enrolment data
bio_df = load_csv_folder('data/api_data_aadhar_biometric/*.csv')
demo_df = load_csv_folder('data/api_data_aadhar_demographic/*.csv')
enrol_df = load_csv_folder('data/api_data_aadhar_enrolment/*.csv')

print(f"Biometric: {bio_df.shape}")
print(f"Demographic: {demo_df.shape}")
print(f"Enrolment: {enrol_df.shape}")


In [None]:
# Display sample data
print("\n=== BIOMETRIC DATA ===")
display(bio_df.head())

print("\n=== DEMOGRAPHIC DATA ===")
display(demo_df.head())

print("\n=== ENROLMENT DATA ===")
display(enrol_df.head())


## 2. Data Quality Analysis & Cleaning


In [None]:
# State name standardization mapping
STATE_MAPPING = {
    # West Bengal variations
    'WEST BENGAL': 'West Bengal', 'WESTBENGAL': 'West Bengal', 'West  Bengal': 'West Bengal',
    'West Bangal': 'West Bengal', 'West Bengli': 'West Bengal', 'Westbengal': 'West Bengal',
    'west Bengal': 'West Bengal',
    # Odisha variations
    'ODISHA': 'Odisha', 'Orissa': 'Odisha', 'odisha': 'Odisha',
    # Others
    'andhra pradesh': 'Andhra Pradesh', 'Tamilnadu': 'Tamil Nadu',
    'Jammu & Kashmir': 'Jammu and Kashmir', 'Jammu And Kashmir': 'Jammu and Kashmir',
    'Chhatisgarh': 'Chhattisgarh', 'Uttaranchal': 'Uttarakhand', 'Pondicherry': 'Puducherry',
    'Andaman & Nicobar Islands': 'Andaman and Nicobar Islands',
    'Dadra & Nagar Haveli': 'Dadra and Nagar Haveli and Daman and Diu',
    'Dadra and Nagar Haveli': 'Dadra and Nagar Haveli and Daman and Diu',
    'Daman & Diu': 'Dadra and Nagar Haveli and Daman and Diu',
    'Daman and Diu': 'Dadra and Nagar Haveli and Daman and Diu',
    'The Dadra And Nagar Haveli And Daman And Diu': 'Dadra and Nagar Haveli and Daman and Diu',
}

# Invalid state entries (data entry errors)
INVALID_STATES = ['100000', 'BALANAGAR', 'Darbhanga', 'Jaipur', 'Madanapalle', 
                  'Nagpur', 'Puttenahalli', 'Raja Annamalai Puram']

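def clean_state_name(df):
    """Standardize state names and drop rows whose state is a known data-entry error."""
    df = df.copy()
    df['state'] = df['state'].replace(STATE_MAPPING)
    # .copy() keeps the result independent, so adding columns later is safe
    df = df[~df['state'].isin(INVALID_STATES)].copy()
    return df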

# Apply cleaning
bio_df = clean_state_name(bio_df)
demo_df = clean_state_name(demo_df)
enrol_df = clean_state_name(enrol_df)

# Convert date columns to datetime
bio_df['date'] = pd.to_datetime(bio_df['date'], format='%d-%m-%Y')
demo_df['date'] = pd.to_datetime(demo_df['date'], format='%d-%m-%Y')
enrol_df['date'] = pd.to_datetime(enrol_df['date'], format='%d-%m-%Y')

# Add derived time columns
for df in [bio_df, demo_df, enrol_df]:
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df['month_year'] = df['date'].dt.to_period('M')
    df['day_of_week'] = df['date'].dt.dayofweek

print(f"After cleaning:")
print(f"  Biometric: {bio_df.shape}")
print(f"  Demographic: {demo_df.shape}")
print(f"  Enrolment: {enrol_df.shape}")
print(f"\nUnique states: {bio_df['state'].nunique()}")
print(f"Date range: {bio_df['date'].min()} to {bio_df['date'].max()}")


## 3. Feature Engineering - Novel Identity Lifecycle Metrics

**Derived metrics that go beyond the raw counts usually reported for Aadhaar data:**
- **Identity Velocity Index (IVI)**: total updates per 100 enrolments, a proxy for how quickly identity data changes
- **Biometric Stress Index (BSI)**: ratio of biometric to demographic updates; high values point to biometric-driven update demand
- **Youth Update Intensity**: share of all updates that come from the 5-17 age group
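
As a rough illustration of these definitions, the sketch below computes the three indices on a tiny hand-made table. The pincodes and counts are hypothetical, and `add_identity_indices` and `youth_updates` are names invented for this example; the `+ 1` in each denominator mirrors the smoothing used in the notebook to avoid division by zero.

```python
import pandas as pd

# Hypothetical pincode-level totals; real values come from the aggregated datasets.
toy = pd.DataFrame({
    'pincode': [110001, 560001, 700001],
    'total_bio_updates': [120, 30, 0],
    'total_demo_updates': [60, 90, 10],
    'youth_updates': [45, 20, 5],        # bio + demo updates for ages 5-17
    'total_enrolments': [40, 0, 25],
})

def add_identity_indices(df):
    """Add IVI, BSI and youth share, with +1 in each denominator to avoid division by zero."""
    out = df.copy()
    out['total_updates'] = out['total_bio_updates'] + out['total_demo_updates']
    out['identity_velocity_index'] = out['total_updates'] / (out['total_enrolments'] + 1) * 100
    out['biometric_stress_index'] = out['total_bio_updates'] / (out['total_demo_updates'] + 1)
    out['youth_update_ratio'] = out['youth_updates'] / (out['total_updates'] + 1)
    return out

indexed = add_identity_indices(toy)
print(indexed[['pincode', 'identity_velocity_index',
               'biometric_stress_index', 'youth_update_ratio']])
```

The second row has updates but no enrolments, so its IVI becomes very large. Values like this are the reason the distribution plots trim the top 1% before drawing histograms.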


In [None]:
# Add total columns
bio_df['total_bio_updates'] = bio_df['bio_age_5_17'] + bio_df['bio_age_17_']
demo_df['total_demo_updates'] = demo_df['demo_age_5_17'] + demo_df['demo_age_17_']
enrol_df['total_enrolments'] = enrol_df['age_0_5'] + enrol_df['age_5_17'] + enrol_df['age_18_greater']

# Age group ratios
bio_df['youth_ratio_bio'] = bio_df['bio_age_5_17'] / (bio_df['total_bio_updates'] + 1)
demo_df['youth_ratio_demo'] = demo_df['demo_age_5_17'] / (demo_df['total_demo_updates'] + 1)
enrol_df['infant_ratio'] = enrol_df['age_0_5'] / (enrol_df['total_enrolments'] + 1)

print("Added derived columns to all dataframes")


## 4. NOVEL ANALYSIS 1: Pincode-Level Identity Velocity Index

**Concept:** Measure how "dynamic" identity data is at each pincode by combining biometric & demographic update rates normalized by enrolment base.

This creates an **Identity Velocity Index (IVI)**: higher values mean more identity updates per recorded enrolment.
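
One thing to watch when combining datasets per pincode: if a pincode appears on more than one row (for example under two district spellings), a merge on `pincode` alone pairs every row with every other and inflates the totals. The sketch below shows the effect on hypothetical data and one way to guard against it, by collapsing to one row per pincode and asking pandas to validate the key.

```python
import pandas as pd

# Hypothetical rows: pincode 110001 appears under two district spellings.
bio = pd.DataFrame({
    'pincode': [110001, 110001, 560001],
    'district': ['New Delhi', 'NEW DELHI', 'Bengaluru'],
    'total_bio_updates': [10, 5, 7],
})
demo = pd.DataFrame({
    'pincode': [110001, 110001, 560001],
    'district': ['New Delhi', 'NEW DELHI', 'Bengaluru'],
    'total_demo_updates': [4, 6, 3],
})

# Naive merge on pincode alone: 2 x 2 rows for 110001, so sums are inflated.
naive = bio.merge(demo[['pincode', 'total_demo_updates']], on='pincode', how='outer')

# Collapse to one row per pincode first; validate= raises if keys are still duplicated.
bio_1 = bio.groupby('pincode', as_index=False)['total_bio_updates'].sum()
demo_1 = demo.groupby('pincode', as_index=False)['total_demo_updates'].sum()
safe = bio_1.merge(demo_1, on='pincode', how='outer', validate='one_to_one')

print("naive:", len(naive), "rows, bio total", naive['total_bio_updates'].sum())
print("safe: ", len(safe), "rows, bio total", safe['total_bio_updates'].sum())
```

The true biometric total in this toy data is 22; the naive merge reports more because the duplicated pincode is counted twice over.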


In [None]:
# Aggregate by pincode
bio_pincode = bio_df.groupby(['pincode', 'state', 'district']).agg({
    'bio_age_5_17': 'sum', 'bio_age_17_': 'sum', 'total_bio_updates': 'sum',
    'date': 'count'
}).rename(columns={'date': 'bio_records'}).reset_index()

demo_pincode = demo_df.groupby(['pincode', 'state', 'district']).agg({
    'demo_age_5_17': 'sum', 'demo_age_17_': 'sum', 'total_demo_updates': 'sum',
    'date': 'count'
}).rename(columns={'date': 'demo_records'}).reset_index()

enrol_pincode = enrol_df.groupby(['pincode', 'state', 'district']).agg({
    'age_0_5': 'sum', 'age_5_17': 'sum', 'age_18_greater': 'sum',
    'total_enrolments': 'sum', 'date': 'count'
}).rename(columns={'date': 'enrol_records'}).reset_index()

print(f"Unique pincodes - Bio: {len(bio_pincode)}, Demo: {len(demo_pincode)}, Enrol: {len(enrol_pincode)}")


In [None]:
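# Merge all three datasets at pincode level.
# The tables above are keyed by (pincode, state, district), so a pincode whose
# state/district spelling varies spans several rows. Collapse to one row per
# pincode first; merging on 'pincode' alone would otherwise multiply rows.
# state/district are dropped here because the pincode-level analysis does not use them.
def per_pincode(df):
    return df.drop(columns=['state', 'district']).groupby('pincode', as_index=False).sum()

pincode_merged = (
    per_pincode(bio_pincode)
    .merge(per_pincode(demo_pincode), on='pincode', how='outer', validate='one_to_one')
    .merge(per_pincode(enrol_pincode), on='pincode', how='outer', validate='one_to_one')
)

# A pincode missing from a dataset has no recorded activity there
pincode_merged = pincode_merged.fillna(0)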

# Calculate Identity Velocity Index (IVI)
pincode_merged['total_updates'] = pincode_merged['total_bio_updates'] + pincode_merged['total_demo_updates']
pincode_merged['identity_velocity_index'] = (
    pincode_merged['total_updates'] / (pincode_merged['total_enrolments'] + 1)
) * 100

# Biometric Stress Index (BSI)
pincode_merged['biometric_stress_index'] = (
    pincode_merged['total_bio_updates'] / (pincode_merged['total_demo_updates'] + 1)
)

# Youth Update Intensity
pincode_merged['youth_update_ratio'] = (
    (pincode_merged['bio_age_5_17'] + pincode_merged['demo_age_5_17']) / 
    (pincode_merged['total_updates'] + 1)
)

print(f"Merged dataset shape: {pincode_merged.shape}")
print("\nNovel Indices Statistics:")
print(pincode_merged[['identity_velocity_index', 'biometric_stress_index', 'youth_update_ratio']].describe())


In [None]:
# Visualize Identity Velocity Index Distribution
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# IVI Distribution
ax1 = axes[0]
data_ivi = pincode_merged['identity_velocity_index']
data_ivi_clipped = data_ivi[data_ivi < data_ivi.quantile(0.99)]  # Remove outliers
ax1.hist(data_ivi_clipped, bins=50, color='#2E86AB', edgecolor='white', alpha=0.8)
ax1.set_xlabel('Identity Velocity Index')
ax1.set_ylabel('Frequency')
ax1.set_title('Distribution of Identity Velocity Index\n(Updates per Enrolment)')
ax1.axvline(data_ivi_clipped.median(), color='red', linestyle='--', label=f'Median: {data_ivi_clipped.median():.1f}')
ax1.legend()

# BSI Distribution
ax2 = axes[1]
data_bsi = pincode_merged['biometric_stress_index']
data_bsi_clipped = data_bsi[data_bsi < data_bsi.quantile(0.99)]
ax2.hist(data_bsi_clipped, bins=50, color='#E94F37', edgecolor='white', alpha=0.8)
ax2.set_xlabel('Biometric Stress Index')
ax2.set_ylabel('Frequency')
ax2.set_title('Distribution of Biometric Stress Index\n(Bio/Demo Update Ratio)')
ax2.axvline(data_bsi_clipped.median(), color='blue', linestyle='--', label=f'Median: {data_bsi_clipped.median():.2f}')
ax2.legend()

# Youth Update Ratio
ax3 = axes[2]
ax3.hist(pincode_merged['youth_update_ratio'], bins=50, color='#44AF69', edgecolor='white', alpha=0.8)
ax3.set_xlabel('Youth Update Ratio')
ax3.set_ylabel('Frequency')
ax3.set_title('Distribution of Youth Update Ratio\n(5-17 Age Group Share)')
ax3.axvline(pincode_merged['youth_update_ratio'].median(), color='red', linestyle='--', 
            label=f'Median: {pincode_merged["youth_update_ratio"].median():.2f}')
ax3.legend()

plt.tight_layout()
plt.savefig('identity_indices_distribution.png', dpi=150, bbox_inches='tight')
plt.show()


## 5. NOVEL ANALYSIS 2: State-Level Identity Behavior Profiling

**Concept:** Profile states based on identity update patterns to identify:
- High-stress states (need biometric alternatives like iris/OTP)
- High-volatility states (need better document verification)
- Stable states (best practices to replicate)
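
A minimal sketch of how such profiles could be assigned, assuming quantile cut-offs on the state-level indices. The state names, values, the 0.8 quantile, and the use of IVI as a stand-in for volatility are all assumptions made for illustration, not thresholds derived from the data.

```python
import numpy as np
import pandas as pd

# Hypothetical state-level indices; real values come from the state aggregation.
states = pd.DataFrame({
    'state': ['A', 'B', 'C', 'D', 'E'],
    'IVI': [150.0, 900.0, 300.0, 120.0, 450.0],
    'BSI': [0.6, 0.9, 2.4, 0.5, 1.1],
})

def profile_states(df, q=0.8):
    """Label each state by comparing its indices with the q-th quantile across states."""
    ivi_cut = df['IVI'].quantile(q)
    bsi_cut = df['BSI'].quantile(q)
    conditions = [df['BSI'] > bsi_cut, df['IVI'] > ivi_cut]
    labels = ['high-stress', 'high-volatility']
    out = df.copy()
    # np.select uses the first matching condition, so BSI takes priority over IVI
    out['profile'] = np.select(conditions, labels, default='stable')
    return out

profiles = profile_states(states)
print(profiles)
```

Quantile cut-offs make the labels relative: some states will always land in the top group, so the labels rank states against each other rather than against an absolute standard.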


In [None]:
# Aggregate by state
state_bio = bio_df.groupby('state').agg({
    'bio_age_5_17': 'sum', 'bio_age_17_': 'sum', 'total_bio_updates': 'sum'
}).reset_index()

state_demo = demo_df.groupby('state').agg({
    'demo_age_5_17': 'sum', 'demo_age_17_': 'sum', 'total_demo_updates': 'sum'
}).reset_index()

state_enrol = enrol_df.groupby('state').agg({
    'age_0_5': 'sum', 'age_5_17': 'sum', 'age_18_greater': 'sum', 'total_enrolments': 'sum'
}).reset_index()

# Merge
state_merged = state_bio.merge(state_demo, on='state').merge(state_enrol, on='state')

# Calculate state-level indices
state_merged['total_updates'] = state_merged['total_bio_updates'] + state_merged['total_demo_updates']
state_merged['IVI'] = state_merged['total_updates'] / (state_merged['total_enrolments'] + 1) * 100
state_merged['BSI'] = state_merged['total_bio_updates'] / (state_merged['total_demo_updates'] + 1)
state_merged['youth_ratio'] = (
    (state_merged['bio_age_5_17'] + state_merged['demo_age_5_17']) / 
    (state_merged['total_updates'] + 1)
)

print(f"State-level data: {state_merged.shape}")
state_merged.sort_values('total_updates', ascending=False).head(10)


In [None]:
# Visualize State-wise comparison
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Top 15 states by total updates
top_states = state_merged.nlargest(15, 'total_updates')

# 1. Total Updates by State
ax1 = axes[0, 0]
colors = plt.cm.viridis(np.linspace(0, 0.8, len(top_states)))
ax1.barh(top_states['state'], top_states['total_updates']/1e6, color=colors)
ax1.set_xlabel('Total Updates (Millions)')
ax1.set_title('Top 15 States by Total Identity Updates', fontweight='bold')
ax1.invert_yaxis()

# 2. Biometric vs Demographic Updates
ax2 = axes[0, 1]
x = np.arange(len(top_states))
width = 0.35
ax2.barh(x - width/2, top_states['total_bio_updates']/1e6, width, label='Biometric', color='#2E86AB')
ax2.barh(x + width/2, top_states['total_demo_updates']/1e6, width, label='Demographic', color='#E94F37')
ax2.set_yticks(x)
ax2.set_yticklabels(top_states['state'])
ax2.set_xlabel('Updates (Millions)')
ax2.set_title('Biometric vs Demographic Updates', fontweight='bold')
ax2.legend()
ax2.invert_yaxis()

# 3. Identity Velocity Index
ax3 = axes[1, 0]
state_sorted_ivi = state_merged.nlargest(15, 'IVI')
colors_ivi = ['#E94F37' if x > state_merged['IVI'].median() else '#44AF69' for x in state_sorted_ivi['IVI']]
ax3.barh(state_sorted_ivi['state'], state_sorted_ivi['IVI'], color=colors_ivi)
ax3.set_xlabel('Identity Velocity Index')
ax3.set_title('Top 15 States by Identity Velocity Index\n(Higher = More Updates per Enrolment)', fontweight='bold')
ax3.invert_yaxis()

# 4. Biometric Stress Index
ax4 = axes[1, 1]
state_sorted_bsi = state_merged.nlargest(15, 'BSI')
colors_bsi = plt.cm.Reds(np.linspace(0.3, 0.9, len(state_sorted_bsi)))
ax4.barh(state_sorted_bsi['state'], state_sorted_bsi['BSI'], color=colors_bsi)
ax4.set_xlabel('Biometric Stress Index')
ax4.set_title('Top 15 States by Biometric Stress Index\n(Higher = More Bio Updates vs Demo)', fontweight='bold')
ax4.invert_yaxis()

plt.tight_layout()
plt.savefig('state_analysis.png', dpi=150, bbox_inches='tight')
plt.show()


## 6. NOVEL ANALYSIS 3: Temporal Patterns Discovery

**Concept:** Discover operational patterns in Aadhaar updates:
- Which days see most activity? (Operational planning)
- Weekly seasonality (Resource allocation)
- Month-end surge patterns (Administrative deadline effects)
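
The month-end question can be checked with a simple share calculation. The sketch below, on hypothetical daily totals, measures what fraction of each month's updates fall in its last few calendar days; `month_end_share` and the 5-day window are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical daily totals; in practice these would come from grouping a dataset by 'date'.
daily = pd.DataFrame({
    'date': pd.to_datetime(['2025-03-03', '2025-03-17', '2025-03-28', '2025-03-31',
                            '2025-04-10', '2025-04-29']),
    'updates': [100, 120, 300, 280, 90, 210],
})

def month_end_share(df, window=5):
    """Share of each month's updates recorded in that month's last `window` calendar days."""
    tagged = df.assign(
        month=df['date'].dt.to_period('M'),
        is_month_end=(df['date'].dt.days_in_month - df['date'].dt.day) < window,
    )
    monthly_total = tagged.groupby('month')['updates'].sum()
    month_end_total = tagged[tagged['is_month_end']].groupby('month')['updates'].sum()
    # Months with no month-end rows come out as NaN after division; treat them as 0
    return (month_end_total / monthly_total).fillna(0.0)

share = month_end_share(daily)
print(share)
```

If activity were spread evenly, a 5-day window would hold roughly 16% of a month's updates, so shares well above that would suggest a deadline effect.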


In [None]:
# Day-of-week analysis
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

bio_by_day = bio_df.groupby('day_of_week')['total_bio_updates'].sum()
demo_by_day = demo_df.groupby('day_of_week')['total_demo_updates'].sum()
enrol_by_day = enrol_df.groupby('day_of_week')['total_enrolments'].sum()

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Biometric by day
ax1 = axes[0]
ax1.bar([day_names[i] for i in bio_by_day.index], bio_by_day.values/1e6, color='#2E86AB')
ax1.set_ylabel('Updates (Millions)')
ax1.set_title('Biometric Updates by Day of Week', fontweight='bold')
ax1.tick_params(axis='x', rotation=45)

# Demographic by day
ax2 = axes[1]
ax2.bar([day_names[i] for i in demo_by_day.index], demo_by_day.values/1e6, color='#E94F37')
ax2.set_ylabel('Updates (Millions)')
ax2.set_title('Demographic Updates by Day of Week', fontweight='bold')
ax2.tick_params(axis='x', rotation=45)

# Enrolments by day
ax3 = axes[2]
ax3.bar([day_names[i] for i in enrol_by_day.index], enrol_by_day.values/1e6, color='#44AF69')
ax3.set_ylabel('Enrolments (Millions)')
ax3.set_title('New Enrolments by Day of Week', fontweight='bold')
ax3.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.savefig('day_of_week_analysis.png', dpi=150, bbox_inches='tight')
plt.show()


In [None]:
# Time series - Monthly trends
# Build one month-indexed table so all three series share a sorted x-axis.
# Plotting separate string lists can scramble the order when a dataset lacks a month.
monthly = pd.concat({
    'bio': bio_df.groupby('month_year')['total_bio_updates'].sum(),
    'demo': demo_df.groupby('month_year')['total_demo_updates'].sum(),
    'enrol': enrol_df.groupby('month_year')['total_enrolments'].sum(),
}, axis=1).sort_index()
bio_monthly, demo_monthly, enrol_monthly = monthly['bio'], monthly['demo'], monthly['enrol']

fig, ax = plt.subplots(figsize=(14, 6))

x_bio = x_demo = x_enrol = [str(p) for p in monthly.index]

ax.plot(x_bio, bio_monthly.values/1e6, marker='o', linewidth=2, label='Biometric Updates', color='#2E86AB')
ax.plot(x_demo, demo_monthly.values/1e6, marker='s', linewidth=2, label='Demographic Updates', color='#E94F37')
ax.plot(x_enrol, enrol_monthly.values/1e6, marker='^', linewidth=2, label='New Enrolments', color='#44AF69')

ax.set_xlabel('Month-Year')
ax.set_ylabel('Count (Millions)')
ax.set_title('Monthly Trends in Aadhaar Activity', fontweight='bold', fontsize=14)
ax.legend()
ax.tick_params(axis='x', rotation=45)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('monthly_trends.png', dpi=150, bbox_inches='tight')
plt.show()


## 7. NOVEL ANALYSIS 4: Correlation Analysis - Biometric vs Demographic

**Key Question:** Do areas with high biometric updates also have high demographic updates, or are they inversely related?

This reveals whether identity maintenance patterns are uniform or specialized.
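
A caveat worth keeping in mind: raw counts tend to correlate simply because busy pincodes have more of everything. The sketch below uses hypothetical totals to compare a Pearson correlation on counts with a rank (Spearman-style) correlation, computed here as Pearson on ranks, and with a rank correlation on per-enrolment rates.

```python
import pandas as pd

# Hypothetical pincode totals, with one very large pincode dominating the counts.
toy = pd.DataFrame({
    'total_bio_updates':  [10, 20, 15, 30, 5000],
    'total_demo_updates': [40, 12, 25, 8, 4000],
    'total_enrolments':   [5, 6, 4, 7, 900],
})

bio, demo = toy['total_bio_updates'], toy['total_demo_updates']

# Pearson on raw counts: linear and sensitive to the single large pincode
pearson = bio.corr(demo)

# Spearman-style: Pearson correlation of the ranks, robust to extreme values
spearman = bio.rank().corr(demo.rank())

# Rates per enrolment (+1 smoothing) remove the pure size effect before ranking
bio_rate = bio / (toy['total_enrolments'] + 1)
demo_rate = demo / (toy['total_enrolments'] + 1)
rate_spearman = bio_rate.rank().corr(demo_rate.rank())

print(f"Pearson on counts:  {pearson:.3f}")
print(f"Spearman on counts: {spearman:.3f}")
print(f"Spearman on rates:  {rate_spearman:.3f}")
```

In this toy data the Pearson value is close to 1 only because of the one large pincode, while the rank-based figures tell a different story. Comparing the measures on the real data would show how much of the heatmap reflects pincode size.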


In [None]:
# Correlation at pincode level
correlation_data = pincode_merged[['total_bio_updates', 'total_demo_updates', 'total_enrolments', 
                                   'bio_age_5_17', 'bio_age_17_', 'demo_age_5_17', 'demo_age_17_',
                                   'age_0_5', 'age_5_17', 'age_18_greater']].copy()

correlation_data.columns = ['Bio Total', 'Demo Total', 'Enrol Total',
                            'Bio 5-17', 'Bio 17+', 'Demo 5-17', 'Demo 17+',
                            'Enrol 0-5', 'Enrol 5-17', 'Enrol 18+']

corr_matrix = correlation_data.corr()

fig, ax = plt.subplots(figsize=(12, 10))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r', center=0,
            square=True, linewidths=0.5, ax=ax, vmin=-1, vmax=1)
ax.set_title('Correlation Matrix: Pincode-Level Activity Metrics', fontweight='bold', fontsize=14)

plt.tight_layout()
plt.savefig('correlation_matrix.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nKey Correlations:")
print(f"Bio Total vs Demo Total: {corr_matrix.loc['Bio Total', 'Demo Total']:.3f}")
print(f"Bio Total vs Enrol Total: {corr_matrix.loc['Bio Total', 'Enrol Total']:.3f}")
print(f"Demo Total vs Enrol Total: {corr_matrix.loc['Demo Total', 'Enrol Total']:.3f}")


## 8. Summary Statistics & Key Findings


In [None]:
print("="*70)
print("                    KEY FINDINGS SUMMARY")
print("="*70)

print("\n📊 DATASET OVERVIEW:")
print(f"   • Total Biometric Updates: {bio_df['total_bio_updates'].sum():,.0f}")
print(f"   • Total Demographic Updates: {demo_df['total_demo_updates'].sum():,.0f}")
print(f"   • Total New Enrolments: {enrol_df['total_enrolments'].sum():,.0f}")
print(f"   • Unique Pincodes Covered: {pincode_merged['pincode'].nunique():,}")
print(f"   • States/UTs Covered: {bio_df['state'].nunique()}")

print("\n🔍 NOVEL INDICES COMPUTED:")
print(f"   • Identity Velocity Index (IVI)")
print(f"     - Median: {pincode_merged['identity_velocity_index'].median():.2f}")
print(f"     - Max: {pincode_merged['identity_velocity_index'].max():.2f}")
print(f"   • Biometric Stress Index (BSI)")
print(f"     - Median: {pincode_merged['biometric_stress_index'].median():.2f}")
print(f"     - Max: {pincode_merged['biometric_stress_index'].max():.2f}")

print("\n🏆 TOP STATES BY ACTIVITY:")
top3 = state_merged.nlargest(3, 'total_updates')
for i, row in enumerate(top3.itertuples(), 1):
    print(f"   {i}. {row.state}: {row.total_updates:,.0f} total updates")

print("\n⚠️ HIGH BIOMETRIC STRESS STATES:")
high_bsi = state_merged.nlargest(3, 'BSI')
for i, row in enumerate(high_bsi.itertuples(), 1):
    print(f"   {i}. {row.state}: BSI = {row.BSI:.2f}")

total_updates = bio_df['total_bio_updates'].sum() + demo_df['total_demo_updates'].sum()
youth_updates = bio_df['bio_age_5_17'].sum() + demo_df['demo_age_5_17'].sum()
adult_updates = bio_df['bio_age_17_'].sum() + demo_df['demo_age_17_'].sum()

print("\n📈 AGE GROUP INSIGHTS:")
print(f"   • Youth (5-17) share of updates: {youth_updates / total_updates * 100:.1f}%")
print(f"   • Adult (17+) share of updates: {adult_updates / total_updates * 100:.1f}%")
print(f"   • Infant (0-5) enrolments: {enrol_df['age_0_5'].sum():,.0f} ({enrol_df['age_0_5'].sum() / enrol_df['total_enrolments'].sum() * 100:.1f}%)")

print("\n" + "="*70)


## 🎯 NOVEL IDEAS FOR DEEPER ANALYSIS & ML

Based on the EDA above, here are approaches that look **feasible with these datasets** and are not commonly seen in Aadhaar analyses:

---

### 🔥 IDEA 1: **Pincode-Level Identity Lifecycle Scoring System**
- **What:** Create a composite score for each pincode that predicts future update load
- **Novel aspect:** Prediction at micro-geographic (pincode) granularity rather than at state or district level
- **Use:** Resource allocation for enrolment centers

---

### 🔥 IDEA 2: **Age-Cohort Biometric Degradation Model**
- **What:** Track how biometric update rates change across age groups over time
- **Novel aspect:** Quantify "biometric aging" at population level
- **Use:** Plan for iris/OTP fallback in specific demographics

---

### 🔥 IDEA 3: **State Identity Stability Ranking**
- **What:** Rank states by identity data stability using IVI, BSI, and update patterns
- **Novel aspect:** Creates actionable policy-ready rankings
- **Use:** Identify states needing intervention vs best-practice states

---

### 🔥 IDEA 4: **Temporal Anomaly Detection System**
- **What:** ML model to detect unusual update spikes at pincode level
- **Novel aspect:** Real-time operational intelligence
- **Use:** Early warning system for data quality issues
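
Before reaching for an ML model, a simple statistical baseline may be a useful starting point. The sketch below flags spikes in a hypothetical weekly series for one pincode using the modified z-score (median and median absolute deviation, with the conventional 0.6745 scaling and 3.5 cut-off). `flag_spikes` is a name invented for this example.

```python
import pandas as pd

def flag_spikes(series, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds `threshold`."""
    median = series.median()
    mad = (series - median).abs().median()
    if mad == 0:
        # No spread at all: nothing can be called a spike by this rule
        return pd.Series(False, index=series.index)
    robust_z = 0.6745 * (series - median) / mad
    return robust_z > threshold

# Hypothetical weekly update counts for a single pincode
weekly = pd.Series([100, 110, 95, 105, 98, 900, 102], name='updates')
spikes = flag_spikes(weekly)
print(weekly[spikes])
```

Median and MAD are used instead of mean and standard deviation because a single large spike would otherwise inflate the very spread it is measured against.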

---

### 🔥 IDEA 5: **Youth Identity Trajectory Prediction**
- **What:** Predict when youth (5-17) will need updates based on patterns
- **Novel aspect:** Proactive service delivery instead of reactive
- **Use:** Send notifications before mandatory updates

---

### 🔥 IDEA 6: **Cross-Dataset Coherence Analysis**
- **What:** Check if enrolment patterns match update patterns (logical consistency)
- **Novel aspect:** Data quality validation at scale
- **Use:** Identify regions with data integrity issues
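
One possible first pass at such a check, sketched on hypothetical pincode totals: flag pincodes where one kind of activity is present and the other is entirely absent. The flag names are invented for this example, and a flagged pincode is only a candidate for review, since existing Aadhaar holders can update records in a period with no new enrolments.

```python
import pandas as pd

# Hypothetical pincode-level totals
toy = pd.DataFrame({
    'pincode': [110001, 560001, 700001, 400001],
    'total_updates': [180, 120, 0, 35],
    'total_enrolments': [40, 0, 25, 30],
})

def coherence_flags(df):
    """Mark pincodes where updates and enrolments do not co-occur."""
    out = df.copy()
    out['updates_without_enrolments'] = (out['total_updates'] > 0) & (out['total_enrolments'] == 0)
    out['enrolments_without_updates'] = (out['total_enrolments'] > 0) & (out['total_updates'] == 0)
    return out

flags = coherence_flags(toy)
print(flags[flags['updates_without_enrolments'] | flags['enrolments_without_updates']])
```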
