© 2025 KR-Labs. All rights reserved.  
KR-Labs™ is a trademark of Quipu Research Labs, LLC, a subsidiary of Sundiata Giddasira, Inc.

**License:**  
- **Code** (Python): MIT License - See [LICENSE-CODE](../../../LICENSE-CODE)  
- **Content** (Text/Documentation): CC-BY-SA-4.0 - See [LICENSE-CONTENT](../../../LICENSE-CONTENT)

SPDX-License-Identifier: MIT AND CC-BY-SA-4.0

# Subjective Well-Being Analysis - Tier 1-4 Analytics

**Author:** KR-Labs Analytics Team  
**Affiliation:** KR-Labs Tutorial Series  
**Version:** v1.0  
**Date:** 2025-10-23  
**Domain:** Subjective Well-Being (D30)  
**Tiers:** 1, 2, 4

## CITATION BLOCK

**To cite this notebook:**
```
KR-Labs. (2025). Subjective Well-Being Analysis - Tier 1-4 Analytics.
KR-Labs Tutorials. https://github.com/KR-Labs/krl-tutorials
```

**To cite the framework:**
```
KR-Labs. (2025). KRL Analytics Suite Tutorials.
https://github.com/KR-Labs/krl-tutorials
```

## NOTEBOOK OVERVIEW

**Purpose:**  
Analysis of happiness, life satisfaction, and emotional well-being using synthetic data modeled on CDC BRFSS survey indicators, with World Happiness Report methodology adapted for U.S. subnational analysis.

**Learning Objectives:**
- Understand subjective well-being measurement frameworks
- Analyze life satisfaction and emotional health patterns
- Identify socioeconomic correlates of happiness
- Apply factor analysis to construct composite well-being indices

**Data Sources:**
- CDC BRFSS Well-Being Module (survey microdata)
- American Community Survey (socioeconomic context)
- World Happiness Report methodology (framework reference)

**Analytic Methods:**
- Descriptive statistics and distributional analysis
- OLS Regression for well-being determinants
- Random Forest for non-linear prediction
- Factor Analysis for composite index construction
- K-Means clustering for well-being profiles

**Business Applications:**
1. Mental health program impact evaluation
2. Quality of life indices for city rankings
3. Well-being correlation with economic indicators
4. Regional well-being monitoring and forecasting

**Expected Insights:**
- Geographic patterns in life satisfaction and emotional health
- Key drivers of subjective well-being across communities
- Relationship between income, health, and happiness
- Identification of high/low well-being regions

**Execution Time:** ~12 minutes

## PREREQUISITES

**Required Knowledge:**
- Basic statistics (mean, median, standard deviation)
- Understanding of survey methodology
- Familiarity with Python and pandas

**Required Packages:**
- pandas, numpy
- scikit-learn
- statsmodels
- plotly
- matplotlib, seaborn

**API Keys Needed:**
- Census API key (optional for extended analysis)

**Recommended Prior Tutorials:**
- D01: Income & Poverty
- D04: Health Indicators

## Setup & Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Statistical modeling
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Machine learning
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

print("✓ All libraries imported successfully")

### Load Sample Well-Being Data

For this tutorial, we'll use synthetic data that mirrors CDC BRFSS well-being indicators.  
In production, you would load actual BRFSS microdata or state-aggregated estimates.
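
As a rough sketch of the production path, the snippet below aggregates respondent-level records into state estimates using survey weights. The column names (`_STATE`, `MENTHLTH`, `_LLCPWT`) and the special codes (88 for no days, 77 and 99 for don't know and refused) are assumed to follow the BRFSS codebook and should be checked against the codebook for your survey year. The five rows are made up, and `recode_days` is a hypothetical helper.

```python
import pandas as pd

# Made-up respondent-level rows; real BRFSS extracts have many more columns.
micro = pd.DataFrame({
    "_STATE":   [1, 1, 1, 2, 2],                     # state FIPS code (assumed name)
    "MENTHLTH": [88, 5, 77, 10, 88],                 # poor mental health days (assumed coding)
    "_LLCPWT":  [100.0, 300.0, 50.0, 200.0, 200.0],  # final survey weight (assumed name)
})

def recode_days(days: pd.Series) -> pd.Series:
    """Map the assumed special codes: 88 -> 0 days, 77/99 -> missing."""
    days = days.astype(float)
    days = days.mask(days.isin([77, 99]))
    return days.replace(88, 0.0)

micro["days"] = recode_days(micro["MENTHLTH"])
valid = micro.dropna(subset=["days"])

# Weighted mean per state: sum(w * x) / sum(w)
weighted = valid.assign(wx=valid["days"] * valid["_LLCPWT"])
sums = weighted.groupby("_STATE")[["wx", "_LLCPWT"]].sum()
state_est = (sums["wx"] / sums["_LLCPWT"]).rename("poor_mental_health_days")
print(state_est)
```

This yields point estimates only. Standard errors for a complex survey design need the stratum and sampling-unit variables as well, which this sketch does not cover.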

In [None]:
# Generate synthetic well-being data for demonstration
np.random.seed(42)
n_states = 50
n_counties = 500

# State-level data
states = [f'State_{i+1}' for i in range(n_states)]
state_data = pd.DataFrame({
    'state': states,
    'life_satisfaction': np.random.normal(7.2, 0.8, n_states).clip(1, 10),
    'poor_mental_health_days': np.random.normal(3.5, 1.2, n_states).clip(0, 30),
    'happiness_score': np.random.normal(6.8, 0.7, n_states).clip(1, 10),
    'median_income': np.random.normal(65000, 15000, n_states),
    'unemployment_rate': np.random.normal(4.5, 1.5, n_states).clip(0, 15),
    'health_coverage_pct': np.random.normal(92, 5, n_states).clip(70, 100),
    'education_bachelors_pct': np.random.normal(32, 8, n_states).clip(15, 60)
})

# Create correlation structure (life satisfaction rises with income and health coverage)
state_data['life_satisfaction'] = (
    state_data['life_satisfaction'] * 0.6 + 
    (state_data['median_income'] / 10000) * 0.2 +
    (state_data['health_coverage_pct'] / 10) * 0.2
).clip(1, 10)

state_data['happiness_score'] = (
    state_data['life_satisfaction'] * 0.8 +
    np.random.normal(0, 0.5, n_states)
).clip(1, 10)

# Add region classification
state_data['region'] = np.random.choice(['Northeast', 'South', 'Midwest', 'West'], n_states)

print(f"✓ Generated {len(state_data)} states with well-being indicators")
print(f"\nData Shape: {state_data.shape}")
print(f"Columns: {list(state_data.columns)}")
state_data.head()

## Tier 1: Descriptive Analytics

### 2.1 Summary Statistics

In [None]:
# Descriptive statistics for well-being indicators
wellbeing_cols = ['life_satisfaction', 'poor_mental_health_days', 'happiness_score']

print("="*70)
print("SUBJECTIVE WELL-BEING SUMMARY STATISTICS")
print("="*70)
print("\nWell-Being Indicators:")
print(state_data[wellbeing_cols].describe())

print("\n" + "="*70)
print("Socioeconomic Context:")
print(state_data[['median_income', 'unemployment_rate', 'health_coverage_pct']].describe())

### 2.2 Distribution Visualizations

In [None]:
# Life satisfaction distribution
fig = px.histogram(
    state_data,
    x='life_satisfaction',
    nbins=20,
    title='Distribution of Life Satisfaction Across States',
    labels={'life_satisfaction': 'Life Satisfaction Score (1-10)'},
    color_discrete_sequence=['#1f77b4']
)
fig.update_layout(showlegend=False, height=400)
fig.show()

# Box plot by region
fig = px.box(
    state_data,
    x='region',
    y='life_satisfaction',
    color='region',
    title='Life Satisfaction by Region',
    labels={'life_satisfaction': 'Life Satisfaction Score'}
)
fig.update_layout(height=400)
fig.show()

### 2.3 Regional Comparisons

In [None]:
# Regional well-being comparison
regional_summary = state_data.groupby('region')[wellbeing_cols].mean().round(2)

print("\n" + "="*70)
print("REGIONAL WELL-BEING COMPARISON")
print("="*70)
print(regional_summary)

# Visualize regional differences
fig = go.Figure()
for col in wellbeing_cols:
    fig.add_trace(go.Bar(
        name=col.replace('_', ' ').title(),
        x=regional_summary.index,
        y=regional_summary[col]
    ))

fig.update_layout(
    title='Well-Being Indicators by Region',
    xaxis_title='Region',
    yaxis_title='Average Score',
    barmode='group',
    height=400
)
fig.show()
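
Group means on 50 states can differ by chance alone, so it may help to test the regional gaps before interpreting them. The sketch below runs a one-way ANOVA with `scipy.stats.f_oneway`. The `demo` frame is a stand-in for `state_data` with randomly assigned regions, and `regional_anova` is a hypothetical helper, not part of the tutorial code.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Stand-in for state_data: 50 rows, regions assigned at random (assumption)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "region": rng.choice(["Northeast", "South", "Midwest", "West"], 50),
    "life_satisfaction": rng.normal(7.4, 0.5, 50),
})

def regional_anova(df, value_col, group_col="region"):
    """One-way ANOVA: do group means differ more than chance would suggest?"""
    groups = [g[value_col].to_numpy() for _, g in df.groupby(group_col)]
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value

f_stat, p_value = regional_anova(demo, "life_satisfaction")
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

Because the tutorial assigns regions at random, a large p-value is the expected outcome there. With real data, a small p-value would justify a closer look at which regions differ.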

## Tier 2: Predictive Analytics

### 3.1 OLS Regression: Well-Being Determinants

In [None]:
# OLS Regression: Predict life satisfaction from socioeconomic factors
formula = 'life_satisfaction ~ median_income + unemployment_rate + health_coverage_pct + education_bachelors_pct'
model_ols = ols(formula, data=state_data).fit()

print("="*70)
print("OLS REGRESSION: LIFE SATISFACTION DETERMINANTS")
print("="*70)
print(model_ols.summary())

# Coefficient plot (raw units, so bar lengths are not comparable across features)
coefs = pd.DataFrame({
    'feature': model_ols.params.index[1:],
    'coefficient': model_ols.params.values[1:],
    'p_value': model_ols.pvalues.values[1:]
})

fig = px.bar(
    coefs,
    x='coefficient',
    y='feature',
    orientation='h',
    title='Life Satisfaction Determinants (OLS Coefficients)',
    labels={'coefficient': 'Coefficient Estimate'},
    color='p_value',
    color_continuous_scale='RdYlGn_r'
)
fig.show()

### 3.2 Random Forest: Non-Linear Prediction

In [None]:
# Prepare features and target
feature_cols = ['median_income', 'unemployment_rate', 'health_coverage_pct', 'education_bachelors_pct']
X = state_data[feature_cols]
y = state_data['life_satisfaction']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest
rf_model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
rf_model.fit(X_train, y_train)

# Predictions
y_pred_train = rf_model.predict(X_train)
y_pred_test = rf_model.predict(X_test)

# Evaluation metrics
train_r2 = r2_score(y_train, y_pred_train)
test_r2 = r2_score(y_test, y_pred_test)
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
test_mae = mean_absolute_error(y_test, y_pred_test)

print("="*70)
print("RANDOM FOREST MODEL PERFORMANCE")
print("="*70)
print(f"Training R²: {train_r2:.4f}")
print(f"Test R²: {test_r2:.4f}")
print(f"Test RMSE: {test_rmse:.4f}")
print(f"Test MAE: {test_mae:.4f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

fig = px.bar(
    feature_importance,
    x='importance',
    y='feature',
    orientation='h',
    title='Random Forest Feature Importance',
    labels={'importance': 'Importance Score'}
)
fig.show()
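
With 50 states, an 80/20 split leaves only 10 test observations, so a single test R² can swing widely. A k-fold cross-validation sketch is shown below using `cross_val_score`, which the setup cell imports. The `X_demo` and `y_demo` data are stand-ins, and five shuffled folds is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Stand-in features and target (assumption: linear signal plus noise)
rng = np.random.default_rng(2)
n = 50
X_demo = pd.DataFrame({
    "median_income": rng.normal(65000, 15000, n),
    "health_coverage_pct": rng.normal(92, 5, n).clip(70, 100),
})
y_demo = (
    4.3
    + 0.2 * X_demo["median_income"] / 10000
    + 0.2 * X_demo["health_coverage_pct"] / 10
    + rng.normal(0, 0.5, n)
)

rf = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# One R² per held-out fold; the spread shows how unstable a single split can be
scores = cross_val_score(rf, X_demo, y_demo, cv=cv, scoring="r2")
print(f"CV R²: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the mean and spread across folds gives a steadier picture than one split, though each fold here still holds only 10 states.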

### 3.3 Model Comparison

In [None]:
# Compare actual vs predicted
comparison_df = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred_test
})

fig = px.scatter(
    comparison_df,
    x='Actual',
    y='Predicted',
    title='Random Forest: Actual vs Predicted Life Satisfaction',
    labels={'Actual': 'Actual Life Satisfaction', 'Predicted': 'Predicted Life Satisfaction'},
    trendline='ols'
)
fig.add_shape(
    type='line',
    x0=y_test.min(), y0=y_test.min(),
    x1=y_test.max(), y1=y_test.max(),
    line=dict(color='red', dash='dash')
)
fig.show()
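
The plot above covers the Random Forest only. To compare it with a linear model, both can be scored on the same held-out split. The sketch below uses scikit-learn's `LinearRegression` as a stand-in for the OLS model and made-up data in place of `state_data`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption: linear signal plus noise, as in the synthetic setup)
rng = np.random.default_rng(5)
n = 50
X_demo = pd.DataFrame({
    "median_income": rng.normal(65000, 15000, n),
    "health_coverage_pct": rng.normal(92, 5, n).clip(70, 100),
})
y_demo = (
    4.3
    + 0.2 * X_demo["median_income"] / 10000
    + 0.2 * X_demo["health_coverage_pct"] / 10
    + rng.normal(0, 0.5, n)
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "Linear regression": LinearRegression(),
    "Random forest": RandomForestRegressor(
        n_estimators=100, max_depth=5, random_state=42
    ),
}

# Fit each model on the same training rows and score on the same test rows
rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rows.append({
        "model": name,
        "test_r2": r2_score(y_te, pred),
        "test_rmse": np.sqrt(mean_squared_error(y_te, pred)),
    })
results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```

When the underlying signal is linear and the sample is small, the linear model is often competitive with or better than the forest, so the ranking should be read from the table rather than assumed.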

## Tier 4: Unsupervised Learning

### 4.1 Factor Analysis: Composite Well-Being Index

In [None]:
# Prepare data for factor analysis
factor_cols = wellbeing_cols + ['median_income', 'health_coverage_pct', 'education_bachelors_pct']
factor_data = state_data[factor_cols].copy()

# Standardize
scaler = StandardScaler()
factor_data_scaled = scaler.fit_transform(factor_data)

# Perform factor analysis
n_factors = 2
fa = FactorAnalysis(n_components=n_factors, random_state=42)
factors = fa.fit_transform(factor_data_scaled)

# Factor loadings
loadings = pd.DataFrame(
    fa.components_.T,
    columns=[f'Factor {i+1}' for i in range(n_factors)],
    index=factor_cols
)

print("="*70)
print("FACTOR ANALYSIS: LATENT WELL-BEING DIMENSIONS")
print("="*70)
print(loadings.round(3))

# Visualize loadings
fig = px.imshow(
    loadings.T,
    labels=dict(x='Variable', y='Factor', color='Loading'),
    title='Factor Loadings Heatmap',
    color_continuous_scale='RdBu',
    aspect='auto'
)
fig.show()

# Add factor scores to dataframe
state_data['wellbeing_factor1'] = factors[:, 0]
state_data['wellbeing_factor2'] = factors[:, 1]
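
Factor scores are centered near zero and have an arbitrary sign, which makes them awkward to report as an index. One possible construction, sketched below, orients the first factor so that higher values track higher life satisfaction and then rescales it to 0-100. The `composite_index` helper, the min-max rescaling, and the `demo` data are assumptions for illustration, not the tutorial's method.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Stand-in indicators driven by one shared latent factor (assumption)
rng = np.random.default_rng(3)
n = 50
latent = rng.normal(size=n)
demo = pd.DataFrame({
    "life_satisfaction": 7.4 + 0.5 * latent + rng.normal(0, 0.2, n),
    "happiness_score": 6.0 + 0.4 * latent + rng.normal(0, 0.3, n),
    "poor_mental_health_days": 3.5 - 0.8 * latent + rng.normal(0, 0.6, n),
})

def composite_index(df, cols, anchor):
    """Score one factor, orient it toward `anchor`, and rescale to 0-100."""
    scaled = StandardScaler().fit_transform(df[cols])
    fa = FactorAnalysis(n_components=1, random_state=42)
    scores = fa.fit_transform(scaled)[:, 0]
    # The sign of a factor is arbitrary; flip it if the anchor loads negatively
    if fa.components_[0, cols.index(anchor)] < 0:
        scores = -scores
    lo, hi = scores.min(), scores.max()
    return pd.Series(
        100 * (scores - lo) / (hi - lo), index=df.index, name="wellbeing_index"
    )

index = composite_index(demo, list(demo.columns), anchor="life_satisfaction")
print(index.describe().round(1))
```

Min-max rescaling is sensitive to outliers and ties the index to the current sample, so percentile ranks or fixed reference bounds may be preferable when comparing across years.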

### 4.2 K-Means Clustering: Well-Being Profiles

In [None]:
# Prepare clustering features
cluster_features = ['life_satisfaction', 'poor_mental_health_days', 'happiness_score', 'median_income']
X_cluster = state_data[cluster_features].copy()

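# Standardize with a separate scaler so the factor-analysis scaler above is not refit
cluster_scaler = StandardScaler()
X_cluster_scaled = cluster_scaler.fit_transform(X_cluster)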

# Determine optimal k using elbow method
inertias = []
K_range = range(2, 8)
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_cluster_scaled)
    inertias.append(kmeans.inertia_)

# Plot elbow curve
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(K_range), y=inertias, mode='lines+markers'))
fig.update_layout(
    title='Elbow Method: Optimal Number of Clusters',
    xaxis_title='Number of Clusters (k)',
    yaxis_title='Inertia',
    height=400
)
fig.show()

# Fit K-Means with k=3 (set by hand for illustration; confirm against the elbow plot)
optimal_k = 3
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
state_data['cluster'] = kmeans.fit_predict(X_cluster_scaled)

print(f"\n✓ Identified {optimal_k} distinct well-being profiles")
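
The elbow plot often has no sharp bend, so a second criterion can help when choosing k. The sketch below computes the mean silhouette score for each candidate k with `sklearn.metrics.silhouette_score`. The three synthetic blobs and the `silhouette_by_k` helper are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data: three well-separated blobs of 20 points each (assumption)
rng = np.random.default_rng(4)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_demo = np.vstack([c + rng.normal(0, 0.5, size=(20, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X_demo)

def silhouette_by_k(X, k_values, seed=42):
    """Return {k: mean silhouette score} for each candidate cluster count."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

scores = silhouette_by_k(X_scaled, range(2, 8))
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print(f"Highest silhouette at k={best_k}")
```

Silhouette values close to 1 indicate compact, well-separated clusters. On data without real cluster structure the scores tend to be low for every k, which is itself useful to know before naming profiles.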

### 4.3 Cluster Profiles

In [None]:
# Cluster characteristics
cluster_summary = state_data.groupby('cluster')[cluster_features].mean().round(2)
cluster_summary['count'] = state_data.groupby('cluster').size()

print("="*70)
print("WELL-BEING CLUSTER PROFILES")
print("="*70)
print(cluster_summary)

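# Assign labels by ranking clusters on mean life satisfaction
# (K-Means cluster IDs are arbitrary, so a fixed 0/1/2 mapping can mislabel profiles)
cluster_order = cluster_summary['life_satisfaction'].sort_values(ascending=False).index
label_names = ['High Well-Being', 'Moderate Well-Being', 'Lower Well-Being']
cluster_labels = dict(zip(cluster_order, label_names))
state_data['cluster_label'] = state_data['cluster'].map(cluster_labels)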

# 3D scatter plot
fig = px.scatter_3d(
    state_data,
    x='life_satisfaction',
    y='median_income',
    z='happiness_score',
    color='cluster_label',
    title='Well-Being Clusters in 3D Space',
    labels={
        'life_satisfaction': 'Life Satisfaction',
        'median_income': 'Median Income',
        'happiness_score': 'Happiness Score'
    },
    hover_data=['state']
)
fig.update_layout(height=600)
fig.show()

## Advanced Visualizations

### 5.1 Correlation Heatmap

In [None]:
# Correlation matrix
corr_cols = ['life_satisfaction', 'happiness_score', 'poor_mental_health_days', 
             'median_income', 'unemployment_rate', 'health_coverage_pct', 'education_bachelors_pct']
corr_matrix = state_data[corr_cols].corr()

fig = px.imshow(
    corr_matrix,
    text_auto='.2f',
    title='Well-Being Correlations with Socioeconomic Indicators',
    color_continuous_scale='RdBu',
    aspect='auto',
    zmin=-1, zmax=1
)
fig.update_layout(height=600)
fig.show()
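
A heatmap shows the size of each correlation but not whether it is distinguishable from zero at n = 50. The sketch below pairs each Pearson correlation with its p-value using `scipy.stats.pearsonr`. The `demo` data and the `corr_with_pvalues` helper are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Stand-in data: income drives the target, unemployment does not (assumption)
rng = np.random.default_rng(6)
n = 50
demo = pd.DataFrame({
    "median_income": rng.normal(65000, 15000, n),
    "unemployment_rate": rng.normal(4.5, 1.5, n).clip(0, 15),
})
demo["life_satisfaction"] = (
    5.0 + 0.3 * demo["median_income"] / 10000 + rng.normal(0, 0.3, n)
)

def corr_with_pvalues(df, target):
    """Pearson r and two-sided p-value of every other column against `target`."""
    rows = []
    for col in df.columns.drop(target):
        r, p = stats.pearsonr(df[col], df[target])
        rows.append({"variable": col, "r": r, "p_value": p})
    return pd.DataFrame(rows).set_index("variable")

table = corr_with_pvalues(demo, "life_satisfaction")
print(table.round(4))
```

When many pairs are tested at once, some small p-values will appear by chance, so a multiple-comparison adjustment is worth considering before reporting individual correlations as significant.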

### 5.2 Scatter Matrix: Pairwise Relationships

In [None]:
# Scatter matrix for key variables
scatter_cols = ['life_satisfaction', 'happiness_score', 'median_income', 'health_coverage_pct']
fig = px.scatter_matrix(
    state_data,
    dimensions=scatter_cols,
    color='cluster_label',
    title='Pairwise Relationships: Well-Being and Context Variables',
    height=800
)
fig.update_traces(diagonal_visible=False)
fig.show()

## Key Findings & Insights

In [None]:
# Consolidated summary of results from the cells above
print("="*70)
print("KEY FINDINGS: SUBJECTIVE WELL-BEING ANALYSIS")
print("="*70)

print("\n1. OVERALL PATTERNS:")
print(f"   • Average Life Satisfaction: {state_data['life_satisfaction'].mean():.2f} (scale 1-10)")
print(f"   • Average Happiness Score: {state_data['happiness_score'].mean():.2f}")
print(f"   • Average Poor Mental Health Days: {state_data['poor_mental_health_days'].mean():.2f}")

print("\n2. KEY DETERMINANTS (OLS Regression):")
significant_vars = coefs[coefs['p_value'] < 0.05].sort_values('coefficient', ascending=False)
for idx, row in significant_vars.iterrows():
    print(f"   • {row['feature']}: {row['coefficient']:.4f} (p={row['p_value']:.4f})")

print("\n3. PREDICTIVE PERFORMANCE:")
print(f"   • Random Forest Test R²: {test_r2:.4f}")
print(f"   • Most Important Feature: {feature_importance.iloc[0]['feature']}")

print("\n4. WELL-BEING PROFILES (K-Means Clustering):")
for cluster_id, label in cluster_labels.items():
    cluster_data = state_data[state_data['cluster'] == cluster_id]
    print(f"   • {label}: {len(cluster_data)} states")
    print(f"     - Avg Life Satisfaction: {cluster_data['life_satisfaction'].mean():.2f}")
    print(f"     - Avg Median Income: ${cluster_data['median_income'].mean():,.0f}")

print("\n5. REGIONAL VARIATION:")
best_region = regional_summary['life_satisfaction'].idxmax()
print(f"   • Highest Life Satisfaction: {best_region} ({regional_summary.loc[best_region, 'life_satisfaction']:.2f})")

## Conclusions & Next Steps

### Key Takeaways

**Measurement Framework:**
- Subjective well-being can be reliably measured using standardized survey instruments
- Life satisfaction, happiness, and emotional health are distinct but related constructs
- Factor analysis reveals latent dimensions underlying well-being measures

**Socioeconomic Correlates:**
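- In this synthetic data, life satisfaction is constructed from income and health coverage, so those two predictors carry the signal; education and unemployment are generated independently and should show no systematic effect
- The simulated relationships are linear by construction; non-linear income effects reported in the literature (for example, Kahneman & Deaton, 2010) need real data to assess
- Random Forest models can capture non-linearities and interactions, but with 50 observations and a linear signal they should not be expected to outperform OLS here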

**Geographic Patterns:**
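- K-Means partitions states into three well-being profiles; the boundaries depend on the chosen features and on k
- Region labels are assigned at random in the synthetic data, so the regional differences shown here are sampling noise rather than real geographic patterns
- With real data, check whether high well-being areas also score well on income, health coverage, and education before drawing conclusions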

### Policy Implications

1. **Mental Health Programs:** Target low well-being clusters for intervention
2. **Quality of Life Indices:** Use composite measures for city/state rankings
3. **Economic Development:** Balance income growth with health and education investments
4. **Monitoring Systems:** Track well-being trends alongside traditional economic indicators

### Next Steps

**Recommended Follow-Up Analyses:**
- **Time Series:** Track well-being changes over time (Tier 3)
- **Causal Inference:** Evaluate policy impacts on life satisfaction (Tier 6)
- **Small-Area Estimation:** County and tract-level well-being predictions
- **Network Analysis:** Social capital connections and well-being

**Related Tutorials:**
- D04: Health Indicators
- D31: Civic Trust & Engagement
- D10: Social Mobility & Opportunity

## References

1. **Helliwell, J. F., Layard, R., & Sachs, J. D.** (2024). *World Happiness Report 2024*. Sustainable Development Solutions Network.

2. **CDC.** (2024). *Behavioral Risk Factor Surveillance System (BRFSS)*. Centers for Disease Control and Prevention.

3. **OECD.** (2020). *How's Life? 2020: Measuring Well-being*. OECD Publishing.

4. **Diener, E., & Seligman, M. E. P.** (2004). "Beyond Money: Toward an Economy of Well-Being." *Psychological Science in the Public Interest*, 5(1), 1-31.

5. **Kahneman, D., & Deaton, A.** (2010). "High income improves evaluation of life but not emotional well-being." *Proceedings of the National Academy of Sciences*, 107(38), 16489-16493.

---

<div align="center">

![KR-Labs](../../../assets/images/KRLabs_Logosmall.png)

**KR-Labs** | Data-Driven Clarity for Community Growth

[krlabs.dev](https://krlabs.dev) | [info@krlabs.dev](mailto:info@krlabs.dev)

</div>