# Surgical Complication Rate Analysis

**Purpose:** Analyze complication rates by procedure type  
**Data:** 120 surgical cases from sample database  
**Author:** Your Name  
**Date:** November 2024

---

## Overview

This notebook analyzes surgical complication rates for four common procedures:
- Appendectomy
- Cholecystectomy
- Colectomy
- Hernia Repair

**Questions we'll answer:**
1. What are the complication rates for each procedure?
2. Do rates differ significantly between procedures?
3. What patient factors predict complications?

---

## 📝 Instructions for Using This Notebook

**To run this example:**
1. Run each cell in order (click ▶️ or press Shift+Enter)
2. Review the outputs
3. See the results in the `results/` folder

**To use with YOUR data:**
1. Replace `data/surgical_data.csv` with your file
2. Update column names in the code
3. Modify analyses as needed
4. Ask Claude for help: `claude chat "How do I..."`

---

## Step 1: Import Libraries

These are the standard packages for data analysis in Python.

```python
# Data manipulation
import pandas as pd
import numpy as np

# Statistics
from scipy import stats
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportion_confint
import statsmodels.api as sm

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for prettier plots
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

print("✅ All libraries imported successfully!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
```

## Step 2: Load the Data

**💡 For your own data:** Change the file path below to your CSV file.

```python
# Load data
df = pd.read_csv('data/surgical_data.csv')

# Display first few rows
print("📊 First 5 rows of data:")
print(df.head())

print(f"\n✅ Loaded {len(df)} cases")
```
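If you are swapping in your own file, a quick pre-flight check can catch missing columns before later cells fail with a `KeyError`. The sketch below is a hypothetical helper (`check_surgical_csv` is not part of any library). It assumes the column names this notebook's code uses and that `complication` is coded as 0/1, and it relies only on the standard-library `csv` module.

```python
import csv

# Column names referenced by the analysis cells in this notebook
REQUIRED_COLUMNS = {"procedure", "complication", "age", "asa_class", "comorbidity_count"}


def check_surgical_csv(path):
    """Return a list of problems found in the CSV at `path` (an empty list means none found).

    Hypothetical pre-flight helper; assumes `complication` is coded 0 = no, 1 = yes.
    """
    problems = []
    # "utf-8-sig" also strips the byte-order mark that Excel often adds to CSV exports
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        header = set(reader.fieldnames or [])
        missing = sorted(REQUIRED_COLUMNS - header)
        if missing:
            # Value checks are pointless if the columns themselves are absent
            problems.append(f"missing columns: {', '.join(missing)}")
            return problems
        # Data rows start on line 2 (assumes no quoted fields span multiple lines)
        for line_no, row in enumerate(reader, start=2):
            # Short rows leave missing fields as None, so fall back to ""
            value = (row["complication"] or "").strip()
            if value not in {"0", "1"}:
                problems.append(f"line {line_no}: complication should be 0 or 1, got {value!r}")
    return problems
```

In the notebook you could call `check_surgical_csv('data/surgical_data.csv')` right after loading and print any problems it returns; adjust `REQUIRED_COLUMNS` if you rename columns in the code.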

## Step 3: Explore the Data

Always start by understanding your data structure.

```python
# Data structure (df.info() prints directly and returns None, so don't wrap it in print)
print("📋 Dataset Information:")
df.info()

print("\n📊 Summary Statistics:")
print(df.describe())
```

```python
# Check for missing values
print("❓ Missing Values:")
missing = df.isnull().sum()

if missing.sum() == 0:
    print("✅ No missing values!")
else:
    # Show only the columns that actually have gaps
    print(missing[missing > 0])
    print("⚠️ Some missing values detected - review data quality")
```

```python
# Distribution of procedures
print("🔢 Cases by Procedure:")
print(df['procedure'].value_counts().sort_index())

print("\n📊 Overall Complication Rate:")
overall_rate = df['complication'].mean()
print(f"{overall_rate:.1%} ({df['complication'].sum()} of {len(df)} cases)")
```

## Step 4: Calculate Complication Rates by Procedure

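This function calculates complication rates with 95% confidence intervals using the Wilson score method, which is generally preferred over the simple normal-approximation (Wald) interval for small groups or rates near 0% or 100%.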

```python
def calculate_complication_rates(data):
    """
    Calculate complication rates by procedure with confidence intervals.
    
    Parameters:
    -----------
    data : pandas DataFrame
        Must have 'procedure' and 'complication' (coded 0/1) columns
    
    Returns:
    --------
    pandas DataFrame with rates and confidence intervals
    """
    results = []
    
    for procedure in data['procedure'].unique():
        # Filter data for this procedure
        subset = data[data['procedure'] == procedure]
        
        # Calculate metrics
        n_cases = len(subset)
        n_complications = subset['complication'].sum()
        rate = n_complications / n_cases
        
        # Wilson score interval (more reliable than Wald for small n or rates near 0/1)
        ci_low, ci_high = proportion_confint(
            n_complications, 
            n_cases, 
            alpha=0.05,  # 95% confidence
            method='wilson'
        )
        
        results.append({
            'Procedure': procedure,
            'N Cases': n_cases,
            'Complications': n_complications,
            'Rate': rate,
            '95% CI Lower': ci_low,
            '95% CI Upper': ci_high
        })
    
    return pd.DataFrame(results)

# Calculate rates
rates_df = calculate_complication_rates(df)

# Sort by rate (highest to lowest)
rates_df = rates_df.sort_values('Rate', ascending=False)

print("📊 Complication Rates by Procedure:\n")
print(rates_df.to_string(index=False))

print("\n✅ Rates calculated successfully!")
```
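To see what `proportion_confint(..., method='wilson')` computes, the Wilson score interval can be written out with the standard library alone. This is a sketch for intuition only (the helper names are hypothetical); the notebook should keep using statsmodels. The comparison with the simpler Wald interval shows why Wilson suits small groups: with zero complications, Wald collapses to a zero-width interval, while Wilson still gives a usable upper bound.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Center is pulled toward 0.5; half-width adds a z^2/(4n^2) term to the Wald variance
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    # Clamp tiny floating-point overshoots so the bounds stay within [0, 1]
    return max(0.0, center - half_width), min(1.0, center + half_width)


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval, shown only for comparison."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical small group: 0 complications in 10 cases
print("Wilson:", tuple(round(b, 4) for b in wilson_interval(0, 10)))  # (0.0, 0.2775)
print("Wald:  ", tuple(round(b, 4) for b in wald_interval(0, 10)))    # (0.0, 0.0)
```

For a given procedure, `wilson_interval(n_complications, n_cases)` should agree with `proportion_confint(n_complications, n_cases, method='wilson')` up to floating-point rounding, since both use the same formula.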

## Step 5: Visualize Complication Rates

Create a publication-quality bar chart with error bars.

```python
from pathlib import Path

# Make sure the output folder exists (savefig fails if it doesn't)
Path('results/figures').mkdir(parents=True, exist_ok=True)

# Create figure
fig, ax = plt.subplots(figsize=(10, 6))

# Create bar chart
x_pos = np.arange(len(rates_df))
bars = ax.bar(x_pos, rates_df['Rate'], color='steelblue', alpha=0.8, edgecolor='black')

# Add error bars (Wilson CIs are asymmetric, so pass separate lower/upper offsets)
errors_lower = (rates_df['Rate'] - rates_df['95% CI Lower']).to_numpy()
errors_upper = (rates_df['95% CI Upper'] - rates_df['Rate']).to_numpy()
ax.errorbar(
    x_pos, 
    rates_df['Rate'],
    yerr=[errors_lower, errors_upper],
    fmt='none',
    color='black',
    capsize=5,
    capthick=2
)

# Leave headroom above the tallest CI for the labels
y_top = rates_df['95% CI Upper'].max() * 1.3
label_offset = 0.02 * y_top

# Add labels above each error bar: rate and (complications / cases)
for i, (rate, ci_high, n_comp, n_cases) in enumerate(zip(
        rates_df['Rate'], rates_df['95% CI Upper'],
        rates_df['Complications'], rates_df['N Cases'])):
    ax.text(i, ci_high + label_offset, f'{rate:.1%}\n({int(n_comp)}/{int(n_cases)})',
            ha='center', va='bottom', fontsize=10, fontweight='bold')

# Formatting
ax.set_xlabel('Procedure', fontsize=12, fontweight='bold')
ax.set_ylabel('Complication Rate', fontsize=12, fontweight='bold')
ax.set_title('Complication Rates by Procedure (with 95% CI)', fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x_pos)
ax.set_xticklabels(rates_df['Procedure'], rotation=0)
ax.set_ylim(0, y_top)

# Format y-axis as percentages
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))

# Add grid
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_axisbelow(True)

plt.tight_layout()

# Save figure
plt.savefig('results/figures/complication_rates.png', dpi=300, bbox_inches='tight')
print("\n✅ Figure saved to: results/figures/complication_rates.png")

plt.show()
```
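Matplotlib's `errorbar` takes error-bar lengths, not interval endpoints, and for asymmetric intervals like Wilson's it expects a 2 × N structure: the first row holds the distance below each point, the second row the distance above. The hypothetical helper below makes that conversion explicit and rejects an estimate that falls outside its own interval, which typically means columns were mismatched.

```python
def ci_to_errorbar_offsets(estimates, lowers, uppers):
    """Convert interval endpoints to matplotlib's 2 x N asymmetric yerr/xerr format.

    Returns [below, above] with below[i] = estimate - lower and above[i] = upper - estimate.
    """
    if not (len(estimates) == len(lowers) == len(uppers)):
        raise ValueError("estimates, lowers and uppers must have the same length")
    below, above = [], []
    for est, lo, hi in zip(estimates, lowers, uppers):
        if not lo <= est <= hi:
            raise ValueError(f"estimate {est} lies outside its interval [{lo}, {hi}]")
        below.append(est - lo)
        above.append(hi - est)
    return [below, above]


# Hypothetical rates and Wilson bounds for two procedures
offsets = ci_to_errorbar_offsets([0.10, 0.25], [0.03, 0.15], [0.26, 0.38])
print(offsets)  # usable as yerr=offsets in ax.errorbar(...)
```

With the notebook's table, the equivalent call would be `ci_to_errorbar_offsets(rates_df['Rate'], rates_df['95% CI Lower'], rates_df['95% CI Upper'])`.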

## Step 6: Statistical Testing

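Test whether complication rates differ between procedures using a chi-square test of independence (null hypothesis: the complication rate is the same for every procedure).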

```python
# Create contingency table (rows = procedures, columns = complication 0/1)
contingency = pd.crosstab(df['procedure'], df['complication'])
print("📋 Contingency Table:")
print(contingency)

# Chi-square test of independence
chi2, p_value, dof, expected = chi2_contingency(contingency)

print("\n📊 Chi-Square Test Results:")
print(f"Chi-square statistic: {chi2:.3f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_value:.4f}")

# The chi-square approximation is unreliable when expected counts are small
# (a common conservative rule of thumb: every expected count should be at least 5)
if (expected < 5).any():
    print("\n⚠️ Some expected counts are below 5; interpret the p-value with caution.")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print(f"\n✅ Result: SIGNIFICANT (p < {alpha})")
    print("Complication rates differ significantly between procedures.")
else:
    print(f"\n❌ Result: NOT SIGNIFICANT (p ≥ {alpha})")
    print("No statistically significant difference in complication rates was detected.")
```
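`chi2_contingency` compares the observed counts with the counts expected if complication status were independent of procedure. A standard-library sketch of that computation follows; the function name and the counts are hypothetical. For a table larger than 2 × 2, such as 4 procedures × 2 outcomes, scipy applies no continuity correction, so this statistic should match scipy's; for a 2 × 2 table scipy applies Yates' correction by default and the numbers will differ. The p-value would then come from the chi-square distribution, for example `scipy.stats.chi2.sf(statistic, dof)`.

```python
def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom, expected_counts); no continuity correction.
    """
    rows = [[float(count) for count in row] for row in table]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    if min(row_totals) <= 0 or min(col_totals) <= 0:
        raise ValueError("every row and column needs at least one observation")
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(rows, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical counts per procedure: [no complication, complication]
demo_counts = [[27, 3], [25, 5], [18, 12], [28, 2]]
demo_stat, demo_dof, demo_expected = chi_square_independence(demo_counts)
print(f"chi-square = {demo_stat:.3f}, dof = {demo_dof}")
print("smallest expected count:", min(min(row) for row in demo_expected))
```

On the notebook's data, `chi_square_independence(contingency.to_numpy())` should reproduce scipy's statistic and expected counts for the 4 × 2 table; the variable names are prefixed with `demo_` so the sketch does not overwrite `chi2`, `dof`, or `expected`.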

## Step 7: Logistic Regression (Risk Factors)

Identify which patient factors are associated with complications, adjusting for procedure type.

```python
# Prepare data for regression
# Convert procedure to 0/1 dummy variables, then drop Colectomy so it becomes the reference category.
# dtype=int matters: since pandas 2.0, get_dummies returns bool columns by default, which
# statsmodels can fail to convert when they are mixed with numeric columns.
df_model = df.copy()
df_model = pd.get_dummies(df_model, columns=['procedure'], drop_first=False, dtype=int)
df_model = df_model.drop('procedure_Colectomy', axis=1)  # Reference category

# Select predictors
X = df_model[['age', 'asa_class', 'comorbidity_count', 
               'procedure_Appendectomy', 'procedure_Cholecystectomy', 'procedure_Hernia Repair']]
y = df_model['complication']

# Add constant term (intercept)
X = sm.add_constant(X)

# Fit logistic regression
model = sm.Logit(y, X)
results = model.fit(disp=0)

# Display results
print("📊 Logistic Regression Results:\n")
print(results.summary())
```
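Dropping `procedure_Colectomy` is what makes Colectomy the reference level: a Colectomy patient has 0 in every remaining procedure column, so each procedure coefficient compares that procedure with Colectomy, holding the other predictors fixed. The hypothetical helper below shows the same treatment (reference) coding without pandas; listing the sorted levels is also a quick way to spot procedure labels that are spelled differently from the column names the regression cell expects.

```python
def treatment_code(values, reference):
    """Treatment (reference) coding: one 0/1 column per non-reference level.

    Returns (column_levels, rows); rows for the reference level are all zeros.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found; levels are {levels}")
    column_levels = [level for level in levels if level != reference]
    rows = [[int(value == level) for level in column_levels] for value in values]
    return column_levels, rows


procedures = ["Colectomy", "Appendectomy", "Hernia Repair", "Cholecystectomy", "Colectomy"]
columns, coded_rows = treatment_code(procedures, reference="Colectomy")
print(columns)  # ['Appendectomy', 'Cholecystectomy', 'Hernia Repair']
for procedure, row in zip(procedures, coded_rows):
    print(f"{procedure:<16} {row}")
```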

```python
# Calculate odds ratios and confidence intervals
odds_ratios = np.exp(results.params)
ci = np.exp(results.conf_int())

# Create results table
regression_results = pd.DataFrame({
    'Variable': odds_ratios.index,
    'Odds Ratio': odds_ratios.values,
    '95% CI Lower': ci[0].values,
    '95% CI Upper': ci[1].values,
    'P-value': results.pvalues.values
})

# Remove constant and format variable names
regression_results = regression_results[regression_results['Variable'] != 'const']
regression_results['Variable'] = regression_results['Variable'].str.replace('procedure_', '')
regression_results['Variable'] = regression_results['Variable'].replace({
    'age': 'Age (per year)',
    'asa_class': 'ASA Class (per unit)',
    'comorbidity_count': 'Comorbidity Count',
    'Appendectomy': 'Appendectomy (vs Colectomy)',
    'Cholecystectomy': 'Cholecystectomy (vs Colectomy)',
    'Hernia Repair': 'Hernia Repair (vs Colectomy)'
})

# Add significance stars
def add_stars(p):
    if p < 0.001:
        return '***'
    elif p < 0.01:
        return '**'
    elif p < 0.05:
        return '*'
    else:
        return ''

regression_results['Sig'] = regression_results['P-value'].apply(add_stars)

print("\n📊 Odds Ratios for Complications:\n")
print(regression_results.to_string(index=False))
print("\n* p<0.05, ** p<0.01, *** p<0.001")

print("\n💡 Interpretation:")
print("- OR > 1: Higher odds of complications (per one-unit increase, or vs Colectomy)")
print("- OR < 1: Lower odds of complications")
print("- OR = 1: No association with complications")
```
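The odds ratios above are exponentiated logistic coefficients, and their confidence limits are the exponentiated Wald limits for each coefficient, exp(β ± 1.96 × SE), with the p-value from the matching Wald z-test. That is what `np.exp(results.params)` and `np.exp(results.conf_int())` produce. A standard-library sketch with a hypothetical coefficient and standard error:

```python
import math
from statistics import NormalDist


def odds_ratio_from_logit(coef, se, confidence=0.95):
    """Odds ratio, Wald CI and two-sided Wald p-value for one logistic coefficient."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    odds_ratio = math.exp(coef)
    # The interval is symmetric on the log-odds scale, so it is asymmetric around the OR
    ci_lower = math.exp(coef - z_crit * se)
    ci_upper = math.exp(coef + z_crit * se)
    z_stat = coef / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return odds_ratio, ci_lower, ci_upper, p_value


# Hypothetical coefficient of 0.8 on the log-odds scale with SE 0.3
demo_or, demo_lo, demo_hi, demo_p = odds_ratio_from_logit(0.8, 0.3)
print(f"OR = {demo_or:.2f} (95% CI {demo_lo:.2f} - {demo_hi:.2f}), p = {demo_p:.3f}")
```

Because both the interval and the p-value come from the same Wald z-statistic, a 95% CI that excludes 1 corresponds to p < 0.05. The log-scale symmetry is also why the forest plot below can switch to a log axis.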

## Step 8: Forest Plot (Optional)

Visualize odds ratios and confidence intervals.

```python
from pathlib import Path

# Make sure the output folder exists (savefig fails if it doesn't)
Path('results/figures').mkdir(parents=True, exist_ok=True)

# Create forest plot
fig, ax = plt.subplots(figsize=(10, 6))

# Plain arrays avoid index-alignment surprises after the 'const' row was removed
or_values = regression_results['Odds Ratio'].to_numpy()
ci_lower = regression_results['95% CI Lower'].to_numpy()
ci_upper = regression_results['95% CI Upper'].to_numpy()

# Plot points and asymmetric error bars (distance from each OR to its CI bounds)
y_pos = np.arange(len(regression_results))
ax.errorbar(
    or_values,
    y_pos,
    xerr=[or_values - ci_lower, ci_upper - or_values],
    fmt='o',
    markersize=8,
    capsize=5,
    capthick=2,
    color='steelblue'
)

# Add vertical line at OR=1 (no effect)
ax.axvline(x=1, color='red', linestyle='--', linewidth=2, alpha=0.5, label='No Effect (OR=1)')

# Formatting
ax.set_yticks(y_pos)
ax.set_yticklabels(regression_results['Variable'])
ax.invert_yaxis()  # list variables top-to-bottom in table order
ax.set_xlabel('Odds Ratio (95% CI)', fontsize=12, fontweight='bold')
ax.set_title('Risk Factors for Surgical Complications', fontsize=14, fontweight='bold', pad=20)
ax.grid(axis='x', alpha=0.3, linestyle='--')
ax.set_axisbelow(True)
ax.legend(loc='upper right')

# Use log scale if ORs span a wide range (equal ratios get equal distances on a log axis)
if or_values.max() / or_values.min() > 10:
    ax.set_xscale('log')

plt.tight_layout()
plt.savefig('results/figures/forest_plot.png', dpi=300, bbox_inches='tight')
print("\n✅ Forest plot saved to: results/figures/forest_plot.png")
plt.show()
```

## Step 9: Export Results

Save tables for use in papers/presentations.

```python
from pathlib import Path

# Make sure the output folder exists (to_csv fails if it doesn't)
Path('results/tables').mkdir(parents=True, exist_ok=True)

# Format rates table for export
rates_export = rates_df.copy()
rates_export['Rate'] = rates_export['Rate'].apply(lambda x: f"{x:.1%}")
rates_export['95% CI'] = rates_export.apply(
    lambda row: f"({row['95% CI Lower']:.1%} - {row['95% CI Upper']:.1%})", 
    axis=1
)
rates_export = rates_export[['Procedure', 'N Cases', 'Complications', 'Rate', '95% CI']]

# Save to CSV
rates_export.to_csv('results/tables/complication_rates.csv', index=False)
print("✅ Rates table saved to: results/tables/complication_rates.csv")

# Format regression results for export
regression_export = regression_results.copy()
regression_export['95% CI'] = regression_export.apply(
    lambda row: f"({row['95% CI Lower']:.2f} - {row['95% CI Upper']:.2f})",
    axis=1
)
regression_export['Odds Ratio'] = regression_export['Odds Ratio'].apply(lambda x: f"{x:.2f}")
regression_export['P-value'] = regression_export['P-value'].apply(lambda x: f"{x:.3f}" if x >= 0.001 else "<0.001")
regression_export = regression_export[['Variable', 'Odds Ratio', '95% CI', 'P-value', 'Sig']]

# Save to CSV
regression_export.to_csv('results/tables/risk_factors.csv', index=False)
print("✅ Risk factors table saved to: results/tables/risk_factors.csv")

print("\n🎉 Analysis complete! Check the results/ folder for outputs.")
```

## Summary of Findings

### Key Results:

These results come from the 120-case sample dataset and may differ when you run the notebook on your own data.

1. **Complication Rates by Procedure:**
   - Highest: Colectomy
   - Lowest: Hernia Repair
   - Rates differ significantly between procedures (chi-square test, p < 0.05)

2. **Risk Factors:**
   - **ASA class** is the strongest predictor
   - **Comorbidity count** is also a significant predictor
   - **Procedure type** remains associated with complications after adjusting for patient factors
   - **Age** is not a significant predictor in this cohort

3. **Clinical Implications:**
   - Higher-risk procedures and sicker patients need closer monitoring
   - Consider enhanced recovery protocols for high-risk groups
   - Results support risk stratification in preoperative assessment

---

## Next Steps

**To use this with YOUR data:**

1. Replace the CSV file in the `data/` folder
2. Update column names if different
3. Modify analyses as needed
4. Run all cells again

**To get help:**

```bash
# Ask Claude for modifications
claude chat "I want to add survival analysis to this notebook. How?"

# Or edit specific parts
claude edit analysis.ipynb "Add a cell that creates a heatmap of correlations"
```

**Remember:** You're the attending, Claude is the resident. Always review the code and verify results make clinical sense! 🩺

---

## Files Generated

This analysis created:
- ✅ `results/figures/complication_rates.png` - Bar chart
- ✅ `results/figures/forest_plot.png` - Forest plot of risk factors
- ✅ `results/tables/complication_rates.csv` - Rates table
- ✅ `results/tables/risk_factors.csv` - Regression results

All ready for your paper or presentation! 📊