This project analyzes healthcare insurance data to identify factors driving medical costs and segment high-risk individuals.
The analysis focuses on age, BMI, smoking status, and region to provide actionable insights for insurers.
Objective:
Identify key drivers of medical insurance charges and recommend strategies for cost reduction and risk management.
- Source: Kaggle - Medical Cost Personal Dataset
- Size: 1 CSV file, ~7,000 records
- Columns:

| Column | Description |
| --- | --- |
| age | Age of the customer |
| sex | Male / Female |
| bmi | Body Mass Index |
| children | Number of dependents |
| smoker | Yes / No |
| region | Customer region in the US |
| charges | Medical insurance cost (target) |

- Python (Pandas, NumPy, Matplotlib, Seaborn)
- Google Colab
- Power BI for dashboard visualization
- Checked for missing values and duplicates
- Ensured correct data types
- Created derived columns:
- Age Group: Young / Adult / Senior
- BMI Category: Underweight / Normal / Overweight / Obese
- Risk Level: Low / Medium / High based on age, BMI, and smoker status
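
The cleaning and derived-column steps above can be sketched roughly as follows. This is an illustration, not the project's actual notebook code: the sample rows are made up, the age cutoffs (under 30, 30 to 54, 55 and over) are assumptions because the README does not state them, and the BMI cutoffs follow the standard WHO bands.

```python
import pandas as pd

# Tiny made-up sample using the dataset's column names; not real records.
df = pd.DataFrame({
    "age": [19, 45, 62, 33, 33],
    "sex": ["female", "male", "female", "male", "male"],
    "bmi": [17.9, 27.4, 31.2, 22.0, 22.0],
    "children": [0, 2, 1, 3, 3],
    "smoker": ["yes", "no", "no", "no", "no"],
    "region": ["southwest", "northeast", "southeast", "northwest", "northwest"],
    "charges": [15000.0, 8000.0, 14000.0, 5000.0, 5000.0],
})

# Check for missing values, then drop exact duplicate rows.
missing_per_column = df.isna().sum()
df = df.drop_duplicates().reset_index(drop=True)

# Make sure the categorical columns are typed as categories.
categorical_cols = ["sex", "smoker", "region"]
df[categorical_cols] = df[categorical_cols].astype("category")

# Age Group: the cutoffs below are assumptions for illustration only.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 30, 55, 120],
    right=False,  # intervals are [0, 30), [30, 55), [55, 120)
    labels=["Young", "Adult", "Senior"],
)

# BMI Category: standard WHO bands (18.5, 25, 30).
df["bmi_category"] = pd.cut(
    df["bmi"],
    bins=[0, 18.5, 25, 30, float("inf")],
    right=False,
    labels=["Underweight", "Normal", "Overweight", "Obese"],
)

print(df[["age", "age_group", "bmi", "bmi_category"]])
```

Using `pd.cut` with `right=False` keeps each boundary value in the higher band, so a BMI of exactly 25 counts as Overweight and one of exactly 30 as Obese.
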
- Compared average charges by smoker status
- Analyzed charges vs age and charges vs BMI
- Checked regional cost differences
- Visualized risk-level cost patterns using boxplots
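
As a rough sketch of the comparisons listed above, the group averages and correlations could be computed with pandas like this. The rows are made-up illustration data rather than results from the dataset, and the variable names are hypothetical.

```python
import pandas as pd

# Made-up sample rows; the numbers are illustrative, not findings.
df = pd.DataFrame({
    "age": [25, 52, 30, 41, 58, 36],
    "bmi": [29.0, 33.5, 22.1, 26.4, 31.0, 24.3],
    "smoker": ["yes", "yes", "no", "no", "no", "no"],
    "region": ["southeast", "southeast", "northwest",
               "northeast", "southwest", "northwest"],
    "charges": [30000.0, 34000.0, 6000.0, 8000.0, 10000.0, 8000.0],
})

# Average charges by smoker status, and the smoker / non-smoker ratio.
avg_by_smoker = df.groupby("smoker")["charges"].mean()
smoker_ratio = avg_by_smoker["yes"] / avg_by_smoker["no"]

# Pearson correlation of age and BMI with charges.
corr_with_charges = df[["age", "bmi", "charges"]].corr()["charges"].drop("charges")

# Average charges per region, highest first.
avg_by_region = (
    df.groupby("region")["charges"].mean().sort_values(ascending=False)
)

print(avg_by_smoker)
print(f"smoker / non-smoker ratio: {smoker_ratio:.1f}")
print(corr_with_charges)
print(avg_by_region)
```
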
- High Risk: Smoker OR Obese OR Senior
- Medium Risk: Adult OR Overweight (and not already High Risk)
- Low Risk: Young AND Non-smoker AND Normal BMI
- Calculated average charges per risk group
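
One way to express these rules in code is sketched below. It rests on two assumptions the README does not spell out: the rules are checked in priority order (High first, then Medium), and any row matching neither rule falls back to Low, which includes cases such as Young, Non-smoker, Underweight. The sample rows and the `risk_level` column name are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up rows that already carry the derived columns.
df = pd.DataFrame({
    "age_group": ["Young", "Adult", "Senior", "Young", "Young", "Young"],
    "bmi_category": ["Normal", "Normal", "Normal",
                     "Overweight", "Normal", "Underweight"],
    "smoker": ["no", "no", "no", "no", "yes", "no"],
    "charges": [3000.0, 7000.0, 13000.0, 5000.0, 20000.0, 4000.0],
})

# High Risk: Smoker OR Obese OR Senior
is_high = (
    (df["smoker"] == "yes")
    | (df["bmi_category"] == "Obese")
    | (df["age_group"] == "Senior")
)

# Medium Risk: Adult OR Overweight
is_medium = (df["age_group"] == "Adult") | (df["bmi_category"] == "Overweight")

# np.select takes the first matching condition, so High wins over Medium.
# Assumption: anything that matches neither rule is labeled Low.
df["risk_level"] = np.select([is_high, is_medium], ["High", "Medium"], default="Low")

# Average charges per risk group.
avg_by_risk = df.groupby("risk_level")["charges"].mean()
print(avg_by_risk)
```

Because the High and Medium rules overlap (an Adult smoker matches both), the order of the conditions passed to `np.select` decides the outcome.
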
- Smoking is the biggest cost driver: smokers' average charges are roughly 3–4x those of non-smokers.
- Older age and higher BMI correlate strongly with higher costs.
- High-risk customers contribute disproportionately to total medical expenses.
- Regional differences are minor compared to smoking or BMI effects.
- Wellness Programs: Encourage high-risk customers to join preventive care programs.
- Premium Adjustments: Adjust insurance premiums based on risk level.
- Targeted Awareness: Educate high-risk segments about lifestyle impacts on costs.
- Preventive Interventions: Offer regular checkups or incentives for early detection.
- Scatter plots: Age vs Charges, BMI vs Charges
- Boxplots: Region vs Charges, Risk Level vs Charges
- Smoker vs Non-smoker average charges bar chart
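
A minimal sketch of three of these charts, using plain Matplotlib on made-up rows, might look like the following. The project itself also lists Seaborn, so this is not necessarily how the original figures were produced. The BMI scatter plot and the Risk Level boxplot follow the same pattern with a different column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "age": [25, 52, 30, 41, 58, 36],
    "smoker": ["yes", "yes", "no", "no", "no", "no"],
    "region": ["southeast", "southeast", "northwest",
               "northeast", "northeast", "northwest"],
    "charges": [30000.0, 34000.0, 6000.0, 8000.0, 10000.0, 8000.0],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scatter plot: Age vs Charges, colored by smoker status.
colors = df["smoker"].map({"yes": "tab:red", "no": "tab:blue"}).tolist()
axes[0].scatter(df["age"], df["charges"], c=colors)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Charges")
axes[0].set_title("Age vs Charges")

# Boxplot: Region vs Charges, one box per region.
regions = sorted(df["region"].unique())
charges_by_region = [
    df.loc[df["region"] == r, "charges"].to_numpy() for r in regions
]
axes[1].boxplot(charges_by_region)
axes[1].set_xticks(range(1, len(regions) + 1))
axes[1].set_xticklabels(regions)
axes[1].set_title("Region vs Charges")

# Bar chart: average charges for smokers vs non-smokers.
avg_by_smoker = df.groupby("smoker")["charges"].mean()
axes[2].bar(avg_by_smoker.index.tolist(), avg_by_smoker.to_numpy())
axes[2].set_title("Average charges by smoker status")

fig.tight_layout()
```

In Colab the figure renders when the cell finishes; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.
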
- Performed data analysis on 7,000+ healthcare insurance records to identify cost drivers like smoking, age, and BMI.
- Segmented customers into Low, Medium, High Risk groups and calculated average medical charges.
- Delivered actionable insights for insurers, including wellness programs and premium recommendations.
- Visualized patterns using Matplotlib and Seaborn, highlighting age, BMI, region, and smoking impacts.