Vampire38/Healthcare_Insurance_DataAnalysis

Healthcare Insurance Data Analysis

Overview

This project analyzes healthcare insurance data to identify factors driving medical costs and segment high-risk individuals.
The analysis focuses on age, BMI, smoking status, and region to provide actionable insights for insurers.

Objective:

Identify key drivers of medical insurance charges and recommend strategies for cost reduction and risk management.


Dataset

  • Source: Kaggle - Medical Cost Personal Dataset
  • Size: 1 CSV file, ~7,000 records
  • Columns:
    | Column   | Description                       |
    | -------- | --------------------------------- |
    | age      | Age of the customer               |
    | sex      | Male / Female                     |
    | bmi      | Body Mass Index                   |
    | children | Number of dependents              |
    | smoker   | Yes / No                          |
    | region   | Customer region in the US         |
    | charges  | Medical insurance cost (target)   |
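
As a minimal sketch of how the file could be loaded and checked against the column list above: the rows in `SAMPLE_CSV` are made up for illustration, and `load_insurance` is a hypothetical helper, not a function from this repository. The lowercase `yes` / `no` spellings are also an assumption about how the values are stored.

```python
import io

import pandas as pd

# Column names taken from the Dataset section above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]

# Made-up rows in the documented layout. In practice, pass the path of the
# downloaded CSV instead (the file name is not stated in this README).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,21.0,0,no,northeast,3000.0
45,male,31.5,2,yes,southeast,40000.0
63,female,27.0,1,no,southwest,14000.0
"""


def load_insurance(source) -> pd.DataFrame:
    """Read the CSV and fail early if a documented column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Return the columns in the documented order.
    return df[EXPECTED_COLUMNS]


df = load_insurance(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Checking the schema at load time means a renamed or missing column surfaces immediately rather than as a `KeyError` later in the analysis.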

Tools & Libraries

  • Python (Pandas, NumPy, Matplotlib, Seaborn)
  • Google Colab
  • Power BI for dashboard visualization

Steps

1. Data Cleaning

  • Checked for missing values and duplicates
  • Ensured correct data types
  • Created derived columns:
    • Age Group: Young / Adult / Senior
    • BMI Category: Underweight / Normal / Overweight / Obese
    • Risk Level: Low / Medium / High based on age, BMI, and smoker status
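
The cleaning and derived-column steps above might look roughly like the sketch below. The README does not state the age or BMI cut-offs, so the bins here are assumptions (Young under 30, Adult 30 to 59, Senior 60 and over; BMI bands at 18.5, 25, and 30). Dropping rows with missing values is also an assumption, since the README only says they were checked. The sample rows are made up.

```python
import pandas as pd

# Made-up rows: row 1 duplicates row 0, and the last row has a missing BMI.
raw = pd.DataFrame({
    "age": [22, 22, 41, 67, 35],
    "bmi": [17.5, 17.5, 27.0, 33.2, None],
    "smoker": ["no", "no", "Yes", "no", "no"],
    "charges": [2100.0, 2100.0, 21000.0, 14500.0, 5200.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate and incomplete rows, then normalise types and labels."""
    out = df.drop_duplicates().dropna().copy()
    out["age"] = out["age"].astype(int)
    out["smoker"] = out["smoker"].str.strip().str.lower()
    return out


def add_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Add Age Group and BMI Category columns using assumed cut-offs."""
    out = df.copy()
    # right=False makes each bin include its left edge, e.g. [30, 60).
    out["age_group"] = pd.cut(
        out["age"],
        bins=[0, 30, 60, float("inf")],
        right=False,
        labels=["Young", "Adult", "Senior"],
    )
    out["bmi_category"] = pd.cut(
        out["bmi"],
        bins=[0, 18.5, 25, 30, float("inf")],
        right=False,
        labels=["Underweight", "Normal", "Overweight", "Obese"],
    )
    return out


print(raw.isna().sum())        # missing values per column
print(raw.duplicated().sum())  # number of fully duplicated rows

prepared = add_categories(clean(raw))
print(prepared[["age", "age_group", "bmi", "bmi_category", "smoker"]])
```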

2. Exploratory Data Analysis (EDA)

  • Compared average charges by smoker status
  • Analyzed charges vs age and charges vs BMI
  • Checked regional cost differences
  • Visualized risk-level cost patterns using boxplots
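
A sketch of the numeric side of these EDA checks with pandas is shown below. The rows are made up purely to make the snippet runnable, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "age": [20, 30, 40, 50, 60, 25],
    "bmi": [22.0, 31.0, 27.5, 24.0, 35.0, 29.0],
    "smoker": ["no", "yes", "no", "no", "yes", "no"],
    "region": ["northeast", "southeast", "southwest",
               "northwest", "southeast", "northeast"],
    "charges": [2000.0, 18000.0, 7000.0, 9000.0, 24000.0, 3000.0],
})

# Average charges by smoker status, and the smoker / non-smoker ratio.
avg_by_smoker = df.groupby("smoker")["charges"].mean()
smoker_ratio = avg_by_smoker["yes"] / avg_by_smoker["no"]

# Pearson correlation of age and BMI with charges.
corr_with_charges = df[["age", "bmi", "charges"]].corr()["charges"].drop("charges")

# Regional differences: mean, median, and group size per region.
region_summary = df.groupby("region")["charges"].agg(["mean", "median", "count"])

print(avg_by_smoker)
print(f"smoker / non-smoker ratio: {smoker_ratio:.2f}")
print(corr_with_charges)
print(region_summary)
```

Reporting the median and group size alongside the mean helps show whether a regional difference comes from a few very large charges.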

3. Segmentation

  • High Risk: Smoker OR Obese OR Senior
  • Medium Risk: not High Risk, and Adult OR Overweight
  • Low Risk: Young, Non-smoker, Normal BMI
  • Calculated average charges per risk group
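
One way to express these rules is `numpy.select`, which returns the label of the first matching condition, so High takes precedence over Medium. This is a sketch, not necessarily how the project implements it: the column names `age_group` and `bmi_category`, the helper `assign_risk`, and the sample rows are all assumptions. Rows matching neither rule (including Underweight, which the rules above do not mention) fall through to Low.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "age_group": ["Young", "Adult", "Senior", "Young", "Adult", "Young"],
    "bmi_category": ["Normal", "Normal", "Normal", "Obese", "Overweight", "Overweight"],
    "smoker": ["no", "no", "no", "no", "yes", "no"],
    "charges": [2000.0, 6000.0, 14000.0, 10000.0, 30000.0, 4000.0],
})


def assign_risk(df: pd.DataFrame) -> pd.Series:
    """Label each row High, Medium, or Low; the first matching rule wins."""
    high = (
        (df["smoker"] == "yes")
        | (df["bmi_category"] == "Obese")
        | (df["age_group"] == "Senior")
    )
    medium = (df["age_group"] == "Adult") | (df["bmi_category"] == "Overweight")
    # Conditions are checked in order, so a row that is both High and
    # Medium is labelled High. Everything else defaults to Low.
    labels = np.select([high, medium], ["High", "Medium"], default="Low")
    return pd.Series(labels, index=df.index, name="risk_level")


df["risk_level"] = assign_risk(df)
avg_by_risk = df.groupby("risk_level")["charges"].mean()
print(avg_by_risk)
```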

Key Insights

  • Smoking is the biggest cost driver: smokers' average charges are 3–4x those of non-smokers.
  • Older age and higher BMI correlate strongly with higher costs.
  • High-risk customers contribute disproportionately to total medical expenses.
  • Regional differences are minor compared to smoking or BMI effects.
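
To check a claim such as "contribute disproportionately", one option is to compare each risk group's share of total charges with its share of customers. The sketch below uses made-up rows and an assumed `risk_level` column. A ratio above 1 means the group accounts for more of the cost than its headcount alone would suggest.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "risk_level": ["Low", "Low", "Low", "Medium", "Medium", "High"],
    "charges": [2000.0, 3000.0, 5000.0, 6000.0, 4000.0, 20000.0],
})

# Customer count and total charges per risk group.
summary = df.groupby("risk_level")["charges"].agg(customers="count", total="sum")

# Share of customers versus share of total charges.
summary["customer_share"] = summary["customers"] / summary["customers"].sum()
summary["charge_share"] = summary["total"] / summary["total"].sum()

# Ratio above 1: the group costs more than its headcount would suggest.
summary["ratio"] = summary["charge_share"] / summary["customer_share"]
print(summary)
```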

Recommendations

  • Wellness Programs: Encourage high-risk customers to join preventive care programs.
  • Premium Adjustments: Adjust insurance premiums based on risk level.
  • Targeted Awareness: Educate high-risk segments about lifestyle impacts on costs.
  • Preventive Interventions: Offer regular checkups or incentives for early detection.

Visualizations

  • Scatter plots: Age vs Charges, BMI vs Charges
  • Boxplots: Region vs Charges, Risk Level vs Charges
  • Smoker vs Non-smoker average charges bar chart
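
A sketch of three of the charts listed above follows. It uses plain Matplotlib to stay short, although the project also used Seaborn. The data rows and the `plot_overview` helper are made up for illustration, and the `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line in Colab or Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "age": [20, 30, 40, 50, 60, 25],
    "smoker": ["no", "yes", "no", "no", "yes", "no"],
    "region": ["northeast", "southeast", "southwest",
               "northeast", "southeast", "southwest"],
    "charges": [2000.0, 18000.0, 7000.0, 9000.0, 24000.0, 3000.0],
})


def plot_overview(df: pd.DataFrame):
    """Draw an age scatter, a regional boxplot, and a smoker bar chart."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Scatter: age vs charges, coloured by smoker status.
    colors = df["smoker"].map({"yes": "tab:red", "no": "tab:blue"}).tolist()
    axes[0].scatter(df["age"], df["charges"], c=colors)
    axes[0].set_xlabel("Age")
    axes[0].set_ylabel("Charges")

    # Boxplot: one box per region, in sorted region order.
    grouped = df.groupby("region")["charges"]
    names = [name for name, _ in grouped]
    axes[1].boxplot([group.to_numpy() for _, group in grouped])
    axes[1].set_xticklabels(names)
    axes[1].set_ylabel("Charges")

    # Bar chart: average charges for smokers vs non-smokers.
    means = df.groupby("smoker")["charges"].mean()
    axes[2].bar(list(means.index), means.to_numpy())
    axes[2].set_xlabel("Smoker")
    axes[2].set_ylabel("Average charges")

    fig.tight_layout()
    return fig


fig = plot_overview(df)
```

Calling `fig.savefig("overview.png")` afterwards writes the figure to disk; the file name is arbitrary.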

Summary

  • Performed data analysis on 7,000+ healthcare insurance records to identify cost drivers like smoking, age, and BMI.
  • Segmented customers into Low, Medium, and High Risk groups and calculated average medical charges.
  • Delivered actionable insights for insurers, including wellness programs and premium recommendations.
  • Visualized patterns using Matplotlib and Seaborn, highlighting age, BMI, region, and smoking impacts.
