🧾 Insurance Charges Prediction

This project performs Exploratory Data Analysis (EDA) and builds a Linear Regression model to predict medical insurance charges based on personal and lifestyle factors such as age, BMI, smoking status, and region.

📘 Project Overview

The aim of this project is to understand what factors influence medical insurance costs and to create a predictive model that estimates charges for new individuals.
The analysis includes:

Data exploration and visualization
Feature encoding and preparation
Model training and evaluation
Insight generation and visualization

📂 Dataset Information

Dataset: Insurance.csv
Features:

age: Age of the policyholder
sex: Gender (male/female)
bmi: Body Mass Index
children: Number of dependents
smoker: Smoking status (yes/no)
region: Residential region (northeast, northwest, southeast, southwest)
charges: Medical insurance cost (target variable)
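
A minimal sketch of loading the data and checking it against the column list above. The sample rows are made-up illustrative values and the helper name `load_insurance` is hypothetical; in the project the source would be the `Insurance.csv` file itself.

```python
import io

import pandas as pd

# Column layout as documented in the feature list above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]

# Made-up rows for illustration only; they are NOT taken from Insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,24.0,0,no,northeast,3000.0
40,male,31.5,2,yes,southeast,28000.0
52,female,28.2,1,no,northwest,11000.0
33,male,35.1,3,no,southwest,6000.0
"""


def load_insurance(source):
    """Read the CSV and fail early if a documented column is missing."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# With the real file this would be: df = load_insurance("Insurance.csv")
df = load_insurance(io.StringIO(SAMPLE_CSV))

print(df.dtypes)        # numeric vs. categorical columns
print(df.isna().sum())  # missing values per column
```

Checking the schema up front makes later encoding and modelling steps fail with a clear message if the file differs from the description.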

📊 Exploratory Data Analysis (EDA)

Key visualizations used to explore data:

Histogram of Charges – Distribution of insurance costs
Boxplot of Charges by Smoker – Effect of smoking on cost
Scatter Plot of BMI vs Charges – Relationship between BMI and charges
Correlation Heatmap – Strength of relationships between numeric variables
Pair Plot – Relationships among age, BMI, children, and charges
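
The first four plots listed above could be produced roughly as follows. This is a sketch, not the notebook's code: it uses plain matplotlib rather than seaborn, and it runs on synthetic data whose generating formula is an assumption chosen only so the plots have something to show. The pair plot is omitted; `seaborn.pairplot` is one way to draw it.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for Insurance.csv (assumed formula, illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "bmi": rng.normal(30, 6, n).round(1),
    "children": rng.integers(0, 5, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
})
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.exponential(3000, n)
)

fig, axes = plt.subplots(2, 2, figsize=(11, 8))

# Histogram of charges
axes[0, 0].hist(df["charges"], bins=30)
axes[0, 0].set_title("Histogram of Charges")

# Boxplot of charges by smoker
groups = [df.loc[df["smoker"] == s, "charges"].to_numpy() for s in ("no", "yes")]
axes[0, 1].boxplot(groups)
axes[0, 1].set_xticklabels(["no", "yes"])
axes[0, 1].set_title("Charges by Smoker")

# Scatter plot of BMI vs charges
axes[1, 0].scatter(df["bmi"], df["charges"], s=10)
axes[1, 0].set_xlabel("bmi")
axes[1, 0].set_ylabel("charges")
axes[1, 0].set_title("BMI vs Charges")

# Correlation heatmap of the numeric columns
corr = df[["age", "bmi", "children", "charges"]].corr()
im = axes[1, 1].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 1].set_xticks(range(len(corr.columns)))
axes[1, 1].set_xticklabels(corr.columns, rotation=45)
axes[1, 1].set_yticks(range(len(corr.columns)))
axes[1, 1].set_yticklabels(corr.columns)
axes[1, 1].set_title("Correlation Heatmap")
fig.colorbar(im, ax=axes[1, 1])

fig.tight_layout()
# In a notebook, finish with plt.show(); in a script, fig.savefig(...) works too.
```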

Key Insights:

Smokers have significantly higher charges than non-smokers.
Charges rise moderately with age and BMI.
Number of children and region have minimal effect on charges.
The distribution of charges is right-skewed, with a small number of high-cost outliers.
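
Each insight above has a simple numeric counterpart that can be checked alongside the plots. The sketch below runs on synthetic data built from an assumed formula that deliberately mimics these patterns, so its output says nothing about the real dataset; it only shows which pandas calls would be used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Insurance.csv (assumed formula, illustration only).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "bmi": rng.normal(30, 6, n).round(1),
    "children": rng.integers(0, 5, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
})
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.exponential(3000, n)
)

# Smokers vs non-smokers: compare group means.
mean_by_smoker = df.groupby("smoker")["charges"].mean()

# Age, BMI, children: linear correlation with charges.
corr_with_charges = (
    df[["age", "bmi", "children", "charges"]].corr()["charges"].drop("charges")
)

# Right skew: a positive skewness value indicates a long right tail.
skewness = df["charges"].skew()

print(mean_by_smoker)
print(corr_with_charges)
print(f"skewness of charges: {skewness:.2f}")
```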

⚙️ Model Building

Algorithm Used: Linear Regression

Steps:

Encoded categorical features using one-hot encoding.
Split dataset into training (80%) and testing (20%).
Trained a linear regression model.
Evaluated performance using standard metrics.
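
The steps above can be sketched with pandas and scikit-learn as follows. The synthetic data, `drop_first=True`, and `random_state=42` are assumptions made for this example; the README only states one-hot encoding and an 80/20 split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Insurance.csv (assumed formula, illustration only).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sex": rng.choice(["male", "female"], n),
    "bmi": rng.normal(30, 6, n).round(1),
    "children": rng.integers(0, 5, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
    "region": rng.choice(["northeast", "northwest", "southeast", "southwest"], n),
})
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.exponential(3000, n)
)

# 1. One-hot encode the categorical columns.
#    drop_first=True (an assumption) drops one level per feature so the
#    dummy columns are not perfectly collinear with the intercept.
X = pd.get_dummies(
    df.drop(columns="charges"),
    columns=["sex", "smoker", "region"],
    drop_first=True,
    dtype=float,
)
y = df["charges"]

# 2. Split into 80% training and 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train the linear regression model.
model = LinearRegression().fit(X_train, y_train)

# 4. Evaluate on the held-out test set.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.1f}  MSE={mse:.1f}  R2={r2:.3f}")
```

Evaluating only on the test split keeps the metrics an estimate of performance on unseen individuals rather than a measure of how well the model memorised the training rows.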

Evaluation Metrics:

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R² Score
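
For reference, the three metrics can be written out by hand with NumPy. The four value pairs below are made up so the arithmetic is easy to follow; `sklearn.metrics` provides equivalent functions.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average size of the errors, in the same units as charges."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_squared_error(y_true, y_pred):
    """Average squared error; penalises large misses more heavily."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    """Share of variance explained: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration only.
y_true = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y_pred = np.array([1100.0, 1900.0, 3200.0, 3800.0])

print(mean_absolute_error(y_true, y_pred))  # 150.0
print(mean_squared_error(y_true, y_pred))   # 25000.0
print(r2_score(y_true, y_pred))             # approximately 0.98
```

Because MSE squares each error, a handful of very expensive cases can dominate it, which is worth keeping in mind for a right-skewed target such as charges.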

📈 Model Results

The model shows that smoking status, age, and BMI are the strongest predictors of insurance charges.
Smokers, older individuals, and those with a high BMI have higher predicted costs, with smoking status having the largest effect.
The model performs well overall but struggles with extreme high-charge cases (outliers).

📊 Visualizations Supporting Insights

Visualization	Purpose
Histogram of Charges	Understand data distribution
Boxplot by Smoker	Compare charges between smokers and non-smokers
Scatter Plot (BMI vs Charges)	Show relationship between BMI and cost
Correlation Heatmap	Identify strong predictive features
Actual vs Predicted Plot	Evaluate model fit
Residuals Plot	Check for random distribution of errors
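
The last two rows of the table are model diagnostics. A minimal sketch of how they might be drawn is shown below, using matplotlib and a model fitted on synthetic data; the data-generating formula and the three numeric features are assumptions for illustration only.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded dataset (assumed formula).
rng = np.random.default_rng(1)
n = 300
age = rng.integers(18, 65, n)
bmi = rng.normal(30, 6, n)
smoker = rng.random(n) < 0.2
X = np.column_stack([age, bmi, smoker])
y = 250 * age + 300 * bmi + 20000 * smoker + rng.exponential(3000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)
residuals = y_test - y_pred

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(11, 4))

# Actual vs predicted: points on the dashed diagonal are perfect predictions.
ax_fit.scatter(y_test, y_pred, s=12)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax_fit.plot(lims, lims, linestyle="--", color="gray")
ax_fit.set_xlabel("Actual charges")
ax_fit.set_ylabel("Predicted charges")
ax_fit.set_title("Actual vs Predicted")

# Residuals: a patternless band around zero suggests a reasonable linear fit.
ax_res.scatter(y_pred, residuals, s=12)
ax_res.axhline(0, linestyle="--", color="gray")
ax_res.set_xlabel("Predicted charges")
ax_res.set_ylabel("Residual (actual - predicted)")
ax_res.set_title("Residuals")

fig.tight_layout()
# In a notebook, finish with plt.show(); in a script, fig.savefig(...) works too.
```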

🚀 How to Run This Project

Upload the dataset Insurance.csv to your Google Colab session, or place it in the working directory of your local environment.

Install required libraries:

pip install pandas numpy seaborn matplotlib scikit-learn

📜 Author

atchi.venkataramana94@gmail.com
github.com/VenkataRamanaA
