# Insurance Charges Prediction Using Machine Learning - Google Colab Version

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/)

## Project Overview
**Goal**: Predict insurance charges from client attributes such as age, BMI, smoking status, number of children, and region.

**Techniques Used**: Linear Regression, Random Forest, XGBoost

**Dataset**: Insurance dataset with features like age, sex, BMI, children, smoker, region, and charges

**Evaluation Metrics**: Mean Squared Error (MSE) and R² Score
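
As a rough illustration of the two metrics named above, the sketch below computes MSE and R² by hand with NumPy. The helper names `mse` and `r2` and the toy values are assumptions made up for this example; the notebook itself imports `mean_squared_error` and `r2_score` from scikit-learn, which follow the same definitions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up toy values, not taken from the insurance dataset
y_true = [1000.0, 2000.0, 3000.0, 4000.0]
y_pred = [1100.0, 1900.0, 3200.0, 3800.0]

print(f"MSE: {mse(y_true, y_pred):.1f}")  # MSE: 25000.0
print(f"R2:  {r2(y_true, y_pred):.2f}")   # R2:  0.98
```

MSE is expressed in squared charge units, so it grows quickly with large errors; R² is unitless, and a value near 1 means the model explains most of the variance in charges.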

---

## 🚀 Google Colab Benefits
- ✅ Pre-installed ML packages
- ✅ Free GPU/TPU access
- ✅ No local setup required
- ✅ Easy sharing and collaboration

## 1. Setup and Install Packages for Google Colab

In [None]:
# XGBoost normally ships with Colab already; running the install is harmless
# and covers runtimes where it is missing
!pip install -q xgboost

print("✅ XGBoost installation complete!")

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb
import warnings
import pickle
import os
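try:
    from google.colab import files  # For file upload/download (Colab only)
except ImportError:
    files = None  # Not running in Colab; upload/download helpers are unavailable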

warnings.filterwarnings('ignore')

# Set style for plots
plt.style.use('default')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")
print("📊 Ready for machine learning!")

## 2. Data Loading and Setup

Upload your own insurance.csv file to the Colab session, or let the notebook fall back to a synthetic sample dataset when no file is found.

In [None]:
# Create models directory for saving trained models (no error if it already exists)
os.makedirs('models', exist_ok=True)
print("📁 'models' directory is ready")

print("📂 To upload your own dataset:")
print("   1. Click the folder icon on the left sidebar")
print("   2. Click 'Upload to session storage'")
print("   3. Select your insurance.csv file")
print("\n📝 Or we'll use sample data for demonstration")
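
As an alternative to the sidebar steps, the upload can also be triggered from code with `files.upload()` from `google.colab`, which opens a file picker and returns a dictionary keyed by file name. The sketch below is a minimal, hypothetical helper (`upload_insurance_csv` is not part of the original notebook) that assumes the uploaded file is named insurance.csv and does nothing outside Colab.

```python
def upload_insurance_csv(expected_name="insurance.csv"):
    """Open Colab's upload dialog; return the file name if it was uploaded, else None.

    Hypothetical helper for illustration only.
    """
    try:
        from google.colab import files  # Only importable inside Colab
    except ImportError:
        print("Not running in Colab; skipping the upload dialog.")
        return None

    uploaded = files.upload()  # dict: file name -> file contents (bytes)
    if expected_name in uploaded:
        print(f"Uploaded {expected_name} ({len(uploaded[expected_name])} bytes)")
        return expected_name

    # Colab may rename duplicates (for example by appending a counter),
    # so report what actually arrived instead of guessing
    print(f"{expected_name} not found in upload; received: {list(uploaded)}")
    return None
```

Calling `upload_insurance_csv()` before the loading cell should leave the file in the session's working directory, where `pd.read_csv('insurance.csv')` can find it.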

In [None]:
# Load the dataset (with fallback to sample data)
try:
    df = pd.read_csv('insurance.csv')
    print("✅ Dataset loaded successfully from uploaded file!")
    print(f"Dataset shape: {df.shape}")
except FileNotFoundError:
    print("📝 Creating sample dataset for demonstration...")
    # Build a synthetic dataset; the fixed seed makes it reproducible
    np.random.seed(42)
    n_samples = 1000
    
    # Generate feature columns with plausible value ranges
    age = np.random.randint(18, 65, n_samples)  # Ages 18-64 (upper bound is exclusive)
    sex = np.random.choice(['male', 'female'], n_samples)
    bmi = np.random.normal(28, 5, n_samples)  # Mean 28, standard deviation 5
    bmi = np.clip(bmi, 15, 50)  # Clip to reasonable range
    children = np.random.poisson(1, n_samples)  # Poisson with mean 1, so small counts dominate
    children = np.clip(children, 0, 5)  # Cap at 5 children
    smoker = np.random.choice(['yes', 'no'], n_samples, p=[0.2, 0.8])  # 20% smokers
    region = np.random.choice(['northeast', 'northwest', 'southeast', 'southwest'], n_samples)
    
    # Generate charges: linear in age, BMI above 25, and number of children;
    # smokers pay 2.5x the base charge; Gaussian noise (std 2000) is added on top
    base_charge = 3000 + age * 50 + (bmi - 25) * 100 + children * 500
    smoker_multiplier = np.where(smoker == 'yes', 2.5, 1.0)
    charges = base_charge * smoker_multiplier + np.random.normal(0, 2000, n_samples)
    charges = np.clip(charges, 1000, 50000)  # Keep charges within 1,000-50,000
    
    sample_data = {
        'age': age,
        'sex': sex,
        'bmi': bmi,
        'children': children,
        'smoker': smoker,
        'region': region,
        'charges': charges
    }
    df = pd.DataFrame(sample_data)
    print("✅ Sample dataset created successfully!")
    print(f"Dataset shape: {df.shape}")

print("\n🎯 Dataset ready for analysis!")
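
An uploaded insurance.csv may not contain every column the rest of the notebook expects, so a quick schema check right after loading can catch problems early. The sketch below is one possible approach: `EXPECTED_COLUMNS` mirrors the column names used by the sample dataset above, while the helper `missing_columns` and the toy `demo` frame are hypothetical and made up for this example.

```python
import pandas as pd

# Column names used by the sample dataset in the loading cell
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]

def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the DataFrame."""
    return [col for col in expected if col not in frame.columns]

# Made-up toy frame that deliberately lacks 'children' and 'region'
demo = pd.DataFrame({
    "age": [19, 45],
    "sex": ["female", "male"],
    "bmi": [27.9, 30.1],
    "smoker": ["yes", "no"],
    "charges": [15000.0, 7000.0],
})

missing = missing_columns(demo)
if missing:
    print(f"Missing columns: {missing}")
else:
    print("All expected columns are present.")
```

In the notebook, the same check could be run as `missing_columns(df)` on the loaded DataFrame before any preprocessing.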