# Insurance Charges Prediction Using Machine Learning

## Project Overview
**Goal**: Predict insurance charges based on client attributes such as age, BMI, smoking status, and other health factors.

**Techniques Used**: Linear Regression, Random Forest, XGBoost

**Dataset**: Insurance dataset with features like age, sex, BMI, children, smoker, region, and charges

**Evaluation Metrics**: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² Score
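
As a rough illustration of what these metrics measure, the sketch below computes MSE, RMSE, and R² by hand on three made-up values (not taken from the insurance dataset). The helper names `mse` and `r2` are hypothetical; the notebook itself relies on scikit-learn's `mean_squared_error` and `r2_score`.

```python
import math

def mse(y_true, y_pred):
    """Mean of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """1 - SS_res / SS_tot: share of target variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up toy values, not taken from the insurance dataset
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

toy_mse = mse(actual, predicted)   # (1 + 0 + 4) / 3
toy_rmse = math.sqrt(toy_mse)      # same units as the target
toy_r2 = r2(actual, predicted)     # 1 - 5 / 8
print(f"MSE={toy_mse:.4f}  RMSE={toy_rmse:.4f}  R2={toy_r2:.4f}")
# prints: MSE=1.6667  RMSE=1.2910  R2=0.3750
```

RMSE is often easier to read than MSE because it is expressed in the same units as the target (dollars, for charges).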

---

## Dataset Information
The dataset contains the following columns:
- **age**: Age of the person
- **sex**: Gender (male/female)
- **bmi**: Body Mass Index
- **children**: Number of children/dependents
- **smoker**: Smoking status (yes/no)
- **region**: Geographic region (northeast, northwest, southeast, southwest)
- **charges**: Insurance charges (target variable)

## 1. Install and Import Required Libraries

First, let's install the necessary packages and import all required libraries.

In [None]:
# Install required packages (uncomment if needed)
# !pip install pandas numpy matplotlib seaborn scikit-learn xgboost

print("Install cell finished. Packages are only installed if the pip line above is uncommented.")

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')

# Set style for plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("All libraries imported successfully!")

## 2. Load and Explore the Dataset

Let's load the insurance dataset and explore its structure, data types, and basic statistics.

In [None]:
# Load the dataset
# Note: Make sure you have downloaded the insurance.csv file from Kaggle
# Dataset URL: https://www.kaggle.com/datasets/thedevastator/prediction-of-insurance-charges-using-age-gender

try:
    df = pd.read_csv('insurance.csv')
    print("✅ Dataset loaded successfully!")
    print(f"Dataset shape: {df.shape}")
except FileNotFoundError:
    print("❌ Dataset file not found. Please download 'insurance.csv' from Kaggle.")
    print("Dataset URL: https://www.kaggle.com/datasets/thedevastator/prediction-of-insurance-charges-using-age-gender")
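    # Fall back to a synthetic sample so the rest of the notebook still runs.
    # Its charges are drawn independently of the features, so model scores on it are not meaningful.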
    np.random.seed(42)
    sample_data = {
        'age': np.random.randint(18, 65, 100),
        'sex': np.random.choice(['male', 'female'], 100),
        'bmi': np.random.normal(28, 5, 100),
        'children': np.random.randint(0, 5, 100),
        'smoker': np.random.choice(['yes', 'no'], 100),
        'region': np.random.choice(['northeast', 'northwest', 'southeast', 'southwest'], 100),
        'charges': np.random.normal(13000, 5000, 100)
    }
    df = pd.DataFrame(sample_data)
    print("📝 Using a synthetic sample dataset for demonstration; model scores on it are not meaningful.")

In [None]:
# Display basic information about the dataset
print("📊 Dataset Overview:")
print("=" * 50)
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\n🔍 First 5 rows:")
print(df.head())

print("\n📋 Dataset Info:")
print(df.info())

print("\n📈 Summary Statistics:")
print(df.describe())

print("\n🔍 Data Types:")
print(df.dtypes)

print("\n❓ Missing Values:")
print(df.isnull().sum())

print("\n🏷️ Unique Values in Categorical Columns:")
categorical_cols = ['sex', 'smoker', 'region']
for col in categorical_cols:
    if col in df.columns:
        print(f"{col}: {df[col].unique()}")

## 3. Visualize Data Distributions and Relationships

Let's create various plots to understand the data distribution and relationships between variables.

In [None]:
# Create comprehensive visualizations
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Insurance Dataset - Data Distribution and Relationships', fontsize=16, fontweight='bold')

# 1. Age distribution
axes[0, 0].hist(df['age'], bins=20, alpha=0.7, color='skyblue', edgecolor='black')
axes[0, 0].set_title('Age Distribution')
axes[0, 0].set_xlabel('Age')
axes[0, 0].set_ylabel('Frequency')

# 2. BMI distribution
axes[0, 1].hist(df['bmi'], bins=20, alpha=0.7, color='lightgreen', edgecolor='black')
axes[0, 1].set_title('BMI Distribution')
axes[0, 1].set_xlabel('BMI')
axes[0, 1].set_ylabel('Frequency')

# 3. Charges distribution
axes[0, 2].hist(df['charges'], bins=20, alpha=0.7, color='salmon', edgecolor='black')
axes[0, 2].set_title('Charges Distribution')
axes[0, 2].set_xlabel('Charges ($)')
axes[0, 2].set_ylabel('Frequency')

# 4. Smoker vs Charges
sns.boxplot(x='smoker', y='charges', data=df, ax=axes[1, 0])
axes[1, 0].set_title('Smoker vs Charges')
axes[1, 0].set_xlabel('Smoker')
axes[1, 0].set_ylabel('Charges ($)')

# 5. Region vs Charges
sns.boxplot(x='region', y='charges', data=df, ax=axes[1, 1])
axes[1, 1].set_title('Region vs Charges')
axes[1, 1].set_xlabel('Region')
axes[1, 1].set_ylabel('Charges ($)')
axes[1, 1].tick_params(axis='x', rotation=45)

# 6. Age vs Charges
axes[1, 2].scatter(df['age'], df['charges'], alpha=0.6, color='purple')
axes[1, 2].set_title('Age vs Charges')
axes[1, 2].set_xlabel('Age')
axes[1, 2].set_ylabel('Charges ($)')

plt.tight_layout()
plt.show()

# Additional insights
print("📊 Key Insights from Visualizations:")
print("=" * 50)
print(f"• Age range: {df['age'].min()} - {df['age'].max()} years")
print(f"• BMI range: {df['bmi'].min():.1f} - {df['bmi'].max():.1f}")
print(f"• Charges range: ${df['charges'].min():.2f} - ${df['charges'].max():.2f}")
print(f"• Average charges: ${df['charges'].mean():.2f}")
print(f"• Smokers: {(df['smoker'] == 'yes').sum()} ({(df['smoker'] == 'yes').mean()*100:.1f}%)")
print(f"• Non-smokers: {(df['smoker'] == 'no').sum()} ({(df['smoker'] == 'no').mean()*100:.1f}%)")

## 4. Preprocess Data (Encoding, Correlation)

Now let's preprocess the data by encoding categorical variables and examining correlations.
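
To make the effect of `drop_first=True` concrete, here is a small sketch on a toy frame. The rows are made up; only the column names mirror the dataset.

```python
import pandas as pd

# Toy rows with made-up values; only the column names mirror the dataset
toy = pd.DataFrame({
    "age": [19, 45, 33, 52],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "northeast", "northwest", "southeast"],
})

encoded = pd.get_dummies(toy, columns=["smoker", "region"], drop_first=True)

# The alphabetically first level of each column ("no", "northeast") is dropped
# and becomes the implicit reference category.
print(list(encoded.columns))
# prints: ['age', 'smoker_yes', 'region_northwest', 'region_southeast', 'region_southwest']
```

Dropping one level per categorical column avoids perfectly collinear dummy columns, which matters mostly for the linear regression model.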

In [None]:
# Handle categorical variables using one-hot encoding
print("🔄 Processing categorical variables...")

# Create a copy of the dataframe
df_encoded = df.copy()

# Apply one-hot encoding to categorical columns
categorical_columns = ['sex', 'smoker', 'region']
df_encoded = pd.get_dummies(df_encoded, columns=categorical_columns, drop_first=True)

print(f"Original shape: {df.shape}")
print(f"Encoded shape: {df_encoded.shape}")
print(f"New columns after encoding: {list(df_encoded.columns)}")

# Display the first few rows of encoded data
print("\n🔍 First 5 rows of encoded data:")
print(df_encoded.head())

# Check for any missing values after encoding
print(f"\n❓ Missing values after encoding: {df_encoded.isnull().sum().sum()}")

In [None]:
# Create correlation matrix and heatmap
plt.figure(figsize=(12, 8))
correlation_matrix = df_encoded.corr()

# Create heatmap
sns.heatmap(correlation_matrix, 
            annot=True, 
            cmap='coolwarm', 
            center=0,
            square=True,
            fmt='.2f',
            cbar_kws={'label': 'Correlation Coefficient'})

plt.title('Correlation Matrix - Insurance Dataset Features', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Find features most correlated with charges
print("🔍 Features most correlated with charges:")
print("=" * 50)
charges_correlation = correlation_matrix['charges'].abs().sort_values(ascending=False)
for feature, corr in charges_correlation.items():
    if feature != 'charges':
        print(f"{feature}: {corr:.3f}")

# Identify strongly correlated features (the values above are absolute correlations)
print("\n🔥 Features with absolute correlation > 0.3 with charges:")
strong_corr = charges_correlation[charges_correlation > 0.3]
strong_corr = strong_corr[strong_corr.index != 'charges']
print(list(strong_corr.index))

## 5. Prepare Features and Target Variables

Let's separate our features and target variable, then split the data into training and testing sets.
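
As a quick sanity check of how the split behaves, the sketch below runs `train_test_split` on 100 made-up rows. It only illustrates the resulting sizes and the effect of fixing `random_state`; the array names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and target: 100 rows, made-up values
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
print(X_tr.shape, X_te.shape)  # prints: (80, 2) (20, 2)

# Repeating the call with the same random_state yields the same held-out rows
_, _, _, y_te_again = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
same_split = bool((y_te == y_te_again).all())
print(same_split)  # prints: True
```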

In [None]:
# Separate features and target variable
print("🎯 Preparing features and target variable...")

# Features (X) - all columns except charges
X = df_encoded.drop('charges', axis=1)

# Target variable (y) - charges column
y = df_encoded['charges']

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Feature columns: {list(X.columns)}")

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42,
    stratify=None  # Since this is regression, not classification
)

print(f"\n📊 Data Split Summary:")
print(f"Training set size: {X_train.shape[0]} samples")
print(f"Test set size: {X_test.shape[0]} samples")
print(f"Training set percentage: {(X_train.shape[0] / len(X)) * 100:.1f}%")
print(f"Test set percentage: {(X_test.shape[0] / len(X)) * 100:.1f}%")

# Display basic statistics of the split
print(f"\n📈 Target Variable Statistics:")
print(f"Training set - Mean: ${y_train.mean():.2f}, Std: ${y_train.std():.2f}")
print(f"Test set - Mean: ${y_test.mean():.2f}, Std: ${y_test.std():.2f}")

## 6. Train and Evaluate Linear Regression Model

Let's start with a Linear Regression model as our baseline.

In [None]:
# Train Linear Regression Model
print("🤖 Training Linear Regression Model...")

# Initialize the model
lr_model = LinearRegression()

# Train the model
lr_model.fit(X_train, y_train)

# Make predictions
lr_pred_train = lr_model.predict(X_train)
lr_pred_test = lr_model.predict(X_test)

# Calculate evaluation metrics
lr_mse_train = mean_squared_error(y_train, lr_pred_train)
lr_mse_test = mean_squared_error(y_test, lr_pred_test)
lr_r2_train = r2_score(y_train, lr_pred_train)
lr_r2_test = r2_score(y_test, lr_pred_test)

print("✅ Linear Regression Model Results:")
print("=" * 50)
print(f"Training MSE: {lr_mse_train:.2f}")
print(f"Test MSE: {lr_mse_test:.2f}")
print(f"Training R² Score: {lr_r2_train:.4f}")
print(f"Test R² Score: {lr_r2_test:.4f}")
print(f"Training RMSE: {np.sqrt(lr_mse_train):.2f}")
print(f"Test RMSE: {np.sqrt(lr_mse_test):.2f}")

# Display feature coefficients
print(f"\n🔍 Feature Coefficients:")
feature_importance_lr = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': lr_model.coef_
}).sort_values('Coefficient', key=abs, ascending=False)

print(feature_importance_lr)

# Check for overfitting
if lr_r2_train - lr_r2_test > 0.1:
    print("⚠️  Potential overfitting detected!")
else:
    print("✅ No significant overfitting detected.")

## 7. Train and Evaluate Random Forest Model

Now let's try a Random Forest model, which can capture non-linear relationships.
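
To illustrate what "non-linear" means here, the sketch below fits both model types to a synthetic quadratic target. This is not the insurance data, and the variable names and settings are chosen only for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data (not the insurance dataset): the target is a quadratic function of x
x_train = rng.uniform(-3, 3, size=(300, 1))
x_test = rng.uniform(-3, 3, size=(100, 1))
y_train_toy = x_train[:, 0] ** 2
y_test_toy = x_test[:, 0] ** 2

linear = LinearRegression().fit(x_train, y_train_toy)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train_toy)

# Score both models on held-out points
linear_r2 = r2_score(y_test_toy, linear.predict(x_test))
forest_r2 = r2_score(y_test_toy, forest.predict(x_test))
print(f"Linear R2: {linear_r2:.3f}, Random Forest R2: {forest_r2:.3f}")
```

A straight line cannot follow the U-shaped target, so its R² stays low, while the forest approximates the curve with piecewise-constant predictions.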

In [None]:
# Train Random Forest Model
print("🌳 Training Random Forest Model...")

# Initialize the model
rf_model = RandomForestRegressor(
    n_estimators=100,
    random_state=42,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2
)

# Train the model
rf_model.fit(X_train, y_train)

# Make predictions
rf_pred_train = rf_model.predict(X_train)
rf_pred_test = rf_model.predict(X_test)

# Calculate evaluation metrics
rf_mse_train = mean_squared_error(y_train, rf_pred_train)
rf_mse_test = mean_squared_error(y_test, rf_pred_test)
rf_r2_train = r2_score(y_train, rf_pred_train)
rf_r2_test = r2_score(y_test, rf_pred_test)

print("✅ Random Forest Model Results:")
print("=" * 50)
print(f"Training MSE: {rf_mse_train:.2f}")
print(f"Test MSE: {rf_mse_test:.2f}")
print(f"Training R² Score: {rf_r2_train:.4f}")
print(f"Test R² Score: {rf_r2_test:.4f}")
print(f"Training RMSE: {np.sqrt(rf_mse_train):.2f}")
print(f"Test RMSE: {np.sqrt(rf_mse_test):.2f}")

# Feature importance
feature_importance_rf = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

print(f"\n🔍 Feature Importance (Random Forest):")
print(feature_importance_rf)

# Check for overfitting
if rf_r2_train - rf_r2_test > 0.1:
    print("⚠️  Potential overfitting detected!")
else:
    print("✅ No significant overfitting detected.")

In [None]:
# Visualize Feature Importance
plt.figure(figsize=(10, 6))
sns.barplot(data=feature_importance_rf, x='Importance', y='Feature', palette='viridis')
plt.title('Feature Importance - Random Forest Model', fontsize=14, fontweight='bold')
plt.xlabel('Importance Score')
plt.ylabel('Features')
plt.tight_layout()
plt.show()

# Print top 5 most important features
print("🏆 Top 5 Most Important Features:")
print("=" * 40)
for i, (idx, row) in enumerate(feature_importance_rf.head().iterrows()):
    print(f"{i+1}. {row['Feature']}: {row['Importance']:.4f}")

## 8. Train and Evaluate XGBoost Model

Finally, let's try XGBoost, a gradient boosting library that often performs well on tabular data.

In [None]:
# Train XGBoost Model
print("🚀 Training XGBoost Model...")

# Initialize the model
xgb_model = xgb.XGBRegressor(
    n_estimators=100,
    random_state=42,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8
)

# Train the model
xgb_model.fit(X_train, y_train)

# Make predictions
xgb_pred_train = xgb_model.predict(X_train)
xgb_pred_test = xgb_model.predict(X_test)

# Calculate evaluation metrics
xgb_mse_train = mean_squared_error(y_train, xgb_pred_train)
xgb_mse_test = mean_squared_error(y_test, xgb_pred_test)
xgb_r2_train = r2_score(y_train, xgb_pred_train)
xgb_r2_test = r2_score(y_test, xgb_pred_test)

print("✅ XGBoost Model Results:")
print("=" * 50)
print(f"Training MSE: {xgb_mse_train:.2f}")
print(f"Test MSE: {xgb_mse_test:.2f}")
print(f"Training R² Score: {xgb_r2_train:.4f}")
print(f"Test R² Score: {xgb_r2_test:.4f}")
print(f"Training RMSE: {np.sqrt(xgb_mse_train):.2f}")
print(f"Test RMSE: {np.sqrt(xgb_mse_test):.2f}")

# Feature importance
feature_importance_xgb = pd.DataFrame({
    'Feature': X.columns,
    'Importance': xgb_model.feature_importances_
}).sort_values('Importance', ascending=False)

print(f"\n🔍 Feature Importance (XGBoost):")
print(feature_importance_xgb)

# Check for overfitting
if xgb_r2_train - xgb_r2_test > 0.1:
    print("⚠️  Potential overfitting detected!")
else:
    print("✅ No significant overfitting detected.")

## 9. Compare Model Performance and Visualize Predictions

Let's compare all three models and visualize their predictions against actual values.

In [None]:
# Compare all models
print("🏆 Model Performance Comparison:")
print("=" * 60)

# Create comparison dataframe
results = pd.DataFrame({
    'Model': ['Linear Regression', 'Random Forest', 'XGBoost'],
    'Train_MSE': [lr_mse_train, rf_mse_train, xgb_mse_train],
    'Test_MSE': [lr_mse_test, rf_mse_test, xgb_mse_test],
    'Train_R²': [lr_r2_train, rf_r2_train, xgb_r2_train],
    'Test_R²': [lr_r2_test, rf_r2_test, xgb_r2_test],
    'Train_RMSE': [np.sqrt(lr_mse_train), np.sqrt(rf_mse_train), np.sqrt(xgb_mse_train)],
    'Test_RMSE': [np.sqrt(lr_mse_test), np.sqrt(rf_mse_test), np.sqrt(xgb_mse_test)]
})

print(results.round(4))

# Find the best model based on test R² score
best_model_idx = results['Test_R²'].idxmax()
best_model_name = results.loc[best_model_idx, 'Model']
best_r2_score = results.loc[best_model_idx, 'Test_R²']

print(f"\n🥇 Best Model: {best_model_name}")
print(f"Best Test R² Score: {best_r2_score:.4f}")

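# Calculate relative improvement over the baseline
baseline_r2 = results.loc[0, 'Test_R²']  # Linear Regression as baseline
if best_model_idx != 0:
    if baseline_r2 > 0:
        improvement = ((best_r2_score - baseline_r2) / baseline_r2) * 100
        print(f"Relative improvement over Linear Regression: {improvement:.2f}%")
    else:
        # A zero or negative baseline R² makes a percentage improvement meaningless
        print(f"Absolute R² gain over Linear Regression: {best_r2_score - baseline_r2:.4f}")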

In [None]:
# Visualize predictions vs actual values
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
fig.suptitle('Model Predictions vs Actual Values', fontsize=16, fontweight='bold')

# Linear Regression
axes[0].scatter(y_test, lr_pred_test, alpha=0.6, color='blue')
axes[0].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
axes[0].set_xlabel('Actual Charges ($)')
axes[0].set_ylabel('Predicted Charges ($)')
axes[0].set_title(f'Linear Regression\n(R² = {lr_r2_test:.4f})')
axes[0].grid(True, alpha=0.3)

# Random Forest
axes[1].scatter(y_test, rf_pred_test, alpha=0.6, color='green')
axes[1].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
axes[1].set_xlabel('Actual Charges ($)')
axes[1].set_ylabel('Predicted Charges ($)')
axes[1].set_title(f'Random Forest\n(R² = {rf_r2_test:.4f})')
axes[1].grid(True, alpha=0.3)

# XGBoost
axes[2].scatter(y_test, xgb_pred_test, alpha=0.6, color='orange')
axes[2].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
axes[2].set_xlabel('Actual Charges ($)')
axes[2].set_ylabel('Predicted Charges ($)')
axes[2].set_title(f'XGBoost\n(R² = {xgb_r2_test:.4f})')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Create residual plots
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
fig.suptitle('Residual Plots (Actual - Predicted)', fontsize=16, fontweight='bold')

# Linear Regression residuals
lr_residuals = y_test - lr_pred_test
axes[0].scatter(lr_pred_test, lr_residuals, alpha=0.6, color='blue')
axes[0].axhline(y=0, color='red', linestyle='--')
axes[0].set_xlabel('Predicted Charges ($)')
axes[0].set_ylabel('Residuals ($)')
axes[0].set_title('Linear Regression Residuals')
axes[0].grid(True, alpha=0.3)

# Random Forest residuals
rf_residuals = y_test - rf_pred_test
axes[1].scatter(rf_pred_test, rf_residuals, alpha=0.6, color='green')
axes[1].axhline(y=0, color='red', linestyle='--')
axes[1].set_xlabel('Predicted Charges ($)')
axes[1].set_ylabel('Residuals ($)')
axes[1].set_title('Random Forest Residuals')
axes[1].grid(True, alpha=0.3)

# XGBoost residuals
xgb_residuals = y_test - xgb_pred_test
axes[2].scatter(xgb_pred_test, xgb_residuals, alpha=0.6, color='orange')
axes[2].axhline(y=0, color='red', linestyle='--')
axes[2].set_xlabel('Predicted Charges ($)')
axes[2].set_ylabel('Residuals ($)')
axes[2].set_title('XGBoost Residuals')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 10. Save the Best Model and Feature Columns

Let's save the best performing model and feature columns for future use and deployment.

In [None]:
# Save the best model and feature columns
import pickle
import os

# Determine the best model
models = {
    'Linear Regression': (lr_model, lr_r2_test),
    'Random Forest': (rf_model, rf_r2_test),
    'XGBoost': (xgb_model, xgb_r2_test)
}

best_model_name = max(models, key=lambda x: models[x][1])
best_model = models[best_model_name][0]
best_score = models[best_model_name][1]

print(f"💾 Saving the best model: {best_model_name}")
print(f"Best model R² score: {best_score:.4f}")

# Create models directory if it doesn't exist
os.makedirs('models', exist_ok=True)

# Save the best model
with open('models/best_insurance_model.pkl', 'wb') as f:
    pickle.dump(best_model, f)

# Save feature columns
with open('models/feature_columns.pkl', 'wb') as f:
    pickle.dump(X.columns.tolist(), f)

# Save model metadata
model_info = {
    'model_name': best_model_name,
    'r2_score': best_score,
    'mse_score': mean_squared_error(y_test, best_model.predict(X_test)),  # test MSE of the best model
    'feature_columns': X.columns.tolist(),
    'training_date': pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
}

with open('models/model_info.pkl', 'wb') as f:
    pickle.dump(model_info, f)

print("✅ Model saved successfully!")
print(f"📁 Files saved in 'models/' directory:")
print("  - best_insurance_model.pkl")
print("  - feature_columns.pkl")
print("  - model_info.pkl")

# Test loading the model
print("\n🔍 Testing model loading...")
try:
    with open('models/best_insurance_model.pkl', 'rb') as f:
        loaded_model = pickle.load(f)
    
    with open('models/feature_columns.pkl', 'rb') as f:
        loaded_features = pickle.load(f)
    
    # Test prediction
    sample_prediction = loaded_model.predict(X_test.iloc[:1])
    print(f"✅ Model loaded successfully!")
    print(f"Sample prediction: ${sample_prediction[0]:.2f}")
    
except Exception as e:
    print(f"❌ Error loading model: {e}")

## 🎉 Project Conclusion

### Summary of Results

This project trained and compared three machine learning models to predict insurance charges:

1. **Linear Regression** - Baseline model for comparison
2. **Random Forest** - Ensemble method capturing non-linear relationships
3. **XGBoost** - Gradient boosting algorithm

### Key Findings

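- **Best Model**: The model with the highest test R² in the Section 9 comparison is selected and saved; which model that is depends on the data and split used
- **Important Features**: Smoking status, BMI, and age were typically the most important predictors
- **Model Performance**: Compare test MSE, RMSE, and R² across the three models in Section 9; a training R² more than 0.1 above the test R² is flagged as potential overfitting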

### Next Steps

1. **Hyperparameter Tuning**: Use GridSearchCV or RandomizedSearchCV to optimize model parameters
2. **Feature Engineering**: Create additional features like BMI categories, age groups
3. **Cross-Validation**: Implement k-fold cross-validation for more robust evaluation
4. **Deployment**: Create a web application using Streamlit or Flask
5. **Monitoring**: Set up model monitoring in production
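
Steps 1 and 3 above could be sketched roughly as follows. The data here is a synthetic stand-in for the encoded features and charges, and the parameter grid is an arbitrary small example, not a recommended search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in for the encoded feature matrix and charges (made-up values)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] + X_demo[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# 5-fold cross-validation: one R² score per held-out fold
model = RandomForestRegressor(n_estimators=50, random_state=42)
cv_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2")
print(f"CV R2: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Small grid search; each candidate is itself scored with 5-fold cross-validation
param_grid = {"max_depth": [3, 6], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=42),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_demo, y_demo)
print("Best parameters:", search.best_params_)
```

In the notebook, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train`, keeping the test set untouched until the final evaluation.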

### Deployment Instructions

To deploy this model:
1. Use the saved model files in the `models/` directory
2. Create a web application using Streamlit (see the step-by-step guide)
3. Handle categorical encoding consistently with training data
4. Validate input data before making predictions
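
Steps 3 and 4 are where deployments most often go wrong: a single input row only contains one level per categorical column, so encoding it on its own produces different columns than training did. One possible approach, sketched below, is to validate the record and then reindex the encoded row to the saved feature list. The function name `prepare_input`, the `ALLOWED` table, and the hard-coded column list are assumptions for illustration; in practice the list would be loaded from `models/feature_columns.pkl`.

```python
import pandas as pd

# Column order assumed to match what training with drop_first=True would save
# in models/feature_columns.pkl; hard-coded here so the sketch runs on its own.
feature_columns = [
    "age", "bmi", "children", "sex_male", "smoker_yes",
    "region_northwest", "region_southeast", "region_southwest",
]

# Allowed categorical values, taken from the dataset description
ALLOWED = {
    "sex": {"male", "female"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "northwest", "southeast", "southwest"},
}

def prepare_input(record):
    """Validate one client record and encode it into the training column layout."""
    for column, allowed in ALLOWED.items():
        if record[column] not in allowed:
            raise ValueError(f"Unexpected value for {column}: {record[column]!r}")
    if record["age"] < 0 or record["bmi"] <= 0 or record["children"] < 0:
        raise ValueError("age and children must be non-negative and bmi must be positive")

    row = pd.get_dummies(pd.DataFrame([record]), columns=list(ALLOWED))
    # Add dummy columns missing from this single row as 0, drop reference levels,
    # and restore the training column order.
    return row.reindex(columns=feature_columns, fill_value=0).astype(float)

client = {"age": 40, "sex": "female", "bmi": 27.5, "children": 2,
          "smoker": "yes", "region": "southeast"}
encoded_row = prepare_input(client)
print(encoded_row.iloc[0].to_dict())
```

The resulting one-row frame can then be passed to the loaded model's `predict` method, since its columns match the layout used during training.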

**Great job completing this insurance charges prediction project! 🚀**

## 🚀 Streamlit Deployment

Now that we have trained and saved our model, let's deploy it using Streamlit for interactive predictions!

In [None]:
# Streamlit Deployment Instructions
print("🚀 STREAMLIT DEPLOYMENT READY!")
print("=" * 50)

# Check if model files exist
import os
import pickle  # needed below to read model_info.pkl if this cell runs on its own
model_files = ['models/best_insurance_model.pkl', 'models/feature_columns.pkl', 'models/model_info.pkl']
all_files_exist = all(os.path.exists(file) for file in model_files)

if all_files_exist:
    print("✅ All model files are ready for deployment!")
    print("\n📁 Required files found:")
    for file in model_files:
        size = os.path.getsize(file) / 1024  # Size in KB
        print(f"  • {file} ({size:.1f} KB)")
    
    print(f"\n🎯 To deploy your model with Streamlit:")
    print("1. Install Streamlit: pip install streamlit")
    print("2. Run the app: streamlit run streamlit_app.py")
    print("3. Or use the batch file: run_app.bat")
    print("\n🌐 Your app will open at: http://localhost:8501")
    
    print(f"\n📊 Model Information:")
    try:
        with open('models/model_info.pkl', 'rb') as f:
            model_info = pickle.load(f)
        print(f"  • Best Model: {model_info['model_name']}")
        print(f"  • R² Score: {model_info['r2_score']:.4f}")
        print(f"  • Training Date: {model_info['training_date']}")
    except Exception as e:
        print(f"  • Could not read model info: {e}")
    
    print(f"\n🎨 Streamlit App Features:")
    print("  • Interactive input forms")
    print("  • Real-time predictions")
    print("  • Risk assessment")
    print("  • Feature importance charts")
    print("  • Modern, responsive design")
    
else:
    print("❌ Some model files are missing!")
    print("Please make sure all previous cells have been executed successfully.")
    
print(f"\n📝 Next Steps:")
print("1. Open terminal/command prompt")
print("2. Navigate to this directory")
print("3. Run: streamlit run streamlit_app.py")
print("4. Enjoy your deployed ML model! 🎉")