# SimpleMLR Tutorial: Machine Learning Made Simple

Welcome to SimpleMLR! This tutorial shows how to use three gradient boosting implementations (XGBoost, LightGBM, and scikit-learn's GBM) through one simple, consistent interface.

## What You'll Learn:
- 🚀 One-line model training and evaluation
- 🎯 Automatic hyperparameter optimization with smart strategies
- 📊 Beautiful visualizations to understand your model's performance
- ⚡ GPU acceleration when available
- 🔧 How to compare different algorithms easily

We'll use the **Tips dataset** from Seaborn to predict tip amounts from the bill total and other details of each visit (day, time, party size, and so on).

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('default')
sns.set_palette("husl")

print("📚 Libraries loaded successfully!")

## 1. Load and Explore the Data

The **Tips dataset** has one row per restaurant visit, with the bill total, the tip, and a few details about the party. Our goal is to predict the tip amount from the other columns.

In [None]:
# Load the tips dataset from seaborn
tips = sns.load_dataset('tips')

print("🍽️ Tips Dataset Overview:")
print(f"Shape: {tips.shape}")
print(f"\nColumns: {list(tips.columns)}")
print(f"\nFirst few rows:")
tips.head()

In [None]:
# Explore the data
print("📊 Dataset Statistics:")
print(tips.describe())

print("\n🔍 Data Types:")
print(tips.dtypes)

print("\n❓ Missing Values:")
print(tips.isnull().sum())

In [None]:
# Visualize the target variable and key relationships
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Tip distribution
axes[0,0].hist(tips['tip'], bins=20, alpha=0.7, color='skyblue')
axes[0,0].set_title('Distribution of Tips')
axes[0,0].set_xlabel('Tip Amount ($)')
axes[0,0].set_ylabel('Frequency')

# Tip vs Total Bill
axes[0,1].scatter(tips['total_bill'], tips['tip'], alpha=0.6, color='coral')
axes[0,1].set_title('Tips vs Total Bill')
axes[0,1].set_xlabel('Total Bill ($)')
axes[0,1].set_ylabel('Tip ($)')

# Tips by day
sns.boxplot(data=tips, x='day', y='tip', ax=axes[1,0])
axes[1,0].set_title('Tips by Day of Week')
axes[1,0].tick_params(axis='x', rotation=45)

# Tips by party size
tips_by_size = tips.groupby('size')['tip'].mean()
axes[1,1].bar(tips_by_size.index, tips_by_size.values, color='lightgreen')
axes[1,1].set_title('Average Tip by Party Size')
axes[1,1].set_xlabel('Party Size')
axes[1,1].set_ylabel('Average Tip ($)')

plt.tight_layout()
plt.show()

## 2. Prepare the Data

Let's split our data into features (X) and target (y), then create train/test splits.

In [None]:
# Prepare features and target
# We'll predict tip amount based on all other features
X = tips.drop('tip', axis=1)  # Features: everything except tip
y = tips['tip']               # Target: tip amount

print("🎯 Target variable (what we're predicting):")
print(f"Tip amount - Mean: ${y.mean():.2f}, Std: ${y.std():.2f}")

print("\n🔧 Features (what we're using to predict):")
print(f"Features: {list(X.columns)}")
print(f"Shape: {X.shape}")

# Create train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True
)

print(f"\n📊 Data Split:")
print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
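
To see what `test_size=0.2` and `random_state=42` do in practice, here is a small sketch on synthetic data with the same number of rows as the tips dataset (244). `X_demo` and `y_demo` are hypothetical stand-ins, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 244 rows, like the tips dataset
X_demo = np.arange(244).reshape(-1, 1)
y_demo = np.arange(244)

# Same settings as the tutorial cell above
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(len(X_tr), len(X_te))  # 195 49: the test size is rounded up

# Re-running with the same random_state reproduces the split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True
```

Fixing `random_state` matters when you compare models later: every model is then scored on exactly the same held-out rows.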

## 3. Import SimpleMLR and Start Modeling!

Now comes the fun part - let's import SimpleMLR and see how easy it is to build powerful machine learning models!

In [None]:
# Import SimpleMLR - all three algorithms with the same interface!
try:
    from simple_mlr import (
        # Basic regressors - simple fit/predict interface
        XGBRegressor, LGBMRegressor, GBMRegressor,
        
        # Auto-tuners - automatically find best parameters
        xgb_auto, lgbm_auto, gbm_auto,
        
        # Convenience functions
        xgb_regressor, lgbm_regressor, gbm_regressor
    )
    print("🚀 SimpleMLR imported successfully!")
    print("Ready to build amazing models with just a few lines of code!")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Make sure SimpleMLR is installed and in your Python path.")
    print("You can install it with: pip install -e .")
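
If the import fails, it can help to check which packages Python can actually find before reinstalling anything. The sketch below uses only the standard library. `simple_mlr` is the module name from the import cell above. The other names are an assumption about what SimpleMLR wraps, based on the algorithms this tutorial lists.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the Python path."""
    return importlib.util.find_spec(module_name) is not None


# 'simple_mlr' comes from the import cell; the rest are assumed dependencies
for name in ["simple_mlr", "xgboost", "lightgbm", "sklearn"]:
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```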

## 4. Method 1: Quick One-Line Models

The fastest way to build models with SimpleMLR - just one line per algorithm!

In [None]:
print("🏃‍♂️ Building Quick Models (One line each!)")
print("=" * 50)

# One-line model building - SimpleMLR handles everything!
print("\n1️⃣ XGBoost Model:")
xgb_model = xgb_regressor(X_train, y_train)

print("\n2️⃣ LightGBM Model:")
lgbm_model = lgbm_regressor(X_train, y_train)

print("\n3️⃣ Sklearn GBM Model:")
gbm_model = gbm_regressor(X_train, y_train)

print("\n✅ All models trained successfully!")
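
To get a feel for what a one-line helper has to do behind the scenes, here is a minimal sketch built from scikit-learn only. It is not SimpleMLR's implementation, and `quick_gbm` is a hypothetical name. It shows one common way to encode text columns and fit a boosted model in a single call.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def quick_gbm(X: pd.DataFrame, y, **params):
    """Hypothetical helper: encode non-numeric columns, then fit a GBM."""
    encoder = make_column_transformer(
        # One-hot encode every non-numeric column (e.g. 'day', 'smoker')
        (OneHotEncoder(handle_unknown="ignore"),
         make_column_selector(dtype_exclude="number")),
        remainder="passthrough",  # numeric columns pass through unchanged
    )
    model = make_pipeline(encoder, GradientBoostingRegressor(**params))
    return model.fit(X, y)


# Tiny made-up dataset in the shape of the tips data
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, size=60),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=60),
})
y_demo = 0.15 * X_demo["total_bill"] + rng.normal(0, 0.5, size=60)

model = quick_gbm(X_demo, y_demo, random_state=0)
print(model.predict(X_demo.head(3)))
```

Keeping the encoder inside the pipeline means the same preprocessing is applied automatically at prediction time.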

## 5. Instant Model Analysis with quick_graph()

SimpleMLR's `quick_graph()` method gives you instant insights into your model's performance!

In [None]:
print("📊 XGBoost Performance Analysis:")
xgb_model.quick_graph(X_test, y_test)

In [None]:
print("📊 LightGBM Performance Analysis:")
lgbm_model.quick_graph(X_test, y_test)

In [None]:
print("📊 Sklearn GBM Performance Analysis:")
gbm_model.quick_graph(X_test, y_test)

## 6. Method 2: Class-Based Approach with Custom Settings

For more control, you can use the class-based approach and customize parameters:

In [None]:
print("🎛️ Building Models with Custom Settings")
print("=" * 40)

# Create models with custom parameters
print("\n🚀 XGBoost with custom settings:")
xgb_custom = XGBRegressor(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    random_state=42
)
xgb_custom.fit(X_train, y_train)

print("\n⚡ LightGBM with custom settings:")
lgbm_custom = LGBMRegressor(
    num_boost_round=200,
    num_leaves=31,
    learning_rate=0.1,
    random_state=42
)
lgbm_custom.fit(X_train, y_train)

print("\n🛡️ Sklearn GBM with custom settings:")
gbm_custom = GBMRegressor(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    random_state=42
)
gbm_custom.fit(X_train, y_train)

print("\n✅ Custom models trained successfully!")
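
The settings above (`n_estimators`, `max_depth`, `learning_rate`) interact: a lower learning rate usually needs more trees. The sketch below illustrates this with scikit-learn's own `GradientBoostingRegressor` on made-up data, tracking test error as trees are added. It assumes nothing about how SimpleMLR's wrappers behave internally.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (made up for illustration)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(300, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(0, 0.3, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

gbm = GradientBoostingRegressor(
    n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
test_rmse = [
    np.sqrt(mean_squared_error(y_te, y_stage))
    for y_stage in gbm.staged_predict(X_te)
]
best_n = int(np.argmin(test_rmse)) + 1
print(f"Lowest test RMSE {min(test_rmse):.3f} at {best_n} of {len(test_rmse)} trees")
```

If the lowest error comes well before the last tree, the model has more trees than it needs for that learning rate.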

## 7. Method 3: Automatic Hyperparameter Optimization 🎯

This is where SimpleMLR really shines: it can search for good hyperparameters automatically, using one of several optimization strategies.

In [None]:
print("🎯 Automatic Hyperparameter Optimization")
print("=" * 45)
print("This will find the best parameters automatically!")
print("\n🚀 XGBoost Auto-Optimization (Fast Strategy):")

# Auto-tuned XGBoost with 'fast' strategy for quick results
xgb_optimized = xgb_auto(
    X_train, y_train,
    strategy='fast',        # Quick optimization
    n_trials=25,           # Number of parameter combinations to try
    verbose=1              # Show progress
)

print("\n⚡ XGBoost optimization completed!")

In [None]:
print("⚡ LightGBM Auto-Optimization (Fast Strategy):")

# Auto-tuned LightGBM
lgbm_optimized = lgbm_auto(
    X_train, y_train,
    strategy='fast',
    n_trials=25,
    verbose=1
)

print("\n⚡ LightGBM optimization completed!")

In [None]:
print("🛡️ Sklearn GBM Auto-Optimization (Stable Strategy):")

# Auto-tuned GBM with 'stable' strategy for reliability
gbm_optimized = gbm_auto(
    X_train, y_train,
    strategy='stable',      # Stable, reliable parameters
    n_trials=25,
    verbose=1
)

print("\n🛡️ Sklearn GBM optimization completed!")
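
This tutorial does not say which search method the auto-tuners use, so the following is only a sketch of the general idea: try a fixed number of parameter combinations and keep the one with the best cross-validated score. It uses scikit-learn's `RandomizedSearchCV`, and the candidate values are illustrative, not SimpleMLR's ranges.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Made-up regression data
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(120, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(0, 0.5, 120)

# Candidate values to sample from (illustrative only)
param_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.03, 0.1, 0.3],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_space,
    n_iter=5,   # plays the role of n_trials
    cv=3,       # 3-fold cross-validation per candidate
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best CV RMSE: {-search.best_score_:.3f}")
```

More trials cover more of the search space but cost proportionally more fits, which is the trade-off behind `n_trials`.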

## 8. Comprehensive Model Analysis

Let's use SimpleMLR's advanced `plot_analysis()` method for detailed model insights:

In [None]:
print("📈 Comprehensive XGBoost Analysis:")
_, xgb_metrics = xgb_optimized.plot_analysis(
    X_test, y_test,
    title="XGBoost Optimized Model - Tips Prediction",
    style='modern'
)

print(f"\n📊 XGBoost Detailed Metrics:")
for metric, value in xgb_metrics.items():
    # np.floating also covers NumPy scalars such as float32, which are not Python floats
    if isinstance(value, (int, float, np.floating)):
        print(f"  {metric}: {value:.4f}")
    else:
        print(f"  {metric}: {value}")

In [None]:
print("📈 Comprehensive LightGBM Analysis:")
_, lgbm_metrics = lgbm_optimized.plot_analysis(
    X_test, y_test,
    title="LightGBM Optimized Model - Tips Prediction",
    style='modern'
)

In [None]:
print("📈 Comprehensive Sklearn GBM Analysis:")
_, gbm_metrics = gbm_optimized.plot_analysis(
    X_test, y_test,
    title="Sklearn GBM Optimized Model - Tips Prediction",
    style='modern'
)

## 9. Model Comparison and Performance Summary

Let's compare all our models to see which performs best:

In [None]:
# Get predictions from all optimized models
models = {
    'XGBoost (Optimized)': xgb_optimized,
    'LightGBM (Optimized)': lgbm_optimized,
    'Sklearn GBM (Optimized)': gbm_optimized
}

# Import the metric functions once, before the loop
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

results = []

for name, model in models.items():
    # Get predictions
    y_pred = model.predict(X_test)
    
    # Calculate metrics
    r2 = r2_score(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae = mean_absolute_error(y_test, y_pred)
    
    results.append({
        'Model': name,
        'R² Score': r2,
        'RMSE': rmse,
        'MAE': mae
    })

# Create comparison DataFrame
comparison_df = pd.DataFrame(results)
comparison_df = comparison_df.round(4)

print("🏆 Model Performance Comparison:")
print("=" * 40)
print(comparison_df.to_string(index=False))

# Find best model
best_model_idx = comparison_df['R² Score'].idxmax()
best_model_name = comparison_df.loc[best_model_idx, 'Model']
best_r2 = comparison_df.loc[best_model_idx, 'R² Score']

print(f"\n🥇 Best Model: {best_model_name}")
print(f"   R² Score: {best_r2:.4f}")
print(f"   (Higher R² is better - closer to 1.0 means better predictions)")
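
For reference, the three metrics in the table can be computed by hand. This sketch uses four made-up values so the arithmetic is easy to follow. The `_demo` names are chosen so they do not overwrite the notebook's variables.

```python
import numpy as np

y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_hat_demo = np.array([1.5, 2.0, 2.5, 4.0])

errors = y_true_demo - y_hat_demo
mae_demo = np.mean(np.abs(errors))              # mean absolute error
rmse_demo = np.sqrt(np.mean(errors ** 2))       # root mean squared error
ss_res = np.sum(errors ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)  # total sum of squares
r2_demo = 1 - ss_res / ss_tot

print(f"MAE={mae_demo:.4f}  RMSE={rmse_demo:.4f}  R²={r2_demo:.4f}")
# MAE=0.2500  RMSE=0.3536  R²=0.9000
```

MAE and RMSE are in the same units as the target (dollars here). R² is unitless and can be negative when a model does worse than always predicting the mean.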

## 10. Feature Importance Analysis

Let's see which features are most important for predicting tips:

In [None]:
# Get feature importance from the best model
best_model = models[best_model_name]

if hasattr(best_model, 'get_feature_importance'):
    print(f"🔍 Feature Importance Analysis - {best_model_name}:")
    print("=" * 50)
    
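    feature_importance = best_model.get_feature_importance()
    # Sort so the most important feature comes first (the code below relies on this order)
    feature_importance = feature_importance.sort_values('importance', ascending=False)
    print(feature_importance)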
    
    # Create a feature importance plot
    plt.figure(figsize=(10, 6))
    plt.barh(feature_importance['feature'], feature_importance['importance'], color='skyblue')
    plt.xlabel('Importance Score')
    plt.title(f'Feature Importance - {best_model_name}')
    plt.gca().invert_yaxis()
    
    # Add value labels on bars
    for i, v in enumerate(feature_importance['importance']):
        plt.text(v + 0.01, i, f'{v:.3f}', va='center')
    
    plt.tight_layout()
    plt.show()
    
    print("\n💡 Insights:")
    top_feature = feature_importance.iloc[0]['feature']
    print(f"   Most important feature: {top_feature}")
    print(f"   The model relied on {top_feature} more than any other feature when predicting tips.")
else:
    print("Feature importance not available for this model.")
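
Built-in importance scores from tree models reflect how the model used each feature during training. A common, model-agnostic cross-check is permutation importance: shuffle one column of held-out data and measure how much the score drops. The sketch below uses scikit-learn's `permutation_importance` on made-up data where only the first column carries signal. Whether SimpleMLR models plug into this function directly is not something this tutorial confirms.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 2))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(0, 0.5, 300)  # only column 0 matters
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

gbm = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle each column 10 times on held-out data and record the score drop
result = permutation_importance(gbm, X_te, y_te, n_repeats=10, random_state=0)
for name, mean, std in zip(
    ["signal", "noise"], result.importances_mean, result.importances_std
):
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```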

## 11. Making Predictions on New Data

Let's see how to use our trained model to make predictions on new restaurant visits:

In [None]:
# Create some example new data points
new_data = pd.DataFrame({
    'total_bill': [25.50, 45.20, 15.75, 62.30],
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'smoker': ['No', 'Yes', 'No', 'No'],
    'day': ['Sat', 'Sun', 'Fri', 'Sat'],
    'time': ['Dinner', 'Dinner', 'Lunch', 'Dinner'],
    'size': [2, 4, 1, 3]
})

print("🍽️ New Restaurant Visits to Predict:")
print(new_data)

# Make predictions
predictions = best_model.predict(new_data)

print(f"\n💰 Predicted Tips using {best_model_name}:")
print("=" * 40)
for i, (_, row) in enumerate(new_data.iterrows()):
    tip_pred = predictions[i]
    tip_percent = (tip_pred / row['total_bill']) * 100
    print(f"Visit {i+1}: Bill=${row['total_bill']:.2f} → Predicted Tip=${tip_pred:.2f} ({tip_percent:.1f}%)")

print(f"\n📊 Summary:")
print(f"Average predicted tip: ${np.mean(predictions):.2f}")
print(f"Range: ${np.min(predictions):.2f} - ${np.max(predictions):.2f}")
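
One thing to watch with hand-built rows: in the tips dataset, `sex`, `smoker`, `day`, and `time` are pandas categorical columns, while a freshly typed DataFrame holds plain strings. This tutorial does not say whether SimpleMLR converts column types itself, so the sketch below shows one defensive option, casting new rows to the training dtypes. `train_demo` and `new_demo` are small hypothetical stand-ins for `X_train` and `new_data`.

```python
import pandas as pd

# Stand-in for the training features: a categorical column with fixed categories
train_demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "day": pd.Categorical(["Sun", "Sat", "Thur"],
                          categories=["Thur", "Fri", "Sat", "Sun"]),
})

# New rows typed by hand arrive as plain strings
new_demo = pd.DataFrame({"total_bill": [25.50], "day": ["Fri"]})

# Cast each column to the dtype it had at training time
aligned = new_demo.astype(train_demo.dtypes.to_dict())
print(aligned.dtypes)

# Values outside the known categories become NaN, so check for them
print(aligned.isna().sum())
```

With the notebook's own variables, the equivalent call would be `new_data.astype(X_train.dtypes.to_dict())`.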

## 12. Advanced Features and Tips

Here are some advanced SimpleMLR features you can explore:

In [None]:
print("🚀 Advanced SimpleMLR Features:")
print("=" * 35)

print("\n1️⃣ GPU Acceleration (if available):")
print("   xgb_auto(X, y, use_gpu=True)")
print("   lgbm_auto(X, y, use_gpu=True)")

print("\n2️⃣ Different Optimization Strategies:")
print("   - 'fast': Quick optimization for experimentation")
print("   - 'stable': Reliable settings for production")
print("   - 'aggressive': Maximum performance, more exploration")
print("   - 'balanced': Good all-around choice")

print("\n3️⃣ Parameter Override Examples:")
print("   xgb_auto(X, y, strategy='fast', override_params={")
print("       'max_depth': 6,              # Fix max_depth to 6")
print("       'learning_rate': (0.05, 0.2) # Custom range")
print("   })")

print("\n4️⃣ Validation Split (faster than cross-validation):")
print("   xgb_auto(X, y, validation_split=0.2)")

print("\n5️⃣ More Trials for Better Results:")
print("   xgb_auto(X, y, n_trials=100)  # More exploration")

print("\n6️⃣ Save Plots:")
print("   model.plot_analysis(X, y, save_path='analysis.png')")
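
The note that a validation split is faster than cross-validation comes down to how many times the model is fitted. The sketch below makes that concrete with scikit-learn on made-up data: one fit for a holdout split, five for 5-fold cross-validation. It illustrates the trade-off only and is not how SimpleMLR implements `validation_split`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(0, 0.5, 200)

model = GradientBoostingRegressor(n_estimators=50, random_state=1)

# Holdout: a single fit, scored on 20% of the rows
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)
holdout_r2 = model.fit(X_tr, y_tr).score(X_val, y_val)

# 5-fold cross-validation: five fits, each scored on a different fold
cv_r2 = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2")

print(f"Holdout R²: {holdout_r2:.3f} (1 fit)")
print(f"CV R²: {cv_r2.mean():.3f} ± {cv_r2.std():.3f} ({len(cv_r2)} fits)")
```

The holdout estimate is cheaper but depends on a single split. Cross-validation costs more and gives a spread across folds.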

## 🎉 Congratulations!

You've completed the SimpleMLR tutorial! Here's what you've learned:

### ✅ What You Accomplished:
1. **Data Loading & Exploration** - Used Seaborn's tips dataset
2. **Quick Model Building** - One-line models with three algorithms
3. **Automatic Optimization** - Found best parameters automatically
4. **Beautiful Visualizations** - Comprehensive model analysis
5. **Model Comparison** - Compared XGBoost, LightGBM, and sklearn GBM
6. **Feature Importance** - Understood which features matter most
7. **Real Predictions** - Made predictions on new data

### 🚀 Next Steps:
- Try SimpleMLR on your own datasets
- Experiment with different strategies (`'aggressive'`, `'balanced'`)
- Use GPU acceleration for larger datasets
- Explore parameter overrides for fine-tuning
- Save your best models for production use
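
For the last step, a common way to save a fitted scikit-learn-style model is `joblib`. The sketch below uses a plain scikit-learn model and a hypothetical file name. Whether SimpleMLR's wrapper objects serialize cleanly this way is an assumption you should verify for your own models.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = X_demo[:, 0] + rng.normal(0, 0.3, 100)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_model.joblib")  # hypothetical file name
    joblib.dump(model, path)      # serialize the fitted model to disk
    restored = joblib.load(path)  # load it back, e.g. in another process

print(np.allclose(model.predict(X_demo), restored.predict(X_demo)))  # True
```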

### 📚 Key Takeaways:
- **SimpleMLR makes machine learning accessible** - complex algorithms with simple interfaces
- **Consistent API across algorithms** - learn once, use everywhere
- **Automatic optimization saves time** - no need to manually tune parameters
- **Beautiful visualizations** - understand your models instantly
- **Production ready** - from prototype to deployment seamlessly

Happy modeling! 🎯✨

In [None]:
# Final summary
print("🎯 SimpleMLR Tutorial Complete! 🎯")
print("=" * 35)
print(f"📊 Dataset: {tips.shape[0]} restaurant visits analyzed")
print(f"🤖 Models trained: 3 different algorithms")
print(f"🏆 Best model: {best_model_name} (R² = {best_r2:.4f})")
print(f"⚡ Total predictions made: {len(predictions)} new visits")
print("")
print("🚀 Ready to use SimpleMLR on your own data!")
print("📖 Check the documentation for more advanced features")
print("💡 Happy machine learning!")