## Evaluating Regression Model Performance

## 📘 Introduction
This notebook evaluates several regression models with common metrics (MAE, MSE, RMSE, and R2) and offers guidance on choosing a model for a given dataset and problem.

### Models already covered:
 - Simple Linear Regression
 - Multiple Linear Regression with OLS Backward Elimination
 - Polynomial Regression
 - Support Vector Regression (SVR)
 - Decision Tree Regression
 - Random Forest Regression
 - XGBoost Regression

In [None]:
# 📊 Step 1: Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

In [None]:
# 📁 Step 2: Load the Dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # the 1:2 slice keeps X two-dimensional: (n_samples, 1)
y = dataset.iloc[:, 2].values.reshape(-1, 1)  # column vector: (n_samples, 1)
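
The `1:2` slice in Step 2 matters because scikit-learn estimators expect a two-dimensional feature matrix. Here is a minimal sketch of the difference, using a small made-up DataFrame. The column names and values are hypothetical and are not taken from `Position_Salaries.csv`.

```python
import pandas as pd

# Hypothetical stand-in data, used only to illustrate array shapes.
demo = pd.DataFrame({
    "label": ["a", "b", "c"],
    "feature": [1, 2, 3],
    "target": [10.0, 20.0, 30.0],
})

X_2d = demo.iloc[:, 1:2].values                 # slice: result stays 2-D
X_1d = demo.iloc[:, 1].values                   # single index: collapses to 1-D
y_col = demo.iloc[:, 2].values.reshape(-1, 1)   # explicit column vector

print(X_2d.shape)   # (3, 1)
print(X_1d.shape)   # (3,)
print(y_col.shape)  # (3, 1)
```

Passing the 1-D version as the feature matrix to `fit` would normally raise an error asking for a 2-D array, which is why the slice form is used.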

In [None]:
# Containers for fitted models and for each model's evaluation metrics (keyed by model name)
models = {}
results = {}

In [None]:
# 📐 Step 3: Define Evaluation Function
def evaluate_model(name, y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R2 for one model and store them in `results`."""
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)  # same units as the target, unlike MSE
    r2 = r2_score(y_true, y_pred)
    results[name] = {'MAE': mae, 'MSE': mse, 'RMSE': rmse, 'R2': r2}
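
To make the four metrics concrete, the sketch below recomputes them in plain Python from their definitions. The helper name `regression_metrics` and the toy numbers are illustrative assumptions, not part of the notebook's pipeline. In practice, the scikit-learn functions used in `evaluate_model` are the ones to rely on.

```python
import math

def regression_metrics(y_true, y_pred):
    """Illustrative re-implementation of MAE, MSE, RMSE, and R2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # back in the target's units

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                            # share of variance explained

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy values chosen only so the arithmetic is easy to follow by hand.
toy_true = [3.0, 5.0, 7.0, 9.0]
toy_pred = [2.0, 5.0, 8.0, 11.0]
toy_metrics = regression_metrics(toy_true, toy_pred)
print(toy_metrics)
```

For these toy values the errors are 1, 0, -1, and -2, so MAE is 1.0 and MSE is 1.5. R2 is 1 - 6/20, roughly 0.7.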

In [None]:
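# Note: each model below is fitted and scored on the same rows, so the metrics
# describe training-set fit rather than performance on unseen data.

# 1️⃣ Simple Linear Regression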
lin_reg = LinearRegression()
lin_reg.fit(X, y)
y_pred = lin_reg.predict(X)
evaluate_model("Linear Regression", y, y_pred)

# 2️⃣ Polynomial Regression (Degree 4)
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly, y)
y_pred = lin_reg2.predict(X_poly)
evaluate_model("Polynomial Regression", y, y_pred)

# 3️⃣ Support Vector Regression (SVR)
sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y).flatten()
svr = SVR(kernel='rbf')
svr.fit(X_scaled, y_scaled)
y_pred = sc_y.inverse_transform(svr.predict(X_scaled).reshape(-1, 1))
evaluate_model("SVR", y, y_pred)

# 4️⃣ Decision Tree Regression
# An unpruned tree can reproduce its training data exactly, so its score here is optimistic.
dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(X, y)
y_pred = dtr.predict(X).reshape(-1, 1)
evaluate_model("Decision Tree", y, y_pred)

# 5️⃣ Random Forest Regression
rfr = RandomForestRegressor(n_estimators=300, random_state=0)
rfr.fit(X, y.ravel())
y_pred = rfr.predict(X).reshape(-1, 1)
evaluate_model("Random Forest", y, y_pred)

# 6️⃣ XGBoost Regression
xgb = XGBRegressor(n_estimators=300, learning_rate=0.1, random_state=0)
xgb.fit(X, y)
y_pred = xgb.predict(X).reshape(-1, 1)
evaluate_model("XGBoost", y, y_pred)
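
Because every model above is scored on the rows it was trained on, the metrics describe fit rather than generalization. A hedged sketch of a hold-out evaluation follows. It uses synthetic data (an assumption made so the example is self-contained) and a Decision Tree, and all variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy sine curve (not the notebook's dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.3, size=200)

# Hold out 25% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_r2 = r2_score(y_train, tree.predict(X_train))  # scored on seen rows
test_r2 = r2_score(y_test, tree.predict(X_test))     # scored on unseen rows
print(f"train R2: {train_r2:.3f}, test R2: {test_r2:.3f}")
```

The gap between the two scores is a rough indicator of overfitting. On a very small dataset a single hold-out split is noisy, which is one reason to use cross-validation instead.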

In [None]:
# 📈 Step 4: Display the Evaluation Results
results_df = pd.DataFrame(results).T
print("\nEvaluation Metrics Summary:")
print(results_df)

In [None]:
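# 📊 Step 5: Visual Comparison
# MSE is on a squared scale and R2 is a score rather than an error, so each metric
# gets its own panel instead of sharing one y-axis.
results_df.plot(kind='bar', subplots=True, layout=(2, 2), figsize=(12, 8),
                sharex=True, legend=False, rot=45, grid=True,
                title=list(results_df.columns))
plt.suptitle("Model Evaluation Comparison")
plt.tight_layout()
plt.show()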

In [None]:
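# 🎯 Step 6: k-Fold Cross Validation Example (on Random Forest)
# R2 is undefined on a test fold with fewer than two samples, so cap the number
# of folds so that every test fold holds at least two rows.
n_splits = min(10, len(X) // 2)
scores = cross_val_score(estimator=rfr, X=X, y=y.ravel(), cv=n_splits, scoring='r2')
print(f"\nRandom Forest {n_splits}-Fold CV R2 Mean: {scores.mean():.4f}, Std: {scores.std():.4f}")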
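
When choosing `cv`, it helps to check how many rows land in each test fold, since R2 cannot be computed from a single observation (its denominator, the total sum of squares, is zero). The sketch below mirrors how an unshuffled k-fold split distributes rows. `kfold_test_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def kfold_test_sizes(n_samples, n_splits):
    """Return the number of test rows in each fold of a k-fold split."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional row.
    return [base + 1 if i < extra else base for i in range(n_splits)]

print(kfold_test_sizes(10, 10))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(kfold_test_sizes(10, 5))   # [2, 2, 2, 2, 2]
print(kfold_test_sizes(10, 3))   # [4, 3, 3]
```

With as many folds as rows, every test fold holds one sample. In that case an error-based score such as `neg_mean_absolute_error` is a safer choice than `r2`.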


In [None]:
# 📘 Model Selection Guide:
# - Use Simple/Multiple Linear Regression when the target varies roughly linearly with the features.
# - Use Polynomial Regression for smooth, curved relationships.
# - Use SVR for non-linear relationships when you want built-in regularization; scale the data first.
# - Use Decision Tree Regression when the target changes abruptly or follows threshold-like splits.
# - Use Random Forest to reduce the overfitting of a single Decision Tree and generalize better.
# - Use XGBoost for high-performance, scalable predictions, especially in competitions or on large datasets.

# 🔧 Future Steps: Hyperparameter Tuning
# Covered in Part 10 - Model Selection (GridSearchCV, RandomizedSearchCV)

# 📌 Note: Consider model interpretability, computational cost, and overfitting risk when choosing.
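
The guide above gives rules of thumb; cross-validated error is a more direct way to compare candidates on a particular dataset. The following is a minimal sketch on synthetic data (an assumption made for self-containment), with an illustrative `candidates` dictionary. It is not a tuned or definitive comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a linear trend plus noise (not the notebook's dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(120, 1))
y_demo = 2.0 * X_demo.ravel() + rng.normal(scale=1.0, size=120)

# Illustrative candidate set with default-like settings.
candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Shuffled folds, reused for every model so the comparison is like-for-like.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_rmse = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(
        model, X_demo, y_demo, cv=cv, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -fold_scores.mean()  # flip the sign back to a positive RMSE

# Lower cross-validated RMSE is better.
for name, rmse in sorted(cv_rmse.items(), key=lambda item: item[1]):
    print(f"{name}: CV RMSE = {rmse:.3f}")
```

Since this synthetic target is linear by construction, the linear model would be expected to rank well here. On real data the ordering depends on the dataset, which is the point of measuring it.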