1. Problem identification 

2. Data wrangling

3. Exploratory data analysis

4. Pre-processing and training data development

5. **Modeling (Machine learning steps)**

6. Documentation

<div class="span5 alert alert-warning">
<h3>Predictive Models</h3>

- Regressors are used to predict continuous numerical targets.

### <font color='mediumorchid'><b>Linear Regression</b></font> 
**Supervised Learning**

Models the relationship between a dependent variable (target) and one or more independent variables (features) using a straight-line equation. The goal is to find the best-fit line that minimizes the difference between predicted and actual values.

**Math Behind It** 

1️⃣ Calculate the Means (average) of 𝑋 and 𝑌

The mean of the X values is $\bar{X}$ and the mean of the Y values is $\bar{Y}$.

2️⃣ Apply the coefficient formula (the slope of the best-fit line) to find the coefficient.

$$
\hat{\beta_1} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
$$




3️⃣ Compute the Intercept. The intercept represents the predicted value of 𝑌 when 𝑋 = 0.
It is calculated using:
$$
\hat{\beta_0} = \bar{Y} - \hat{\beta_1} \bar{X}
$$



4️⃣ Build the Prediction Equation and Make Predictions. $\hat{Y}$ is the predicted value.


$$
\hat{Y} = \hat{\beta_0} + \hat{\beta_1} X
$$
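
The four steps above can be checked by hand with NumPy. This is a minimal sketch on made-up toy data (the arrays `X` and `Y` are assumptions for illustration, not from any dataset in these notes):

```python
import numpy as np

# Toy data, made up for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Step 1: means of X and Y
x_bar, y_bar = X.mean(), Y.mean()

# Step 2: slope (coefficient) from the formula above
beta_1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)

# Step 3: intercept
beta_0 = y_bar - beta_1 * x_bar

# Step 4: predictions from the fitted line
Y_hat = beta_0 + beta_1 * X

print(f"slope = {beta_1:.2f}, intercept = {beta_0:.2f}")  # slope = 0.60, intercept = 2.20
```

`np.polyfit(X, Y, 1)` returns the same slope and intercept, which is a quick way to sanity-check the hand calculation.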





**<span style="background-color: lightblue;">Model Code</span>**

```python
from sklearn.linear_model import LinearRegression

from sklearn.model_selection import train_test_split

# Step 1: Prepare data
# Independent variables (predictors), cannot include target variable

X = df.drop("target_variable", axis=1).values  

y = df["target_variable"].values 

# Step 2: Split data into training & testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the linear regression model

reg = LinearRegression()

reg.fit(X_train, y_train)

# Step 4: Make predictions

y_pred = reg.predict(X_test)

# Step 5: Display sample predictions

print("Predictions: {}, Actual Values: {}".format(y_pred[:2], y_test[:2]))
```

 
**<span style="background-color: lightblue;">Visualize the Model</span>**



```python
import seaborn as sns

import matplotlib.pyplot as plt

# General Template for Regression Plot

sns.regplot(x=df["predictor_variable"],  # Independent variable

            y=df["dependent_variable"],  # Target variable
            color="purple",              # Customize color
            scatter_kws={"alpha": 0.2})  # Transparency for scatter points
            

# Show the plot

plt.title("Regression Plot: Predictor vs. Target")

plt.xlabel("Predictor Variable")

plt.ylabel("Dependent Variable")

plt.show()

```

**<span style="background-color: lightblue;">LinearRegression() Model Parameters</span>**

✅ fit_intercept → If True (default), the model estimates an intercept. If False, no intercept is fitted and the line is forced through the origin, so the data is expected to be centered already.

✅ copy_X → If True, it makes a copy of the input data to avoid modifying it. If False, it may overwrite the original data. 

✅ n_jobs → Defines the number of CPU cores used for computation. None means it uses one core, -1 uses all available cores. 

✅ positive → If True, forces all coefficients to be positive (useful for certain applications). Default is False.
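
To see what `fit_intercept` changes, here is a small sketch on synthetic data that follows y = 3x + 5 exactly (the data is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data that follows y = 3x + 5 exactly (made up for illustration)
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 5

with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)

# With an intercept the model recovers slope 3 and intercept 5.
# Without one the line must pass through the origin, so the slope is distorted.
print("fit_intercept=True :", with_intercept.coef_[0], with_intercept.intercept_)
print("fit_intercept=False:", no_intercept.coef_[0], no_intercept.intercept_)
```

Without the intercept, the slope has to absorb the offset of 5, which is why `fit_intercept=False` only makes sense when the data really is centered around zero.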


**<span style="background-color: lightblue;">Scenarios To Use This Model In</span>**

**<span style="color: skyblue;">Scenario 1: Real Estate Pricing</span>**


- Company: A property tech startup

- Problem: “Can you predict housing prices based on location, square footage, and number of bedrooms?”

**Why Linear Regression:**

- Price is a continuous variable

- Relationships are often linear or near-linear

- Easy to interpret for stakeholders (e.g., “Each extra bedroom adds $X to the price”)

**Metrics**


- RMSE: This isn’t just a number; it’s your way of saying, “On average, our price predictions are off by $X.” You want a **low RMSE**.

- R squared: House prices are influenced by many factors, and this metric lets you justify the features you selected to make the predictions. It answers, “How much of the price can we explain with just the features selected?” You want a **high R²**.

- MAE: This tells you how far off your housing price predictions were on average. In real estate, MAE is useful for communicating the error in dollar terms.


**<span style="color: skyblue;">Scenario 2: Inventory Forecasting</span>**


- Company: A retail chain

- Problem: “How many units of product X will we sell next month based on advertising spend, seasonality, and historical sales?”

**Why Linear Regression:**

- Predicting future sales volume

- Can quantify impact of each feature (e.g., ad spend vs. seasonality)

- Useful for budget allocation and supply chain planning


**Metrics**

- RMSE: “On average, our prediction of the units needed for next month was off by X units.” We are looking for a low RMSE.

- R squared: Sales are influenced by many features, and this metric lets you justify why you chose these features as your predictors. It answers, “How much of next month's sales can we explain with these inputs?” We are looking for a high R².

- MAE: On average, how many units were you off by? This lets managers plan for errors, e.g., “We will need X more cookies if we understocked.”

**<span style="color: skyblue;">Scenario 3: Energy Consumption</span>**

- Company: A smart home device manufacturer

- Problem: “Can we predict daily energy usage based on temperature, time of day, and appliance usage?”

**Why Linear Regression:**

- Continuous target (kWh)

- Helps optimize device recommendations

- Can feed into sustainability dashboards
        
**Metrics**

- RMSE: “On average, our energy predictions are off by X kWh.” You want a low RMSE because large errors can lead to poor device recommendations or inaccurate sustainability reporting.

- R squared: Energy usage is influenced by many factors—weather, time, appliance behavior. R² lets you justify why you selected those features. It answers, “How much of the energy consumption can we explain with just these inputs?” You want a high R²

- MAE: “How much is our model typically off in terms of actual energy consumed?” You want a low MAE to ensure your recommendations are grounded in realistic usage patterns.

- ME: This tells you whether your model consistently over- or under-predicts energy usage. A positive bias means you’re overestimating (risking unnecessary alerts), while a negative bias means you’re underestimating (missing chances to save energy). You want bias close to zero so your predictions are balanced and trustworthy.

 
**<span style="background-color: lightblue;">Model Evaluation Metrics</span>**
- Go over the Metrics Sheet first and memorize what each evaluation score does.

- MSE, RMSE, MAE, R-Squared (R²) Score: standard metrics that quantify accuracy for linear regression.
- Cross-validation
- Analyze cross-validation metrics
- Regularized regression
- Regularized regression: Ridge
- Lasso regression for feature importance
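
A rough sketch of how the cross-validation and regularization items above fit together. The data is synthetic, and the `alpha` values are arbitrary assumptions that would normally be tuned:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import cross_val_score

# Synthetic data: only the first two of five features drive the target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # L2 penalty, shrinks coefficients
    "lasso": Lasso(alpha=0.1),   # L1 penalty, can zero coefficients out
}

# 5-fold cross-validation: each model is scored on data it was not trained on
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()
    print(f"{name:>6}: mean R² = {scores.mean():.3f} (std {scores.std():.3f})")

# Lasso for feature importance: uninformative features get coefficients near zero
lasso = Lasso(alpha=0.1).fit(X, y)
print("Lasso coefficients:", np.round(lasso.coef_, 2))
```

Features whose Lasso coefficient is driven to zero are candidates to drop. The spread of the fold scores (the std) shows how stable the model is across different splits.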


### <font color='mediumorchid'><b>Ordinary Least Squares </b></font> 

**Supervised Learning**

Ordinary Least Squares (OLS) is the classical statistical technique behind linear regression: it estimates the relationships between variables by minimizing the sum of squared differences between actual and predicted values. scikit-learn's `LinearRegression` fits the same least-squares model; running OLS through statsmodels additionally provides detailed statistical output such as p-values and confidence intervals.
- Ordinary Least Squares (OLS) regression is often used to test hypotheses and analyze relationships between variables. It’s especially helpful in situations where you need to understand how one or more independent variables impact a dependent variable.

**Mathematical Steps** 

1️⃣ Compute the Errors (Residuals)
Each data point has an **actual value** and a **predicted value**.
The error (residual) is:

$$
\text{Error} = Y_i - \hat{Y}_i
$$

2️⃣ Square the Errors
To avoid negative values canceling out, square each error:

$$
\text{Squared Error} = (Y_i - \hat{Y}_i)^2
$$

3️⃣ Sum All Squared Errors
The goal is to minimize the total squared error:

$$
L = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2
$$

where $L$ is the **Least Squares Loss Function**.

4️⃣ Find the Best-Fit Line
The best-fit line is found by minimizing $L$, which leads to the same **coefficient** and **intercept** formulas:

$$
\hat{\beta_1} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
$$

$$
\hat{\beta_0} = \bar{Y} - \hat{\beta_1} \bar{X}
$$

5️⃣ Use the Prediction Equation
Once the best-fit line is found, predictions are made using:

$$
\hat{Y} = \hat{\beta_0} + \hat{\beta_1} X
$$
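
As a quick check that these formulas really minimize $L$, the sketch below computes the loss at the OLS solution and at slightly nudged parameters. The toy data and the helper `squared_error_loss` are made up for illustration:

```python
import numpy as np

# Toy data, made up for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def squared_error_loss(beta_0, beta_1):
    """Sum of squared residuals L for the line beta_0 + beta_1 * X."""
    residuals = Y - (beta_0 + beta_1 * X)
    return np.sum(residuals ** 2)

# Closed-form OLS estimates (steps 4 above)
beta_1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta_0 = Y.mean() - beta_1 * X.mean()

best_loss = squared_error_loss(beta_0, beta_1)

# Nudging either parameter away from the OLS solution increases the loss
nudged_losses = [
    squared_error_loss(beta_0 + 0.1, beta_1),
    squared_error_loss(beta_0 - 0.1, beta_1),
    squared_error_loss(beta_0, beta_1 + 0.1),
    squared_error_loss(beta_0, beta_1 - 0.1),
]
print("OLS loss:", round(best_loss, 3))
print("Nudged losses:", np.round(nudged_losses, 3))
```

Every nudged line has a larger loss than the OLS line, which is what "least squares" means in practice.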


**<span style="background-color: lightblue;">Model Code</span>**


```python

import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Step 1: Define Independent (X) and Dependent (y) Variables

X = df["predictor_variable"]  # Independent variable

X = sm.add_constant(X)  # Adds intercept (β₀) - Required for OLS

y = df["target_variable"]  # Dependent variable (variable to predict)

# Step 2: Split the Data into Training & Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Step 3: Create and Train OLS Model

ols_model = sm.OLS(y_train, X_train)

ols_result = ols_model.fit()

```
**<span style="background-color: lightblue;">sm.OLS() Parameters</span>**


✅ endog → The dependent variable (target y). 

✅ exog → The independent variables (features X). 

✅ missing → Defines how missing values are handled ('none', 'drop', or 'raise'). 

✅ hasconst → Indicates whether the model includes a constant (intercept). 

✅ cov_type → Specifies the covariance type for standard errors (e.g., 'nonrobust', 'HC0'). Passed to `.fit()` rather than to `sm.OLS()`.

✅ use_t → Determines whether to use the t-distribution for inference. Also passed to `.fit()`.


**<span style="background-color: lightblue;">Scenarios To Use This Model In</span>**

The same scenarios and metrics as for linear regression apply.

 
**<span style="background-color: lightblue;">Model evaluation</span>**

- statsmodels
```python

# Step 4: Evaluate Model Performance

print(ols_result.summary())  # View regression statistics
```

- R² Score & Adjusted R²
- F-statistic & p-value
- Coefficients & Standard Errors
- P-values
- Residual Analysis

# Bagging vs Boosting 

**Bagging** : Trains multiple models independently on different random subsets of the data. Each model makes a prediction, and the final result is obtained by aggregating (majority vote for classification, averaging for regression). Reduces variance and prevents overfitting. Example: Random Forest (uses multiple decision trees trained independently).

1️⃣ Bootstrap Sampling → Train multiple models on different subsets of the data. 

2️⃣ Independent Model Training → Each model learns separately. 

3️⃣ Aggregation → Predictions are combined using voting or averaging.


**Boosting**: Sequential learning—each model learns from the mistakes of the previous one. Assigns higher weights to misclassified points to improve accuracy. Reduces bias but can be prone to overfitting. Example: AdaBoost, Gradient Boosting, XGBoost.

1️⃣ Train a Weak Model → Start with a simple learner (often a decision tree). 

2️⃣ Focus on Errors → Misclassified samples get increased importance in the next model. 

3️⃣ Sequential Learning → New models correct previous mistakes, refining predictions.
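
The three bagging steps can be written out by hand to see why averaging helps. This is a sketch on synthetic data, not how Random Forest is implemented internally (a real forest also randomizes the features considered at each split):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=300)
X_test = np.linspace(0, 6, 200).reshape(-1, 1)
y_test_true = np.sin(X_test.ravel())  # noise-free target for comparison

# Baseline: one fully grown tree (low bias, high variance)
single_tree = DecisionTreeRegressor(random_state=0).fit(X, y)
single_mse = np.mean((single_tree.predict(X_test) - y_test_true) ** 2)

# Bagging: 1) bootstrap sample, 2) train independently, 3) average predictions
n_models = 50
predictions = []
for i in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    tree = DecisionTreeRegressor(random_state=i).fit(X[idx], y[idx])
    predictions.append(tree.predict(X_test))
bagged_pred = np.mean(predictions, axis=0)
bagged_mse = np.mean((bagged_pred - y_test_true) ** 2)

print(f"single tree MSE: {single_mse:.4f}")
print(f"bagged trees MSE: {bagged_mse:.4f}")
```

Each individual tree overfits the noise in its own sample, but the errors differ from tree to tree, so averaging cancels much of that variance.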

### <font color='mediumorchid'><b>Random Forest Regressor </b></font> 


- Random Forest Regressor: Uses an ensemble of decision trees to capture nonlinear patterns, feature interactions, and complex splits without needing you to manually transform the data.
- Makes predictions by combining the outputs of many decision trees. Each tree gives its own prediction, and the forest averages them to get a final result. This helps reduce overfitting and improves accuracy.

**<span style="background-color: lightblue;">Model Code</span>**

```python
from sklearn.ensemble import RandomForestRegressor

from sklearn.model_selection import train_test_split

# Split dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Random Forest Regressor with parameters

regressor = RandomForestRegressor(n_estimators=100, max_features=1.0, random_state=0)  # max_features=1.0 uses all features ("auto" was removed in scikit-learn 1.3)

# Fit the model to training data

regressor.fit(X_train, y_train)

# Make predictions on the test set

y_pred = regressor.predict(X_test)

# Print first few predictions

print("Predicted values:", y_pred[:5])

```

**<span style="background-color: lightblue;">RandomForestRegressor() Parameters</span>**
 

✅ n_estimators → Number of trees in the forest (default: 100).

✅ criterion → Function to measure split quality ("squared_error", "absolute_error", "friedman_mse", "poisson").

✅ max_depth → Maximum depth of each tree (None means trees grow until all leaves are pure).

✅ min_samples_split → Minimum samples required to split a node (int or float).

✅ min_samples_leaf → Minimum samples required at a leaf node (int or float).

✅ max_features → Number of features considered for the best split ("sqrt", "log2", an int, a fraction, or None for all features; default: 1.0). The older "auto" option was removed in scikit-learn 1.3.

✅ bootstrap → Whether to sample data with replacement (True by default).

✅ oob_score → Whether to use out-of-bag samples for validation (False by default).

✅ n_jobs → Number of CPU cores used for parallel processing (None means one core, -1 uses all cores).

✅ random_state → Controls randomness for reproducibility (None or an integer).

✅ verbose → Controls logging level (0 means silent, higher values show more details).

✅ warm_start → Whether to reuse previous trees when adding more (False by default).

✅ ccp_alpha → Complexity parameter for pruning (0.0 by default).

✅ max_samples → Maximum samples used for training each tree (None means all samples).
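
A short sketch of two of these parameters in action: `oob_score=True` gives a built-in validation score from the samples each tree did not see, and the fitted `feature_importances_` shows which inputs the forest relied on. The data is synthetic and the parameter values are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (made up for illustration): feature 0 matters most, feature 2 not at all
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

forest = RandomForestRegressor(
    n_estimators=200,
    min_samples_leaf=2,   # slightly smoother trees
    oob_score=True,       # needs bootstrap=True (the default)
    random_state=0,
).fit(X, y)

print("Out-of-bag R²:", round(forest.oob_score_, 3))
print("Feature importances:", np.round(forest.feature_importances_, 3))
```

The out-of-bag score is a convenient estimate of generalization when you would rather not hold out a separate validation set.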



**<span style="background-color: lightblue;">Scenarios To Use This Model In</span>**

The same scenarios and metrics as for linear regression apply.


**<span style="background-color: lightblue;">Evaluation Metrics</span>**
 

- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R² Score (Coefficient of Determination)


### <font color='mediumorchid'><b>AdaBoost Regressor (Adaptive Boosting)</b></font> 

**Supervised Learning & Boosting**

AdaBoost is an ensemble learning technique that boosts the performance of weak models by focusing on the instances earlier models got wrong (misclassified instances in classification, high-error instances in regression). Unlike Bagging (used in Random Forest), which trains models independently, Boosting trains models sequentially, improving accuracy at each step.

1️⃣ Train a Weak Model → Start with a simple model (often a Decision Stump, a shallow decision tree). 

2️⃣ Adjust Weights → Increase the weight of misclassified instances so the next model focuses more on them. 

3️⃣ Train Another Model → A new model is trained, correcting previous mistakes. 

4️⃣ Repeat → This process continues, creating multiple models that progressively improve.

5️⃣ Final output:

✅ Classification → Final predictions are based on weighted voting (the class with the largest total weight wins).

✅ Regression → Final predictions combine the weak learners' outputs by weight; scikit-learn's `AdaBoostRegressor` uses the weighted median.

**<span style="background-color: lightblue;">Model Code</span>**

```python
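from sklearn.ensemble import AdaBoostRegressor

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeRegressor

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the AdaBoost Regressor model

regressor = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),  # Base learner (named base_estimator before scikit-learn 1.2)
    n_estimators=100,  # Number of boosting iterations
    random_state=42,
)

# Fit the model and make predictions

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)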
```

**<span style="background-color: lightblue;">Regressor Model Parameters</span>**

✅ estimator → The base model used for boosting (default is DecisionTreeRegressor(max_depth=3)). You can change this to another regressor. 

✅ n_estimators → Number of weak learners (default is 50). More estimators can improve accuracy but may lead to overfitting. 

✅ learning_rate → Controls how much each weak learner contributes (default is 1.0). Lower values make learning slower but more stable. 

✅ loss → Defines the loss function used to update weights after each boosting iteration. Options: "linear", "square", "exponential". 

✅ random_state → Ensures reproducibility by setting a fixed seed for randomness.

After fitting, the model also exposes these attributes (they are results, not constructor parameters):

✅ estimator_weights_ → Stores the weights assigned to each weak learner.

✅ estimator_errors_ → Tracks the error of each weak learner.

✅ feature_importances_ → Measures feature importance based on impurity reduction.
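
A small sketch showing where the fitted attributes come from. The data is synthetic, and the parameter values are arbitrary choices for illustration; the default base learner is used so the example does not depend on the `estimator` argument name:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic data (made up for illustration): only feature 0 is informative
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=400)

# Default base learner: DecisionTreeRegressor(max_depth=3)
booster = AdaBoostRegressor(n_estimators=50, learning_rate=0.5, loss="linear", random_state=0)
booster.fit(X, y)

n_fitted = len(booster.estimators_)  # can be fewer than n_estimators if boosting stops early
print("Weak learners fitted:", n_fitted)
print("First learner weights:", np.round(booster.estimator_weights_[:3], 3))
print("First learner errors:", np.round(booster.estimator_errors_[:3], 3))
print("Feature importances:", np.round(booster.feature_importances_, 3))
```

Learners with lower error receive higher weight, so they have more say in the final prediction.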

**<span style="background-color: lightblue;">Scenarios To Use This Model In</span>**

The same scenarios and metrics as for linear regression apply.

**<span style="background-color: lightblue;">Model Evaluation</span>** 

- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R² Score (Coefficient of Determination)




### <font color='mediumorchid'><b>XGBoost Regressor (Extreme Gradient Boosting)</b></font> 

**Supervised Learning & Boosting** 

- XGBoost is an optimized implementation of Gradient Boosting that enhances both speed and performance in predictive modeling. It is designed to handle large datasets efficiently.

Key steps 

1️⃣ Train Initial Model – Starts with a weak learner, typically a decision tree. 

2️⃣ Compute Residuals & Fit New Trees – Each new tree learns from previous errors by predicting the residuals (differences between actual and predicted values). 

3️⃣ Final output:

✅ Classification (XGBClassifier) → Predicts discrete class labels.

✅ Regression (XGBRegressor) → Predicts continuous values.
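
The residual-fitting idea in step 2 can be sketched with plain scikit-learn trees. This is a simplified, hand-rolled gradient boosting loop for squared error, not XGBoost's actual algorithm (which adds regularization and other optimizations). As a simplifying assumption it starts from a constant prediction, and the `learning_rate` and `n_rounds` values are arbitrary:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.2, size=300)

learning_rate = 0.1
n_rounds = 100

# Step 1: start from a constant prediction (the mean of y)
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_rounds):
    residuals = y - prediction                      # Step 2: what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # add a damped correction
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(f"training MSE: {initial_mse:.3f} -> {final_mse:.3f}")
```

Each shallow tree only fixes part of the remaining error, and the learning rate keeps any single tree from dominating. That is the trade-off behind "lower learning rate, more trees".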

**<span style="background-color: lightblue;">Model Code</span>** 

```python
from xgboost import XGBRegressor

from sklearn.model_selection import train_test_split

# Splitting dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing the XGBoost regressor

reg = XGBRegressor(n_estimators=100, eval_metric="rmse")

# Training the model

reg.fit(X_train, y_train)

# Making predictions

y_pred = reg.predict(X_test)

```

**<span style="background-color: lightblue;">Model Parameters</span>** 


✅ n_estimators → Number of boosting rounds (default is 100). More rounds can improve accuracy but may lead to overfitting.

✅ learning_rate → Controls how much each tree contributes (XGBoost's native default is 0.3; older versions of the scikit-learn wrapper defaulted to 0.1). Lower values improve stability but require more trees.

✅ max_depth → Maximum depth of trees (default is 6). Larger values capture more complex patterns but may overfit.

✅ gamma → Minimum loss reduction required for a split (default is 0). Higher values create more conservative trees.

✅ min_child_weight → Minimum sum of instance weights for a child node (default is 1). Helps prevent overfitting by requiring more samples per leaf.

✅ subsample → Fraction of training data used per tree (default is 1.0). Lower values reduce overfitting but may introduce bias.

✅ colsample_bytree → Fraction of features used for each tree (default is 1.0). Helps with feature selection and generalization.

✅ reg_alpha → L1 regularization penalty (default is 0). Helps create sparse models and feature selection.

✅ reg_lambda → L2 regularization penalty (default is 1). Helps reduce overfitting without eliminating features.

✅ eval_metric → Defines the evaluation metric (default is "rmse"). Options include "mae", "rmsle", and "mape".

✅ early_stopping_rounds → Stops training if validation performance doesn’t improve after a set number of rounds.

✅ random_state → Ensures reproducibility by setting a fixed seed for randomness.

✅ objective → Defines the learning task (default is "reg:squarederror" for regression).

**<span style="background-color: lightblue;">Scenarios To Use This Model In</span>**

The same scenarios and metrics as for linear regression apply.

**<span style="background-color: lightblue;">Model Evaluation</span>** 

- RMSE
- MAE
- RMSLE
- MAPE
- R² Score


### <font color='mediumorchid'><b>Support Vector Regressor (SVR)</b></font> 

**Supervised Learning**

SVR is used for regression tasks, predicting continuous numerical values rather than class labels. Instead of finding a hyperplane that separates classes like Support Vector Machines (SVM), SVR finds a best-fit margin that allows for flexible predictions. It aims to minimize errors while keeping the model as simple as possible. SVR can handle linear and nonlinear relationships using kernel functions, making it useful for complex regression problems.

![image.png](attachment:2dd0a451-58e4-4ffb-80c4-92fb653e72aa.png)

- Hyperplane → The main prediction line that SVR tries to fit. Instead of separating classes, it predicts continuous values.

- Dashed Lines (Margins) → These create a "safe zone" around the hyperplane, where errors are allowed but kept small.

- Support Vectors → Important data points that help shape the prediction boundary (margin) by defining where the margin should be.

- So, instead of sorting data into strict categories as an SVM classifier does, SVR allows flexibility, letting data points sit within a margin while keeping predictions as accurate as possible.


**<span style="background-color: lightblue;">Model</span>**

```python

from sklearn.model_selection import train_test_split

from sklearn.svm import SVR

# Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and Train the SVR Model

model = SVR(kernel='rbf', C=1.0, epsilon=0.1)  # RBF kernel for nonlinear patterns

model.fit(X_train, y_train)

# Make Predictions

y_pred = model.predict(X_test)

```

**<span style="background-color: lightblue;">Model Parameters</span>**


✅ kernel → Defines the type of kernel function used ('linear', 'poly', 'rbf', 'sigmoid', or custom). Default is 'rbf'. 

✅ C → Regularization parameter that controls the trade-off between achieving a low error and keeping the model simple. Higher values make the model more sensitive to errors. 

✅ epsilon → Defines the margin within which predictions are considered correct. Smaller values make the model more precise. 

✅ degree → Used only for 'poly' kernel; defines the polynomial degree. Default is 3. 

✅ gamma → Controls the influence of individual data points ('scale', 'auto', or a float value). Default is 'scale'. 

✅ coef0 → Independent term in kernel function (used for 'poly' and 'sigmoid' kernels). Default is 0.0. 

✅ tol → Tolerance for stopping criterion. Default is 0.001. 

✅ shrinking → Whether to use the shrinking heuristic (True or False). Default is True. 

✅ cache_size → Size of the kernel cache in MB. Default is 200. 

✅ verbose → Enables detailed output (True or False). Default is False. 

✅ max_iter → Maximum number of iterations for optimization (-1 means no limit).
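
A sketch of how `epsilon` changes the model, using synthetic data. Wrapping SVR in a pipeline with `StandardScaler` is an added assumption here (kernel SVR is generally sensitive to feature scale), and the two `epsilon` values are arbitrary:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic nonlinear data (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=200)

support_counts = {}
for eps in (0.01, 0.5):
    # Scale features first: kernel SVR is sensitive to feature magnitudes
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=eps))
    model.fit(X, y)
    support_counts[eps] = len(model.named_steps["svr"].support_)
    print(f"epsilon={eps}: {support_counts[eps]} support vectors out of {len(X)}")
```

A wider margin (larger `epsilon`) tolerates more points inside the "safe zone", so fewer points end up as support vectors and the model is simpler.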


**<span style="background-color: lightblue;">Scenarios To Use This Model In</span>**

**<span style="color: skyblue;">Scenario 1: Stock Price Prediction</span>**


- Company: A fintech startup building a retail investment app
  
- Problem: “Can we predict the closing price of a stock based on technical indicators like moving averages, volatility, and trading volume?”

**Why SVR:**

- Financial data is noisy and often nonlinear, which is the kind of data SVR is well suited to.

- SVR’s margin-based approach helps avoid overfitting while still capturing complex patterns.

- Kernel functions (like RBF) allow the model to learn subtle relationships between indicators and price movement.

**Metrics**

- RMSE: Evaluates how far off the SVR model is from actual closing prices, especially penalizing large errors that could lead to poor investment decisions. A lower RMSE means the predictions are more stable and trustworthy for guiding user trades.

- MAE: “On average, our closing price predictions are off by $X.”

- R Squared: “X% of the variation in closing price can be explained by our features, like moving averages, volatility, and trading volume.”

- MSE: “On average, our squared prediction error is X dollars².”


**<span style="color: skyblue;">Scenario 2: Predicting Vehicle Emissions</span>**


- Company: An automotive manufacturer focused on sustainability

- Problem: “Can we estimate CO₂ emissions based on engine specs, driving behavior, and environmental conditions?”

**Why SVR:**

- Emissions data can be nonlinear due to interactions between speed, load, and temperature.

- SVR handles these nonlinearities better than OLS, especially with kernel tricks.

- It’s robust to outliers, which are common in sensor data.

**Metrics** 

- RMSE: “Our CO₂ predictions are off by about 4.29 grams per kilometer, on average, with bigger mistakes hurting more.”

- R Squared: “Our model explains 87% of the variation in CO₂ emissions using the selected features—engine specs, driving behavior, and environment.”

- MAE: “On average, our CO₂ predictions are off by 3.12 grams per kilometer, treating all errors equally.”

- MSE: “On average, our squared prediction error is 18.4 grams² per kilometer²—larger errors are penalized more heavily.”
 
**<span style="background-color: lightblue;">Evaluation Metrics</span>**


- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R² Score (Coefficient of Determination)

### <font color='mediumorchid'><b>k-Nearest Neighbors (KNN) Regressor </b></font> 

**Supervised Learning**

- A model that makes predictions based on similarity. KNN can predict both numerical values (regression) and categorical labels (classification).
- For regression, it finds the k training points closest to the new point and averages their target values; that average is the prediction.
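
The "find the k nearest neighbors and average them" idea is simple enough to do by hand. A minimal sketch on made-up toy data, compared against scikit-learn:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training data, made up for illustration only
X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0]])
y_train = np.array([10.0, 20.0, 30.0, 60.0, 70.0])
x_new = np.array([[2.4]])
k = 3

# By hand: measure distances, pick the k closest points, average their targets
distances = np.abs(X_train.ravel() - x_new[0, 0])
nearest = np.argsort(distances)[:k]
manual_pred = y_train[nearest].mean()

# Same prediction from scikit-learn
knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
sklearn_pred = knn.predict(x_new)[0]

print(manual_pred, sklearn_pred)  # 20.0 20.0
```

The three closest points to 2.4 are 2, 3, and 1, so the prediction is the average of their targets: (20 + 30 + 10) / 3 = 20.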

**<span style="background-color: lightblue;">Model Code</span>**

```python
from sklearn.neighbors import KNeighborsRegressor
    
from sklearn.model_selection import train_test_split

# Prepare Data

y = df["dependent_variable"].values  

X = df[["predictor1", "predictor2"]].values 

# Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Create and Train the KNN Regressor

knn_reg = KNeighborsRegressor(n_neighbors=6)  # Pick a number of neighbors

knn_reg.fit(X_train, y_train)

# Make Predictions
y_pred = knn_reg.predict(X_test)

print("Predictions:", y_pred)

```
**<span style="background-color: lightblue;">KNeighborsRegressor() Parameters</span>**


✅ n_neighbors → Number of neighbors to consider when making predictions (default is 5). 

✅ weights → Defines how neighbors contribute to predictions ('uniform' gives equal weight, 'distance' gives more weight to closer neighbors). 

✅ algorithm → Determines how neighbors are searched ('auto', 'ball_tree', 'kd_tree', 'brute'). 

✅ leaf_size → Affects speed and memory usage when using tree-based algorithms (default is 30). 

✅ p → Power parameter for the Minkowski distance metric (p=1 is Manhattan distance, p=2 is Euclidean distance). 

✅ metric → Defines the distance function used ('minkowski', 'euclidean', 'manhattan', or a custom function). 

✅ metric_params → Additional arguments for the distance metric function. 

✅ n_jobs → Number of CPU cores used for parallel computation (None means one core, -1 uses all available cores).

**<span style="background-color: lightblue;">KNeighborsRegressor() Evaluation Metrics</span>**


- MSE
- RMSE
- MAE
- R² Score 
- Adjusted R² Score





<div class="span5 alert alert-warning">
<h3>Predictive Models Evaluation Metrics</h3>
    
- Mean Squared Error (MSE) This tells you how far off your predictions are from the actual values — but it squares the errors, so big mistakes count a lot more. It’s useful when you want to heavily penalize large errors.

- Root Mean Squared Error (RMSE) This is just the square root of MSE. It puts the error back into the same units as your target (like dollars or grams), so it’s easier to understand. Lower RMSE means your predictions are closer to reality.

- Mean Absolute Error (MAE) This gives you the average size of your prediction errors, treating all mistakes equally. It’s great for explaining error in plain terms — like “we’re off by $3,000 on average.”

- R Squared (R²) This tells you how much of the variation in your target variable is explained by your model. If R² is 0.85, that means 85% of the changes in your outcome can be explained by your features. Higher is better.

- Adjusted R Squared Same idea as R², but it adjusts for how many features you’re using. It helps prevent overfitting by penalizing models that add too many unnecessary variables.
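
All five metrics can be computed in a few lines of NumPy. The values below are made up for illustration, and `n_features` is a hypothetical count of predictors needed for Adjusted R²:

```python
import numpy as np

# Toy values, made up for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])
n_features = 1  # hypothetical number of predictors used by the model

errors = y_pred - y_true
mse = np.mean(errors ** 2)          # squares the errors, so big misses dominate
rmse = np.sqrt(mse)                 # back in the units of the target
mae = np.mean(np.abs(errors))       # average size of the miss

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2 = 1 - ss_res / ss_tot

n = len(y_true)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)  # penalizes extra features

print(f"MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f} R²={r2:.2f} Adjusted R²={adj_r2:.2f}")
```

For these toy values the output is `MSE=1.50 RMSE=1.22 MAE=1.00 R²=0.70 Adjusted R²=0.55`. RMSE is larger than MAE because the single error of 2 is weighted more heavily once squared.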