1. Problem identification 

2. Data wrangling

3. Exploratory data analysis

4. Pre-processing and training data development

5. **Modeling (Machine learning steps)**

6. Documentation

# Measures of Goodness of Fit

<div class="span5 alert alert-warning">
<h3>Regression Metrics</h3>

- https://web.archive.org/web/20240117214702/https://www.kdnuggets.com/2018/04/right-metric-evaluating-machine-learning-models-1.html

## <font color='magenta'><b>MAE - Mean Absolute Error </b></font> 

Definition: MAE is calculated by first finding the absolute difference $|y_{\text{actual}} - y_{\text{predicted}}|$ between each actual value and its corresponding prediction, then averaging those differences. The result is a single number.
- Lower MAE is better.
- MAE is more useful than R² when you care about how "close" your predictions are to the true values (the size of the difference between predicted and actual values) rather than how much of the variability is explained.
- A linear metric: every unit of error counts the same, so large errors are not penalized extra.
- Example: Predicting inventory levels or demand for products. Businesses use MAE to ensure their predictions are accurate enough to prevent stockouts or overstock situations.

**Mathematical Steps** 

1️⃣ Find the error

Subtract each predicted value from its actual value:

$$ \text{Error} = y_{\text{actual}} - y_{\text{predicted}} $$


2️⃣ Take the absolute value

Convert all errors to positive values (so negative errors don’t cancel out positive errors):

$$ \text{Absolute Error} = |y_{\text{actual}} - y_{\text{predicted}}| $$


3️⃣ Find the average of all absolute errors

Sum up all absolute errors and divide by the total number of observations $n$:

$$ MAE = \frac{1}{n} \sum |y_{\text{actual}} - y_{\text{predicted}}| $$


**<u>Function Option</u>**

**1) Create a function that calculates the mean of the absolute errors**
- You can calculate this for both the training data and the test data.
- Keep a consistent order in both options: actual values first, predicted values second. It does not change MAE, but it does matter for metrics such as MAPE and R².


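`import numpy as np`

`def mae(y, ypred):`

    # absolute difference between each actual value and its prediction
    abs_error = np.abs(y - ypred)
    # average of the absolute errors
    mae = np.mean(abs_error)
    return mae

`mae(actual_values, predictions)`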

**<u>scikit-learn option</u>**

`from sklearn.metrics import mean_absolute_error`

`train_mae = mean_absolute_error(train_actual_values, train_predicted_values)`

`test_mae = mean_absolute_error(test_actual_values, test_predicted_values)`



- Interpretation: MAE is in the same units as the target. If your predictions are simply the average of the actual values (a baseline), the MAE tells you that, on average, you would expect to be off by about that amount when guessing with the average.
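
As a quick sanity check, the hand-written `mae` function and scikit-learn's `mean_absolute_error` should agree. The small arrays below are made-up illustration data (an assumption, not from a real project); the last lines compute the MAE of a mean baseline, which is the situation the interpretation above describes.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error


def mae(y, ypred):
    """Mean of the absolute differences between actual and predicted values."""
    return np.mean(np.abs(np.asarray(y) - np.asarray(ypred)))


# Made-up example data
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

# Absolute errors are 0.5, 0.0, 2.0, 1.0, so the mean is 3.5 / 4
print(mae(actual, predicted))                  # 0.875
print(mean_absolute_error(actual, predicted))  # 0.875

# Baseline: predict the mean of the actual values (4.25) for every row
baseline = actual.mean() * np.ones(len(actual))
print(mae(actual, baseline))                   # 1.75
```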

## <font color='magenta'><b>Mean Absolute Percentage Error (MAPE)</b></font> 

Definition: MAPE (Mean Absolute Percentage Error) measures the average percentage difference between predicted and actual values in a regression model. It evaluates prediction accuracy in relative terms, making it useful for comparing performance across datasets of different scales.

- Lower MAPE indicates better model accuracy.
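- Difference from MAE: MAE reports error in the units of the target, while MAPE expresses it as a percentage, which makes it easier to interpret across variables on different scales. MAPE is undefined when an actual value is zero and becomes very large when actual values are close to zero.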


**Mathematical Steps** 

1️⃣ Find the absolute errors

Take the absolute difference between each actual value and its predicted value:

$$ \text{Error} = |y_{\text{actual}} - y_{\text{predicted}}| $$


2️⃣ Convert each error into a percentage

Divide each absolute error by its actual value to get a percentage error:

$$ \text{Percentage Error} = \left( \frac{|y_{\text{actual}} - y_{\text{predicted}}|}{y_{\text{actual}}} \right) \times 100 $$

3️⃣ Calculate the average percentage error

Sum all percentage errors and divide by the total number of observations (n):

$$ MAPE = \frac{1}{n} \sum \left( \frac{|y_{\text{actual}} - y_{\text{predicted}}|}{y_{\text{actual}}} \times 100 \right) $$



**Metric** 

`from sklearn.metrics import mean_absolute_percentage_error`

`mape_score = mean_absolute_percentage_error(actual_values, predicted_values) * 100`

`print(f"MAPE: {mape_score:.2f}%")`
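
A minimal sketch comparing a manual MAPE with scikit-learn's, using made-up numbers (an assumption for illustration). Note that `mean_absolute_percentage_error` returns a fraction, which is why the snippet above multiplies by 100.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Made-up example data
actual = np.array([100.0, 200.0, 50.0])
predicted = np.array([110.0, 190.0, 55.0])

# Percentage errors: 10/100, 10/200, 5/50 -> 10%, 5%, 10%
manual_mape = np.mean(np.abs(actual - predicted) / np.abs(actual)) * 100

# scikit-learn returns a fraction (about 0.0833), so multiply by 100 for a percentage
sk_mape = mean_absolute_percentage_error(actual, predicted) * 100

print(f"manual MAPE:  {manual_mape:.2f}%")   # 8.33%
print(f"sklearn MAPE: {sk_mape:.2f}%")       # 8.33%
```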


## <font color='magenta'><b>RMSE- Root Mean Squared Error</b></font> 

Root Mean Squared Error (RMSE) is a metric used to evaluate the accuracy of a regression model. It measures the typical magnitude of the errors between the predicted values and the actual values, in the same units as the target. In other words, it tells you how well your model's predictions match the actual data. Lower is better: a low RMSE indicates that the model's predictions are close to the actual values.
- Tends to be the default metric for regression.
- Can be read as the standard deviation of the prediction errors (residuals) when those errors average to zero.
- Penalizes large errors more than Mean Absolute Error (MAE), because errors are squared before averaging.
- Always greater than or equal to MAE; the two are equal only when all absolute errors are the same size.
- Always non-negative: 0 means every prediction is exact.

**Mathematical Steps** 

1️⃣ Find the errors

Subtract each predicted value from its actual value:

$$ \text{Error} = y_{\text{actual}} - y_{\text{predicted}} $$

2️⃣ Square each error

This penalizes larger errors more heavily and turns negative values into positive values.

$$ \text{Squared Error} = (y_{\text{actual}} - y_{\text{predicted}})^2 $$


3️⃣ Find the Mean Squared Error (MSE)

Sum all squared errors and divide by the total number of observations $n$:

$$ MSE = \frac{1}{n} \sum (y_{\text{actual}} - y_{\text{predicted}})^2 $$

4️⃣ Take the square root of MSE

This converts the squared error back to the original units of the data:

$$ RMSE = \sqrt{MSE} $$


**Metric**
  
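`import numpy as np`

`from sklearn.metrics import mean_squared_error`

`# the squared=False argument was deprecated in scikit-learn 1.4 and later removed; take the square root instead`

`# (scikit-learn 1.4+ also provides sklearn.metrics.root_mean_squared_error)`

`lin_rmse = np.sqrt(mean_squared_error(y, life_expc_pred))`

`lin_rmse`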
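
The four steps above can be sketched in NumPy and checked against scikit-learn. The arrays are made-up illustration data (an assumption); one prediction is off by 2, which is why RMSE comes out larger than MAE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up example data: one prediction is off by 2, the others by 1 or less
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = actual - predicted            # 0.5, 0.0, -2.0, -1.0
mse = np.mean(errors ** 2)             # (0.25 + 0 + 4 + 1) / 4 = 1.3125
rmse = np.sqrt(mse)                    # back to the units of the target
sk_rmse = np.sqrt(mean_squared_error(actual, predicted))
mae = mean_absolute_error(actual, predicted)

print(f"RMSE = {rmse:.4f}")   # 1.1456
print(f"MAE  = {mae:.4f}")    # 0.8750
```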



  

## <font color='magenta'><b>R² (coefficient of determination)</b></font> 

-  Measures the proportion of the variance (spread) in the dependent variable (the y variable, the value being predicted) that is explained by the independent variables (the features used to predict y) in the model.
It compares the predicted values to the actual values to determine how much of the variability in y is explained by the model.
-  In simple terms, it tells you how well your model's predictions match the actual data. For a linear regression with an intercept, the training R² lies between 0 and 1, and a higher value indicates a better fit. On new data, R² can be negative when the model does worse than simply predicting the mean of y.

**Mathematical Steps** 

1️⃣ Find the mean of actual values

Calculate the average of all actual values:

$$ \bar{y} = \frac{1}{n} \sum y_{\text{actual}} $$

2️⃣ Compute the Total Sum of Squares (TSS)

Measure the total variation in actual values:

$$ TSS = \sum (y_{\text{actual}} - \bar{y})^2 $$

3️⃣ Compute the Residual Sum of Squares (RSS)

Measure how much error remains after predictions:

$$ RSS = \sum (y_{\text{actual}} - y_{\text{predicted}})^2 $$

4️⃣ Calculate R²

Determine the proportion of variance explained by the model:

$$ R² = 1 - \frac{RSS}{TSS} $$




**<u>Function Option</u>**

**1) Create a new variable containing only the numerical columns (drop any categorical data)**

**2) Split the data into a training set, a test set, training target values, and test target values**

- X: should have all input features (all the columns except the target column)
- y: should have the target values (the values you are trying to predict)

`from sklearn.model_selection import train_test_split`

`X_train, X_test, y_train, y_test = train_test_split(data.drop(columns='target_values'), data.target_values, test_size=0.3, random_state=47)`


**3) Create a function that calculates R²** for either the training data or the test data, depending on which inputs you give it.

- ```python
  import numpy as np

  def r_squared(y, ypred):
      ybar = np.sum(y) / len(y)            # mean of actual values (np.mean(y) also works)
      sum_sq_tot = np.sum((y - ybar)**2)   # total sum of squares (TSS)
      sum_sq_res = np.sum((y - ypred)**2)  # residual sum of squares (RSS)
      R2 = 1.0 - sum_sq_res / sum_sq_tot
      return R2
  ```
  
**4) Make baseline predictions: repeat the mean of the target values (typically the training mean) once per row, using an array of ones the length of the training or test data**

- `baseline_predictions = mean_value * np.ones(len(actual_values))`
- `baseline_predictions[:5]  # preview the first five predictions`
- Or use scikit-learn's `DummyRegressor`, fitted on the training data, and call its `predict` method (the same call works for any fitted model):
- `predictions = model.predict(input_features)`
- `predictions[:5]`

**5) Call the function, passing in the actual data and the predictions**

- `r_squared(actual_values, predictions)`


**Metric**


**<u>scikit-learn option</u>**

`from sklearn.metrics import r2_score`

`train_r2 = r2_score(train_actual_values, train_predicted_values)`

`test_r2 = r2_score(test_actual_values, test_predicted_values)`
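
A minimal sketch tying steps 3 to 5 together: the hand-written `r_squared` and `r2_score` agree, a mean baseline from `DummyRegressor(strategy="mean")` scores about 0, and predictions worse than the mean score below 0. The synthetic data and coefficients are assumptions for illustration only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score


def r_squared(y, ypred):
    """R² = 1 - RSS / TSS."""
    ybar = np.mean(y)
    sum_sq_tot = np.sum((y - ybar) ** 2)
    sum_sq_res = np.sum((y - ypred) ** 2)
    return 1.0 - sum_sq_res / sum_sq_tot


# Synthetic training data (an assumption for illustration)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=50)

# Baseline: always predict the training mean
dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_predictions = dummy.predict(X_train)

print(r_squared(y_train, baseline_predictions))  # about 0
print(r2_score(y_train, baseline_predictions))   # about 0

# Predictions worse than the mean give a negative R²
bad_predictions = -y_train
print(r2_score(y_train, bad_predictions))
```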


## <font color='magenta'><b>Adjusted R²</b></font> 


A metric that shows how well a regression model explains the variability in the data, with a correction for the number of predictors in the model. Regular R² never decreases on the training data when you add predictors, even useless ones. Adjusted R² only increases if a new predictor improves the fit more than would be expected by chance; otherwise it decreases, which discourages overfitting. Think of it as a smarter version of R² that helps you assess whether adding more variables truly improves your model or just makes it unnecessarily complex.

**Mathematical Steps** 

1️⃣ Calculate R²

Use the standard coefficient of determination formula:

$$ R² = 1 - \frac{RSS}{TSS} $$

2️⃣ Adjust for number of predictors

Modify R² to account for model complexity, where $n$ is the number of observations and $k$ is the number of predictors:

$$ R²_{\text{adj}} = 1 - \left( \frac{(1 - R²)(n - 1)}{n - k - 1} \right) $$


**Metric** 

`from sklearn.linear_model import LinearRegression`

`from sklearn.metrics import r2_score`

Fit the model

`model = LinearRegression()`

`model.fit(X_train, y_train)`

Predict values

`y_pred = model.predict(X_test)`

`r2 = r2_score(y_test, y_pred)`

`n = X_test.shape[0]  # Number of observations`

`k = X_test.shape[1]  # Number of features`

`adjusted_r2 = 1 - ((1 - r2) * (n - 1) / (n - k - 1))`

`print("Adjusted R²:", adjusted_r2)`
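
To see the penalty at work, the formula can be wrapped in a small helper (the name `adjusted_r2` is hypothetical). With the same R² and the same number of observations, doubling the number of predictors lowers the adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R² and same number of observations, but twice as many predictors
print(round(adjusted_r2(0.80, n=50, k=5), 3))    # 0.777
print(round(adjusted_r2(0.80, n=50, k=10), 3))   # 0.749
```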





<div class="span5 alert alert-warning">
<h3>Other Regression Metrics</h3>



### <font color='magenta'><b>F-test - linear regression</b></font> 

Purpose: Evaluates the overall significance of the linear regression model. It tests whether at least one of the predictors is significantly related to the dependent variable.

Hypotheses:

- Null hypothesis ($H_0$): All regression coefficients (apart from the intercept) are equal to zero (no effect).
- Alternative hypothesis ($H_1$): At least one regression coefficient is not equal to zero.
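
A minimal sketch of the overall F-test computed by hand with NumPy and SciPy, on synthetic data where only the first predictor matters (an assumption for illustration). The statistic compares explained variation per predictor with unexplained variation per residual degree of freedom; libraries such as statsmodels report the same quantity on fitted OLS results (`fvalue`, `f_pvalue`).

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): y depends on x1 only; x2 is pure noise
rng = np.random.default_rng(0)
n, k = 100, 2                                   # observations, predictors
X = rng.normal(size=(n, k))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_pred = X_design @ beta

rss = np.sum((y - y_pred) ** 2)                 # unexplained variation
tss = np.sum((y - y.mean()) ** 2)               # total variation

# F = (explained variation per predictor) / (unexplained variation per residual degree of freedom)
f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)      # right-tail probability under H0

print(f"F = {f_stat:.2f}, p-value = {p_value:.3g}")
```

A small p-value (for example below 0.05) means you reject $H_0$: at least one predictor is related to y.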

### <font color='magenta'><b>t-test for Regression Coefficients - linear regression</b></font> 

Purpose: Tests the significance of individual regression coefficients, determining whether each predictor has a significant effect on the dependent variable.

Hypotheses:

- Null hypothesis ($H_0$): The coefficient is equal to zero (no effect).
- Alternative hypothesis ($H_1$): The coefficient is not equal to zero.
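
A minimal sketch of the per-coefficient t-tests by hand, on the same kind of synthetic data (an assumption: x1 has a real effect, x2 does not). Each t-statistic is the coefficient divided by its standard error, taken from $\hat\sigma^2 (X^\top X)^{-1}$; statsmodels exposes the same values as `tvalues` and `pvalues` on fitted OLS results.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): x1 has a real effect, x2 does not
rng = np.random.default_rng(0)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

X_design = np.column_stack([np.ones(n), X])     # intercept + predictors
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta

dof = n - k - 1                                 # residual degrees of freedom
sigma2 = residuals @ residuals / dof            # estimated error variance
cov_beta = sigma2 * np.linalg.inv(X_design.T @ X_design)
std_err = np.sqrt(np.diag(cov_beta))

t_stats = beta / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), dof) # two-sided test of H0: coefficient = 0

for name, t, p in zip(["intercept", "x1", "x2"], t_stats, p_values):
    print(f"{name}: t = {t:.2f}, p = {p:.3g}")
```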

### <font color='magenta'><b>Residual Analysis- linear regression</b></font> 

Purpose: Analyzes the residuals (differences between observed and predicted values) to check for patterns that might indicate issues with the model, such as non-linearity, heteroscedasticity, or outliers.

Plots:

- Residual vs. Fitted Plot: Checks for non-linearity and equal variance.
- Q-Q Plot: Assesses the normality of residuals.
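
Both plots can be sketched with matplotlib and `scipy.stats.probplot`. The data and the `LinearRegression` fit below are assumptions for illustration; with a well-specified model the residuals should scatter evenly around 0 and the Q-Q points should sit close to the line.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic linear data (an assumption for illustration)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 1.5 * X[:, 0] + rng.normal(scale=2.0, size=200)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted: look for curves (non-linearity) or funnels (unequal variance)
ax1.scatter(fitted, residuals, alpha=0.6)
ax1.axhline(0, color="gray", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")
ax1.set_title("Residuals vs Fitted")

# Q-Q plot: points close to the line suggest roughly normal residuals
stats.probplot(residuals, dist="norm", plot=ax2)
ax2.set_title("Normal Q-Q Plot")

plt.tight_layout()
plt.show()
```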

### <font color='magenta'><b>Durbin-Watson Test - linear regression </b></font> 

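Purpose: Tests for autocorrelation (correlation between successive residuals) in the residuals of a linear regression model. This matters most when the observations have a natural order, such as time series.

Value range (0 to 4):

- About 2 indicates no autocorrelation.
- Values < 2 indicate positive autocorrelation (closer to 0 is stronger).
- Values > 2 indicate negative autocorrelation (closer to 4 is stronger).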
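
The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A hand-written sketch (the helper name is hypothetical; statsmodels provides `durbin_watson` in `statsmodels.stats.stattools`), applied to simulated residuals with and without positive autocorrelation:

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)


rng = np.random.default_rng(2)

# Independent residuals: no autocorrelation, so DW should be close to 2
independent = rng.normal(size=500)

# Simulated AR(1) residuals with positive autocorrelation, so DW should be well below 2
autocorrelated = np.zeros(500)
for t in range(1, 500):
    autocorrelated[t] = 0.8 * autocorrelated[t - 1] + rng.normal()

dw_independent = durbin_watson(independent)
dw_autocorrelated = durbin_watson(autocorrelated)
print(f"independent residuals:    DW = {dw_independent:.2f}")
print(f"autocorrelated residuals: DW = {dw_autocorrelated:.2f}")
```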

_______________________________________________________________________________________________________________________

<div class="span5 alert alert-warning">
<h3>Classification Metrics</h3>

- https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
- https://web.archive.org/web/20240103185927/https://www.kdnuggets.com/2018/06/right-metric-evaluating-machine-learning-models-2.html

### <font color='coral'><b>Confusion Matrix - classification models </b></font>

### <font color='coral'><b>Binary Classification</b></font>


<span style="background-color: peachpuff;">A table that shows which labels were correctly predicted and which were not.</span>
 It compares the actual labels (true values) with the predicted labels from the model, helping to identify errors.

- Each row represents the actual class and each column represents the predicted class (scikit-learn's convention).
- The diagonal elements represent the correctly classified observations.
- The off-diagonal elements represent the misclassified observations.
  
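1️⃣ True Positives (TP): Correctly predicted positive cases.
- Used for the true positive rate (recall, sensitivity): TP / (TP + FN)

2️⃣ True Negatives (TN): Correctly predicted negative cases.
- Used for the true negative rate (specificity): TN / (TN + FP)

3️⃣ False Positives (FP): Incorrectly predicted as positive (Type I error).
- Used for the false positive rate (1 - specificity): FP / (FP + TN)

4️⃣ False Negatives (FN): Incorrectly predicted as negative (Type II error).
- Used for the miss rate (false negative rate, 1 - recall): FN / (FN + TP)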

**Model evaluation**

`from sklearn.metrics import confusion_matrix`

`cm = confusion_matrix(y_test, y_pred)  # rows = actual classes, columns = predicted classes`
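
For a binary problem, the four counts can be unpacked straight from the matrix with `ravel()`, which returns them in the order TN, FP, FN, TP. The labels below are the same made-up example used in the precision and recall sections; the rates follow the formulas listed above.

```python
from sklearn.metrics import confusion_matrix

# Made-up example labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 5]]

# Binary case: ravel() flattens row by row -> TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

recall = tp / (tp + fn)          # true positive rate / sensitivity
specificity = tn / (tn + fp)     # true negative rate
fpr = fp / (fp + tn)             # false positive rate = 1 - specificity
miss_rate = fn / (fn + tp)       # false negative rate = 1 - recall

print(f"recall={recall:.2f}, specificity={specificity:.2f}, FPR={fpr:.2f}, miss rate={miss_rate:.2f}")
```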

**Visualize the confusion matrix**

`import matplotlib.pyplot as plt`

`from sklearn.metrics import ConfusionMatrixDisplay`

`_, ax = plt.subplots()`

`display_cm = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['not target variable', 'target variable'])`

`display_cm.plot(ax=ax)  # draws the matrix and labels both axes with display_labels`

`ax.tick_params(labelsize=8)  # shrink the tick-label font`

`plt.show()`

             
         



<img src="confusionmatrix.png" alt="Confusion Matrix" style="width:800px;"/>


### <font color='coral'><b>Precision - classification models</b></font>

- When the model says something is positive, how often is it right?
- A key performance metric that tells <span style="background-color: peachpuff;">you how many of the predicted positive cases were actually positive.</span>


<span style="background-color: peachpuff;">Precision = True Positives / (True Positives + False Positives)</span>

Important when false positives are costly (e.g., spam filtering, where sending a legitimate email to the spam folder is expensive).

 * High precision: few false positives among the cases the model labels positive, so its positive predictions can be trusted.
 * Low precision: many of the cases the model labels positive are actually negative.

**Metric** 

`from sklearn.metrics import precision_score`

`# Example: Actual vs. Predicted labels`

`y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # Actual labels`

`y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # Predicted labels`

`# Calculate Precision`

`precision = precision_score(y_true, y_pred)`

`print(f'Precision: {precision:.2f}')`



### <font color='coral'><b>Recall - classification models</b></font>

- When something is actually positive, how often does the model predict it as positive?
- A key performance metric in machine learning, particularly for classification tasks. It **measures the ability** of a model to identify all relevant instances within a dataset. In other words, recall tells you how many of the actual positive instances were correctly identified by the model.

<span style="background-color: peachpuff;">Recall = True Positives / (True Positives + False Negatives)</span>

- High recall: few positive cases are missed. This is generally good, but how much it matters depends on the goal of the project (e.g., it is critical in disease screening, where missing a sick patient is costly).

- Low recall: the model misses many positive cases, which is generally bad.

**Metric** 

`from sklearn.metrics import recall_score`

`# Example: Actual vs. Predicted labels`

`y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # Actual labels`

`y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # Predicted labels`

`# Calculate Recall`

`recall = recall_score(y_true, y_pred)`

`print(f'Recall: {recall:.2f}')`


### <font color='coral'><b> F1 Score - classification models </b></font>

 - A metric used to evaluate the performance of a classification model. It is the harmonic mean of precision and recall, providing a balance between the two. The F1 score is particularly useful when you need to balance the trade-offs between precision and recall, especially when the class distribution is uneven, as in medical diagnosis and fraud detection data: sick patients are far fewer than healthy ones, and fraudulent transactions are far fewer than non-fraudulent ones.

- Ranges from 0 to 1; 1 is the best possible score, reached only when both precision and recall are perfect.


<span style="background-color: peachpuff;">F1 score = 2 × (precision × recall) / (precision + recall)</span>
    
Precision: 80% (0.8)

Recall: 67% (0.67)

F1 Score = 2 × 0.8 × 0.67 / (0.8 + 0.67) = 1.072 / 1.47 ≈ 0.73

This score provides a single metric that balances both precision and recall, giving you a better overall measure of your model's performance.
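
Because F1 is a harmonic mean, it is pulled toward the lower of the two values. A tiny helper (the name `f1_from` is hypothetical) reproduces the worked example above and shows what happens when precision and recall are very unbalanced.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_from(0.8, 0.67), 2))   # 0.73, the worked example
print(round(f1_from(0.9, 0.1), 2))    # 0.18, far below the arithmetic mean of 0.5
```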

**Metric** 

`from sklearn.metrics import f1_score`

`# Example data: actual labels vs. predicted labels`

`y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # Actual values`

`y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # Predicted values`

`# Calculate F1-score`

`f1 = f1_score(y_true, y_pred)`

`print("F1 Score:", f1)`


### <font color='coral'><b>Precision- Recall Tradeoff - classification models </b></font>


- Use this when you're dealing with an imbalanced dataset. An imbalanced dataset is one where the classes are not represented equally. In other words, one class (often the negative class) has significantly more instances than the other class (often the positive class). This imbalance can pose challenges for machine learning models, as they may become biased towards the majority class and perform poorly on the minority class.

- the balance between precision and recall. Improving one often comes at the expense of the other. For example, increasing precision might decrease recall, and vice versa. This tradeoff is important when deciding which metric to prioritize based on the specific application.

- Example:
Imagine you have a binary classification model that predicts the probability of an email being spam. The model outputs a probability score between 0 and 1 for each email. The threshold determines at what probability value you classify an email as spam.

- Threshold = 0.5: If the predicted probability is greater than or equal to 0.5, classify the email as spam; otherwise, classify it as not spam.

- Lower Threshold (e.g., 0.3): More emails will be classified as spam, increasing recall but potentially decreasing precision.

- Higher Threshold (e.g., 0.7): Fewer emails will be classified as spam, increasing precision but potentially decreasing recall.

- Precision-Recall Curve: Plots Precision on the y-axis and Recall on the x-axis.

- Use Case: Particularly useful for imbalanced datasets where the positive class is much less frequent than the negative class.

- Interpretation: A curve closer to the top right corner indicates better performance. It helps you understand the trade-off between precision and recall.
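
The spam threshold example can be sketched with made-up labels and predicted probabilities (both assumptions for illustration). Raising the threshold trades recall for precision; `precision_recall_curve` repeats the same calculation for every distinct threshold, which is what the curve plots.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

# Made-up labels (1 = spam) and predicted spam probabilities
y_true = np.array([0, 0, 0, 0, 1, 0, 0, 1, 1, 1])
y_scores = np.array([0.10, 0.20, 0.25, 0.35, 0.40, 0.45, 0.60, 0.70, 0.75, 0.90])

results = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_scores >= threshold).astype(int)   # classify as spam at or above the threshold
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    results[threshold] = (p, r)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.57, recall=1.00
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.7: precision=1.00, recall=0.75

# Precision and recall at every threshold, plotted as a curve
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
plt.plot(recall, precision, marker="o")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```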

### <font color='coral'><b>Classification Report - classification models  </b></font>

- Provides a detailed per-class breakdown of model performance (precision, recall, F1-score, and support). <span style="background-color: peachpuff;">Helps evaluate imbalanced datasets more effectively than accuracy alone.</span>
 Essential for understanding misclassification rates across different classes. Run the classification report on both the training data and the test data: the training report shows how well the model learned patterns, and the test report shows how well the model generalizes to unseen data. Ideally, performance should be similar on both.
- Helps detect overfitting (if training performance is much higher than test performance). Highlights imbalances in precision, recall, and F1-score between different classes.
- If your dataset is balanced, accuracy alone might already give a clear picture and you might not need a classification report. Rule of thumb for myself: never rely on accuracy alone; keep evaluating the model.

**Model**

`from sklearn.linear_model import LogisticRegression`

`# Instantiate the model`

`logreg = LogisticRegression()`

`# Fit the model`

`logreg.fit(X_train, y_train)`

`# Predict class labels on the test data (use predict_proba for probabilities)`

`y_pred = logreg.predict(X_test)`


**Metric**

`from sklearn.metrics import classification_report`

`# training data: compare y_train with predictions made on X_train`

`y_train_pred = logreg.predict(X_train)`

`train_report = classification_report(y_train, y_train_pred)`

`print(train_report)`

`# test data: compare y_test with predictions made on X_test`

`test_report = classification_report(y_test, y_pred)`

`print(test_report)`
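
An end-to-end sketch on a synthetic imbalanced dataset from `make_classification` (about 90% / 10% classes; the dataset and settings are assumptions for illustration) that prints the training and test reports side by side. `output_dict=True` returns the same numbers as a dictionary if you want to compare them in code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

print("Training data")
print(classification_report(y_train, logreg.predict(X_train)))
print("Test data")
print(classification_report(y_test, logreg.predict(X_test)))

# Same test metrics as a nested dictionary keyed by class label
test_report = classification_report(y_test, logreg.predict(X_test), output_dict=True)
```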







### <font color='coral'><b>Receiver Operating Characteristics - classification models</b></font>



- ROC Curve: is a graph that shows how well a classification model distinguishes between positive and negative cases.
Axes: Plots the True Positive Rate (Recall) on the y-axis and the False Positive Rate (1 - Specificity) on the x-axis. Useful when the classes are balanced or when you want to understand the trade-off between true positives and false positives.

- Interpretation: A curve closer to the top left corner indicates better performance. The Area Under the Curve (AUC) summarizes the overall performance.


**True Positive Rate (TPR)** = **Recall**

TP / (TP + FN)
_________________
         
**True Negative Rate (TNR)** = **Specificity** 

- TN / (TN + FP)
_________________

**False Positive Rate (FPR)** = 1 - Specificity

FP / (FP + TN)


**Metric** 

`from sklearn.metrics import roc_curve, auc`

`import matplotlib.pyplot as plt`

`# Assuming you already have true labels (y_test) and predicted probabilities (y_scores)`

`fpr, tpr, _ = roc_curve(y_test, y_scores)`

`# Calculate AUC (Area Under the Curve)`

`roc_auc = auc(fpr, tpr)`

`# Plot the ROC curve`

`plt.figure(figsize=(6,6))`

`plt.plot(fpr, tpr, color='blue', label=f'ROC curve (AUC = {roc_auc:.2f})')`

`plt.plot([0, 1], [0, 1], color='gray', linestyle='--')  # Random guess line`

`plt.xlabel('False Positive Rate')`

`plt.ylabel('True Positive Rate')`

`plt.title('ROC Curve')`

`plt.legend(loc='lower right')`

`plt.show()`
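
The snippet above assumes `y_scores` already exists. This self-contained sketch, using synthetic data from `make_classification` and a `LogisticRegression` model (both assumptions), shows where the scores come from: `predict_proba(...)[:, 1]`, the probability of the positive class, rather than the hard 0/1 labels from `predict`.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class (column 1), not the hard 0/1 predictions
y_scores = model.predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], color="gray", linestyle="--")  # random-guess line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend(loc="lower right")
plt.show()
```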


### <font color='coral'><b>AUC Area Under the Curve - classification models </b></font>

- Area Under the Curve, is a single scalar value that summarizes the overall performance of a binary classification model. It is derived from the ROC curve. The AUC (Area Under the Curve) represents the entire area beneath the ROC curve, which plots True Positive Rate (TPR) on the y-axis and False Positive Rate (FPR) on the x-axis

AUC interpretation (a common rule of thumb):

- AUC = 1: Perfect ranking; some threshold separates the classes with no false positives or false negatives.
- AUC > 0.9: Excellent model performance.
- AUC between 0.8 and 0.9: Good model performance.
- AUC between 0.7 and 0.8: Fair model performance.
- AUC between 0.6 and 0.7: Poor model performance.
- AUC = 0.5: No discrimination ability, equivalent to random guessing (below 0.5 is worse than random).

**Metric** 

`from sklearn.metrics import roc_auc_score`

`# Assuming you already have true labels (y_test) and predicted probabilities for the positive class (y_scores), e.g. y_scores = model.predict_proba(X_test)[:, 1]`

`auc_score = roc_auc_score(y_test, y_scores)`

`print(f"AUC Score: {auc_score:.2f}")`
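
One way to read AUC: it equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case (ties count as half). A small sketch with made-up scores checks that pairwise count against `roc_auc_score`.

```python
from itertools import product

from sklearn.metrics import roc_auc_score

# Made-up labels and predicted probabilities
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for s, label in zip(y_scores, y_true) if label == 1]
neg = [s for s, label in zip(y_scores, y_true) if label == 0]

# Fraction of (positive, negative) pairs ranked correctly; ties count half
pairs = list(product(pos, neg))
manual_auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(manual_auc)                        # 0.75 (3 of 4 pairs ranked correctly)
print(roc_auc_score(y_true, y_scores))   # 0.75
```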




<div class="span5 alert alert-warning">
<h3>Clustering Model Evaluation Methods</h3>


### <font color='coral'><b>Elbow Method - KMeans </b></font>


The Elbow Method is a technique used to determine the optimal number of clusters (k) in K-Means clustering. It helps identify the point where adding more clusters no longer significantly improves the model’s performance.

**How It Works**

1️⃣ Inertia → The sum of squared distances between each point and its nearest cluster center. Lower inertia means your points are grouped more tightly. Inertia generally keeps falling as you add clusters, so the goal is not the lowest value but the point of diminishing returns.

2️⃣ Plot Inertia vs. Number of Clusters → You test different numbers of clusters (k) and measure how well they organize the data. Then, you plot the results to see how inertia changes.

3️⃣ Find the "Elbow" Point → Imagine bending your arm: there's a sharp change where it bends. In the plot, this "bend" shows where adding more clusters stops making a big difference in organizing the data. That point is a good candidate for the number of clusters.

**Metric**

`import matplotlib.pyplot as plt`

`from sklearn.cluster import KMeans`

`ks = range(1, 6)`

`inertias = []`

`for k in ks:`

    # Create a KMeans instance with k clusters: model
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    
    # Fit model to data
    model.fit(data)
    
    # Append the inertia to the list of inertias
    inertias.append(model.inertia_)
    
`# Plot ks vs inertias`

`plt.plot(ks, inertias, '-o')`

`plt.xlabel('number of clusters, k')`

`plt.ylabel('inertia')`

`plt.xticks(ks)`

`plt.show()`



### <font color='coral'><b>Cluster Validity Assessment using External Labels - KMeans </b></font>


Cluster validity assessment using external labels is the process of evaluating clustering quality by comparing predicted cluster assignments to actual known categories. This helps measure how well the algorithm groups similar data points relative to predefined labels. The data may first need a feature transformation, such as scaling or normalization. You can build a pipeline that applies the feature transformation, runs KMeans, and then produces the cluster predictions.

**Metric** 

`import pandas as pd`

`from sklearn.cluster import KMeans`

`# Create a KMeans model with 3 clusters: model`

`model = KMeans(n_clusters=3, n_init=10)`

`# Use fit_predict to fit model and obtain cluster labels: labels`

`labels = model.fit_predict(data)`

`# Create a DataFrame with labels and varieties as columns: df`

`df = pd.DataFrame({'labels': labels, 'varieties': varieties})`

`# Create crosstab: ct`

`ct = pd.crosstab(df['labels'],df['varieties'] )`

`# Display ct`

`print(ct)`
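
The pipeline idea above can be sketched with scikit-learn's bundled iris dataset standing in for `data` and `varieties` (an assumption; any labelled dataset works). `StandardScaler` handles the feature transformation and `Pipeline.fit_predict` runs KMeans on the scaled data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris stands in for the original data (assumption)
iris = load_iris()
data = iris.data
varieties = iris.target_names[iris.target]   # known species name for each row

# Scale the features, then cluster
pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=42))
labels = pipeline.fit_predict(data)

# Compare cluster labels with the known varieties
df = pd.DataFrame({"labels": labels, "varieties": varieties})
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```

A cluster that maps almost entirely onto one variety suggests the clustering agrees with the known labels; a cluster split across several varieties shows where it does not.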


### <font color='coral'><b>Cluster Validity Assessment using External Labels - Hierarchical Clustering</b></font>

- Evaluates how well the clustering results match known categories. It works by comparing the predicted cluster assignments to predefined labels.

**Metric**

`# Perform the necessary imports`

`import pandas as pd`

`from scipy.cluster.hierarchy import fcluster`

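`# Use fcluster to extract labels: labels`

`# mergings is the linkage matrix returned earlier by scipy.cluster.hierarchy.linkage`

`labels = fcluster(mergings, 6, criterion='distance')  # cut the dendrogram at height 6`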

`# Create a DataFrame with labels and varieties as columns: df`

`df = pd.DataFrame({'labels': labels, 'varieties': varieties})`

`# Create crosstab: ct`

`ct = pd.crosstab(df['labels'], df['varieties'])`

`# Display ct`

`print(ct)`
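
The snippet above assumes `mergings` already exists. A self-contained sketch, using iris as example data and complete linkage (both assumptions, not from the original project), builds the linkage matrix first and then asks `fcluster` for at most 3 clusters with `criterion="maxclust"` instead of a distance cut-off.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris

# Iris stands in for the original samples (assumption)
iris = load_iris()
samples = iris.data
varieties = iris.target_names[iris.target]

# Build the hierarchy: each row of mergings records one merge step
mergings = linkage(samples, method="complete")

# Cut the tree into at most 3 flat clusters
labels = fcluster(mergings, t=3, criterion="maxclust")

df = pd.DataFrame({"labels": labels, "varieties": varieties})
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```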




<div class="span5 alert alert-warning">
<h3>Improving Model Performance with Cross-Validation</h3>


### <font color='coral'><b>Cross Validation - evaluation method for most MLA (unseen data)</b></font>



Cross-validation is a technique used in machine learning to evaluate how well a model performs on unseen data. It involves splitting the dataset into multiple parts, training the model on some parts, and testing it on the remaining parts. This process is repeated several times to ensure the model's performance is consistent and reliable. It's like giving your model multiple "practice tests" before the final exam to see how well it does in different scenarios.


There are different ways to validate a model; these are the most common:

- Train-Test Split

- K-Fold Cross-Validation

- Leave-One-Out Cross-Validation (LOOCV)

- Stratified K-Fold Cross-Validation

- Repeated K-Fold Cross-Validation

- Time Series Cross-Validation

**Quick evaluation of a model**


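`from sklearn.model_selection import cross_val_score, train_test_split`

`from sklearn.linear_model import LinearRegression`

`# Load dataset`

`X, y = load_data()  # Replace with actual data loading method`

`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)`

`model = LinearRegression()  # example choice; replace with the model you want to evaluate`

`# cross-validate on the training set only, keeping the test set for the final check`

`cv_scores = cross_val_score(model, X_train, y_train, cv=5)`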

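**Custom cross-validation strategy: KFold** 
- You need to specify the model and, optionally, the evaluation method (`scoring`). If no scoring is given, `cross_val_score` uses the model's default `score` method (R² for regressors, accuracy for classifiers).

`from sklearn.model_selection import KFold, cross_val_score, train_test_split`

`from sklearn.linear_model import LinearRegression`

`# Load dataset`

`X, y = load_data()  # Replace with actual data loading method`

`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)`

`kf = KFold(n_splits=5, shuffle=True, random_state=42)`

`model = LinearRegression()  # example choice; replace with the model you want to evaluate`

`# Perform cross-validation on the training set`

`cv_scores = cross_val_score(model, X_train, y_train, cv=kf)  # no scoring given, so the model's default score is used`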
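
A runnable version of the KFold recipe, with scikit-learn's bundled diabetes dataset standing in for `load_data()` (an assumption). Because no `scoring` is passed, each of the five scores is `LinearRegression`'s default R².

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Bundled regression dataset stands in for load_data() (assumption)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LinearRegression()

# No scoring given, so each fold is scored with the model's default (R² for regressors)
cv_scores = cross_val_score(model, X_train, y_train, cv=kf)
print(cv_scores.round(3), "mean:", cv_scores.mean().round(3))
```

The spread of the five scores is as informative as the mean: large differences between folds suggest performance depends heavily on which rows end up in training.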

**Normal Cross-Validation with Custom Scoring** 

`from sklearn.model_selection import cross_val_score`

`cross_val_score(model, the_input_data, target_variable_trying_to_predict, cv=5, scoring='accuracy')  # cv = number of folds; scoring = a metric name such as 'accuracy', 'f1', or 'roc_auc'`
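
A runnable sketch of custom scoring on a classifier, using scikit-learn's bundled breast cancer dataset and `StratifiedKFold` (which keeps the class ratio the same in every fold); the dataset, the scaling pipeline, and the fold settings are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling inside the pipeline is refit on each training fold, so no test-fold information leaks
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

acc_scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
f1_scores = cross_val_score(model, X, y, cv=skf, scoring="f1")

print("accuracy per fold:", acc_scores.round(3))
print("F1 per fold:      ", f1_scores.round(3))
```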

**Cross Validation For Multiple Models** 

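`import numpy as np`

`from sklearn.model_selection import cross_val_score`

`from sklearn.linear_model import LinearRegression`

`from sklearn.ensemble import RandomForestRegressor`

`from sklearn.svm import SVR`

`from xgboost import XGBRegressor  # requires the separate xgboost package`

`def cross_validate_model(model, X, y, cv=5):`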

    """Performs cross-validation and returns mean scores for RMSE, MAE, and R2."""
    
    # Cross-validated RMSE (negative values converted to positive)
    rmse_scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    rmse_mean = np.abs(rmse_scores).mean()
    
    # Cross-validated MAE
    mae_scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    mae_mean = np.abs(mae_scores).mean()
    
    # Cross-validated R2 Score
    r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    r2_mean = r2_scores.mean()
    
    return {"Cross-Validated RMSE": rmse_mean, "Cross-Validated MAE": mae_mean, "Cross-Validated R2-SCORE": r2_mean}

`models = {`

    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, max_features=None, random_state=0),
    "Support Vector Regressor": SVR(kernel='rbf', C=1.0, epsilon=0.1),
    "XGBoost Regressor": XGBRegressor(n_estimators=100)
`}`

`# Run cross-validation for each model`

`cv_results = {name: cross_validate_model(model, X_train, y_train) for name, model in models.items()}`

`# Print cross-validation results`

`for model_name, metrics in cv_results.items():`

    print(f"{model_name}: {metrics}")


### Overfitting - too complex 

The model learns the training data too well, including noise and irrelevant details.

It performs very well on the training data but poorly on new, unseen data.

Think of it like memorizing answers for an exam instead of understanding the concepts.

**Bias-Variance Tradeoff - Overfitting = Low Bias, High Variance**


**Signs of Overfitting:**

- High accuracy on training data but low accuracy on test data.

- The model is too complex with too many features.

- Solution: Use regularization (like L1/L2 penalties), simplify the model, or get more training data.
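
A minimal sketch of overfitting on synthetic noisy data (an assumption for illustration): an unlimited-depth `DecisionTreeRegressor` memorizes the training set (training R² of 1.0) but scores lower on the test set, while a depth-limited tree keeps the two scores closer together.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for depth in (3, None):   # None = grow the tree until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_r2 = r2_score(y_train, tree.predict(X_train))
    test_r2 = r2_score(y_test, tree.predict(X_test))
    scores[depth] = (train_r2, test_r2)
    print(f"max_depth={depth}: train R² = {train_r2:.2f}, test R² = {test_r2:.2f}")
```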

### Underfitting - too simple 

The model is too simple and fails to capture important patterns in the data.

It performs poorly on both training and test data.

**Bias-Variance Tradeoff - Underfitting = High Bias, Low Variance**


**Signs of Underfitting:**

- Low accuracy on both training and test data.

- Model predictions are too simplistic.

- Solution: Increase model complexity, use better features, or try a more advanced algorithm.
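
A minimal sketch of underfitting on synthetic curved data (an assumption: y = x² plus noise). A straight line cannot follow the curve and scores poorly on both training and test data; adding degree-2 polynomial features gives the model enough complexity to fit both.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (assumption): y = x² plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "linear": LinearRegression(),   # a straight line cannot bend: too simple
    "quadratic": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    scores[name] = (train_r2, test_r2)
    print(f"{name}: train R² = {train_r2:.2f}, test R² = {test_r2:.2f}")
```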