# Tutorial: Polynomial Transformation, k-Fold Cross-Validation, and Feature Scaling

## Table of Contents
1. [Polynomial Transformation](#section1)
    * Introduction
    * Example with Code
    * Bias in Polynomial Features
    
2. [k-Fold Cross-Validation](#section2)
    * Introduction
    * Mathematical Explanation
    * Example with Code
    
3. [Standard Scaler: How it Works](#section3)
    * Introduction
    * Mathematical Explanation
    * Example with Code

---

<a id='section1'></a>
## 1. Polynomial Transformation
### Introduction
Polynomial transformation adds higher-degree features to a dataset so that a linear model can fit non-linear relationships.

`PolynomialFeatures` is a scikit-learn preprocessing class used for feature engineering. It generates a new feature matrix consisting of all polynomial combinations of the original features up to a specified degree. For example, with `degree=2` a feature vector `x = [a, b]` becomes `[1, a, b, a^2, ab, b^2]`.

### Example with Code


In [11]:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Generate a simple synthetic dataset
X = np.array([[2,4], [3,5], [1,8]])

# Instantiate the PolynomialFeatures class with degree=2
poly = PolynomialFeatures(degree=2)

# Transform the data to include polynomial features and a bias column (column of ones)
X_poly = poly.fit_transform(X)

# Display the transformed data
print(X_poly)


[[ 1.  2.  4.  4.  8. 16.]
 [ 1.  3.  5.  9. 15. 25.]
 [ 1.  1.  8.  1.  8. 64.]]


### Bias in Polynomial Features

Polynomial regression models usually include a bias (intercept) term, represented as a constant feature $x_0 = 1$.

In `PolynomialFeatures`, this term is controlled by the `include_bias` parameter. When it is `True` (the default), a bias column filled with ones is added as the first column of the output. Conceptually, every sample gains an additional feature $x_0 = 1$.

Sometimes this column is redundant, for example when the downstream estimator already fits its own intercept, as `LinearRegression` does by default. To omit it, set `include_bias=False`.
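
As a brief illustration of the parameter, the sketch below reuses the tutorial's small dataset and switches the bias column off. The variable names `poly_no_bias` and `X_no_bias` are chosen here for illustration only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Same small dataset as in the example above
X = np.array([[2, 4], [3, 5], [1, 8]])

# include_bias=False drops the leading column of ones
poly_no_bias = PolynomialFeatures(degree=2, include_bias=False)
X_no_bias = poly_no_bias.fit_transform(X)

# Five columns remain per sample: a, b, a^2, ab, b^2
print(X_no_bias.shape)
print(X_no_bias)
```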


<a id='section2'></a>
## 2. k-Fold Cross-Validation

### Introduction
k-Fold cross-validation divides the dataset into $k$ subsets of roughly equal size, called folds. The model is trained on $k-1$ of these folds and validated on the remaining fold. This process is repeated $k$ times, each time with a different fold as the validation set.

### Mathematical Explanation
Let's denote our dataset as $D$, which we divide into $k$ folds $D_1, D_2, \ldots, D_k$. For each fold $i$:
1. Train the model on $D \setminus D_i$ (all data except the $i$-th fold)
2. Validate the model on $D_i$

After k iterations, average the performance across all folds to get a final model performance measure.
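
The averaging step can be written compactly. The notation here is introduced only for illustration: $E_i$ stands for whatever validation error is measured on fold $D_i$ (for example, mean squared error), and $\text{CV}_k$ for the resulting estimate.

$$
\text{CV}_k = \frac{1}{k} \sum_{i=1}^{k} E_i
$$

With $k = 3$ and fold errors $E_1, E_2, E_3$, this is simply $(E_1 + E_2 + E_3) / 3$.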

### Example with Code

In [13]:
# Import the necessary module from scikit-learn to perform K-fold cross-validation
from sklearn.model_selection import KFold

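# X and np come from the polynomial example above.
# Hypothetical target values, one per row of X (assumed here for illustration)
y = np.array([10, 20, 30])

# Create an instance of KFold class specifying the number of splits (folds)
# In this example, we are using 3 folds for our cross-validation
kf = KFold(n_splits=3)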

# Iterate through each fold using the split method provided by the KFold object.
# The split method returns indices for the train and test (validation) sets for each fold.
for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:", test_index)
    
    # Using the indices provided, we split our dataset (both features and target) 
    # into training and validation sets for the current fold.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # --- Placeholder for model-specific steps ---
    # 1. Fit the model using X_train and y_train
    # 2. Predict on X_test
    # 3. Compute the error between y_test and the predictions
    # 4. Save the computed error for later analysis

# After looping through all folds, compute and display the average error (assuming errors are saved in a list)

Train: [1 2] Validation: [0]
Train: [0 2] Validation: [1]
Train: [0 1] Validation: [2]
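
The placeholder steps in the loop above could be filled in as follows. This is only a sketch under stated assumptions: the synthetic dataset, the choice of `LinearRegression` as the model, and mean squared error as the metric are all illustrative and not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical synthetic data: 12 samples, 2 features, linear target plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 2))
noise = rng.normal(scale=0.1, size=12)
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 1.0 + noise

kf = KFold(n_splits=3)
fold_errors = []

for train_index, val_index in kf.split(X_demo):
    # 1. Fit a fresh model on the training folds
    model = LinearRegression()
    model.fit(X_demo[train_index], y_demo[train_index])

    # 2. Predict on the held-out validation fold
    predictions = model.predict(X_demo[val_index])

    # 3. and 4. Compute the validation error and store it
    fold_errors.append(mean_squared_error(y_demo[val_index], predictions))

# Average the per-fold errors to get the cross-validated estimate
mean_error = np.mean(fold_errors)
print("Per-fold MSE:", fold_errors)
print("Mean MSE:", mean_error)
```

Creating a new model inside the loop keeps each fold independent, so nothing learned on one split carries over to the next.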


### Shuffling the Data
By default, `KFold` does not shuffle the dataset before splitting. To shuffle the data before forming the folds, set `shuffle=True` when creating the `KFold` object. Setting `random_state` as well makes the shuffle reproducible across runs. Note that with only three samples a shuffle can leave the order unchanged, so the output below may look identical to the unshuffled case.


In [12]:
from sklearn.model_selection import KFold

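# Hypothetical target values, one per row of X (assumed here for illustration)
y = np.array([10, 20, 30])

kf = KFold(n_splits=3, shuffle=True, random_state=42)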
for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]


Train: [1 2] Validation: [0]
Train: [0 2] Validation: [1]
Train: [0 1] Validation: [2]
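
The effect of shuffling is easier to see on a slightly larger dataset. The sketch below uses a hypothetical six-sample array (`X_demo`, not part of the original example) and compares the validation folds with and without shuffling. It also checks that every sample is used for validation exactly once in both cases.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical 6-sample dataset, used only to make the fold layout visible
X_demo = np.arange(12).reshape(6, 2)

kf_plain = KFold(n_splits=3)
kf_shuffled = KFold(n_splits=3, shuffle=True, random_state=42)

# Collect only the validation indices of each fold
plain_val = [val.tolist() for _, val in kf_plain.split(X_demo)]
shuffled_val = [val.tolist() for _, val in kf_shuffled.split(X_demo)]

print("Unshuffled validation folds:", plain_val)
print("Shuffled validation folds:  ", shuffled_val)

# Either way, every sample index lands in exactly one validation fold
all_shuffled = sorted(i for fold in shuffled_val for i in fold)
print("Each index validated once:", all_shuffled == list(range(len(X_demo))))
```

Without shuffling, the folds are consecutive blocks of indices, which can be a problem when the rows are ordered (for example, sorted by target value).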


<a id='section3'></a>
## 3. Standard Scaler: How it Works

### Introduction

`StandardScaler` standardizes each feature (column) independently by subtracting its mean and dividing by its standard deviation, so that every feature ends up with zero mean and unit variance.

### Mathematical Explanation

Given a feature $x$, the standard scaler applies:

$x' = \frac{x - \text{mean}(x)}{\text{std}(x)}$

Where:
- $x'$ is the transformed feature
- $\text{mean}(x)$ is the mean of feature $x$
- $\text{std}(x)$ is its standard deviation, computed as the population standard deviation (dividing by $n$, not $n-1$).


### Example with Code

In [14]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)


[[ 0.         -0.98058068]
 [ 1.22474487 -0.39223227]
 [-1.22474487  1.37281295]]
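
The formula can be checked by hand with NumPy. The sketch below applies it column by column to the same dataset and compares the result with the scaler's output; the variable name `X_manual` is chosen here for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same small dataset as in the examples above
X = np.array([[2, 4], [3, 5], [1, 8]])

# Per-column mean and population standard deviation (NumPy's default, ddof=0)
mean = X.mean(axis=0)
std = X.std(axis=0)

# Apply x' = (x - mean(x)) / std(x) to every column at once
X_manual = (X - mean) / std

# Compare against scikit-learn's result
X_scaled = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_scaled))

# After scaling, each column has mean 0 and standard deviation 1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

For the first column `[2, 3, 1]`, the mean is 2 and the population standard deviation is $\sqrt{2/3} \approx 0.816$, which gives the values `0`, `1.2247` and `-1.2247` seen in the output above.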
