# Enhancing Linear Models: Exploring Polynomial Regression and Feature Engineering

## Objective
- Explore feature engineering as a way to boost the predictive power of your models.
- Learn how polynomial regression adapts linear regression techniques to capture complex, non-linear relationships.

## Library

In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

## Diving Deeper: Feature Engineering and the Power of Polynomial Regression

At its core, linear regression equips us with tools to construct models like:
$$f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \dots + w_{n-1}x_{n-1} + b \tag{1}$$

When data deviates from a linear trend, as with housing prices that vary non-linearly with home size, can our standard linear regression tools capture these curves? We can adjust the parameters $\mathbf{w}$ and $b$ in (1) to fit our training data, but with only the raw features the result is always a straight line (or a hyperplane), so this alone can't represent non-linear patterns.

## Polynomial Features


Polynomial features are a way to add complexity to linear models by including non-linear functions of the input features. By adding polynomial terms (e.g., squared or cubed terms) as new features to our dataset, we can model non-linear relationships while still using a linear regression algorithm, because the model remains linear in its weights.
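
As a minimal sketch of this idea, the cell below fits the same `LinearRegression` model twice: once on the raw feature and once with a hand-made squared column added. The data (a noise-free curve $y = 1 + x^2$) and the variable names are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical noise-free data following y = 1 + x^2
x = np.arange(0, 20, 1.0)
y = 1 + x**2

# Model A: the raw feature only, so the best fit is a straight line
X_linear = x.reshape(-1, 1)
line_model = LinearRegression().fit(X_linear, y)
mse_linear = mean_squared_error(y, line_model.predict(X_linear))

# Model B: the same algorithm, with an engineered x^2 column added
X_quad = np.column_stack([x, x**2])
quad_model = LinearRegression().fit(X_quad, y)
mse_quad = mean_squared_error(y, quad_model.predict(X_quad))

print(f"Raw feature MSE:      {mse_linear:.4f}")
print(f"With x^2 feature MSE: {mse_quad:.4f}")
print("Weights:", quad_model.coef_, "Intercept:", quad_model.intercept_)
```

The learning algorithm is unchanged in both fits; only the feature matrix differs. On this toy data the second model should recover a weight near 1 on the squared column and an intercept near 1.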

For instance, if we have one feature $x$, adding polynomial features means adding terms such as $x^2$ and $x^3$ as new features.

A degree-2 polynomial transformation converts $x$ to:

$x,\ x^2$

A degree-3 polynomial transformation converts it to:

$x,\ x^2,\ x^3$
 
This concept can be extended to multiple features as well.

## Example in Python

Let's take a simple example using Python's scikit-learn library, which provides a convenient utility called `PolynomialFeatures` to generate these polynomial and interaction terms. By default it also prepends a constant column of ones (the bias term).

In [None]:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Sample data
X = np.array([
    [2],
    [3],
    [4]
])

# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Each row is [1, x, x^2]: [1, 2, 4], [1, 3, 9], [1, 4, 16]
print(X_poly)

## Selecting Features

When modeling data, it might not be immediately evident which polynomial degree or which feature transformations would be most appropriate. Choosing the right feature set is critical for preventing underfitting (when the model is too simple to capture the underlying patterns) and overfitting (when the model is excessively complex and captures the noise in the data).

Feature selection methods can help determine which features (or combinations of features) are most predictive. In the context of polynomial regression, the question might be about the correct polynomial degree to use or which polynomial terms to include.

For an equation such as $y = w_0x_0 + w_1x_1^2 + w_2x_2^3 + b$, the form suggests adding polynomial terms up to the third degree, but is this the optimal choice? Should we consider even higher degrees, or are some of these terms unnecessary?
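
One common way to approach these questions is to compare candidate degrees on data the model was not trained on. The sketch below is one possible setup, not a prescribed method: the noisy sine data, the 70/30 split, the seed, and the range of degrees are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

# Hypothetical data: a sine curve with some noise
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)
X = x.reshape(-1, 1)

# Hold out 30% of the samples for validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

val_errors = {}
for degree in range(1, 8):
    # include_bias=False because LinearRegression fits its own intercept
    poly = PolynomialFeatures(degree, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    # Reuse the fitted transformer on the validation inputs
    y_val_pred = model.predict(poly.transform(X_val))
    val_errors[degree] = mean_squared_error(y_val, y_val_pred)

best_degree = min(val_errors, key=val_errors.get)
for degree, err in val_errors.items():
    print(f"Degree {degree} validation MSE: {err:.4f}")
print("Lowest validation MSE at degree", best_degree)
```

A degree that is too low shows a high validation error (underfitting), while a degree that is too high tends to stop improving or get worse on the held-out samples (overfitting). Cross-validation would give a more stable estimate than a single split.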

## Python example
Let's say we have some data, and we want to determine whether adding x^2 and x^3 terms improves the fit of our model. We'll use simple linear regression as our model for demonstration.

In [None]:
# Generate some sample data (seeded so the results are reproducible)
np.random.seed(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * np.random.randn(100)  # sine curve with some noise

# Reshape x for the model training
X = x.reshape(-1, 1)

# Using polynomial features
degrees = [1, 2, 3]  # Linear, quadratic, and cubic
errors = []

for degree in degrees:
    poly = PolynomialFeatures(degree)
    X_poly = poly.fit_transform(X)
    
    model = LinearRegression().fit(X_poly, y)
    y_pred = model.predict(X_poly)
    
    plt.plot(x, y_pred, label=f'Degree {degree}')
    errors.append(mean_squared_error(y, y_pred))

plt.scatter(x, y, marker='x', c='r', label="Noisy Data")
plt.legend()
plt.show()

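# Note: this is the training MSE, which never increases as the degree grows,
# so on its own it cannot reveal overfitting
for degree, error in zip(degrees, errors):
    print(f"Degree {degree} MSE: {error:.4f}")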