# PolynomialFeatures

1. **Purpose**: `PolynomialFeatures` generates polynomial and interaction features. It lets a linear model capture nonlinear relationships by **including not only the original features but also their powers and interaction terms**, up to a chosen degree.
2. **Polynomial Features**: These are features created by raising existing features to a power. For example, if your original feature is $x$, polynomial features might include $x^2, x^3$, etc.
3. **Interaction Features**: These are features created by multiplying two or more features together. For example, if you have two features $x$ and $y$, an interaction feature could be $x \times y$.

## Predict House Prices


In [3]:
from sklearn.preprocessing import PolynomialFeatures

# Features: size (in square feet), age (in years)
X = [[2400, 10], [3000, 3], [1500, 15]]
# Target: Price (in thousands)
y = [500, 700, 300]

# Apply PolynomialFeatures
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
X_transformed = poly.fit_transform(X)

# Model Training 
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_transformed, y)

# Prediction
new_house = poly.transform([[2000, 5]])
predicted_price = model.predict(new_house)
print(f"Predicted Price: {predicted_price}")

Predicted Price: [412.15104733]


In [4]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

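# Predict on the training data itself (3 samples, 6 polynomial columns),
# so these metrics measure fit to the training set, not generalization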
y_pred = model.predict(X_transformed)

# Calculate MAE
mae = mean_absolute_error(y, y_pred)
print(f"Mean Absolute Error: {mae}")

# Calculate MSE
mse = mean_squared_error(y, y_pred)
print(f"Mean Squared Error: {mse}")

# Calculate RMSE
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse}")

# Calculate R-squared
r2 = r2_score(y, y_pred)
print(f"R-squared: {r2}")

Mean Absolute Error: 3.789561257387201e-14
Mean Squared Error: 4.308232357047019e-27
Root Mean Squared Error: 6.563712636189231e-14
R-squared: 1.0
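
The near-zero errors and the R-squared of 1.0 above are computed on the same three houses the model was trained on. A degree-2 expansion of two features yields six columns, so the linear model has more coefficients than samples and can reproduce the training targets exactly. The sketch below rebuilds the same toy data to check this; the `1e-6` tolerance is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same toy data as the house-price cell above
X = [[2400, 10], [3000, 3], [1500, 15]]
y = [500, 700, 300]

X_poly = PolynomialFeatures(degree=2).fit_transform(X)
n_samples, n_columns = X_poly.shape
print(n_samples, n_columns)  # 3 6

# More columns than rows: least squares can interpolate the training targets
model = LinearRegression().fit(X_poly, y)
max_train_error = np.max(np.abs(model.predict(X_poly) - np.array(y)))
print(max_train_error < 1e-6)
```

A perfect training score here says nothing about how the model behaves on unseen houses; that requires held-out data, as in the pipeline example in the next section.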


# Pipeline Tools
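- A `Pipeline` chains preprocessing steps and a final estimator into a single object, so every step runs in the same fixed order during both fitting and prediction. This is particularly useful for cross-validation and grid search, where the preprocessing steps are refit on each training fold.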

In [1]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

# Generate some data to work with
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create a pipeline that first standardizes the data, then applies polynomial features,
# and finally fits a linear regression model
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])

# Use the pipeline to fit the model on the training data
pipeline.fit(X_train, y_train)

# Now you can use the pipeline to make predictions
y_pred = pipeline.predict(X_test)

# The pipeline can also be used to evaluate the model
print(f"Training set score: {pipeline.score(X_train, y_train)}")
print(f"Test set score: {pipeline.score(X_test, y_test)}")


Training set score: 0.999999111738292
Test set score: 0.9999993299106112
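
The cross-validation and grid search use mentioned at the top of this section can be sketched as follows. This is a minimal example under assumptions: `random_state=0`, `cv=5`, and the candidate degrees `[1, 2, 3]` are choices made here for illustration and do not come from the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data; random_state=0 is an assumption added for reproducibility
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2)),
    ("linear", LinearRegression()),
])

# 5-fold cross-validation: the scaler and the polynomial expansion
# are refit on each training fold, then applied to the held-out fold
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV R^2: {scores.mean():.4f}")

# Grid search over a step's parameter, addressed as "<step name>__<parameter>"
param_grid = {"poly__degree": [1, 2, 3]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print(f"Best degree: {search.best_params_['poly__degree']}")
```

Because the whole pipeline is passed to `cross_val_score` and `GridSearchCV`, the scaling statistics are never computed from the held-out fold, which avoids leaking test information into preprocessing.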
