# Principal Component Regression (PCR)

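Principal Component Regression (PCR) combines Principal Component Analysis (PCA) with linear regression: PCA first transforms the correlated predictors into a set of linearly uncorrelated components, and a linear model is then fit on a subset of those components. It is particularly useful when the predictors are multicollinear, because regressing on a few uncorrelated components reduces the dimensionality of the data and avoids the unstable coefficient estimates that collinearity causes.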
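
To make the decorrelation claim concrete, here is a minimal sketch on made-up data. The variables `x1` and `x2` and the noise level are illustrative assumptions, not part of this notebook's dataset. Two nearly collinear features are projected onto their principal components, and the correlation between the resulting scores is compared with the correlation between the original features.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: x2 is almost a copy of x1, so the features are collinear
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
features = np.column_stack([x1, x2])

# Correlation between the two original features (close to 1)
corr_before = np.corrcoef(features, rowvar=False)[0, 1]

# Project onto the principal components and measure the correlation again
scores = PCA(n_components=2).fit_transform(features)
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]

print(f"Correlation between original features: {corr_before:.3f}")
print(f"Correlation between PCA scores:        {corr_after:.3e}")
```

The second value should be zero up to floating-point error, which is what "linearly uncorrelated components" means in practice.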

In [None]:
# Principal Component Regression (PCR) Notebook

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Generate synthetic data
np.random.seed(0)
X = 2 - 3 * np.random.rand(100)
y = X**2 + np.random.randn(100) * 0.5

# Reshape the data
X = X[:, np.newaxis]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

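# Create polynomial features (x and x^2). The constant bias column is omitted
# because PCA centers the data, so a constant feature contributes no variance.
poly = PolynomialFeatures(degree=2, include_bias=False)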
X_poly = poly.fit_transform(X_train)

# Apply PCA to the polynomial features
pca = PCA(n_components=1)  # Adjust the number of components based on explained variance
X_pca = pca.fit_transform(X_poly)

# Train the linear regression model on PCA-transformed features
model = LinearRegression()
model.fit(X_pca, y_train)

# Predict using the model
X_test_poly = poly.transform(X_test)
X_test_pca = pca.transform(X_test_poly)
y_pred = model.predict(X_test_pca)

# Visualize the results
plt.scatter(X_test, y_test, color='red', label='Actual data')
plt.scatter(X_test, y_pred, color='blue', label='Predicted data')
plt.title('Principal Component Regression Results')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

# Calculate and print model performance metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Model Performance Metrics:")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R²): {r2}")

# Print explained variance ratio of the principal components
print("Explained variance ratio of the principal components:", pca.explained_variance_ratio_)

### Explanation of Code Components

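1. **Data Generation**: As in the previous examples, synthetic data is generated for polynomial regression: 100 values of `X` drawn uniformly from the interval (-1, 2], with `y = X**2` plus Gaussian noise of standard deviation 0.5.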

2. **Data Preprocessing**: The dataset is split into training and testing sets, and polynomial features are created.

3. **PCA Transformation**: PCA is applied to the polynomial features to reduce dimensionality. You can adjust the number of components in `PCA(n_components=1)` based on the desired level of variance explained.

4. **Model Training**: A linear regression model is trained on the PCA-transformed features.

5. **Prediction**: Predictions are made on the test set.

6. **Visualization**: A scatter plot shows the actual and predicted `y` values against `X` for the test set.

7. **Performance Measurement**: Model performance is evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) metrics.

8. **Explained Variance**: The explained variance ratio of the principal components is printed to understand how much variance is captured by the components used.
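
Steps 2 through 5 can also be chained in a scikit-learn `Pipeline`, so that the transforms fitted on the training data are applied to the test data automatically. The sketch below is one possible arrangement, not the notebook's own code. It regenerates the same synthetic data and adds a `StandardScaler` step, which is an assumption of this example (PCA is sensitive to feature scale); it also uses `include_bias=False`, since a constant column carries no variance once PCA centers the data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as in the notebook cell
np.random.seed(0)
X = (2 - 3 * np.random.rand(100))[:, np.newaxis]
y = X[:, 0] ** 2 + np.random.randn(100) * 0.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Polynomial features -> scaling (added assumption) -> PCA -> linear regression
pcr = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    PCA(n_components=1),
    LinearRegression(),
)
pcr.fit(X_train, y_train)

# predict() applies every fitted transform to the test inputs before regressing
y_pred = pcr.predict(X_test)
print(f"R-squared on the test set: {r2_score(y_test, y_pred):.3f}")
```

Because of the added scaling step, the metrics from this sketch will generally differ from those printed by the notebook cell.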

### Note
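You can adjust the number of components in PCA to balance model complexity against performance. Keeping more components captures more of the variance in the features, but it also gives the regression more parameters to fit, which can lead to overfitting. Keep in mind that PCA ranks components by feature variance alone, without looking at `y`, so the highest-variance components are not necessarily the most predictive ones.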
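
One common way to pick the number of components is cross-validation. The following is a minimal sketch under stated assumptions: the step names (`poly`, `scale`, `pca`, `reg`), the `StandardScaler` step, the 5-fold setting, and the candidate grid are choices made for this example, not part of the notebook. With degree-2 features and no bias column there are only two features, so the grid is limited to 1 or 2 components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as in the notebook cell
np.random.seed(0)
X = (2 - 3 * np.random.rand(100))[:, np.newaxis]
y = X[:, 0] ** 2 + np.random.randn(100) * 0.5

# Hypothetical step names; the scaler is an added assumption of this sketch
pipeline = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("reg", LinearRegression()),
])

# Try each candidate number of components with 5-fold cross-validation
search = GridSearchCV(
    pipeline,
    param_grid={"pca__n_components": [1, 2]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

best_k = search.best_params_["pca__n_components"]
print("Best number of components:", best_k)
print("Cross-validated MSE:", -search.best_score_)
```

On datasets with many features, the grid would typically cover a wider range, and the cross-validated error is then compared against the explained variance ratio to see where extra components stop helping.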