# Feature Engineering

This notebook covers creating new features for the AI4I 2020 Predictive Maintenance Dataset and selecting the most important ones with Partial Least Squares (PLS) regression.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from features import load_data, preprocess_data, select_features, plot_feature_importance

# Set plot style (set_theme is the current name; sns.set is kept only as an alias)
sns.set_theme(style='whitegrid')

## Load and Preprocess the Dataset

In [None]:
# Load and preprocess the dataset
data = load_data('../data/ai4i2020.csv')
X_train, X_test, y_train, y_test, feature_names = preprocess_data(data)
data.head()
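
Feature creation is not shown explicitly here; if it happens, it happens inside `preprocess_data`, whose body is not part of this notebook. As a rough illustration only, derived features for this dataset might look like the sketch below. The column names assume the header of the public AI4I 2020 CSV, and `add_derived_features` is a hypothetical helper, not a function from the `features` module.

```python
import numpy as np
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three physically motivated columns (hypothetical helper)."""
    out = df.copy()
    # Temperature gap between the process and the ambient air, in kelvin.
    out["Temp diff [K]"] = out["Process temperature [K]"] - out["Air temperature [K]"]
    # Mechanical power in watts: torque [Nm] times angular speed [rad/s].
    omega = out["Rotational speed [rpm]"] * 2 * np.pi / 60
    out["Power [W]"] = out["Torque [Nm]"] * omega
    # Combined load on the tool: wear time multiplied by torque.
    out["Wear x torque [min Nm]"] = out["Tool wear [min]"] * out["Torque [Nm]"]
    return out


# Tiny synthetic frame standing in for the real CSV (values are made up).
demo = pd.DataFrame({
    "Air temperature [K]": [298.0, 300.0],
    "Process temperature [K]": [308.0, 309.5],
    "Rotational speed [rpm]": [1500.0, 1200.0],
    "Torque [Nm]": [40.0, 60.0],
    "Tool wear [min]": [0.0, 200.0],
})
demo_features = add_derived_features(demo)
```

The helper returns a copy, so the raw frame stays unchanged and the derived columns can be dropped or kept depending on what the selection step reports.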

## PLS Regression for Feature Selection

In [None]:
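# Select features using PLS regression
X_train_selected, X_test_selected, selected_features_mask = select_features(X_train, y_train, X_test)
# Convert to an array first so boolean-mask indexing also works if feature_names is a plain list
selected_features = np.asarray(feature_names)[selected_features_mask]
selected_features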

## Visualize Selected Features

In [None]:
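# Plot the importance of the selected features.
# Fit PLS on the selected columns so the model's coefficients line up with selected_features.
from sklearn.cross_decomposition import PLSRegression
n_components = min(10, X_train_selected.shape[1])  # cannot exceed the number of features
pls = PLSRegression(n_components=n_components)
pls.fit(X_train_selected, y_train)
plot_feature_importance(pls, selected_features)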

## Save the Selected Features

In [None]:
# Save the selected features data
np.savez(
    '../data/selected_features_data.npz',
    X_train=X_train_selected,
    X_test=X_test_selected,
    y_train=y_train,
    y_test=y_test,
)
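
A later training notebook would need to read this archive back. The sketch below shows a round trip through `np.savez` and `np.load` using small stand-in arrays and a temporary directory; the shapes are made up, and only the key names mirror the ones used above.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays with made-up shapes; in practice these are the selected splits.
X_train_demo = np.zeros((8, 3))
X_test_demo = np.zeros((2, 3))
y_train_demo = np.zeros(8)
y_test_demo = np.zeros(2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "selected_features_data.npz")
    np.savez(path, X_train=X_train_demo, X_test=X_test_demo,
             y_train=y_train_demo, y_test=y_test_demo)
    # np.load returns a lazy archive; copy the arrays out before it is closed.
    with np.load(path) as archive:
        restored = {name: archive[name] for name in archive.files}
```

Each keyword passed to `np.savez` becomes a key in the archive, so the training code can retrieve the splits by name, for example `restored["X_train"]`.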

## Conclusion

In this notebook, we selected features with Partial Least Squares (PLS) regression, visualized their importance, and saved the reduced training and test sets to `../data/selected_features_data.npz` for model training.