<font color="Yellow" size="6"> Feature Union</font>

FeatureUnion applies several transformers to the same dataset independently and concatenates their outputs column-wise into a single feature matrix. It is useful when you want to extract different kinds of features from the data (e.g., scaled features and dimensionality-reduced features) and feed them to a model as one combined input.
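
To make the column-wise concatenation concrete, here is a minimal sketch on a tiny made-up array. The array `X_toy` and the choice of `n_components=1` are assumptions for illustration only; they are not part of the example that follows.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): 4 samples, 2 features
X_toy = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 35.0],
    [4.0, 40.0],
])

# Each transformer is fitted on the same input X_toy
union = FeatureUnion([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=1)),
])

combined = union.fit_transform(X_toy)

# 2 scaled columns followed by 1 principal component
print(combined.shape)  # (4, 3)
```

The first two columns of `combined` are the StandardScaler output and the last column is the PCA output, in the order the transformers were listed.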

Using FeatureUnion to Apply PCA and Scaling Simultaneously

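In this example, FeatureUnion applies two transformers to the same input. Each one receives the original features, so PCA here is fitted on the unscaled data, not on the StandardScaler output:

- StandardScaler to standardize the data (i.e., rescale each feature to zero mean and unit variance).
- PCA (Principal Component Analysis) to reduce the dimensionality to two components.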

In [1]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.metrics import accuracy_score

# Load dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a FeatureUnion with two transformers; both receive the same input
# and their outputs are concatenated column-wise
feature_union = FeatureUnion([
    ('scaler', StandardScaler()),  # Branch 1: standardized copy of all 13 features
    ('pca', PCA(n_components=2))   # Branch 2: 2 principal components of the raw features
])

# Define a pipeline with FeatureUnion for feature extraction and RandomForestClassifier for modeling
pipeline = Pipeline([
    ('features', feature_union),      # Step 1: Apply FeatureUnion
    ('classifier', RandomForestClassifier(random_state=42))  # Step 2: Classifier step
])

# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = pipeline.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0
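
Because both branches of a FeatureUnion receive the same input, the PCA branch in the example above is fitted on unscaled features. The sketch below first checks the shape of the combined feature matrix, then shows one possible variant in which PCA runs on standardized data by nesting a Pipeline inside the union. The variant and the names `raw_union`, `scaled_pca`, and `scaled_union` are assumptions for illustration; they are not part of the original example, and no claim is made here about which version gives better accuracy.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same union as in the example: PCA is fitted on the raw features
raw_union = FeatureUnion([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
raw_features = raw_union.fit_transform(X_train)
print(raw_features.shape)  # 13 scaled columns + 2 components

# Hypothetical variant: standardize first, then apply PCA, inside one branch
scaled_pca = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
scaled_union = FeatureUnion([
    ("scaler", StandardScaler()),
    ("scaled_pca", scaled_pca),
])
scaled_features = scaled_union.fit_transform(X_train)
print(scaled_features.shape)  # same number of columns as above
```

Both matrices have 15 columns, but the last two differ: without scaling, the principal components are dominated by the features with the largest numeric range. Either union can be dropped into the `'features'` step of the pipeline unchanged.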
