# Feature Engineering

This notebook performs feature engineering steps required for machine
learning models, including:

- Handling remaining missing values
- Feature scaling
- Dimensionality reduction using PCA

The resulting dataset will be used for model training.
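The three steps listed above can be chained into a single scikit-learn `Pipeline`, so they always run in the same order and can be refitted together. This is a minimal sketch on synthetic data, not the notebook's own code. The median imputation strategy is an assumption, because the notebook does not say how missing values are filled, and the 33 × 40 random matrix only stands in for the real feature matrix.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix: 33 rows like the notebook's data,
# but the 40 columns and the random values are purely illustrative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(33, 40))
X_demo[rng.random(X_demo.shape) < 0.02] = np.nan  # sprinkle a few missing values

feature_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # assumption: median imputation
    ("scale", StandardScaler()),                   # zero mean, unit variance per column
    ("pca", PCA(n_components=0.95)),               # keep components explaining >95% of variance
])

X_reduced = feature_pipeline.fit_transform(X_demo)
print(X_reduced.shape)
print(feature_pipeline.named_steps["pca"].explained_variance_ratio_.sum())
```

Keeping the steps in one object also means a single `fit` on training data and a single `transform` on new data, which avoids applying the steps in a different order later.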


In [1]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

In [2]:
df = pd.read_csv("../data/dataset_clean.csv")

df.head()

|  | temp_0_0 | temp_1_0 | temp_2_0 | temp_3_0 | temp_4_0 | temp_5_0 | temp_6_0 | temp_7_0 | temp_8_0 | temp_9_0 | ... | V_L1_L2_S | V_L2_L3_S | V_L1_L3_S | A_L1_S | A_L2_S | A_L3_S | D_V_L1_L2 | D_V_L2_L3 | D_V_L1_L3 | m_falla |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27.4 | 27.7 | 27.7 | 27.4 | 27.4 | 27.6 | 27.6 | 27.1 | 27.1 | 27.6 | ... | 213.3 | 211.4 | 211.0 | 3.16 | 4.13 | 4.06 | 0.2 | 0.4 | 0.3 | Normal |
| 1 | 27.3 | 27.6 | 27.6 | 27.3 | 27.3 | 27.5 | 27.5 | 27.2 | 27.2 | 27.6 | ... | 213.3 | 211.5 | 211.3 | 3.18 | 4.13 | 4.07 | 0.1 | 0.3 | 0.2 | Normal |
| 2 | 27.2 | 27.5 | 27.5 | 27.1 | 27.1 | 27.2 | 27.2 | 27.2 | 27.2 | 27.7 | ... | 213.4 | 211.6 | 211.2 | 3.17 | 4.14 | 4.06 | 0.1 | 0.1 | 0.3 | Normal |
| 3 | 27.8 | 27.7 | 27.7 | 27.8 | 27.8 | 27.7 | 27.7 | 28.0 | 28.0 | 28.0 | ... | 213.4 | 211.5 | 211.0 | 3.17 | 4.14 | 4.07 | 0.1 | 0.3 | 0.3 | Normal |
| 4 | 28.4 | 28.2 | 28.2 | 28.2 | 28.2 | 28.3 | 28.3 | 28.1 | 28.1 | 28.2 | ... | 213.3 | 211.5 | 211.1 | 3.16 | 4.13 | 4.07 | 0.2 | 0.3 | 0.3 | Normal |
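The introduction lists handling remaining missing values as a step. This matters before PCA: `StandardScaler` ignores NaN during fitting and passes them through unchanged, but `PCA` raises a `ValueError` on NaN input. A minimal sketch of a check and a fill, assuming column-wise median imputation is acceptable for these sensor readings (an assumption, not the author's documented choice). The helper names `report_missing` and `fill_with_median` and the toy frame are hypothetical; only the column names come from the preview above.

```python
import numpy as np
import pandas as pd

def report_missing(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, only for columns that have any."""
    counts = frame.isna().sum()
    return counts[counts > 0]

def fill_with_median(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs in numeric columns with each column's median; other columns are left as-is."""
    numeric_cols = frame.select_dtypes(include="number").columns
    filled = frame.copy()
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())
    return filled

# Hypothetical toy frame reusing column names from the dataset preview
toy = pd.DataFrame({
    "temp_0_0": [27.4, np.nan, 27.2, 27.8],
    "A_L1_S": [3.16, 3.18, np.nan, 3.17],
    "m_falla": ["Normal", "Normal", "Normal", "Normal"],
})

print(report_missing(toy))          # one missing value in temp_0_0 and one in A_L1_S
clean = fill_with_median(toy)
print(report_missing(clean).empty)  # True
```

The label column is left untouched because it is not numeric; on the real data, the same check would be run on `X` once the target has been dropped.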


In [3]:
# Features: every column except the class label
X = df.drop(columns=["m_falla"])
# Target: the class label (e.g. "Normal")
y = df["m_falla"]

The target column `m_falla` (the class label, e.g. `Normal`) is separated
from the feature matrix so that scaling and PCA are applied only to the
predictors, not to the label.
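The scaler and PCA in this notebook are fitted on all 33 rows, so if `dataset_final_pca.csv` is later split into training and test sets, the test rows will already have influenced the feature means, standard deviations, and principal axes. If a held-out evaluation is planned, one option is to split at this point and fit the transformations on the training rows only. A minimal sketch on synthetic data: the 40 features, the `"Fault"` label, and the 25% test size are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative data only: 33 samples, 40 features, two hypothetical class labels.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(33, 40))
y_demo = np.array(["Normal"] * 20 + ["Fault"] * 13)  # "Fault" is a made-up label

# Split first, so the test rows never influence the scaler's statistics or the PCA axes.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

scaler = StandardScaler().fit(X_train)                      # statistics from training rows only
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))

X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))        # reuse the fitted transforms
print(X_train_pca.shape, X_test_pca.shape)
```

`stratify=y_demo` keeps the class proportions similar in both parts, which is worth doing when there are only a few dozen rows.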

In [4]:
# Standardize each feature to zero mean and unit variance, since PCA is sensitive to scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
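`StandardScaler` replaces each value with its z-score, z = (x − μ) / σ, where μ and σ are the column mean and the population standard deviation (ddof = 0). The sketch below checks that formula against scikit-learn on a small matrix built from the first five values of `temp_0_0` and `V_L1_L2_S` in the preview; it is a verification aid, not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five values of temp_0_0 and V_L1_L2_S from the dataset preview.
X_demo = np.array([
    [27.4, 213.3],
    [27.3, 213.3],
    [27.2, 213.4],
    [27.8, 213.4],
    [28.4, 213.3],
])

X_sklearn = StandardScaler().fit_transform(X_demo)

# Manual z-score: subtract each column's mean, divide by its population std (ddof=0),
# which is the convention StandardScaler uses.
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_sklearn, X_manual))  # True
```

One difference from the manual formula: for a zero-variance column, `StandardScaler` uses a scale of 1 (so the column becomes all zeros) instead of dividing by zero.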

In [5]:
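# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds 95%
pca = PCA(n_components=0.95, random_state=42)
X_pca = pca.fit_transform(X_scaled)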
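With `n_components=0.95`, scikit-learn sorts the components by explained variance and keeps the smallest number k whose cumulative explained-variance ratio is greater than 0.95. The sketch below reproduces that choice by hand on synthetic correlated data (5 hypothetical latent factors plus noise) and compares it with the `n_components_` attribute of the fitted model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Illustrative correlated data: 33 samples driven by 5 latent factors plus noise.
latent = rng.normal(size=(33, 5))
mixing = rng.normal(size=(5, 60))
X_demo = StandardScaler().fit_transform(latent @ mixing + 0.3 * rng.normal(size=(33, 60)))

# Fit PCA with all components to see the full variance spectrum.
full = PCA().fit(X_demo)
cumulative = np.cumsum(full.explained_variance_ratio_)

# Smallest k whose cumulative ratio exceeds 0.95.
k = int(np.argmax(cumulative > 0.95)) + 1

auto = PCA(n_components=0.95).fit(X_demo)
print(k, auto.n_components_)  # the two numbers should match
print(cumulative[k - 1])      # share of variance actually retained
```

On the notebook's data, `pca.explained_variance_ratio_.sum()` gives the retained share for the 19 components that were kept.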

In [6]:
X_pca.shape

(33, 19)
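The shape `(33, 19)` means 33 samples, each now described by 19 principal components. With so few rows, PCA can return at most min(n_samples, n_features) components, and after mean-centering at most n_samples − 1 of them carry non-zero variance. The sketch below shows this cap on a hypothetical 33 × 200 random matrix; the feature count is an illustration, not the dataset's real width.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_wide = rng.normal(size=(33, 200))  # hypothetical: 33 rows, 200 features

pca_all = PCA().fit(X_wide)          # n_components=None keeps min(n_samples, n_features)
print(pca_all.n_components_)         # 33
print(pca_all.explained_variance_ratio_[-1])  # essentially zero after centering
```

So when the number of features exceeds the number of rows, the sample count, not the feature count, bounds how many components are available.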

In [7]:
# Principal-component scores as a DataFrame, with the class label reattached
df_pca = pd.DataFrame(X_pca)
df_pca["target"] = y.values

df_pca.head()

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -165.058597 | -57.338742 | 0.532278 | -30.405186 | -16.486549 | -22.648695 | -80.087776 | 2.881088 | -35.0109 | 28.998078 | -9.634279 | 5.549117 | 80.165356 | 26.861281 | 24.329548 | -45.410821 | -1.40977 | -75.924364 | -4.151851 | Normal |
| 1 | -132.817248 | -52.719194 | 36.911171 | 82.682448 | -43.828395 | -31.213636 | -70.71128 | -3.863463 | -98.525531 | 48.259934 | -6.977621 | 2.00887 | -56.139117 | -19.479393 | -33.776381 | 13.538516 | 5.632825 | -59.285131 | 24.817515 | Normal |
| 2 | -199.643745 | -65.266953 | -6.563071 | -93.499363 | 12.268257 | -3.901613 | -66.120289 | -18.40305 | 42.76287 | -55.401883 | 7.117638 | -11.855221 | -17.088453 | -8.142533 | -14.606354 | -21.324098 | -12.344769 | 6.940182 | -1.536305 | Normal |
| 3 | -46.807759 | -18.568754 | 118.700691 | 76.590198 | -83.724714 | -10.212285 | 5.97457 | 3.333007 | 116.642036 | 65.841764 | -30.866047 | -6.095412 | 8.916711 | -30.072411 | 14.253818 | -5.796079 | 33.323266 | 14.654182 | 82.860428 | Normal |
| 4 | -117.671799 | -67.738634 | 50.303439 | 253.178646 | -73.435498 | -29.592335 | 76.35472 | -77.601207 | -11.678098 | -111.818566 | 74.374101 | -42.834864 | 19.755066 | 26.484618 | -3.981393 | 2.146357 | 6.368204 | -4.460632 | -5.005755 | Normal |
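`pd.DataFrame(X_pca)` labels the components with the integers 0 to 18, and those labels become the header of the saved CSV. If descriptive column names are preferred, one option is to pass them explicitly when building the DataFrame. A minimal sketch with small stand-in arrays; the `PC1`, `PC2`, … naming scheme is a hypothetical convention, and any downstream code that reads columns `"0"` to `"18"` would need to change accordingly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X_pca_demo = rng.normal(size=(5, 3))  # stand-in for X_pca
y_demo = pd.Series(["Normal"] * 5)    # stand-in for y

# Hypothetical naming scheme: PC1, PC2, ... in order of explained variance.
pc_names = [f"PC{i + 1}" for i in range(X_pca_demo.shape[1])]
df_named = pd.DataFrame(X_pca_demo, columns=pc_names)
df_named["target"] = y_demo.values
print(df_named.columns.tolist())  # ['PC1', 'PC2', 'PC3', 'target']
```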


In [8]:
df_pca.to_csv("../data/dataset_final_pca.csv", index=False)
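The CSV stores only the transformed scores, so new measurements cannot be projected onto the same 19 components unless the fitted scaler and PCA are kept as well. One common option is to persist both with `joblib` (installed as a scikit-learn dependency). A minimal sketch on synthetic data; the file name `feature_transformers.joblib` and the temporary directory are hypothetical choices.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_demo = rng.normal(size=(33, 20))  # illustrative stand-in for X

scaler = StandardScaler().fit(X_demo)
pca = PCA(n_components=0.95).fit(scaler.transform(X_demo))

# Save both fitted objects together; the file name is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "feature_transformers.joblib")
joblib.dump({"scaler": scaler, "pca": pca}, path)

# Later (e.g. at inference time): reload and apply the identical transformation.
loaded = joblib.load(path)
X_new = rng.normal(size=(4, 20))    # hypothetical new samples with the same 20 columns
X_new_pca = loaded["pca"].transform(loaded["scaler"].transform(X_new))
print(X_new_pca.shape[1] == pca.n_components_)  # True
```

New data must have the same columns, in the same order, as the `X` used for fitting, otherwise the projection is meaningless.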

## Summary

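Feature engineering was applied to prepare the dataset for machine
learning models. The features were standardized and then projected onto
19 principal components, which together explain more than 95% of the
variance; these components and the `target` label make up
`dataset_final_pca.csv`. Reducing the input to 19 components is intended
to lower training cost and make learning more efficient.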