
What is feature engineering? Why is it called the "secret sauce" of ML?

Explain the difference between feature selection, feature extraction, and feature creation.

Why is feature engineering more critical than algorithm choice in many cases?

What's the 80/20 rule in ML? (Rule of thumb: roughly 80% of the effort goes to data preparation, 20% to modeling)


Numerical Features
How do you handle outliers in numerical features? (Winsorizing, trimming, transformation)

What are scaling methods and when to use each?

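Min-Max Scaling: When to use? (Neural networks, or when values must lie in a fixed range such as [0, 1]; sensitive to outliers)

Standardization (Z-score): When to use? (Distance-based, gradient-based, and regularized linear models; works best when features are roughly Gaussian)

Robust Scaling: When to use? (When outliers are present; centers on the median and scales by the IQR)

MaxAbs Scaling: When to use? (Sparse data; scales to [-1, 1] without shifting, so zeros stay zero)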
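The four scaling methods above can be written out directly with NumPy to make the formulas concrete. This is a minimal sketch on made-up values (the array `x` and its outlier are illustrative assumptions); in practice the scikit-learn scalers would normally be fitted on training data only.

```python
import numpy as np

# Toy feature with one large outlier (values are made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: maps the observed range onto [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit variance.
z_score = (x - x.mean()) / x.std()

# Robust scaling: center on the median, divide by the interquartile range.
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

# MaxAbs scaling: divide by the largest absolute value; zeros stay zero.
max_abs = x / np.abs(x).max()
```

Note how the single outlier squeezes the other min-max values close to zero, while robust scaling keeps them spread around the median.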

Categorical Features
Compare one-hot encoding vs label encoding vs target encoding

When would you use each?

What is the dummy variable trap and how to avoid it?

How to handle high-cardinality categorical features? (Target encoding, frequency encoding, embeddings)
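The encodings discussed above can be compared side by side on a tiny example. This is a sketch with pandas on a hypothetical training frame (the `city` column and target `y` are invented for illustration); the target encoding shown here is the plain per-category mean, without smoothing or out-of-fold computation.

```python
import pandas as pd

# Hypothetical training data: 'city' is the categorical feature, 'y' the target.
train = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "y":    [1,   0,   1,   1,   0,   1],
})

# One-hot encoding; drop_first=True drops one level to avoid the dummy variable trap.
one_hot = pd.get_dummies(train["city"], prefix="city", drop_first=True)

# Frequency encoding: replace each category with its share of training rows.
freq_map = train["city"].value_counts(normalize=True)
train["city_freq"] = train["city"].map(freq_map)

# Target (mean) encoding: replace each category with the mean target,
# computed from training rows only.
target_map = train.groupby("city")["y"].mean()
train["city_target"] = train["city"].map(target_map)
```

Because the target means are computed on the same rows they are applied to, this simple version can still overfit rare categories; smoothing or out-of-fold encoding is the usual remedy.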

DateTime Features
What features can you extract from timestamps?

Examples: hour, day, month, year, dayofweek, weekend flag, holiday flag

How to handle cyclical features? (sin/cos transformation)

Time since specific event features
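The timestamp features and the sin/cos trick above can be sketched with pandas and NumPy. The three timestamps are made up; the point is that hour 23 and hour 0 end up close together in the (sin, cos) plane even though the raw values are far apart.

```python
import numpy as np
import pandas as pd

# Made-up timestamps: Saturday 23:00, Sunday 00:00, Monday 12:00.
ts = pd.Series(pd.to_datetime(
    ["2024-01-06 23:00", "2024-01-07 00:00", "2024-01-08 12:00"]
))

features = pd.DataFrame({
    "hour": ts.dt.hour,
    "dayofweek": ts.dt.dayofweek,        # Monday=0 ... Sunday=6
    "is_weekend": ts.dt.dayofweek >= 5,  # Saturday or Sunday
})

# Cyclical encoding: project the hour onto a circle with period 24.
features["hour_sin"] = np.sin(2 * np.pi * features["hour"] / 24)
features["hour_cos"] = np.cos(2 * np.pi * features["hour"] / 24)
```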

Text Features
Bag of Words vs TF-IDF vs Word Embeddings

When to use n-grams?

How to handle text length, word count, readability scores?
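As a rough illustration of the text features above, the sketch below builds a bag-of-words matrix, a TF-IDF matrix with unigrams and bigrams, and a simple word-count feature using scikit-learn. The three documents are invented; word embeddings are not covered here.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up corpus for illustration.
docs = ["the cat sat", "the cat sat on the mat", "dogs chase cats"]

# Bag of words: raw term counts per document.
bow = CountVectorizer()
bow_matrix = bow.fit_transform(docs)

# TF-IDF with unigrams and bigrams: down-weights terms common to many documents.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
tfidf_matrix = tfidf.fit_transform(docs)

# Simple length feature: whitespace-separated word count per document.
word_counts = [len(d.split()) for d in docs]
```

Adding bigrams increases the vocabulary size, which is the usual trade-off to weigh when deciding whether n-grams are worth it.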


Polynomial Features
When to create interaction terms and polynomial features?

What's the risk of polynomial features? (Curse of dimensionality, overfitting)

How to select meaningful interactions?
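To see how quickly polynomial expansion grows the feature count, here is a small sketch with scikit-learn's `PolynomialFeatures` on a made-up 3-feature matrix, comparing the full degree-2 expansion with interaction terms only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up samples with three features each.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Full degree-2 expansion: originals, squares, and pairwise products.
full = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Interaction terms only: originals plus pairwise products, no squares.
inter = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
).fit_transform(X)
```

Three input features become nine with the full expansion and six with interactions only; the gap widens rapidly with more features or higher degree.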

Binning/Discretization
Why bin continuous features? (Handle non-linear relationships, outliers)

Methods: Equal-width, equal-frequency, k-means, decision tree based

How to choose number of bins?
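The difference between equal-width and equal-frequency binning shows up clearly on skewed data. A minimal pandas sketch, with made-up values and an arbitrary choice of four bins:

```python
import pandas as pd

# Made-up values with one large outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Equal-width: four bins of equal range; the outlier leaves most bins empty.
equal_width = pd.cut(values, bins=4, labels=False)

# Equal-frequency: four quantile bins with the same number of points in each.
equal_freq = pd.qcut(values, q=4, labels=False)
```

With equal-width bins, seven of the eight values fall into the first bin, while quantile bins keep two values per bin.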

Aggregation Features
Creating features by aggregating across:

Time windows (rolling means, expanding stats)

Groups (customer average, product median)

Related entities (user's average purchase)
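Group and window aggregations like those above can be sketched with pandas `groupby`. The `orders` frame is hypothetical and assumed to be sorted chronologically within each customer; the `shift(1)` is there so that a row's feature only uses earlier orders.

```python
import pandas as pd

# Hypothetical orders, assumed to be in chronological order per customer.
orders = pd.DataFrame({
    "customer": ["a", "a", "a", "b", "b"],
    "amount":   [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Group aggregate broadcast back to every row of the group.
orders["cust_mean"] = orders.groupby("customer")["amount"].transform("mean")

# Expanding mean of previous orders only: shift(1) excludes the current row.
orders["cust_prev_mean"] = (
    orders.groupby("customer")["amount"]
    .transform(lambda s: s.shift(1).expanding().mean())
)
```

The plain group mean uses all rows of a customer, including later ones, so the shifted version is usually the safer choice when the feature feeds a model that predicts at order time.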

Domain-Specific Features
Finance: Moving averages, volatility measures, returns

E-commerce: Time since last purchase, purchase frequency, monetary value

Healthcare: BMI, combined lab ratios, vital sign trends

Log Transformation
When to use log transform? (Right-skewed data, multiplicative relationships)

What about zero values? (Use log(1 + x), i.e. log1p; negative values need a shift or a different transform)

Interpretability of log-transformed features
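A short NumPy sketch of the log transform with zeros handled through `log1p`, and the inverse `expm1` for mapping back to the original scale. The `income` values are made up.

```python
import numpy as np

# Made-up right-skewed values, including a zero.
income = np.array([0.0, 10.0, 100.0, 1000.0, 100000.0])

# log1p computes log(1 + x), so zero maps to zero instead of -inf.
log_income = np.log1p(income)

# expm1 is the exact inverse, useful for interpreting results on the raw scale.
recovered = np.expm1(log_income)
```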

Box-Cox & Yeo-Johnson
What are power transformations and when to use them?

Difference between Box-Cox (strictly positive values only) and Yeo-Johnson (any real values, including zero and negatives)
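The input-range difference between the two power transforms can be checked with scikit-learn's `PowerTransformer`. This is a sketch on synthetic log-normal data (the sample size and seed are arbitrary assumptions).

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
positive = rng.lognormal(size=(200, 1))  # strictly positive, right-skewed
mixed = positive - 1.0                   # shifted so some values are negative

# Box-Cox works on strictly positive data.
box_cox = PowerTransformer(method="box-cox").fit_transform(positive)

# Yeo-Johnson accepts zero and negative values as well.
yeo_johnson = PowerTransformer(method="yeo-johnson").fit_transform(mixed)

# Box-Cox on data with non-positive values raises a ValueError.
try:
    PowerTransformer(method="box-cox").fit(mixed)
    box_cox_failed = False
except ValueError:
    box_cox_failed = True
```

By default `PowerTransformer` also standardizes its output, so both results have zero mean and unit variance.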

Rank Transformation
When to use rank features? (Outliers, non-linear but monotonic relationships)


Numerical Missing Data
Mean/Median/Mode imputation

KNN imputation

Regression imputation

Add "missing" flag

Categorical Missing Data
Mode imputation

"Missing" as a new category

Predictive imputation

Advanced Methods
MICE (Multiple Imputation by Chained Equations)

Deep learning imputation

When to drop vs impute?
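Median imputation combined with a "missing" flag, as listed above, can be sketched with scikit-learn's `SimpleImputer`. The matrix is made up; `add_indicator=True` appends one binary column for each feature that had missing values during fitting.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data: one missing value in each column, one outlier in the first.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0],
              [100.0, np.nan]])

# Median imputation is less affected by the outlier than the mean would be.
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_imputed = imputer.fit_transform(X)
# Columns: imputed feature 0, imputed feature 1, missing flag 0, missing flag 1.
```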


Filter Methods
Correlation analysis (Pearson, Spearman)

Chi-square test (categorical feature vs categorical target)

ANOVA F-test (numerical feature vs categorical target)

Mutual information
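A filter method scores each feature independently of any model. The sketch below uses scikit-learn's `SelectKBest` with the ANOVA F-test on synthetic data in which, by construction, only the first two columns depend on the class label (data, seed, and `k=2` are illustrative assumptions).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)

# Two informative columns (class means differ by 2) and three pure-noise columns.
informative = y[:, None] * 2.0 + rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 3))
X = np.hstack([informative, noise])

# Keep the k features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
selected = selector.get_support(indices=True)
```

Swapping `f_classif` for `mutual_info_classif` or `chi2` (non-negative features only) follows the same pattern.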

Wrapper Methods
Forward selection

Backward elimination

Recursive feature elimination (RFE)

Pros/cons of wrapper methods

Embedded Methods
L1 regularization (Lasso)

Tree-based feature importance

Permutation importance (model-agnostic and computed after fitting, so strictly speaking not an embedded method)
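Embedded selection with L1 regularization can be sketched with scikit-learn's `Lasso`: coefficients of unhelpful features are driven exactly to zero during fitting. The data is synthetic, with only the first two of ten features affecting the target, and `alpha=0.1` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # features already on a comparable scale

# Only features 0 and 1 influence the target; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty shrinks coefficients and sets weak ones exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
```

The surviving coefficients are shrunk toward zero, so they understate the true effects; a common follow-up is to refit an unpenalized model on the selected features.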

Interview Questions on Selection
"How many features are too many?" (Rule of thumb: at least 10 samples per feature)

"What would you do if you have 1000 features but only 100 samples?"

"How to handle correlated features in selection?"


PCA vs t-SNE vs UMAP: When to use each?

Autoencoders for feature extraction

When to use dimensionality reduction vs feature selection?
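As one concrete case of dimensionality reduction, the sketch below runs PCA on standardized synthetic data and keeps enough components to explain 95% of the variance. The data (six observed columns generated from two latent factors), the seed, and the 95% threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Six correlated columns generated from two latent factors plus small noise.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 6))

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components reaching that variance share.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)
```

Unlike feature selection, the resulting columns are linear combinations of the originals, which costs interpretability.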


Lag features (t-1, t-2, ...)

Rolling statistics (mean, std, min, max over window)

Expanding window features

Difference features (t - t-1)

Seasonal decomposition features
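The lag, difference, rolling, and expanding features above can be sketched with pandas. The daily `sales` series is made up; rolling and expanding statistics are computed on the shifted series so each row only sees earlier observations.

```python
import pandas as pd

# Made-up daily sales.
sales = pd.DataFrame(
    {"sales": [10.0, 12.0, 9.0, 14.0, 15.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Lag feature: the value at t-1.
sales["lag_1"] = sales["sales"].shift(1)

# Difference feature: value at t minus value at t-1.
sales["diff_1"] = sales["sales"].diff(1)

# Rolling mean over the previous 3 days (shift first to exclude day t).
sales["roll_mean_3"] = sales["sales"].shift(1).rolling(window=3).mean()

# Expanding maximum over all previous days.
sales["expanding_max"] = sales["sales"].shift(1).expanding().max()
```

The leading rows contain NaN where not enough history exists; they are usually dropped or imputed before training.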


Feature Crossing
Manual feature crossing vs automated (factorization machines, Wide & Deep)

Example: Age × Income, Location × Time

Cluster Features
Using clustering results as features (k-means labels)

Distance to cluster centroids as features
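Both kinds of cluster features can be obtained from a fitted scikit-learn `KMeans` model: `labels_` gives the cluster assignment and `transform` gives the distance to every centroid. The two synthetic blobs and the choice of two clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two well-separated synthetic blobs.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Cluster label per sample (a categorical feature).
labels = kmeans.labels_

# Distance from each sample to each centroid (one numeric feature per cluster).
distances = kmeans.transform(X)

# Append the distance features to the original matrix.
X_augmented = np.hstack([X, distances])
```

As with any fitted transformation, the clustering should be fitted on training data and only applied to validation or test data.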

Text-Specific
Sentiment scores

Named Entity Recognition features

Readability scores

Embedding averages

"What is the curse of dimensionality and how does feature engineering help?"

"How does feature scaling affect different algorithms?"

"Explain bias-variance tradeoff in context of feature engineering"

"What is target leakage and how to avoid it in feature engineering?"


"Tell me about a time when feature engineering significantly improved your model"

"Describe your process for feature engineering under time constraints"

"How do you collaborate with domain experts for feature engineering?"

"How do you document and version your features?"


"You have 5000 features and 1000 samples. What do you do?"

"How would you engineer features for a dataset with 80% missing values?"

"What if your test data has categories not seen in training?"

"How to handle features with different scales and distributions together?"


"How to make feature engineering efficient for large datasets?"

"Batch vs real-time feature engineering"

"Feature stores: What are they and why use them?"

"Monitoring feature distributions in production"


Automated Feature Engineering (Featuretools, autofeat)

Deep Feature Synthesis

Neural Network embeddings as features

Transfer learning features (using pre-trained models)

Pitfalls to Avoid
Target leakage (using information that is unavailable at prediction time, such as future values or proxies of the target)

Overfitting during feature engineering

Not considering computational cost

Ignoring feature interpretability

Not validating on holdout set

Best Practices
Always split data before fitting any feature engineering step (fit scalers, encoders, and imputers on the training set only)

Use cross-validation for feature selection

Document every feature's source and logic

Monitor feature drift in production

Start simple, then add complexity
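One way to follow the first best practice above is to wrap all preprocessing in a scikit-learn `Pipeline`, so that imputers, scalers, and encoders are fitted on the training split only. The sketch below uses a synthetic frame; the column names (`age`, `income`, `city`), the target definition, and the model choice are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, size=n),
    "income": rng.lognormal(10, 1, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
df.loc[df.index % 10 == 0, "age"] = np.nan  # inject some missing values
y = (df["income"] > df["income"].median()).astype(int)  # synthetic target

# Numeric columns: impute, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical column: one-hot, ignoring categories unseen during fit.
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Split first; fit() then learns all preprocessing statistics from X_train only.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

The same pipeline object can be passed to `cross_val_score`, in which case the preprocessing is refitted inside every fold.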


"How do you use scikit-learn pipelines for feature engineering?"

"Feature engineering in TensorFlow vs PyTorch"

"Spark for large-scale feature engineering"

Handle missing values

Encode categorical variables

Scale/normalize numerical features

Handle outliers

Create interaction features

Generate polynomial features

Extract datetime components

Create aggregate features

Apply transformations (log, sqrt)

Reduce dimensionality if needed

Split data before fitting any transformation

Validate feature importance

Document all features





