"How would you handle a dataset with 1000 features but only 100 samples?"

"Your model performs well on train but poorly on test. Walk me through debugging."

"Compare bagging and boosting. When would you choose each?"

"How do you explain your model to a non-technical stakeholder?"

"What's the difference between L1 and L2 regularization?"

"How does validating a time series model differ from validating a model on i.i.d. data?"

"What metrics would you use for imbalanced classification?"

"How do you handle missing values differently for tree vs linear models?"

"When would you choose logistic regression over random forest?"

"How do you monitor models in production?"


Fraud detection: Highly imbalanced, need low false negatives

Recommendation system: Collaborative filtering vs content-based

Churn prediction: Time-series features, survival analysis elements

Price prediction: Handling outliers, domain constraints

Image classification: When to use CNN vs traditional ML

Text sentiment: Feature extraction from text

Customer segmentation: Unsupervised, but often asked alongside supervised topics

A/B test analysis: Statistical significance, practical significance

Model deployment: API design, scaling considerations

Ethical AI: Detecting and mitigating bias

Core Concepts and Model Fundamentals
These questions test your understanding of the underlying principles of supervised learning.

Define supervised learning and contrast it with unsupervised learning.

What is the bias-variance tradeoff? Explain how it relates to model complexity, overfitting, and underfitting.

What are overfitting and underfitting? List specific techniques to combat each.

Why is a train/test split necessary? Describe more robust validation techniques like k-fold cross-validation.

Walk through the standard end-to-end workflow for a supervised learning project, from data collection to deployment.

Explain the "No Free Lunch" theorem and its implications for model selection.
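
For the k-fold cross-validation question above, a minimal sketch of how the folds can be generated is shown here. It assumes NumPy is available; the function name `k_fold_indices` and its parameters are illustrative, not taken from any particular library.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)   # shuffle once, up front
    folds = np.array_split(shuffled, k)     # k nearly equal-sized folds
    for i in range(k):
        val_idx = folds[i]
        # Every fold except fold i is used for training.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), len(val_idx))
```

Each sample lands in exactly one validation fold, so the k validation scores together cover the whole dataset, which is what makes the estimate more robust than a single train/test split.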

Foundational Algorithms (Theory & Comparison)
You must understand the mechanics, assumptions, and use cases of core algorithms.

Linear Regression: State its assumptions. How do you interpret the coefficients? What is the closed-form solution?

Logistic Regression: Why is it a classification algorithm despite its name? How do you interpret the log-odds?

Decision Trees: Explain how a tree is built. What are Gini impurity and information gain/entropy? What are the advantages and critical weaknesses?

Support Vector Machines (SVM): What is the kernel trick and why is it powerful? Explain the role of the C parameter and support vectors.

K-Nearest Neighbors (K-NN): Why is it called a "lazy" learner? How do you choose k and the distance metric? Discuss the curse of dimensionality in this context.

Naïve Bayes: Why is it called "naïve"? State Bayes' Theorem and explain how the classifier uses it.

Advanced Algorithms and Ensembles
Modern interviews heavily feature these powerful techniques.

Random Forest: How does bagging (bootstrap aggregating) reduce variance? What is the out-of-bag (OOB) error and why is it useful? Explain feature importance.

Gradient Boosting Machines (GBM/XGBoost/LightGBM): Explain the boosting principle. What is the role of the learning rate and subsampling? How does it sequentially correct errors?

Compare and contrast Bagging and Boosting. When would you choose one over the other?

Explain the concept of stacking (stacked generalization).

Optimization, Loss, and Regularization
These mathematical concepts are key to training effective models.

Define common loss/cost functions: Mean Squared Error (MSE), Mean Absolute Error (MAE), Cross-Entropy/Log Loss, Hinge Loss.

Explain gradient descent. Compare batch, stochastic (SGD), and mini-batch variants.

What is regularization and why is it used? Explain the fundamental difference between L1 (Lasso) and L2 (Ridge) regularization. What is Elastic Net?

For neural networks: Explain dropout and batch normalization as regularization techniques.
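
The four loss functions named in this section can be written in a few lines each. The sketch below assumes NumPy arrays as inputs, labels in {0, 1} for log loss, and labels in {-1, +1} for hinge loss; the function names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: penalizes large errors quadratically.
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Mean Absolute Error: linear penalty, less sensitive to outliers.
    return float(np.mean(np.abs(y_true - y_pred)))

def log_loss(y_true, p_pred, eps=1e-12):
    # Binary cross-entropy; y_true in {0, 1}, p_pred is P(y = 1).
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def hinge_loss(y_signed, scores):
    # Hinge loss; y_signed in {-1, +1}, scores are raw margins.
    return float(np.mean(np.maximum(0.0, 1.0 - y_signed * scores)))

y = np.array([1.0, 2.0, 3.0])
pred = np.array([1.0, 2.0, 5.0])
print(mse(y, pred), mae(y, pred))
```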

Data Handling and Feature Engineering
Your ability to prepare data is as important as modeling skill.

How would you handle missing values in a dataset? Discuss different imputation methods.

How do you encode categorical variables? Compare label encoding, one-hot encoding, and target/mean encoding.

What is feature scaling and why is it critical for algorithms like SVM and K-NN? Compare normalization (Min-Max) and standardization (Z-score).

How do you address class imbalance in a classification problem? Discuss techniques like resampling (SMOTE) and class weighting.

Describe techniques for feature selection (filter, wrapper, embedded methods).
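
To make the normalization vs standardization comparison concrete, here is a minimal NumPy sketch of both. The function names are illustrative, and the guard for constant columns is an assumption about how to handle zero range or zero variance.

```python
import numpy as np

def min_max_scale(X):
    """Normalization: rescale each column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant columns map to 0
    return (X - lo) / span

def standardize(X):
    """Standardization: shift each column to mean 0 and scale to std 1."""
    X = np.asarray(X, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # constant columns map to 0
    return (X - mu) / sigma

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(min_max_scale(X))
print(standardize(X))
```

In a real pipeline the statistics (min, max, mean, std) would be computed on the training split only and then reused on validation and test data; computing them on the full dataset is a form of leakage.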

Model Evaluation and Metrics
Demonstrate you know how to properly assess a model's performance.

For a binary classification problem, define and explain the relationships between: Accuracy, Precision, Recall (Sensitivity), Specificity, and F1-Score.

What is a confusion matrix? Walk through its components (TP, TN, FP, FN).

Explain the ROC curve and AUC. What does an AUC of 0.5 vs. 0.9 signify?

For regression, explain MSE, RMSE, MAE, and R-squared (R²). When would you use MAE over MSE?

Why is accuracy often a misleading metric, and when should you avoid it?
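
The relationships between the confusion-matrix cells and the metrics above can be sketched as follows. This assumes binary labels with 1 as the positive class; `binary_metrics` is an illustrative name, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics (positive class = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    def safe_div(a, b):
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)            # also called sensitivity
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```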

Practical Implementation and System Design
These questions test your ability to apply knowledge in real-world scenarios.

Write pseudocode or Python (using numpy) to implement gradient descent for linear regression.

How would you debug a model that has high training accuracy but poor test accuracy?

Describe how you would deploy a trained model into a production environment. What considerations are there for monitoring and maintenance?

How does cross-validation differ for time-series data? Explain forward chaining/rolling window validation.

What is data leakage? How can you prevent it during the feature engineering and validation process?
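
For the gradient descent question in this section, one possible answer is sketched below. It uses batch gradient descent on the mean squared error, assumes NumPy, and uses illustrative defaults for the learning rate and iteration count; these would need tuning on real data.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, n_iters=2000):
    """Fit y ~ w0 + X @ w[1:] by batch gradient descent on MSE."""
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a bias column
    w = np.zeros(d + 1)
    for _ in range(n_iters):
        residual = Xb @ w - y              # prediction error, shape (n,)
        grad = (2.0 / n) * Xb.T @ residual # gradient of mean squared error
        w -= lr * grad                     # step against the gradient
    return w

# Synthetic, noise-free data generated from y = 3 + 2x (illustrative only).
x = np.linspace(0, 1, 50).reshape(-1, 1)
y = 3.0 + 2.0 * x.ravel()
print(fit_linear_gd(x, y))
```

A common follow-up is to explain what happens if `lr` is too large (divergence) or too small (slow convergence), and why feature scaling helps.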

Conceptual and Problem-Solving Questions
These open-ended questions test your deep understanding and critical thinking.

"How would you explain a [Random Forest / Gradient Boosting] model to a non-technical stakeholder?"

"If you could only choose one metric to evaluate your model, which would it be and why?" (Argument required)

"Describe a project where you used supervised learning. What was the biggest challenge and how did you overcome it?"

"When would you choose a simple linear model over a more complex ensemble method, even if the ensemble has higher accuracy?"

"What are the ethical considerations and risks of bias when building and deploying supervised learning models?"


Core Concepts
What is supervised vs unsupervised learning?

Explain the bias-variance tradeoff with examples.

What is overfitting? How to detect and prevent it?

What is underfitting? How to fix it?

What is the "No Free Lunch" theorem?

Explain the difference between parametric and non-parametric models.

What is the curse of dimensionality?

What is inductive bias in machine learning?

What is the difference between generative and discriminative models?

What is empirical risk minimization?

Probability & Statistics
Explain Bayes' Theorem and its application in ML.

What is conditional probability?

What are prior, likelihood, and posterior?

What is maximum likelihood estimation (MLE)?

What is maximum a posteriori (MAP) estimation?

What is the Central Limit Theorem?

What are Type I and Type II errors?

What is a p-value, and what is statistical significance?

What is a confidence interval?

What is correlation vs causation?

Model Evaluation
What is cross-validation? What types are there?

What is a train-validation-test split?

What is stratified sampling?

What is bootstrapping?

What is k-fold cross-validation vs leave-one-out?

Linear Regression
Derive the normal equations for linear regression.

What are the Gauss-Markov assumptions?

What is homoscedasticity vs heteroscedasticity?

What is multicollinearity and its effects?

How to interpret regression coefficients?

What is R-squared and adjusted R-squared?

What is the F-test in regression?

What is the t-test for coefficients?

What are confidence intervals for predictions?

What is the difference between simple and multiple regression?
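
The normal equations asked about above are (X^T X) w = X^T y. A minimal NumPy sketch of solving them follows; `fit_linear_closed_form` is an illustrative name, and the sketch assumes X^T X is invertible (no perfect multicollinearity).

```python
import numpy as np

def fit_linear_closed_form(X, y):
    """Solve the normal equations (Xb^T Xb) w = Xb^T y for w."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    # Solving the linear system avoids forming the matrix inverse explicitly.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Noise-free data from y = 1 + 2x (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X.ravel()
print(fit_linear_closed_form(X, y))
```

When columns are nearly collinear, X^T X becomes ill-conditioned; a least-squares routine such as `np.linalg.lstsq` or a ridge penalty is the usual remedy.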

Advanced Regression
What is polynomial regression?

What is ridge regression (L2 regularization)?

What is lasso regression (L1 regularization)?

What is elastic net?

What is quantile regression?

What is Poisson regression?

What is logistic regression for classification?

What is ordinal regression?

What is robust regression?

What is isotonic regression?

Regression Metrics
What are MSE, RMSE, and MAE?

What are MAPE and SMAPE?

What is R-squared vs adjusted R-squared?

What are AIC and BIC?

What is Mallows' Cp?

Logistic Regression
Why is it called logistic regression if it is used for classification?

What is the sigmoid function?

What is log-odds?

How to interpret logistic regression coefficients?

What is the decision boundary?

What is maximum likelihood estimation for logistic regression?

What is the Newton-Raphson method for optimization?

What are the convergence criteria?
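
The sigmoid, log-odds, and coefficient-interpretation questions fit together, as the short sketch below tries to show. The coefficients `b0` and `b1` are hypothetical values for a one-feature model, not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

# Hypothetical one-feature model: log-odds = b0 + b1 * x
b0, b1 = -1.0, 0.7

p_at_2 = sigmoid(b0 + b1 * 2.0)
p_at_3 = sigmoid(b0 + b1 * 3.0)

# A one-unit increase in x adds b1 to the log-odds ...
print(log_odds(p_at_3) - log_odds(p_at_2))
# ... which multiplies the odds by exp(b1), the odds ratio.
print(np.exp(b1))
```

The decision boundary at a 0.5 threshold is where the log-odds equal zero, that is, x = -b0 / b1 for this one-feature case.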

Decision Trees
How does a decision tree work?

What is Gini impurity?

What is entropy and information gain?

What is information gain ratio?

What is the ID3 algorithm?

What is the CART algorithm?

What is CHAID?

How to handle continuous variables?

How to handle categorical variables?

What is pruning? What types are there?

What is cost-complexity pruning?

What is the time complexity for building a tree?
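
Gini impurity, entropy, and information gain are easiest to explain with a tiny worked example. The sketch below assumes NumPy and class labels stored in arrays; the function names are illustrative.

```python
import numpy as np

def class_probs(labels):
    """Empirical class proportions in a node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2
    p = class_probs(labels)
    return float(1.0 - np.sum(p ** 2))

def entropy(labels):
    # Shannon entropy in bits: -sum_k p_k log2 p_k
    p = class_probs(labels)
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    # Parent entropy minus the size-weighted entropy of the two children.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 1, 1])
print(gini(parent), entropy(parent))
print(information_gain(parent, parent[:2], parent[2:]))
```

A tree builder evaluates candidate splits this way and greedily picks the one with the largest impurity reduction at each node.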

Random Forest
What is bagging?

What is bootstrap sampling?

What is out-of-bag (OOB) error?

What is feature randomness?

How to calculate feature importance?

What is Gini importance vs permutation importance?

What is the effect of number of trees?

What is the effect of max_depth?

What is the effect of max_features?

How to tune random forest hyperparameters?
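
Bootstrap sampling and the out-of-bag set can be illustrated in a few lines. This is a sketch assuming NumPy; `bootstrap_indices` is an illustrative name.

```python
import numpy as np

def bootstrap_indices(n_samples, seed=0):
    """Draw n_samples row indices with replacement; return (in_bag, out_of_bag)."""
    rng = np.random.default_rng(seed)
    in_bag = rng.integers(0, n_samples, size=n_samples)    # with replacement
    oob = np.setdiff1d(np.arange(n_samples), in_bag)       # rows never drawn
    return in_bag, oob

in_bag, oob = bootstrap_indices(10_000)
print(len(oob) / 10_000)  # fraction of rows left out of this bootstrap sample
```

For large n, each row is left out of a given bootstrap sample with probability (1 - 1/n)^n, which approaches 1/e, about 36.8%. Those left-out rows act as a built-in validation set for the tree trained on that sample, which is what the OOB error aggregates.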

Gradient Boosting
What is boosting vs bagging?

What is gradient boosting?

What is the learning rate?

What is shrinkage?

What is stochastic gradient boosting?

What is XGBoost?

What is LightGBM?

What is CatBoost?

What is the difference between gradient boosting and AdaBoost?

What is the difference between gradient boosting and random forest?

Support Vector Machines
What is the maximum margin classifier?

What is the difference between a hard margin and a soft margin?

What is the C parameter?

What is the kernel trick?

What are linear, polynomial, and RBF kernels?

What are support vectors?

What is the dual formulation?

What is the SMO algorithm?

What is ν-SVM?

What is one-class SVM?

Naïve Bayes
What is Bayes' theorem?

What is the naive assumption?

What is Gaussian Naïve Bayes?

What is Multinomial Naïve Bayes?

What is Bernoulli Naïve Bayes?

What is Laplace smoothing?

What is the zero-frequency problem?

What is the difference between Naïve Bayes and logistic regression?

What is the time complexity?

When is Naïve Bayes a good choice?
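
Laplace smoothing and the zero-frequency problem can be shown with a small count table. The sketch below assumes a Multinomial Naïve Bayes setting where `counts[c, j]` is how often feature j occurs in class c; the function name and the example counts are illustrative.

```python
import numpy as np

def smoothed_likelihoods(counts, alpha=1.0):
    """Estimate P(feature j | class c) with additive (Laplace) smoothing.

    counts: array of shape (n_classes, n_features) with raw feature counts.
    """
    counts = np.asarray(counts, dtype=float)
    n_features = counts.shape[1]
    totals = counts.sum(axis=1, keepdims=True)
    # Adding alpha to every count keeps unseen features from getting probability 0.
    return (counts + alpha) / (totals + alpha * n_features)

counts = np.array([[3, 0, 1],    # class 0 never saw feature 1
                   [0, 2, 2]])   # class 1 never saw feature 0
print(smoothed_likelihoods(counts))
```

Without smoothing, a single zero likelihood would force the whole class posterior to zero because the per-feature probabilities are multiplied together.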

K-Nearest Neighbors
Why is KNN called lazy learning?

How to choose k?

Which distance metrics can be used?

What is the curse of dimensionality?

What is weighted KNN?

What is a KD-tree?

What is a ball tree?

What is locality sensitive hashing?

What is the time complexity?

What is the space complexity?
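
A brute-force KNN classifier makes the "lazy learning" and complexity questions concrete. This is a sketch assuming NumPy, Euclidean distance, and unweighted majority voting; `knn_predict` and the toy data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # No training step: all work happens at query time ("lazy" learning).
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to every row
    nearest = np.argsort(dists)[:k]                     # indices of k closest rows
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))
```

Each query touches every stored training row, so brute-force prediction costs time proportional to n times d per query and the model must store the whole training set; KD-trees and ball trees trade build time for faster queries in low dimensions.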

Other Classifiers
What is LDA?

What is QDA?

What is a perceptron?

What is ADALINE?

What is MADALINE?


Gradient Descent
What is gradient descent?

What is batch gradient descent?

What is stochastic gradient descent?

What is mini-batch gradient descent?

What is the learning rate?

What is momentum?

What is Nesterov accelerated gradient?

What is AdaGrad?

What is RMSProp?

What is Adam?

Loss Functions
What is MSE loss?

What is MAE loss?

What is Huber loss?

What is log loss (binary cross-entropy)?

What is categorical cross-entropy?

What is hinge loss?

What is exponential loss?

What is quantile loss?

What are custom loss functions?

What is the difference between differentiable and non-differentiable losses?

Regularization
What is L1 regularization (Lasso)?

What is L2 regularization (Ridge)?

What is elastic net?

What is dropout?

What is early stopping?

What is batch normalization?

What is weight decay?

What is data augmentation?

What is noise injection?

What is label smoothing?

Classification Metrics
What is accuracy?

What is precision?

What is recall?

What is specificity?

What is F1-score?

What is Fβ-score?

What is MCC?

What is the ROC curve?

What is AUC?

What is the PR curve?

What is average precision?

What is log loss?

What is the Brier score?

What is Cohen's kappa?

What is a confusion matrix?
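
One way to make the AUC question concrete: AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition, assuming NumPy and labels in {0, 1}; the all-pairs comparison is chosen for clarity, not speed, and `roc_auc` is an illustrative name.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive vs every negative
    # Correctly ordered pairs count 1, tied pairs count 0.5.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this reading, an AUC of 0.5 means the scores order positives and negatives no better than chance, and 1.0 means every positive outranks every negative.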

Regression Metrics
What is MSE?

What is RMSE?

What is MAE?

What is MAPE?

What is SMAPE?

What is R²?

What is adjusted R²?

What is explained variance?

What is MSLE?

What is RMSLE?

Model Selection
What is AIC?

What is BIC?

What is MDL?

What is a cross-validation score?

What is a train-test split?

What is a time series split?

What is group k-fold?

What is stratified k-fold?

What is nested cross-validation?

What is leave-one-out?

Probability Calibration
What is calibration?

What is a calibration curve?

What is the Brier score?

What is Platt scaling?

What is isotonic regression?

Data Preprocessing
What is feature scaling?

What is normalization vs standardization?

What is robust scaling?

How to handle missing values?

What is imputation?

What is MICE?

What is KNN imputation?

How to handle outliers?

What is winsorization?

What is truncation?

Feature Engineering
What is one-hot encoding?

What is label encoding?

What is target encoding?

What is frequency encoding?

What is embedding?

What is feature crossing?

What are polynomial features?

What are interaction terms?

What is binning?

What is discretization?

Feature Selection
What are filter methods?

What are wrapper methods?

What are embedded methods?

What is recursive feature elimination?

What is forward selection?

What is backward elimination?

What is LASSO for feature selection?

What is mutual information?

What is chi-square test?

What is ANOVA F-test?

Dimensionality Reduction
What is PCA?

What is kernel PCA?

What is LDA for dimensionality reduction?

What is t-SNE?

What is UMAP?

What is an autoencoder?

What is factor analysis?

What is NMF?

What is ICA?

What is random projection?

Imbalanced Learning
What is class imbalance?

What is oversampling?

What is undersampling?

What is SMOTE?

What is ADASYN?

What is cost-sensitive learning?

What are class weights?

What is threshold moving?

What are ensemble methods for imbalance?

What is the anomaly detection approach?
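
Class weighting is often the simplest of the techniques above to demonstrate. The sketch below computes inverse-frequency weights using n_samples / (n_classes * class_count), which is, to my understanding, the same heuristic scikit-learn applies for `class_weight="balanced"`; the function name here is illustrative.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class inversely to its frequency."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# A 99:1 imbalanced label vector (illustrative only).
y = np.array([0] * 99 + [1])
print(balanced_class_weights(y))
```

These weights multiply each sample's contribution to the loss, so a mistake on the rare class costs as much in total as mistakes on the common class.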


Ensemble Methods
What is bagging?

What is boosting?

What is stacking?

What is blending?

What is a voting classifier?

What is weighted voting?

What is a super learner?

What is Bayesian model averaging?

How is model stacking implemented?

What are meta-features?

Time Series
What is time series cross-validation?

What is walk-forward validation?

What is expanding window vs sliding window?

What is autocorrelation?

What is partial autocorrelation?

What is stationarity?

What is differencing?

What is seasonal decomposition?

What is ARIMA?

What is SARIMA?
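
Walk-forward validation with an expanding window can be sketched in plain Python. The function name and parameters below are illustrative; the key property is that every training index precedes every test index in the same split.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) lists where training data always precedes test data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_idx = list(range(test_start))                       # grows each split
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx

for train_idx, test_idx in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    print(train_idx, test_idx)
```

A sliding-window variant would instead keep the training window at a fixed length by dropping the oldest indices, which can help when older data no longer reflects current behavior.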

Bayesian Methods
What is Bayesian linear regression?

What is Bayesian logistic regression?

What is MCMC?

What is variational inference?

What is Gibbs sampling?

What is Hamiltonian Monte Carlo?

What is Bayesian optimization?

What are Gaussian processes?

What are Bayesian neural networks?

What is probabilistic programming?

Causal Inference
What is causal inference?

What are potential outcomes?

What is propensity score matching?

What are instrumental variables?

What is regression discontinuity?

What is difference-in-differences?

What are causal forests?

What is do-calculus?

What are directed acyclic graphs?

What are counterfactuals?


ML System Design
Design a fraud detection system.

Design a recommendation system.

Design a search ranking system.

Design an ad click prediction system.

Design a churn prediction system.

Design a price prediction system.

Design a spam detection system.

Design a sentiment analysis system.

Design an image classification system.

Design a speech recognition system.

Scalability
How to handle large datasets?

What is distributed training?

What is model parallelism?

What is data parallelism?

What is federated learning?

What is model compression?

What is quantization?

What is pruning?

What is knowledge distillation?

What is edge deployment?

Productionization
What is model deployment?

What is A/B testing?

What is canary deployment?

What is blue-green deployment?

What is model monitoring?

What is drift detection?

What is concept drift vs data drift?

What is model retraining?

What is CI/CD for ML?

What are ML pipelines?


Algorithm Implementation
Implement linear regression from scratch.

Implement logistic regression from scratch.

Implement decision tree from scratch.

Implement KNN from scratch.

Implement K-means from scratch.

Implement PCA from scratch.

Implement gradient descent from scratch.

Implement forward propagation for neural network.

Implement backpropagation from scratch.

Implement batch normalization.

Data Manipulation
Implement one-hot encoding.

Implement label encoding.

Implement feature scaling.

Implement train-test split.

Implement k-fold cross-validation.

Implement stratified sampling.

Implement bootstrap sampling.

Implement SMOTE.

Implement feature selection.

Implement dimensionality reduction.

Model Evaluation
Implement accuracy calculation.

Implement precision, recall, F1.

Implement ROC curve.

Implement AUC calculation.

Implement confusion matrix.

Implement MSE, RMSE, MAE.

Implement R-squared.

Implement cross-validation.

Implement grid search.

Implement random search.

Optimization
Implement gradient descent.

Implement SGD.

Implement momentum.

Implement Adam.

Implement L1 regularization.

Implement L2 regularization.

Implement dropout.

Implement early stopping.

Implement learning rate scheduler.

Implement weight initialization.



Behavioral Questions
Tell me about your most challenging ML project.

Describe a time you failed with a model and how you recovered.

How do you stay updated with ML advancements?

Describe your approach to a new ML problem.

How do you explain complex models to non-technical stakeholders?

What is your experience with team collaboration on ML projects?

How do you handle conflicting opinions on model selection?

Describe your experience with production deployment.

How do you ensure model fairness and ethics?

What is your experience with version control for ML?

Case Studies
You have 1M rows, 10K features, 100 positive samples. How to build a model?

Model has 99% train accuracy, 60% test accuracy. Debug.

Feature importance shows unexpected results. Investigate.

Model performance dropped in production. Diagnose.

Need to reduce model size by 90% without significant accuracy loss.

Data has missing values in 50% of features. Handle.

Classes are imbalanced 99:1. Build classifier.

Features have different scales and distributions. Preprocess.

Model needs to make predictions in <10ms. Optimize.

Data arrives in streaming fashion. Update model.