# Top 100 Machine Learning Interview Questions & Answers [MAANG Edition]

Below is a categorized and comprehensive list of 100 machine learning questions and answers curated specifically for interviews at companies like Google, Amazon, Apple, Meta (Facebook), Netflix, and Microsoft.

---

## 📌 Basic Machine Learning Questions

1. **What is the difference between supervised and unsupervised learning?**
   - Supervised learning uses labeled data to train models.
   - Unsupervised learning uses unlabeled data to discover hidden patterns.

2. **What are the different types of Machine Learning?**
   - Supervised Learning
   - Unsupervised Learning
   - Semi-supervised Learning
   - Reinforcement Learning

3. **Define Overfitting and Underfitting.**
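   - Overfitting: The model performs well on training data but poorly on unseen data because it has fit noise in the training set.
   - Underfitting: The model performs poorly on both training and test data because it is too simple to capture the underlying pattern.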

4. **How do you prevent overfitting?**
   - Cross-validation (detects overfitting and guides model selection rather than preventing it directly)
   - Regularization (L1/L2)
   - Pruning (for trees)
   - Dropout (for neural nets)
   - More training data

5. **What is a confusion matrix?**
   - A table of predicted versus actual classes; for binary classification it counts true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

6. **What is precision and recall?**
   - Precision = TP / (TP + FP)
   - Recall = TP / (TP + FN)

7. **What is F1 Score?**
   - Harmonic mean of precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall).
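
As a quick check on the precision, recall, and F1 definitions, the minimal sketch below computes all three from raw confusion-matrix counts. The function name `precision_recall_f1` and the example counts are illustrative assumptions, not part of any library.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, round(f1, 4))  # 0.8 0.5 0.6154
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values (0.5 here) than an arithmetic mean would.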

8. **What are bias and variance?**
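   - Bias: Error from overly simple or incorrect assumptions in the model; high bias leads to underfitting.
   - Variance: Error from the model's sensitivity to small fluctuations in the training data; high variance leads to overfitting.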

9. **What is the bias-variance tradeoff?**
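   - Increasing model complexity typically lowers bias but raises variance. The goal is the complexity that minimizes total expected error, which decomposes into bias² + variance + irreducible noise.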

10. **What is cross-validation?**
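    - A technique for estimating generalization performance: split the data into k folds, train on k-1 folds, validate on the held-out fold, rotate through all folds, and average the scores.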
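
The fold partitioning behind cross-validation can be sketched as follows. This is a minimal illustration that assumes contiguous folds with no shuffling or stratification; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("validate on", val_idx, "train on", len(train_idx), "samples")
```

Each sample lands in exactly one validation fold, so every data point is used for both training and evaluation across the k rounds.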

11. **What is the difference between classification and regression?**
    - Classification: Predicts categories.
    - Regression: Predicts continuous values.

12. **What is a cost function?**
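    - A function that measures the error between a model's predictions and the true targets (e.g., mean squared error, cross-entropy); training seeks parameters that minimize it.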

13. **What is gradient descent?**
    - An iterative optimization algorithm that minimizes the cost function by repeatedly moving the parameters in the direction of the negative gradient.

14. **What is the difference between batch and stochastic gradient descent?**
    - Batch: Uses full dataset per step.
    - Stochastic: Uses one sample per step.
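
To make the batch versus stochastic distinction concrete, here is a minimal sketch that fits a single weight `w` in `y = w * x` with mean squared error. The data, learning rate, and epoch count are illustrative assumptions; the data is noise-free so both variants reach the same answer.

```python
import random


def batch_gd(xs, ys, lr=0.05, epochs=200):
    """One update per epoch, using the gradient averaged over all samples."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def sgd(xs, ys, lr=0.05, epochs=200, seed=0):
    """One update per sample, visiting samples in a shuffled order each epoch."""
    rng = random.Random(seed)
    w = 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w


xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]  # true weight is 3.0
w_batch = batch_gd(xs, ys)
w_sgd = sgd(xs, ys)
print(round(w_batch, 4), round(w_sgd, 4))
```

With noisy data, the stochastic variant would fluctuate around the minimum instead of settling exactly, which is the usual trade-off for its cheaper updates.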

15. **What is the role of learning rate in training?**
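    - Controls the step size of each gradient descent update: too large a value can overshoot or diverge, while too small a value makes convergence slow.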

16. **What are epochs, batches, and iterations?**
    - Epoch: One full pass over the training data.
    - Batch: The subset of the data processed in a single update.
    - Iteration: One parameter update step (one batch).
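
The relationship between these three terms is simple arithmetic, sketched below with hypothetical numbers (1,000 samples, batch size 32, 5 epochs) and the assumption that the final partial batch is kept rather than dropped.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of update steps needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


n_samples, batch_size, epochs = 1000, 32, 5
per_epoch = iterations_per_epoch(n_samples, batch_size)  # 31 full batches + 1 batch of 8
total_iterations = per_epoch * epochs
print(per_epoch, total_iterations)  # 32 160
```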

17. **Explain the Curse of Dimensionality.**
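    - As the number of features grows, the volume of the feature space grows exponentially, so data becomes sparse and distances between points become less informative; models then need far more data to generalize.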

18. **What is feature engineering?**
    - The process of creating, transforming, or selecting input features from raw data so that a model can learn from them more effectively.

19. **What is one-hot encoding?**
    - Converts each categorical value into a binary vector with a single 1 marking its category and 0s elsewhere.
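
A minimal sketch of one-hot encoding, assuming categories are ordered alphabetically; `one_hot_encode` is a hypothetical helper, not a library function.

```python
def one_hot_encode(values):
    """Return (categories, vectors) with one binary column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [
        [1 if index[v] == i else 0 for i in range(len(categories))]
        for v in values
    ]
    return categories, vectors


categories, vectors = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(vectors)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```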

20. **What is normalization vs standardization?**
    - Normalization (min-max scaling): Rescales each feature to a fixed range, typically [0,1].
    - Standardization: Rescales each feature to mean=0 and std=1.
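
The two scalings can be compared side by side in a short sketch. It assumes a single non-constant feature (so neither denominator is zero) and uses the population standard deviation; the function names are illustrative.

```python
import statistics


def min_max_normalize(xs):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


data = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(data)
standardized = standardize(data)
print(normalized)
print(standardized)
```

Normalization bounds the output range, while standardization leaves the range unbounded but fixes the center and spread.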

---

## 🔧 Intermediate Machine Learning Questions

21. **What is PCA (Principal Component Analysis)?**  
    - A linear dimensionality reduction technique that projects data onto orthogonal axes (principal components) maximizing variance, often used to remove multicollinearity and reduce noise.

22. **What is the elbow method in clustering?**  
    - A heuristic for selecting K in K-Means by plotting WCSS (within-cluster sum of squares) vs. K and choosing the point (‘elbow’) where diminishing returns start.

23. **What is the silhouette score?**  
    - A metric measuring how similar an object is to its own cluster versus other clusters, ranging from -1 to 1; higher is better.

24. **What is a ROC curve?**  
    - A plot of True Positive Rate (TPR) vs. False Positive Rate (FPR) at various classification thresholds, showing trade-offs.

25. **What is AUC-ROC?**  
    - The area under the ROC curve, summarizing the model’s ability to distinguish classes; 1.0 is perfect, 0.5 is random.
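
AUC also has a ranking interpretation: it is the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch below computes it that way by pairwise comparison. It assumes binary labels in {0, 1} with at least one example of each class; the O(n²) loop is for clarity, not efficiency.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```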

26. **What are precision-recall trade-offs?**  
    - Raising the decision threshold typically increases precision and decreases recall, and vice versa; analyzing this trade-off is especially informative when classes are imbalanced.

27. **What are the assumptions of linear regression?**  
    - Linearity, independence of errors, homoscedasticity (constant error variance), normality of residuals (mainly needed for inference), and no perfect multicollinearity.

28. **Explain multicollinearity.**  
    - When two or more predictors are highly correlated, inflating coefficient variances and making estimates unstable.

29. **What is regularization?**  
    - A technique to prevent overfitting by adding a penalty term (L1 or L2) to the loss function.

30. **Difference between L1 and L2 regularization?**  
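    - L1 (Lasso) penalizes the sum of absolute weights and can shrink coefficients exactly to zero (feature selection); L2 (Ridge) penalizes the sum of squared weights and shrinks coefficients toward zero without usually eliminating any.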
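
The different shrinkage behavior shows up already in the one-dimensional case. The sketch below assumes a single coefficient whose unpenalized estimate is `b` and the specific objective scalings given in each docstring; the function names are illustrative.

```python
def lasso_shrink(b, lam):
    """Minimizer of 0.5*(w - b)**2 + lam*abs(w): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b, lam):
    """Minimizer of 0.5*(w - b)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return b / (1.0 + lam)


for b in (2.0, 0.3):
    print(b, lasso_shrink(b, 0.5), round(ridge_shrink(b, 0.5), 4))
```

With `lam = 0.5`, the small coefficient 0.3 becomes exactly 0 under L1 but only shrinks to 0.2 under L2, which is why L1 acts as a feature selector.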

31. **What are decision trees?**  
    - Non-parametric models splitting data by feature thresholds, constructing a tree of decisions for classification/regression.

32. **Explain entropy and information gain.**  
    - Entropy measures impurity; information gain is the reduction in entropy when splitting on a feature.
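
Entropy and information gain can be computed directly from label counts. This is a minimal sketch using base-2 logarithms (entropy in bits); the split into `left` and `right` is a hypothetical example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
gain = information_gain(parent, [left, right])
print(entropy(parent), round(gain, 4))  # 1.0 0.2781
```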

33. **What is pruning in decision trees?**  
    - The process of trimming branches to reduce complexity and overfitting, via cost-complexity or reduced-error pruning.

34. **What are ensemble methods?**  
    - Techniques combining multiple models to improve performance (e.g., Bagging, Boosting, Stacking).

35. **Difference between bagging and boosting?**  
    - Bagging builds models independently in parallel to reduce variance; Boosting builds sequentially to reduce bias.

36. **What is Random Forest?**  
    - An ensemble of decision trees trained on bootstrapped samples with random feature subsets at each split, reducing variance.

37. **What is gradient boosting?**  
    - Sequential ensemble method where each new model fits the negative gradient of the loss with respect to the current ensemble's predictions (the residuals, for squared error).

38. **Explain XGBoost.**  
    - An optimized gradient boosting library with built-in regularization, parallelized split finding within each tree, and efficient handling of missing values.

39. **What is cross-entropy loss?**  
    - A loss function measuring the dissimilarity between true labels and predicted probabilities, commonly used in classification.

40. **What is log loss?**  
    - Another name for cross-entropy loss, most often used in the binary classification setting; it penalizes confident but wrong predictions heavily.
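
The heavy penalty for confident mistakes is easy to see numerically. The sketch below implements binary log loss with natural logarithms and clips probabilities by an assumed small `eps` to avoid `log(0)`.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


confident_right = log_loss([1], [0.9])
confident_wrong = log_loss([1], [0.1])
print(round(confident_right, 4), round(confident_wrong, 4))  # 0.1054 2.3026
```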

## 🚀 Advanced Machine Learning Questions

41. **What is Support Vector Machine (SVM)?**  
    - A classifier that finds the hyperplane maximizing margin between classes; uses kernels to handle non-linear boundaries.

42. **What are common kernel functions in SVM?**  
    - Linear, polynomial, RBF (Gaussian), and sigmoid kernels, which implicitly compute similarities in a higher-dimensional feature space (the kernel trick).

43. **How do you perform hyperparameter tuning?**  
    - Techniques like grid search, random search, and Bayesian optimization with cross-validation to select optimal parameters.

44. **What are cross-validation types?**  
    - K-Fold, stratified K-Fold, Leave-One-Out (LOOCV), and time-series split for sequential data.

45. **What is SMOTE?**  
    - Synthetic Minority Over-sampling Technique that generates synthetic samples for minority classes to address class imbalance.

46. **Explain time-series forecasting methods.**  
    - Models like ARIMA, SARIMA, exponential smoothing, and Prophet that account for trends and seasonality.

47. **What is an LSTM?**  
    - A type of RNN with forget, input, and output gates to capture long-term dependencies and mitigate vanishing gradients.

48. **What is the Attention mechanism?**  
    - A method that weights input elements differently, allowing models to focus on relevant parts of the sequence.
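
One common form of attention is scaled dot-product attention. The sketch below applies it to a single query vector in plain Python; the vectors are hypothetical, and real implementations operate on batched matrices with learned projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Return (weights, output) for one query using scaled dot products."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
```

The key most aligned with the query receives the largest weight, so its value dominates the output.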

49. **What are Transformers?**  
    - Architectures built primarily on self-attention (combined with position-wise feed-forward layers), enabling parallel sequence processing (e.g., BERT, GPT).

50. **Explain dropout.**  
    - A regularization technique that randomly disables neurons during training to prevent co-adaptation and overfitting.
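
A minimal sketch of dropout on a list of activations, assuming the common "inverted dropout" variant in which surviving activations are rescaled during training so that no scaling is needed at inference time.

```python
import random


def dropout(activations, p_drop, rng, training=True):
    """Zero each activation with probability p_drop; rescale the survivors."""
    if not training or p_drop == 0.0:
        return list(activations)  # inference: pass through unchanged
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
train_out = dropout([1.0] * 10000, p_drop=0.5, rng=rng)
eval_out = dropout([1.0, 2.0], p_drop=0.5, rng=rng, training=False)
print(sum(train_out) / len(train_out))  # close to 1.0 in expectation
```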

51. **What is batch normalization?**  
    - Normalizes layer inputs using the mini-batch mean and variance, then applies a learnable scale and shift; this accelerates training and improves stability.
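
The training-time computation for one feature across a mini-batch can be sketched as follows. `gamma` and `beta` stand for the learnable scale and shift, fixed here for illustration; the running statistics used at inference time are omitted.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


batch = [1.0, 2.0, 3.0, 4.0]
normalized_batch = batch_norm(batch)
print([round(v, 3) for v in normalized_batch])
```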

52. **Over-sampling vs. Under-sampling?**  
    - Over-sampling duplicates or synthetically creates minority samples; under-sampling removes majority samples.

53. **Feature selection vs. Feature extraction?**  
    - Selection chooses a subset of original features; extraction transforms data into new features (e.g., PCA).

54. **Describe A/B testing.**  
    - Controlled experiments comparing two variants to determine which performs better using statistical significance tests.

55. **What is Variance Inflation Factor (VIF)?**  
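    - A measure of multicollinearity computed as 1 / (1 − R²), where R² comes from regressing one predictor on all the others; a VIF above roughly 5 to 10 is a common rule-of-thumb sign of problematic correlation.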

56. **What is churn prediction?**  
    - Predicting customer attrition using classification models and features like usage patterns.

57. **What is Mean Average Precision (mAP)?**  
    - The mean, over classes or queries, of average precision (AP), where AP summarizes the precision-recall curve for one class or query; used in object detection and ranking tasks.

58. **What affects cross-validation variance?**  
    - Data size, number of folds, and randomness in fold assignment.

59. **Explain bootstrapping.**  
    - Resampling with replacement to estimate statistics (e.g., in Bagging or confidence intervals).
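
As one use of bootstrapping, the sketch below estimates a percentile confidence interval for the mean. The data, the number of resamples, and the choice of the simple percentile method are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the original sample.
    stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]
ci_low, ci_high = bootstrap_ci(data)
print(ci_low, statistics.fmean(data), ci_high)
```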

60. **What is manifold learning?**  
    - Non-linear dimensionality reduction methods (t-SNE, Isomap) preserving local or global structure.

61. **What are Bayesian Networks?**  
    - Probabilistic graphical models representing variables and their conditional dependencies via a DAG.

62. **What is Gaussian Mixture Model (GMM)?**  
    - A probabilistic clustering model that assumes data is generated from a mixture of Gaussian distributions.

63. **Explain the Expectation-Maximization (EM) algorithm.**  
    - Iterative method to find maximum likelihood estimates for models with latent variables (e.g., GMM).

64. **What is a Hidden Markov Model (HMM)?**  
    - A statistical model describing systems that are Markov processes with unobserved (hidden) states.

65. **Types of Reinforcement Learning?**  
    - Model-free (Q-Learning, SARSA) and model-based (planning using learned transition models).

66. **Explain Q-Learning.**  
    - An off-policy RL algorithm learning the optimal action-value function using Bellman updates.
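
A single tabular Q-Learning update can be sketched as below. The state and action names, the learning rate `alpha`, and the discount `gamma` are hypothetical; unseen state-action pairs are assumed to start at 0.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Off-policy: the target uses the best next action, not the one actually taken.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]


actions = ["left", "right"]
q_table = {("s1", "right"): 2.0}
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(round(new_value, 4))  # 0.28
```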

67. **What are Policy Gradient methods?**  
    - RL algorithms (e.g., REINFORCE, Actor-Critic) that directly optimize the policy’s parameters via gradient ascent on expected return; many, though not all, are on-policy.

68. **Explain model capacity and its relation to bias/variance.**  
    - Capacity refers to a model’s complexity; low capacity leads to underfitting (high bias), high capacity to overfitting (high variance).

69. **What is label leakage?**  
    - When training data includes information that will not be available at prediction time, inflating performance.

70. **What is model calibration?**  
    - Adjusting model output probabilities to reflect true likelihoods (e.g., using Platt scaling or isotonic regression).

## 🤖 Deep Learning Questions

71. **What are Convolutional Neural Networks (CNNs)?**  
    - Architectures using convolutional layers to extract spatial hierarchies of features from images.

72. **ReLU vs. Sigmoid vs. Tanh activation?**  
    - ReLU: fast convergence, sparse activation; Sigmoid/Tanh: saturate and cause vanishing gradients.

73. **Common optimizers: Adam, RMSprop, Momentum?**  
    - Adam: adaptive moments; RMSprop: adaptive learning rates; Momentum: smooths updates by accumulating gradients.

74. **Vanishing vs. Exploding gradients?**  
    - Gradients shrink or blow up during backprop, hindering learning in deep networks.

75. **How do you mitigate vanishing and exploding gradients?**  
    - Gradient clipping (for exploding gradients), residual connections (ResNet), batch/layer normalization, and careful weight initialization.
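
Gradient clipping by global norm, one of the mitigations listed, can be sketched on a flat list of gradient values. The threshold `max_norm` is a hypothetical hyperparameter.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)  # already small enough: leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Clipping preserves the gradient's direction and only limits its magnitude, so a single exploding step cannot destabilize training.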

76. **What is ResNet?**  
    - Deep CNN using skip connections to allow identity mappings and alleviate degradation in very deep models.

77. **Explain autoencoders.**  
    - Neural networks trained to reconstruct inputs via a lower-dimensional bottleneck for representation learning.

78. **What are Variational Autoencoders (VAEs)?**  
    - Probabilistic autoencoders learning latent distributions, enabling generative sampling.

79. **What are Generative Adversarial Networks (GANs)?**  
    - Two-module networks (generator vs. discriminator) trained adversarially to generate realistic data.

80. **What are Conditional GANs?**  
    - GANs where both generator and discriminator receive auxiliary information (labels).

81. **Object detection methods?**  
    - Two-stage (Faster R-CNN) vs. one-stage (YOLO, SSD) detectors balancing speed and accuracy.

82. **Semantic vs. Instance Segmentation?**  
    - Semantic: classifies each pixel; Instance: distinguishes individual object instances.

83. **Sequence-to-sequence models?**  
    - Encoder-decoder architectures for tasks like translation, often enhanced with attention mechanisms.

84. **What is beam search?**  
    - A heuristic search retaining the top-k most probable sequences at each time step.
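
The sketch below runs beam search over a toy next-token model. `toy_model` and its probability table are hypothetical stand-ins for a real decoder, and `step_fn` is assumed to return a mapping from next token to probability.

```python
import math


def beam_search(step_fn, start_token, beam_width, max_len, end_token="<eos>"):
    """Return up to beam_width (log_prob, sequence) pairs, best first."""
    beams = [(0.0, [start_token])]
    for _ in range(max_len):
        candidates = []
        for log_prob, seq in beams:
            if seq[-1] == end_token:
                candidates.append((log_prob, seq))  # finished: carry forward
                continue
            for token, prob in step_fn(seq).items():
                candidates.append((log_prob + math.log(prob), seq + [token]))
        # Keep only the top-k sequences by cumulative log-probability.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for _, seq in beams):
            break
    return beams


def toy_model(seq):
    """Hypothetical next-token distribution that depends only on the last token."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"<eos>": 0.45, "b": 0.55},
        "b": {"<eos>": 0.9, "a": 0.1},
    }
    return table[seq[-1]]


beams = beam_search(toy_model, "<s>", beam_width=2, max_len=3)
best_log_prob, best_seq = beams[0]
print(best_seq, round(math.exp(best_log_prob), 3))  # ['<s>', 'b', '<eos>'] 0.36
```

Greedy decoding would pick `a` first (probability 0.6) and end with a sequence probability of 0.297, so keeping two beams finds a better sequence here.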

85. **Word embeddings: Word2Vec vs. GloVe?**  
    - Word2Vec: predictive model learning via context windows; GloVe: count-based model capturing global co-occurrence.

86. **What is transfer learning?**  
    - Fine-tuning pre-trained models on new tasks to leverage learned features and reduce training time.

87. **What is one-shot learning?**  
    - Learning to recognize a class from a single example (few-shot learning extends this to a handful), using approaches like Siamese networks and meta-learning.

88. **What are capsule networks?**  
    - Networks grouping neurons into capsules to preserve hierarchical pose relationships in visual data.

89. **What are Graph Neural Networks (GNNs)?**  
    - Neural networks operating on graph-structured data via message passing between nodes.

90. **What is Vision Transformer (ViT)?**  
    - Applies transformer architecture to image patches for global self-attention in vision tasks.

## 🏗️ System Design & ML Pipelines

91. **Design a recommendation system.**  
    - Use collaborative filtering (user-item matrix, SVD) and content-based filtering (item features), with a real-time serving layer and feedback loop.

92. **Design an anomaly detection pipeline.**  
    - Data ingestion → feature engineering → model (e.g., Isolation Forest, autoencoder) → thresholding → alerting system.

93. **How to deploy an ML model?**  
    - Containerize (Docker), serve via REST/gRPC, use CI/CD (Jenkins/GitHub Actions), and autoscale with Kubernetes.

94. **A/B testing for ML features?**  
    - Randomly split traffic, track metrics, use statistical tests (t-test, chi-square) to determine significance.
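
For conversion-rate metrics, a two-proportion z-test is a common choice (for a 2×2 table it is equivalent to the chi-square test mentioned above). The sketch below uses a pooled standard error and a two-sided p-value; the traffic and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 100/1000, variant converts 130/1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

This sketch assumes a fixed sample size decided in advance; repeatedly checking the p-value while the experiment runs inflates the false positive rate.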

95. **Monitoring models in production.**  
    - Track data drift, concept drift, latency, throughput, error rates; use tools like Prometheus, Grafana.

96. **What is a feature store?**  
    - Centralized repository for storing, sharing, and serving features with consistency between training and serving.

97. **Data labeling pipeline.**  
    - Raw data collection → annotation tool (e.g., Labelbox) → quality assurance → versioned dataset storage.

98. **Online vs. Offline inference?**  
    - Online: real-time predictions with low latency; Offline: batch predictions on large datasets.

99. **ML orchestration tools?**  
    - Airflow, Kubeflow, Prefect for pipeline scheduling and management.

100. **Scalability challenges in ML?**  
     - Handling large data volumes, ensuring low-latency inference, distributed training, feature store consistency.


## 🔗 Support Resources

**GitHub Repositories:**
1. [andrewekhalel/MLQuestions](https://github.com/andrewekhalel/MLQuestions)
2. [alirezadir/Machine-Learning-Interviews](https://github.com/alirezadir/Machine-Learning-Interviews)
3. [youssefHosni/Data-Science-Interview-Questions-Answers](https://github.com/youssefHosni/Data-Science-Interview-Questions-Answers)
4. [chiphuyen/ml-interview-handbook](https://github.com/chiphuyen/ml-interview-handbook)
5. [job-interview-ml/awesome-ml-interview](https://github.com/job-interview-ml/awesome-ml-interview)
6. [dipanjanS/data-science-interviews](https://github.com/dipanjanS/data-science-interviews)
7. [ShubhankarRawat/ML-Interview-Prep](https://github.com/ShubhankarRawat/ML-Interview-Prep)
8. [huyenchip/ml-interviews-book](https://huyenchip.com/ml-interviews-book/)

**Free Medium Blogs:**
- [12 Machine Learning Interview Questions You Should Be Ready For](https://medium.com/swlh/12-machine-learning-interview-questions-you-should-be-ready-for-3e5fe144a9c8)
- [Top 10 Machine Learning Interview Questions & Answers in 2021](https://medium.com/@yarseypaul/top-10-machine-learning-interview-questions-answers-in-2021-f7cdc5a3a2b9)
- [100 Machine Learning Interview Questions](https://medium.com/analytics-vidhya/100-machine-learning-interview-questions-5e5fc1a6f1d6)
- [Ultimate Guide to ML Interviews at FAANG](https://medium.com/@reachpriyaa/how-to-crack-machine-learning-interviews-at-faang-78a2882a05c5)




**Books, Courses & Notebooks:**

1. **Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (3rd Ed.)**  
   Aurélien Géron’s Jupyter-notebook companion to his bestselling book, guiding you step-by-step through real-world ML pipelines, from data preprocessing to deep learning with TensorFlow 2.  
   ↳ https://github.com/ageron/handson-ml3  

2. **Python Data Science Handbook** by Jake VanderPlas  
   The definitive, **free** online book in notebook form, covering IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and essential DS workflows. Perfect for grasping core DS tools.  
   ↳ https://github.com/jakevdp/PythonDataScienceHandbook  

3. **fastai (v2+)**  
   High-level deep learning library + associated free MOOC (“Practical Deep Learning for Coders”): learn modern DL best practices with concise code, built on PyTorch. Great for rapid prototyping and research.  
   ↳ https://github.com/fastai/fastai  

4. **Machine Learning for Beginners** (Microsoft)  
   A **12-week, 26-lesson** classic ML curriculum using Scikit-Learn, with quizzes, assignments, and real-world datasets; ideal for structured, project-based learning.  
   ↳ https://github.com/microsoft/ML-For-Beginners  

5. **Machine Learning with PyTorch and Scikit-Learn** by Sebastian Raschka et al.  
   Code for the 2022 Packt book: hands-on implementations of ML and DL algorithms using PyTorch + Scikit-Learn, including advanced topics like GNNs, GANs, and reinforcement learning.  
   ↳ https://github.com/rasbt/machine-learning-book  

6. **NYU Deep Learning Spring 2020 (NYU-DLSP20)** by Alfredo Canziani & Yann LeCun  
   Complete set of lecture notebooks (convnets, RNNs, VAEs, Transformers, energy-based models) used in NYU’s graduate-level DL course: open-ended, research-oriented, and extremely thorough.  
   ↳ https://github.com/Atcold/NYU-DLSP20  

---
