# Day 1 – Machine Learning Interview Q&A (with extra context)

---

### Q1. What is the difference between AI, Data Science, ML, and DL?

**Answer:**
- **Artificial Intelligence (AI):**
  - Big umbrella concept → Machines mimicking human intelligence (reasoning, decision making, natural language understanding).
  - Two types: **General AI** (human-like, not yet achieved) and **Applied (narrow) AI** (specific tasks like self-driving cars).
  - Example: Siri or Alexa (speech + decision making).

- **Machine Learning (ML):**
  - Subset of AI → Focused on teaching machines to **learn from data** instead of explicit programming.
  - Example: Netflix recommending movies by learning your past choices.

- **Deep Learning (DL):**
  - Subset of ML → Uses **neural networks with many layers**.
  - Good for unstructured data like **images, audio, video, text**.
  - Example: Face recognition on Facebook.

- **Data Science:**
  - Broader field involving **data collection, cleaning, visualization, statistics, ML, deployment**.
  - Not only models, but also storytelling with data, dashboards, decision systems.
  - Example: A data scientist analyzing customer behavior using SQL, Python, ML models, and then making a business recommendation.

👉 **Hierarchy to remember:**  
AI ⊃ ML ⊃ DL (DL is a subset of ML, which is a subset of AI)  
Data Science overlaps with all because it deals with **end-to-end handling of data and insights**.

---

### Q2. What is the difference between Supervised, Unsupervised, and Reinforcement Learning?

**Answer:**
- **Supervised Learning:**
  - Trained on **labeled data** (input → known output).
  - Model learns mapping between X → Y.
  - Examples:
    - Regression: Predicting house prices.
    - Classification: Spam vs. not spam.

- **Unsupervised Learning:**
  - Works on **unlabeled data** (only inputs, no outputs).
  - Goal = discover patterns, groups, or structure.
  - Examples:
    - Clustering: Grouping customers by spending habits.
    - Anomaly detection: Fraud detection.

- **Reinforcement Learning:**
  - Agent learns by **trial and error** in an environment.
  - Receives **rewards or penalties** for actions.
  - Goal = maximize long-term reward.
  - Examples:
    - AlphaGo beating humans in Go.
    - Robot learning to walk.

👉 Mnemonic:
- Supervised → Teacher with answers.  
- Unsupervised → No teacher, just structure.  
- Reinforcement → Learn by trial & error with feedback.
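
The contrast between the first two paradigms can be sketched with scikit-learn. This is a minimal illustration on made-up 1-D data, not a realistic workflow: the supervised model is given the labels `y`, while the clustering model only ever sees `X`. Reinforcement learning needs an environment loop and is left out here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy 1-D data (made up for illustration): two well-separated groups.
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised: learns the X -> y mapping from labeled examples.
clf = LogisticRegression().fit(X, y)
supervised_pred = clf.predict([[2.5], [10.5]])

# Unsupervised: sees only X and has to discover the two groups by itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

print("Supervised predictions:", supervised_pred)
print("Cluster assignments:", cluster_ids)
```

The cluster ids are arbitrary (group "0" vs. group "1" carries no meaning); only the grouping itself matters.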

---

### Q3. Describe the general architecture of Machine Learning.

**Answer:**
Machine learning is not just about training models. It’s an **end-to-end pipeline**:

1. **Business Understanding:**
   - Define the problem clearly (e.g., “predict churn”).
   - Know the domain (finance, healthcare, retail).

2. **Data Collection & Understanding:**
   - Gather data from databases, APIs, logs.
   - Perform EDA (exploratory data analysis) → find missing values, distributions.

3. **Feature Engineering & Selection:**
   - Scale/normalize numerical features.
   - Encode categorical variables.
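   - Drop irrelevant features using correlation analysis or domain knowledge, or reduce dimensionality with PCA (which combines features into fewer components rather than dropping them).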

4. **Model Training:**
   - Choose algorithms (linear regression, decision trees, neural nets).
   - Train on training data.

5. **Model Evaluation:**
   - Metrics: accuracy, precision, recall (classification); RMSE, MAE (regression).
   - Use cross-validation to ensure robustness.

6. **Hyperparameter Tuning:**
   - Improve model via GridSearch, RandomSearch, Bayesian optimization.

7. **Deployment:**
   - Serve the model (API, cloud, mobile).
   - Example: recommendation engine in production.

8. **Monitoring:**
   - Track drift → if new data distribution changes, retrain model.

👉 **Lifecycle mnemonic:** Define → Collect → Prepare → Train → Evaluate → Tune → Deploy → Monitor
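
Steps 3 to 6 of the pipeline can be sketched in a few lines of scikit-learn. This is an illustrative sketch only: the dataset is synthetic (a stand-in for real collected data), and the choice of logistic regression and the grid of `C` values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: synthetic stand-in for a collected, labeled dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Steps 3-4: scaling and the model, chained so the scaler is fit on training folds only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Steps 5-6: 5-fold cross-validation over a small, arbitrary grid of C values.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10]},
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Final check on held-out data that the search never saw.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("Best C:", search.best_params_["clf__C"])
print("Held-out accuracy:", round(test_accuracy, 3))
```

Wrapping preprocessing and model in one `Pipeline` keeps test-fold statistics from leaking into the scaler during cross-validation.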

---

### Q4. What is Linear Regression?

**Answer:**
- Linear regression is a supervised ML algorithm to predict **continuous values**.
- It assumes a **linear relationship** between inputs and outputs.
- Equation:
  $$
  Y = \beta_0 + \beta_1 X + \epsilon
  $$
- **Simple Linear Regression:** one input feature.  
- **Multiple Linear Regression:** multiple input features.

**Example:**
Predicting salary (Y) from years of experience (X).

```python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1],[2],[3],[4]])   # years
y = np.array([30, 35, 40, 45])    # salary in k

model = LinearRegression().fit(X, y)
print("Prediction for 5 years exp:", model.predict([[5]])[0])
```

---

### Q5. What is Ordinary Least Squares (OLS)?

**Answer:**

* OLS is the method used to find the best-fit line in linear regression.
* It minimizes the **sum of squared residuals (errors)**:

  $$
  \text{Error} = \sum (y_i - \hat{y_i})^2
  $$
* OLS gives estimates for coefficients β that minimize this error.
* In Python (using statsmodels), OLS also gives:

  * **t-values, p-values** → to check which features are statistically significant.
  * Helps in **feature selection**.
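
To make the minimization concrete, here is a small NumPy sketch that reuses the toy salary data from Q4. It solves for the coefficients in two ways: through the normal equations $(X^TX)\beta = X^Ty$ and through `np.linalg.lstsq`. The variable names are illustrative, and this is not how statsmodels is implemented internally.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])       # years of experience
y = np.array([30.0, 35.0, 40.0, 45.0])   # salary in k

# Design matrix: a column of ones (for the intercept) next to the feature.
A = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (A^T A) beta = A^T y.
beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Same least-squares problem, solved in a numerically safer way.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - A @ beta_lstsq
print("Intercept, slope:", beta_lstsq)
print("Sum of squared residuals:", float(residuals @ residuals))
```

Because the toy data lies exactly on a line, the fit recovers intercept 25 and slope 5 with residuals of (numerically) zero.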

---

### Q6. What is L1 Regularization (Lasso)?

**Answer:**

* Lasso adds **absolute values of coefficients** as penalty to loss function.
* Effect:

  * Shrinks less important feature coefficients to **0**.
  * Does **feature selection** automatically.
* Useful when dataset has many irrelevant features.
* Good for **sparse models**.

👉 Think of Lasso as a model that “keeps only important features and ignores the rest.”
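
A quick way to see the feature-selection effect is to fit Lasso on synthetic data where only one feature matters. The data, the noise level, and `alpha=0.5` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Five features, but only the first one actually drives the target.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# alpha sets the strength of the L1 penalty (larger = more coefficients at zero).
lasso = Lasso(alpha=0.5).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))
```

The four irrelevant coefficients come out as exactly zero, and the useful one is shrunk below its true value of 3, which is the price paid for the penalty.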

---

### Q7. What is L2 Regularization (Ridge Regression)?

**Answer:**

* Ridge adds **squared values of coefficients** as penalty.
* Effect:

  * Shrinks coefficients closer to zero but **never exactly zero**.
  * Keeps all features but reduces their influence.
* Good when all features are useful but we want to avoid overfitting.
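* Like plain linear regression, it is not robust to outliers: the loss still squares the residuals, so large errors dominate. (The L2 penalty itself squares the coefficients, not the errors.)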

👉 Think of Ridge as a model that “uses all features but controls their influence.”
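
The shrinkage can be checked by comparing Ridge against unpenalized linear regression on the same data. The synthetic data and `alpha=50.0` below are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic data with known coefficients (illustrative values).
X = rng.normal(size=(50, 5))
true_coef = np.array([3.0, -2.0, 1.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=50.0).fit(X, y)   # alpha = strength of the L2 penalty

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Coefficient norms (OLS, Ridge):",
      round(float(np.linalg.norm(ols.coef_)), 3),
      round(float(np.linalg.norm(ridge.coef_)), 3))
```

The Ridge coefficient vector has a smaller norm than the OLS one, yet no coefficient is set to exactly zero, in contrast to Lasso.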

---

### Q8. What is R-squared? Where to use and where not?

**Answer:**

* R² measures how much variance in target is explained by model.
* Formula:

  $$
  R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
  $$
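* Range: usually 0 → 1 (it can go negative on held-out data when the model fits worse than simply predicting the mean).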

  * 0 → model explains nothing.
  * 1 → model explains everything.
* **Limitation:** On training data, R² never decreases when more features are added, even if they are useless.
* **Adjusted R²:** Penalizes addition of irrelevant features → better for comparing models.
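
Both quantities are easy to compute by hand. The sketch below uses made-up numbers and assumes a single feature (`p = 1`); the adjusted version uses the standard formula $1 - (1 - R^2)\frac{n-1}{n-p-1}$.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])
n, p = len(y_true), 1   # n samples, p features (p = 1 is an assumption here)

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print("R^2:", r2)
print("Adjusted R^2:", adj_r2)
```

Here `ss_res = 0.75` and `ss_tot = 20`, so R² is 0.9625 and adjusted R² is 0.94375. Increasing `p` without reducing `ss_res` lowers the adjusted value.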

---

### Q9. What is Mean Squared Error (MSE)?

**Answer:**

* MSE = average of squared differences between actual and predicted.
* Formula:

  $$
  MSE = \frac{1}{n}\sum (y_i - \hat{y_i})^2
  $$
* Properties:

  * Penalizes large errors heavily.
  * Sensitive to outliers.
* RMSE = √MSE → interpretable in same units as target.

**Example:**
If true values = [2, 4], predicted = [3, 5],
Errors = [−1, −1] → Squared = [1, 1] → MSE = 1.
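
The same worked example in NumPy, as a minimal sketch:

```python
import numpy as np

y_true = np.array([2.0, 4.0])
y_pred = np.array([3.0, 5.0])

errors = y_true - y_pred      # [-1, -1]
mse = np.mean(errors ** 2)    # average of [1, 1]
rmse = np.sqrt(mse)           # back in the units of the target

print("MSE:", mse)
print("RMSE:", rmse)
```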

---

### Q10. Why Support Vector Regression (SVR)? Difference from Linear Regression?

**Answer:**

* **Linear Regression:** Fits a straight line by minimizing the sum of squared errors over all points.
* **SVR:** Tries to fit a line (or hyperplane) such that most points lie within a margin of tolerance (epsilon). Only points on or outside the margin (support vectors) influence the model; errors smaller than epsilon are ignored.
* Concepts:

  * **Hyperplane:** Best fit line.
  * **Epsilon (ε):** Error margin where small deviations are ignored.
  * **Support Vectors:** Critical points that define boundary.
  * **Kernel:** Enables SVR to handle non-linear relationships.

👉 Use SVR when:

* You want to allow small tolerance in errors.
* Data is non-linear (use kernels).
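
A small comparison on non-linear data illustrates the difference. The sine-shaped data, the RBF kernel, and the values `C=10.0` and `epsilon=0.1` are assumptions picked for illustration, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# One full period of a sine wave: clearly not a straight line.
X = np.linspace(0, 2 * np.pi, 60).reshape(-1, 1)
y = np.sin(X).ravel()

linear = LinearRegression().fit(X, y)

# epsilon = width of the tolerance tube; the RBF kernel handles the curvature.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)

print("Linear R^2:", round(linear.score(X, y), 3))
print("SVR R^2:", round(svr.score(X, y), 3))
print("Support vectors:", len(svr.support_), "of", len(X), "points")
```

The straight line cannot follow the curve, while the kernelized SVR can. Points that fall strictly inside the epsilon tube are not support vectors, so widening `epsilon` typically reduces their number.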


---

