# Theoretical Questions

1. What is a Support Vector Machine (SVM)?

- A supervised learning algorithm used for classification and regression.
- Finds the optimal hyperplane that best separates classes with maximum margin.
- Uses support vectors (critical data points) to define the decision boundary.
- Works well in high-dimensional spaces.

2. Hard Margin vs Soft Margin
- Hard Margin
  - Assumes data is perfectly linearly separable.
  - No misclassification allowed.
  - Prone to overfitting and highly sensitive to noise and outliers.
- Soft Margin
  - Allows some misclassification using slack variables.
  - Balances margin maximization & error minimization.
  - Controlled by the penalty parameter C.
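
For reference, the soft-margin problem is commonly written as below. The notation is an assumption added here, not taken from the notes above: $w$ and $b$ are the hyperplane parameters, $\xi_i$ are the slack variables, and labels are $y_i \in \{-1, +1\}$.

```latex
\min_{w,\,b,\,\xi} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
```

The hard-margin case is the same problem with every $\xi_i$ fixed at 0, which is only feasible when the data is linearly separable.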

3. Mathematical intuition behind SVM
- Maximize the margin (distance between hyperplane & closest points).
- Converts the problem into a convex optimization task with constraints.
- Uses dot products to measure similarity between samples.
- Because the problem is convex, any local minimum is also the global minimum.
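
As a small numeric illustration of the margin idea, the sketch below computes the margin width 2 / ||w|| and the signed distance of a point to the hyperplane w·x + b = 0. The function names and the values of `w` and `b` are made up for illustration, and the hyperplane is assumed to be in canonical form (closest points satisfy |w·x + b| = 1).

```python
import math

def margin_width(w):
    """Width of the margin, 2 / ||w||, for a canonical separating hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 2.0 / norm

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical hyperplane parameters, chosen so that ||w|| = 5.
w, b = [3.0, 4.0], -5.0
width = margin_width(w)                    # 2 / 5
dist = signed_distance(w, b, [3.0, 4.0])   # (9 + 16 - 5) / 5
```

Shrinking ||w|| widens the margin, which is why the optimization minimizes ||w||² subject to the classification constraints.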

4. Role of Lagrange Multipliers in SVM
- Transform the constrained primal problem → a dual problem whose constraints are simpler (bounds on the multipliers plus one equality).
- Help incorporate margin constraints into the objective function.
- Allow kernel trick by expressing solution in dot-product form.
- Non-zero multipliers correspond to support vectors.
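
A common way to write the resulting soft-margin dual is shown below. The symbols are assumptions added for illustration: $\alpha_i$ are the Lagrange multipliers, $y_i \in \{-1, +1\}$ are the labels, and $C$ is the penalty parameter.

```latex
\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i
 - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, x_i^\top x_j
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

The training points enter only through the dot products $x_i^\top x_j$, so each one can be replaced by a kernel value $K(x_i, x_j)$. The weight vector is recovered as $w = \sum_i \alpha_i y_i x_i$, so points with $\alpha_i = 0$ do not contribute.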

5. What are Support Vectors?
- Data points closest to the decision boundary.
- Crucial in defining the optimal hyperplane.
- Only these points determine the fitted model; removing any other training point leaves the boundary unchanged.
- Removing a support vector generally changes the decision boundary.

6. What is a Support Vector Classifier (SVC)?
- SVM applied to classification problems.
- Finds a maximum-margin hyperplane separating classes.
- Supports linear and kernel-based separation.
- Tuned using C and kernel parameters.

7. What is a Support Vector Regressor (SVR)?
- SVM adapted for regression tasks.
- Uses an ε-insensitive tube to ignore small errors.
- Finds a function that is as flat as possible while keeping training targets within a deviation of ε where it can.
- Controlled by parameters C, ε, and kernel choice.
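
The ε-insensitive idea can be sketched as a loss function: errors inside the tube cost nothing, and errors outside it grow linearly. The function name and the default `epsilon` value below are illustrative choices, not part of any particular library.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the tube |y - f(x)| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# A prediction that misses by about 0.05 falls inside the tube.
inside = epsilon_insensitive_loss(2.0, 2.05, epsilon=0.1)
# A prediction that misses by 0.5 is penalized by the excess over epsilon.
outside = epsilon_insensitive_loss(2.0, 2.5, epsilon=0.1)
```

In SVR, C scales the total of these penalties against the flatness of the fitted function.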

8. What is the Kernel Trick?
- Maps data to a higher-dimensional space without explicit transformation.
- Computes dot-products using kernel functions instead of coordinates.
- Enables SVM to learn non-linear boundaries.
- Common kernels: Linear, Polynomial, RBF.
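
To make the trick concrete, the sketch below checks that a degree-2 polynomial kernel computed in the original 2-D space equals a dot product in an explicit 3-D feature space. The feature map `phi` and the sample vectors are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
explicit = dot(phi(x), phi(z))   # dot product in the 3-D feature space
implicit = poly2_kernel(x, z)    # same value, computed in the original 2-D space
```

Both routes give 121 (up to floating-point rounding), but the kernel never builds the higher-dimensional vectors. For the RBF kernel the implied feature space is infinite-dimensional, so the implicit route is the only practical one.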

9. Compare Linear, Polynomial, and RBF Kernel
- Linear
  - For linearly separable data.
  - Fastest and simplest.
  - Good for high-dimensional sparse data (e.g., text).
- Polynomial
  - Captures interaction features (degree controls complexity).
  - Useful for moderately non-linear patterns.
  - More expensive than linear.
- RBF (Gaussian)
  - Handles highly non-linear data.
  - Uses distance-based similarity.
  - Most widely used; flexible but can overfit.
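
The three kernels can be sketched as plain functions. These are simplified textbook forms; the parameter names `degree`, `coef0`, and `gamma` and their default values are illustrative assumptions, and library implementations may parameterize them differently.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (x . z + coef0)^degree"""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 0.0], [0.0, 1.0]
k_lin = linear_kernel(x, z)        # 0.0: the vectors are orthogonal
k_poly = polynomial_kernel(x, z)   # (0 + 1)^3 = 1.0
k_rbf = rbf_kernel(x, z)           # exp(-0.5 * 2) = exp(-1)
```

The linear kernel costs one dot product, the polynomial kernel adds a power on top of it, and the RBF kernel depends only on the distance between the two points.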

10. Effect of the C parameter in SVM
- Controls trade-off between margin width & classification error.
- High C → low tolerance for misclassification (approaches a hard margin; narrower margin, higher risk of overfitting).
- Low C → more tolerance (wider margin, smoother boundary, higher risk of underfitting).
- Affects model complexity & generalization.
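
One way to see the trade-off is to evaluate the soft-margin objective, 0.5·||w||² + C·(sum of hinge losses), for a fixed hyperplane under two values of C. The data, the hyperplane, and the function name below are hypothetical and only meant as a sketch.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge += max(0.0, 1.0 - yi * score)
    return reg + C * hinge

# Hypothetical 1-D data; the third point lies on the wrong side of the boundary.
X = [[2.0], [-2.0], [0.5]]
y = [1, -1, -1]
w, b = [1.0], 0.0

low_c = soft_margin_objective(w, b, X, y, C=0.1)    # 0.5 + 0.1 * 1.5
high_c = soft_margin_objective(w, b, X, y, C=10.0)  # 0.5 + 10 * 1.5
```

With a high C the single violation dominates the objective, so the optimizer will change the boundary to remove it; with a low C the margin term dominates and the violation is tolerated.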

11. Role of Gamma in RBF Kernel SVM
- Controls how far the influence of a single training example reaches.
- High gamma → very narrow influence → risk of overfitting.
- Low gamma → wider influence → smoother decision boundary.
- Defines curvature of the separating boundary.
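
The reach of a training example can be sketched numerically, assuming the usual RBF form exp(-gamma · d²), where d is the distance between two points. The function name and the gamma values are chosen only for illustration.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by `distance`."""
    return math.exp(-gamma * distance ** 2)

# Same pair of points (distance 1.0), two different gamma values.
near = rbf_similarity(1.0, gamma=0.1)   # stays close to 1: wide influence
far = rbf_similarity(1.0, gamma=10.0)   # close to 0: influence has died out
```

With a high gamma each support vector only affects its immediate neighborhood, so the boundary can wrap tightly around individual points.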

12. What is Naïve Bayes? Why is it “Naïve”?
- A probabilistic classifier based on Bayes’ theorem.
- Predicts class with highest posterior probability.
- “Naïve” because it assumes features are conditionally independent of one another given the class.
- Despite this assumption, it often works well in real-world tasks.

13. What is Bayes’ Theorem?
- Formula that updates a prior probability into a posterior probability once evidence is observed.
- Connects likelihood, prior, and evidence.
- Foundation for Bayesian classification.
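
A short worked example may help. The sketch below applies P(A|B) = P(B|A)·P(A) / P(B), expanding P(B) with the law of total probability. The function name and the spam-filter numbers are hypothetical.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) expanded by total probability."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam; the word "offer" appears in
# 60% of spam messages and 5% of non-spam messages.
p_spam_given_offer = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
```

Here the evidence is 0.6·0.2 + 0.05·0.8 = 0.16, so the posterior is 0.12 / 0.16 = 0.75: seeing the word raises the spam probability from 20% to 75%.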

14. Differences among Gaussian, Multinomial, Bernoulli Naive Bayes
- Gaussian NB
  - Assumes continuous features follow a normal distribution.
  - Used for numeric data.
- Multinomial NB
  - Used for discrete count data (e.g., word frequencies).
  - Well suited to NLP tasks like document classification.
- Bernoulli NB
  - Binary features (0/1: word present or not).
  - Works well in text classification with binary representation.

15. When to use Gaussian Naive Bayes
- When features are continuous (age, height, income).
- Each feature is roughly bell-shaped (normally distributed) within each class.
- Not suitable for word counts or binary text data.
- Works well even with small sample sizes.
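
A minimal from-scratch sketch of Gaussian Naive Bayes is shown below: it estimates a mean and variance per feature and class, then scores a new point with Gaussian log-likelihoods. The function names, the toy data, and the small variance floor are assumptions for illustration, not a reference implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant is added to each variance to avoid division by zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)

# Hypothetical toy data: [height_cm, weight_kg] with two made-up classes.
X = [[150.0, 50.0], [155.0, 55.0], [180.0, 80.0], [185.0, 90.0]]
y = ["a", "a", "b", "b"]
model = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(model, [152.0, 52.0])
```

Summing per-feature log-likelihoods is exactly where the conditional independence assumption enters: each feature is scored on its own and the results are added.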

16. Key assumptions of Naïve Bayes
- Features are conditionally independent given the class.
- All features contribute equally and independently.
- Within each class, features are treated as uncorrelated.
- Data follows the assumed distribution (Gaussian/Multinomial/Bernoulli).

17. Advantages & Disadvantages of Naïve Bayes
- Advantages
  - Fast, simple, and scalable.
  - Works well with small data and high-dimensional data.
  - Excellent performance in text classification.
- Disadvantages
  - Feature independence assumption rarely holds.
  - Poor with correlated features.
  - Zero-frequency issues (handled by Laplace smoothing).

18. Why is Naive Bayes good for text classification?
- Words are not truly independent, but treating them as such is usually accurate enough to rank classes correctly.
- Handles high-dimensional sparse data efficiently.
- Performs well even with small training data.
- Fast training and prediction.
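
The sketch below shows how little machinery a multinomial Naive Bayes text classifier needs: word counts per class, class priors, and a sum of log-probabilities. The function names and the four-document corpus are made up for illustration, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count words per class; return everything needed for prediction."""
    word_counts = {}
    class_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab, alpha

def predict_multinomial_nb(model, doc):
    """Pick the class maximizing log prior + sum of log word likelihoods."""
    word_counts, class_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(class_counts[label] / n_docs)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical four-document corpus, made up for illustration.
docs = ["win cash prize now", "cheap cash offer", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
pred = predict_multinomial_nb(model, "cash prize offer")
```

Training is a single counting pass over the corpus, and prediction is a handful of lookups per word, which is why the method scales to large vocabularies.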

19. Compare SVM and Naive Bayes
- SVM
  - Margin-based, non-probabilistic model.
  - Works well with complex boundaries & smaller datasets.
  - Needs tuning (C, gamma, kernel).

- Naive Bayes
  - Probabilistic model with strong independence assumptions.
  - Very fast and scalable; a strong baseline for text data.
  - Performs poorly with correlated features.

20. How does Laplace Smoothing help in Naïve Bayes?
- Adds 1 to every word count (and the vocabulary size to the denominator) to avoid zero probabilities.
- Prevents model from assigning zero likelihood to unseen features.
- Improves generalization for rare or missing words.
- Especially useful in text classification.
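
The smoothed estimate can be sketched as a one-line formula, (count + alpha) / (total + alpha · vocabulary size), where alpha = 1 gives Laplace smoothing. The function name and the counts below are hypothetical.

```python
def smoothed_likelihood(count, total, vocab_size, alpha=1.0):
    """P(word | class) with additive smoothing (Laplace when alpha = 1)."""
    return (count + alpha) / (total + alpha * vocab_size)

# Hypothetical counts: the word never appeared in this class (count=0),
# the class contains 98 word tokens, and the vocabulary has 2 words.
unseen = smoothed_likelihood(count=0, total=98, vocab_size=2)               # 1 / 100
unsmoothed = smoothed_likelihood(count=0, total=98, vocab_size=2, alpha=0.0)  # 0 / 98
```

Without smoothing, the single zero would wipe out the whole product of likelihoods for that class, no matter how strongly the other words point to it.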
