# QUIZ: SUPPORT VECTOR MACHINE
---

## Q1. Which kernel is most suitable for capturing complex, non-linear relationships in data? 
1. Linear kernel 
2. Polynomial kernel 
3. Gaussian (RBF) kernel 
4. Identity kernel

The most suitable kernel for capturing complex, non-linear relationships in data is:

**3. Gaussian (RBF) kernel**

Explanation:

* The **Gaussian Radial Basis Function (RBF)** kernel can model very flexible and complex non-linear patterns by mapping input features into an infinite-dimensional space.
* The **Linear kernel** works well for linear relationships only.
* The **Polynomial kernel** can capture some non-linearity but might be limited for very complex data unless you use a very high degree.
* The **Identity kernel** is not commonly used and does not transform the data to capture non-linearity.

So, **Gaussian (RBF) kernel** is the best choice here.


## Q2. What is the main advantage of using kernels in SVM over explicitly mapping the data to a higher-dimensional space? 
1. It allows for non-linear transformations without explicitly computing high-dimensional coordinates 
2. It reduces the number of support vectors 
3. It simplifies the optimization process 
4. It automatically selects the best features for classification

The main advantage of using kernels in SVM over explicitly mapping the data to a higher-dimensional space is:

**1. It allows for non-linear transformations without explicitly computing high-dimensional coordinates**

Explanation:

* Kernel functions let SVM implicitly operate in a high-dimensional feature space without the costly computation of coordinates in that space (called the **kernel trick**).
* This avoids the computational burden and memory cost of explicit mapping, making it efficient.
* Options 2, 3, and 4 are not the primary advantages of kernels.

So, **option 1** is correct.


## Q3. What is the main purpose of the Lagrange multiplier technique in SVM optimization? 
1. To introduce non-linearity in the model 
2. To convert the constrained optimization problem into an unconstrained one 
3. To compute the kernel function 
4. To select the best features for classification

The main purpose of the Lagrange multiplier technique in SVM optimization is:

**2. To convert the constrained optimization problem into an unconstrained one**

Explanation:

* SVM optimization involves maximizing the margin with constraints (like correct classification).
* The Lagrange multipliers help reformulate this constrained problem into a form that can be solved more easily using methods for unconstrained optimization.
* This approach also helps identify support vectors via non-zero multipliers.
* Options 1, 3, and 4 are unrelated to the role of Lagrange multipliers.

So, **option 2** is the correct answer.
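
As a sketch of that reformulation, assume the standard hard-margin primal: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$, with labels $y_i \in \{-1, +1\}$. This notation is an assumption and is not taken from the quiz itself. Each constraint is folded into the objective with a multiplier $\alpha_i \ge 0$:

$$
L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right], \qquad \alpha_i \ge 0
$$

Setting the derivatives with respect to $w$ and $b$ to zero gives $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$, so only points with $\alpha_i > 0$ (the support vectors) contribute to $w$.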


## Q4. What does the gamma parameter control in the Gaussian (RBF) kernel? 
1. The degree of the polynomial 
2. The width of the Gaussian function 
3. The offset from the origin 
4. The number of support vectors

The gamma parameter in the Gaussian (RBF) kernel controls:

**2. The width of the Gaussian function**

Explanation:

* Gamma defines how far the influence of a single training example reaches.
* A small gamma means a large variance (wide Gaussian), so points far away can influence the decision boundary.
* A large gamma means a narrow Gaussian, so only points very close influence the decision boundary, leading to more complex models.

So, **option 2** is correct.
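
A minimal sketch of this effect, assuming the common parameterization $K(x, y) = \exp(-\gamma \|x - y\|^2)$ (equivalently $\gamma = 1 / (2\sigma^2)$ for a Gaussian of width $\sigma$). The function name `rbf_kernel` and the sample points are illustrative only:

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [0.0, 0.0], [1.0, 1.0]  # squared distance between x and y is 2
for gamma in (0.1, 1.0, 10.0):
    # Larger gamma -> narrower Gaussian -> similarity decays faster with distance.
    print(f"gamma={gamma}: K(x, y) = {rbf_kernel(x, y, gamma):.6f}")
```

For a fixed pair of points, the kernel value shrinks as gamma grows, which is why a large gamma makes each training example's influence more local.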


## Q5. What is the primary advantage of solving the dual problem in SVM optimization? 
1. It always results in a global optimum 
2. It eliminates the need for kernel functions 
3. It is often more efficient for large datasets
4. It reduces the number of support vectors

The primary advantage of solving the dual problem in SVM optimization is:

**1. It always results in a global optimum**

Explanation:

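* The dual formulation of SVM is a convex quadratic program, so any optimum found is global rather than merely local (the primal is convex as well, and strong duality means both reach the same optimal value).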
* Also, solving the dual allows the use of kernel functions to handle non-linear decision boundaries efficiently (so kernel functions *are* used, not eliminated).
* While the dual can sometimes be more efficient, especially with kernels, it’s not always more efficient for large datasets compared to primal methods.
* The number of support vectors is determined by the data, not by choosing primal or dual.

So, the best answer is **option 1**.
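
For reference, a sketch of the soft-margin dual in its usual textbook form. The symbols $\alpha_i$ (multipliers), $C$ (penalty parameter), and $K$ (kernel) are assumed notation, not taken from the quiz:

$$
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
$$

The training data enter only through $K(x_i, x_j)$, which is why kernel functions plug directly into the dual.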


## Q6. What is the primary difference between hard margin and soft margin SVM? 
1. Hard margin uses kernel functions, while soft margin doesn't 
2. Soft margin allows for some misclassification, while hard margin doesn't 
3. Hard margin is used for regression, while soft margin is for classification 
4. Soft margin reduces computational complexity compared to hard margin

The primary difference between hard margin and soft margin SVM is:

**2. Soft margin allows for some misclassification, while hard margin doesn't**

Explanation:

* **Hard margin SVM** requires that all training data points are perfectly separated with no errors (no misclassification), which is only possible if data is linearly separable.
* **Soft margin SVM** introduces slack variables allowing some misclassification or margin violations to handle non-separable or noisy data better.
* Both can use kernels, both are for classification (not regression), and soft margin doesn't necessarily reduce computational complexity.

So, **option 2** is correct.


## Q7. In SVM Regression, what is the purpose of the epsilon-insensitive loss function? 
1. To maximize the margin between classes 
2. To ignore errors within a specified margin 
3. To transform the input space 
4. To compute the kernel function

In SVM Regression, the purpose of the epsilon-insensitive loss function is:

**2. To ignore errors within a specified margin**

Explanation:

* The epsilon-insensitive loss allows the model to ignore errors (differences between predicted and actual values) that fall within a margin of size epsilon.
* This helps create a "tube" around the regression function where small deviations are not penalized, promoting a simpler model that doesn’t try to fit noise.
* Options 1, 3, and 4 are unrelated to this loss function.

So, **option 2** is correct.
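
The loss can be sketched in a few lines, assuming the usual definition $\max(0, |y - \hat{y}| - \epsilon)$. The function name and the sample values are illustrative:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

epsilon = 0.5  # illustrative half-width of the tube
for y_true, y_pred in [(3.0, 3.2), (3.0, 3.5), (3.0, 4.0)]:
    loss = epsilon_insensitive_loss(y_true, y_pred, epsilon)
    # Errors no larger than epsilon cost nothing; larger errors cost the excess.
    print(f"error={abs(y_true - y_pred):.1f} -> loss={loss:.1f}")
```

Only the part of the error that sticks out of the tube is penalized, and it is penalized linearly.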


## Q8. In the polynomial kernel function $K(x, y) = (x \cdot y + c)^d$, what does the parameter d represent?
1. The distance between x and y 
2. The dimensionality of the input space 
3. The degree of the polynomial 
4. The regularization parameter

In the polynomial kernel function $K(x, y) = (x \cdot y + c)^d$, the parameter **d** represents:

**3. The degree of the polynomial**

Explanation:

* The degree $d$ controls the flexibility of the polynomial kernel and how complex the decision boundary can be.
* It determines the power to which the dot product (plus constant $c$) is raised.
* It is *not* the distance, input dimensionality, or regularization parameter.

So, **option 3** is correct.
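
A small sketch of the formula from the question, showing how the same pair of vectors yields different kernel values as $d$ changes. The function name and the sample vectors are illustrative:

```python
def polynomial_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel: (x . y + c) ** d."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** d

x, y = [1.0, 2.0], [3.0, 4.0]  # x . y = 11
for d in (1, 2, 3):
    print(f"d={d}: K(x, y) = {polynomial_kernel(x, y, c=1.0, d=d)}")
```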


## Q9. What is the main limitation of Linear SVM?
1. It cannot handle high-dimensional data 
2. It is computationally more expensive than non-linear SVM 
3. It cannot separate classes that are not linearly separable 
4. It always overfits the training data

The main limitation of Linear SVM is:

**3. It cannot separate classes that are not linearly separable**

Explanation:

* Linear SVM can only find a linear decision boundary, so if the classes are not linearly separable, it won't perform well.
* Linear SVM actually handles high-dimensional data well and is usually computationally less expensive than non-linear SVM.
* It does not always overfit; in fact, linear SVM is often less prone to overfitting compared to more complex models.

So, **option 3** is correct.


## Q10. In the context of SVM, what does the term "maximum margin hyperplane" refer to? 
1. The hyperplane with the most support vectors 
2. The hyperplane that perfectly separates all data points 
3. The hyperplane that has the largest distance from the nearest data points of both classes 
4. The hyperplane with the highest dimensionality

In the context of SVM, the term **"maximum margin hyperplane"** refers to:

**3. The hyperplane that has the largest distance from the nearest data points of both classes**

Explanation:

* SVM aims to find the hyperplane that maximizes the margin, i.e., the distance between the hyperplane and the closest data points (support vectors) from each class.
* This maximizes the model's generalization ability.
* It’s not necessarily the one with the most support vectors, nor does it always perfectly separate all points (especially in soft margin SVM), and dimensionality refers to the feature space, not the hyperplane.

So, **option 3** is correct.
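
To make the idea concrete, the sketch below compares two separating hyperplanes on a toy data set by computing the distance from each hyperplane to its nearest point. The data, the candidate hyperplanes, and the helper name `geometric_margin` are all made up for illustration:

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w . x + b) / ||w|| over all points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy, linearly separable data (illustrative values).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate separating hyperplanes of the form w . x + b = 0.
candidates = {
    "x1 + x2 = 2": ((1.0, 1.0), -2.0),
    "x1 = 1": ((1.0, 0.0), -1.0),
}
for name, (w, b) in candidates.items():
    print(name, "margin =", round(geometric_margin(w, b, points, labels), 4))
```

Both hyperplanes separate the classes, but the first keeps a larger distance to its nearest points, so a maximum-margin criterion prefers it.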


## Q11. What is the primary advantage of using the polynomial kernel over the linear kernel? 
1. It always results in faster computation 
2. It can capture polynomial relationships between features 
3. It reduces the dimensionality of the data 
4. It eliminates the need for regularization

The primary advantage of using the polynomial kernel over the linear kernel is:

**2. It can capture polynomial relationships between features**

Explanation:

* The polynomial kernel allows the SVM to model non-linear relationships by mapping features into a higher-dimensional space where polynomial relationships can be captured.
* It is usually more computationally expensive than a linear kernel (so option 1 is wrong).
* It does not reduce dimensionality (option 3), rather it implicitly increases it.
* It does not eliminate the need for regularization (option 4).

So, **option 2** is correct.


## Q12. Which of the following is NOT a common application of the linear kernels in SVM? 
1. Text classification 
2. Image classification with complex patterns 
3. High-dimensional gene expression data 
4. Spam detection

The option that is **NOT** a common application of linear kernels in SVM is:

**2. Image classification with complex patterns**

Explanation:

* Linear kernels work well for **high-dimensional** and **sparse data** like text classification, spam detection, and gene expression data where the data is often linearly separable or close to it.
* For **image classification with complex patterns**, non-linear kernels (like RBF or polynomial) are preferred because they can capture complex, non-linear relationships in pixel data.

So, **option 2** is correct.


## Q13. What is the purpose of slack variables in soft margin classification? 
1. To increase the margin size 
2. To measure the degree of misclassification 
3. To transform the feature space 
4. To compute the kernel function

The purpose of slack variables in soft margin classification is:

**2. To measure the degree of misclassification**

Explanation:

* Slack variables allow some data points to lie inside the margin or be misclassified, quantifying how much each point violates the margin constraints.
* This helps the model balance between maximizing the margin and minimizing classification errors in non-linearly separable data.
* Slack variables don’t transform the feature space, increase margin size, or compute kernels.

So, **option 2** is correct.
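
A minimal sketch of how a slack value is obtained for one point, assuming the usual definition $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$. The hyperplane and the points are hypothetical:

```python
def slack(w, b, x, y):
    """Slack xi = max(0, 1 - y * (w . x + b)) for one labelled point."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

w, b = (1.0, 0.0), 0.0  # hypothetical hyperplane x1 = 0
examples = [
    ((2.0, 0.0), 1),   # outside the margin: slack is 0
    ((0.5, 0.0), 1),   # inside the margin but correctly classified: 0 < slack < 1
    ((-1.0, 0.0), 1),  # wrong side of the hyperplane: slack > 1
]
for x, y in examples:
    print(x, "slack =", slack(w, b, x, y))
```

A slack between 0 and 1 marks a margin violation without misclassification, while a slack above 1 marks a misclassified point.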


## Q14. In SVM Regression, what does the epsilon parameter represent? 
1. The learning rate 
2. The margin of tolerance for errors 
3. The number of support vectors 
4. The degree of the polynomial kernel

In SVM Regression, the epsilon parameter represents:

**2. The margin of tolerance for errors**

Explanation:

* Epsilon defines a tube around the regression function where errors (differences between predicted and actual values) are ignored.
* It sets how much deviation is tolerated without penalty, controlling the model’s sensitivity to small errors.
* It is not the learning rate, number of support vectors, or degree of the polynomial kernel.

So, **option 2** is correct.


## Q15. What is the main difference between SVM classification and SVM regression? 
1. SVM Regression uses kernel functions, while Classification doesn't 
2. SVM Classification maximizes margin, while Regression minimizes it
3. SVM Classification predicts discrete classes, while Regression predicts continuous values
4. SVM Regression is always linear, while Classification can be non-linear

The main difference between SVM classification and SVM regression is:

**3. SVM Classification predicts discrete classes, while Regression predicts continuous values**

Explanation:

* SVM **classification** assigns input data to discrete categories or classes.
* SVM **regression** predicts continuous numerical values.
* Both can use kernel functions, both can be linear or non-linear, and both involve margin concepts but tailored to classification or regression.

So, **option 3** is correct.


## Q16. What is the primary goal of Linear SVM in classification tasks? 
1. To minimize the margin between classes
2. To maximize the margin between classes
3. To create a curved decision boundary 
4. To minimize the number of support vectors

The primary goal of Linear SVM in classification tasks is:

**2. To maximize the margin between classes**

Explanation:

* Linear SVM tries to find the hyperplane that maximizes the distance (margin) between the closest points of different classes, improving generalization.
* It does **not** minimize the margin, nor does it create curved boundaries (that’s for non-linear kernels).
* It does not explicitly aim to minimize the number of support vectors.

So, **option 2** is correct.


## Q17. What is the "kernel trick" in SVM? 
1. A method to reduce computational complexity
2. A technique to explicitly compute coordinates in high-dimensional space
3. A way to implicitly compute inner products in high-dimensional space 
4. A procedure to eliminate irrelevant features

The "kernel trick" in SVM is:

**3. A way to implicitly compute inner products in high-dimensional space**

Explanation:

* The kernel trick allows SVM to calculate the similarity (inner product) between data points in a high-dimensional feature space without explicitly computing their coordinates.
* This makes it efficient to handle non-linear relationships without the computational cost of explicit mapping.
* It is not about reducing computational complexity directly, explicitly computing coordinates, or feature elimination.

So, **option 3** is correct.
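
The equivalence can be checked numerically for a small case. The sketch below assumes the degree-2 homogeneous polynomial kernel $(x \cdot y)^2$ on 2-D inputs, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. The function names are illustrative:

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y) ** 2."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))     # inner product computed in the 3-D feature space
implicit = quadratic_kernel(x, y)  # same quantity, computed in the 2-D input space
print(explicit, implicit)
```

Both routes give the same number up to floating-point rounding, but the kernel never builds $\phi(x)$. For the RBF kernel the feature space is infinite-dimensional, so the implicit route is the only practical one.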


## Q18. Which kernel function is defined as $K(x, y) = x \cdot y$?
1. Polynomial kernel 
2. Gaussian (RBF) kernel 
3. Sigmoid kernel 
4. Linear kernel

The kernel function defined as $K(x, y) = x \cdot y$ is:

**4. Linear kernel**

Explanation:

* The linear kernel is simply the dot product between vectors $x$ and $y$.
* Polynomial, Gaussian (RBF), and Sigmoid kernels have more complex formulas involving powers, exponentials, or sigmoid functions.

So, **option 4** is correct.


## Q19. What is the main purpose of using kernel functions in SVM? 
1. To reduce the dimensionality of the input space
2. To increase the number of support vectors 
3. To transform data into a higher-dimensional space for better separation 
4. To eliminate the need for optimization

The main purpose of using kernel functions in SVM is:

**3. To transform data into a higher-dimensional space for better separation**

Explanation:

* Kernel functions implicitly map input data into a higher-dimensional space where classes are more likely to be linearly separable.
* They do **not** reduce dimensionality (usually they increase it implicitly), nor do they aim to increase support vectors or eliminate optimization.

So, **option 3** is correct.


## Q20. What does the parameter C control in Soft Margin SVM?
1. The number of support vectors
2. The dimensionality of the feature space 
3. The trade-off between margin size and classification error 
4. The kernel function type

The parameter **C** in Soft Margin SVM controls:

**3. The trade-off between margin size and classification error**

Explanation:

* **C** determines how much the model penalizes misclassifications.
* A large **C** tries to minimize classification errors, possibly at the cost of a smaller margin (less tolerance for errors).
* A small **C** allows a larger margin with more misclassifications tolerated (more regularization).
* It does not control the number of support vectors, feature space dimensionality, or kernel type.

So, **option 3** is correct.
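
The trade-off is visible in the soft-margin objective, sketched here in its usual textbook form. The slack variables $\xi_i$ and the index $n$ are assumed notation:

$$
\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
$$

The first term favors a wide margin (the margin width is $2 / \|w\|$), the second term penalizes margin violations, and $C$ sets the relative weight of the two.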


## Q21. What is the primary advantage of Soft Margin Classification in SVM? 
1. It always results in perfect classification 
2. It allows for some misclassification to handle non-linearly separable data 
3. It eliminates the need for support vectors 
4. It reduces the computational complexity of the algorithm

The primary advantage of Soft Margin Classification in SVM is:

**2. It allows for some misclassification to handle non-linearly separable data**

Explanation:

* Soft margin SVM permits some errors (misclassifications) to create a more flexible and robust model, especially when data is noisy or not perfectly separable.
* It does **not** guarantee perfect classification, nor does it eliminate support vectors or necessarily reduce computational complexity.

So, **option 2** is correct.


## Q22. What does the equation $w \cdot x + b = 0$ represent in Linear SVM?
1. The margin 
2. The hyperplane 
3. The support vectors
4. The optimization function

The equation $w \cdot x + b = 0$ in Linear SVM represents:

**2. The hyperplane**

Explanation:

* This equation defines the decision boundary (hyperplane) that separates the classes in the feature space.
* The margin refers to the distance around this hyperplane, support vectors are the points closest to this hyperplane, and the optimization function is what SVM tries to solve to find the best $w$ and $b$.

So, **option 2** is correct.


## Q23. In the context of SVM, what are support vectors? 
1. The weight vector of the hyperplane 
2. The bias term of the hyperplane 
3. The nearest data points to the hyperplane 
4. All data points in the training set

In the context of SVM, **support vectors** are:

**3. The nearest data points to the hyperplane**

Explanation:

* Support vectors are the data points that lie closest to the decision boundary (hyperplane) and directly influence its position and orientation.
* They are critical in defining the margin and the final model.
* They are not the weight vector, bias term, or all training data points.

So, **option 3** is correct.
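
One way to see this influence is through the dual form of the decision function, $f(x) = \sum_i \alpha_i y_i (x_i \cdot x) + b$, where only points with $\alpha_i > 0$ contribute. The sketch below uses a linear kernel and a three-point toy set. The multipliers `alphas` and bias `b` were worked out by hand for this toy data and are illustrative, not the output of a solver:

```python
def decision_function(x, train_x, train_y, alphas, b):
    """f(x) = sum_i alpha_i * y_i * (x_i . x) + b, using a linear kernel."""
    return sum(
        a * y * sum(p * q for p, q in zip(xi, x))
        for xi, y, a in zip(train_x, train_y, alphas)
    ) + b

# Toy 2-D training set with hand-derived hard-margin multipliers.
train_x = [(1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
train_y = [1, -1, 1]
alphas = [0.5, 0.5, 0.0]  # the third point has alpha = 0: not a support vector
b = 0.0

query = (2.0, 1.0)
full = decision_function(query, train_x, train_y, alphas, b)
# Dropping the non-support vector leaves the decision value unchanged.
reduced = decision_function(query, train_x[:2], train_y[:2], alphas[:2], b)
print(full, reduced)
```

The point at (3, 0) lies well outside the margin, so removing it changes nothing. Removing either of the first two points would change the hyperplane.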


## Q24. What is the primary advantage of using the Gaussian (RBF) kernel in SVM? 
1. It always results in perfect classification 
2. It can map data to an infinite-dimensional space
3. It reduces the computational complexity of SVM
4. It eliminates the need for parameter tuning

The primary advantage of using the Gaussian (RBF) kernel in SVM is:

**2. It can map data to an infinite-dimensional space**

Explanation:

* The RBF kernel implicitly maps input data into an infinite-dimensional feature space, allowing SVM to model very complex, non-linear relationships.
* It does **not** guarantee perfect classification, nor does it reduce computational complexity or eliminate the need for parameter tuning (like gamma and C).

So, **option 2** is correct.


## Q25. Which of the following is TRUE about the bias term b in the SVM hyperplane equation?
1. It is always positive 
2. It determines the orientation of the hyperplane 
3. It represents the offset of the hyperplane from the origin 
4. It is directly proportional to the margin size

The correct statement about the bias term **b** in the SVM hyperplane equation is:

**3. It represents the offset of the hyperplane from the origin**

Explanation:

* The bias $b$ shifts the hyperplane away from the origin.
* It does **not** determine orientation (that’s controlled by the weight vector $w$), it can be positive or negative, and it is not directly proportional to the margin size.

So, **option 3** is correct.
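
A short sketch of the offset, using the fact that the perpendicular distance from the origin to the hyperplane $w \cdot x + b = 0$ is $|b| / \|w\|$. The weight vector and bias values are arbitrary illustrative choices:

```python
import math

def distance_from_origin(w, b):
    """Perpendicular distance from the origin to the hyperplane w . x + b = 0."""
    return abs(b) / math.sqrt(sum(wi * wi for wi in w))

w = (3.0, 4.0)  # ||w|| = 5; w alone fixes the orientation
for b in (-10.0, 0.0, 5.0):
    # Changing b slides the hyperplane along w without rotating it.
    print(f"b={b}: distance from origin = {distance_from_origin(w, b)}")
```

With $b = 0$ the hyperplane passes through the origin. The sign of $b$ decides on which side of the origin the hyperplane lies.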


## Q26. In the context of SVM, what does the term "primal problem" refer to? 
1. The original optimization problem formulation 
2. The dual form of the optimization problem 
3. The process of selecting the best kernel function 
4. The method of computing support vectors

In the context of SVM, the term **"primal problem"** refers to:

**1. The original optimization problem formulation**

Explanation:

* The primal problem is the initial constrained optimization problem that SVM aims to solve to find the best separating hyperplane.
* The dual problem is a reformulation of this problem that often makes it easier to solve, especially with kernels.
* Options 3 and 4 are unrelated to the primal problem.

So, **option 1** is correct.


## Q27. Which of the following is NOT a characteristic of support vectors in SVM? 
1. They are the data points closest to the decision boundary 
2. They determine the position of the hyperplane 
3. They are always correctly classified 
4. They are crucial for defining the margin

The statement that is **NOT** a characteristic of support vectors in SVM is:

**3. They are always correctly classified**

Explanation:

* Support vectors are indeed the points closest to the decision boundary (option 1), they determine the hyperplane's position (option 2), and are crucial for defining the margin (option 4).
* However, in **soft margin SVM**, some support vectors violate the margin: they can lie inside it while still being correctly classified, or fall on the wrong side of the hyperplane and be misclassified. So they are **not always correctly classified**.

So, **option 3** is correct as the answer.


## Q28. Which of the following statements about SVM Regression is FALSE? 
1. It uses an epsilon-insensitive loss function 
2. It aims to find a function that fits all data points perfectly 
3. It can use different kernel functions for non-linear regression 
4. It tries to keep the regression function as flat as possible

The **FALSE** statement about SVM Regression is:

**2. It aims to find a function that fits all data points perfectly**

Explanation:

* SVM regression uses an epsilon-insensitive loss function, which **does not** penalize errors within a margin, so it does **not** try to fit all data points perfectly.
* It can use kernels for non-linear regression (option 3), and it tries to keep the function flat (option 4) to avoid overfitting.

So, **option 2** is false.


## Q29. In the context of SVM, what does the term "feature space" refer to?
1. The original space of input features 
2. The transformed space after applying a kernel function 
3. The space of support vectors 
4. The space of slack variables

In the context of SVM, the term **"feature space"** refers to:

**2. The transformed space after applying a kernel function**

Explanation:

* The feature space is where the original input data is mapped (often to a higher dimension) by the kernel function to make it easier to separate classes linearly.
* It is different from the original input space (option 1), and not related to the space of support vectors or slack variables.

So, **option 2** is correct.
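
As an illustration, the sketch below uses a hypothetical explicit map $\phi(x) = (x, x^2)$ on 1-D data. A real kernel SVM performs such a mapping implicitly, and the data and hyperplane here are made up to show why the transformed space helps:

```python
def phi(x):
    """Hypothetical explicit feature map from 1-D input to 2-D feature space."""
    return (x, x * x)

# 1-D data: the positive class sits on both sides of the negative class,
# so no single threshold on x separates them in the input space.
xs = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]

# In feature space, the hyperplane w . z + b = 0 with w = (0, 1), b = -1
# (that is, x^2 = 1) separates the two classes.
w, b = (0.0, 1.0), -1.0
predictions = []
for x, y in zip(xs, ys):
    z = phi(x)
    score = w[0] * z[0] + w[1] * z[1] + b
    predicted = 1 if score > 0 else -1
    predictions.append(predicted)
    print(x, z, "label", y, "predicted", predicted)
```

A linear boundary in the feature space corresponds to a non-linear rule ($|x| > 1$) in the original input space.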
