# SVC (Support Vector Classification) 
SVC (Support Vector Classification) is a classifier class in the sklearn.svm module. Its signature is:<br>
class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)
### Overview of SVC
Purpose: SVC is used for classification tasks. It implements the Support Vector Machine algorithm for binary and multi-class classification problems.

In [1]:
import numpy as np

In [2]:
class SVM:
    def __init__(self,learning_rate=0.0001,lambda_param=0.01,n_iters=1000):
        self.lr=learning_rate
        self.lambda_param=lambda_param
        self.n_iters=n_iters
        self.weights=None
        self.bias=None

    def fit(self,X,y):
        y=np.where(y<=0,-1,1)
        n_samples,n_features=X.shape
        self.weights=np.zeros(n_features)
        self.bias=0
        for i in range(self.n_iters):
            for idx,x_i in enumerate(X):
                condition=y[idx]*(np.dot(x_i,self.weights)-self.bias)>=1
                if condition:
                    self.weights-=self.lr*(2*self.lambda_param*self.weights)
                else:
                    self.weights-=self.lr*(2*self.lambda_param*self.weights-np.dot(x_i,y[idx]))
                    self.bias-=self.lr*y[idx]

    def predict(self,X):
        y_pred=np.dot(X,self.weights)-self.bias
        return np.where(y_pred<=0,-1,1)
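
The loop in fit above can be read as stochastic subgradient descent on an L2-regularized hinge loss. The following is a sketch of that reading, not a derivation taken from the notebook: here $\lambda$ stands for lambda_param, $\eta$ for lr, and labels are assumed to be in $\{-1, +1\}$.

```latex
% Per-sample objective assumed by the update rule
J_i(w, b) = \lambda \lVert w \rVert^2 + \max\bigl(0,\; 1 - y_i (w \cdot x_i - b)\bigr)

% Subgradients, split on the margin condition checked in the code
\text{if } y_i (w \cdot x_i - b) \ge 1: \quad
  \nabla_w J_i = 2 \lambda w, \qquad \frac{\partial J_i}{\partial b} = 0

\text{otherwise}: \quad
  \nabla_w J_i = 2 \lambda w - y_i x_i, \qquad \frac{\partial J_i}{\partial b} = y_i

% Update applied for every sample in every iteration
w \leftarrow w - \eta \, \nabla_w J_i, \qquad
b \leftarrow b - \eta \, \frac{\partial J_i}{\partial b}
```

The two branches correspond to the if/else in fit: samples outside the margin only shrink the weights, while samples inside the margin also pull the weights and bias toward classifying them correctly.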

In [3]:
from sklearn import datasets
X,y=datasets.make_blobs(n_samples=1000,n_features=5,centers=2,cluster_std=1.05,random_state=40)

In [4]:
y=np.where(y==0,-1,1)

In [5]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=1234)

In [6]:
clf=SVM()
clf.fit(X_train,y_train)
predictions=clf.predict(X_test)
predictions

array([-1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1, -1,  1,  1,
        1, -1, -1, -1,  1, -1,  1,  1, -1,  1, -1, -1, -1,  1,  1,  1, -1,
        1, -1, -1, -1,  1, -1,  1,  1,  1, -1, -1, -1,  1,  1, -1, -1,  1,
       -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1, -1, -1,  1,  1,  1,
        1,  1,  1,  1,  1, -1,  1, -1,  1,  1, -1,  1,  1,  1, -1,  1, -1,
        1,  1, -1, -1, -1, -1,  1,  1,  1,  1, -1,  1, -1, -1,  1, -1, -1,
        1,  1, -1, -1,  1,  1, -1,  1, -1, -1,  1, -1,  1,  1,  1, -1, -1,
        1,  1,  1, -1,  1, -1,  1,  1, -1,  1,  1,  1,  1, -1, -1,  1, -1,
       -1,  1, -1, -1,  1,  1, -1, -1,  1,  1,  1, -1, -1, -1,  1, -1, -1,
       -1,  1, -1, -1, -1,  1, -1, -1, -1,  1, -1, -1, -1,  1, -1,  1,  1,
        1, -1, -1,  1,  1, -1, -1, -1,  1, -1,  1,  1, -1,  1, -1,  1, -1,
        1, -1,  1, -1,  1,  1,  1,  1, -1,  1,  1, -1, -1])

In [7]:
y_test

array([-1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1, -1,  1,  1,
        1, -1, -1, -1,  1, -1,  1,  1, -1,  1, -1, -1, -1,  1,  1,  1, -1,
        1, -1, -1, -1,  1, -1,  1,  1,  1, -1, -1, -1,  1,  1, -1, -1,  1,
       -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1, -1, -1,  1,  1,  1,
        1,  1,  1,  1,  1, -1,  1, -1,  1,  1, -1,  1,  1,  1, -1,  1, -1,
        1,  1, -1, -1, -1, -1,  1,  1,  1,  1, -1,  1, -1, -1,  1, -1, -1,
        1,  1, -1, -1,  1,  1, -1,  1, -1, -1,  1, -1,  1,  1,  1, -1, -1,
        1,  1,  1, -1,  1, -1,  1,  1, -1,  1,  1,  1,  1, -1, -1,  1, -1,
       -1,  1, -1, -1,  1,  1, -1, -1,  1,  1,  1, -1, -1, -1,  1, -1, -1,
       -1,  1, -1, -1, -1,  1, -1, -1, -1,  1, -1, -1, -1,  1, -1,  1,  1,
        1, -1, -1,  1,  1, -1, -1, -1,  1, -1,  1,  1, -1,  1, -1,  1, -1,
        1, -1,  1, -1,  1,  1,  1,  1, -1,  1,  1, -1, -1])

In [8]:
def accuracy(y_true,y_pred):
    accuracy=np.sum(y_true==y_pred)/len(y_true)
    return accuracy
accuracy(y_test,predictions)

1.0

In [9]:
import pandas as pd

In [10]:
df=pd.read_csv("diabetes.csv")
X=df.drop(columns=["Outcome"])
y=df["Outcome"]

In [11]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=1234)
X_train=np.array(X_train)
y_train=np.array(y_train)
y_test=np.array(y_test)
X_test=np.array(X_test)

In [12]:
clf=SVM()
clf.fit(X_train,y_train)
predictions=clf.predict(X_test)
predictions

array([-1,  1, -1,  1, -1, -1, -1, -1, -1,  1,  1, -1, -1, -1, -1,  1, -1,
       -1, -1, -1, -1,  1, -1,  1, -1, -1,  1,  1, -1, -1, -1,  1,  1, -1,
        1,  1, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1,  1, -1, -1, -1, -1,
       -1,  1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1, -1,
       -1,  1, -1, -1, -1, -1,  1, -1, -1, -1, -1,  1, -1,  1,  1,  1, -1,
       -1, -1,  1, -1, -1, -1,  1, -1, -1, -1, -1, -1,  1, -1,  1, -1,  1,
       -1,  1, -1, -1,  1,  1,  1,  1, -1,  1, -1, -1, -1,  1, -1,  1, -1,
       -1, -1, -1, -1,  1,  1,  1, -1, -1, -1, -1,  1, -1, -1,  1,  1, -1,
       -1, -1, -1, -1, -1, -1, -1,  1, -1, -1,  1,  1, -1, -1,  1, -1, -1,
        1])

In [13]:
y_test=np.where(y_test<=0,-1,1)
y_test

array([-1, -1,  1,  1, -1, -1, -1, -1,  1, -1,  1, -1, -1, -1, -1, -1,  1,
        1, -1,  1,  1,  1, -1,  1, -1, -1, -1,  1, -1, -1, -1, -1,  1,  1,
        1, -1,  1, -1, -1, -1, -1, -1,  1, -1, -1, -1, -1,  1, -1, -1,  1,
       -1,  1, -1,  1, -1, -1, -1, -1,  1,  1,  1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1,  1, -1, -1,  1, -1, -1, -1, -1, -1,  1,  1,  1,  1, -1,
       -1, -1, -1, -1, -1, -1,  1, -1, -1, -1,  1, -1, -1, -1,  1,  1,  1,
       -1, -1, -1,  1,  1,  1, -1,  1, -1, -1,  1,  1, -1, -1,  1,  1, -1,
       -1, -1,  1,  1, -1,  1,  1, -1, -1, -1, -1, -1, -1,  1,  1, -1, -1,
       -1,  1, -1, -1,  1, -1, -1, -1,  1,  1,  1, -1, -1, -1,  1, -1,  1,
       -1])

In [14]:
accuracy(y_test,predictions)

0.6688311688311688

In [15]:
y_train=np.where(y_train<=0,-1,1)

### Key Parameters
- C: Regularization parameter. A smaller value specifies stronger regularization, helping to prevent overfitting.
- kernel: Defines the type of kernel function to be used (e.g., linear, rbf, poly). The choice of kernel can significantly affect model performance.
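- gamma: Kernel coefficient for the rbf, poly, and sigmoid kernels. It affects the decision boundary's curvature: higher values lead to more complex models that can overfit, while lower values can lead to underfitting.
- class_weight: Useful for handling imbalanced datasets. With class_weight='balanced', each class is weighted inversely to its frequency in the training data; a dict can be passed to set the weight of each class explicitly.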
- probability: When set to True, enables probability estimates, but adds computational overhead during training.
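
To make the class_weight description concrete, here is a minimal sketch that computes the 'balanced' weights by hand and compares them with scikit-learn's helper. The imbalanced data set (90 samples of one class, 10 of the other) is made up for illustration and is not part of the notebook.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)
# Two random features; class 1 is shifted so the classes are roughly separable.
X = rng.normal(size=(100, 2)) + 2.0 * y[:, None]

# 'balanced' weights follow n_samples / (n_classes * count_per_class).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
manual = len(y) / (2 * np.bincount(y))
print(weights, manual)

# The fitted estimator exposes the per-class multipliers of C as class_weight_.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced").fit(X, y)
print(clf.class_weight_)
```

The rare class receives the larger weight, so misclassifying one of its samples is penalized more heavily during training.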

### Attributes
- support_vectors_: Provides the support vectors, which are critical to the decision boundary.
- n_support_: Returns the number of support vectors for each class.
- coef_: Weights assigned to features if the kernel is linear.

### Methods
- fit(X, y): Trains the model using the provided training data.
- predict(X): Classifies the input data.
- score(X, y): Returns the mean accuracy of the classifier on the test data.
- predict_proba(X): Provides probabilities of class membership if probability estimates are enabled.
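
The attributes and methods listed above can be inspected on a small fitted model. This is a sketch on synthetic make_blobs data (an assumption, chosen so the block runs without diabetes.csv); a linear kernel is used so that coef_ is available, and probability=True so that predict_proba can be called.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration.
X, y = make_blobs(n_samples=200, n_features=2, centers=2, cluster_std=1.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="linear", probability=True, random_state=0)
clf.fit(X_train, y_train)

print(clf.support_vectors_.shape)  # (number of support vectors, number of features)
print(clf.n_support_)              # support vectors per class
print(clf.coef_)                   # feature weights, defined for the linear kernel
print(clf.predict(X_test[:5]))     # predicted labels
score = clf.score(X_test, y_test)  # mean accuracy on the held-out data
print(score)
proba = clf.predict_proba(X_test[:5])  # one row per sample, one column per class
print(proba)
```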


In [16]:
from sklearn.svm import SVC
clf=SVC()
clf.fit(X_train,y_train)
predictions=clf.predict(X_test)
predictions



array([-1,  1, -1, -1, -1, -1, -1, -1,  1, -1,  1, -1, -1, -1, -1, -1,  1,
       -1, -1, -1,  1, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1, -1,  1, -1,
        1, -1, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1, -1, -1, -1, -1,  1,
        1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1,  1, -1, -1, -1,  1, -1, -1, -1,  1, -1,  1,
       -1,  1, -1, -1,  1,  1, -1,  1, -1, -1, -1, -1, -1, -1, -1,  1, -1,
       -1, -1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1,  1, -1, -1,
        1])

In [17]:
accuracy(y_test,predictions)

0.7467532467532467

### Example
Here's a simple example of using SVC with a pipeline:

- import numpy as np<br>
from sklearn.pipeline import make_pipeline<br>
from sklearn.preprocessing import StandardScaler<br>
from sklearn.svm import SVC

- Sample data<br>
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])<br>
y = np.array([1, 1, 2, 2])

- Create a pipeline with standard scaling and SVC<br>
clf = make_pipeline(StandardScaler(), SVC(gamma='scale'))<br>
clf.fit(X, y)

- Make a prediction<br>
print(clf.predict([[-0.8, -1]]))  # Output might be [1]

### Considerations
- Performance: For large datasets, consider alternatives like LinearSVC or SGDClassifier, as SVC can become computationally intensive.
- Hyperparameter Tuning: Tuning parameters like C, gamma, and kernel choice can significantly improve performance. Use techniques like cross-validation to find optimal values.
- Interpretability: While SVMs can provide strong classification performance, their decision boundaries can be hard to interpret, especially with non-linear kernels.
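
As an illustration of the hyperparameter tuning point, the sketch below runs a small cross-validated grid search over C, gamma, and kernel. The synthetic data set and the grid values are arbitrary choices made for this example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

# Scaling lives inside the pipeline so it is re-fitted on each training fold.
pipe = make_pipeline(StandardScaler(), SVC())

# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

For the linear kernel the gamma values have no effect, so part of this grid is redundant; a list of separate grids per kernel would avoid the repeated fits.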

### Kernel Function in Support Vector Machine
In Support Vector Machines (SVMs), there are several types of kernel functions that can be used to map the input data into a higher-dimensional feature space. The choice of kernel function depends on the specific problem and the characteristics of the data.

- Linear Kernel<br>
A linear kernel is the simplest kernel function used in SVMs (Support Vector Machines). It is the dot product of the input vectors in the original feature space.<br>
The linear kernel can be defined as:<br>
K(x, y) = x . y  <br>
Where x and y are the input feature vectors. The dot product of the input vectors is a measure of their similarity in the original feature space.<br>
When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that separates the different classes in the feature space. This linear boundary can be useful when the data is already separable by a linear decision boundary or when dealing with high-dimensional data, where the use of more complex kernel functions may lead to overfitting.

- Polynomial Kernel<br>
A polynomial kernel is a nonlinear kernel function used in machine learning, including in SVMs (Support Vector Machines). It uses polynomial functions to map the input data into a higher-dimensional feature space.<br>
One definition of the polynomial kernel is:<br>
K(x, y) = (x . y + c)^d <br>
Where x and y are the input feature vectors, c is a constant term, and d is the degree of the polynomial.<br>
The constant term is added to the dot product of the input vectors, and the sum is raised to the degree of the polynomial.<br>
The decision boundary of an SVM with a polynomial kernel is nonlinear in the input space, so it can capture more intricate relationships between the input features.<br>
The degree of nonlinearity in the decision boundary is determined by the degree of the polynomial.<br>
The polynomial kernel can capture both linear and nonlinear relationships in the data. Selecting the degree can be difficult, though: a larger degree can result in overfitting, while a lower degree may not adequately represent the underlying relationships in the data.<br>
In general, the polynomial kernel is an effective tool for mapping the input data into a higher-dimensional feature space in order to capture nonlinear relationships between the input features.<br>

- Gaussian (RBF) Kernel<br>
The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular kernel function used in machine learning, particularly in SVMs (Support Vector Machines). It is a nonlinear kernel function that maps the input data into a higher-dimensional feature space using a Gaussian function.<br>
The Gaussian kernel can be defined as:<br>
K(x, y) = exp(-gamma * ||x - y||^2)  <br>
Where x and y are the input feature vectors, gamma is a parameter that controls the width of the Gaussian function, and ||x - y||^2 is the squared Euclidean distance between the input vectors.<br>
When using a Gaussian kernel in an SVM, the decision boundary is a nonlinear surface in the input space that can capture complex nonlinear relationships between the input features. The width of the Gaussian function, controlled by the gamma parameter (a larger gamma gives a narrower Gaussian), determines the degree of nonlinearity in the decision boundary.<br>
One advantage of the Gaussian kernel is its ability to capture complex relationships in the data without the need for explicit feature engineering. However, the choice of the gamma parameter can be challenging, as a smaller value may result in underfitting, while a larger value may result in overfitting.<br>

- Laplace Kernel<br>
The Laplacian kernel, also known as the Laplace kernel or the exponential kernel, is a kernel function used in machine learning, including in SVMs (Support Vector Machines). It measures the similarity between two input feature vectors. SVC does not provide it by name, but it can be passed as a callable kernel.<br>
The Laplacian kernel can be defined as:<br>
K(x, y) = exp(-gamma * ||x - y||_1)  <br>
Where x and y are the input feature vectors, gamma is a parameter that controls the width of the Laplacian function, and ||x - y||_1 is the L1 norm (Manhattan distance) of the difference between the input vectors.<br>
When using a Laplacian kernel in an SVM, the decision boundary is a nonlinear surface in the input space that can capture complex relationships between the input features. The width of the Laplacian function, controlled by the gamma parameter, determines the degree of nonlinearity in the decision boundary.<br>
The Laplacian kernel is often described as more robust to outliers than the Gaussian kernel, because it depends on the L1 distance rather than the squared Euclidean distance, so its value falls off less sharply as the distance between the input vectors grows. However, like the Gaussian kernel, choosing the correct value of the gamma parameter can be challenging.
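
The four kernel formulas above can be checked numerically. The sketch below evaluates each one with NumPy on small random matrices and compares the result with the corresponding function in sklearn.metrics.pairwise. The values of gamma, c, and d are arbitrary; note that scikit-learn's polynomial_kernel also multiplies the dot product by gamma, so gamma=1.0 is passed there to match the definition given above.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    laplacian_kernel,
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 vectors with 3 features
Y = rng.normal(size=(4, 3))  # 4 vectors with 3 features
gamma, c, d = 0.5, 1.0, 3    # arbitrary illustrative values

# Pairwise differences x - y for every (x, y) pair, shape (5, 4, 3).
diff = X[:, None, :] - Y[None, :, :]

K_linear = X @ Y.T                                          # x . y
K_poly = (X @ Y.T + c) ** d                                 # (x . y + c)^d
K_rbf = np.exp(-gamma * np.sum(diff ** 2, axis=2))          # exp(-gamma * ||x - y||^2)
K_laplace = np.exp(-gamma * np.sum(np.abs(diff), axis=2))   # exp(-gamma * ||x - y||_1)

checks = {
    "linear": np.allclose(K_linear, linear_kernel(X, Y)),
    "polynomial": np.allclose(
        K_poly, polynomial_kernel(X, Y, degree=d, gamma=1.0, coef0=c)
    ),
    "rbf": np.allclose(K_rbf, rbf_kernel(X, Y, gamma=gamma)),
    "laplacian": np.allclose(K_laplace, laplacian_kernel(X, Y, gamma=gamma)),
}
print(checks)
```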

### Advantages of SVM:
- Effective in high dimensional spaces.
- Still effective in cases where the number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
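
As a sketch of the custom kernel point, SVC accepts a callable that takes two sample matrices and returns their kernel matrix. The example below wraps scikit-learn's laplacian_kernel; the wrapper name laplace, the make_moons data, and gamma=1.0 are assumptions made for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import laplacian_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data, used only for illustration.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)


def laplace(A, B):
    """Hypothetical wrapper: Laplacian kernel matrix between the rows of A and B."""
    return laplacian_kernel(A, B, gamma=1.0)  # gamma=1.0 is an arbitrary choice


# Passing a callable makes SVC call it to build the kernel matrix.
clf = SVC(kernel=laplace).fit(X_train, y_train)
predictions = clf.predict(X_test)
acc = clf.score(X_test, y_test)
print(acc)
```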

### Disadvantages of SVM:
- If the number of features is much greater than the number of samples, choosing the kernel function and regularization term carefully is crucial to avoid overfitting.
- SVMs do not directly provide probability estimates; in SVC these are calculated using an expensive five-fold cross-validation.
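
What an SVC provides directly is a signed score from decision_function rather than a probability. The sketch below, on synthetic make_blobs data chosen for illustration, shows that in the binary case the sign of that score determines the predicted class.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)

# One signed score per sample; a positive score indicates clf.classes_[1].
scores = clf.decision_function(X[:10])
from_sign = clf.classes_[(scores > 0).astype(int)]

print(scores.round(2))
print(np.array_equal(from_sign, clf.predict(X[:10])))
```

When only a ranking or a threshold is needed, these scores avoid the extra cross-validation that probability=True adds to training.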