# Feature Scaling and SVM

## Task
Demonstrate the impact of feature scaling on SVM performance with and without kernel transformation.

## Objective
Highlight the importance of preprocessing steps in SVM implementations.

## Implementation
Here, we compare the test accuracy of a linear-kernel SVM trained on unscaled versus standardized Iris features.


In [1]:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVM without scaling
svm_noscale = SVC(kernel='linear')
svm_noscale.fit(X_train, y_train)
accuracy_noscale = accuracy_score(y_test, svm_noscale.predict(X_test))

# SVM with scaling: fit the scaler on the training split only,
# then apply the same transformation to the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_scale = SVC(kernel='linear')
svm_scale.fit(X_train_scaled, y_train)
accuracy_scale = accuracy_score(y_test, svm_scale.predict(X_test_scaled))

print(f'Accuracy without scaling: {accuracy_noscale}')
print(f'Accuracy with scaling: {accuracy_scale}')


Accuracy without scaling: 1.0
Accuracy with scaling: 0.9666666666666667
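
The two numbers above differ by a single test sample (the test split has 30 rows), and all four Iris features are measured in centimetres, so this one split says little about scaling either way. A minimal sketch of a broader comparison follows, assuming 5-fold cross-validation and scikit-learn's default `C` and `gamma`. The helper name `compare_scaling` is hypothetical and not part of the original notebook. It covers both the linear and the RBF kernel and wraps the scaler in a pipeline so it is refit inside every fold.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)


def compare_scaling(X, y, kernels=("linear", "rbf"), cv=5):
    """Return mean cross-validated accuracy per kernel, with and without scaling."""
    results = {}
    for kernel in kernels:
        unscaled = SVC(kernel=kernel)
        # The pipeline refits StandardScaler on each training fold,
        # so no statistics leak from the held-out fold.
        scaled = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        results[kernel] = {
            "unscaled": cross_val_score(unscaled, X, y, cv=cv).mean(),
            "scaled": cross_val_score(scaled, X, y, cv=cv).mean(),
        }
    return results


scaling_results = compare_scaling(X, y)
for kernel, scores in scaling_results.items():
    print(f"{kernel}: unscaled={scores['unscaled']:.3f}, scaled={scores['scaled']:.3f}")
```

Scaling tends to matter more when features have very different ranges, which is not the case for Iris, so small or even negative differences here would not be surprising.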


# SVM with Custom Kernels

## Task
Implement custom kernels for SVMs and compare their effectiveness on specific types of data structures.

## Objective
Explore the flexibility of SVMs with kernels tailored to particular problems.

## Implementation
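We implement a degree-2 polynomial kernel as a Python callable, pass it to SVC, and evaluate it on the scaled Iris split from the previous section so the accuracy can be compared with the linear-kernel results there.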


In [2]:

import numpy as np

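# Custom kernel function: degree-2 polynomial kernel K(x, y) = (1 + x.y)^2.
# SVC calls it with two sample matrices and expects the Gram matrix
# of shape (n_samples_X, n_samples_Y) in return.
def my_kernel(X, Y):
    return (1 + np.dot(X, Y.T)) ** 2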

svm_custom = SVC(kernel=my_kernel)
svm_custom.fit(X_train_scaled, y_train)
accuracy_custom = accuracy_score(y_test, svm_custom.predict(X_test_scaled))

print(f'Accuracy with custom kernel: {accuracy_custom}')


Accuracy with custom kernel: 1.0
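
The custom kernel computes $K(x, y) = (1 + x^\top y)^2$, which matches scikit-learn's polynomial kernel $(\gamma\, x^\top y + c_0)^d$ with $\gamma = 1$, $c_0 = 1$, $d = 2$. The sketch below checks this numerically on small random matrices. The arrays `A` and `B` and the seed are illustrative assumptions, not data from the notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def my_kernel(X, Y):
    # Same callable as in the notebook cell: (1 + <x, y>)^2 for every pair of rows
    return (1 + np.dot(X, Y.T)) ** 2


# Illustrative inputs: 5 and 3 samples with 4 features each
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))
B = rng.normal(size=(3, 4))

gram_custom = my_kernel(A, B)
gram_builtin = polynomial_kernel(A, B, degree=2, gamma=1, coef0=1)

print(gram_custom.shape)                       # one row per sample in A, one column per sample in B
print(np.allclose(gram_custom, gram_builtin))  # True when both kernels agree
```

Given that equivalence, `SVC(kernel="poly", degree=2, gamma=1, coef0=1)` should use the same kernel function as `SVC(kernel=my_kernel)`, with the kernel evaluated in compiled code rather than through a Python callback.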


# One-Class SVM

## Task
Use one-class SVM for anomaly detection in datasets.

## Objective
Understand how SVM can be adapted for unsupervised problems, specifically for identifying outliers.

## Implementation
We fit a one-class SVM on synthetic inliers only, then predict on the inliers combined with uniformly sampled outliers and count the points labelled -1.


In [3]:

from sklearn.svm import OneClassSVM

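# Simulate data with outliers.
# No random seed is set, so the exact count below varies between runs.
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X_inliers = np.random.normal(size=(100, 2))
X_total = np.vstack((X_inliers, X_outliers))

# One-class SVM, trained on the inliers only
oc_svm = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
oc_svm.fit(X_inliers)

# predict returns +1 for inliers and -1 for outliers
y_pred = oc_svm.predict(X_total)
n_outliers = np.sum(y_pred == -1)  # counts flagged points among both inliers and outliers

print(f'Number of detected outliers: {n_outliers}')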


Number of detected outliers: 25
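
The single count above mixes two things: simulated outliers that were caught, and training inliers that were flagged anyway. A minimal sketch that separates them follows, assuming a fixed seed of 42 for reproducibility (the original cell is unseeded, so the numbers will not match the output above). The names `false_alarms` and `caught` are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Seeded generator so the run is repeatable (an assumption; the notebook cell is unseeded)
rng = np.random.default_rng(42)
X_inliers = rng.normal(size=(100, 2))
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

oc_svm = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1).fit(X_inliers)

# predict returns +1 for inliers and -1 for outliers
pred_inliers = oc_svm.predict(X_inliers)
pred_outliers = oc_svm.predict(X_outliers)

false_alarms = int(np.sum(pred_inliers == -1))  # inliers flagged as outliers
caught = int(np.sum(pred_outliers == -1))       # simulated outliers flagged as outliers

print(f"Inliers flagged: {false_alarms} of {len(X_inliers)}")
print(f"Outliers flagged: {caught} of {len(X_outliers)}")
```

With `nu=0.1`, roughly a tenth of the training inliers can be expected to fall outside the learned boundary, since `nu` bounds the fraction of training errors from above. Some of the uniform "outliers" also land near the origin, inside the inlier region, so not all of them are detectable.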


# Incremental Learning with SVM

## Task
Implement an online learning algorithm with SVM to handle large datasets that cannot fit into memory.

## Objective
Learn techniques for training models incrementally, essential for real-time data processing.

## Implementation
We use scikit-learn's SGDClassifier with hinge loss, which fits a linear SVM by stochastic gradient descent and supports incremental training through partial_fit.


In [4]:

from sklearn.linear_model import SGDClassifier

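# Incremental linear SVM: SGDClassifier with hinge loss.
# No random_state is set, so the accuracy varies between runs.
svm_incremental = SGDClassifier(loss='hinge')

# All class labels must be declared on the first partial_fit call
classes = np.unique(y_train)

# Simulate 5 partial_fit calls. Note that every call reuses the same first
# 20 training samples, so the model only ever sees 20 distinct samples.
for _ in range(5):
    svm_incremental.partial_fit(X_train_scaled[:20], y_train[:20], classes=classes)

accuracy_incremental = accuracy_score(y_test, svm_incremental.predict(X_test_scaled))
print(f'Incremental learning accuracy: {accuracy_incremental}')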


Incremental learning accuracy: 0.8333333333333334
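
The cell above passes the same 20 samples to `partial_fit` five times. A sketch of a loop over distinct batches follows, assuming the training set is split into 5 chunks and visited for 10 epochs; `iter_batches`, the batch count, and the epoch count are illustrative choices, not values from the notebook. Because a real out-of-core setting cannot standardize the full dataset up front, the scaler is also fitted incrementally with `StandardScaler.partial_fit`.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def iter_batches(X, y, n_batches):
    """Yield (X_batch, y_batch) pairs over n_batches disjoint chunks (hypothetical helper)."""
    for idx in np.array_split(np.arange(len(X)), n_batches):
        yield X[idx], y[idx]


scaler = StandardScaler()
clf = SGDClassifier(loss="hinge", random_state=42)
classes = np.unique(y_train)  # must be known before the first partial_fit call

# Pass 1: accumulate running mean and variance one batch at a time
for X_batch, _ in iter_batches(X_train, y_train, n_batches=5):
    scaler.partial_fit(X_batch)

# Pass 2: several epochs of incremental SVM updates over distinct batches
for epoch in range(10):
    for X_batch, y_batch in iter_batches(X_train, y_train, n_batches=5):
        clf.partial_fit(scaler.transform(X_batch), y_batch, classes=classes)

accuracy_stream = accuracy_score(y_test, clf.predict(scaler.transform(X_test)))
print(f"Accuracy after batched training: {accuracy_stream:.3f}")
```

In a true streaming setup the two passes would read from disk or a data source instead of slicing an in-memory array, and the scaler statistics could be updated in the same pass as the classifier at the cost of early batches being scaled with less accurate estimates.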


# SVM for Feature Selection

## Task
Utilize SVMs with recursive feature elimination to identify the most significant features for classification tasks.

## Objective
Understand the role of feature selection in improving model accuracy and efficiency.

## Implementation
We demonstrate the use of SVM with recursive feature elimination (RFE) to identify significant features in the Iris dataset.


In [5]:

from sklearn.feature_selection import RFE

# Feature selection with SVM and RFE
svm_rfe = SVC(kernel="linear")
selector = RFE(svm_rfe, n_features_to_select=2, step=1)
selector = selector.fit(X_train_scaled, y_train)

print(f'Selected features: {selector.support_}')


Selected features: [False False  True  True]
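
The boolean mask is easier to read when mapped back to column names. In `load_iris` column order (sepal length, sepal width, petal length, petal width), the mask above keeps the two petal measurements. A small sketch of that mapping follows, rebuilding the same split and scaler so it runs on its own; `selected_names` is an illustrative variable name.

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
X_train_scaled = StandardScaler().fit_transform(X_train)

# RFE repeatedly drops the feature with the smallest linear-SVM weights
selector = RFE(SVC(kernel="linear"), n_features_to_select=2, step=1)
selector.fit(X_train_scaled, y_train)

# Map the boolean mask back to human-readable feature names
feature_names = np.array(iris.feature_names)
selected_names = feature_names[selector.support_].tolist()

print(f"Selected features: {selected_names}")
print(f"Elimination ranking (1 = kept): {selector.ranking_}")
```

`ranking_` gives the order of elimination: features ranked 1 were kept, and higher ranks were dropped earlier.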
