<u>Kernel Transformation vs. Kernel Approximation:</u>
- Kernel transformations implicitly map data into a higher-dimensional feature space, where non-linear relationships can become linearly separable. However, working with the exact kernel is computationally expensive for large datasets.

- Kernel approximations replace this high-dimensional mapping with an explicit, finite-dimensional feature map whose inner products approximate the kernel. This reduces computational cost while retaining the essential characteristics of the original kernel, so linear methods can be applied efficiently.
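
As a small illustration of what "mapping into a feature space" means, the degree-2 polynomial kernel on 2D inputs has a feature map that can be written out by hand. The sketch below is only an illustration: the helper name `phi` and the two sample points are made up for this example. It checks that the inner product in the explicit 3D feature space matches the kernel value computed directly in the original 2D space.

```python
import numpy as np


def phi(x):
    """Explicit feature map for the kernel k(x, y) = (x . y)^2 on 2D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

kernel_value = np.dot(x, y) ** 2         # computed in the original 2D space
feature_value = np.dot(phi(x), phi(y))   # computed in the explicit 3D space

print(kernel_value, feature_value)
```

For kernels such as the RBF kernel the exact feature space is infinite-dimensional, which is why an approximate, finite feature map is used instead.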


<u>Kernel Approximations Explained:</u>

- Kernel approximations help scale kernel methods efficiently for large datasets.
- Traditional kernel methods, such as an SVM with a Radial Basis Function (RBF) kernel, compute a kernel matrix whose size grows quadratically with the number of samples, making them computationally expensive and memory-intensive.

- Kernel approximations, such as Random Fourier Features for RBF kernels or the Nystroem method, transform input data into an explicit feature space of manageable dimension whose inner products mimic the original kernel’s behavior. This enables the use of faster, linear algorithms while still capturing non-linear relationships.
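
To make the idea concrete, here is a minimal NumPy sketch of Random Fourier Features for the RBF kernel, written from scratch rather than with scikit-learn. The data, `gamma`, and the number of random features are arbitrary values chosen for illustration. It builds the exact kernel matrix, builds the random feature matrix `Z`, and measures how closely `Z @ Z.T` reproduces the exact kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.5            # RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
n_random_features = 1000  # illustrative choice

X = rng.normal(size=(300, 3))

# Exact kernel matrix of shape (n_samples, n_samples)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-gamma * sq_dists)

# Random Fourier features: z(x) = sqrt(2 / D) * cos(W^T x + b),
# with W drawn from N(0, 2 * gamma) and b uniform on [0, 2 * pi)
W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_random_features))
b = rng.uniform(0, 2 * np.pi, size=n_random_features)
Z = np.sqrt(2.0 / n_random_features) * np.cos(X @ W + b)

# Inner products of the explicit features approximate the kernel
K_approx = Z @ Z.T
mean_abs_error = np.mean(np.abs(K_exact - K_approx))

print(f"Feature matrix shape: {Z.shape}")
print(f"Mean absolute error: {mean_abs_error:.4f}")
```

The payoff comes when the number of samples is much larger than the number of random features: the feature matrix grows linearly with the dataset, while the exact kernel matrix grows quadratically.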


<u>Example: Concentric Circles Problem</u>
- Consider a dataset where points form concentric circles, which are not linearly separable in 2D. An RBF kernel maps these points into a higher-dimensional space, where they can be separated by a hyperplane. However, this transformation is costly.

- Random Fourier Features approximate this mapping by projecting the data into an explicit feature space of modest dimension, preserving the separability characteristics. This allows efficient linear methods, such as a linear SVM, to achieve performance similar to the original kernel method.
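
The concentric circles case can be tried directly with scikit-learn. The following is a sketch under assumed settings: the sample count, noise level, `factor`, `gamma`, and `n_components` are illustrative choices, not values from the text above. It compares a linear SVM on the raw 2D coordinates with the same linear SVM trained on random Fourier features.

```python
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Two concentric circles: not linearly separable in 2D
X, y = make_circles(n_samples=500, noise=0.05, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear SVM on the raw 2D coordinates
linear_clf = LinearSVC().fit(X_train, y_train)
raw_acc = linear_clf.score(X_test, y_test)

# Same linear SVM on random Fourier features approximating an RBF kernel
rff_clf = make_pipeline(
    RBFSampler(gamma=2.0, n_components=100, random_state=0),
    LinearSVC(),
).fit(X_train, y_train)
rff_acc = rff_clf.score(X_test, y_test)

print(f"Linear SVM on raw features:      {raw_acc:.3f}")
print(f"Linear SVM on Fourier features:  {rff_acc:.3f}")
```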



<h4>Kernel Approximation Methods and Their Data Suitability</h4>

<table style="font-size: 14px;">
  <tr>
    <th><strong>Method</strong></th>
    <th><strong>Description</strong></th>
    <th><strong>Data Type It Handles Well</strong></th>
  </tr>
  <tr>
    <td><strong>AdditiveChi2Sampler</strong></td>
    <td>Approximates the feature map for the additive chi-squared kernel. It is ideal for non-negative data such as histograms or counts.</td>
    <td><strong>Non-negative data, Histograms, Count data</strong></td>
  </tr>
  <tr>
    <td><strong>Nystroem</strong></td>
    <td>Uses a random subset of the training data as a basis to approximate the feature map of an arbitrary kernel. The cost is controlled by the size of that subset (n_components).</td>
    <td><strong>Large datasets, Any kernel (built-in or callable)</strong></td>
  </tr>
  <tr>
    <td><strong>PolynomialCountSketch</strong></td>
    <td>Approximates polynomial kernels via the Tensor Sketch algorithm. Efficient for polynomial kernel-based learning.</td>
    <td><strong>Data requiring polynomial kernel, High-dimensional spaces</strong></td>
  </tr>
  <tr>
    <td><strong>RBFSampler</strong></td>
    <td>Approximates the RBF kernel feature map by using random Fourier features, making it suitable for large-scale datasets.</td>
    <td><strong>High-dimensional, continuous data, Data requiring smooth decision boundaries (such as in clustering, regression, or classification tasks with continuous features)</strong></td>
  </tr>
  <tr>
    <td><strong>SkewedChi2Sampler</strong></td>
    <td>Approximates the feature map of the "skewed chi-squared" kernel using random Fourier features. All input values must be greater than -skewedness.</td>
    <td><strong>Histogram-like data with values above -skewedness (e.g., counts, frequencies)</strong></td>
  </tr>
</table>
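
One practical difference between these methods is the shape of the transformed data. The short sketch below runs each transformer on the same small non-negative array (random toy data, chosen so that it is valid input for every method) and collects the output shapes. The parameter values are illustrative.

```python
import numpy as np
from sklearn.kernel_approximation import (
    AdditiveChi2Sampler,
    Nystroem,
    PolynomialCountSketch,
    RBFSampler,
    SkewedChi2Sampler,
)

rng = np.random.default_rng(0)
X = rng.random((20, 4))  # 20 samples, 4 non-negative features

transformers = {
    "AdditiveChi2Sampler": AdditiveChi2Sampler(sample_steps=2),
    "Nystroem": Nystroem(kernel="rbf", n_components=10, random_state=0),
    "PolynomialCountSketch": PolynomialCountSketch(n_components=10, random_state=0),
    "RBFSampler": RBFSampler(n_components=10, random_state=0),
    "SkewedChi2Sampler": SkewedChi2Sampler(n_components=10, random_state=0),
}

# Fit each transformer and record the shape of its output
shapes = {name: t.fit_transform(X).shape for name, t in transformers.items()}
for name, shape in shapes.items():
    print(f"{name:<22} -> {shape}")
```

AdditiveChi2Sampler is the only one whose output width is tied to the input: it produces `n_features * (2 * sample_steps - 1)` columns. The others produce `n_components` columns.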


### AdditiveChi2Sampler Approximation

In [None]:
"""
- Deterministic Feature Map : Samples the Fourier transform of the additive chi-squared kernel at regular intervals (no randomness is involved).
- Per-Feature Expansion     : Expands each input feature into 2 * sample_steps - 1 output features.
- Transformation & Modeling : Transforms non-negative data using the chi-squared kernel approximation and applies a linear model.
"""

import sklearn.kernel_approximation

# Initialize the AdditiveChi2Sampler
additivechi2sampler = sklearn.kernel_approximation.AdditiveChi2Sampler(
    sample_steps=2,                #  Number of (complex) sampling points used for the approximation. Default is 2.
    sample_interval=None           #  Sampling interval. Must be specified when sample_steps is not in {1, 2, 3}. Default is None.
)


# Example of using the AdditiveChi2Sampler
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.kernel_approximation import AdditiveChi2Sampler

X, y = load_digits(return_X_y=True)

# Apply AdditiveChi2Sampler transformation
chi2sampler = AdditiveChi2Sampler(sample_steps=2)
X_transformed = chi2sampler.fit_transform(X, y)

# Train classifier
clf = SGDClassifier(max_iter=5, random_state=0, tol=1e-3)
clf.fit(X_transformed, y)

# Evaluate the classifier on the training data (an optimistic estimate)
clf_score = clf.score(X_transformed, y)
print(f"Classifier score: {clf_score:.4f}")
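
Scoring on the data used for fitting tends to overstate accuracy. A hedged variant of the example is sketched below, using a held-out split and a pipeline. The split ratio, `max_iter`, and random seeds are assumptions chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # pixel intensities are non-negative
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain the feature map and the linear classifier
model = make_pipeline(
    AdditiveChi2Sampler(sample_steps=2),
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X_train, y_train)

# 64 input features, each expanded into 2 * sample_steps - 1 = 3 features
n_out = model[0].transform(X_test).shape[1]
test_acc = model.score(X_test, y_test)

print(f"Transformed feature count: {n_out}")
print(f"Held-out accuracy: {test_acc:.4f}")
```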

### Nystroem Approximation

In [None]:
"""
- Kernel Approximation : The Nystroem method approximates the feature map of a kernel to speed up computations.
- Used with Linear Models : Works well with linear models such as LinearSVC for efficient non-linear classification.
"""

import sklearn.kernel_approximation
from sklearn import datasets, svm

# Initialize the Nystroem approximation
nystroem_approximation = sklearn.kernel_approximation.Nystroem(
    kernel="rbf",                   # Specifies the kernel to be approximated. Can be 'rbf', 'polynomial', etc.
    gamma=0.2,                      # Gamma parameter for the RBF kernel. Controls the width of the Gaussian. Default is None.
                                    # Higher values lead to a narrower kernel, affecting the decision boundary.
    coef0=None,                     # Coefficient for polynomial and sigmoid kernels (not used for RBF). Default is None.
    degree=None,                    # Degree for polynomial kernel (not used for RBF). Default is None.
    kernel_params=None,             # Additional parameters for custom kernels passed as a callable function. Default is None.
    n_components=300,               # Number of features to construct during kernel approximation. More components increase the computational complexity.
                                    # Larger values may lead to better accuracy at the cost of speed and memory.
    random_state=None,              # Controls randomness for reproducibility. None means random behavior; set an integer (e.g., 0, 42) for consistent results.
    n_jobs=-1,                      # Number of CPU cores to use for computation (-1 uses all available cores). Default is None, which means 1.
)
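# Candidate values for a hyperparameter search (illustrative; this grid is not used by the code in this cell)
param_grid = {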
    'kernel': ['rbf', 'polynomial', 'laplacian', 'sigmoid', 'chi2'],  # Type of kernel function to approximate
    'gamma': [0.01, 0.1, 0.5, 1, 2, None],  # Gamma parameter for the kernel function (None is the default)
    'coef0': [None, 0.0, 0.1, 0.5, 1.0],  # Coefficient for polynomial and sigmoid kernels (ignored for others)
    'degree': [2, 3, 4, 5, 6],  # Degree for polynomial kernel (ignored for other kernels)
    'n_components': [50, 100, 200, 300, 400],  # Number of components for feature mapping
    'random_state': [None, 0, 42],  # Random state for reproducibility
    'n_jobs': [None, 1, -1]  # Number of CPU cores to use for computation, None means 1 core, -1 uses all available cores
}




# Load the dataset (digits)
X, y = datasets.load_digits(n_class=9, return_X_y=True)

# Normalize the data (scale the input features)
data = X / 16.  # Normalize by dividing by 16, as the original data is in range [0, 16]

# Apply Nystroem kernel approximation
data_transformed = nystroem_approximation.fit_transform(data)

# Train a Linear Support Vector Classifier on the transformed data
clf = svm.LinearSVC()
clf.fit(data_transformed, y)

# Evaluate the classifier on the training data (an optimistic estimate)
clf_score = clf.score(data_transformed, y)
print(f"Classifier score: {clf_score:.4f}")
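
A parameter grid like the one defined above is typically consumed by a search object. The sketch below shows one possible way to do that with a `Pipeline` and `GridSearchCV`. The step names `nystroem` and `svc`, the name `small_grid`, and the reduced set of candidate values are made up for this example to keep the runtime short.

```python
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]

pipe = Pipeline([
    ("nystroem", Nystroem(kernel="rbf", random_state=0)),
    ("svc", LinearSVC()),
])

# Keys follow the "<step name>__<parameter>" convention
small_grid = {
    "nystroem__gamma": [0.05, 0.2],
    "nystroem__n_components": [50, 200],
}

# 3-fold cross-validation over every combination in the grid
search = GridSearchCV(pipe, small_grid, cv=3)
search.fit(X, y)

print(f"Best parameters: {search.best_params_}")
print(f"Best cross-validated accuracy: {search.best_score_:.4f}")
```

Placing the Nystroem step inside the pipeline means the landmark samples are re-drawn from each training fold, so no information from the validation fold leaks into the feature map.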


### PolynomialCountSketch Approximation

In [None]:
"""
- Polynomial Kernel Approximation : Approximates the feature map of the polynomial kernel using Tensor Sketch.
- Uses Fast Fourier Transforms (FFT) to efficiently compute a Count Sketch of the outer product of a vector with itself.
"""

import sklearn.kernel_approximation
from sklearn.linear_model import SGDClassifier

# Initialize PolynomialCountSketch with specified hyperparameters
polynomialcountSketch_approximation = sklearn.kernel_approximation.PolynomialCountSketch(
    gamma=1.0,             # Parameter of the polynomial kernel (K(X, Y) = (gamma * <X, Y> + coef0)^degree)
    degree=3,              # Degree of the polynomial kernel (default is 2, here set to 3)
    coef0=1,               # Constant term (default is 0)
    n_components=100,      # Dimensionality of the output feature space (usually greater than number of input features)
    random_state=None,     # Controls randomness for reproducibility. None means random behavior; set an integer (e.g., 0, 42) for consistent results.

)


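from sklearn.kernel_approximation import PolynomialCountSketch

# XOR-style toy data: not linearly separable in the original 2D space
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 0, 1, 1]

# Apply the PolynomialCountSketch transformation (n_components defaults to 100)
ps = PolynomialCountSketch(degree=3, random_state=1)
X_features = ps.fit_transform(X)

# Train a linear classifier on the sketched features
clf = SGDClassifier(max_iter=10, tol=1e-3)
clf.fit(X_features, y)

# Evaluate the classifier on the training data
clf_score = clf.score(X_features, y)
print(f"Classifier score: {clf_score:.4f}")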
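
To see how well the sketch reproduces the polynomial kernel, the inner products of the sketched features can be compared with the exact kernel matrix. The following is a rough check on random toy data. The data shape, the kernel parameters, and the two `n_components` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows keep kernel values bounded

# Exact kernel: K(x, y) = (gamma * <x, y> + coef0) ** degree
K_exact = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1)

errors = {}
for n_components in (20, 2000):
    ps = PolynomialCountSketch(
        degree=2, gamma=1.0, coef0=1, n_components=n_components, random_state=0
    )
    Z = ps.fit_transform(X)
    # Mean absolute difference between approximate and exact kernel entries
    errors[n_components] = np.mean(np.abs(Z @ Z.T - K_exact))

for n_components, err in errors.items():
    print(f"n_components={n_components:<5} mean abs error={err:.4f}")
```

A larger `n_components` should give a closer approximation, at the cost of a wider feature matrix.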


### RBFSampler Approximation

In [None]:
"""
- Random Fourier Features: Approximates the RBF kernel by generating random Fourier features.
- Feature Transformation: Uses RBF kernel approximation to project the input data into an explicit n_components-dimensional feature space.
- Linear Classification: Applies a linear model after kernel approximation for classification or regression tasks.
"""

import sklearn.kernel_approximation

# Initialize the RBFSampler for kernel approximation
rbfsampler_approximation = sklearn.kernel_approximation.RBFSampler(
    gamma=1,                # - Coefficient of the RBF kernel exp(-gamma * ||x - y||^2); controls how localized the influence of each data point is. Default is 1.0.
                            # - Higher gamma makes the kernel narrower, leading to a more complex model.
                            # - Lower gamma makes the kernel broader, leading to a smoother model.
                            # - gamma='scale' sets gamma to 1 / (n_features * X.var()); available in recent scikit-learn versions.
    n_components=100,       # - Defines the number of random features to use in approximating the RBF kernel.
                            # - Higher n_components increases accuracy but also computation cost and dimensionality.
                            # - Lower n_components reduces accuracy but speeds up the process. Default is 100.
    random_state=None,      # Controls randomness for reproducibility. None means random behavior; set an integer (e.g., 0, 42) for consistent results.
)

# Example of using the RBFSampler to transform data and apply a classifier
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Load the dataset
X, y = load_digits(return_X_y=True)

# Apply RBFSampler transformation to the data (kernel approximation)
rbf_feature = sklearn.kernel_approximation.RBFSampler(gamma=1, random_state=1)
X_features = rbf_feature.fit_transform(X)  # Transform data using RBF kernel approximation

# Train a classifier using the transformed features
clf = SGDClassifier(max_iter=5, random_state=0, tol=1e-3)  # Initialize SGD classifier
clf.fit(X_features, y)  # Fit the classifier to the transformed data

# Evaluate the classifier on the transformed training data (an optimistic estimate)
clf_score = clf.score(X_features, y)  # Calculate the accuracy score
print(f"Classifier score: {clf_score:.4f}")
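
Because `gamma` multiplies squared distances, its effect depends on the scale of the input features. The sketch below is an illustration with assumed settings (`gamma=0.2`, `n_components=300`, a `LinearSVC` instead of the SGD classifier, and a made-up helper `held_out_accuracy`). It fits the same model on raw digit pixels in [0, 16] and on pixels divided by 16, then compares held-out accuracy.

```python
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)


def held_out_accuracy(scale):
    """Fit RBFSampler + linear SVM on pixel values divided by `scale`."""
    model = make_pipeline(
        RBFSampler(gamma=0.2, n_components=300, random_state=0),
        LinearSVC(),
    )
    model.fit(X_train / scale, y_train)
    return model.score(X_test / scale, y_test)


raw_acc = held_out_accuracy(1.0)      # pixel values in [0, 16]
scaled_acc = held_out_accuracy(16.0)  # pixel values in [0, 1]

print(f"Raw pixels:    {raw_acc:.4f}")
print(f"Scaled pixels: {scaled_acc:.4f}")
```

With unscaled pixels the same `gamma` corresponds to a far narrower kernel, so distinct images look almost unrelated to each other. Scaling the inputs or lowering `gamma` are two ways to address this.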

### SkewedChi2Sampler Approximation

In [None]:
"""
- SkewedChi2Sampler: Approximates the feature map of the "skewed chi-squared" kernel using random Fourier features.
- Feature Transformation: Uses SkewedChi2Sampler to project the input data into an explicit n_components-dimensional feature space.
- Linear Classification: Applies a linear model after kernel approximation for classification tasks.
"""
import sklearn.kernel_approximation

skewedchi2sample_approximation = sklearn.kernel_approximation.SkewedChi2Sampler(
    skewedness=0.01,           # - "Skewedness" parameter of the kernel; every input value must be greater than -skewedness.
                               # - Default is 1.0; a suitable value is usually found by cross-validation.
    n_components=10,           # - Number of random features to use in approximating the chi-square kernel.
                               # - Increasing n_components increases accuracy but also computational cost.
    random_state=0             # - Controls randomness for reproducibility. Set an integer for consistent results.
)




from sklearn.kernel_approximation import SkewedChi2Sampler
from sklearn.linear_model import SGDClassifier


# Example of using the SkewedChi2Sampler to transform data and apply a classifier
X = [[0, 0], [1, 1], [1, 0], [0, 1]]  # Input data
y = [0, 0, 1, 1]                       # Target labels

# Apply SkewedChi2Sampler transformation to the data (kernel approximation)
X_features = skewedchi2sample_approximation.fit_transform(X, y)

# Train a classifier using the transformed features
clf = SGDClassifier(max_iter=10, tol=1e-3)  # Initialize SGD classifier with 10 iterations
clf.fit(X_features, y)  # Fit the classifier to the transformed data

# Evaluate the classifier on the transformed training data (an optimistic estimate)
clf_score = clf.score(X_features, y)  # Calculate the accuracy score
print(f"Classifier score: {clf_score:.4f}")
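
One constraint worth checking before using this sampler is the input range: values must be greater than `-skewedness`. The sketch below uses two small made-up arrays, `X_ok` and `X_bad`, to show a valid transform and the error raised for out-of-range input.

```python
import numpy as np
from sklearn.kernel_approximation import SkewedChi2Sampler

sampler = SkewedChi2Sampler(skewedness=0.01, n_components=10, random_state=0)

X_ok = np.array([[0.0, 0.5], [1.0, 2.0]])    # all values greater than -0.01
X_bad = np.array([[0.0, -0.5], [1.0, 2.0]])  # -0.5 is not greater than -0.01

# Valid input: one row per sample, n_components columns
Z = sampler.fit_transform(X_ok)
print(f"Transformed shape: {Z.shape}")

# Out-of-range input is rejected with a ValueError
try:
    sampler.fit_transform(X_bad)
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"Rejected: {exc}")
```

If the data contains values at or below `-skewedness`, shifting the features or choosing a larger `skewedness` are possible remedies.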
