The key idea is that a **baseline model** is a simple reference point, and SVMs (via SVC) are then compared against that baseline and against each other under different kernels and hyperparameters. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html)

## 1. Baseline model: DummyClassifier

A **baseline classifier** is a very simple model you use to answer: *“Is my real model actually learning anything?”* [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/ml-dummy-classifiers-using-sklearn/)

- In scikit‑learn, `DummyClassifier` ignores the input features and predicts using trivial rules like:
  - Always predict the most frequent class (`strategy="most_frequent"`).
  - Predict classes randomly according to their observed frequencies (`"stratified"`).
  - Predict a constant label (`"constant"`). [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html)
- Its **baseline score** is just the accuracy of this dummy model on the test set. [towardsdatascience](https://towardsdatascience.com/dummy-classifier-explained-a-visual-guide-with-code-examples-for-beginners-009ff95fc86e/)

If your real model can’t beat the dummy, it has not learned anything useful from the features.

Example intuition:

- If 80% of wines are class 0, `DummyClassifier(strategy="most_frequent")` achieves 80% accuracy by always predicting 0.  
- Any real classifier should aim for **better than 80%**.
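
The 80/20 intuition can be reproduced with a small sketch. The labels below are a made-up toy vector (not the real wine data), and the feature matrix is a placeholder, since `DummyClassifier` never looks at the features:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy labels mirroring the 80/20 example: 80 samples of class 0, 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
# DummyClassifier ignores features, so a placeholder column of zeros is enough.
X = np.zeros((len(y), 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

# Accuracy of always predicting the majority class.
baseline_accuracy = baseline.score(X, y)
print(f"Baseline accuracy: {baseline_accuracy:.2f}")  # 0.80
```

In a real workflow you would fit the dummy on the training split and score it on the test split, exactly as you do for the real model.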

## 2. Support Vector Classifier (SVC) and kernels

An **SVC** is scikit‑learn’s implementation of a support vector machine for classification. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

- It can model:
  - **Linear** decision boundaries (`kernel="linear"`).
  - **Nonlinear** boundaries via kernels (`"poly"`, `"rbf"`, `"sigmoid"`). [scikit-learn](https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html)
- In the notebook, the **wine dataset** is used with SVC:
  - Train SVC on two or more wine features.
  - Compare performance under different kernels (linear, polynomial, RBF, sigmoid).
  - See how kernel choice and hyperparameters change accuracy and boundary shape. [scikit-learn](https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html)
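
A minimal sketch of that comparison, assuming scikit-learn's bundled `load_wine` dataset with all 13 features and a `StandardScaler` step in front of the SVC. The scaling step, the 70/30 split, and the random seed are assumptions made here, not details taken from the notebook:

```python
from sklearn.datasets import load_wine
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: always predict the most frequent training class.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
test_scores = {"dummy": dummy.score(X_test, y_test)}

# Same pipeline for every kernel so that only the kernel changes.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    test_scores[kernel] = model.score(X_test, y_test)

for name, score in test_scores.items():
    print(f"{name:>8}: {score:.3f}")
```

A single train/test split is noisy on a dataset this small, so treat the printed numbers as a rough comparison rather than a ranking.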

Conceptually:

- SVC uses the **maximum margin classifier** idea and the **kernel trick**:
  - Finds a hyperplane in feature space that maximizes the margin.
  - Kernels implicitly define the feature space without manual feature engineering. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/major-kernel-functions-in-support-vector-machine-svm/)
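
To make the margin idea concrete, here is a small sketch on hypothetical, perfectly separable toy points. For a linear SVC with weight vector $w$, the margin width is $2 / \|w\|$; the two classes below sit on the lines $x_1 = -1$ and $x_1 = 1$, so the width should come out close to 2:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two classes separated by the vertical line x1 = 0.
X = np.array([[-1.0, 0.0], [-1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin classifier on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                         # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin boundaries

print("support vectors:\n", clf.support_vectors_)
print("margin width:", round(margin_width, 3))
```

Note that `coef_` is only available for `kernel="linear"`; with other kernels the hyperplane lives in the implicit feature space and is not exposed as a weight vector.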

## 3. Choosing a kernel: practical guidelines

The **kernel function** $k(x,z)$ determines the geometry of the feature space and the shape of the decision boundary. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/major-kernel-functions-in-support-vector-machine-svm/)

Common kernels in SVC: [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

- **Linear** (`kernel="linear"`):  
  - $k(x,z) = x^\top z$.  
  - Boundary is a straight hyperplane in the original space.
- **Polynomial** (`"poly"`):  
  - $k(x,z) = (\gamma x^\top z + r)^d$.  
  - Captures polynomial interactions and mild curvature.
- **RBF / Gaussian** (`"rbf"`):  
  - $k(x,z) = \exp(-\gamma \|x - z\|^2)$.  
  - Very flexible; good for complex, local patterns.
- **Sigmoid** (`"sigmoid"`):  
  - $k(x,z) = \tanh(\gamma x^\top z + r)$.  
  - Related to neural network activation functions; less commonly used. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/support-vector-machine-algorithm/)
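
The four formulas above translate directly into code. This is a plain-Python sketch for single pairs of vectors, written for illustration only; the function names are made up here, and scikit-learn's own implementation is vectorized. In `SVC`, $\gamma$ corresponds to `gamma`, $r$ to `coef0`, and $d$ to `degree`:

```python
import math

def dot(x, z):
    """Inner product x^T z of two equal-length sequences."""
    return sum(xi * zi for xi, zi in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(x, z) + r) ** d

def rbf_kernel(x, z, gamma=1.0):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, z) + r)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))                              # 11.0
print(polynomial_kernel(x, z, gamma=1.0, r=1.0, d=2))   # 144.0
print(rbf_kernel(x, x))                                 # 1.0 (a point compared with itself)
print(rbf_kernel(x, z, gamma=0.5))                      # exp(-4), about 0.0183
print(sigmoid_kernel(x, z, gamma=0.1))                  # tanh(1.1)
```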

Guidelines (as in the mini‑lesson and standard practice): [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/support-vector-machine-algorithm/)

- **Linear separability**:
  - If data looks roughly linearly separable (classes separated by a “wide” straight boundary), start with a **linear kernel**.
- **Problem complexity**:
  - If boundaries appear curved or complex (e.g., spiral, circles), use a nonlinear kernel like **RBF** or **polynomial**. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/major-kernel-functions-in-support-vector-machine-svm/)
- **Computational budget**:
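  - A linear kernel scales better on very large datasets; scikit‑learn also provides `LinearSVC`, a dedicated linear implementation that is usually faster than `SVC(kernel="linear")` at that scale.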
  - Polynomial and RBF kernels can be more computationally expensive, especially if you tune many hyperparameters. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/support-vector-machine-algorithm/)
- **Empirical testing**:
  - It’s hard to know the best kernel a priori; you typically **try several** and compare with cross‑validation. [scikit-learn](https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html)
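
As an illustration of these guidelines, the sketch below runs all four kernels with default hyperparameters on a synthetic "circles" dataset, where no straight line can separate the classes. The dataset, its noise level, and the 5-fold setting are choices made here for the example:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two concentric rings: a classic case that is not linearly separable.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

cv_means = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    cv_means[kernel] = scores.mean()
    print(f"{kernel:>8}: mean CV accuracy = {cv_means[kernel]:.3f}")
```

On this data the RBF kernel should clearly outperform the linear one, which is the pattern the "problem complexity" guideline predicts. The other kernels depend heavily on their hyperparameters, so their default results say little.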

## 4. Tuning kernel hyperparameters (grid search & CV)

You usually tune kernel hyperparameters using **grid search** plus **cross‑validation**: [youtube](https://www.youtube.com/watch?v=ZobQggQtRt8)

1. Choose a set of candidate kernels and parameter grids, e.g.:
   - Linear: just vary $C$.
   - Polynomial: vary `degree`, `gamma`, `coef0`, `C`.
   - RBF: vary `gamma`, `C`. [geeksforgeeks](https://www.geeksforgeeks.org/python/rbf-svm-parameters-in-scikit-learn/)
2. For each combination, run cross‑validation and compute metrics (accuracy, F1, etc.).
3. Pick the combination with the best cross‑validation performance, considering:
   - Predictive performance.
   - Computational cost.
   - Interpretability (linear models are easier to explain). [scikit-learn](https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html)
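
These steps can be sketched with `GridSearchCV`, which accepts a list of grids so that each kernel is only combined with the hyperparameters it actually uses. The candidate values, the scaling step, and the use of the wine dataset are assumptions for this example:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Scaling lives inside the pipeline so it is refit on each CV training fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# One grid per kernel; parameter names are prefixed with the pipeline step name.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {
        "svc__kernel": ["poly"],
        "svc__C": [0.1, 1, 10],
        "svc__degree": [2, 3],
        "svc__gamma": ["scale", 0.01, 0.1],
        "svc__coef0": [0.0, 1.0],
    },
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best params:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))
```

`search.cv_results_` holds the score of every combination, which helps when a simpler configuration scores nearly as well as the best one and you want to weigh cost or interpretability.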

## 5. Understanding gamma (γ) in kernels like RBF and poly

How `gamma` is treated in scikit‑learn: [stackoverflow](https://stackoverflow.com/questions/59594653/what-does-the-gamma-parameter-in-svm-svc-actually-do)

- `gamma` is the **kernel coefficient** for `rbf`, `poly`, and `sigmoid` kernels. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
- Intuition in RBF:

  - $k(x,z) = \exp(-\gamma \|x - z\|^2)$. [geeksforgeeks](https://www.geeksforgeeks.org/python/rbf-svm-parameters-in-scikit-learn/)
  - **High gamma**:
    - Distance matters a lot; each training point influences only a tiny neighborhood.
    - Decision boundary becomes more complex and wiggly around training points.
    - Higher risk of **overfitting**: the model fits the training data very closely but may generalize poorly. [geeksforgeeks](https://www.geeksforgeeks.org/python/rbf-svm-parameters-in-scikit-learn/)
  - **Low gamma**:
    - Distance matters less; points have broader influence.
    - Boundary is smoother and simpler.
    - Higher risk of **underfitting**: the boundary may be too coarse to capture real structure. [geeksforgeeks](https://www.geeksforgeeks.org/python/rbf-svm-parameters-in-scikit-learn/)

- You choose the **best gamma** via cross‑validation:
  - The gamma that gives the highest cross‑validated score is preferred.
  - It indicates a good balance between bias and variance (good generalization). [youtube](https://www.youtube.com/watch?v=ZobQggQtRt8)
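
The effect of gamma can be observed by comparing training and cross‑validated accuracy across a range of values. This is a sketch on a synthetic "moons" dataset; the dataset, the noise level, and the gamma grid are assumptions chosen to make the pattern visible:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Noisy two-class data with a curved boundary.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

results = {}
for gamma in [0.001, 0.1, 1, 10, 1000]:
    cv = cross_validate(
        SVC(kernel="rbf", gamma=gamma, C=1.0), X, y, cv=5, return_train_score=True
    )
    # Store (mean training accuracy, mean validation accuracy) per gamma.
    results[gamma] = (cv["train_score"].mean(), cv["test_score"].mean())
    print(f"gamma={gamma:<6} train={results[gamma][0]:.3f}  cv={results[gamma][1]:.3f}")

# The preferred gamma is the one with the highest cross-validated accuracy.
best_gamma = max(results, key=lambda g: results[g][1])
print("best gamma by CV:", best_gamma)
```

A large gap between training and cross‑validated accuracy at high gamma is the overfitting signature described above; similar, mediocre scores on both at very low gamma point to underfitting.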

For the polynomial and sigmoid kernels, `gamma` scales the inner product $x^\top z$ rather than a squared distance, so the "neighborhood size" intuition does not carry over directly. It is still tuned by cross‑validation, but RBF is the most common case where gamma tuning is critical. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/major-kernel-functions-in-support-vector-machine-svm/)

## 6. Summary of how to use this in practice

1. **Start with a baseline**:
   - Use `DummyClassifier` to get a trivial benchmark accuracy. [towardsdatascience](https://towardsdatascience.com/dummy-classifier-explained-a-visual-guide-with-code-examples-for-beginners-009ff95fc86e/)
2. **Train an SVC with a simple kernel**:
   - Try `kernel="linear"` on your dataset.
3. **Explore more complex kernels if needed**:
   - Try `poly`, `rbf`, and `sigmoid`, especially when the data is obviously non‑linear. [scikit-learn](https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html)
4. **Use cross‑validation and grid search**:
   - Tune `C`, `gamma`, and `degree` (for poly), and pick the configuration with the best cross‑val metrics. [youtube](https://www.youtube.com/watch?v=ZobQggQtRt8)
5. **Check for over/underfitting**:
   - Very high gamma or very high degree → watch for overfitting.
   - Very low gamma or too simple kernel → watch for underfitting.

Define a simple baseline, then compare SVC variants with different kernels and gamma values, using cross‑validation to justify your choices.