## **Introduction to SVM and Random Forests**
### **Using the Diabetes dataset**
### **Authors:** 
Niklas Hohn, Christopher Arleth

In [5]:
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, train_test_split
from sklearn.svm import SVC
import numpy as np  # imported explicitly because the Manim scene uses np.random and np.linalg
import pandas as pd
from manim import *

### **Support Vector Machines (SVM)**

* **Definition:** SVMs are supervised learning models used for classification and regression tasks[1].  
* **Functionality:** SVMs aim to find the hyperplane in an N-dimensional feature space that separates the classes with the largest possible margin[2].  
* **Applications:** SVMs excel in binary classification problems and can handle both linear and nonlinear classification tasks[1].  

In [7]:
%%manim -qm SVMExplanation

class SVMExplanation(Scene):
    def construct(self):
        # Title
        title = Text("Support Vector Machines (SVM)", font_size=60)
        self.play(Write(title))
        self.wait(1)
        self.play(FadeOut(title))

        # Definitions
        definitions = VGroup(
            Text("Definition:", font_size=48, weight=BOLD),
            Text("Supervised learning models for classification and regression.", font_size=36),
            Text("Functionality:", font_size=48, weight=BOLD),
            Text("Find an optimal hyperplane separating data points.", font_size=36),
            Text("Applications:", font_size=48, weight=BOLD),
            Text("Excel in binary classification; handle linear/nonlinear tasks.", font_size=36)
        ).arrange(DOWN, aligned_edge=LEFT, buff=0.2)
        self.play(FadeIn(definitions))
        self.wait(2)
        self.play(FadeOut(definitions))

        # Example with data points and hyperplane
        # Create data points (example: two classes)
        np.random.seed(0)  # For reproducibility
        class1 = [np.array([x, y, 0]) for x, y in np.random.multivariate_normal([2, 2], [[1, 0], [0, 1]], 20)]
        class2 = [np.array([x, y, 0]) for x, y in np.random.multivariate_normal([-2, -2], [[1, 0], [0, 1]], 20)]

        points1 = VGroup(*[Dot(point, color=BLUE) for point in class1])
        points2 = VGroup(*[Dot(point, color=RED) for point in class2])

        self.play(FadeIn(points1), FadeIn(points2))

        # Create hyperplane (initially not optimal)
        hyperplane = Line(start=LEFT * 3, end=RIGHT * 3, color=GREEN)
        self.play(Create(hyperplane))

        # Move hyperplane to a better position: the classes are centred on
        # (2, 2) and (-2, -2), so the line y = -x runs between them
        optimal_hyperplane = Line(start=UP * 2 + LEFT * 2, end=DOWN * 2 + RIGHT * 2, color=YELLOW)
        self.play(Transform(hyperplane, optimal_hyperplane))

        # Distance from a dot to the line (get_projection expects a point, not a Dot)
        def distance_to_line(dot):
            center = dot.get_center()
            return np.linalg.norm(center - optimal_hyperplane.get_projection(center))

        # The closest point of each class serves as an illustrative support vector
        support_vectors = VGroup(
            min(points1, key=distance_to_line),
            min(points2, key=distance_to_line),
        )

        # Show margin: dashed lines parallel to the hyperplane, offset along its
        # unit normal by the distance to the nearest point
        half_width = min(distance_to_line(dot) for dot in support_vectors)
        normal = (UP + RIGHT) / np.linalg.norm(UP + RIGHT)
        margin1 = DashedLine(start=optimal_hyperplane.get_start() + normal * half_width, end=optimal_hyperplane.get_end() + normal * half_width, color=YELLOW, stroke_width=2)
        margin2 = DashedLine(start=optimal_hyperplane.get_start() - normal * half_width, end=optimal_hyperplane.get_end() - normal * half_width, color=YELLOW, stroke_width=2)

        self.play(Create(margin1), Create(margin2))
        margin_group = VGroup(margin1, margin2)

        margin_text = Text("Margin", color=YELLOW, font_size=24).next_to(optimal_hyperplane, UP, buff=0.7)
        self.play(Write(margin_text))

        # Highlight support vectors by recolouring the dots already on screen
        self.play(*[dot.animate.set_color(PURPLE) for dot in support_vectors])
        support_text = Text("Support Vectors", color=PURPLE, font_size=24).next_to(margin_text, UP)
        self.play(Write(support_text))


        self.wait(3)
        self.play(*[FadeOut(mob) for mob in self.mobjects]) #clean scene

        #Non linear example
        points1 = VGroup(*[Dot(point, color=BLUE) for point in class1]).shift(LEFT*3)
        points2 = VGroup(*[Dot(point, color=RED) for point in class2]).shift(LEFT*3)

        self.play(FadeIn(points1), FadeIn(points2))

        #Draw a circle around one of the classes
        circle = Circle(radius=3, color=GREEN).shift(LEFT*3)
        self.play(Create(circle))
        nonlinear_text = Text("Non-Linear Separation", color=GREEN, font_size=36).next_to(circle, RIGHT)
        self.play(Write(nonlinear_text))
        self.wait(3)

* **Key Concepts:**  
  * **Hyperplane:** A decision boundary that separates data points into different classes.  
  * **Support Vectors:** Data points closest to the hyperplane, influencing its position and orientation.  
  * **Margin:** The distance between the hyperplane and the support vectors. SVMs aim to maximize this margin.  
  * **Kernel Trick:** A technique that lets an SVM behave as if the data had been mapped into a higher-dimensional space, where a linear separator may exist even when the data is not linearly separable in the original space. The mapping is never computed explicitly; the kernel function supplies the inner products the SVM needs. Imagine trying to separate red and blue marbles scattered on a table with a single straight line. If the marbles are mixed so that no straight line can separate them, the kernel trick corresponds to lifting the marbles onto a curved surface where a flat plane can separate them[3].  
  * **Kernel Types:** Different kernel functions can be used depending on the data and the desired complexity of the decision boundary. Common kernel types include:  
    * **Linear Kernel:** Suitable for linearly separable data.  
    * **Polynomial Kernel:** Creates non-linear decision boundaries by mapping data into a higher-dimensional space using polynomial functions.  
    * **Radial Basis Function (RBF) Kernel:** A popular choice for many applications, creating complex decision boundaries.  
    * **Sigmoid Kernel:** Based on the hyperbolic tangent, an S-shaped function similar to the sigmoid used in logistic regression[3].
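
The marble analogy can be tried out numerically. The following is a minimal sketch, assuming synthetic ring-shaped data from `make_circles` as a stand-in (it is not the Diabetes dataset) and a hypothetical helper `lift` that adds the squared distance from the origin as a third coordinate. It compares a linear SVM on the raw points, a linear SVM on the lifted points, and an RBF SVM that keeps the mapping implicit.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: one class forms a ring around the other
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1) Linear SVM on the raw 2-D points: no straight line separates a ring from its centre
acc_linear = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)


def lift(points):
    """Hypothetical explicit mapping: append x1^2 + x2^2 as a third coordinate."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])


# 2) Linear SVM on the lifted 3-D points: a flat plane can now separate the classes
acc_lifted = SVC(kernel="linear").fit(lift(X_train), y_train).score(lift(X_test), y_test)

# 3) RBF kernel on the raw points: the higher-dimensional mapping stays implicit
acc_rbf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel, raw features:    {acc_linear:.2f}")
print(f"linear kernel, lifted features: {acc_lifted:.2f}")
print(f"RBF kernel, raw features:       {acc_rbf:.2f}")
```

The explicit lift is only practical here because the right extra feature is easy to guess; a kernel avoids having to construct such features by hand.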

### **Random Forests**

* **Definition:** Random Forests are an ensemble learning method that constructs multiple decision trees during training and combines their outputs to improve prediction accuracy[4].  
* **Functionality:** For classification tasks, the output is the class selected by most trees. For regression tasks, the output is the average prediction of all trees[4].  
* **Applications:** Random Forests are versatile and handle both classification and regression problems effectively[5].  
* **Key Concepts:**  
  * **Ensemble Learning:** Combining multiple models to achieve better performance than any individual model. Think of it like getting advice from multiple experts instead of just one.  
  * **Decision Trees:** Individual models within the Random Forest that make predictions based on a series of decisions.  
  * **Bagging:** Bootstrap aggregating, a technique where each tree is trained on a bootstrap sample (drawn from the training data with replacement), which increases diversity among the trees and reduces overfitting. Imagine creating multiple teams of experts, each with a different set of experiences, to make a more robust decision[4].  
  * **Feature Randomness:** Each tree considers only a random subset of features when splitting a node, further reducing correlation between trees and improving generalization.
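
The concepts above can be sketched by hand with plain decision trees. This is an illustration under stated assumptions, not how scikit-learn's `RandomForestClassifier` is implemented: the data is synthetic (`make_classification`), the number of trees (51, odd to avoid tied votes) is arbitrary, and the final prediction is a simple majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data as a stand-in (labels are 0 and 1)
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 51  # odd, so a majority vote between two classes cannot tie
trees = []
for i in range(n_trees):
    # Bagging: draw a bootstrap sample (same size as the training set, with replacement)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Feature randomness: each split considers only sqrt(n_features) random features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Ensemble prediction: every tree votes, the majority class wins
votes = np.stack([tree.predict(X_test) for tree in trees])  # shape (n_trees, n_test)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)

forest_acc = (forest_pred == y_test).mean()
single_acc = trees[0].score(X_test, y_test)
print(f"single tree accuracy:   {single_acc:.2f}")
print(f"majority-vote accuracy: {forest_acc:.2f}")
```

In practice `sklearn.ensemble.RandomForestClassifier` handles all of this internally; note that it combines the trees by averaging their predicted class probabilities rather than by counting hard votes.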

## **Hyperparameter Tuning for SVM and Random Forests**

This section covers the hyperparameters that matter most for SVMs and Random Forests and the techniques used to search over them.

### **Hyperparameter Tuning for SVM**

* **Objective:** Optimize the performance of an SVM model by finding the best combination of hyperparameters.  
* **Key Hyperparameters:**  
  * **C (Regularization parameter):** Controls the trade-off between maximizing the margin and minimizing classification error on the training set. A low C creates a smooth decision surface, while a high C aims to classify all training examples correctly[6].  
  * **kernel:** Specifies the type of kernel function used to map data into a higher-dimensional space (e.g., 'linear', 'rbf', 'poly')[7].  
  * **gamma:** Kernel coefficient for the 'rbf', 'poly', and 'sigmoid' kernels. For the RBF kernel it defines how far the influence of a single training example reaches. A low gamma gives each example a wide area of influence and a smoother decision boundary, much as a large k does in k-Nearest Neighbors. A high gamma gives each example a small area of influence, which can produce more complex decision boundaries and overfitting[6].  
* **Techniques:**  
  * **GridSearchCV:** Systematically explores all possible combinations of hyperparameter values within a predefined grid[9].  
  * **RandomizedSearchCV:** Samples a given number of candidates from a parameter space with a specified distribution[7].
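
To see the effect of gamma described above, one option is scikit-learn's `validation_curve`, which scores a model across a range of values for a single hyperparameter. The sketch below assumes synthetic two-moons data and an arbitrary logarithmic range of gamma values; the step name `svc` is the one `make_pipeline` generates automatically.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, noisy, non-linearly separable data as a stand-in
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Scale features first: RBF distances are sensitive to feature scale
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
gammas = np.logspace(-3, 3, 7)  # 0.001 ... 1000 (illustrative range)

# Each row holds the scores for one gamma value, each column one CV fold
train_scores, val_scores = validation_curve(
    model, X, y, param_name="svc__gamma", param_range=gammas, cv=5
)
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for gamma, train, val in zip(gammas, train_mean, val_mean):
    print(f"gamma={gamma:8.3f}  train={train:.2f}  validation={val:.2f}")
```

A large gap between training and validation accuracy at the high end of the range is the overfitting that a very high gamma tends to cause.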

### **Hyperparameter Tuning for Random Forests (More Detail)**

* **Objective:** Fine-tune a Random Forest model to achieve optimal performance by adjusting hyperparameters.  
* **Key Hyperparameters:**  
  * **n\_estimators:** The number of trees in the forest. Increasing the number of trees generally improves performance, with diminishing returns, but also increases computational cost[10].  
  * **max\_depth:** The maximum depth of each tree. A deeper tree can capture more complex relationships but is also more prone to overfitting[11].  
  * **min\_samples\_split:** The minimum number of samples required to split an internal node[12].  
  * **min\_samples\_leaf:** The minimum number of samples required to be at a leaf node[12].  
  * **max\_features:** The number of features to consider when looking for the best split[10].  
  * **criterion:** The function used to measure the quality of a split (e.g., "gini" for the Gini impurity or "entropy" for the information gain).  
* **Importance:** Random Forests are generally less sensitive to hyperparameter settings than other algorithms, but tuning can still yield performance improvements[10].  
* **Techniques:**  
  * **GridSearchCV:** Systematically searches for the best hyperparameter combination from a grid of values.  
  * **RandomizedSearchCV:** Randomly samples hyperparameter combinations from a defined distribution.
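
As a rough sketch of how the hyperparameters listed above could be searched with `RandomizedSearchCV`: the data is synthetic, and the ranges, the number of sampled candidates (`n_iter=10`), and the 3-fold setting are illustrative assumptions chosen to keep the run short, not recommended values.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

# Lists are sampled uniformly; scipy distributions are sampled via their rvs() method
param_distributions = {
    "n_estimators": randint(20, 100),
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": randint(2, 11),
    "min_samples_leaf": randint(1, 5),
    "max_features": ["sqrt", "log2", None],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,        # number of sampled hyperparameter combinations
    cv=3,             # 3-fold cross-validation per combination
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

Random sampling keeps the cost fixed at `n_iter` candidates regardless of how many hyperparameters are included, which is why it is often preferred over a full grid when the search space is large.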

### **Hyperparameter Tuning with Cross-Validation for SVM**

* **Cross-validation:** A technique to evaluate the performance of a model on unseen data by splitting the training data into multiple folds and using each fold as a validation set in turn.  
* **Benefits:**  
  * More reliable estimate of model performance.  
  * Reduces the risk of tuning hyperparameters to the quirks of one particular train/validation split.  
* **Process:**  
  1. Split the training data into k folds.  
  2. For each fold:  
     * Train the SVM model on the remaining k-1 folds.  
     * Evaluate the model on the held-out fold.  
  3. Repeat step 2 for each hyperparameter combination, using the same folds so the scores are comparable.  
  4. Select the hyperparameter combination that yields the best average performance across all folds.  
* **GridSearchCV:** Scikit-learn provides the GridSearchCV class, which automates this process by training and evaluating the model with different hyperparameter combinations and selecting the best one based on cross-validation performance[9].  
* **Example:** Using GridSearchCV with cross-validation to find the optimal C and gamma values for an SVM with an RBF kernel[9].
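
The example named above could look roughly like the following sketch. It assumes synthetic data in place of the Diabetes dataset, an illustrative 4x4 grid for `C` and `gamma`, and a scaling step inside the pipeline so that each fold is scaled using only its own training portion. A separate test set is held back and used once, after the search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; hold back a test set that the search never sees
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step names ("scale", "svm") are arbitrary labels used to address parameters
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])

# Illustrative logarithmic grid: 4 x 4 = 16 combinations
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
}

# The same 5 stratified folds are reused for every combination
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
print(f"best mean CV accuracy: {grid.best_score_:.3f}")

# By default GridSearchCV refits the best combination on all training data,
# so grid.score evaluates that refitted model on the untouched test set
test_accuracy = grid.score(X_test, y_test)
print(f"held-out test accuracy: {test_accuracy:.3f}")
```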

### **Hyperparameter Tuning with Cross-Validation for Random Forests**

* **Cross-validation:** Essential for reliable performance evaluation and hyperparameter tuning in Random Forests[13].  
* **K-Fold Cross-Validation:**  
  * Divides the training data into K folds.  
  * Trains the model K times, each time using K-1 folds for training and the remaining fold for validation.  
  * Averages the performance across all K folds to estimate the model's generalization ability[13].  
* **Process:**  
  1. Split the training data into K folds.  
  2. For each set of hyperparameters:  
     * Perform K-Fold Cross-Validation, training and evaluating the model K times.  
     * Average the performance across all folds.  
  3. Select the hyperparameter set with the best average performance.  
* **GridSearchCV and RandomizedSearchCV:** As with SVMs, GridSearchCV and RandomizedSearchCV can be used with Random Forests to automate the hyperparameter tuning process with cross-validation[11].  
* **Example:** Tuning n\_estimators, max\_depth, and min\_samples\_split using 5-fold cross-validation[14].  
* **Iterative Process:** Hyperparameter tuning with cross-validation is often an iterative process, where you start with a wide range of values and gradually narrow down the search space based on the results of each iteration.
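
A minimal sketch of the 5-fold example above, assuming synthetic data and a deliberately small, illustrative grid (two values per hyperparameter) so that it runs quickly. The results table is one way to inspect which region of the grid performs well before narrowing the search in a second pass.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

# Illustrative coarse grid: 2 x 2 x 2 = 8 combinations
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
}

# cv=5 runs 5-fold cross-validation for every combination
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# Rank all combinations by their mean validation score
results = (
    pd.DataFrame(grid.cv_results_)[
        ["params", "mean_test_score", "std_test_score", "rank_test_score"]
    ]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(results.to_string())
print("best parameters:", grid.best_params_)
```

If the best-ranked rows cluster around one corner of the grid, a follow-up search with finer values around that corner is a natural next iteration.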

## **Conclusion and Q\&A**
**Key Takeaways:**

* SVMs and Random Forests are widely used supervised learning algorithms that handle both classification and regression.  
* Hyperparameter tuning is crucial for optimizing the performance of these models and ensuring they generalize well to unseen data.  
* Cross-validation provides a robust method for evaluating model performance and selecting the best hyperparameter combinations.  
* GridSearchCV and RandomizedSearchCV automate hyperparameter tuning with cross-validation, replacing hand-written search loops.

**Potential Questions and Answers:**

* **Q: What are the advantages and disadvantages of using different kernel types in SVM?**  
  * **A:** The choice of kernel depends on the data and the desired complexity of the decision boundary. Linear kernels are suitable for linearly separable data, while polynomial and RBF kernels can model more complex relationships. However, more complex kernels may be more prone to overfitting.  
* **Q: How do I choose the best cross-validation strategy for my dataset?**  
  * **A:** The choice of cross-validation strategy depends on factors like the size of the dataset and the presence of class imbalance. K-Fold cross-validation is a common choice, but other techniques like stratified k-fold or leave-one-out cross-validation may be more appropriate in certain situations.  
* **Q: What are some common pitfalls to avoid during hyperparameter tuning?**  
  * **A:** Overfitting to the validation set is a common pitfall. It's important to use a separate test set to evaluate the final model's performance on unseen data. Additionally, it's crucial to strike a balance between model complexity and generalization ability to avoid overfitting or underfitting.
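
To illustrate why class imbalance affects the choice of cross-validation strategy, the sketch below compares `KFold` and `StratifiedKFold` on a made-up, sorted label vector with 90 negatives and 10 positives. The helper `positives_per_fold` is hypothetical and only counts how many positive samples land in each validation fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Made-up imbalanced labels, sorted by class: 90 negatives followed by 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not matter for how the folds are built


def positives_per_fold(splitter):
    """Count the positive samples in each validation fold produced by the splitter."""
    return [int(y[val_idx].sum()) for _, val_idx in splitter.split(X, y)]


# Plain K-Fold without shuffling cuts the data into consecutive blocks
plain = positives_per_fold(KFold(n_splits=5))

# Stratified K-Fold keeps the class ratio roughly equal in every fold
stratified = positives_per_fold(StratifiedKFold(n_splits=5))

print("KFold:          ", plain)       # [0, 0, 0, 0, 10]
print("StratifiedKFold:", stratified)  # [2, 2, 2, 2, 2]
```

With plain K-Fold on sorted data, four of the five validation folds contain no positive samples at all, so their scores say nothing about the minority class; shuffling helps, but stratification is what keeps the class ratio stable across folds.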