1. [Decision Tree](#dt)
2. [Random Forest](#rf)
3. [Implementation from scratch](#impli)
4. [Difference between Regression and Classification](#diff)
5. [OvA One Vs All](#ova)
6. [Basic Interview Questions](#int)
7. [Advanced Interview Questions](#aint)

<a id='dt'></a>
# Decision Tree

A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. It mimics human decision-making by splitting the data into branches based on certain conditions, starting from the root and ending at the leaves.

### Key Components of a Decision Tree:

1. **Root Node**: Represents the entire dataset and the starting point of the tree. It holds the first split, made on the feature that best separates the data under the chosen criterion.
2. **Decision Nodes**: Intermediate nodes where the dataset is further split based on certain conditions.
3. **Leaf Nodes**: The endpoints of the tree, representing the output or decision (e.g., a class label or predicted value).
4. **Splitting**: The process of dividing a node into two or more sub-nodes based on a feature value or condition.
5. **Pruning**: Reducing the size of the tree by removing parts that contribute little to predictive accuracy (used to prevent overfitting).

### How It Works:

1. Start at the root node and evaluate the best feature to split the data using metrics such as:
    - **Gini Impurity**: Measures the probability that a randomly chosen sample at a node would be misclassified if it were labeled at random according to the node's class distribution.
    - **Entropy/Information Gain**: Measures the reduction in uncertainty after the split.
    - **Variance Reduction**: For regression tasks, measures how well the split minimizes variability in the target variable.
2. Repeat the splitting process for child nodes recursively until a stopping condition is met (e.g., no further improvement in splits or a maximum depth is reached).
3. Use the leaves to make predictions.
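
The split metrics above can be computed in a few lines. The following is a minimal sketch, assuming NumPy and a hypothetical toy label array (`parent`, `left`, `right` are illustrative names, not part of any library); it shows how Gini impurity, entropy, and the resulting impurity reduction are obtained for one candidate split.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes present in the node."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def impurity_reduction(parent, left, right, impurity):
    """Parent impurity minus the sample-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / n
    return impurity(parent) - weighted

# Hypothetical toy labels: a 50/50 parent node and one candidate split.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:3], parent[3:]  # left = [0, 0, 0], right = [0, 1, 1, 1, 1]

print(f"Gini(parent)     = {gini(parent):.3f}")     # 0.500
print(f"Entropy(parent)  = {entropy(parent):.3f}")  # 1.000
print(f"Gini reduction   = {impurity_reduction(parent, left, right, gini):.3f}")     # 0.300
print(f"Information gain = {impurity_reduction(parent, left, right, entropy):.3f}")  # 0.549
```

A tree builder would evaluate this reduction for every candidate split and keep the largest one.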

### Advantages:

- Easy to interpret and visualize.
- Handles both numerical and categorical data.
- Requires minimal data preprocessing.

### Disadvantages:

- Prone to overfitting with deep trees.
- Sensitive to small changes in the dataset (leads to different splits and predictions).

<a id='rf'></a>
# Random Forest

A Random Forest is an ensemble learning method that combines multiple decision trees to improve the accuracy, robustness, and generalization of the model.

### How It Works:

1. **Bootstrap Sampling**:
    - From the original dataset, multiple random subsets (with replacement) are created.
    - Each subset is used to train an individual decision tree.
2. **Feature Subset Selection**:
    - At each split in the decision trees, only a random subset of features is considered, which introduces diversity in the trees.
3. **Voting/Averaging**:
    - For classification: Each tree in the forest votes for a class, and the majority vote is selected as the final prediction.
    - For regression: The average of all tree predictions is used as the final output.

### Key Parameters:

1. **Number of Trees (n_estimators)**: The number of decision trees in the forest.
2. **Maximum Features (max_features)**: The number of features considered for splitting at each node.
3. **Maximum Depth (max_depth)**: Limits the depth of individual trees to avoid overfitting.
4. **Min Samples Split (min_samples_split)**: Minimum number of samples required to split a node.
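
As an illustration of how these parameters are passed in practice, here is a short sketch using scikit-learn's `RandomForestClassifier`. The Iris dataset and the specific parameter values are assumptions chosen only to make the example runnable, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in dataset (an assumption for illustration).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_features="sqrt",   # features considered at each split
    max_depth=4,           # cap on the depth of every tree
    min_samples_split=2,   # minimum samples needed to split a node
    random_state=0,        # fixed seed so the run is repeatable
)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```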

### Advantages:

- Reduced Overfitting: By aggregating multiple trees, Random Forest mitigates the overfitting problem of individual decision trees.
- Handles Missing Data: Some implementations can handle missing values directly (e.g., through proximity-based imputation); others require imputing them before training.
- Feature Importance: Can rank features by importance, aiding interpretability.

### Disadvantages:

- Computationally expensive with a large number of trees.
- Difficult to interpret compared to a single decision tree.

### Comparison: Decision Tree vs. Random Forest

| Aspect          | Decision Tree                  | Random Forest                              |
|-----------------|--------------------------------|--------------------------------------------|
| Structure       | Single tree                    | Multiple trees (ensemble)                  |
| Overfitting     | Prone to overfitting           | Reduces overfitting by averaging predictions|
| Accuracy        | May have lower accuracy        | Higher accuracy due to ensemble effect     |
| Interpretability| Easy to interpret              | Harder to interpret due to multiple trees  |
| Robustness      | Sensitive to data changes      | Robust to outliers and noise               |

### Applications:

- **Decision Tree**: Useful when interpretability is critical or for quick prototyping.
- **Random Forest**: Ideal for large datasets with complex patterns where accuracy is more important than interpretability.

<a id='impli'></a>
# Implementing Decision Tree and Random Forest from scratch
### 1. Decision Tree Implementation

**Steps:**
1. Calculate the impurity metric (Gini Impurity or Entropy).
2. Identify the best split (feature and threshold) based on impurity reduction.
3. Recursively split the dataset until a stopping criterion is met (e.g., max depth or minimum samples).
4. Use the resulting tree for predictions.

```python
import numpy as np


class DecisionTree:
    """Minimal classification tree that splits on Gini impurity."""

    def __init__(self, max_depth=None, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.tree = None

    def _gini(self, y):
        # Gini impurity: 1 minus the sum of squared class proportions.
        _, counts = np.unique(y, return_counts=True)
        probs = counts / len(y)
        return 1 - np.sum(probs ** 2)

    def _split(self, X, y, feature_index, threshold):
        # Rows with feature value <= threshold go left; the rest go right.
        left_indices = X[:, feature_index] <= threshold
        right_indices = ~left_indices
        return X[left_indices], X[right_indices], y[left_indices], y[right_indices]

    def _best_split(self, X, y):
        best_gain = 0
        best_split = None
        parent_gini = self._gini(y)

        # Try every observed value of every feature as a candidate threshold.
        for feature_index in range(X.shape[1]):
            thresholds = np.unique(X[:, feature_index])
            for threshold in thresholds:
                X_left, X_right, y_left, y_right = self._split(X, y, feature_index, threshold)
                if len(y_left) == 0 or len(y_right) == 0:
                    continue
                gini_left = self._gini(y_left)
                gini_right = self._gini(y_right)
                # Child impurities are weighted by the share of samples in each child.
                weighted_gini = (len(y_left) * gini_left + len(y_right) * gini_right) / len(y)
                gain = parent_gini - weighted_gini

                if gain > best_gain:
                    best_gain = gain
                    best_split = {"feature_index": feature_index, "threshold": threshold,
                                  "X_left": X_left, "X_right": X_right,
                                  "y_left": y_left, "y_right": y_right}

        return best_split

    def _build_tree(self, X, y, depth=0):
        # Stop when the node is too small, too deep, or already pure.
        # A leaf stores the majority class (labels must be non-negative integers).
        if (len(y) < self.min_samples_split
                or (self.max_depth is not None and depth >= self.max_depth)
                or len(np.unique(y)) == 1):
            return np.bincount(y).argmax()

        split = self._best_split(X, y)
        if split is None:  # no split reduces impurity
            return np.bincount(y).argmax()

        left = self._build_tree(split["X_left"], split["y_left"], depth + 1)
        right = self._build_tree(split["X_right"], split["y_right"], depth + 1)

        # Internal nodes are dicts; leaves are plain class labels.
        return {"feature_index": split["feature_index"], "threshold": split["threshold"],
                "left": left, "right": right}

    def fit(self, X, y):
        self.tree = self._build_tree(X, y)

    def _predict(self, x, tree):
        # Walk down the tree until a leaf (non-dict) is reached.
        if not isinstance(tree, dict):
            return tree
        if x[tree["feature_index"]] <= tree["threshold"]:
            return self._predict(x, tree["left"])
        return self._predict(x, tree["right"])

    def predict(self, X):
        return np.array([self._predict(x, self.tree) for x in X])


# Example usage:
X = np.array([[2.3, 1.2], [1.1, 3.4], [2.8, 3.5], [1.5, 0.7]])
y = np.array([0, 1, 0, 1])
tree = DecisionTree(max_depth=3)
tree.fit(X, y)
print(tree.predict(X))  # [0 1 0 1]
```


### 2. Random Forest Implementation

**Steps:**
1. Use bootstrapping to create multiple subsets of the training data.
2. Train a decision tree on each subset.
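3. Use a random subset of features at each split in the trees (the simplified code below draws one feature subset per tree instead, to keep the implementation short).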
4. Aggregate predictions from all trees (majority vote for classification or average for regression).


```python
import numpy as np
from collections import Counter

# Relies on the DecisionTree class defined in the previous section.


class RandomForest:
    """Minimal bagged ensemble of DecisionTree classifiers."""

    def __init__(self, n_estimators=10, max_depth=None, min_samples_split=2, max_features=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.max_features = max_features
        self.trees = []

    def _bootstrap_sample(self, X, y):
        # Draw n_samples row indices with replacement.
        n_samples = X.shape[0]
        indices = np.random.choice(n_samples, n_samples, replace=True)
        return X[indices], y[indices]

    def _random_features(self, X):
        # Simplification: the feature subset is drawn once per tree,
        # not at every split as in the standard Random Forest algorithm.
        if not self.max_features:
            return X, np.arange(X.shape[1])
        feature_indices = np.random.choice(X.shape[1], self.max_features, replace=False)
        return X[:, feature_indices], feature_indices

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_estimators):
            X_sample, y_sample = self._bootstrap_sample(X, y)
            X_subset, feature_indices = self._random_features(X_sample)
            tree = DecisionTree(max_depth=self.max_depth, min_samples_split=self.min_samples_split)
            tree.fit(X_subset, y_sample)
            # Remember which columns this tree was trained on.
            self.trees.append((tree, feature_indices))

    def predict(self, X):
        tree_preds = []
        for tree, feature_indices in self.trees:
            X_subset = X[:, feature_indices]
            tree_preds.append(tree.predict(X_subset))
        # Rows become samples, columns become trees; take the majority vote per row.
        tree_preds = np.array(tree_preds).T
        return np.array([Counter(row).most_common(1)[0][0] for row in tree_preds])


# Example usage:
X = np.array([[2.3, 1.2], [1.1, 3.4], [2.8, 3.5], [1.5, 0.7]])
y = np.array([0, 1, 0, 1])
rf = RandomForest(n_estimators=5, max_depth=3, max_features=1)
rf.fit(X, y)
# One run printed [0 1 0 1]; no random seed is fixed, so the output can vary between runs.
print(rf.predict(X))
```


<a id="diff"></a>
# Difference between Regression and Classification

The primary difference in how a Decision Tree operates for classification vs. regression lies in **the impurity measure** used for splitting and the way predictions are made. Here’s a detailed breakdown:

1. **Splitting Criteria**

    The process of choosing the best split is different for classification and regression tasks.

    **Classification**

    - **Objective:** Minimize the impurity of the target classes in each split.
    - **Impurity Metrics:**
      - **Gini Impurity:**
         ${Gini = 1 - \sum_{i=1}^{n} p_i^2}$
         where ${p_i}$ is the proportion of samples of class ${i}$ in the node.
      - **Entropy (Information Gain):**
         ${
         Entropy = - \sum_{i=1}^{n} p_i \log_2(p_i)
         }$
         The goal is to maximize the reduction in entropy (information gain) after the split.

    **Regression**

    - **Objective:** Minimize the variance or error of the target values in each split.
    - **Metrics:**
      - **Mean Squared Error (MSE):**
         ${
         MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2
         }$
         where ${ \bar{y} }$ is the mean target value of the samples in the node.
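      - **Mean Absolute Error (MAE) (less common):**
         ${
         MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \tilde{y}|
         }$
         where ${ \tilde{y} }$ is the median target value of the samples in the node (the median, not the mean, minimizes absolute error).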
         The goal is to minimize the variance or error in target values after the split.

2. **Leaf Node Output**

    The way predictions are made in the leaf nodes differs between classification and regression.

    **Classification**

    - **Prediction:** The most common class (mode) in the leaf node.
    - For example, if the leaf contains samples with classes ${[0, 0, 1]}$, the predicted class is ${0}$ (majority class).

    **Regression**

    - **Prediction:** The mean (average) of the target values in the leaf node.
    - For example, if the leaf contains target values ${[2.5, 3.2, 2.7]}$, the prediction is the mean:
      ${
      \bar{y} = \frac{2.5 + 3.2 + 2.7}{3} = 2.8
      }$

3. **Evaluation of Split Quality**

    The metrics used to evaluate splits differ between classification and regression.

    **Classification**

    - Splits are evaluated by the reduction in impurity, where the "after" value is the sample-weighted average impurity of the child nodes:
      - **Gini Impurity Decrease:**
         ${
         \Delta Gini = Gini_{before} - Gini_{after}
         }$
      - **Information Gain (reduction in entropy):**
         ${
         \Delta Entropy = Entropy_{before} - Entropy_{after}
         }$

    **Regression**

    - Splits are evaluated by the reduction in variance or error, with each child node weighted by its share of the samples:
      - **Reduction in Variance (using MSE):**
         ${
         \Delta Variance = Variance_{before} - Variance_{after}
         }$

4. **Handling Outputs**

    The outputs for classification and regression tasks are handled differently due to the nature of the problem.

    **Classification**

    - **Discrete Labels:** The target values are categorical.
    - **Probability Estimation:** Some implementations (e.g., Scikit-learn) can provide probabilities by calculating the proportion of each class in the leaf node.

    **Regression**

    - **Continuous Values:** The target values are real numbers.
    - **Prediction Smoothing:** Some implementations (e.g., gradient-boosting libraries such as XGBoost) regularize the values stored in the leaves to reduce overfitting.

5. **Example**

    **Classification**

    | Feature 1 | Feature 2 | Class |
    |-----------|-----------|-------|
    | 2.3       | 1.2       | 0     |
    | 1.1       | 3.4       | 1     |
    | 2.8       | 3.5       | 0     |
    | 1.5       | 0.7       | 1     |

    - The tree might split based on Gini or Entropy to separate class ${0}$ from class ${1}$.
    - A leaf node predicts ${0}$ if it has more samples with class ${0}$.

    **Regression**

    | Feature 1 | Feature 2 | Target |
    |-----------|-----------|--------|
    | 2.3       | 1.2       | 2.5    |
    | 1.1       | 3.4       | 3.2    |
    | 2.8       | 3.5       | 2.7    |
    | 1.5       | 0.7       | 3.8    |

    - The tree splits to minimize the variance of target values within each leaf.
    - A leaf node predicts the average of the target values.

6. **Summary Table**

    | Aspect                  | Classification                     | Regression                        |
    |-------------------------|-------------------------------------|-----------------------------------|
    | Target Type             | Discrete labels (e.g., 0, 1, 2)    | Continuous values (e.g., real numbers) |
    | Impurity Metric         | Gini Impurity, Entropy             | Variance (MSE), MAE               |
    | Prediction in Leaf Node | Most common class (mode)           | Mean (average) of target values   |
    | Split Evaluation        | Reduction in Gini/Entropy          | Reduction in Variance/Error (MSE/MAE) |
    | Output                  | Class label or probability         | Numeric value                     |
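
To make the regression case concrete, the sketch below searches for the best split on the four-row regression table from the example above, using variance reduction as the criterion. It assumes NumPy; the function name `best_regression_split` is hypothetical and introduced only for illustration.

```python
import numpy as np

# Regression table from the example above.
X = np.array([[2.3, 1.2], [1.1, 3.4], [2.8, 3.5], [1.5, 0.7]])
y = np.array([2.5, 3.2, 2.7, 3.8])

def best_regression_split(X, y):
    """Return (feature_index, threshold, variance_reduction) of the best split."""
    parent_var = np.var(y)
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        # Skip the largest value: it would leave the right child empty.
        for t in np.unique(X[:, j])[:-1]:
            mask = X[:, j] <= t
            y_left, y_right = y[mask], y[~mask]
            # Variance after the split, weighted by child size.
            child_var = (len(y_left) * np.var(y_left)
                         + len(y_right) * np.var(y_right)) / len(y)
            reduction = parent_var - child_var
            if reduction > best[2]:
                best = (j, t, reduction)
    return best

feature, threshold, reduction = best_regression_split(X, y)
print(feature, threshold, round(reduction, 4))  # 0 1.5 0.2025

# Each leaf predicts the mean of its targets.
left_mask = X[:, feature] <= threshold
print(round(y[left_mask].mean(), 2), round(y[~left_mask].mean(), 2))  # 3.5 2.6
```

On this data the chosen split is `Feature 1 <= 1.5`, which lowers the variance from 0.2525 to 0.05.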

<a id="ova"></a>
## OvA (One-vs-All)
OvA (One-vs-All), also known as One-vs-Rest, is a strategy for solving multi-class classification problems by breaking them into multiple binary classification problems.

### How it works:

1. For a dataset with ${ k }$ classes, the algorithm trains ${ k }$ separate binary classifiers.
2. Each classifier is trained to distinguish one class (the “one”) from all the other classes (the “all”).
3. During prediction:
    - Each classifier produces a score (e.g., probability or decision boundary value).
    - The class with the highest score across all classifiers is chosen as the final prediction.

### Example:

Suppose there are three classes: ${ A }$, ${ B }$, and ${ C }$. Using the OvA approach:

- Classifier 1: Distinguishes ${ A }$ vs. ${ B }$ and ${ C }$.
- Classifier 2: Distinguishes ${ B }$ vs. ${ A }$ and ${ C }$.
- Classifier 3: Distinguishes ${ C }$ vs. ${ A }$ and ${ B }$.

If given a new sample, the model evaluates all three classifiers and selects the class with the highest confidence.
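
The three-classifier scheme above can be sketched with NumPy alone. As a loudly stated assumption, a least-squares linear scorer stands in for the binary classifier to keep the example short and deterministic; in practice logistic regression or an SVM would be the usual choice. The toy clusters and the helper names (`add_bias`, `fit_ova`, `predict_ova`) are hypothetical.

```python
import numpy as np

# Hypothetical toy data: three well-separated 2-D clusters labelled A, B, C.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 0], [5, 1], [6, 0], [6, 1],
              [2, 5], [3, 5], [2, 6], [3, 6]], dtype=float)
y = np.array(["A"] * 4 + ["B"] * 4 + ["C"] * 4)

def add_bias(X):
    """Append a constant column so each scorer has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ova(X, y):
    """Fit one binary scorer per class: target 1 for that class, 0 for the rest."""
    classes = np.unique(y)
    Xb = add_bias(X)
    weights = {}
    for cls in classes:
        target = (y == cls).astype(float)
        w, *_ = np.linalg.lstsq(Xb, target, rcond=None)
        weights[cls] = w
    return classes, weights

def predict_ova(X, classes, weights):
    """Score every sample with every classifier and pick the highest score."""
    Xb = add_bias(X)
    scores = np.column_stack([Xb @ weights[cls] for cls in classes])
    return classes[np.argmax(scores, axis=1)]

classes, weights = fit_ova(X, y)
new_sample = np.array([[4.5, 1.5]])
print(predict_ova(new_sample, classes, weights))  # ['B']
```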

### Advantages:

- Simple and efficient for multi-class problems.
- Works well with algorithms that are inherently binary, such as logistic regression or support vector machines.

### Disadvantages:

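- Classifiers are trained independently, so their scores are not calibrated against each other, which can lead to inconsistent predictions.
- Each binary problem pits one class against all the others, so the training sets are often imbalanced even when the original classes are balanced; other strategies (e.g., One-vs-One) may perform better in that case.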


<a id="int"></a>
## Basic Interview Questions

Here’s a list of top interview questions on Decision Trees and Random Forests, along with short, concise answers:

1. **What is a Decision Tree?**

    A Decision Tree is a supervised learning algorithm that splits data into subsets based on feature values, forming a tree-like structure to make decisions for classification or regression tasks.

2. **How does a Decision Tree decide where to split?**

    It chooses the split that maximizes the reduction in impurity:
    - For classification: Uses Gini Impurity or Entropy.
    - For regression: Uses Variance Reduction (e.g., MSE).

3. **What are the advantages of Decision Trees?**

    - Easy to interpret and visualize.
    - Handles both categorical and numerical data.
    - Requires minimal data preprocessing (e.g., no scaling).

4. **What are the limitations of Decision Trees?**

    - Prone to overfitting, especially with deep trees.
    - Sensitive to small changes in the data (unstable splits).
    - Less accurate compared to ensemble methods like Random Forests.

5. **What is Random Forest?**

    Random Forest is an ensemble learning method that combines multiple decision trees (trained on random subsets of data and features) to improve accuracy and reduce overfitting.

6. **Why is Random Forest better than a single Decision Tree?**

    Random Forest:
    - Reduces overfitting by averaging predictions from multiple trees.
    - Improves generalization through randomness (bootstrap samples and random feature selection).

7. **What is bootstrap aggregation (bagging) in Random Forest?**

    Bagging involves:
    1. Creating multiple bootstrap samples (random subsets with replacement) from the training data.
    2. Training each tree on a different sample and combining their outputs (e.g., majority vote for classification, averaging for regression).

8. **How does Random Forest handle feature selection?**

    At each split, it randomly selects a subset of features to find the best split, reducing correlation between trees and improving diversity.

9. **How does Random Forest handle missing data?**

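    This depends on the implementation. Common approaches are:
    - Use surrogate splits (alternative splits for missing data), as in CART-style trees.
    - Impute missing values before or during training, e.g., with the median/mode or with proximity-based imputation.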

10. **What are Out-of-Bag (OOB) samples in Random Forest?**

     OOB samples are data points not included in the bootstrap sample for a particular tree. These samples are used to estimate model accuracy without needing a separate validation set.

11. **What is Gini Impurity?**

     A measure of node impurity used for classification:
     ${Gini = 1 - \sum_{i=1}^{n} p_i^2}$
     where ${p_i}$ is the proportion of samples belonging to class ${i}$.

12. **What is the difference between Gini and Entropy?**

     - Gini Impurity is slightly faster to compute (no logarithm) and ranges from 0 (pure) to 0.5 (max impurity for binary classes).
     - Entropy measures the uncertainty of the class distribution and ranges from 0 (pure) to ${\log_2(c)}$, where ${ c }$ is the number of classes; information gain is the reduction in entropy after a split.

13. **What are the hyperparameters of Decision Trees?**

     Key hyperparameters include:
     - `max_depth`: Maximum depth of the tree.
     - `min_samples_split`: Minimum samples required to split a node.
     - `min_samples_leaf`: Minimum samples required in a leaf node.
     - `criterion`: Splitting metric (e.g., Gini, Entropy).

14. **What are the hyperparameters of Random Forest?**

     Key hyperparameters include:
     - `n_estimators`: Number of trees in the forest.
     - `max_features`: Number of features to consider for splits.
     - `max_depth`: Maximum depth of each tree.
     - `bootstrap`: Whether to use bootstrap sampling.

15. **How does Random Forest prevent overfitting?**

     - Averages predictions from multiple trees.
     - Adds randomness via bootstrap samples and random feature selection.

16. **What is the difference between Random Forest and Bagging?**

     - Bagging trains trees on different bootstrap samples but uses all features for splits.
     - Random Forest adds randomness by selecting a subset of features at each split.

17. **Can Decision Trees handle multi-class classification?**

     Yes, Decision Trees handle multi-class classification by evaluating impurity metrics across all classes.

18. **What are feature importances in Random Forest?**

     Feature importance measures how much each feature contributes to reducing impurity in the forest. Features with higher importance have more influence on the predictions.

19. **How does Random Forest handle overfitting?**

     By averaging the outputs of multiple trees and introducing randomness in data and feature selection, Random Forest reduces variance and minimizes overfitting.

20. **What are the limitations of Random Forest?**

     - Computationally expensive (requires many trees).
     - Difficult to interpret compared to a single Decision Tree.
     - May not perform well on high-dimensional sparse data.

21. **What is pruning in Decision Trees?**

     Pruning involves cutting back the tree to reduce overfitting:
     - Pre-pruning: Stop tree growth early (e.g., limit `max_depth`).
     - Post-pruning: Remove branches after full tree growth.

22. **How is the prediction made in Random Forest?**

     - Classification: Majority vote across all trees.
     - Regression: Average of predictions from all trees.

23. **How does Random Forest deal with imbalanced datasets?**

     - Assign class weights during training.
     - Use subsampling or SMOTE to balance classes.

24. **What are OOB error estimates?**

     Out-of-Bag error is an estimate of the model’s generalization error, calculated by predicting each training sample using only the trees whose bootstrap sample did not contain it. It needs no separate validation set.

25. **Can Random Forest handle high-dimensional data?**

     Yes, Random Forest can handle high-dimensional data efficiently due to random feature selection, but feature selection or dimensionality reduction may improve performance.
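
The Out-of-Bag questions above (10 and 24) rest on the fact that each bootstrap sample leaves out a sizeable share of the training rows. The following sketch, assuming NumPy and arbitrary illustrative sizes (`n_samples`, `n_trees`), simulates bootstrap draws and compares the observed OOB share with the theoretical value ${(1 - 1/n)^n \approx e^{-1} \approx 0.368}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_trees = 1000, 200  # illustrative sizes, not recommendations

oob_fractions = []
for _ in range(n_trees):
    # Bootstrap sample: n_samples row indices drawn with replacement.
    in_bag = rng.integers(0, n_samples, size=n_samples)
    # A row is out-of-bag for this tree if it was never drawn.
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[in_bag] = False
    oob_fractions.append(oob_mask.mean())

mean_oob = float(np.mean(oob_fractions))
theory = (1 - 1 / n_samples) ** n_samples

print(f"observed OOB fraction: {mean_oob:.3f}")
print(f"theoretical fraction:  {theory:.3f}")
```

Roughly a third of the rows are therefore available to evaluate each tree on data it did not train on.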


<a id="aint"></a>
## Advanced Interview Questions

1. **What is the difference between variance reduction and impurity reduction in Decision Trees?**

    - **Variance reduction (for regression):** Measures how much the variability in the target is reduced after a split. It uses metrics like Mean Squared Error (MSE).
    - **Impurity reduction (for classification):** Measures how “pure” a split is in terms of class distribution. Metrics include Gini Impurity or Entropy.

2. **Why is Random Forest less prone to overfitting than a single Decision Tree?**

    - Random Forest reduces overfitting by averaging predictions of multiple trees, which lowers variance.
    - It uses randomness in two ways:
      - Bootstrap sampling (different subsets of data for each tree).
      - Random feature selection at each split.

3. **What is the time complexity of training and predicting with a Decision Tree?**

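    - **Training:** roughly ${O(n \, d \log n)}$ for a balanced tree, where ${n}$ is the number of samples and ${d}$ is the number of features.
    - **Prediction:** ${O(\log n)}$ for a balanced tree, as it traverses from the root to a leaf; in general it is proportional to the tree depth.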

4. **What are some common problems in building Decision Trees, and how can you address them?**

    1. **Overfitting:**
        - Use pruning (max_depth, min_samples_split).
        - Switch to ensemble methods like Random Forest.
    2. **Bias from irrelevant features:**
        - Remove irrelevant/noisy features via feature selection.
    3. **Imbalanced data:**
        - Use class weights or oversample the minority class.

5. **How does Random Forest handle correlated features?**

    - Random Forest struggles with highly correlated features because similar splits can appear in multiple trees, reducing diversity.
    - To address this, reduce correlations using PCA, or rely on feature importance scores to eliminate redundant features.

6. **Why is Random Forest not ideal for extrapolation?**

    Each tree predicts the average (or majority class) of the training targets in a leaf, so a Random Forest's regression output always lies within the range of the target values seen during training. It cannot follow a trend beyond that range, making it unsuitable for extrapolation tasks.

7. **How is feature importance calculated in Random Forest?**

    - **Gini-based importance:** Measures the reduction in Gini Impurity caused by each feature across all splits in the forest.
    - **Permutation importance:** Evaluates how randomizing a feature affects model accuracy (proxy for feature relevance).

8. **What is the role of randomness in Random Forest?**

    - Random Forest introduces randomness in:
      - Data sampling: Each tree is trained on a bootstrap sample.
      - Feature selection: A random subset of features is considered for each split.
    - This randomness ensures low correlation between trees and improves generalization.

9. **What is the tradeoff between max_features and model performance in Random Forest?**

    - **Low max_features:**
      - Increases tree diversity (reduces correlation).
      - May lead to underfitting if important features are missed.
    - **High max_features:**
      - Reduces diversity (increases correlation between trees).
      - May overfit if trees become too similar.

10. **How does Random Forest handle high-dimensional data compared to SVM?**

     - **Random Forest:** Efficient for high-dimensional data because it selects a subset of features at each split. However, it may struggle with sparsity.
     - **SVM:** Often handles high-dimensional sparse data better, typically with a linear kernel (e.g., for text features); the RBF kernel is a common choice for lower-dimensional, non-linear problems.

11. **Explain the difference between “max_depth” and “min_samples_split” in Decision Trees.**

     - **max_depth:** Limits the depth of the tree, controlling overfitting by stopping growth early.
     - **min_samples_split:** The minimum number of samples required to split a node. Higher values prevent splits on small subsets, which also reduces overfitting.

12. **How does Random Forest handle outliers?**

     - Decision Trees in Random Forest are fairly insensitive to outliers in the features because splits depend on the ordering of feature values, not their magnitude. However:
        - Outliers can still affect the bootstrap samples if included multiple times.
        - In regression, extreme target values can still pull leaf averages.

13. **How does pruning work in Decision Trees?**

     - **Pre-pruning:** Stops tree growth early (e.g., limiting max_depth or requiring min_samples_split).
     - **Post-pruning:** Fully grows the tree, then removes branches with low significance (e.g., based on validation error).

14. **What are surrogate splits, and why are they useful?**

     Surrogate splits are backup splits used when data for the primary splitting feature is missing. They improve robustness in handling missing values by using other correlated features for the same split.

15. **Why is Random Forest slower than a single Decision Tree?**

     - Random Forest trains multiple trees (`n_estimators` of them), requiring more computation.
     - Prediction involves combining results from all trees, increasing latency.

16. **How would you explain OOB error to a non-technical person?**

     OOB (Out-of-Bag) error is a way to measure how well the model performs without needing a separate validation dataset. It uses data not seen by each tree during training to evaluate accuracy.

17. **What happens if all features are perfectly correlated in Random Forest?**

     - Trees become similar, as they will repeatedly split on the same features, reducing diversity.
     - Random Forest may lose its advantage over a single Decision Tree.

18. **How can you tune hyperparameters in Random Forest for better performance?**

     - Use grid search or random search to optimize key parameters:
        - n_estimators: Number of trees.
        - max_features: Features to consider at each split.
        - max_depth: Depth of trees.
        - min_samples_split and min_samples_leaf: Minimum samples for splits and leaf nodes.

19. **What is the difference between Extra Trees (Extremely Randomized Trees) and Random Forest?**

     - **Random Forest:** Selects the best split among a subset of features.
     - **Extra Trees:** Draws a random threshold for each candidate feature and keeps the best of those random splits, increasing speed and tree diversity at the cost of less finely tuned individual splits.

20. **How would you evaluate feature importance using Random Forest in Python?**

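     ```python
     from sklearn.datasets import load_iris
     from sklearn.ensemble import RandomForestClassifier

     # Iris is used only as a stand-in dataset; substitute your own X and y.
     X, y = load_iris(return_X_y=True)

     model = RandomForestClassifier(random_state=0)
     model.fit(X, y)

     # Impurity-based importances, one value per feature, summing to 1.
     print(model.feature_importances_)
     ```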

21. **How does Random Forest handle imbalanced datasets?**

     - Assign class weights (class_weight="balanced") so that errors on the minority class are penalized more heavily.
     - Use techniques like SMOTE (Synthetic Minority Oversampling) to balance the dataset.

22. **Explain the curse of dimensionality in the context of Decision Trees and Random Forests.**

     High-dimensional data may:
     - Lead to sparse splits, making it harder for trees to find meaningful patterns.
     - Increase computation time in Random Forest due to more features being evaluated.

23. **What are the differences between Gradient Boosting and Random Forest?**

     | Aspect           | Random Forest                     | Gradient Boosting                        |
     |------------------|-----------------------------------|------------------------------------------|
     | Method           | Bagging (averages multiple trees) | Boosting (sequentially corrects errors)  |
     | Speed            | Faster due to parallel training   | Slower, as trees are built sequentially  |
     | Overfitting      | Less prone to overfitting         | Prone to overfitting without tuning      |

24. **What is a limitation of Random Forest’s feature importance scores?**

     They can be biased toward features with many unique values (e.g., continuous variables) compared to categorical features with fewer splits.

25. **How would you explain Decision Trees and Random Forests to a layperson?**

     - **Decision Tree:** Think of a flowchart where each question splits data into smaller groups until a final decision is made.
     - **Random Forest:** Imagine asking multiple experts (decision trees) the same question and combining their answers for better accuracy.
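
Question 6 above states that a Random Forest cannot extrapolate. A small sketch makes this visible, assuming scikit-learn's `RandomForestRegressor` and a hypothetical linear toy dataset (`y = 2x` for `x` from 0 to 10): a query far outside the training range still receives a prediction capped by the largest training target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: a perfectly linear trend y = 2x on x = 0..10.
X_train = np.arange(0, 11, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

inside = forest.predict([[5.0]])[0]    # within the training range
outside = forest.predict([[20.0]])[0]  # true trend value would be 40

print(f"prediction at x=5:  {inside:.2f}")
print(f"prediction at x=20: {outside:.2f}")
print(f"largest training target: {y_train.max():.2f}")
```

Because every leaf stores an average of training targets, the prediction at `x = 20` cannot exceed 20, even though the underlying trend would give 40.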
