# Bagging

> - Combining models (ensemble-based methods)
> - Bootstrap aggregation (Bagging)
> - Bagging, Random Forest, and Extra Trees classifiers

### Decision Trees are High-variance

![image.png](attachment:image.png)

### Improvement: Use Many trees

Combine predictions to reduce variance.

![image-2.png](attachment:image-2.png)

--- 
### Aggregate results 

Trees vote on (classification) or average (regression) the result for each data point.

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)
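The voting step can be sketched by hand. This is a minimal illustration, not the notebook's own code: the synthetic dataset, the number of trees, and all variable names are assumptions made for the example. Each tree is fit on a bootstrap sample (rows drawn with replacement), and the ensemble prediction is the majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, used only for illustration (an assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a 0/1 majority vote cannot tie
all_preds = []
for _ in range(n_trees):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# Majority vote: the mean of 0/1 predictions is the fraction of trees voting 1
votes = np.mean(all_preds, axis=0)
y_vote = (votes >= 0.5).astype(int)
bagged_accuracy = np.mean(y_vote == y_test)
```

For regression, the same loop would average the trees' numeric predictions instead of thresholding the vote fraction.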

### How Many Trees to fit?

![image-6.png](attachment:image-6.png)

### Bagging Error Calculations

![image-7.png](attachment:image-7.png)

### BaggingClassifier: The syntax
```python
# Import the class containing the classification method
from sklearn.ensemble import BaggingClassifier

# Create an instance of the class
BC = BaggingClassifier(n_estimators=50)

# Fit the instance on the data and then predict the expected value
BC = BC.fit(X_train, y_train)
y_predict = BC.predict(X_test)

# Tune parameters with cross-validation. Use BaggingRegressor for regression.
```
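One way the cross-validation tuning mentioned above might look is a grid search over `n_estimators`. The synthetic data and the candidate values below are assumptions chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate ensemble sizes (hypothetical grid)
param_grid = {"n_estimators": [10, 25, 50]}

# 3-fold cross-validation over the grid, scored by accuracy (the default)
search = GridSearchCV(BaggingClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

best_n = search.best_params_["n_estimators"]
best_score = search.best_score_
```

`search.best_estimator_` is refit on all of the data passed to `fit` and can be used directly for prediction.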

---

### Reduction in Variance Due To Bagging 


![image-8.png](attachment:image-8.png)
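The slide itself is an image, so the following is the standard textbook result rather than a transcription of it. Assume $n$ trees whose predictions $T_1, \dots, T_n$ each have variance $\sigma^2$ and pairwise correlation $\rho$. The variance of their average is then

$$
\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right) = \rho\,\sigma^2 + \frac{1-\rho}{n}\,\sigma^2 .
$$

For independent trees ($\rho = 0$) this reduces to $\sigma^2 / n$. Trees fit on bootstrap samples of the same data are correlated ($\rho > 0$), so adding trees shrinks only the second term and the variance levels off at $\rho\,\sigma^2$.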

### Introducing More Randomness

![image-9.png](attachment:image-9.png)

### How many random forest trees?

![image-10.png](attachment:image-10.png)

### RandomForest: The Syntax 

```python
# Import the class containing the classification method
from sklearn.ensemble import RandomForestClassifier

# Create an instance of the class
RC = RandomForestClassifier(n_estimators=50)

# Fit the instance on the data and then predict the expected value
RC = RC.fit(X_train, y_train)
y_predict = RC.predict(X_test)

# Tune parameters with cross-validation. Use RandomForestRegressor for regression.
```
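A common way to decide how many trees to use is to watch the out-of-bag (OOB) error as trees are added and stop once it flattens. The sketch below assumes synthetic data and an arbitrary set of forest sizes. `warm_start=True` keeps the trees that are already fitted and only grows the forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores each sample using only trees that did not see it
rf = RandomForestClassifier(oob_score=True, warm_start=True, random_state=0)

oob_errors = {}
for n in [25, 50, 100, 200]:  # hypothetical forest sizes
    rf.set_params(n_estimators=n)
    rf.fit(X, y)  # adds trees up to n, reusing the existing ones
    oob_errors[n] = 1 - rf.oob_score_
```

Plotting `oob_errors` against the number of trees typically shows the error dropping quickly and then levelling off. Beyond that point extra trees mostly add compute cost.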
---

### Introducing Even More Randomness

- Sometimes additional randomness is desired beyond what Random Forest provides.
- Solution: select features randomly and choose split thresholds randomly, rather than searching greedily for the best split.
- These are called "Extra Random Trees" (extremely randomized trees).
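To see the two approaches side by side, the sketch below cross-validates a Random Forest and an Extra Trees ensemble on the same data. The synthetic dataset and hyperparameters are assumptions, and which model scores higher depends on the data. Note that scikit-learn's `ExtraTreesClassifier` uses the whole training set for each tree by default (`bootstrap=False`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    # Random feature subsets, best threshold searched for each candidate feature
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Random feature subsets and randomly drawn thresholds
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

# Mean 3-fold cross-validated accuracy for each model
cv_scores = {name: cross_val_score(model, X, y, cv=3).mean()
             for name, model in models.items()}
```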

### ExtraTreesClassifier: The Syntax

```python
# Import the class containing the classification method
from sklearn.ensemble import ExtraTreesClassifier

# Create an instance of the class
EC = ExtraTreesClassifier(n_estimators=50)

# Fit the instance on the data and then predict the expected value
EC = EC.fit(X_train, y_train)
y_predict = EC.predict(X_test)

# Tune parameters with cross-validation. Use ExtraTreesRegressor for regression.
```

---

### Boosting and Stacking (meta-classifiers)

> - The boosting approach to combining models 
> - Types of boosting models: Gradient Boosting, AdaBoost
> - Boosting loss functions 
> - Combining heterogeneous classifiers

### Review of Bagging

![image-11.png](attachment:image-11.png)

### Aggregate results 

![image-12.png](attachment:image-12.png)

![image-13.png](attachment:image-13.png)

### Decision Stump: Boosting Base Learner

![image-14.png](attachment:image-14.png)

---

### Overview of Boosting 

![image-15.png](attachment:image-15.png)

![image-16.png](attachment:image-16.png)

![image-17.png](attachment:image-17.png)

![image-18.png](attachment:image-18.png)

![image-19.png](attachment:image-19.png)

![image-20.png](attachment:image-20.png)

![image-21.png](attachment:image-21.png)

---

### Boosting Specifics 

![image-22.png](attachment:image-22.png)

![image-23.png](attachment:image-23.png)

### AdaBoost Loss Function 

![image-24.png](attachment:image-24.png)

### Gradient Boosting Loss Function

![image-25.png](attachment:image-25.png)
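Since the loss-function slides are images, the sketch below relies on the standard formulations rather than on the slides themselves: exponential loss for AdaBoost and the logistic (log-likelihood) loss commonly used for gradient-boosted classification. Both are written as functions of the margin $y \cdot f(x)$ with labels in $\{-1, +1\}$. The margin values are arbitrary.

```python
import numpy as np

# Margin = y * f(x); a negative margin means the point is misclassified
margins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

exp_loss = np.exp(-margins)            # exponential loss (AdaBoost)
log_loss = np.log1p(np.exp(-margins))  # logistic loss: log(1 + e^(-margin))

for m, e, l in zip(margins, exp_loss, log_loss):
    print(f"margin={m:+.1f}  exponential={e:.3f}  logistic={l:.3f}")
```

The exponential loss grows much faster for badly misclassified points, which is why AdaBoost tends to be more sensitive to outliers than a model trained with the logistic loss.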

### Bagging Vs Boosting 

![image-26.png](attachment:image-26.png)

### Tuning a Gradient Boosted Model

![image-27.png](attachment:image-27.png)

![image-28.png](attachment:image-28.png)

### GradientBoostingClassifier: The Syntax

```python
# Import the class containing the classification method
from sklearn.ensemble import GradientBoostingClassifier

# Create an instance of the class
GBC = GradientBoostingClassifier(learning_rate=0.1,
                                 max_features=1,
                                 subsample=0.5,
                                 n_estimators=200)

# Fit the instance on the data and then predict the expected value
GBC = GBC.fit(X_train, y_train)
y_predict = GBC.predict(X_test)

# Tune with cross-validation. Use GradientBoostingRegressor for regression.

```
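Because boosting adds trees sequentially, one fitted model can be evaluated at every intermediate ensemble size. The sketch below uses `staged_predict` on a held-out validation split to pick the number of trees. The synthetic data, the split, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and a validation split, used only for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

gbc = GradientBoostingClassifier(learning_rate=0.1, subsample=0.5,
                                 n_estimators=200, random_state=0)
gbc.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
val_errors = [np.mean(pred != y_val) for pred in gbc.staged_predict(X_val)]

# Ensemble size with the lowest validation error (first minimum)
best_n_trees = int(np.argmin(val_errors)) + 1
```

A lower `learning_rate` usually needs more trees to reach the same error, so the two parameters are best tuned together.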

### AdaBoostClassifier: The syntax 

```python
# Import the classes containing the classification methods
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Create an instance of the class
# (the keyword is `estimator` in scikit-learn >= 1.2; older releases call it `base_estimator`)
ABC = AdaBoostClassifier(estimator=DecisionTreeClassifier(),
                         learning_rate=0.1,
                         n_estimators=200)

# Fit the instance on the data and then predict the expected value
ABC = ABC.fit(X_train, y_train)
y_predict = ABC.predict(X_test)

# Tune with cross-validation. Use AdaBoostRegressor for regression.
```
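A runnable sketch of AdaBoost with decision stumps (depth-1 trees) as the base learner follows. The synthetic data and hyperparameter values are assumptions. The base learner is passed positionally so the example does not depend on whether the installed scikit-learn names that argument `estimator` or `base_estimator`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single decision stump as a baseline
single_stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
stump_accuracy = single_stump.score(X_test, y_test)

# AdaBoost over stumps; the first positional argument is the base learner
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)
boosted_accuracy = ada.score(X_test, y_test)
```

Comparing `stump_accuracy` with `boosted_accuracy` shows how much the sequence of reweighted stumps adds over one weak learner on this particular dataset.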
---

### Stacking: Combining Classifiers

![image-29.png](attachment:image-29.png)

![image-30.png](attachment:image-30.png)

---

### VotingClassifier: The Syntax 


```python
# Import the classes containing the ensemble methods
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Create an instance of the class
# estimator_list is a list of (name, estimator) tuples; the estimators are
# cloned and fitted by VC.fit. The `voting` parameter ("hard" or "soft")
# controls how their predictions are combined.
VC = VotingClassifier(estimator_list)

# Fit the instance on the data and then predict the expected value
VC = VC.fit(X_train, y_train)
y_predict = VC.predict(X_test)

# Use VotingRegressor for regression.

# The StackingClassifier (or StackingRegressor) works similarly:
SC = StackingClassifier(estimator_list, final_estimator=LogisticRegression())
```
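The snippet above leaves `estimator_list` undefined. A self-contained sketch might look like the following, where the choice of base models, their names, and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous base models as (name, estimator) tuples (hypothetical choice)
estimator_list = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Soft voting averages the predicted class probabilities of the base models
vc = VotingClassifier(estimators=estimator_list, voting="soft")
vc.fit(X_train, y_train)
voting_accuracy = vc.score(X_test, y_test)

# Stacking trains a final model on cross-validated base-model predictions
sc = StackingClassifier(estimators=estimator_list,
                        final_estimator=LogisticRegression(), cv=5)
sc.fit(X_train, y_train)
stacking_accuracy = sc.score(X_test, y_test)
```

Voting combines the base predictions with a fixed rule, while stacking learns how to weight them. Stacking is therefore more flexible, but it also has more capacity to overfit.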

Further reading: XGBoost