# Bagging With Random Forests

## Ensemble methods

#### In machine learning, an ensemble method combines the predictions of several individual models into a single prediction. Because the errors of the individual models tend to partially cancel out when aggregated, ensembles are typically less prone to error and tend to perform better than any one of their members.
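
#### To make the aggregation concrete, here is a minimal sketch showing that a random forest regressor's prediction is the average of its individual trees' predictions. The synthetic data from make_regression and the hyperparameter values are assumptions chosen for illustration; they are not taken from this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (a stand-in, not the dataset used in this notebook).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=2)

rf = RandomForestRegressor(n_estimators=25, random_state=2)
rf.fit(X, y)

# Each fitted tree is stored in rf.estimators_; collect their predictions row by row.
tree_preds = np.array([tree.predict(X) for tree in rf.estimators_])

# Averaging over the trees should reproduce the forest's own prediction.
manual_average = tree_preds.mean(axis=0)
forest_pred = rf.predict(X)
print(np.allclose(manual_average, forest_pred))
```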

## n_estimators

#### Random forests are most powerful when there are many trees in the forest. How many is enough? In scikit-learn 0.22, the default for n_estimators changed from 10 to 100. While 100 trees may be enough to cut down on variance and obtain good scores, larger datasets may require 500 or more trees.
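
#### One way to judge how many trees are enough is to compare cross-validated error for a few values of n_estimators. The sketch below does this on synthetic data; the dataset, the candidate values, and the RMSE scoring are assumptions for illustration, so the numbers will differ on real data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data (a stand-in, not the dataset used in this notebook).
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=2)

rmse_by_n_trees = {}
for n_trees in [10, 50, 100]:
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=2)
    # scikit-learn returns negated RMSE so that higher is always better.
    scores = cross_val_score(rf, X, y, scoring='neg_root_mean_squared_error', cv=5)
    rmse_by_n_trees[n_trees] = -scores.mean()

for n_trees, rmse in rmse_by_n_trees.items():
    print(f'n_estimators={n_trees}: RMSE={rmse:.2f}')
```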

## warm_start

#### The warm_start hyperparameter is useful for determining the number of trees in the forest (n_estimators). By default (warm_start=False), changing n_estimators from 100 to 200 rebuilds the whole forest, so fitting 200 trees may take twice as long. When warm_start=True, the random forest with 200 trees does not start from scratch; it keeps the trees already built and adds only the new ones, starting where the previous model stopped.
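
#### A minimal sketch of this workflow follows: the forest is grown in steps, and each call to fit adds only the newly requested trees. Tracking the out-of-bag score (oob_score=True) at each step is an assumption made here to give a quick quality measure; the synthetic data and step sizes are likewise illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (a stand-in, not the dataset used in this notebook).
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=2)

rf = RandomForestRegressor(n_estimators=50, warm_start=True,
                           oob_score=True, random_state=2)

oob_scores = {}
for n_trees in [50, 100, 150, 200]:
    rf.set_params(n_estimators=n_trees)
    rf.fit(X, y)  # with warm_start=True, existing trees are kept and new ones are added
    oob_scores[n_trees] = rf.oob_score_

print(len(rf.estimators_))
for n_trees, score in oob_scores.items():
    print(f'n_estimators={n_trees}: OOB R^2={score:.3f}')
```

#### If the score stops improving between steps, the smaller forest is probably sufficient.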

In [None]:
# 'auto' for max_features was deprecated in scikit-learn 1.1 and later removed. For regressors
# it meant "all features", so 1.0 is used here (assuming the helper wraps a regressor).
randomized_search_reg(params={'min_weight_fraction_leaf': [0.0, 0.0025, 0.005, 0.0075, 0.01, 0.05],
                              'min_samples_split': [2, 0.01, 0.02, 0.03, 0.04, 0.06, 0.08, 0.1],
                              'min_samples_leaf': [1, 2, 4, 6, 8, 10, 20, 30],
                              'min_impurity_decrease': [0.0, 0.01, 0.05, 0.10, 0.15, 0.2],
                              'max_leaf_nodes': [10, 15, 20, 25, 30, 35, 40, 45, 50, None],
                              'max_features': [1.0, 0.8, 0.7, 0.6, 0.5, 0.4],
                              'max_depth': [None, 2, 4, 6, 8, 10, 20]})

In [None]:
# Narrower search; 1.0 replaces the removed 'auto' option (all features, assuming a regressor).
randomized_search_reg(params={'min_samples_leaf': [1, 2, 4, 6, 8, 10, 20, 30],
                              'min_impurity_decrease': [0.0, 0.01, 0.05, 0.10, 0.15, 0.2],
                              'max_features': [1.0, 0.8, 0.7, 0.6, 0.5, 0.4],
                              'max_depth': [None, 2, 4, 6, 8, 10, 20]})

In [None]:
# Same search with deeper max_depth options and 20 runs; 1.0 replaces the removed 'auto'.
randomized_search_reg(params={'min_samples_leaf': [1, 2, 4, 6, 8, 10, 20, 30],
                              'min_impurity_decrease': [0.0, 0.01, 0.05, 0.10, 0.15, 0.2],
                              'max_features': [1.0, 0.8, 0.7, 0.6, 0.5, 0.4],
                              'max_depth': [None, 4, 6, 8, 10, 12, 15, 20]}, runs=20)
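
#### The calls above depend on randomized_search_reg, which is defined elsewhere in the notebook. For readers without that definition, here is a hypothetical stand-in built on scikit-learn's RandomizedSearchCV. The name randomized_search_sketch, its signature, the mapping of runs to n_iter, the RMSE scoring, and the synthetic data are all assumptions; the notebook's own helper may differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV


def randomized_search_sketch(X, y, params, runs=16, random_state=2):
    """Hypothetical stand-in for randomized_search_reg; not the notebook's helper."""
    rf = RandomForestRegressor(n_estimators=20, random_state=random_state)
    search = RandomizedSearchCV(
        rf,
        params,
        n_iter=runs,  # assumption: "runs" is the number of sampled combinations
        scoring='neg_root_mean_squared_error',
        cv=3,
        random_state=random_state,
    )
    search.fit(X, y)
    # best_score_ is negated RMSE, so flip the sign before returning it.
    return search.best_params_, -search.best_score_


# Synthetic regression data (a stand-in, not the dataset used in this notebook).
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=2)

best_params, best_rmse = randomized_search_sketch(
    X, y,
    params={'min_samples_leaf': [1, 2, 4, 8],
            'max_depth': [None, 2, 4, 6, 8, 10]},
    runs=5,
)
print(best_params)
print(f'Best cross-validated RMSE: {best_rmse:.2f}')
```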

## Random forest drawbacks

#### At the end of the day, the random forest is limited by its individual trees. If all trees make the same mistake, the random forest makes that mistake too. There are scenarios, such as this case study before the data was shuffled, in which random forests cannot significantly reduce the error because of challenges within the data that individual trees are unable to address.
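
#### The sketch below illustrates one such scenario on synthetic data (an assumption; it is not the case study's dataset). The rows are sorted along an upward trend, so unshuffled folds ask every tree to predict target values outside the range it was trained on, which no tree can do. Shuffling the folds removes the problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data sorted by its only feature, with a steady upward trend plus noise.
rng = np.random.default_rng(2)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(scale=0.5, size=200)

rf = RandomForestRegressor(n_estimators=50, random_state=2)

# Unshuffled folds: each test fold is a contiguous block the trees never saw.
unshuffled = cross_val_score(rf, X, y, cv=KFold(n_splits=5))

# Shuffled folds: every region of the trend appears in the training data.
shuffled = cross_val_score(rf, X, y,
                           cv=KFold(n_splits=5, shuffle=True, random_state=2))

print(f'mean R^2, unshuffled folds: {unshuffled.mean():.2f}')
print(f'mean R^2, shuffled folds:   {shuffled.mean():.2f}')
```

#### Adding more trees does not help in the unshuffled case, because every tree shares the same limitation.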