### Averaging ensemble methods

    Simple averaging: (model1 + model2) / 2
    The average error is smaller. The averaged model does not beat an individual model in the areas where that model was already doing really well, but it does better on average.
    
    Weighted averaging: model1 * 0.7 + model2 * 0.3
    If the models have quite similar predictive power, the R-squared is no better than with simple averaging. That makes sense: there is no reason to rely more on one model than on the other.
    
    Conditional averaging: prediction of model1 if condition1 else prediction of model2
    There are ensemble methods that are very good at finding these relationships between two or more predictions with respect to the target variable.
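
The three schemes above can be sketched with NumPy. The arrays `pred1`, `pred2`, and `feature`, as well as the threshold of 50, are made-up illustrative values and not data from this notebook.

```python
import numpy as np

# Hypothetical predictions of two models for the same five rows
pred1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
pred2 = np.array([12.0, 18.0, 33.0, 44.0, 45.0])
# Hypothetical feature used to decide which model to trust for each row
feature = np.array([5, 80, 20, 95, 60])

# Simple averaging: both models contribute equally
simple_avg = (pred1 + pred2) / 2

# Weighted averaging: rely more on model 1
weighted_avg = pred1 * 0.7 + pred2 * 0.3

# Conditional averaging: model 1 where the condition holds, model 2 elsewhere
conditional = np.where(feature < 50, pred1, pred2)
```

In practice the weights and the condition would be chosen on a validation set rather than by hand.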

### Bagging

    Means averaging slightly different versions of the same model to improve accuracy, e.g. decision trees (as in a random forest)
    There are two main sources of errors in modeling:
        1. Errors due to Bias (underfitting)
        2. Errors due to Variance (overfitting)
    Bagging parameters:
        1. Changing the seed
        2. Row (sub)sampling or bootstrapping
        3. Shuffling
        4. Column (sub)sampling
        5. Model-specific parameters
        6. Number of models (or bags), usually more than 10
        7. (Optionally) parallelism
    Models are independent of each other

In [4]:
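import numpy as np
from sklearn.ensemble import RandomForestRegressor

# train, y and test are assumed to be defined earlier in the notebook
model = RandomForestRegressor()
bags = 10
seed = 1

# Accumulate the predictions of every bag, then average them
bagged_prediction = np.zeros(test.shape[0])
for n in range(bags):
    # A different seed per bag gives a slightly different model
    model.set_params(random_state=seed + n)
    model.fit(train, y)
    preds = model.predict(test)
    bagged_prediction += preds
bagged_prediction /= bags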
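
The loop above only varies the seed. The following is a minimal sketch that also applies row sampling (bootstrapping) and column sampling from the parameter list. It assumes synthetic data and uses ordinary least squares as the base model purely to stay dependency-free; the names `X_train`, `y_train`, `X_test`, `row_fraction`, and `n_columns` are illustrative, and trees would be the more usual base model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): 300 rows, 6 features, linear target plus noise
X_train = rng.normal(size=(300, 6))
true_coef = np.array([1.5, -2.0, 0.5, 0.0, 3.0, -1.0])
y_train = X_train @ true_coef + rng.normal(scale=0.5, size=300)
X_test = rng.normal(size=(50, 6))

bags = 20            # number of models
row_fraction = 0.8   # share of rows drawn (with replacement) per bag
n_columns = 4        # number of columns sampled per bag
n_rows = int(row_fraction * X_train.shape[0])

bagged_prediction = np.zeros(X_test.shape[0])
for _ in range(bags):
    # Row sampling with replacement (bootstrapping)
    rows = rng.integers(0, X_train.shape[0], size=n_rows)
    # Column sampling without replacement
    cols = rng.choice(X_train.shape[1], size=n_columns, replace=False)
    # Base model: least squares fitted on the sampled rows and columns only
    coef, *_ = np.linalg.lstsq(X_train[rows][:, cols], y_train[rows], rcond=None)
    bagged_prediction += X_test[:, cols] @ coef
bagged_prediction /= bags
```

Each bag is fitted independently of the others, so the loop body could be run in parallel.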

### Boosting

    A form of weighted averaging of models where each model is built sequentially, taking into account the performance of the previous models
    Main boosting types:
        * Weight based
        * Residual based
        
     1) Weight based Boosting
     
         Step 1. Make predictions -> Calculate absolute errors -> Create weights based on them
         
<img src="files/Images/boost1.png" width="400" height="100"> 

         Step 2. Add the next model to the ensemble, taking into account the weights from the previous model. Rows with a bigger weight get more attention when the new model is built
         
<img src="files/Images/boost2.png" width="400" height="100"> 

    
         Parameters:
             1. Learning rate (or shrinkage or eta)
             predictionN = pred0*eta + pred1*eta + ... + predN * eta
             2. Number of estimators (the relationship between the number of estimators and the learning rate is close to linear: doubling one calls for roughly halving the other)
                The logic: start with 100 estimators -> find the best learning rate (e.g. 0.1). Then, to double the number of estimators (200), divide the learning rate by 2 (0.05)
             3. Input model - can be anything that accepts weights
             4. Sub boosting type:
                 * AdaBoost - sklearn (Python)
                 * LogitBoost - Weka (Java)
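
Steps 1 and 2 can be sketched as a simple reweighting loop. This is not AdaBoost itself, only an illustration under stated assumptions: the weight formula `1 + abs_error`, the weighted least squares base model, the synthetic data, and the helper names `fit_weighted_linear` and `predict_linear` are all hypothetical choices made for this example.

```python
import numpy as np

def fit_weighted_linear(X, y, w):
    """Weighted least squares with an intercept; returns the coefficients."""
    Xb = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) minimises the w-weighted squared error
    coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

n_estimators = 5
eta = 1.0 / n_estimators      # assumption: with this eta the sum is a plain average
weights = np.ones(len(y))     # the first model treats all rows equally
ensemble_pred = np.zeros(len(y))

for _ in range(n_estimators):
    coef = fit_weighted_linear(X, y, weights)   # rows with bigger weight matter more
    pred = predict_linear(coef, X)
    ensemble_pred += eta * pred                 # pred0*eta + pred1*eta + ...
    abs_error = np.abs(y - pred)                # Step 1: absolute errors
    weights = 1.0 + abs_error                   # Step 1: weights for the next model
```

Any base model that accepts per-row weights could replace the least squares fit here.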
                 
                 
      2) Residual based Boosting
          Extremely successful
          
          Step 1. Calculate the error of these predictions, but this time not in absolute terms, because we are interested in the direction of the error.
<img src="files/Images/boost3.png" width="400" height="100"> 
                  
          Step 2. The error now becomes the new target variable (a new y), and we use the same features to predict it. To predict Rownum = 1 we would say: Final prediction = 0.75 + 0.20 = 0.95
<img src="files/Images/boost4.png" width="400" height="100"> 

         Parameters:
             1. Learning rate (or shrinkage or eta)
             predictionN = pred0 + pred1*eta + ... + predN * eta
             2. Number of estimators
             3. Row (sub)sampling
             4. Column (sub)sampling
             5. Input model - works best with trees
             6. Sub boosting type:
                 * Fully gradient based
                 * DART
          Implementation: 
              * XGBoost
              * LightGBM
              * H2O's GBM
              * CatBoost
              * sklearn's GBM
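
The residual-based procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not how the libraries above are implemented: it assumes a single synthetic feature, uses one-split trees (stumps) as the base model, and takes a constant mean prediction as `pred0`. The helpers `fit_stump` and `predict_stump` are hypothetical names.

```python
import numpy as np

def fit_stump(x, target):
    """Find the single split on a 1-D feature that minimises squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:   # excluding the max keeps both sides non-empty
        left = target[x <= threshold]
        right = target[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

eta = 0.5            # learning rate
n_estimators = 20

# pred0: a constant baseline, added without shrinkage
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

for _ in range(n_estimators):
    residual = y - prediction                      # signed error: the new target
    stump = fit_stump(x, residual)                 # same feature, new target
    prediction += eta * predict_stump(stump, x)    # pred0 + pred1*eta + ...

final_mse = np.mean((y - prediction) ** 2)
```

A smaller `eta` makes each step more cautious and typically calls for more estimators to reach the same training error.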

### Stacking

    Means making predictions with a number of models on a hold-out set and then training a different (meta) model on these predictions
    
    Methodology:
        1. Split the train set into two disjoint sets
        2. Train several base learners on the first part
        3. Make predictions with the base learners on the second (validation) part
        4. Use the predictions from (3) as the inputs to a higher-level learner
        
<img src="files/Images/stack.png" width="400" height="100"> 
<img src="files/Images/stack2.png" width="400" height="100"> 
    
    Things to be mindful of:
        * With time-sensitive data - respect time
        * Diversity is as important as performance
        * Diversity may come from:
            - Different algorithms
            - Different input features
        * Performance plateauing after N models
        * Meta model is normally modest
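
For the first point, respecting time means the split between base-learner data and meta-model data should be chronological rather than random. A minimal sketch, assuming a hypothetical `timestamps` array and synthetic `features`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 rows in arbitrary order, each with a distinct timestamp
timestamps = rng.permutation(100)
features = rng.normal(size=(100, 3))

# Sort the rows chronologically, then cut at the midpoint
order = np.argsort(timestamps)
cutoff = len(order) // 2
base_idx = order[:cutoff]    # older rows: train the base learners here
meta_idx = order[cutoff:]    # newer rows: predict these and train the meta model

base_features = features[base_idx]
meta_features = features[meta_idx]
```

This way the base learners never see rows from later than the ones they are asked to predict.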
        
    The meta model finds where a base model is good and where it is bad or fairly weak. So you don't need to worry too much about making every model really strong; stacking can extract the useful signal from each prediction. The question to focus on is: does this model bring some new information, even if it is generally weak? Weak models can bring in new information that the meta model can leverage.
    For example, in one dataset you may one-hot encode the categorical features, and in another you may just use label encoding; the resulting models will probably be very different.
     
    The meta model is therefore often simpler than the base models. In random forest terms, it would have a lower depth than the best depth you found for your base models.
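
The two encodings mentioned above produce quite different inputs from the same column, which is one way to get diverse base models. A small sketch with a made-up `colors` column:

```python
import numpy as np

# Hypothetical categorical column
colors = np.array(["red", "green", "blue", "green", "red"])

# Label encoding: each category becomes one integer code (alphabetical order)
categories, label_encoded = np.unique(colors, return_inverse=True)

# One-hot encoding: each category becomes its own 0/1 column
one_hot = np.eye(len(categories), dtype=int)[label_encoded]
```

Here `label_encoded` is a single integer column, while `one_hot` has one column per category.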

In [5]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
import numpy as np
from sklearn.model_selection import train_test_split

In [None]:
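# Split the train set into two disjoint halves
# (train, y and test are assumed to be defined earlier in the notebook)
training, valid, ytraining, yvalid = train_test_split(train, y, test_size=0.5)

# Base learners
model1 = RandomForestRegressor()
model2 = LinearRegression()

# Fit the base learners on the first half
model1.fit(training, ytraining)
model2.fit(training, ytraining)

# Predict the second (validation) half
preds1 = model1.predict(valid)
preds2 = model2.predict(valid)

# Predict the test set with the same base learners
test_preds1 = model1.predict(test)
test_preds2 = model2.predict(test)

# Stack the predictions column-wise: they are the meta model's input features
stacking_predictions = np.column_stack((preds1, preds2))
stacked_test_predictions = np.column_stack((test_preds1, test_preds2))

# Fit the meta model on the validation predictions, then predict the test set
meta_model = LinearRegression()
meta_model.fit(stacking_predictions, yvalid)
final_predictions = meta_model.predict(stacked_test_predictions)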

### StackNet