<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Extracting Base Estimators from Bagged Models Lab

_Instructor: Husain Amer_


---

In this lab, you will use the attributes of a fitted sklearn [BaggingClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html) to inspect what is inside the ensemble. In particular, you will need to investigate what you can do with:
- `.base_estimator_`
- `.estimators_`
- `.estimators_samples_`
- `.estimators_features_`


### 1. Load the breast cancer data.

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Converting data into a dataframe structure 
X = pd.DataFrame(data['data'], columns=data['feature_names'])
# Setting up our Y value as well
y = pd.Series(data['target'])

### 2. Load required sklearn packages.

In [2]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

### 3. Make a train-test split (70-30, or a ratio of your choice).

In [3]:
# A:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)
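
The `stratify=y` argument asks `train_test_split` to keep the class balance roughly equal across the two splits. A minimal sketch of how that could be checked is shown below. The `random_state=42` seed and the variable names `full_props`, `train_props`, and `test_props` are assumptions introduced here for illustration, not part of the lab.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = pd.DataFrame(data['data'], columns=data['feature_names'])
y = pd.Series(data['target'])

# random_state=42 is an arbitrary seed, used here only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Class proportions in the full data and in each split
full_props = y.value_counts(normalize=True).sort_index()
train_props = y_train.value_counts(normalize=True).sort_index()
test_props = y_test.value_counts(normalize=True).sort_index()

print(pd.DataFrame({'full': full_props, 'train': train_props, 'test': test_props}))
```

With stratification, the three columns should be close to one another; without it, small splits can drift noticeably from the full-data proportions.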

### 4. Create and fit a `BaggingClassifier` with a `DecisionTreeClassifier` base estimator.

- Fit on the training data.
- Report the score on the test data.

In [4]:
# A:
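# Initiate and fit the classifier
# Note: scikit-learn 1.2 renamed the `base_estimator` argument to `estimator`
# (the old name was removed in 1.4), so newer releases need `estimator=dt` here.
dt = DecisionTreeClassifier()
dt_bagging = BaggingClassifier(base_estimator=dt, n_estimators=20, max_samples=0.8, max_features=0.8)
dt_bagging.fit(X_train, y_train)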

# Score our classifier (mean accuracy on the held-out test set)
print('The score on the test data is', dt_bagging.score(X_test, y_test))

The score on the test data is 0.9590643274853801
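
A single train-test split gives one accuracy number that depends on how the rows happened to be divided. As a rough complement, the `cross_val_score` function imported in step 2 can report accuracy over several folds. The sketch below is one way this might look; the 5-fold setting, the `random_state=0` seed, and the use of the full dataset are assumptions made here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# The base estimator is passed positionally so the call does not depend on
# whether the keyword is `base_estimator` (older releases) or `estimator` (newer ones).
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,
    max_samples=0.8,
    max_features=0.8,
    random_state=0,  # arbitrary seed, chosen here only for reproducibility
)

# One accuracy value per fold (5 folds is an arbitrary choice)
scores = cross_val_score(bagging, X, y, cv=5)
print('Fold accuracies:', scores.round(3))
print('Mean accuracy: %.3f (std %.3f)' % (scores.mean(), scores.std()))
```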


**Note:** The [BaggingClassifier documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html) describes the attributes you will need for the questions that follow.

### 5. Pull out the base estimator from the ensemble model.

In [6]:
# A:
# Get the base estimator used
# (scikit-learn 1.2 and later expose this as `estimator_`; `base_estimator_` was removed in 1.4)
dt_bagging.base_estimator_

DecisionTreeClassifier()
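
Because the attribute name differs between scikit-learn releases, a small accessor can make the lookup work on either. The helper below, `get_template_estimator`, is a hypothetical name introduced here and not part of scikit-learn; it is only a sketch that tries the newer attribute first and falls back to the older one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier


def get_template_estimator(fitted_bagging):
    """Return the template estimator the ensemble was grown from.

    Hypothetical helper: checks `estimator_` (newer releases) first,
    then `base_estimator_` (older releases).
    """
    for name in ('estimator_', 'base_estimator_'):
        if hasattr(fitted_bagging, name):
            return getattr(fitted_bagging, name)
    raise AttributeError('No template estimator found; is the model fitted?')


X, y = load_breast_cancer(return_X_y=True)

# Base estimator passed positionally to stay independent of the keyword name
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=5, random_state=0)
bagging.fit(X, y)

template = get_template_estimator(bagging)
print(template)
```

The returned object is the template, not one of the fitted trees; the fitted copies live in `.estimators_`.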

### 6. Pull out *all* the base estimators.

In [7]:
# A:
# Get all the base estimators
dt_bagging.estimators_

[DecisionTreeClassifier(random_state=742235302),
 DecisionTreeClassifier(random_state=523110125),
 DecisionTreeClassifier(random_state=217458536),
 DecisionTreeClassifier(random_state=192344927),
 DecisionTreeClassifier(random_state=1006122287),
 DecisionTreeClassifier(random_state=206305134),
 DecisionTreeClassifier(random_state=549596221),
 DecisionTreeClassifier(random_state=1234355400),
 DecisionTreeClassifier(random_state=507752521),
 DecisionTreeClassifier(random_state=611645019),
 DecisionTreeClassifier(random_state=1722580582),
 DecisionTreeClassifier(random_state=929325106),
 DecisionTreeClassifier(random_state=1543499687),
 DecisionTreeClassifier(random_state=415884873),
 DecisionTreeClassifier(random_state=311351614),
 DecisionTreeClassifier(random_state=1823198891),
 DecisionTreeClassifier(random_state=1486366474),
 DecisionTreeClassifier(random_state=736829200),
 DecisionTreeClassifier(random_state=1744152442),
 DecisionTreeClassifier(random_state=1896944775)]
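
Each entry in `.estimators_` is an ordinary fitted `DecisionTreeClassifier`, so the usual tree methods apply to it. As an illustration, the sketch below tabulates the depth and leaf count of every tree in an ensemble. The `summary` table and the `random_state=0` seed are assumptions added here, and the model is refit on the full dataset only to keep the example self-contained.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # passed positionally; keyword name varies by release
    n_estimators=20,
    max_samples=0.8,
    max_features=0.8,
    random_state=0,  # arbitrary seed for reproducibility
)
bagging.fit(X, y)

# One row per fitted tree in the ensemble
summary = pd.DataFrame({
    'random_state': [tree.random_state for tree in bagging.estimators_],
    'depth': [tree.get_depth() for tree in bagging.estimators_],
    'n_leaves': [tree.get_n_leaves() for tree in bagging.estimators_],
    'n_features_seen': [len(cols) for cols in bagging.estimators_features_],
})
print(summary)
```

The spread in depth and leaf count gives a quick sense of how different the bootstrapped trees are from one another.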

### 7. Get the features used in each of the bagged base estimators.

In [8]:
# A:
# Check features used in each estimator
dt_bagging.estimators_features_

[array([23,  3, 17, 11,  1, 18, 10, 14, 15, 29,  4, 16,  5,  8, 12, 28, 21,
        25, 22,  9,  0, 27, 26,  7]),
 array([12,  4,  6, 29, 19,  1, 28, 23, 16,  5,  9,  8, 18,  0, 20, 11,  7,
         2, 17, 10, 15, 25,  3, 27]),
 array([26, 13, 17, 16, 23, 21,  0,  4,  5, 27, 28, 25, 14, 19, 12, 15, 29,
        24,  2,  3,  1,  7, 10,  9]),
 array([10,  2, 21, 28, 22, 16, 17, 23, 24, 11,  7, 18, 29, 19, 13,  8,  4,
         6,  1,  0,  5, 14,  9, 25]),
 array([25, 23, 13, 26,  6, 21,  1,  2, 10,  5, 19, 18, 11, 24,  9,  0, 12,
         4, 22, 27, 29, 14,  8,  3]),
 array([ 8,  2,  1, 29,  3, 18, 25, 11,  4,  9, 16,  7, 20, 10, 13,  0, 21,
        28, 27, 23,  6,  5, 17, 15]),
 array([28, 23,  9, 20, 17, 26, 24,  0, 19,  8, 15, 11,  6, 27, 12, 22, 18,
        10, 25,  1, 16,  4,  3,  5]),
 array([17, 10, 24, 22,  2, 19, 25,  7, 18,  4, 15,  6, 14,  3,  5, 12, 27,
        11, 26, 28,  1, 21,  0,  9]),
 array([11, 17, 15, 19,  1, 24, 22, 29, 26, 20,  2, 21, 27, 14, 10,  0, 23,
        13, ...

### 8. Create a list of the features used in the first base estimator.

In [9]:
# A:
# Get features used in the first estimator
features_list = dt_bagging.estimators_features_[0]
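
The entries of `.estimators_features_` are integer column positions, which are hard to read on their own. A possible way to translate them back into column names is sketched below. The names `feature_idx`, `feature_names`, and `unused`, as well as the `random_state=0` seed, are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X = pd.DataFrame(data['data'], columns=data['feature_names'])
y = pd.Series(data['target'])

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # passed positionally; keyword name varies by release
    n_estimators=20,
    max_samples=0.8,
    max_features=0.8,
    random_state=0,  # arbitrary seed for reproducibility
)
bagging.fit(X, y)

# Column positions made available to the first tree, mapped back to names
feature_idx = bagging.estimators_features_[0]
feature_names = X.columns[feature_idx].tolist()

# Columns the first tree never saw
unused = sorted(set(X.columns) - set(feature_names))

print('Seen by the first tree:', feature_names)
print('Hidden from the first tree:', unused)
```

With 30 columns and `max_features=0.8`, each tree is given 24 columns, so 6 are hidden from it.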

### 9. Get out the samples used in our first base estimator.

In [10]:
# A:
# Get samples used in the first base estimator
row_list = dt_bagging.estimators_samples_[0]
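
`.estimators_samples_` holds row positions drawn for each tree. Since `BaggingClassifier` samples with replacement by default, the same row can appear more than once, and some rows are never drawn (the out-of-bag rows). The sketch below counts both; `n_drawn`, `n_unique`, and `oob` are names introduced here for illustration, and the seeds are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42  # arbitrary seed
)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # passed positionally; keyword name varies by release
    n_estimators=20,
    max_samples=0.8,
    max_features=0.8,
    random_state=0,  # arbitrary seed
)
bagging.fit(X_train, y_train)

# Row positions (into X_train) drawn for the first tree
sample_idx = bagging.estimators_samples_[0]
n_drawn = len(sample_idx)
n_unique = len(np.unique(sample_idx))

# Rows of X_train that the first tree never saw (out-of-bag rows)
oob = np.setdiff1d(np.arange(len(X_train)), sample_idx)

print('Rows drawn:', n_drawn)
print('Distinct rows:', n_unique)
print('Out-of-bag rows:', len(oob))
```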

### 10. Fit a decision tree equivalent to our first base estimator.

In [11]:
# A:
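# Fit a tree on the rows and columns that the first base estimator was trained on.
# Note: without the same random_state as dt_bagging.estimators_[0], ties between
# equally good splits may be broken differently, so the two trees need not be identical.
X_train_base_estimator1 = X_train.iloc[row_list, features_list]
y_train_base_estimator1 = y_train.iloc[row_list]
dt.fit(X_train_base_estimator1, y_train_base_estimator1)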

DecisionTreeClassifier()
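
To see how close a refit tree comes to the ensemble's own first tree, the two can be compared on the test set. The sketch below copies the first tree's `random_state`, refits on the same rows and columns, and measures how often the predictions agree. The names `refit` and `agreement` and the seeds are assumptions made here. The trees may not match exactly, since some scikit-learn releases fit the resampled data through sample weights rather than repeated rows.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # NumPy arrays
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42  # arbitrary seed
)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # passed positionally; keyword name varies by release
    n_estimators=20,
    max_samples=0.8,
    max_features=0.8,
    random_state=0,  # arbitrary seed
)
bagging.fit(X_train, y_train)

first_tree = bagging.estimators_[0]
rows = bagging.estimators_samples_[0]
cols = bagging.estimators_features_[0]

# Refit a fresh tree with the same seed on the same rows and columns
refit = DecisionTreeClassifier(random_state=first_tree.random_state)
refit.fit(X_train[rows][:, cols], y_train[rows])

# Each tree only understands the columns it was trained on,
# so the test set must be restricted to those columns first.
X_test_sub = X_test[:, cols]
pred_original = first_tree.predict(X_test_sub)
pred_refit = refit.predict(X_test_sub)

agreement = (pred_original == pred_refit).mean()
print('Share of test rows where the two trees agree: %.3f' % agreement)
print('Accuracy of the refit tree: %.3f' % refit.score(X_test_sub, y_test))
```

Passing the full `X_test` to an individual tree would raise an error, because the tree expects only the 24 columns listed in `estimators_features_[0]`.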