# 6.1: Ensemble Learning

## Setup

In [1]:
import numpy as np

# pasted from DecisionTreeFun
header = ["level", "lang", "tweets", "phd"]
attribute_domains = {"level": ["Senior", "Mid", "Junior"], 
    "lang": ["R", "Python", "Java"],
    "tweets": ["yes", "no"], 
    "phd": ["yes", "no"]}
X = [
    ["Senior", "Java", "no", "no"],
    ["Senior", "Java", "no", "yes"],
    ["Mid", "Python", "no", "no"],
    ["Junior", "Python", "no", "no"],
    ["Junior", "R", "yes", "no"],
    ["Junior", "R", "yes", "yes"],
    ["Mid", "R", "yes", "yes"],
    ["Senior", "Python", "no", "no"],
    ["Senior", "R", "yes", "no"],
    ["Junior", "Python", "yes", "no"],
    ["Senior", "Python", "yes", "yes"],
    ["Mid", "Python", "no", "yes"],
    ["Mid", "Java", "yes", "no"],
    ["Junior", "Python", "no", "yes"]
]

y = ["False", "False", "True", "True", "True", "False", "True", "False", "True", "True", "True", "True", "True", "False"]
# stitch X and y together to make one table
table = [X[i] + [y[i]] for i in range(len(X))]

## About Ensemble Learning

* Different classifiers have different strengths and weaknesses
* Instead of relying on one classifier, train several different classifiers
* Combine their predictions by voting
* This often gives better predictions than any single classifier, though it is not guaranteed

**EXAMPLES:**
* Entropy-based attribute selection tends to result in smaller decision trees
* Smaller trees often lead to "better" predictions, but this is not guaranteed
* Instead, build trees with different attribute selection approaches and choose the "most popular" prediction among the ensemble

## Approaches to Ensemble Learning

* If the classifiers are of the same type (e.g. decision trees), the ensemble is **homogeneous** (our focus)

* If the classifiers are of different types, the ensemble is **heterogeneous**

Techniques we will be looking at:
* Bagging (Bootstrap Aggregation)
* Random Forests

Different voting schemes:
* (simple) majority voting
* weighted voting
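
The two voting schemes can be sketched as follows. This is a minimal illustration, not a fixed API: the function names `majority_vote` and `weighted_vote` and the example weights are assumptions. In practice a weight might come from each classifier's accuracy on a validation set.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Return the label whose votes carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three classifiers vote on one instance
votes = ["True", "False", "False"]
print(majority_vote(votes))                    # False (2 votes to 1)
print(weighted_vote(votes, [0.9, 0.4, 0.4]))   # True (0.9 vs. 0.8)
```

With weighted voting, a single highly trusted classifier can outvote several weaker ones, which simple majority voting never allows.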

## Bagging (Bootstrap Aggregation)

Recall the bootstrap method:

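* Given a dataset $D$ with $n$ instances, draw $n$ instances from $D$ uniformly at random **with replacement** to form a bootstrap sample
* Some instances appear more than once in the sample, and others do not appear at all
* For large $n$, about 63.2% of the distinct instances end up in the sample; the remaining 36.8% or so can be held out for testing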
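
The fraction of rows that a bootstrap sample leaves out can be worked out directly. When $n$ rows are drawn with replacement from $n$ rows, a given row is missed by one draw with probability $1 - 1/n$, so it is missed by all $n$ draws with probability $(1 - 1/n)^n$, which approaches $e^{-1} \approx 0.368$ as $n$ grows. The short sketch below evaluates this for the 14-row table from the Setup cell; the variable names are illustrative.

```python
import math

n = 14  # number of rows in the table from the Setup cell

# Probability that one particular row is never drawn in n draws
p_left_out = (1 - 1 / n) ** n

# Limit of the same expression as n grows
limit = math.exp(-1)

print(f"n = {n}: P(row never drawn) = {p_left_out:.3f}")
print(f"large-n limit e^-1 = {limit:.3f}")
```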

> **Bagging Algorithm**
>
> * Generate $k$ classifiers
> * Each classifier $M_i$ is trained on $D_i$ (for $1 \le i \le k$)
> * $D_i$ is a bootstrap sample
>
>
> In order to classify an instance $X$:
> * Run each classifier $M_i$ on $X$ to get predicted label $L_i$
> * Each label $L_i$ is a vote for that label
> * Use the majority label (i.e., the mode) as the prediction
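
The algorithm above can be sketched end to end with a stand-in weak learner. Everything named here is an assumption made for illustration: `train_one_rule` is a hypothetical one-attribute classifier (it is not the decision tree from DecisionTreeFun), and `bagging_fit` and `bagging_predict` are illustrative names. The rows are copied from the Setup cell with the class label appended.

```python
import random
from collections import Counter, defaultdict

# Rows copied from the Setup cell; the last column is the class label.
table = [
    ["Senior", "Java", "no", "no", "False"],
    ["Senior", "Java", "no", "yes", "False"],
    ["Mid", "Python", "no", "no", "True"],
    ["Junior", "Python", "no", "no", "True"],
    ["Junior", "R", "yes", "no", "True"],
    ["Junior", "R", "yes", "yes", "False"],
    ["Mid", "R", "yes", "yes", "True"],
    ["Senior", "Python", "no", "no", "False"],
    ["Senior", "R", "yes", "no", "True"],
    ["Junior", "Python", "yes", "no", "True"],
    ["Senior", "Python", "yes", "yes", "True"],
    ["Mid", "Python", "no", "yes", "True"],
    ["Mid", "Java", "yes", "no", "True"],
    ["Junior", "Python", "no", "yes", "False"],
]

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def train_one_rule(rows):
    """Hypothetical weak learner: pick the single attribute whose
    per-value majority labels make the fewest errors on `rows`."""
    default = Counter(row[-1] for row in rows).most_common(1)[0][0]
    best_rules, best_att, best_correct = None, None, -1
    for att in range(len(rows[0]) - 1):
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[att]][row[-1]] += 1
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        if correct > best_correct:
            best_rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
            best_att, best_correct = att, correct
    # Attribute values unseen in the sample fall back to the majority label
    return lambda instance: best_rules.get(instance[best_att], default)

def bagging_fit(rows, k, rng):
    """Train k classifiers M_i, each on its own bootstrap sample D_i."""
    return [train_one_rule(bootstrap_sample(rows, rng)) for _ in range(k)]

def bagging_predict(classifiers, instance):
    """Each classifier casts one vote; the mode of the votes wins."""
    votes = [classify(instance) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)  # seeded only to make the run repeatable
classifiers = bagging_fit(table, k=5, rng=rng)
prediction = bagging_predict(classifiers, ["Junior", "Java", "yes", "no"])
print(prediction)
```

An odd $k$ avoids tied votes when there are only two class labels.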

### Advantages of Bagging
* simple idea, simple to implement
* reduces overfitting
* generally increases accuracy
* reduces classification variance across classifiers

> **Lab Task #1:** Write a function that returns a random sample of rows with replacement

In [2]:
def compute_bootstrapped_sample(table):
    n = len(table)
    sample = []
    for _ in range(n):
        rand_index = np.random.randint(0, n) # Return random integers from low (inclusive) to high (exclusive)
        sample.append(table[rand_index])
    return sample 

sample = compute_bootstrapped_sample(table)
for row in sample:
    print(row)

['Mid', 'R', 'yes', 'yes', 'True']
['Senior', 'Java', 'no', 'no', 'False']
['Junior', 'R', 'yes', 'yes', 'False']
['Junior', 'Python', 'no', 'no', 'True']
['Junior', 'Python', 'no', 'yes', 'False']
['Junior', 'Python', 'no', 'no', 'True']
['Junior', 'Python', 'no', 'no', 'True']
['Senior', 'R', 'yes', 'no', 'True']
['Senior', 'Python', 'no', 'no', 'False']
['Senior', 'Java', 'no', 'yes', 'False']
['Senior', 'Python', 'yes', 'yes', 'True']
['Mid', 'Python', 'no', 'no', 'True']
['Junior', 'R', 'yes', 'yes', 'False']
['Junior', 'Python', 'yes', 'no', 'True']


## Random Attribute Subsets

* Let $F$ be the size of random attribute subsets, where $F \ge 2$

> **Lab Task #2:** Define a Python function that selects $F$ random attributes from an attribute list

(Note: You can do this with `np.random.choice()`)

In [25]:
def compute_random_subset(values, num_values):
    values_copy = values[:] # shallow copy
    np.random.shuffle(values_copy) # in-place shuffling
    return values_copy[:num_values]

F = 2
print(compute_random_subset(header, F))
att_indexes = list(range(len(header)))
print(compute_random_subset(att_indexes, F))

['phd', 'level']
[1, 0]
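
The note above mentions `np.random.choice()`. A minimal sketch of that alternative follows; the function name `compute_random_subset_with_choice` is an illustrative assumption. Passing `replace=False` matters here, because the default samples with replacement and could return the same attribute twice.

```python
import numpy as np

def compute_random_subset_with_choice(values, num_values):
    """Pick num_values distinct items from values, in random order."""
    # replace=False guarantees no item is selected more than once
    chosen = np.random.choice(values, size=num_values, replace=False)
    return chosen.tolist()  # convert NumPy scalars back to plain Python values

header = ["level", "lang", "tweets", "phd"]
F = 2
subset = compute_random_subset_with_choice(header, F)
print(subset)
```

Unlike the shuffle-based version, this does not need a copy of the input list, since `np.random.choice()` leaves its argument unmodified.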


> **THE BIG TAKEAWAY**
> * $N$: the number of weak learners (decision trees) to generate
> * $M$: the number of best-performing trees to keep out of the $N$ weak learners
> * $F$: the size of the random attribute subsets passed to attribute selection
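
One way the three parameters could fit together is sketched below, under loud assumptions. The weak learner is a one-level tree (`train_stump`, a hypothetical stand-in for a full decision tree), so the $F$ random attributes are drawn once per tree. The split into the first 10 rows for training and the last 4 for validation is arbitrary, and all function names are illustrative.

```python
import random
from collections import Counter, defaultdict

# Rows from the Setup cell (label last), written two per line for brevity.
table = [
    ["Senior", "Java", "no", "no", "False"], ["Senior", "Java", "no", "yes", "False"],
    ["Mid", "Python", "no", "no", "True"], ["Junior", "Python", "no", "no", "True"],
    ["Junior", "R", "yes", "no", "True"], ["Junior", "R", "yes", "yes", "False"],
    ["Mid", "R", "yes", "yes", "True"], ["Senior", "Python", "no", "no", "False"],
    ["Senior", "R", "yes", "no", "True"], ["Junior", "Python", "yes", "no", "True"],
    ["Senior", "Python", "yes", "yes", "True"], ["Mid", "Python", "no", "yes", "True"],
    ["Mid", "Java", "yes", "no", "True"], ["Junior", "Python", "no", "yes", "False"],
]

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, candidate_atts):
    """Hypothetical weak learner: split on whichever candidate attribute
    makes the fewest training errors, predicting the majority label per value."""
    default = majority([row[-1] for row in rows])
    best = None  # (num_correct, att, rules)
    for att in candidate_atts:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[att]][row[-1]] += 1
        num_correct = sum(c.most_common(1)[0][1] for c in counts.values())
        if best is None or num_correct > best[0]:
            rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
            best = (num_correct, att, rules)
    _, att, rules = best
    return lambda instance: rules.get(instance[att], default)

def accuracy(classify, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(classify(row) == row[-1] for row in rows) / len(rows)

def fit_forest(train_rows, validation_rows, N, M, F, rng):
    """Build N weak trees, then keep the M most accurate on validation_rows."""
    num_atts = len(train_rows[0]) - 1
    scored = []
    for _ in range(N):
        sample = [rng.choice(train_rows) for _ in train_rows]  # bootstrap sample
        candidates = rng.sample(range(num_atts), F)            # F random attributes
        tree = train_stump(sample, candidates)
        scored.append((accuracy(tree, validation_rows), tree))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tree for _, tree in scored[:M]]

def predict_forest(forest, instance):
    """Majority vote over the M retained trees."""
    return majority([tree(instance) for tree in forest])

rng = random.Random(0)  # seeded only to make the run repeatable
forest = fit_forest(table[:10], table[10:], N=20, M=7, F=2, rng=rng)
print(predict_forest(forest, ["Mid", "Java", "yes", "no"]))
```

In a full decision tree the random subset of $F$ attributes would typically be redrawn at every node rather than once per tree.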