In [11]:
using PlotlyJS

## Variance Reduction

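Bagging’s main advantage is that it reduces the variance of forecasts, which helps address
overfitting. The variance of the bagged prediction (the average of the individual predictions $\phi_i [c]$) is a function of the number
of bagged estimators ($N$), the average variance of a single estimator’s prediction ($\bar{\sigma}^2$), and the average correlation among their forecasts ($\bar{\rho}$) (section 6.3.1):

$$V\left[\frac{1}{N}\sum_{i=1}^{N}\phi_i [c]\right] = \bar{\sigma}^2\left(\bar{\rho} + \frac{1-\bar{\rho}}{N}\right)$$

As $N$ grows the variance falls toward $\bar{\sigma}^2\bar{\rho}$, so bagging reduces variance only to the extent that $\bar{\rho} < 1$.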

In [12]:
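# Variance of the bagged prediction for N estimators with average pairwise
# correlation ρBar and average single-estimator standard deviation σBar.
function calculateVarianceReduction(N, ρBar; σBar=1)
    σBar^2 * (ρBar + (1 - ρBar) / N)
end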

calculateVarianceReduction (generic function with 1 method)
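
The closed-form expression implemented in `calculateVarianceReduction` can be sanity-checked by simulation. The sketch below is not part of the original notebook: the names `baggedVariance` and `simulateBaggedVariance` are hypothetical, and it assumes unit-variance Gaussian forecasts whose pairwise correlation is exactly ρ, built from one shared factor plus independent noise. Under those assumptions the sample variance of the averaged forecast should land close to the closed-form value.

```julia
using Random, Statistics

# Closed-form variance of the average of N forecasts with average
# correlation ρ and average standard deviation σ (same expression as above).
baggedVariance(N, ρ; σ=1.0) = σ^2 * (ρ + (1 - ρ) / N)

# Monte Carlo estimate under an assumed one-factor model:
# forecast_i = sqrt(ρ) * common + sqrt(1 - ρ) * idiosyncratic_i,
# which gives unit variance and pairwise correlation ρ.
function simulateBaggedVariance(N, ρ; trials=200_000, rng=MersenneTwister(42))
    common = randn(rng, trials)        # shared component, one draw per trial
    idio = randn(rng, trials, N)       # estimator-specific components
    forecasts = sqrt(ρ) .* common .+ sqrt(1 - ρ) .* idio
    bagged = vec(mean(forecasts, dims=2))   # average across the N estimators
    var(bagged)
end

N, ρ = 10, 0.3
println("closed form: ", baggedVariance(N, ρ))
println("simulated:   ", simulateBaggedVariance(N, ρ))
```

Setting ρ to 1 in `baggedVariance` returns σ² regardless of N, while ρ = 0 returns σ²/N, which are the two limiting cases of the formula.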

In [13]:
N, ρBar, σBar = collect(5:1:30), collect(0.0:0.01:1.0), 1;

grid = Iterators.product(N, ρBar) |> collect 

reductions = map((nρBarPair) -> calculateVarianceReduction(nρBarPair[1], nρBarPair[2]; σBar=σBar), grid)

toSavePlot = plot(
    heatmap(z=reductions, x=ρBar, y=N),
    Layout(
        title="Variance of the bagged prediction as a function of N and ρ̄",
        xaxis_title="average correlation among forecasts (ρ̄)",
        yaxis_title="number of bagged estimators (N)",
    )
)   

PlotlyJS.savefig(toSavePlot, "plot1.png")

"plot1.png"

# Snippet 6.1
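## Lower Bound on Bagging Classifier Accuracy

For a bagging classifier that predicts among $k$ classes by majority voting over $N$ independent classifiers, each correct with probability $p$, the bagging classifier’s accuracy exceeds the average
accuracy of the individual classifiers when $p > 1/k$ and $N$ is sufficiently large. Snippet 6.1 implements this calculation.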
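
The file `snippet_6.1.jl` is not reproduced in this notebook, so the following is only a minimal sketch of how such a calculation could look, not the contents of that file. It assumes the accuracy is the probability that more than $N/k$ of the $N$ independent classifiers are correct, that is $1 - \sum_{i=0}^{\lfloor N/k \rfloor} \binom{N}{i} p^i (1-p)^{N-i}$. The name `baggingAccuracySketch` is hypothetical, and the use of `BigInt`/`BigFloat` arithmetic is an assumption made to keep very small tail probabilities from being lost to rounding.

```julia
# Hypothetical stand-in for the function defined in snippet_6.1.jl.
# Returns P[X > N/k] for X ~ Binomial(N, p), assuming independent classifiers.
function baggingAccuracySketch(N::Integer, p::Real, k::Real)
    threshold = floor(Int, N / k)      # largest vote count that does NOT exceed N/k
    pBig = big(float(p))               # arbitrary-precision copy of p
    # Binomial CDF at the threshold, accumulated in BigFloat.
    cdf = sum(binomial(big(N), i) * pBig^i * (1 - pBig)^(N - i) for i in 0:threshold)
    1 - cdf
end

# Individual accuracy above 1/k: the ensemble is expected to beat p.
println(baggingAccuracySketch(101, 0.6, 2))

# Individual accuracy below 1/k: the ensemble is expected to do worse than p.
println(baggingAccuracySketch(100, 0.2, 2))
```

The second call illustrates why the condition $p > 1/k$ matters: when each classifier is worse than random guessing among $k$ classes, majority voting amplifies the error instead of the accuracy.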

In [14]:
include("snippet_6.1.jl")

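# Alternative parameters: n, p, k = 1000, 1.0 / 3.0, 3.0

# Here p = 0.2 is below 1/k = 0.5, so the majority vote is expected to be almost never correct.
n, p, k = 100, 0.2, 2
accuracy = calculateBaggingClassifierAccuracy(n, p, k)
print(accuracy)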

5.174329575275358496185881479638123523738939405934833816058752784178385331142243e-12

In [15]:
N, P, k = collect(1:1:101), collect(0.2:0.01:0.8), 2;

grid = Iterators.product(N, P) |> collect 

accuracies = map((np) -> calculateBaggingClassifierAccuracy(np[1], np[2], k), grid)

toSavePlot = plot(
    heatmap(z=accuracies, x=P, y=N),
    Layout(
        title="Accuracy of a bagging classifier as a function of n and p",
        xaxis_title="individual estimator’s accuracy (p)",
        yaxis_title="number of estimators (n)",
    )
)   

PlotlyJS.savefig(toSavePlot, "plot2.png")

"plot2.png"