# Chapter 7. Minimum, Maximum, and Mixture

[Link to chapter online](https://allendowney.github.io/ThinkBayes2/chap07.html)

## Warning

The content of this file may be incorrect, erroneous and/or harmful. Use it at your own risk.

## Imports

In [None]:
import DataFrames as Dfs
import Distributions as Dsts
import CairoMakie as Cmk

In [None]:
include("./pmf.jl")
import .ProbabilityMassFunction as Pmf

include("./simplestat.jl")
import .SimpleStatistics as Ss

## Reminder from previous chapters

Bayes's theorem:

$P(H|D) = \frac{P(H) * P(D|H)}{P(D)}$

Bayes's rule:

$odds(A|D) = odds(A) * \frac{P(D|A)}{P(D|B)}$
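
As a quick sanity check, the two formulas can be evaluated side by side. The sketch below uses made-up illustrative numbers (two hypotheses `A` and `B` with equal priors and likelihoods 0.75 and 0.5); none of the variable names come from the previous chapters.

```julia
# Illustrative numbers only: two hypotheses A and B with equal priors
priorA, priorB = 0.5, 0.5
likeA, likeB = 0.75, 0.5    # P(D|A), P(D|B)

# Bayes's theorem: P(A|D) = P(A) * P(D|A) / P(D)
probData = priorA * likeA + priorB * likeB    # total probability of the data
posteriorA = priorA * likeA / probData

# Bayes's rule: odds(A|D) = odds(A) * P(D|A) / P(D|B)
priorOdds = priorA / priorB
posteriorOdds = priorOdds * likeA / likeB
posteriorAFromOdds = posteriorOdds / (1 + posteriorOdds)    # odds back to probability

posteriorA, posteriorAFromOdds
```

Both routes should give the same posterior probability for `A`, since odds and probabilities carry the same information when there are only two hypotheses.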

## Cumulative Distribution Functions

A useful alternative to the probability mass function (PMF) that we used before
is the **cumulative distribution function**, or CDF.

Let's look at it with the Euro problem from chapter 4 (Estimating Proportions).

> When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times
> and tails 110. "It looks very suspicious to me", said Barry Blight, a
> statistics lecturer at the London School of Economics. "If the coin were
> unbiased, the chance of getting a result as extreme as that would be less
> than 7%".

In [None]:
euroPmf = Pmf.getPmfFromSeq(range(0, 1, 101) |> collect)
heads = 140
tosses = 250

In [None]:
Pmf.updateBinomial!(euroPmf, heads, tosses)

In [None]:
mutable struct Cdf{T}
    names::Vector{T}
    posteriors::Vector{Float64}

    # inner constructors: names and cumulative probabilities must have equal length
    Cdf(ns::Vector{Int}, posts) =
        (length(ns) != length(posts)) ?
        error("length(names) must be equal length(posteriors)") :
        new{Int}(ns, posts)
    Cdf(ns::Vector{Float64}, posts) =
        (length(ns) != length(posts)) ?
        error("length(names) must be equal length(posteriors)") :
        new{Float64}(ns, posts)
    Cdf(ns::Vector{String}, posts) =
        (length(ns) != length(posts)) ?
        error("length(names) must be equal length(posteriors)") :
        new{String}(ns, posts)
end

function Base.show(io::IO, cdf::Cdf)
    trim::Bool = length(cdf.names) > 10
    result::String = "names: $(join(trim ? cdf.names[1:10] : cdf.names, ", "))$(trim ? ", ..." : "")\n"
    result = result * "posteriors: $(join(map(x -> round(x, digits=3) |> string, trim ? cdf.posteriors[1:10] : cdf.posteriors),  ", "))$(trim ? ", ..." : "")\n"
    print(io, result)
end

In [None]:
"""
	convertPmf2Cdf(pmfDist::Pmf.Pmf{T}, usePriors::Bool=true)::Cdf{T}

	returns cdf built from pmf

	---
	args:
		pmfDist: Pmf struct
		usePriors: if true then pmfDist.priors are used to construct cdf
				otherwise pmfDist.posteriors are used to construct cdf
"""
function convertPmf2Cdf(pmfDist::Pmf.Pmf{T}, usePriors::Bool=true)::Cdf{T} where T
	if usePriors
		return Cdf(pmfDist.names, cumsum(pmfDist.priors))
	else
		return Cdf(pmfDist.names, cumsum(pmfDist.posteriors))
	end
end

function convertCdf2Pmf(cdfDist::Cdf{T})::Pmf.Pmf{T} where T
	diffs::Vector{Float64} = diff(cdfDist.posteriors)
	prepend!(diffs, cdfDist.posteriors[1])
	return Pmf.Pmf(cdfDist.names, diffs)
end

In [None]:
euroCdf = convertPmf2Cdf(euroPmf, false)

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    euroPmf.names, euroPmf.posteriors,
    color="orange",
    axis=(;
        title="Posterior distribution for the Euro problem",
        xlabel="Proportion of heads(x)",
        ylabel="Probability")
)
l2 = Cmk.lines!(fig[1, 1],
    euroCdf.names, euroCdf.posteriors,
    color="blue"
)
Cmk.axislegend(ax1,
    [l1, l2],
    ["PMF", "CDF"],
    "Probability\ndistributions\nfunctions",
    position=:lt
)
fig

The range of the CDF is always from 0 to 1, in contrast with the PMF, where the
maximum can be any probability.
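
The conversion in both directions boils down to a running sum and consecutive differences. Here is a minimal, self-contained sketch for a fair six-sided die; it uses only Base functions and does not depend on the `Pmf` module or the `Cdf` struct above.

```julia
# PMF of a fair six-sided die (illustrative, independent of the Pmf module)
outcomes = collect(1:6)
probs = fill(1 / 6, 6)

# CDF: running total of the probabilities
cumProbs = cumsum(probs)

# back to the PMF: the first CDF value, followed by consecutive differences
recovered = vcat(cumProbs[1], diff(cumProbs))

cumProbs[end] ≈ 1.0, recovered ≈ probs
```

The last CDF value is 1 (up to floating-point rounding) because the probabilities of a normalized PMF add up to 1.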

In [None]:
"""
	getNameGtEqPosterior(cdfDist::Cdf{T}, posterior::Float64)::T

	returns the first name from cdfDist.names whose cumulative probability is >= posterior
"""
function getNameGtEqPosterior(cdfDist::Cdf{T}, posterior::Float64)::T where T
	@assert 0 <= posterior <= 1
	return cdfDist.names[findfirst(x -> x >= posterior, cdfDist.posteriors)]
end

function getPosteriorGtEqName(cdfDist::Cdf{T}, name::T)::Float64 where {T<:Union{Int, Float64}}
	return cdfDist.posteriors[findfirst(x -> x == name, cdfDist.names)]
end

In [None]:
# cumulative probability that the proportion of heads is <= 0.61
getPosteriorGtEqName(euroCdf, 0.61)

In [None]:
# which proportion of heads sits at (approx.) the 96th percentile
getNameGtEqPosterior(euroCdf, 0.96)

In [None]:
function getCredibleInterval(cdfDist::Cdf{T}, prob::Float64)::Vector{T} where T
	@assert 0 <= prob <= 1
	probs::Vector{Float64} = [0.5 - prob / 2, 0.5 + prob / 2]
	return [getNameGtEqPosterior(cdfDist, p) for p in probs]
end

In [None]:
# 90% credible interval, computed from the Cdf
getCredibleInterval(euroCdf, 0.9)

In [None]:
# 90% credible interval, computed from the Pmf (for comparison)
Pmf.getCredibleInterval(euroPmf, 0.9)
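
To see what the interval lookup does without the helper modules, here is a self-contained sketch that rebuilds the Euro posterior on the same 101-point grid and reads the 5th and 95th percentiles off the cumulative sum. The names `nHeads`, `nTails`, and `quantileOnGrid` are hypothetical and exist only in this example; the binomial coefficient is left out because it cancels during normalization.

```julia
# grid of candidate proportions of heads, implicitly with a uniform prior
xs = collect(range(0, 1, 101))
nHeads, nTails = 140, 110

# binomial likelihood up to a constant factor, then normalize
unnorm = xs .^ nHeads .* (1 .- xs) .^ nTails
posterior = unnorm ./ sum(unnorm)
cumPost = cumsum(posterior)

# first grid value whose cumulative probability reaches p
quantileOnGrid(p) = xs[findfirst(>=(p), cumPost)]

# central 90% interval: 5th and 95th percentiles
interval = [quantileOnGrid(0.05), quantileOnGrid(0.95)]
```

On this grid the interval should come out at roughly 0.51 to 0.61.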

In [None]:
# transform Cdf to Pmf
convertCdf2Pmf(euroCdf)

## Best Three of Four

In *Dungeons & Dragons*, each character has six attributes: strength,
intelligence, wisdom, dexterity, constitution, and charisma.

To generate a new character, players roll four 6-sided dice for each attribute
and add up the best three. For example, if I roll for strength and get 1, 2, 3,
4 on the dice, my character’s strength would be the sum of 2, 3, and 4, which is
9.

As an exercise, let’s figure out the distribution of these attributes. Then, for
each character, we’ll figure out the distribution of their best attribute.

First, something simpler: the sum of three dice only.

In [None]:
dice6s = Pmf.getPmfFromSeq(1:6 |> collect)
dices6s = repeat([dice6s], 3)

In [None]:
pmf3dice6 = reduce(Pmf.addDist, dices6s)

In [None]:
Pmf.drawLinesPriors(pmf3dice6,
    "Distributions of attributes",
    "Outcome",
    "PMF"
    )

Now we roll four dice and add up the best three. This time we estimate the
distribution by simulation.

In [None]:
nThrows = Int(1e4)
a = rand(1:6, nThrows, 4);

In [None]:
# dims = 1 by/along columns
# dims = 2 by/along rows
# julia is column major, so I guess sorting, summing would be faster with dims=1
# I leave dims=2 for sake of consistency with the chapter in the book
sort!(a, dims=2)
t = sum(a[:, 2:4], dims=2) # dims = 2, sum by/along rows
t = t[:, 1];

In [None]:
pmfBest3of4 = Pmf.getPmfFromSeq(t);

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    pmf3dice6.names, pmf3dice6.priors,
    color="blue",
    axis=(;
        title="Distribution of attributes",
        xlabel="Outcome: sum of dots on 6 sided dice",
        ylabel="PMF",
        xticks=3:18)
)
l2 = Cmk.lines!(fig[1, 1],
    pmfBest3of4.names, pmfBest3of4.priors,
    color="orange",
    linestyle=:dash
)
Cmk.axislegend(ax1,
    [l1, l2],
    ["sum of 3 dice", "sum best 3 of 4"],
    "type of dice throw",
    position=:lt
)
fig

Choosing the best three out of four tends to yield higher values.
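
Because there are only 6^4 = 1296 equally likely rolls, the simulation can be cross-checked by exact enumeration. The sketch below is an alternative to the random sampling above, not the method used in this chapter; `waysPerTotal`, `exactProbs`, and `exactMean` are names made up for this example.

```julia
# count every one of the 6^4 equally likely rolls of four dice
waysPerTotal = zeros(Int, 18)
for roll in Iterators.product(1:6, 1:6, 1:6, 1:6)
    total = sum(roll) - minimum(roll)    # drop the lowest die
    waysPerTotal[total] += 1
end
exactProbs = waysPerTotal ./ 6^4

# mean of the exact distribution, to compare with 10.5 for three plain dice
exactMean = sum((1:18) .* exactProbs)
```

The exact mean lands above 12, compared with 10.5 for the plain sum of three dice, which agrees with the shift visible in the plot.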

## Maximum

To compute the distribution of a maximum or minimum, we can make good use of the
cumulative distribution function.

In [None]:
cdfBest3of4 = convertPmf2Cdf(pmfBest3of4)

`Cdf(x)` is the sum of probabilities for quantities less than or equal to `x`.
Equivalently, it is the probability that a random value chosen from the
distribution is less than or equal to `x`.

Now suppose I draw 6 values from this distribution.
The probability that all 6 of them are less than or equal to `x` is `Cdf(x)`
raised to the 6th power, which we can compute like this:

In [None]:
cdfBest3of4.posteriors .^ 6

If all 6 values are less than or equal to `x`, that means that their maximum is
less than or equal to `x`. So the result is the `CDF` of their maximum. We can
convert it to a `Cdf` object, like this:

In [None]:
cdfMax6 = Cdf(cdfBest3of4.names, cdfBest3of4.posteriors .^ 6)

And compute the equivalent `Pmf` like this:

In [None]:
pmfMax6 = convertCdf2Pmf(cdfMax6)

In [None]:
Pmf.drawLinesPriors(pmfMax6,
    "Distribution of attributes",
    "Outcome",
    "PMF")

Most *Dungeons & Dragons* characters have at least one attribute greater than
12; almost 10% of them have an 18.
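
The "almost 10%" figure can be checked by hand. A single attribute is 18 only when at least three of the four dice show a 6; the sketch below counts those rolls and then applies the same "all six at or below `x`" reasoning used for the maximum. The variable names are illustrative, and a simulated `pmfMax6` will differ slightly from this exact value.

```julia
# exact chance of an 18 on one attribute: at least three 6s among four dice
# 1 roll with four 6s + 4 * 5 rolls with exactly three 6s = 21 of 6^4 rolls
p18 = 21 / 6^4

# chance that at least one of six independent attributes is an 18
pAtLeastOne18 = 1 - (1 - p18)^6
```

The result is a little over 9%.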

The following figure shows the CDFs for the three distributions we have
computed.

In [None]:
cdf3dice6 = convertPmf2Cdf(pmf3dice6)
cdfBest3of4 = convertPmf2Cdf(pmfBest3of4)

fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    cdf3dice6.names, cdf3dice6.posteriors,
    color="blue", linestyle=:solid, linewidth=3,
    axis=(;title="Distribution of attributes",
        xlabel="Outcome", ylabel="CDF",
        xticks=4:2:18,
    )
)
l2 = Cmk.lines!(fig[1, 1],
    cdfBest3of4.names, cdfBest3of4.posteriors,
    color="orange", linestyle=:dash, linewidth=3
)
l3 = Cmk.lines!(fig[1, 1],
    cdfMax6.names, cdfMax6.posteriors,
    color="green", linestyle=:dot, linewidth=3
)
Cmk.axislegend(ax1,
    [l1, l2, l3],
    ["sum of 3 dice", "best 3 of 4 dice", "max of 6 attributes"],
    "type of distribution",
    position=:lt
)
fig

Let's write a function that computes the CDF of the maximum for us.

In [None]:
"""
	Computes and returns the Cdf of the maximum of n draws from cdfDist

	---
	args:
		cdfDist: Cdf to draw from
		n: Int, number of independent draws from cdfDist;
			the returned cdf(x) is the prob. that all n draws are <= x
"""
function getMaxCdfDist(cdfDist::Cdf{T}, n::Int)::Cdf{T} where T
	cdfMaxN::Vector{Float64} = cdfDist.posteriors .^ n
	return Cdf(cdfDist.names, cdfMaxN)
end

Let's see if it works.

In [None]:
x = getMaxCdfDist(cdfBest3of4, 6)

In [None]:
all(x.names .== cdfMax6.names),
all(x.posteriors .== cdfMax6.posteriors)

It does.
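
For a case small enough to verify by brute force, consider the maximum of two fair six-sided dice. This sketch is independent of the `Cdf` struct; it squares the CDF of one die and compares the resulting PMF with a direct count over all 36 ordered pairs. All names are local to the example.

```julia
dieCdf = cumsum(fill(1 / 6, 6))    # CDF of one fair die

# CDF of the maximum of two dice, then back to a PMF
maxCdf = dieCdf .^ 2
maxPmf = vcat(maxCdf[1], diff(maxCdf))

# brute-force check over all 36 ordered pairs
bruteCounts = zeros(Int, 6)
for d1 in 1:6, d2 in 1:6
    bruteCounts[max(d1, d2)] += 1
end

maxPmf ≈ bruteCounts ./ 36
```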

## Minimum

To compute the distribution of the minimum, we'll use the **complementary CDF**,
which we can compute like this:

In [None]:
probGt = 1 .- cdfBest3of4.posteriors

As the variable name suggests, the complementary CDF is the probability that a
value from the distribution is greater than `x`. If we draw 6 values from the
distribution, the probability that all 6 exceed `x` is:

In [None]:
probGt6 = probGt .^ 6

If all 6 exceed `x` that means their minimum exceeds `x`, so `probGt6` is the
complementary CDF of the minimum. And that means we can compute the CDF of the
minimum like this:

In [None]:
probLe6 = 1 .- probGt6

In [None]:
cdfMin6 = Cdf(cdfBest3of4.names |> copy, probLe6)

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(
    fig[1, 1],
    cdfMax6.names, cdfMax6.posteriors,
    color="green", linestyle=:dot,
    axis=(;title="Minimum and maximum of six attributes",
        xlabel="Outcome", ylabel="CDF",
        xticks=4:2:18
    )
)
l2 = Cmk.lines!(
    fig[1, 1],
    cdfMin6.names, cdfMin6.posteriors,
    color="purple", linestyle=:solid,
)
Cmk.axislegend(ax1,
    [l1, l2],
    ["maximum of 6", "minimum of 6"],
    "type of distribution",
    position=:lt
)
fig

Now let's write a function that will speed up these calculations for us.

In [None]:
"""
Computes and returns the Cdf of the minimum of n draws from cdfDist

---
args:
    cdfDist: Cdf to draw from
    n: Int, number of independent draws from cdfDist;
            the returned cdf(x) is the prob. that the minimum of n draws is <= x,
            i.e. 1 - prob. that all n draws are > x
"""
function getMinCdfDist(cdfDist::Cdf{T}, n::Int)::Cdf{T} where T
    # prob that a val from cdfDist is greater than x (a given cdfDist.name)
    probGt::Vector{Float64} = 1 .- cdfDist.posteriors
    # prob that all n vals drawn from dist exceed x
    # (their min exceeds x)
    probGtN::Vector{Float64} = probGt .^ n
    probLEqN::Vector{Float64} = 1 .- probGtN
    return Cdf(cdfDist.names |> copy, probLEqN)
end

In [None]:
x = getMinCdfDist(cdfBest3of4, 6)

In [None]:
all(x.names .== cdfMin6.names),
all(x.posteriors .== cdfMin6.posteriors)
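
The same steps can be checked on a small case: the minimum of two fair six-sided dice, where the PMF has the known closed form (13 - 2k)/36 for k = 1, ..., 6. The sketch below is self-contained and does not use the `Cdf` struct; the names are illustrative.

```julia
dieCdf = cumsum(fill(1 / 6, 6))    # CDF of one fair die
survival = 1 .- dieCdf             # complementary CDF: P(X > x)

# both dice exceed x  <=>  their minimum exceeds x
minCdf = 1 .- survival .^ 2
minPmf = vcat(minCdf[1], diff(minCdf))

# closed form for the minimum of two dice
expected = [(13 - 2k) / 36 for k in 1:6]

minPmf ≈ expected
```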

## General Mixtures

Suppose three more monsters join the combat, each of them with a battle axe that
causes one 8-sided die of damage. Still, only one monster attacks per round,
chosen at random, so the damage they inflict is a mixture of:

- One 4-sided die,
- Two 6-sided dice, and
- Three 8-sided dice.

In [None]:
# Pmf represents a randomly chosen monster
hypos = [4, 6, 8]
counts = [1, 2, 3]
# Pmf constructor, normalizes priors by default
pmfDice = Pmf.Pmf(hypos, counts)
pmfDice

In [None]:
# sequence of Pmf objects to represent the dice
dice = [Pmf.getPmfFromSeq(1:nSides |> collect) for nSides in hypos]

In [None]:
df = Dfs.DataFrame(nDots=1:8)

In [None]:
for d in dice
    df[:, string("d", length(d.names))] = Pmf.getPriorsByNames(d, df[:, :nDots])
end

In [None]:
df

In [None]:
# the distribution of the mixture is a weighted average of the dice;
# the weights are the priors in pmfDice, so scale each die's column by its weight
for rowNum in 1:size(df, 1)
    df[rowNum, 2:end] = Vector(df[rowNum, 2:end]) .* pmfDice.priors
end;

In [None]:
df

In [None]:
# sum by row, because only one of the monsters (dice) strikes per round,
# so we throw d4 or d6 or d8
df.posteriors = df[:, 2:end] |> Array |> a -> sum(a, dims=2) |> vec;

In [None]:
df

In [None]:
fig = Cmk.Figure()
Cmk.barplot(fig[1, 1],
            df.nDots, df.posteriors,
            color="lightblue",
            axis=(;title="Distribution of damage with three different weapons",
                  xlabel="Outcome", ylabel="PMF")
)
fig

Let's put that into a function.

In [None]:
function padVect(v::Vector{Float64}, finalLen::Int, padVal::Float64=0.0)::Vector{Float64}
	return [get(v, i, padVal) for i in 1:finalLen]
end

In [None]:
"""getMixture(pmfDist::Pmf.Pmf{Int}, pmfSeq::Vector{Pmf.Pmf{Int}})::Pmf.Pmf{Int}

	Make a mixture of distributions.

	---
	args:

		pmfDist: probs (priors) of choosing each dist in pmfSeq, in the same order
		pmfSeq: dists to mix (their priors are used); names of a shorter dist should be
			a prefix of the longest dist's names, missing outcomes are padded with 0
	"""
function getMixture(pmfDist::Pmf.Pmf{Int}, pmfSeq::Vector{Pmf.Pmf{Int}})::Pmf.Pmf{Int}
	maxLen::Int = maximum(length(p.names) for p in pmfSeq)
	names::Vector{Int} = [p.names for p in pmfSeq if length(p.names) == maxLen][1]
	pmfsNamesAndPriors::Dict{Int, Vector{Float64}} = Dict(
		pmfDist.names[i] => padVect(s.priors, maxLen) for (i, s) in enumerate(pmfSeq))
	pmfsNamesAndPosteriors::Dict{Int, Vector{Float64}} = Dict(
		k => v .* Pmf.getPriorByName(pmfDist, k) for (k, v) in pmfsNamesAndPriors
	)
	df::Dfs.DataFrame = Dfs.DataFrame(
		Dict(string(k) => v for (k, v) in pmfsNamesAndPosteriors))
	mixProbs::Vector{Float64} = (df |> Matrix |> x -> sum(x, dims=2))[:, 1]
	return Pmf.Pmf(names, mixProbs)
end

In [None]:
mix = getMixture(pmfDice, dice)
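
The mixture is just a weighted sum of the individual PMFs, which can also be written as a matrix-vector product. The following sketch reproduces the calculation for one d4, two d6, and three d8 without `DataFrames` or the `Pmf` module; `sides`, `weights`, `diePmfs`, and `mixProbs` are names chosen for this example only.

```julia
sides = [4, 6, 8]
weights = [1, 2, 3] ./ 6    # one d4, two d6, three d8
nOutcomes = maximum(sides)

# 8x3 matrix; each column is the PMF of one die, zero beyond its last face
diePmfs = [k <= s ? 1 / s : 0.0 for k in 1:nOutcomes, s in sides]

# mixture: weighted sum of the columns
mixProbs = diePmfs * weights
```

Outcomes 1 to 4 can come from any die, 5 and 6 only from a d6 or d8, and 7 and 8 only from a d8, so the mixture probabilities drop in three steps.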

## Summary

A `Pmf` and the corresponding `Cdf` are equivalent in the sense that they contain
the same information, so you can convert from one to the other. The primary
difference between them is performance: some operations are faster and easier
with a `Pmf`; others are faster with a `Cdf`.

In this chapter we used `Cdf` objects to compute distributions of maximums and
minimums, and `Pmf` objects to compute a mixture of distributions.

## Exercises

### Exercise 1

When you generate a D&D character, instead of rolling dice, you can use the
“standard array” of attributes, which is 15, 14, 13, 12, 10, and 8. Do you think
you are better off using the standard array or (literally) rolling the dice?

Compare the distribution of the values in the standard array to the distribution
we computed for the best three out of four:

- Which distribution has higher mean? Use the `mean` method.
- Which distribution has higher standard deviation? Use the `std` method.
- The lowest value in the standard array is 8. For each attribute, what is the
probability of getting a value less than 8? If you roll the dice six times,
what’s the probability that at least one of your attributes is less than 8?
- The highest value in the standard array is 15. For each attribute, what is the
probability of getting a value greater than 15? If you roll the dice six times,
what’s the probability that at least one of your attributes is greater than 15?

To get you started, here’s a `Cdf` that represents the distribution of attributes in the standard array:

In [None]:
standard = [15,14,13,12,10,8]
cdfStandard = Pmf.getPmfFromSeq(standard) |> convertPmf2Cdf

We can compare it to the distribution of attributes you get by rolling four dice
and adding up the best three.

In [None]:
fig = Cmk.Figure()
ax1, sc1 = Cmk.scatter(fig[1, 1], 
    cdfBest3of4.names, cdfBest3of4.posteriors, color=:red, markersize=20,
    axis=(;title="Distribution of attributes",
        xlabel="Outcome", ylabel="CDF", xticks=3:18
    )
)
l1 = Cmk.lines!(fig[1, 1], 
    cdfBest3of4.names, cdfBest3of4.posteriors, color=:red, linewidth=3
)
sc2 = Cmk.scatter!(fig[1, 1], cdfStandard.names, cdfStandard.posteriors,
    color=:navy, marker=:hline, markersize=30,  
)
Cmk.axislegend(ax1,
    [[sc1, l1], sc2],
    ["Best 3 of 4", "Standard Set"],
    "Probability\ndistribution\nfunction",
    position=:lt
)
fig