# Mixture Modeling

Finding separate distributions in combined data. The algorithm used here is called Expectation Maximization (EM).

##### First, create some data in a single vector that in reality comes from two different distributions. The parameters true_μ and true_σ² of these distributions are unknown in practice; we only have the data x.

In [1]:
true_μ = [-0.6 0.7]
true_σ² = [0.5^2 0.2^2]
x=[true_μ[1]+randn(1000,1)*sqrt(true_σ²[1]); true_μ[2]+randn(1000,1)*sqrt(true_σ²[2])];

##### Next, initialize the parameters of the model distributions. These parameters will be adjusted to fit the true distribution parameters. Since we are modeling a mixture of the two distributions, we also need a relative size (a for area) for each distribution.

In [2]:
μ = [-0.01 0.01]
σ² = [0.2^2 0.2^2]
a = [0.5 0.5];
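
Written as a formula, the model these parameters describe is a weighted sum of two Gaussian densities. The notation below is introduced here for illustration: $\mathcal{N}(x \mid \mu_c, \sigma_c^2)$ stands for the Gaussian density with mean $\mu_c$ and variance $\sigma_c^2$, and the index $c$ runs over the two distributions.

$$
p(x) = \sum_{c=1}^{2} a_c \, \mathcal{N}(x \mid \mu_c, \sigma_c^2), \qquad \sum_{c=1}^{2} a_c = 1
$$

The initial values `a = [0.5 0.5]` satisfy the constraint on the relative sizes.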

##### These are the functions we need for the EM algorithm: the probability density function, a function that calculates how likely each data point is to belong to each distribution under the current model, a function for estimating better parameters, and lastly the EM algorithm itself.

In [3]:
function pdf(x, μ, σ²)
    # density of a univariate Gaussian with mean μ and variance σ², evaluated at x
    return (σ²*2*π)^(-0.5) * exp(-((x-μ)^2)/(2*σ²))
end

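function likelihood(x, a, μ, σ²)
    # weighted density of every data point under each distribution c
    p = [a[c]*[pdf(x[i], μ[c], σ²[c]) for i=1:length(x)] for c=1:length(a)]
    r = hcat(p...) # one row per data point, one column per distribution
    r = r./sum(r,2) # normalize each row so that it sums to 1
    return r
end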

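function dist_estimate(x, r)
    ac = mean(r,1) # relative size: average membership per distribution
    μc = sum(r.*x,1) ./ sum(r,1) # membership-weighted mean
    σ²c = sum(r.*(x.-μc).*(x.-μc),1) ./ sum(r,1) # membership-weighted variance
    return ac, μc, σ²c
end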

function EM(a, μ, σ², x, Nsteps)
    for n=1:Nsteps
        r = likelihood(x, a, μ, σ²) # (E)xpectation
        a, μ, σ² = dist_estimate(x, r) # (M)aximize
    end
    return a, μ, σ²
end

EM (generic function with 1 method)
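
For reference, the two steps can be written as equations. This is a sketch of what `likelihood` and `dist_estimate` compute as read from the code above. The symbols are notation introduced here: $r_{ic}$ is the entry of `r` for data point $i$ and distribution $c$, $N$ is the number of data points, and $\mathcal{N}$ is the Gaussian density implemented by `pdf`.

$$
\text{E-step:} \qquad r_{ic} = \frac{a_c \, \mathcal{N}(x_i \mid \mu_c, \sigma_c^2)}{\sum_{k} a_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
$$

$$
\text{M-step:} \qquad a_c = \frac{1}{N} \sum_{i=1}^{N} r_{ic}, \qquad
\mu_c = \frac{\sum_{i} r_{ic} \, x_i}{\sum_{i} r_{ic}}, \qquad
\sigma_c^2 = \frac{\sum_{i} r_{ic} \, (x_i - \mu_c)^2}{\sum_{i} r_{ic}}
$$

Note that the variance update uses the newly computed $\mu_c$, just as `dist_estimate` uses `μc` when it computes `σ²c`.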

##### Now we can adjust the poor initial parameters to fit the true (unknown) distribution parameters of the data x.

In [4]:
a, μ, σ² = EM(a, μ, σ², x, 500)

println("The algorithm estimated these parameters")
println("μ = ",round.(μ,3))
println("σ² = ",round.(σ²,3))

println("\nThe true (unknown to the algorithm) parameters were")
println("true_μ = ",round.(true_μ,3))
println("true_σ² = ",round.(true_σ²,3))

The algorithm estimated these parameters
μ = [-0.599 0.699]
σ² = [0.278 0.039]

The true (unknown to the algorithm) parameters were
true_μ = [-0.6 0.7]
true_σ² = [0.25 0.04]
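
One way to check whether the parameters have improved is to compare the log-likelihood of the data under the mixture before and after fitting, since an exact EM step never decreases it. The following is a minimal sketch, not part of the original notebook: `gauss`, `mixture_loglik` and `x_demo` are hypothetical names, and the code uses plain loops so that it does not depend on the array reduction calls used above.

```julia
# Gaussian density, same formula as `pdf` (renamed so the sketch is self-contained)
gauss(x, μ, σ²) = exp(-(x - μ)^2 / (2 * σ²)) / sqrt(2 * π * σ²)

# Log-likelihood of the data x under a mixture with
# relative sizes a, means μ and variances σ²
function mixture_loglik(x, a, μ, σ²)
    total = 0.0
    for xi in x
        # density of one data point: weighted sum over the distributions
        p = 0.0
        for c in 1:length(a)
            p += a[c] * gauss(xi, μ[c], σ²[c])
        end
        total += log(p)
    end
    return total
end

# small made-up data set for illustration (not the notebook's x)
x_demo = [-1.0, -0.5, 0.6, 0.8]

# initial guess versus the true parameters from the first cells
ll_bad  = mixture_loglik(x_demo, [0.5 0.5], [-0.01 0.01], [0.2^2 0.2^2])
ll_good = mixture_loglik(x_demo, [0.5 0.5], [-0.6 0.7], [0.5^2 0.2^2])
println(ll_good > ll_bad)
```

Calling `mixture_loglik(x, a, μ, σ²)` inside the loop of `EM` and stopping once the value no longer changes noticeably would be one possible alternative to a fixed number of steps.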


##### It's straightforward to estimate how likely a data point is to come from either distribution

In [5]:
function print_likelihood(x, a, μ, σ²)
    r = likelihood(x, a, μ, σ²)
    println("likelihood that the value ",x," belongs to")
    println("cluster 1: ",round.(r[1]*100,1),"%")
    println("cluster 2: ",round.(r[2]*100,1),"%\n")
end

print_likelihood(0.5, a, μ, σ²)
print_likelihood(0.2, a, μ, σ²)
print_likelihood(-1.0, a, μ, σ²)

likelihood that the value 0.5 belongs to
cluster 1: 6.9%
cluster 2: 93.1%

likelihood that the value 0.2 belongs to
cluster 1: 75.3%
cluster 2: 24.7%

likelihood that the value -1.0 belongs to
cluster 1: 100.0%
cluster 2: 0.0%



## What does EM do? (visualization)

In [6]:
using Plots; gr() #plotting backend

#reset parameters for visualization
μ = [-0.01 0.01]
σ² = [0.2^2 0.2^2]
a = [0.5 0.5]

function visualize_EM(a, μ, σ², x, Nsteps)
    @gif for n=1:Nsteps
        r = likelihood(x, a, μ, σ²) # (E)xpectation
        
        # adjust the parameters slowly for purpose of visualization
        ac, μc, σ²c = dist_estimate(x, r)
        a = 0.95*a + 0.05*ac
        μ = 0.95*μ + 0.05*μc
        σ² = 0.95*σ² + 0.05*σ²c
        
        # visualize
        plotx=collect(-2.5:0.01:2.5)
        dist1 = a[1]*[pdf(plotx[i], μ[1], σ²[1]) for i=1:length(plotx)];
        dist2 = a[2]*[pdf(plotx[i], μ[2], σ²[2]) for i=1:length(plotx)];
        plot(plotx, [dist1 dist2], ylim=(0,1), xlim=(-3,3), title="mixture modeling of 2 gaussians", leg=false)
    end every 1
    return a, μ, σ²
end

a, μ, σ² = visualize_EM(a, μ, σ², x, 200)

<img src="tmp.gif">
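
The visualization replaces the full M-step update with a damped one so that the curves move gradually. Written out for any of the parameters, with $\theta$ standing for a, μ or σ² and $\hat{\theta}$ for the estimate returned by `dist_estimate` (both symbols are notation introduced here, not names from the code):

$$
\theta^{(n+1)} = 0.95 \, \theta^{(n)} + 0.05 \, \hat{\theta}^{(n)}
$$

Unrolling this recursion, the initial guess contributes with weight $0.95^{n}$ after $n$ steps, which is about $3.5 \times 10^{-5}$ for the 200 steps used here, so the starting values have essentially no direct influence on the final frame.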