# Mixture of Gaussians

This tutorial builds a mixture of multivariate Gaussians and fits it to synthetic data. It also introduces the switch block, which allows mixtures with a variable number of components to be constructed.

This notebook was updated in Oct 2018 (after the Infer.NET open source launch) to use:
* the Paket dependency manager to get the package(s) from nuget
* Paket to get the `FSharpWrapper` from github
* the revised namespace `Microsoft.ML.Probabilistic` (was `MicrosoftResearch.Infer`)

[`FSharpWrapper` tutorial examples](https://dotnet.github.io/infer/userguide/FSharp%20Wrapper.html) include:
* TwoCoins
* TruncatedGaussian
* GaussianRanges
* ClinicalTrial
* BayesPoint
* MixtureGaussians

### Source & links

http://infernet.azurewebsites.net/default.aspx    
https://dotnet.github.io/infer/userguide/Mixture%20of%20Gaussians.html

[Imperative Statement Blocks in F#](https://dotnet.github.io/infer/userguide/Imperative%20Statement%20Blocks%20in%20FSharp.html)     
[Branching on variables to create mixture models](https://dotnet.github.io/infer/userguide/Branching%20on%20variables%20to%20create%20mixture%20models.html)  

## Load & open packages

https://fsprojects.github.io/Paket/

Packages in F# can be installed from `nuget.org` using the Paket dependency manager.
To install packages from NuGet, first load the Paket package manager:

In [46]:
#load "Paket.fsx"

You can then have Paket install the packages:

In [103]:
Paket.Package
  [   // "Infer.NET" (pre Oct 2018 namespace)
      "Microsoft.ML.Probabilistic"
      "Microsoft.ML.Probabilistic.Compiler"
      //"Microsoft.ML.Probabilistic.FSharp"  // Not on nuget, yet?
      "NETStandard.Library" // Necessary??
  ]

In [104]:
#load "Paket.Generated.Refs.fsx"  // Do we need this?

### `FSharpWrapper`

See  https://github.com/dotnet/infer/blob/master/src/FSharpWrapper/FSharpWrapper.fs   
https://github.com/fsprojects/IfSharp/issues/146 (Thanks for help, Colin Gravill!)   
https://fsprojects.github.io/Paket/github-dependencies.html

In [49]:
Paket.GitHub ["dotnet/infer src/FSharpWrapper/FSharpWrapper.fs"]

In [13]:
// #load "/home/nbuser/IfSharp/bin/paket-files/github/dotnet/infer/src/FSharpWrapper/FSharpWrapper.fs" // Not yet!?

### Check `paket` files

Project dependencies file:

In [None]:
open System.IO
File.ReadAllText(@"../../IfSharp/bin/paket.dependencies")

The `paket install` command will analyze your dependencies and automatically generate a `paket.lock` file:

In [None]:
File.ReadAllText(@"../../IfSharp/bin/paket.lock")

### Assembly search paths and references

In [131]:
#I "/home/nbuser/IfSharp/bin/packages/Microsoft.ML.Probabilistic/lib/netstandard2.0"
#I "/home/nbuser/IfSharp/bin/packages/Microsoft.ML.Probabilistic.Compiler/lib/netstandard2.0"
#I "/home/nbuser/IfSharp/bin/packages/NETStandard.Library/build/netstandard2.0/ref"

In [133]:
#r "Microsoft.ML.Probabilistic"
#r "Microsoft.ML.Probabilistic.Compiler"
#r "netstandard"

In [126]:
//Directory.GetDirectories ("/home/nbuser/IfSharp/bin/packages/")

In [127]:
//Directory.GetFiles ("/home/nbuser/IfSharp/bin/packages/NETStandard.Library/lib/netstandard1.0")

### Import declarations for modules or namespaces 

In [52]:
open System

In [53]:
open Microsoft.ML.Probabilistic  
open Microsoft.ML.Probabilistic.Algorithms 
open Microsoft.ML.Probabilistic.Models  
open Microsoft.ML.Probabilistic.Distributions  
open Microsoft.ML.Probabilistic.Factors  
open Microsoft.ML.Probabilistic.Math 

In [54]:
#load "/home/nbuser/IfSharp/bin/paket-files/github/dotnet/infer/src/FSharpWrapper/FSharpWrapper.fs"

In [55]:
open Microsoft.ML.Probabilistic.FSharp

## Infer.NET: F# script for a mixture of 2 multivariate Gaussians  

### Data for mixture example  

In [56]:
let GenerateData nData =  // int -> Vector[]
    // True component means and precision (inverse covariance) matrices
    let trueM1,trueP1 = Vector.FromArray[|2.0;3.0|],PositiveDefiniteMatrix(Array2D.create2D [ [3.0;0.2];[0.2;2.0] ])
    let trueM2,trueP2 = Vector.FromArray[|7.0;5.0|],PositiveDefiniteMatrix(Array2D.create2D [ [2.0;0.4];[0.4;4.0] ])
    let trueVG1 = VectorGaussian.FromMeanAndPrecision(trueM1,trueP1)
    let trueVG2 = VectorGaussian.FromMeanAndPrecision(trueM2,trueP2)
    // Probability that a point is drawn from the first component
    let truePi = 0.6
    let trueB = new Bernoulli(truePi)
    Rand.Restart(12347) // Restart the Infer.NET random number generator so the data are reproducible
    Array.init nData (fun j -> if trueB.Sample() then trueVG1.Sample() else trueVG2.Sample())

Test the generator with three samples:

In [57]:
GenerateData 3

[|seq [2.330928543; 2.275960704]; seq [8.154464106; 5.326416609];
  seq [2.987223143; 3.038850355]|]
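
Written out, `GenerateData` draws each point from a two-component mixture density. The symbols below are shorthand introduced here for `truePi`, `trueM1`/`trueM2` and `trueP1`/`trueP2`. Note that the matrices act as precisions (inverse covariances), as `FromMeanAndPrecision` indicates:

$$
p(\mathbf{x}) = \pi\,\mathcal{N}\!\left(\mathbf{x}\mid\boldsymbol{\mu}_1,\boldsymbol{\Lambda}_1^{-1}\right) + (1-\pi)\,\mathcal{N}\!\left(\mathbf{x}\mid\boldsymbol{\mu}_2,\boldsymbol{\Lambda}_2^{-1}\right), \qquad \pi = 0.6
$$

$$
\boldsymbol{\mu}_1 = \begin{pmatrix}2\\3\end{pmatrix},\quad
\boldsymbol{\Lambda}_1 = \begin{pmatrix}3 & 0.2\\0.2 & 2\end{pmatrix},\quad
\boldsymbol{\mu}_2 = \begin{pmatrix}7\\5\end{pmatrix},\quad
\boldsymbol{\Lambda}_2 = \begin{pmatrix}2 & 0.4\\0.4 & 4\end{pmatrix}
$$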

### The model 

Define a range for the number of mixture components  

In [58]:
let k = Range(2)

Mixture component means  

In [112]:
let means = Variable.ArrayInit   
                k (fun k -> Variable.VectorGaussianFromMeanAndPrecision(Vector.Zero(2), PositiveDefiniteMatrix.IdentityScaledBy(2,0.01)))  

Mixture component precisions  

In [60]:
let precs = Variable.ArrayInit   
                k (fun k -> Variable.WishartFromShapeAndScale(  
                                100.0,PositiveDefiniteMatrix.IdentityScaledBy(2,0.01)))

Mixture weights  

In [61]:
let weights = Variable.Dirichlet(k,[|1.0; 1.0|])  

Number of data points

In [62]:
let n = new Range(300)  

Create a latent indicator variable `z.[i]` for each data point, recording which mixture component generated it

In [63]:
let z = Variable.ArrayInit n (fun i -> Variable.Discrete(weights))

Initialise messages randomly so as to break symmetry  

In [64]:
let zinit = Array.init n.SizeAsInt (fun _ -> Discrete.PointMass(Rand.Int(k.SizeAsInt), k.SizeAsInt))  
let _ = z.InitialiseTo(Distribution.Array(zinit))
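
To see why the random initialisation matters, here is a minimal pure F# sketch that does not use Infer.NET. It applies one EM-style mean update (a stand-in for illustration, not the variational message passing used in this notebook) to a 1-D, two-component mixture with equal weights and unit variances. All names in it (`gaussPdf`, `responsibility`, `updateMeans`, `xs`) are hypothetical.

```fsharp
// Unit-variance Gaussian density at x
let gaussPdf (mean: float) (x: float) =
    let d = x - mean
    exp (-(d * d) / 2.0) / sqrt (2.0 * System.Math.PI)

// Responsibility of component 0 for point x, assuming equal mixture weights
let responsibility (m0: float) (m1: float) (x: float) =
    let p0 = gaussPdf m0 x
    let p1 = gaussPdf m1 x
    p0 / (p0 + p1)

// One EM-style update: each new mean is the responsibility-weighted average of the data
let updateMeans (m0: float, m1: float) (xs: float[]) =
    let r0 = xs |> Array.map (responsibility m0 m1)
    let r1 = r0 |> Array.map (fun r -> 1.0 - r)
    let weightedMean (r: float[]) =
        Array.sum (Array.map2 (*) r xs) / Array.sum r
    weightedMean r0, weightedMean r1

let xs = [| 1.0; 2.0; 3.0; 7.0; 8.0; 9.0 |]

// Identical starting means: every responsibility is 0.5, so both means receive the same update
let symmetric = updateMeans (0.0, 0.0) xs

// Slightly different starting means: the responsibilities differ, so the means can separate
let broken = updateMeans (0.0, 1.0) xs

printfn "symmetric start: %A" symmetric
printfn "asymmetric start: %A" broken
```

With identical starting points the two components stay indistinguishable however many updates are applied, which is the symmetry that the random `zinit` is meant to break.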

#### The mixture of Gaussians model using `Variable.SwitchExpr`

In [65]:
let data = Variable.ArrayInit n (fun i ->  
                   Variable.SwitchExpr (z.[i]) (fun zi ->  
                        Variable.VectorGaussianFromMeanAndPrecision(means.[zi], precs.[zi])))  
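
Putting the preceding cells together, the model can be summarised as follows. This is a restatement of the code in standard notation; the symbols $\boldsymbol{\pi}$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Lambda}_k$, $z_i$ and $\mathbf{x}_i$ are shorthand introduced here for `weights`, `means`, `precs`, `z` and `data`, and the Wishart is written with (shape, scale) arguments as in `WishartFromShapeAndScale`:

$$
\begin{aligned}
\boldsymbol{\pi} &\sim \mathrm{Dirichlet}(1, 1) \\
\boldsymbol{\mu}_k &\sim \mathcal{N}\!\left(\mathbf{0},\ (0.01\,\mathbf{I})^{-1}\right), & k &\in \{0, 1\} \\
\boldsymbol{\Lambda}_k &\sim \mathcal{W}\!\left(100,\ 0.01\,\mathbf{I}\right), & k &\in \{0, 1\} \\
z_i \mid \boldsymbol{\pi} &\sim \mathrm{Discrete}(\boldsymbol{\pi}), & i &= 1, \dots, 300 \\
\mathbf{x}_i \mid z_i &\sim \mathcal{N}\!\left(\boldsymbol{\mu}_{z_i},\ \boldsymbol{\Lambda}_{z_i}^{-1}\right)
\end{aligned}
$$

The prior precision of $0.01\,\mathbf{I}$ on each mean corresponds to a variance of 100 per dimension, so the prior on the means is broad.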

Binding the data  

In [66]:
data.ObservedValue <- GenerateData(n.SizeAsInt)  

The inference  

In [134]:
let ie = InferenceEngine(VariationalMessagePassing())  

In [135]:
let wPost = ie.Infer<Dirichlet>(weights)  

Compiling model...done.
Iterating: 
.........|.........|.........|.........|.........| 50


In [136]:
printfn "Estimated means for pi = (%A)" (wPost.GetMean())  
printfn "Distribution over pi = %A" wPost  

Estimated means for pi = (seq [0.4039729302; 0.5960270698])
Distribution over pi = Dirichlet(122 180)
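
The relation between the two printed lines can be checked by hand: the mean of a Dirichlet is each pseudo-count divided by their sum. The sketch below is pure F# and uses the rounded pseudo-counts from the printout, so it reproduces the printed means only approximately; `dirichletMean` is a hypothetical helper, not an Infer.NET function.

```fsharp
// Pseudo-counts copied from the rounded printout "Dirichlet(122 180)"
let pseudoCounts = [| 122.0; 180.0 |]

// Mean of a Dirichlet: each pseudo-count divided by the total
let dirichletMean (alpha: float[]) =
    let total = Array.sum alpha
    alpha |> Array.map (fun a -> a / total)

let piMean = dirichletMean pseudoCounts
printfn "%A" piMean
```

The pseudo-counts sum to about 302, which matches the 300 observed points plus the 2 prior pseudo-counts from `Variable.Dirichlet(k,[|1.0; 1.0|])`. The component with weight near 0.6 appears second here although `truePi = 0.6` belongs to the first generating Gaussian; mixture labels are arbitrary, so the order depends on the random initialisation.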


In [137]:
let meansPost = Inference.InferVectorGaussianArray(ie, means)  
let precsPost = Inference.InferWishartArray(ie,precs)  

Compiling model...done.


In [138]:
printfn "Distribution over vector Gaussian means = %A" meansPost  

Distribution over vector Gaussian means = [|VectorGaussian(0 0, 100 0  )
                    0   100;
  VectorGaussian(0 0, 100 0  )
                    0   100|]

Note that this printed posterior is identical to the prior on `means` (zero mean, variance 100 = 1/0.01 on each diagonal entry). The cell defining `means` carries execution count 112, later than the cells that build and observe `data` (65 and 66), so `means` was probably re-created after the model was built and is no longer connected to the data. Re-running the cells in order from the definition of `k` should give posteriors over the means that reflect the data.


In [139]:
printfn "Distribution over vector Gaussian precisions = %A" precsPost  
()

Distribution over vector Gaussian precisions = [|Wishart(160.5, 0.007988  9.308e-05)[mean=1.282   0.01494]
               9.308e-05 0.008788        0.01494 1.41   ;
  Wishart(189.5, 0.007813 0.000175)[mean=1.481   0.03316]
               0.000175 0.006848       0.03316 1.298  |]
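
The bracketed `mean=` values in this printout are consistent with the Wishart mean being the shape multiplied by the scale matrix. The pure F# sketch below checks this for the first component using the rounded numbers from the output; `wishartMean` and `effectiveCount` are hypothetical helpers, and the assumption that each assigned data point adds 1/2 to the shape is the usual conjugate update, not something stated in the notebook.

```fsharp
// Shape and scale entries copied from the printed posterior of the first component
let shape0 = 160.5
let scale0 = array2D [ [ 0.007988; 9.308e-05 ]
                       [ 9.308e-05; 0.008788 ] ]

// Assumed relation: E[precision] = shape * scale
let wishartMean (shape: float) (scale: float[,]) =
    Array2D.map (fun s -> shape * s) scale

let meanPrec0 = wishartMean shape0 scale0
printfn "%A" meanPrec0

// Assumed conjugate update: posterior shape = prior shape (100) + (number of points) / 2
let effectiveCount (posteriorShape: float) = 2.0 * (posteriorShape - 100.0)
printfn "%f %f" (effectiveCount 160.5) (effectiveCount 189.5)
```

Under that assumption the two shapes correspond to 121 and 179 points, which add up to the 300 observations and agree with the Dirichlet pseudo-counts once the prior counts of 1 are subtracted. The prior mean precision is 100 × 0.01 = 1 on the diagonal.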


## Misc

### Switch block example

[Imperative Statement Blocks in F#](https://dotnet.github.io/infer/userguide/Imperative%20Statement%20Blocks%20in%20FSharp.html)     
[Branching on variables to create mixture models](https://dotnet.github.io/infer/userguide/Branching%20on%20variables%20to%20create%20mixture%20models.html)  (C#)

In [74]:
let mixtureSize = 2  

[Working with arrays and ranges](https://dotnet.github.io/infer/userguide/Arrays%20and%20ranges.html) (C#)

In [75]:
let k2 = Range(mixtureSize)  

[Creating variables](https://dotnet.github.io/infer/userguide/Creating%20variables.html) (C#)

In [98]:
let c:Variable<int> = Variable.Discrete(k2, [|0.5;0.5|]).Named("c")  

[Variable Arrays in F#](https://dotnet.github.io/infer/userguide/Variable%20Arrays%20in%20FSharp.html)

In [86]:
let means2:VariableArray<float> = Variable.Observed( [|1.0;2.0|], k2).Named("means2")

In [96]:
let x:Variable<float> = Variable.New<float>().Named("x")  

[Branching on variables to create mixture models](https://dotnet.github.io/infer/userguide/Branching%20on%20variables%20to%20create%20mixture%20models.html) (C#)

"The basic idea is to define a random selector variable and then branch on the value of that variable. If the selector is 0, then x has distribution Gaussian(1, 1); if the selector is 1, then x has distribution Gaussian(2,1). This example makes use of the constructs `Variable.New` and `Variable.Case`. The static method `Variable.New` is similar to `Variable.Array`, but for scalars. It creates a new random variable whose definition will be provided later using **`SetTo`**. (You may recall `SetTo` from the page on working with arrays and ranges.) `Variable.Case` is more sophisticated, because it changes the state of the modelling API. All random variables and constraints defined within the lifetime of the `Variable.Case` object are tagged to only exist conditionally on the two arguments being equal.  

You can put any modelling code you like inside a Case block, including other Case blocks, allowing you to define arbitrarily complex mixtures."

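The C# example branches on one value at a time with `using (Variable.Case(c,0))`, whereas the F# wrapper provides `Variable.SwitchBlock`, which defines the model for every value of `c` at once (the counterpart of C# `using (Variable.Switch(c))`).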

[Imperative Statement Blocks in F#](https://dotnet.github.io/infer/userguide/Imperative%20Statement%20Blocks%20in%20FSharp.html)     

In [100]:
Variable.SwitchBlock c (fun _ -> 
    let _ = x.SetTo(Variable.GaussianFromMeanAndVariance(means2.[c], 1.0))  
    ()  
)
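
As a sanity check on what this block defines, the mean and variance of `x` can be worked out by hand. The sketch below is pure F# and does not call Infer.NET; it assumes the equal weights passed to `Variable.Discrete`, the observed values of `means2`, and the unit variance used above. The names `weights2`, `componentMeans`, `mixtureMean` and `mixtureVariance` are introduced here for illustration. Whether an inference engine would report exactly these moments for `x` depends on the algorithm chosen.

```fsharp
let weights2 = [| 0.5; 0.5 |]          // prior probabilities of c = 0 and c = 1
let componentMeans = [| 1.0; 2.0 |]    // observed values of means2
let componentVariance = 1.0            // variance used in every branch

// Mixture mean: weighted average of the component means
let mixtureMean = Array.sum (Array.map2 (*) weights2 componentMeans)

// Mixture variance: within-component variance plus the spread of the component means
let mixtureVariance =
    let spread =
        Array.map2 (fun w m -> let d = m - mixtureMean in w * d * d) weights2 componentMeans
        |> Array.sum
    componentVariance + spread

printfn "mean = %f, variance = %f" mixtureMean mixtureVariance
```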