# Bayesian Inference in FSharp


## Introduction

For my F# Advent submission this year, I decided to implement an extremely simple Bayesian Statistical Framework to be used for inference. The purpose of statistical inference is to infer characteristics of a population from samples, which can then be used for further tasks such as making predictions. The "Bayesian" prefix refers to the use of Bayes' Theorem to conduct inferential statistics.

![image.png](./Images/StatisticalInference.png)

I like to think of statistical inference via a pasta-boiling analogy: suppose you have lost the cooking instructions for the dried pasta you are trying to cook. You start boiling the water but don't know when the pasta will be al dente. One way to check is to carefully remove a single piece and test whether it is cooked to your liking. Through this process, you'll know right away if the pasta needs to cook longer or is ready to be sauced up. Similarly, inference deals with trying to figure out characteristics (__is the pasta ready?__) of a population (__all the pasta__) via a sample (__a single piece of pasta__).

![image.png](./Images/pasta.jpg)

F#, as a language, didn't fail to deliver an awesome development experience! The particular aspects of the language that made it easy to develop a Bayesian Statistical Framework are [Pattern Matching](https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching), [Functional Domain Driven Design](

## Motivation

I recently received a Master's in Data Science but discovered that none of my courses dove deep into Bayesian Statistics. A strong desire to fully understand the differences between frequentist and Bayesian approaches, and eventually to understand Bayesian Neural Networks, was the impetus behind studying the statistical theory.

While reading ["A First Course in Bayesian Statistical Methods"](https://www.amazon.com/Bayesian-Statistical-Methods-Springer-Statistics/dp/1441928286/ref=sr_1_2?crid=3D7COZ06U9AU1&dchild=1&keywords=a+first+course+in+bayesian+statistical+methods&qid=1608705310&sprefix=A+first+course+in+baye%2Caps%2C239&sr=8-2) by Peter Hoff (I highly recommend this book to those with a strong mathematical background), I discovered [pymc3](https://docs.pymc.io/), a Probabilistic Programming library in Python. In my opinion, pymc3 has some great abstractions that I wanted to implement myself in a functional-first setting.

The best way to learn a new statistical algorithm is by implementing it, and the best way to learn something well is to teach it to others; this submission is the result of that philosophy.

## Bayes Theorem

The crux of Bayesian Inference lies in Bayes' Theorem, whose formula is the following:

$p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)}$

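| Term       | Formula            | Definition |
| :--------- | :----------------: | :--------- |
| Prior      | $p(\theta)$        | Quantification of the belief about the parameter $\theta$ before observing any data. |
| Likelihood | $p(y \mid \theta)$ | In light of the observations $y$, how plausible each value of $\theta$ is, i.e. the probability of the data given $\theta$. |
| Evidence   | $p(y)$             | The normalizing constant that makes the posterior a valid probability distribution (it sums or integrates to 1). |
| Posterior  | $p(\theta \mid y)$ | The updated belief about $\theta$ after combining the prior with the likelihood of the observations. |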

The interpretation of Bayes' Theorem that resonates most with the scientific method is this: you first come up with a hypothesis (the prior), you then weigh it against the available evidence (the likelihood of the observations), and finally you quantify how valid the hypothesis is, along with the associated uncertainty (the posterior).
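To make the four terms concrete, here is a minimal F# sketch of Bayes' Theorem when $\theta$ can only take a handful of values, so the evidence is a sum rather than an integral. The coin-flip setup, the two hypotheses ($\theta = 0.5$ and $\theta = 0.8$) and the function names are illustrative assumptions, not part of the framework.

```fsharp
// A minimal sketch of Bayes' Theorem over a discrete set of hypotheses.
// Assumption (illustrative only): θ is a coin's probability of heads and is either 0.5 or 0.8.

/// Likelihood p(y | θ) of a sequence of independent coin flips (1 = heads, 0 = tails).
let likelihood (flips : int list) (theta : float) : float =
    flips
    |> List.map (fun y -> if y = 1 then theta else 1.0 - theta)
    |> List.fold (fun acc p -> acc * p) 1.0

/// Returns (θ, posterior) pairs given (θ, prior) pairs and the observed flips.
let posterior (priors : (float * float) list) (flips : int list) : (float * float) list =
    // Numerator of Bayes' Theorem for every hypothesis: p(y | θ) p(θ).
    let unnormalized =
        priors |> List.map (fun (theta, prior) -> theta, likelihood flips theta * prior)
    // Evidence p(y): the sum of the numerators over all hypotheses.
    let evidence = unnormalized |> List.sumBy snd
    unnormalized |> List.map (fun (theta, joint) -> theta, joint / evidence)

let priors = [ 0.5, 0.5; 0.8, 0.5 ]   // equal prior belief in both hypotheses
let observedFlips = [ 1; 1; 1 ]        // three heads in a row
let result = posterior priors observedFlips
result |> List.iter (fun (theta, p) -> printfn "p(θ = %.1f | y) = %.4f" theta p)
```

After three heads in a row, the belief shifts from 50/50 to roughly 0.196 for the fair coin and 0.804 for the biased one.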


## Bayesian Inference in Practice

In practice, the denominator of Bayes' Theorem, i.e. the evidence or normalizing constant, is a pain to calculate. For example, for a continuous parameter, it becomes an integral over the whole parameter space $\Theta$:

$p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{\int_{\Theta}p(y \mid \tilde{\theta}) p(\tilde{\theta}) \; d\tilde{\theta}}$

That integral is UGLY and often computationally infeasible, so approximation techniques must be used to generate the posterior distribution: enter __Markov-Chain Monte Carlo (MCMC)__ methods.
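To make the integral concrete, here is a minimal F# sketch that approximates the evidence numerically with the midpoint rule. It assumes a single parameter $\theta \in [0, 1]$ with a uniform prior and a Binomial likelihood (the function names are illustrative); this pair is chosen because the exact evidence, $1/(n + 1)$, is known and can be used as a check.

```fsharp
// A minimal sketch: approximate the evidence p(y) = ∫ p(y | θ) p(θ) dθ with the midpoint rule.
// Assumptions (illustrative only): θ ∈ [0, 1] with a uniform prior p(θ) = 1 and a Binomial
// likelihood for k successes in n trials. For this pair the exact evidence is 1 / (n + 1).

/// Binomial coefficient "n choose k" computed in floating point.
let choose (n : int) (k : int) : float =
    [ 1 .. k ] |> List.fold (fun acc i -> acc * float (n - k + i) / float i) 1.0

/// Binomial likelihood p(y = k | θ).
let binomialLikelihood (n : int) (k : int) (theta : float) : float =
    choose n k * (theta ** float k) * ((1.0 - theta) ** float (n - k))

/// Midpoint-rule approximation of the evidence integral over θ ∈ [0, 1].
let approximateEvidence (n : int) (k : int) (gridSize : int) : float =
    let width = 1.0 / float gridSize
    [ 0 .. gridSize - 1 ]
    |> List.sumBy (fun i ->
        let theta = (float i + 0.5) * width   // midpoint of the i-th cell
        let prior = 1.0                        // uniform prior density on [0, 1]
        binomialLikelihood n k theta * prior * width)

let approx = approximateEvidence 10 7 10_000
printfn "Approximate evidence: %f, exact: %f" approx (1.0 / 11.0)
```

In one dimension a grid of 10,000 points is cheap, but a grid with $m$ points per dimension needs $m^d$ likelihood evaluations for $d$ parameters, which is one reason sampling methods are preferred for realistic models.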

### Markov-Chain Monte-Carlo Methods

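- __Monte Carlo__: random number generation, i.e. approximating a quantity by drawing random samples.
- __Markov Chain__: the future only depends on the present, not the past. The chain is like a linked list of successive approximations, each generated from the previous one.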
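A minimal sketch of these two ideas working together is random-walk Metropolis, one of the simplest MCMC algorithms. It is used here purely as an illustration and is not necessarily the sampler this submission builds. It assumes a toy model: a Binomial likelihood with a uniform prior on $\theta \in [0, 1]$, whose exact posterior is $\text{Beta}(k + 1, n - k + 1)$, so the sample mean can be checked against $(k + 1)/(n + 2)$. The function names and tuning values (seed, step size, number of samples, burn-in) are illustrative choices.

```fsharp
// A minimal sketch of a random-walk Metropolis sampler.
// Assumptions (illustrative only): the unnormalized posterior is a Binomial likelihood
// (k successes in n trials) times a uniform prior on θ ∈ [0, 1].
open System

/// Unnormalized log posterior: log p(y | θ) + log p(θ), dropping constant terms.
/// Outside (0, 1) the prior is zero, so the log density is -∞.
let logUnnormalizedPosterior (n : int) (k : int) (theta : float) : float =
    if theta <= 0.0 || theta >= 1.0 then Double.NegativeInfinity
    else float k * log theta + float (n - k) * log (1.0 - theta)

/// Draws `samples` values of θ; each state depends only on the previous one (Markov property).
let metropolis (seed : int) (stepSize : float) (samples : int) (logTarget : float -> float) : float[] =
    let rng = Random(seed)
    let chain = Array.create samples 0.0
    let mutable current = 0.5
    let mutable currentLogP = logTarget current
    for i in 0 .. samples - 1 do
        // Monte Carlo step: propose a symmetric random move around the current state.
        let proposal = current + stepSize * (2.0 * rng.NextDouble() - 1.0)
        let proposalLogP = logTarget proposal
        // Accept with probability min(1, p(proposal) / p(current)); the evidence cancels here.
        if log (rng.NextDouble()) < proposalLogP - currentLogP then
            current <- proposal
            currentLogP <- proposalLogP
        chain.[i] <- current
    chain

let chain = metropolis 42 0.2 60_000 (logUnnormalizedPosterior 10 7)
let posteriorMean = chain |> Array.skip 10_000 |> Array.average   // discard burn-in
printfn "MCMC posterior mean: %f, exact: %f" posteriorMean (8.0 / 12.0)
```

Notice that the evidence never appears: the acceptance ratio only compares the unnormalized posterior at two points, so the normalizing constant cancels. That is what makes MCMC useful when the evidence integral is intractable.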

## Goals

The goals of this submission are to develop the following:

1. __A Bayesian Domain Specific Language (DSL)__ that a statistician with no programming experience can understand and use to specify a model and its parameters, along with code that parses the model.
2. A way to represent the Bayesian DSL as a Bayesian Network, addressing the simplest case, i.e. one prior and one likelihood.
3. An algorithm that makes use of the Bayesian DSL representation to approximate the posterior distribution.

![image](./Images/Plan.png)

## Getting Set Up

The inferential logic will require minimal dependencies. Here are the dependencies and their respective uses:

| Dependency           | Use          |
| :------------------- | :----------: |
| __XPlot.Plotly__     | Used for charting the histograms of the distributions. |
| __MathNet.Numerics__ | Used for the out-of-the-box statistical distributions. |
| __Newtonsoft.Json__  | Used for parsing the parameters of the Bayesian DSL. |


```fsharp
// Dependencies
#r "nuget: XPlot.Plotly"
#r "nuget: MathNet.Numerics"
#r "nuget: Newtonsoft.Json"

// Imports
open XPlot.Plotly
open MathNet.Numerics
open Newtonsoft.Json
```

## A Bayesian Domain Specific Language (DSL) and Its Parser

The goal of this section is to develop the first goal: a Bayesian Domain Specific Language (DSL) that a statistician with no programming experience can understand and use to specify a model and its parameters, along with code that parses the model.

Drawing inspiration from [Stan](https://mc-stan.org) and pymc3 tutorials such as this [one](https://docs.pymc.io/notebooks/stochastic_volatility.html#Stochastic-Volatility-model), I wanted this DSL to be extremely simple but complete. The components I found that describe each random variable of the Bayesian model are:

1. The name of the random variable.
2. The conditionals of the random variable, i.e. the other variables that are given (assumed known) in order to completely specify it.
3. The distribution associated with the random variable.
4. The parameters of the distribution, as variables; e.g. for a Normal distribution, the parameters will be $\mu$ and $\sigma$, the mean and the standard deviation respectively.
5. The observed data for the random variable, if the random variable has been observed.
6. A map of the parameters to constants, to decouple the model from the values associated with it.

### Format

```
Random Variable Name [|Comma-separated Conditionals] ~ Distribution(Comma-separated Parameters without spaces) [: observed] 
```

The parts enclosed in ``[]`` are optional.

### Example

```
θ ~ Gamma(a,b)
Y|θ ~ Poisson(θ) : observed1
Z|Y,θ ~ Beta(θ,Y) : observed2
```
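For the first two lines of this example (`θ ~ Gamma(a,b)` and `Y|θ ~ Poisson(θ)`), the posterior happens to be available in closed form, because the Gamma prior is conjugate to the Poisson likelihood. The sketch below assumes `Gamma(a,b)` uses the shape/rate parameterization; the `GammaParameters` type and the function names are illustrative and not part of the DSL.

```fsharp
// A minimal sketch of the closed-form (conjugate) posterior for θ ~ Gamma(a,b), Y|θ ~ Poisson(θ).
// Assumption: Gamma(a,b) uses the shape/rate parameterization, so E[θ] = a / b.

type GammaParameters = { Shape : float; Rate : float }

/// Posterior of θ after observing Poisson counts y1..yn: Gamma(a + Σy, b + n).
let gammaPoissonPosterior (prior : GammaParameters) (observed : int list) : GammaParameters =
    { Shape = prior.Shape + float (List.sum observed)
      Rate  = prior.Rate + float (List.length observed) }

/// Mean of a Gamma distribution under the shape/rate parameterization.
let gammaMean (g : GammaParameters) : float = g.Shape / g.Rate

let prior = { Shape = 2.0; Rate = 1.0 }
let updated = gammaPoissonPosterior prior [ 3; 5; 4 ]
printfn "Posterior: %A, posterior mean: %f" updated (gammaMean updated)
```

Closed forms like this exist only for particular prior/likelihood pairs, which is why a general framework needs an approximation such as MCMC; they remain handy for checking approximate results.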

### The Domain

The domain types below represent a parsed random variable, i.e. a single line of the specified model; a parsed model is simply a list of them.

```fsharp
type ParsedRandomVariable =
    { Name             : string;
      Conditionals     : string list;
      Distribution     : string;
      Parameters       : string list;
      Observed         : string option }
type ParsedBayesianModel = ParsedRandomVariable list
```

### Parsing Logic

The parsing logic for each line in the user-specified model is as follows:

1. Split the line on spaces.
2. From the part before the tilde (~), extract the name and the conditionals: ``Random Variable Name [|Conditionals]``.
3. From the part after the tilde, extract the name of the distribution and its parameters, along with the optional observed variable.

F#'s awesome pattern matching was a lifesaver here, not only for its ease of use but also because the compiler warns about incomplete matches, which makes the failure cases explicit.

```fsharp
open System

// Format: RVName [|Conditionals] ~ Distribution(Parameters) [: observed]
// [] -> optional
// NOTE: There can't be any spaces in the distribution parameters.
let parseLineOfModel (lineInModel : string) : ParsedRandomVariable =

    // Helper fn to split the string based on a variety of delimiter types.
    // Resultant type is a list of strings to feed in for the pattern matching.
    let splitToList (toSplit : string) (delimiters : obj) : string list =
        let split =
            match delimiters with
            | :? string        as s   -> toSplit.Split(s, StringSplitOptions.RemoveEmptyEntries)
            | :? array<string> as arr -> toSplit.Split(arr, StringSplitOptions.RemoveEmptyEntries)
            | :? array<char>   as arr -> toSplit.Split(arr, StringSplitOptions.RemoveEmptyEntries)
            | _ -> failwithf "Splitting failed as the delimiters are neither a string, an array of strings nor an array of chars: Of Type: %A - %A" (delimiters.GetType()) toSplit

        Array.toList split

    // "Gamma(a,b)" -> ("Gamma", ["a"; "b"])
    let extractAndGetParameters (distributionNameAndParameters : string) : string * string list =
        match splitToList distributionNameAndParameters [| "("; ")" |] with
        | [ distribution; parameters ] -> distribution, splitToList parameters ","
        | _ -> failwithf "Pattern not found for the distribution and its parameters - the format is: Distribution(Parameters): %A" distributionNameAndParameters

    match splitToList lineInModel " " with
    | nameAndConditionals :: "~" :: distributionParametersObserved ->
        // Get the name and the optional comma-separated conditionals.
        let name, conditionals =
            match splitToList nameAndConditionals "|" with
            | [ name ] -> name, []
            | [ name; conditionals ] -> name, splitToList conditionals ","
            | other -> failwithf "Pattern not found for RV Name and Conditionals - the format is: RVName|Conditionals: %A" other

        // Builds the record once the distribution, parameters and observed variable are known.
        let toRandomVariable distributionNameAndParameters observed =
            let distribution, parameters = extractAndGetParameters distributionNameAndParameters
            { Name         = name
              Conditionals = conditionals
              Distribution = distribution.ToLower()
              Parameters   = parameters
              Observed     = observed }

        match distributionParametersObserved with
        // Case: Without Observations. Example: θ ~ Gamma(a,b)
        | [ distributionNameAndParameters ] ->
            toRandomVariable distributionNameAndParameters None

        // Case: With Observations. Example: Y|θ ~ Poisson(θ) : observed
        // Only 1 observed variable is permitted.
        | [ distributionNameAndParameters; ":"; observed ] ->
            toRandomVariable distributionNameAndParameters (Some observed)

        // Case: Error.
        | _ -> failwithf "Pattern not found for the model while parsing the distribution, parameters and optionally, the observed variables: %A" distributionParametersObserved

    | _ -> failwithf "Pattern not found for the following line in the model - please check the syntax: %A" lineInModel

// Split on both '\n' and '\r' so that Windows line endings work, and skip blank lines.
let parseModel (model : string) : ParsedBayesianModel =
    model.Split([| '\n'; '\r' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.filter (fun line -> not (String.IsNullOrWhiteSpace line))
    |> Array.map parseLineOfModel
    |> Array.toList

let printParsedModel (model : string) : unit =
    let parsedModel = parseModel model
    printfn "Model: %A is represented as %A" model parsedModel
```

### Examples of Parsing a User-Specified DSL

```fsharp
// Print out our simple 1-Parameter Model.
let model1 = @"θ ~ Gamma(a,b)
              Y|θ ~ Poisson(θ) : observed"
printParsedModel(model1)

// This model doesn't make sense but is added to test multiple conditionals.
let model2  = @"θ ~ Beta(unit,unit)
               gamma ~ Gamma(a,b)
               Y|θ,gammma ~ Binomial(n,θ) : observed"
printParsedModel(model2)
```

```
Model: "θ ~ Gamma(a,b)
              Y|θ ~ Poisson(θ) : observed" is represented as [{ Name = "θ"
   Conditionals = []
   Distribution = "gamma"
   Parameters = ["a"; "b"]
   Observed = None }; { Name = "Y"
                        Conditionals = ["θ"]
                        Distribution = "poisson"
                        Parameters = ["θ"]
                        Observed = Some "observed" }]
Model: "θ ~ Beta(unit,unit)
               gamma ~ Gamma(a,b)
               Y|θ,gammma ~ Binomial(n,θ) : observed" is represented as [{ Name = "θ"
   Conditionals = []
   Distribution = "beta"
   Parameters = ["unit"; "unit"]
   Observed = None }; { Name = "gamma"
                        Conditionals = []
                        Distribution = "gamma"
                        Parameters = ["a"; "b"]
                        Observed = None }; { Name = "Y"
                                             Conditionals = ["θ"; "gammma"]
                                             Distribution = "binomi
```

### Specifying the Parameters

The idea is to decouple the parameter values from the model; this is done by specifying them, along with the observed data, as a JSON string.

```fsharp
open System
open System.Collections.Generic

open MathNet.Numerics.Distributions

open Newtonsoft.Json

type Observed = float list

type ParameterList =
    { Observed : float list; Parameters : Dictionary<string, float> }

let deserializeParameters (paramsAsString : string) : ParameterList =
    JsonConvert.DeserializeObject<ParameterList>(paramsAsString)
```

### Examples of the Deserialization of the Parameters

```fsharp
// Parameter List 1
let parameters1 = "{Parameters : {μ0 : 0, σ0 : 1, μ : 5, σ : 2, λ : 4}, observed : [4.2,0.235,2.11]}"
let deserializedParameters1 = deserializeParameters parameters1
printfn "Deserialized Parameters 1: %A" (deserializedParameters1)

// Parameter List 2
let parameters2 = "{Parameters: {λ : 2}}"
let deserializedParameters2 = deserializeParameters parameters2
// Applying the Deserialized Parameters to Sample from a Distribution.
// Named `exponential` rather than `exp` to avoid shadowing F#'s built-in exp function.
let exponential = Exponential deserializedParameters2.Parameters.["λ"]
printfn "Sampling from the Exponential Distribution with the λ = %A parameter: %A" exponential (exponential.Sample())
```

```
Deserialized Parameters 1: { Observed = [4.2; 0.235; 2.11]
  Parameters = seq [[μ0, 0]; [σ0, 1]; [μ, 5]; [σ, 2]; ...] }
Sampling from the Exponential Distribution with the λ = Exponential(λ = 2) parameter: 0.009017571093
```
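One way the parsed model and the deserialized parameters could meet is sketched below: a hypothetical `resolveParameters` helper (not part of the original implementation) that looks up each parameter name of a parsed random variable in the parameter dictionary and otherwise treats it as a reference to another random variable in the model, as with θ in `Poisson(θ)`. The `ResolvedParameter` type is likewise an assumption; the `ParsedRandomVariable` record is re-declared so that the snippet runs on its own.

```fsharp
// A hypothetical helper joining a parsed line of the model with the parameter dictionary.
open System.Collections.Generic

// Re-declared so the sketch is self-contained; mirrors the record defined earlier.
type ParsedRandomVariable =
    { Name         : string
      Conditionals : string list
      Distribution : string
      Parameters   : string list
      Observed     : string option }

// Hypothetical: a parameter is either a constant from the JSON or another random variable.
type ResolvedParameter =
    | Constant of float
    | RandomVariable of string

let resolveParameters (modelNames : Set<string>) (constants : Dictionary<string, float>)
                      (rv : ParsedRandomVariable) : ResolvedParameter list =
    rv.Parameters
    |> List.map (fun name ->
        match constants.TryGetValue name with
        | true, value -> Constant value
        | _ when modelNames.Contains name -> RandomVariable name
        | _ -> failwithf "Parameter %s of %s is neither a constant nor a random variable" name rv.Name)

let constants = Dictionary<string, float>()
constants.["a"] <- 2.0
constants.["b"] <- 1.0
let theta = { Name = "θ"; Conditionals = []; Distribution = "gamma"; Parameters = [ "a"; "b" ]; Observed = None }
let y = { Name = "Y"; Conditionals = [ "θ" ]; Distribution = "poisson"; Parameters = [ "θ" ]; Observed = Some "observed" }
let names = set [ theta.Name; y.Name ]
printfn "%A" (resolveParameters names constants theta)   // [Constant 2.0; Constant 1.0]
printfn "%A" (resolveParameters names constants y)       // [RandomVariable "θ"]
```

Failing loudly on an unknown name keeps typos in the model (or in the JSON) from silently producing a wrong distribution.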


## Lessons and How I Can Improve The Implementation

## References

## Music Listened to While Working On This

1. [John Zorn and Friends](https://www.youtube.com/watch?v=c4eO2o9u1j0&ab_channel=podgoryt)
2. [Mike Patton and Mondo Cane](https://www.youtube.com/watch?v=iDOl5q7UXfg&ab_channel=FNM4EVER2)
3. [Burt Bacharach - This Guy's In Love with You](https://www.youtube.com/watch?v=2dDGnl8_Dzg&ab_channel=MusicWonders)
4. [Melvins - Lysol](https://www.youtube.com/watch?v=JtO_Awk4pqU&ab_channel=AboveDeath)
5. [The Jesus Lizard - Puss](https://www.youtube.com/watch?v=PaBJQG6A9SQ&ab_channel=spycory1)
6. [Mr. Bungle - Sudden Death](https://www.youtube.com/watch?v=-QWFV_057KM&ab_channel=IpecacRecordings)
7. [Madreblu - Certamente](https://www.youtube.com/watch?v=Ivh0FWTSJ78&ab_channel=Milano2000Records)
8. [Pato Banton - Spirits in the Material World](https://www.youtube.com/watch?v=S--kTyvm_fM&ab_channel=PatoBanton%26TheReggaeRevolution-Topic)
9. [Max Cooper - Ripple](https://www.youtube.com/watch?v=P_X1KGCgWlE&ab_channel=MaxCooper-Topic)
10. [The Sword - Cheap Sunglasses](https://www.youtube.com/watch?v=RYnTLIZkjlg&ab_channel=SwordofDoomMusic)
11. [Brother Dege - Too Old To Die Young](https://www.youtube.com/watch?v=FQNFcYvILJE&ab_channel=UltimatePowa)
12. [Mr. Bungle - Retrovertigo](https://www.youtube.com/watch?v=DRyh2cxJCp0&ab_channel=tkan)

And Mr. Bungle's self-titled first album for motivation.