# Lesson 4: working with Distributions

In [1]:
import marmote.core as marmotecore
import marmote.markovchain as mmc
import numpy as np

Distributions are everywhere in `Marmote`: as inputs and as outputs of many procedures.

The complete hierarchy of distributions is currently:

* `Distribution`
  * `DiscreteDistribution`
    * `DiracDistribution`
    * `BernoulliDistribution`
    * `UniformDiscreteDistribution`
    * `PoissonDistribution`
    * `ShiftedGeometricDistribution`
      * `GeometricDistribution`
    * `PhaseTypeDiscreteDistribution`
  * `GammaDistribution`
    * `ErlangDistribution`
      * `ExponentialDistribution`
  * `GaussianDistribution`
  * `UniformDistribution`
  * `PhaseTypeDistribution`

## Common Features

### Statistical information

`Distribution` objects provide the usual statistical methods (mean, rate, variance, distribution function and its complement), plus Laplace transform methods that are useful in stochastic modeling.

The following code is self-explanatory.

In [2]:
udis = marmotecore.UniformDistribution( 4, 10 )
print( udis.className() )
print( udis.Mean() )
print( udis.Rate() )         ## Rate is the inverse of Mean
print( udis.Variance() )
print( udis.Cdf(6.0) )
print( udis.Ccdf(6.0) )
print( udis.Laplace(0.01) )  ## Laplace transform (only at real values)
print( udis.DLaplace(0.01) ) ## DLaplace computes the derivative of the Laplace transform

UniformDistribution
7.0
0.14285714285714285
3.0
0.3333333333333333
0.6666666666666667
0.9325336852727276
-6.4997614647675315
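
The values printed above can be cross-checked against the closed-form expressions for a uniform distribution on [a, b]. The sketch below uses only NumPy and does not call `Marmote`; the variable names are illustrative, and the Laplace transform is taken to be E[exp(-sX)], which is consistent with the printed numbers.

```python
import numpy as np

a, b = 4.0, 10.0   # same bounds as udis above
s = 0.01           # point at which the Laplace transform is evaluated

mean = (a + b) / 2.0
rate = 1.0 / mean
variance = (b - a) ** 2 / 12.0
cdf_6 = (6.0 - a) / (b - a)
ccdf_6 = 1.0 - cdf_6

# Laplace transform E[exp(-s X)] of the uniform density 1/(b-a) on [a, b]
laplace = (np.exp(-s * a) - np.exp(-s * b)) / (s * (b - a))

# Derivative of the expression above with respect to s (quotient rule)
dlaplace = ((b * np.exp(-s * b) - a * np.exp(-s * a)) * s
            - (np.exp(-s * a) - np.exp(-s * b))) / (s ** 2 * (b - a))

print(mean, rate, variance, cdf_6, ccdf_6, laplace, dlaplace)
```

The formulas are only valid for s different from 0; at s = 0 the transform equals 1 and its derivative equals minus the mean.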


In [3]:
uddis = marmotecore.UniformDiscreteDistribution( 4, 10 )
print( uddis.className() )
print( uddis.Mean() )
print( uddis.Rate() )
print( uddis.Variance() )
print( uddis.Cdf(6.0) )
print( uddis.Ccdf(6.0) )
print( uddis.Laplace(0.01) )
print( uddis.DLaplace(0.01) )

UniformDiscreteDistribution
7.0
0.14285714285714285
4.0
0.2857142857142857
0.7142857142857143
0.9325803095481554
-6.490762062693861
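
For a discrete distribution, the same quantities are finite sums over the support. The NumPy sketch below (independent of `Marmote`, with illustrative names) recomputes them for the integers 4 to 10. It also evaluates the distribution function at 6 under two conventions: the printed `Cdf(6.0)` equals 2/7, which is P(X < 6) rather than P(X <= 6) = 3/7, so it may be worth checking which convention your version uses.

```python
import numpy as np

support = np.arange(4, 11)                        # integers 4, 5, ..., 10
pmf = np.full(support.size, 1.0 / support.size)   # equal mass 1/7 on each value
s = 0.01

mean = np.sum(support * pmf)
variance = np.sum((support - mean) ** 2 * pmf)
laplace = np.sum(np.exp(-s * support) * pmf)                # E[exp(-s X)]
dlaplace = -np.sum(support * np.exp(-s * support) * pmf)    # derivative in s

# Two possible conventions for the distribution function at x = 6
prob_below = pmf[support < 6].sum()      # P(X < 6)
prob_at_most = pmf[support <= 6].sum()   # P(X <= 6)

print(mean, variance, laplace, dlaplace, prob_below, prob_at_most)
```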


In [4]:
edis = marmotecore.ExponentialDistribution( 4.0 )  ## the parameter is the mean, not the rate
print( edis.Mean() )
print( edis.Rate() )
print( edis.Variance() )
print( edis.Cdf(6.0) )
print( edis.Ccdf(6.0) )
print( edis.Laplace(0.0) )
print( edis.DLaplace(0.0) )

4.0
0.25
16.0
0.7768698398515702
0.2231301601484298
1.0
-4.0
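
The exponential case illustrates a general identity: the Laplace transform equals 1 at s = 0, and its derivative at 0 is minus the mean. A minimal sketch with the standard library only (the helper names `laplace` and `dlaplace` are illustrative, not `Marmote` functions):

```python
import math

mean = 4.0          # value passed to ExponentialDistribution above
rate = 1.0 / mean

def laplace(s):
    """Laplace transform E[exp(-s X)] of an exponential distribution with this rate."""
    return rate / (rate + s)

def dlaplace(s):
    """Derivative of the Laplace transform with respect to s."""
    return -rate / (rate + s) ** 2

variance = mean ** 2
cdf_6 = 1.0 - math.exp(-rate * 6.0)
ccdf_6 = math.exp(-rate * 6.0)

print(variance, cdf_6, ccdf_6, laplace(0.0), dlaplace(0.0))
```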


### Some structural information

Some properties can be tested (mostly useful for `Marmote` developers).

In [5]:
print( edis.HasProperty("integerValued") )
print( edis.HasProperty("continuous"))
print( uddis.HasProperty("integerValued") )
print( uddis.HasProperty("compactSupport") )

False
True
True
True


### Sampling

Distributions can be sampled.

In [6]:
for i in range(4):
    print( edis.Sample() )
print()
for i in range(4):
    print( udis.Sample() )
print()
for i in range(4):
    print( uddis.Sample() )

6.7478234724747015
0.2836608852383009
0.21473993099798133
2.8964953963664555

5.923218618268661
4.926560053597672
8.193176136172351
4.719703264002601

7.0
8.0
9.0
8.0
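
How `Marmote` draws its samples is not shown here. As an illustration only, the sketch below uses inverse-transform sampling with NumPy for the same three distributions and checks that the empirical means are close to the theoretical ones. The seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded only to make the sketch reproducible
n = 200_000

# Exponential with mean 4: invert F(x) = 1 - exp(-x / 4)
exp_samples = -4.0 * np.log1p(-rng.random(n))

# Uniform on [4, 10]: affine transformation of a uniform on [0, 1)
uni_samples = 4.0 + 6.0 * rng.random(n)

# Discrete uniform on 4..10: split [0, 1) into 7 equal cells
disc_samples = 4 + np.floor(7 * rng.random(n)).astype(int)

print(exp_samples.mean(), uni_samples.mean(), disc_samples.mean())
```

With this many samples, the three empirical means should be close to 4, 7 and 7 respectively.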


## Some distributions can be manipulated

Rescaling is possible for some families of distributions.

In [7]:
print( udis )
print( udis.Rescale( 0.5 ) )

Uniform distribution on [4,10]
Uniform distribution on [2,5]


In [8]:
print( edis )
print( edis.Rescale( 5.0 ) )

Exponential distribution with mean 4
Exponential distribution with mean 20


In [9]:
didis = marmotecore.DiscreteDistribution( [ 3.4, 4.5, 6.7, 8.9 ], [ 0.1, 0.2, 0.3, 0.4 ] )
print( didis )
print( didis.Rescale(10.0) )

Discrete distribution values { 3.4 4.5 6.7 8.9 } probas {      0.1      0.2      0.3      0.4 }
Discrete distribution values { 34 45 67 89 } probas {      0.1      0.2      0.3      0.4 }
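
Rescaling by a factor c multiplies every value by c and leaves the probabilities untouched, so the mean is multiplied by c and the variance by c squared. A NumPy sketch of this for the discrete distribution above (the helper `mean_var` is illustrative and not part of `Marmote`):

```python
import numpy as np

values = np.array([3.4, 4.5, 6.7, 8.9])
probas = np.array([0.1, 0.2, 0.3, 0.4])
c = 10.0

scaled_values = c * values   # probabilities are unchanged

def mean_var(vals, probs):
    """Mean and variance of a finite discrete distribution."""
    m = np.sum(vals * probs)
    return m, np.sum((vals - m) ** 2 * probs)

m0, v0 = mean_var(values, probas)
m1, v1 = mean_var(scaled_values, probas)

print(scaled_values)
print(m1 / m0, v1 / v0)      # ratios close to c and c**2
```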


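But rescaling does not work for all distributions. In the run shown below, the call on a discrete uniform distribution raises no exception and the distribution is printed unchanged.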

In [10]:
print( uddis )
try:
    print( uddis.Rescale( 0.5 ) )
except Exception as err:
    print( "Rescale failed:", err )

Discrete uniform distribution on [4..10]
Discrete uniform distribution on [4..10]
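
One plausible reason for this behavior (an assumption, not something the output confirms) is that the family of uniform distributions on consecutive integers is not closed under scaling: multiplying the support 4..10 by 0.5 produces non-integer values. A quick NumPy check:

```python
import numpy as np

support = np.arange(4, 11)   # support of uddis: 4, 5, ..., 10
scaled = 0.5 * support       # 2.0, 2.5, 3.0, ..., 5.0

# True only if every scaled value is still an integer
is_integer_valued = bool(np.all(scaled == np.round(scaled)))

print(scaled, is_integer_valued)
```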




## Comparison of distributions

Distances can be computed between distributions of some classes, but not of all of them.
Available distances are:

* L1 distance
* L2 distance
* L-infinity distance
* Total variation distance

In [11]:
d1 = marmotecore.UniformDiscreteDistribution( 0, 19 )
d2 = marmotecore.GeometricDistribution( 0.5 )
d3 = marmotecore.UniformDiscreteDistribution( 0, 24 )
d4 = marmotecore.GeometricDistribution( 0.55 )

In [12]:
print( "L1 = ", marmotecore.Distribution.DistanceL1( d1, d2 ) )
print( "L2 = ", marmotecore.Distribution.DistanceL2( d1, d2 ) )
print( "Linf = ", marmotecore.Distribution.DistanceLInfinity( d1, d2 ) )
print( "TV = ", marmotecore.Distribution.DistanceTV( d1, d2 ) )

L1 =  1.4750019073486333
L2 =  0.532290737004473
Linf =  0.45
TV =  0.7375009536743167


In [13]:
print( "Computable L-infinity distance:", marmotecore.Distribution.DistanceLInfinity( d1, d3 ) )
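try:
    ## In the run shown here no exception is raised: 0.0 is printed although d2 and d4 differ
    print( "Not computable distance:", marmotecore.Distribution.DistanceLInfinity( d2, d4 ) )
except Exception as err:
    print( "Distance computation failed:", err )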

Computable L-infinity distance: 0.04
Not computable distance: 0.0
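
The four distances can be recomputed by hand from the probability mass functions. The NumPy sketch below makes two assumptions that reproduce the values printed for `d1` and `d2`: the geometric distribution lives on {0, 1, 2, ...} with P(k) = 0.5**(k + 1) (for parameter 0.5 the two usual parametrizations coincide), and the total variation distance is half the L1 distance. The truncation point `K` is an arbitrary choice.

```python
import numpy as np

K = 200            # truncation point; the neglected geometric tail is about 0.5**200
k = np.arange(K)

def uniform_pmf(lo, hi):
    """pmf of the discrete uniform distribution on {lo, ..., hi}, evaluated on the grid k."""
    return np.where((k >= lo) & (k <= hi), 1.0 / (hi - lo + 1), 0.0)

p1 = uniform_pmf(0, 19)
p3 = uniform_pmf(0, 24)
p2 = 0.5 ** (k + 1)   # assumed geometric pmf on {0, 1, 2, ...}

diff = np.abs(p1 - p2)
l1 = diff.sum()
l2 = np.sqrt((diff ** 2).sum())
linf = diff.max()
tv = 0.5 * l1

linf_13 = np.abs(p1 - p3).max()   # between the two discrete uniform distributions

print(l1, l2, linf, tv, linf_13)
```

The same approach gives a nonzero L-infinity distance between two geometric distributions with different parameters, so the 0.0 printed above for `d2` and `d4` should not be read as an actual distance.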




## Markov Chain operations which return distributions

### State distributions in Markov Chains

Example with a 4-state continuous-time birth-death Markov chain:

In [14]:
four = mmc.Homogeneous1DBirthDeath( 4, 3.0, 1.0 )

Distributions of one-step transitions

In [15]:
print( four.generator().TransDistrib(0) )
print( four.generator().TransDistrib(1) )
print( four.generator().TransDistrib(2) )
print( four.generator().TransDistrib(3) )

Discrete distribution values { 1 } probas {        1 }
Discrete distribution values { 0 2 } probas {     0.25     0.75 }
Discrete distribution values { 1 3 } probas {     0.25     0.75 }
Discrete distribution values { 2 } probas {        1 }


Transient and stationary distributions

In [16]:
print( four.TransientDistribution(4) )
print( four.StationaryDistribution() )

Discrete distribution values { 0  1  2  3  } probas {  0.02598  0.07642   0.2255   0.6721 }
Discrete distribution values { 0  1  2  3  } probas {    0.025    0.075    0.225    0.675 }
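
The one-step transition distributions and the stationary distribution can be reproduced from the generator with NumPy. The sketch assumes that `Homogeneous1DBirthDeath( 4, 3.0, 1.0 )` denotes 4 states, birth rate 3 and death rate 1; this reading is an assumption, but it matches the printed values. The transient distribution is not recomputed because the initial distribution used by `Marmote` is not shown.

```python
import numpy as np

n, birth, death = 4, 3.0, 1.0   # assumed meaning of the constructor arguments

# Generator: birth rate to the right, death rate to the left
Q = np.zeros((n, n))
for i in range(n - 1):
    Q[i, i + 1] = birth
    Q[i + 1, i] = death
np.fill_diagonal(Q, -Q.sum(axis=1))   # each row of a generator sums to zero

# Jump distribution from state i: off-diagonal rates divided by the total exit rate
exit_rates = -np.diag(Q)
jump = Q / exit_rates[:, None]
np.fill_diagonal(jump, 0.0)

# Stationary distribution: solve pi Q = 0 together with sum(pi) = 1
A = np.vstack([Q.T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(jump)
print(pi)
```

For a birth-death chain the stationary probabilities are proportional to (birth/death)**i, here 1, 3, 9, 27, which gives the same result after normalization by 40.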


Empirical state distributions through simulation

In [17]:
simres = four.SimulateChain( 20, True, False, False, False )
print( simres.Distribution() )

Discrete distribution values { 0  1  2  3  } probas {  0.06264    0.134   0.2516   0.5518 }


### Hitting time distributions

For some Markov chains, hitting time distributions are available.

More on these special Markov chains in Lesson 5.

In [18]:
two = mmc.TwoStateContinuous( 5.0, 1.0 )

In [19]:
hitset = np.array( [ 0, 1 ], dtype=bool )  ## i.e. [ False, True ]: the target set is state 1
hd = two.HittingTimeDistribution( hitset )

In [20]:
print( hd[0] )
print( hd[1] )

Exponential distribution with mean 0.2
Dirac distribution at 0


In [21]:
f81 = mmc.Felsenstein81( [ 0.1, 0.2, 0.3, 0.4 ], 1.0 )

In [22]:
hitset = np.array( [ False, False, True, False ], dtype=bool )
hd = f81.HittingTimeDistribution( hitset )
print( hd[0] )
print( hd[1] )
print( hd[2] )
print( hd[3] )

Exponential distribution with mean 3.33333
Exponential distribution with mean 3.33333
Dirac distribution at 0
Exponential distribution with mean 3.33333
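
The means of these hitting time distributions can be checked by solving a linear system on the generator. The NumPy sketch below is not `Marmote` code, and it rests on two assumptions that agree with the printed output: `TwoStateContinuous( 5.0, 1.0 )` has rate 5 from state 0 to state 1 and rate 1 back, and the Felsenstein 81 model moves from state i to state j at rate mu * pi_j. The helper `mean_hitting_times` is illustrative.

```python
import numpy as np

def mean_hitting_times(Q, target):
    """Expected time to reach the target set from each state of a chain with generator Q."""
    target = np.asarray(target, dtype=bool)
    others = ~target
    h = np.zeros(Q.shape[0])                 # zero on the target set itself
    # On the non-target states A, the expected hitting times solve -Q_AA h = 1
    Q_AA = Q[np.ix_(others, others)]
    h[others] = np.linalg.solve(-Q_AA, np.ones(int(others.sum())))
    return h

# Two-state chain (assumed rates: 5 from state 0 to 1, 1 from state 1 to 0)
Q_two = np.array([[-5.0, 5.0],
                  [1.0, -1.0]])
h_two = mean_hitting_times(Q_two, np.array([0, 1], dtype=bool))

# Felsenstein 81 (assumed rate from i to j: mu * pi_j)
pi = np.array([0.1, 0.2, 0.3, 0.4])
mu = 1.0
Q_f81 = mu * np.tile(pi, (4, 1))
np.fill_diagonal(Q_f81, 0.0)
np.fill_diagonal(Q_f81, -Q_f81.sum(axis=1))
h_f81 = mean_hitting_times(Q_f81, [False, False, True, False])

print(h_two)
print(h_f81)
```

Under the F81 assumption, every non-target state jumps to state 2 at the same rate 0.3, which is why the hitting time is exponential with mean 1/0.3 from each of them.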
