## Probability

In the `scipy` stack there are two ways to work with probability: a symbolic setting and a numerical setting.  In this brief section we compare both through a sequence of examples.

For the symbolic treatment of random variables we employ the module `sympy.stats`, while for the numerical treatment we use the module `scipy.stats`.  In both cases the goal is the same: to instantiate a random variable and to perform the following three kinds of operations on it:

* Description of the probability distribution of a random variable with numbers (parameters).
* Description of a random variable in terms of functions.
* Computation of associated probabilities.

Let us observe several situations through the lens of the two settings:

### Symbolic Setting

Let us start with discrete random variables.  For instance, consider several random variables that describe the roll of three 6-sided dice and one 100-sided die, together with the sum of the four outcomes.

In [1]:
from sympy import var
from sympy.stats import Die, sample_iter, P, variance, std, E, moment, cdf, density, Exponential, skewness

D6_1, D6_2, D6_3 = Die('D6_1', 6), Die('D6_2', 6), Die('D6_3', 6)
D100 = Die('D100', 100)
X = D6_1 + D6_2 + D6_3 + D100

We run a simulation in which we cast those four dice 20 times and collect the sum of each throw:

In [2]:
for item in sample_iter(X, numsamples=20):
    print(item, end=' ')   # Python 3 syntax; prints the samples on one line

17 58 62 96 83 28 35 62 7 79 73 91 63 85 55 61 97 13 100 95
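
For comparison, the same experiment can be imitated without SymPy.  The following is a minimal sketch that uses only the standard library; the helper `roll_total` and the seed are illustrative choices and are not part of `sympy.stats`.

```python
import random

def roll_total(rng):
    """Sum of three 6-sided dice and one 100-sided die for a single throw."""
    return sum(rng.randint(1, 6) for _ in range(3)) + rng.randint(1, 100)

rng = random.Random(2024)   # arbitrary seed, used only to make the run repeatable
throws = [roll_total(rng) for _ in range(20)]
print(throws)
```

Every sampled total lies between 4 (all dice show 1) and 118 (all dice show their maximum).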


Let us illustrate how easily we may compute probabilities associated with these variables.  For instance, the probability that the sum of the three 6-sided dice is smaller than the throw of the 100-sided die can be obtained as follows:

In [3]:
P(D6_1 + D6_2 + D6_3 < D100)

179/200
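
This value can be cross-checked by brute force, since there are only 6 × 6 × 6 × 100 = 21600 equally likely outcomes.  The sketch below is an independent check written with the standard library; it is not how `sympy.stats` computes the result internally.

```python
from fractions import Fraction
from itertools import product

d6 = range(1, 7)
d100 = range(1, 101)

# Count the outcomes in which the three small dice total less than the D100.
favorable = sum(1 for a, b, c, d in product(d6, d6, d6, d100) if a + b + c < d)
total = 6**3 * 100

prob_less = Fraction(favorable, total)
print(prob_less)   # 179/200
```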

Conditional probabilities can also be computed.  What is the probability of obtaining at least 10 when throwing two 6-sided dice, given that the first one shows a 5?

In [4]:
from sympy import Eq     #  Do NOT use "==" with symbolic objects!

P(D6_1 + D6_2 > 9, Eq(D6_1, 5))

1/3
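
The conditional probability follows directly from the definition: restrict the sample space to the outcomes that satisfy the condition, then count the favorable ones among them.  A minimal sketch with the standard library (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))       # all 36 ordered pairs

given = [(a, b) for a, b in rolls if a == 5]       # condition: first die shows 5
both = [(a, b) for a, b in given if a + b > 9]     # event, within the condition

prob_conditional = Fraction(len(both), len(given))
print(prob_conditional)   # 1/3
```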

The computation of parameters of the associated probability distributions is also very simple.  In the following session we obtain the variance, standard deviation, and expected value of `X`, together with its higher-order moments about zero (orders 2 through 9).

In [5]:
print(variance(X), std(X), E(X))

842 sqrt(842) 61
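
Because the four dice are independent, both the mean and the variance of the sum are the sums of the individual means and variances.  For a fair die with n sides these are (n + 1)/2 and (n² − 1)/12.  The sketch below uses those closed forms; the helpers `die_mean` and `die_variance` are hypothetical names introduced only for this check.

```python
from fractions import Fraction

def die_mean(sides):
    """Expected value of a fair die with faces 1..sides."""
    return Fraction(sides + 1, 2)

def die_variance(sides):
    """Variance of a fair die with faces 1..sides."""
    return Fraction(sides**2 - 1, 12)

dice = [6, 6, 6, 100]
mean_X = sum(die_mean(n) for n in dice)
var_X = sum(die_variance(n) for n in dice)
print(mean_X, var_X)   # 61 842
```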


In [6]:
for n in range(2, 10):
    print("mu_{0} = {1}".format(n, moment(X, n, 0)))

mu_2 = 4563
mu_3 = 381067
mu_4 = 339378593/10
mu_5 = 6300603685/2
mu_6 = 1805931466069/6
mu_7 = 176259875749813/6
mu_8 = 29146927913035853/10
mu_9 = 586011570997109973/2
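
A moment about zero is simply the average of the n-th power of the variable over all equally likely outcomes.  The following sketch recomputes the first few by exhaustive enumeration with exact fractions; `raw_moment` is an illustrative helper, not a SymPy function.

```python
from fractions import Fraction
from itertools import product

d6 = range(1, 7)
d100 = range(1, 101)

# One total per equally likely outcome of the four dice (21600 in all).
totals = [a + b + c + d for a, b, c, d in product(d6, d6, d6, d100)]

def raw_moment(n):
    """n-th moment about zero of the sum of the four dice."""
    return Fraction(sum(t**n for t in totals), len(totals))

print(raw_moment(2), raw_moment(3))   # 4563 381067
```

The second moment is consistent with the earlier results, since the variance equals the second moment minus the square of the mean: 4563 − 61² = 842.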


We can easily compute the cumulative distribution function and the probability mass function too.  For a discrete variable, `density` returns the probability mass function.


In [7]:
cdf(X) 

{4: 1/21600,
 5: 1/4320,
 6: 1/1440,
 7: 7/4320,
 8: 7/2160,
 9: 7/1200,
 10: 23/2400,
 11: 7/480,
 12: 1/48,
 13: 61/2160,
 14: 791/21600,
 15: 329/7200,
 16: 1193/21600,
 17: 281/4320,
 18: 3/40,
 19: 17/200,
 20: 19/200,
 21: 21/200,
 22: 23/200,
 23: 1/8,
 24: 27/200,
 25: 29/200,
 26: 31/200,
 27: 33/200,
 28: 7/40,
 29: 37/200,
 30: 39/200,
 31: 41/200,
 32: 43/200,
 33: 9/40,
 34: 47/200,
 35: 49/200,
 36: 51/200,
 37: 53/200,
 38: 11/40,
 39: 57/200,
 40: 59/200,
 41: 61/200,
 42: 63/200,
 43: 13/40,
 44: 67/200,
 45: 69/200,
 46: 71/200,
 47: 73/200,
 48: 3/8,
 49: 77/200,
 50: 79/200,
 51: 81/200,
 52: 83/200,
 53: 17/40,
 54: 87/200,
 55: 89/200,
 56: 91/200,
 57: 93/200,
 58: 19/40,
 59: 97/200,
 60: 99/200,
 61: 101/200,
 62: 103/200,
 63: 21/40,
 64: 107/200,
 65: 109/200,
 66: 111/200,
 67: 113/200,
 68: 23/40,
 69: 117/200,
 70: 119/200,
 71: 121/200,
 72: 123/200,
 73: 5/8,
 74: 127/200,
 75: 129/200,
 76: 131/200,
 77: 133/200,
 78: 27/40,
 79: 137/200,
 80: 139/200,


In [8]:
density(X)

{4: 1/21600,
 5: 1/5400,
 6: 1/2160,
 7: 1/1080,
 8: 7/4320,
 9: 7/2700,
 10: 3/800,
 11: 1/200,
 12: 1/160,
 13: 1/135,
 14: 181/21600,
 15: 49/5400,
 16: 103/10800,
 17: 53/5400,
 18: 43/4320,
 19: 1/100,
 20: 1/100,
 21: 1/100,
 22: 1/100,
 23: 1/100,
 24: 1/100,
 25: 1/100,
 26: 1/100,
 27: 1/100,
 28: 1/100,
 29: 1/100,
 30: 1/100,
 31: 1/100,
 32: 1/100,
 33: 1/100,
 34: 1/100,
 35: 1/100,
 36: 1/100,
 37: 1/100,
 38: 1/100,
 39: 1/100,
 40: 1/100,
 41: 1/100,
 42: 1/100,
 43: 1/100,
 44: 1/100,
 45: 1/100,
 46: 1/100,
 47: 1/100,
 48: 1/100,
 49: 1/100,
 50: 1/100,
 51: 1/100,
 52: 1/100,
 53: 1/100,
 54: 1/100,
 55: 1/100,
 56: 1/100,
 57: 1/100,
 58: 1/100,
 59: 1/100,
 60: 1/100,
 61: 1/100,
 62: 1/100,
 63: 1/100,
 64: 1/100,
 65: 1/100,
 66: 1/100,
 67: 1/100,
 68: 1/100,
 69: 1/100,
 70: 1/100,
 71: 1/100,
 72: 1/100,
 73: 1/100,
 74: 1/100,
 75: 1/100,
 76: 1/100,
 77: 1/100,
 78: 1/100,
 79: 1/100,
 80: 1/100,
 81: 1/100,
 82: 1/100,
 83: 1/100,
 84: 1/100,
 85: 1/100,
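
Tables like these can be rebuilt by convolving the distributions of the individual dice and then accumulating the probabilities in increasing order of the total.  The sketch below does this with exact fractions; the helper `add_die` is a hypothetical name, and the approach is an independent check rather than a description of SymPy's internals.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

def add_die(pmf, sides):
    """Convolve an exact PMF with a fair die that has the given number of sides."""
    out = Counter()
    for total, p in pmf.items():
        for face in range(1, sides + 1):
            out[total + face] += p * Fraction(1, sides)
    return out

pmf = {0: Fraction(1)}            # start from a constant zero
for sides in (6, 6, 6, 100):
    pmf = add_die(pmf, sides)

values = sorted(pmf)
cdf_table = dict(zip(values, accumulate(pmf[v] for v in values)))
print(pmf[4], pmf[19], cdf_table[18])   # 1/21600 1/100 3/40
```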
 

Let us move on to continuous random variables.  This short session computes the density and cumulative distribution functions, as well as several parameters, of a generic exponential random variable with rate `mu`:

In [10]:
var('mu', positive=True)
var('t')
X = Exponential('X', mu)

density(X)(t)

mu*exp(-mu*t)

In [11]:
cdf(X)(t)

Piecewise((1 - exp(-mu*t), t >= 0), (0, True))

In [12]:
print(variance(X), skewness(X))

mu**(-2) 2


In [13]:
[moment(X, n, 0) for n in range(1,10)]

[1/mu,
 2/mu**2,
 6/mu**3,
 24/mu**4,
 120/mu**5,
 720/mu**6,
 5040/mu**7,
 40320/mu**8,
 362880/mu**9]
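
These results can be verified by hand.  Taking `mu` as the rate parameter, as the density `mu*exp(-mu*t)` indicates, the substitution u = μt reduces each moment to a Gamma integral:

$$
E[X^n] = \int_0^\infty t^n \, \mu e^{-\mu t} \, dt
       = \frac{1}{\mu^n} \int_0^\infty u^n e^{-u} \, du
       = \frac{\Gamma(n+1)}{\mu^n}
       = \frac{n!}{\mu^n}
$$

The variance and skewness then follow from the first three moments:

$$
\operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac{2}{\mu^2} - \frac{1}{\mu^2} = \frac{1}{\mu^2}
$$

$$
\operatorname{skew}(X) = \frac{E[X^3] - 3\,E[X]\,E[X^2] + 2\,E[X]^3}{\operatorname{Var}(X)^{3/2}}
                       = \frac{6/\mu^3 - 6/\mu^3 + 2/\mu^3}{1/\mu^3} = 2
$$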

> For a complete description of the module `sympy.stats`, with an exhaustive enumeration of all its implemented random variables, a good reference is the official online documentation at docs.sympy.org/dev/modules/stats.html