## Variables Revisited

In both algebra and programming, we deal with either constants or variables. Let's look at some examples of variables:

$$y = x^2-3$$

Whenever $x=0$, it returns $-3$. Similarly,

$$x = \sin(\theta)$$

always returns $1$ for $\theta = \pi/2$.

Nothing special. Now check a couple of programming examples:



In [20]:
y = lambda x: 3*x*x+1

Now, regardless of the user, time, OS, Python version, your location or any other parameter, this function will always return the same value for the same value of `x`.

In [21]:
y(2)

13

On the other hand, consider the variable `x`:

In [22]:
import datetime
x = datetime.datetime.now()

In [23]:
x

datetime.datetime(2023, 8, 25, 11, 3, 29, 866635)

Its value is not deterministic like the values above: it depends on the moment at which the call is made. (Note that `datetime.datetime.now()` returns the local time of the machine it runs on, not UTC, so the result also depends on the system's time zone setting.)

There are many other examples, such as:

- Current temperature
- Outcome of a coin toss or a dice roll
- Sunset time
- Time since last boundary in a cricket (or goal in a football) match
- Age (height, gender or any attribute) of the next person we will meet

Since these variables are non-deterministic (or simply random), we can't model them with a simple function. Studying and explaining them requires a dedicated field of its own.

## Random Variables

Probability is nothing but the study of events we aren't **absolutely** certain about (in other words, it is the study of non-deterministic events). The word probability may scare a number of students (it still scares me at times), but this is something we do every day, without even realizing it. For example:

- It has been a week since the petrol price last went up, so chances are high that it will go up tonight.
- Wearing a helmet decreases the chances of a fatal accident, so I would prefer to wear one.
- Since the previous lunar month had 30 days, there is a high chance of this month having 29 days.

We can come across 2 types of variables:

- Deterministic
- Stochastic/Random


A random variable is just a function of the experimental outcome of these random/probabilistic events.

> **Note:** A random variable always takes real values and is usually denoted by a capital letter.

### Discrete and Continuous RVs

For example, we have a random variable:

$$X = \text{Outcome of a coin toss}$$

Since a coin can take only two values, i.e., $x \in \{0,1\}$, $X$ is a **discrete** random variable.

On the other hand, picking a value from the range $[0,1]$ (we can pick any fractional number like 0.07, 0.53, 0.192, etc.) leads to infinitely many possible values of $x$, and hence it's a **continuous** random variable.

> **Note:** Please note the difference between $x$ and $X$ here. $X$ represents the random variable, while $x$ represents any of the values this variable can take.
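The difference between the two kinds of random variables can be sketched with NumPy's random generator. This is only an illustration: the seed and the sample size are arbitrary choices, not something from the examples above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Discrete RV: each coin toss can only be 0 or 1
coin = rng.integers(0, 2, size=10)

# Continuous RV: each draw can be any real value in [0, 1)
fraction = rng.random(size=10)

print(coin)
print(fraction)
```

Every entry of `coin` is one of two values, while the entries of `fraction` are (practically) never repeated.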

### Probability Mass Function

In the above coin throw example, we have an equal probability of head or tail. We can write these probabilities for $X$ as:

$$p_X(x) = \begin{cases}1/2 & x=0, \\
1/2 & x=1\end{cases}$$

This function can be used to characterize/observe the probabilities of all the outcomes for a random variable.

> **Note:** The sum of all values of $p_X(x)$ will always be $1$. This is inevitable, given that the probabilities of all the possible outcomes of an event always sum to $1$.
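As a rough check, we can estimate the PMF of a fair coin from simulated tosses. The sketch below assumes the labeling 0 = tail and 1 = head, and uses an arbitrary seed and sample size.

```python
import numpy as np

rng = np.random.default_rng(42)             # arbitrary seed
tosses = rng.integers(0, 2, size=100_000)   # assumed labeling: 0 = tail, 1 = head

# Empirical PMF: relative frequency of each observed outcome
values, counts = np.unique(tosses, return_counts=True)
pmf = counts / counts.sum()

for v, p in zip(values, pmf):
    print(f"p_X({v}) ~ {p:.3f}")
print("sum:", pmf.sum())
```

Both estimated probabilities should come out close to $1/2$, and they sum to $1$ by construction.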

### Expected Value

The expected value of a random variable, $E[X]$, is defined as the sum of all possible outcomes, each multiplied by its probability:

$$E[X] = \sum _x x p_X(x)$$

In the above case, it will be $1/2 \times 0 + 1/2 \times 1 = 1/2$.

Let's consider another example: a dice roll. Assuming it's a fair die, we will have:

$$p_X(x) = \begin{cases}
1/6 &x= 1 \\
1/6 & x=2 \\
1/6 & x =3 \\
1/6 &x= 4 \\
1/6 & x=5 \\
1/6 & x =6 \\
\end{cases}$$



Its expected value will be:

$$E[X] = \sum _x x p_X(x)$$

$$= \frac{1}{6}+\frac{2}{6}+\frac{3}{6}+\frac{4}{6}+\frac{5}{6}+1$$

$$=\frac{1+2+3+4+5+6}{6} = \frac{21}{6}$$

$$=3.5$$
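The same calculation can be sketched in NumPy, together with a simulation to compare against. The seed and the number of rolls are arbitrary assumptions.

```python
import numpy as np

outcomes = np.arange(1, 7)   # faces 1..6
pmf = np.full(6, 1 / 6)      # fair die: every face has probability 1/6

# Expected value from the definition: sum of x * p_X(x)
expected = np.sum(outcomes * pmf)
print(expected)

# Average of many simulated rolls, which should be close to the value above
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
print(rolls.mean())
```

Note that $3.5$ is not an outcome the die can ever show; the expected value is a long-run average, not a prediction of a single roll.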


### Variance

Similarly, we have another important attribute of random variables, known as variance.

It is defined as:

$$Var(X) = E[(X-E[X])^2]$$


Let's calculate it for the previous example:

$$Var(X)=\sum _x (x-E[X])^2 p_X(x)$$

$$=\frac{1}{6}\sum _x (x-3.5)^2 $$

$$=\frac{1}{6} (2\times2.5^2 + 2\times1.5^2 + 2\times0.5^2)$$

$$\approx 2.917$$

We can verify it:


In [24]:
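total = 0  # named `total` so the built-in sum() is not shadowed
for x in range(1, 7):
    # squared deviation of each face from the mean 3.5
    total = total + ((x - 3.5) ** 2)

print(total / 6)  # each face has probability 1/6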

2.9166666666666665


### Variance as Sum of Expected Values

We can rewrite the variance as a combination of expected values too:

$$Var(X)=\sum _x (x-E[X])^2 p_X(x)$$

$$=\sum _x (x^2 + (E[X])^2 - 2xE[X])p_X(x)$$

$$=\sum_x x^2 p_X(x)+\sum _x (E[X])^2 p_X(x) -\sum _x2xE[X]p_X(x)$$

Since $E[X]$ (and hence $(E[X])^2$) is a constant that does not depend on $x$, we can take it out of the summation.

$$=\sum_x x^2 p_X(x)+(E[X])^2 \sum _x p_X(x) -2 E[X]\sum _x xp_X(x)$$

Since sum of probabilities is $1$, and re-applying the definition of expected value:

$$=E[X^2]+(E[X])^2  -2 (E[X])^2$$

$$=E[X^2]-(E[X])^2$$
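We can check numerically that the definition and the shortcut agree for the fair die. This is a minimal sketch; the variable names are just illustrative.

```python
import numpy as np

x = np.arange(1, 7)       # faces of the die
p = np.full(6, 1 / 6)     # fair die

mean = np.sum(x * p)                             # E[X]
var_definition = np.sum((x - mean) ** 2 * p)     # E[(X - E[X])^2]
var_shortcut = np.sum(x**2 * p) - mean**2        # E[X^2] - (E[X])^2

print(var_definition)
print(var_shortcut)
```

Both values should agree with the $\approx 2.917$ computed earlier, up to floating-point rounding.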

## Continuous Random Variables

As mentioned above, if a variable can take any value in a continuous range (so the possible outcomes cannot be listed one by one), it is a continuous RV. Since we can no longer sum over individual outcomes, we replace summation with integration.



### CDF/PDF

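For continuous variables, the PMF is replaced by the probability density function (PDF), $f_X$. Integrating it gives the cumulative distribution function (CDF):

$$F_X(x) = {P}(X\leq x) = \int_{-\infty}^x f_X(t) \, dt$$

Conversely, the PDF is obtained by taking the derivative of the CDF: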

$$f_X(x) = \frac{d}{dx} F_X(x)$$
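To make the relation between CDF and PDF concrete, here is a rough sketch that estimates both from samples of a uniform variable on $[0,1)$, for which $F_X(x)=x$ and $f_X(x)=1$ inside the interval. The helper `empirical_cdf`, the seed, the evaluation point and the step `h` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed
samples = rng.random(200_000)       # uniform samples on [0, 1)

def empirical_cdf(x):
    """Fraction of samples that are <= x, an estimate of F_X(x)."""
    return np.mean(samples <= x)

x, h = 0.3, 0.05                    # evaluation point and finite-difference step

cdf_at_x = empirical_cdf(x)
# Central finite difference of the CDF approximates the PDF
pdf_estimate = (empirical_cdf(x + h) - empirical_cdf(x - h)) / (2 * h)

print(cdf_at_x)       # should be close to 0.3
print(pdf_estimate)   # should be close to 1
```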

In [25]:
import jax
import jax.numpy as jnp
import numpy as np
import pandas as pd

In [26]:
X = np.random.rand(10,10)
df6 = pd.DataFrame(X)

df6

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.662322 | 0.24778 | 0.309497 | 0.763998 | 0.776863 | 0.356946 | 0.259577 | 0.926105 | 0.943642 | 0.988306 |
| 1 | 0.913841 | 0.948413 | 0.263725 | 0.157475 | 0.884221 | 0.401372 | 0.881749 | 0.842141 | 0.180091 | 0.616033 |
| 2 | 0.963035 | 0.430902 | 0.588554 | 0.55437 | 0.20074 | 0.00395 | 0.990952 | 0.431816 | 0.207736 | 0.246427 |
| 3 | 0.310136 | 0.036988 | 0.812652 | 0.866788 | 0.878779 | 0.655576 | 0.642211 | 0.08357 | 0.69207 | 0.055851 |
| 4 | 0.167152 | 0.690268 | 0.321561 | 0.49216 | 0.619961 | 0.888054 | 0.185492 | 0.454043 | 0.941841 | 0.841417 |
| 5 | 0.484752 | 0.693271 | 0.872197 | 0.054192 | 0.483206 | 0.540771 | 0.075209 | 0.661416 | 0.929001 | 0.830085 |
| 6 | 0.667519 | 0.717925 | 0.398796 | 0.875562 | 0.643998 | 0.274507 | 0.981851 | 0.162394 | 0.18814 | 0.243973 |
| 7 | 0.330432 | 0.423985 | 0.503593 | 0.964605 | 0.422909 | 0.969034 | 0.679695 | 0.901093 | 0.286674 | 0.493251 |
| 8 | 0.335038 | 0.710813 | 0.627063 | 0.52915 | 0.614734 | 0.468689 | 0.946301 | 0.835914 | 0.176824 | 0.721712 |
| 9 | 0.52939 | 0.259844 | 0.246039 | 0.336727 | 0.962108 | 0.105224 | 0.401813 | 0.940434 | 0.415794 | 0.246386 |


In [27]:
print(round(np.mean(X),2))

print(round(np.var(X),2))

0.55
0.08


In [28]:
Y = np.random.rand(10000,10000)

print(round(np.mean(Y),2))

print(round(np.var(Y),2))

0.5
0.08


## Normal Distribution

Normal/Gaussian distribution is defined as:

$$ f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}} $$

This apparently nightmarish equation is one of the most fundamental ones around, with applications ranging from filters for blurring images to SVMs.

Irrespective of the equation above, the normal distribution finds applications in a number of fields. Traffic on the road throughout the 24 hours, the sum of several dice rolled together, heights of students in a class, incomes... many such quantities can be modeled, at least roughly, by a nice bell curve.

![](https://upload.wikimedia.org/wikipedia/commons/3/3a/Standard_deviation_diagram_micro.svg)

We refer to the Normal distribution as $\mathcal{N}(\mu,\sigma)$, where $\mu$ is the mean and $\sigma$ is the square root of the variance (also known as the standard deviation).
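The density formula above is less scary once written as code. Below is a minimal sketch; `normal_pdf` is a hypothetical helper written for this illustration, not a library function, and the integration grid is an arbitrary choice.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma) at x; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

# Peak of the standard normal, equal to 1 / sqrt(2 * pi)
print(normal_pdf(0.0))

# A density must integrate to 1: approximate the integral with a Riemann sum
xs = np.linspace(-8, 8, 16001)
area = np.sum(normal_pdf(xs)) * (xs[1] - xs[0])
print(area)
```

The peak value is about $0.399$, and the area under the curve comes out very close to $1$.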

### Standard Normal Distribution

The Standard Normal Distribution has a mean of 0 and a variance of 1 (i.e., roughly 68% of its values fall within $[-1,+1]$). In other words, it can be represented as $\mathcal{N}(0,1)$.

The Standard Normal Distribution has been extensively studied and finds a number of applications. For example, [Batch Normalization](https://arxiv.org/abs/1502.03167) essentially rescales the data to zero mean and unit variance, just like the standard normal. The CDF of the standard normal is denoted by $\Phi$ and is available as [a table](https://en.wikipedia.org/wiki/Standard_normal_table).
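Instead of looking values up in a table, the CDF of the standard normal can be computed from the error function in Python's standard library. The sketch below uses the well-known identity $\frac{1}{2}\left(1+\operatorname{erf}(x/\sqrt{2})\right)$; `std_normal_cdf` is a name made up for this example.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Probability of landing within one standard deviation of the mean
within_one_sigma = std_normal_cdf(1.0) - std_normal_cdf(-1.0)

print(round(std_normal_cdf(0.0), 4))   # 0.5
print(round(within_one_sigma, 4))      # 0.6827
```

The second number is the familiar "68%" band shown in the diagram above.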

We can make any other distribution, $\mathcal{N}(\mu,\sigma)$ using standard normal as:

$$\mathcal{N}(\mu,\sigma) = \sigma X+\mu$$

Where $X$ is $\mathcal{N}(0,1)$.

Most libraries provide the Standard Normal as the basic building block (some, like NumPy, also accept the mean and standard deviation directly). Enough talking, let's start working:

### In NumPy

Its syntax is (note that the second argument is the standard deviation, not the variance):

`np.random.normal(<mean>,<std>,<dimensions>)`

In [29]:
X = np.random.normal(0,1,(100,100))
X

array([[-0.37529053,  1.95463159,  0.25520342, ..., -0.82509029,
        -1.10124242, -2.11138761],
       [ 0.00892951,  1.36967162,  0.86245275, ..., -0.69618752,
        -0.98141952, -0.75326658],
       [ 1.13484989,  0.74348579, -0.4645234 , ...,  0.30638949,
         1.37514697,  1.2531738 ],
       ...,
       [ 0.40095472,  1.4349091 , -0.41887036, ...,  0.54683223,
        -0.97306452, -0.32107951],
       [ 0.97419496,  0.17061086,  1.00972606, ...,  0.09280233,
         0.65994512,  1.90256928],
       [ 1.17085317, -0.06712288,  0.1789978 , ...,  0.45193875,
         0.47666872, -2.62714285]])

We can also check its (statistical) properties:

In [30]:
print(np.mean(X))
print(np.var(X))

-0.00021173461498912475
0.997393340414016


### In JAX

We can also implement it in JAX as:

`jax.random.normal(<random number key>,<dimensions of the matrix>)`

In order to make the random number's key, we can simply declare it as:

`jax.random.PRNGKey(<seed>)`

In [31]:
Z = jax.random.normal(jax.random.PRNGKey(0),(100,100))

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)


Let's check its properties:

In [32]:
print(jnp.mean(Z))
print(jnp.var(Z))

0.010327662
1.022222


They are almost (0,1), as we can see. We can also check other properties, such as the PDF and CDF, using `jax.scipy.stats`.

In [33]:
import jax.scipy.stats as stats
Z_Pdf = stats.norm.pdf(Z)
df5 = pd.DataFrame(Z_Pdf)

df5

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.150373 | 0.386625 | 0.040435 | 0.382287 | 0.206837 | 0.325382 | 0.098942 | 0.216099 | 0.273841 | 0.378409 | ... | 0.143647 | 0.386948 | 0.368169 | 0.361709 | 0.313341 | 0.289468 | 0.354952 | 0.379687 | 0.146643 | 0.263947 |
| 1 | 0.308301 | 0.342649 | 0.343162 | 0.187371 | 0.350835 | 0.196011 | 0.382772 | 0.344328 | 0.279763 | 0.318749 | ... | 0.171622 | 0.398022 | 0.366110 | 0.179201 | 0.313117 | 0.107915 | 0.155022 | 0.232558 | 0.326796 | 0.192233 |
| 2 | 0.396582 | 0.365379 | 0.330414 | 0.330271 | 0.395009 | 0.375503 | 0.224348 | 0.346247 | 0.271714 | 0.371488 | ... | 0.064787 | 0.356239 | 0.152272 | 0.354122 | 0.142169 | 0.262439 | 0.196124 | 0.385827 | 0.051142 | 0.280252 |
| 3 | 0.221008 | 0.299264 | 0.317785 | 0.398915 | 0.189120 | 0.324169 | 0.259959 | 0.133391 | 0.110097 | 0.047714 | ... | 0.369123 | 0.233550 | 0.312136 | 0.267702 | 0.364227 | 0.398814 | 0.398860 | 0.368230 | 0.395512 | 0.397764 |
| 4 | 0.268160 | 0.391057 | 0.014478 | 0.346148 | 0.284845 | 0.102752 | 0.322483 | 0.398314 | 0.384148 | 0.331561 | ... | 0.346670 | 0.301405 | 0.242761 | 0.345458 | 0.130136 | 0.237126 | 0.273768 | 0.082443 | 0.394132 | 0.377010 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 0.259236 | 0.310335 | 0.305832 | 0.241717 | 0.398861 | 0.390511 | 0.362696 | 0.392487 | 0.388968 | 0.398247 | ... | 0.378569 | 0.232027 | 0.395025 | 0.398233 | 0.398147 | 0.075706 | 0.398857 | 0.392260 | 0.374216 | 0.372041 |
| 96 | 0.157414 | 0.371091 | 0.100652 | 0.321527 | 0.354816 | 0.180333 | 0.309888 | 0.081874 | 0.226672 | 0.214478 | ... | 0.284266 | 0.393407 | 0.103511 | 0.265950 | 0.312548 | 0.013244 | 0.051005 | 0.351218 | 0.276693 | 0.255068 |
| 97 | 0.082502 | 0.356899 | 0.225867 | 0.185499 | 0.320309 | 0.379213 | 0.328434 | 0.380694 | 0.081927 | 0.315630 | ... | 0.240335 | 0.139198 | 0.391023 | 0.236771 | 0.160850 | 0.371936 | 0.387986 | 0.398327 | 0.080311 | 0.370431 |
| 98 | 0.206434 | 0.287175 | 0.394590 | 0.387262 | 0.386569 | 0.176613 | 0.383673 | 0.328879 | 0.009558 | 0.297351 | ... | 0.340089 | 0.241375 | 0.135995 | 0.142881 | 0.352852 | 0.131493 | 0.207132 | 0.381917 | 0.390560 | 0.368089 |


In [34]:
print(jnp.mean(Z_Pdf))
print(jnp.var(Z_Pdf))

0.2802959
0.012561914


### Translation Properties of Normal Distribution

As mentioned above, we can obtain any other normal distribution from the standard normal using a linear transformation:

$$Y = \sigma X + \mu $$

where $\sigma$ is the standard deviation (so the variance is $\sigma^2$) and $\mu$ is the mean.

For example, with $\sigma = 2$ and $\mu = 5$ we expect a mean of about $5$ and a variance of about $\sigma^2 = 4$:

In [35]:
Y = 2*X+5
df7 = pd.DataFrame(Y)

df7

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.249419 | 8.909263 | 5.510407 | 5.669047 | 2.833022 | 2.271997 | 7.096772 | 5.053940 | 2.347655 | 3.424820 | ... | 4.525950 | 3.052605 | 5.335541 | 6.117855 | 6.733153 | 7.080659 | 4.800466 | 3.349819 | 2.797515 | 0.777225 |
| 1 | 5.017859 | 7.739343 | 6.724905 | 9.477735 | 7.427137 | 4.907944 | 6.167716 | 6.459439 | 1.746238 | 4.520806 | ... | 7.956862 | 4.386196 | 5.573839 | 8.199242 | 4.537235 | 7.155252 | 7.294679 | 3.607625 | 3.037161 | 3.493467 |
| 2 | 7.269700 | 6.486972 | 4.070953 | 3.229985 | 5.893409 | 8.232425 | -0.622005 | 4.006899 | 6.309834 | 3.426512 | ... | 2.658960 | 5.852320 | 2.744379 | 4.517478 | 5.209117 | 3.871169 | 6.226866 | 5.612779 | 7.750294 | 7.506348 |
| 3 | 5.323220 | 2.321833 | 5.670238 | 4.238907 | 2.074906 | 4.664439 | 3.083395 | 1.990803 | 5.480241 | 4.122071 | ... | 9.442264 | 5.567185 | 3.968422 | 6.532430 | 4.530657 | 6.155069 | 8.165006 | 3.436934 | 5.110260 | 3.577320 |
| 4 | 2.715365 | 5.622785 | 5.459821 | 4.081591 | 5.091358 | 5.376593 | 6.798765 | 0.867249 | 3.955865 | 4.265766 | ... | 8.886169 | 3.927435 | 4.245460 | 5.000148 | 4.517903 | 5.622684 | 6.067121 | 7.646622 | 6.822633 | 2.636562 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 6.035124 | 7.128514 | 8.037067 | 5.950916 | 3.567542 | 8.019034 | 5.439658 | 2.164359 | 3.260082 | 3.994672 | ... | 3.695499 | 6.663088 | 3.349434 | 3.596808 | 6.058551 | 3.634740 | 6.751325 | 8.050938 | 4.810546 | 2.780790 |
| 96 | 3.227817 | 6.507091 | 3.483558 | 3.354654 | 5.901924 | 1.595084 | 5.169292 | 4.312330 | 5.022827 | 4.397750 | ... | 0.164087 | 7.226954 | 5.756932 | 3.093642 | 3.427934 | 4.155643 | 5.534247 | 5.759712 | 6.114005 | 5.333700 |
| 97 | 5.801909 | 7.869818 | 4.162259 | 4.108160 | -0.948522 | 6.513542 | 4.970445 | 3.336809 | 5.723129 | 1.997632 | ... | 7.113189 | 3.135776 | 3.326022 | 4.360317 | 7.791597 | 3.514390 | 7.724584 | 6.093664 | 3.053871 | 4.357841 |
| 98 | 6.948390 | 5.341222 | 7.019452 | 5.744390 | 3.858426 | 6.534206 | 5.074567 | 3.509284 | 5.022102 | -0.059305 | ... | 5.375569 | 4.228264 | 5.559098 | 3.742513 | 7.050334 | 4.680031 | 4.161560 | 5.185605 | 6.319890 | 8.805139 |


In [36]:
print(np.mean(Y))
print(np.var(Y))

4.999576530770022
3.989573361656064


## Uniform Distribution

While the Normal distribution can model a number of real-world problems, there are scenarios where we need other distributions as well. For example, we may want to keep things simple and have every value between $a$ and $b$ be equally likely. In such scenarios, we can use the Uniform distribution.

Its PDF is:

$$
f(x) = \begin{cases}
  \frac{1}{b-a} & \text{for } a \le x \le b, \\[8pt]
  0 & \text{else }
  \end{cases}
$$

It is represented as $\mathcal U (a,b)$ and its mean and variance are:

$$E[X] = \frac{a+b}{2}$$

$$Var(X) = \frac{(b-a)^2}{12}$$
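For a uniform variable on $[a,b]$ the mean is $(a+b)/2$ and the variance is $(b-a)^2/12$. The sketch below compares these closed forms with sample statistics; the bounds, seed and sample size are arbitrary example values.

```python
import numpy as np

a, b = 2.0, 5.0                     # example bounds, chosen arbitrarily
rng = np.random.default_rng(7)      # arbitrary seed
samples = rng.uniform(a, b, size=200_000)

# Sample statistic next to the closed-form value
print(samples.mean(), (a + b) / 2)
print(samples.var(), (b - a) ** 2 / 12)
```

With $a=0$ and $b=1$ the formulas give $0.5$ and $1/12 \approx 0.083$, which matches the mean and variance printed for `np.random.rand` earlier.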

>**Note:** The Uniform distribution may sound quite similar to the `linspace()` function in NumPy/JAX, but there is a key difference: `linspace()` deterministically returns evenly spaced values between two endpoints, while sampling from the Uniform distribution returns random values, each equally likely to fall anywhere in the range.
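A small side-by-side comparison makes the distinction clearer. This sketch uses NumPy with an arbitrary seed; the endpoints and the number of points are example values.

```python
import numpy as np

# linspace: deterministic, evenly spaced grid (0, 0.25, 0.5, 0.75, 1)
grid = np.linspace(0.0, 1.0, 5)

# uniform sampling: random values, different for every seed
rng = np.random.default_rng(3)
draws = rng.uniform(0.0, 1.0, 5)

print(grid)
print(np.diff(grid))   # constant spacing of 0.25
print(draws)
print(np.diff(draws))  # irregular gaps, possibly negative since draws are unsorted
```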

### JAX Implementation

It's also provided in all numerical computing libraries. We can sample from the Uniform distribution as:

In [37]:
Z2 = jax.random.uniform(jax.random.PRNGKey(12),(1,100))
Z2

Array([[0.135782  , 0.8054502 , 0.34972322, 0.31671536, 0.4659462 ,
        0.1358118 , 0.28977072, 0.6001446 , 0.5793271 , 0.50028276,
        0.07537627, 0.16775274, 0.70918393, 0.64529526, 0.61780894,
        0.47586024, 0.29939163, 0.5360943 , 0.12123454, 0.19173431,
        0.24298811, 0.5144229 , 0.9119023 , 0.52567923, 0.8117422 ,
        0.06818879, 0.97613776, 0.8851491 , 0.19259703, 0.5278493 ,
        0.8302579 , 0.83680856, 0.3619182 , 0.741122  , 0.47197282,
        0.5574049 , 0.11807811, 0.7000779 , 0.75200284, 0.43683994,
        0.5089122 , 0.05460227, 0.54496515, 0.3569013 , 0.03997386,
        0.59746957, 0.3002249 , 0.20396924, 0.685223  , 0.66098833,
        0.28954458, 0.20489669, 0.8871467 , 0.08411002, 0.41029394,
        0.43963313, 0.9699341 , 0.5563878 , 0.15509117, 0.9355521 ,
        0.13684642, 0.5180836 , 0.09811914, 0.82724285, 0.12814164,
        0.77433825, 0.36741984, 0.27361977, 0.14178288, 0.6273705 ,
        0.96565807, 0.27617788, 0.40702486, 0.72

## Other Distributions

As many of you may have noticed, `jax.random` offers many other distributions, so do we need to cover all of them?

Luckily, the answer is no. If you know the following distributions, they are more than good enough:

- Gaussian/Normal
- Uniform
- Exponential
- Rayleigh
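The first two have been sampled above. For the remaining two, here is a minimal sketch using NumPy's generator (chosen here only for illustration; the `scale` value, seed and sample size are arbitrary assumptions). The sample means are compared against the known closed forms: `scale` for the exponential and `scale * sqrt(pi / 2)` for the Rayleigh distribution.

```python
import numpy as np

rng = np.random.default_rng(11)   # arbitrary seed
n = 200_000
scale = 2.0                       # example scale parameter

exp_samples = rng.exponential(scale=scale, size=n)   # theoretical mean: scale
ray_samples = rng.rayleigh(scale=scale, size=n)      # theoretical mean: scale * sqrt(pi / 2)

print(exp_samples.mean(), scale)
print(ray_samples.mean(), scale * np.sqrt(np.pi / 2))
```

Both distributions only produce non-negative values, which makes them a natural fit for quantities such as waiting times or magnitudes.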