# Visualizing continuous probability distributions with Altair

```python
import altair as alt
import numpy as np
import pandas as pd
import scipy.stats as stats
```

```python
# Set the random seed for reproducibility
np.random.seed(42)
```

```python
# Generate 1000 random numbers from a normal distribution with mean 0 and standard deviation 1
x = np.random.normal(0, 1, 1000)
```

```python
# Create a pandas DataFrame with a column x
df = pd.DataFrame({'x': x})
```

```python
# Create a histogram of normally distributed random numbers
alt.Chart(df).mark_bar().encode(
    alt.X("x", bin=alt.Bin(maxbins=100)),
    y='count()'
)
```


## What is a normal distribution? 

A normal distribution, also called a Gaussian distribution, is a continuous probability distribution for a real-valued random variable. It is commonly encountered because many quantities are approximately normally distributed, such as the heights and weights of people and the performance of students on a test. Counts, such as the number of widgets produced in a factory in a day, are discrete, but they are often well approximated by a normal distribution when the counts are large.

Real-valued random variable: a random variable whose values are real numbers.

Real numbers: the set of all numbers that can be represented on a number line, including all rational and irrational numbers.

Random variable: a variable whose value is determined by the outcome of a random experiment.

The following Python code simulates the heights of 1000 people drawn from a normal distribution (mean 170 cm, standard deviation 10 cm) and plots them as a histogram with Altair:



```python
import numpy as np
import pandas as pd
import altair as alt

# define the number of samples
num_samples = 1000

# generate random data using a normal distribution
mean_height = 170  # mean height in cm
std_dev_height = 10  # standard deviation in cm

# generate data
np.random.seed(0)  # for reproducibility
heights = np.random.normal(loc=mean_height, scale=std_dev_height, size=num_samples)

# convert to pandas DataFrame
df = pd.DataFrame({'Height': heights})

# create the histogram using altair
chart = alt.Chart(df).mark_bar().encode(
    alt.X("Height:Q", bin=True),
    y='count()',
)

# display the chart
chart
```
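One way to check that the simulated heights behave like the stated normal distribution is to compare the fraction of samples within one standard deviation of the mean against the value given by the normal CDF. The sketch below assumes the same parameters and seed as the cell above. The names `empirical` and `theoretical` are illustrative.

```python
import numpy as np
from scipy.stats import norm

mean_height = 170     # mean height in cm
std_dev_height = 10   # standard deviation in cm

np.random.seed(0)  # same seed as the histogram example
heights = np.random.normal(loc=mean_height, scale=std_dev_height, size=1000)

# Interval of one standard deviation around the mean
lower = mean_height - std_dev_height
upper = mean_height + std_dev_height

# Fraction of simulated heights that fall inside the interval
empirical = np.mean((heights >= lower) & (heights <= upper))

# Probability of the same interval under the normal distribution
theoretical = (norm.cdf(upper, mean_height, std_dev_height)
               - norm.cdf(lower, mean_height, std_dev_height))

print(f"empirical: {empirical:.3f}, theoretical: {theoretical:.3f}")
```

The theoretical value is about 0.683. The empirical fraction should be close to it but will differ slightly because of sampling noise.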


## Representing probability distribution functions using mathematics 

The normal distribution is parameterized by the mean $\mu$ and the standard deviation $\sigma$. The mean $\mu$ determines the location of the center of the distribution, and the standard deviation $\sigma$ determines its width. The distribution is symmetric about the mean $\mu$.

The probability density function (PDF) of a normal distribution is given by the following equation:

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2}$$

The PDF of a normal distribution is a bell-shaped curve. It is always positive, is symmetric about the mean $\mu$, attains its maximum at $\mu$, and approaches zero as $x$ approaches positive or negative infinity.
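These properties can be checked numerically. The sketch below implements the formula directly in NumPy as a hypothetical helper `normal_pdf`, compares it with `scipy.stats.norm.pdf`, and tests positivity, symmetry, the location of the peak, and the total area. The values $\mu = 2$ and $\sigma = 1.5$ and the grid size are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm


def normal_pdf(x, mu, sigma):
    """Evaluate the normal PDF formula directly (hypothetical helper)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))


mu, sigma = 2.0, 1.5  # illustrative values, not from the original text

# Grid that is symmetric about mu and wide enough to cover the tails
grid = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 4001)
pdf = normal_pdf(grid, mu, sigma)

matches_scipy = np.allclose(pdf, norm.pdf(grid, mu, sigma))
is_positive = np.all(pdf > 0)
is_symmetric = np.allclose(pdf, pdf[::-1])   # mirror image about mu
peak_location = grid[np.argmax(pdf)]         # x value of the maximum
area = np.sum(pdf) * (grid[1] - grid[0])     # Riemann-sum approximation

print(matches_scipy, is_positive, is_symmetric, peak_location, area)
```

The area is only an approximation of the integral over the whole real line, but the tails beyond eight standard deviations contribute a negligible amount.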

The PDF of the standard normal distribution ($\mu = 0$, $\sigma = 1$) can be plotted with Altair and Python:


```python
import numpy as np
import pandas as pd
import altair as alt
from scipy.stats import norm

# define the mean and standard deviation
mu = 0
sigma = 1

# create a range of x-values
x_values = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)

# calculate the corresponding y-values
y_values = norm.pdf(x_values, mu, sigma)

# create a dataframe
df = pd.DataFrame({'x': x_values, 'y': y_values})

# create the altair chart
chart = alt.Chart(df).mark_line().encode(
    x='x',
    y='y'
)

# display the chart
chart
```


# Discrete probability mass functions

A probability mass function (PMF) gives the probability that a discrete random variable is exactly equal to some value. The PMF of a discrete random variable $X$ is denoted $f(x)$, where $f(x) = P(X = x)$ is the probability that $X$ is exactly equal to $x$. It must satisfy the following two properties:

1. $f(x) \geq 0$ for all $x$.
2. $\sum_{x} f(x) = 1$.
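The two properties translate directly into a small check. The sketch below uses a hypothetical helper, `is_valid_pmf`, that takes a PMF as a dictionary mapping each value to its probability. The dictionary representation and the tolerance are assumptions made for illustration.

```python
import math


def is_valid_pmf(pmf, tol=1e-9):
    """Return True if {value: probability} satisfies both PMF properties.

    Hypothetical helper for illustration only.
    """
    probabilities = list(pmf.values())
    # Property 1: every probability is non-negative
    non_negative = all(p >= 0 for p in probabilities)
    # Property 2: the probabilities sum to 1 (up to floating-point error)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


fair_die = {face: 1 / 6 for face in range(1, 7)}
not_a_pmf = {0: 0.5, 1: 0.6}  # sums to 1.1, so property 2 fails

print(is_valid_pmf(fair_die))   # True
print(is_valid_pmf(not_a_pmf))  # False
```

The tolerance matters because sums of fractions such as $1/6$ are generally not exact in floating-point arithmetic.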

The PMF of a discrete random variable $X$ can be visualized using a bar chart. The height of the bar for a value $x$ is equal to the probability that $X$ is exactly equal to $x$.

Examples of discrete random variables include the number of heads in 10 coin flips, the number of people in a household, and the number of cars in a parking lot.

## Probability of rolling a fair, six-sided die

The PMF of a fair six-sided die assigns probability $1/6$ to each face. It can be visualized using Altair and Python:

```python
# Create a pandas DataFrame with one row per face and its probability
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'f(x)': [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]})
```

```python
# Create a bar chart of the PMF; ':O' treats each face as its own category
alt.Chart(df).mark_bar().encode(
    x='x:O',
    y='f(x)'
)
```
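The PMF describes long-run frequencies, which a simulation can illustrate. The sketch below rolls a simulated die 6000 times with NumPy's random generator and tabulates the observed frequency of each face next to the theoretical value of $1/6$. The seed, the number of rolls, and the name `die_df` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)         # seeded for reproducibility
rolls = rng.integers(1, 7, size=6000)  # upper bound 7 is exclusive

# Count how often each face appears
faces = np.arange(1, 7)
counts = np.array([(rolls == face).sum() for face in faces])

die_df = pd.DataFrame({
    "x": faces,
    "empirical": counts / rolls.size,  # observed relative frequency
    "f(x)": 1 / 6,                     # theoretical probability
})
print(die_df)
```

The empirical frequencies should be close to $1/6$ and move closer as the number of rolls grows.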

## Visualization of flipping an unfair coin

```python
# Create a pandas DataFrame with a column x
df = pd.DataFrame({'x': [0, 1], 'f(x)': [0.25, 0.75]})
```

```python
# Create a bar chart of the PMF; ':O' treats each outcome as its own category
alt.Chart(df).mark_bar().encode(
    x='x:O',
    y='f(x)'
)
```

## Representing probability mass functions of flipping a coin using mathematics


For a fair coin:

$$
P(H) = P(T) = 0.5
$$

For an unfair coin that lands heads with probability $p$, where $0 \leq p \leq 1$ and $p \neq 0.5$:

$$
P(H) = p
$$

$$
P(T) = 1 - p
$$
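A coin with $P(H) = p$ corresponds to a Bernoulli distribution, available as `scipy.stats.bernoulli`. The sketch below assumes $p = 0.75$ and codes heads as 1 and tails as 0. That coding is an assumption, since the unfair-coin table above does not say which outcome is which. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import bernoulli

p = 0.75  # assumed probability of heads; heads coded as 1, tails as 0
coin = bernoulli(p)

# PMF values: P(T) = 1 - p and P(H) = p
prob_tails = coin.pmf(0)
prob_heads = coin.pmf(1)

# Simulate flips and compare the observed share of heads with p
flips = coin.rvs(size=10_000, random_state=0)
empirical_heads = flips.mean()

print(prob_tails, prob_heads, empirical_heads)
```

Under this coding, the two PMF values match the `[0.25, 0.75]` column used in the unfair-coin bar chart, and the simulated share of heads should be close to $p$.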