In [1]:
import pandas as pd
import numpy as np

### A random variable returns a single value, drawn according to its probability, from a universe of possible outcomes.

If the variable is discrete, each event in the universe has an associated probability. If these probabilities are not given explicitly, we can estimate them from the available data by counting how often each event occurs and dividing by the total number of observations. The resulting probability for each event defines our discrete random variable. Sampling it then means drawing one event at random, according to those probabilities, per sample.
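A minimal sketch of that procedure, assuming a small made-up list of observed outcomes (the `observations` data below is hypothetical): estimate each event's probability from its relative frequency with `value_counts(normalize=True)`, then draw samples from the resulting discrete distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical observed outcomes (illustrative data, not from a real dataset)
observations = pd.Series(["sunny", "rainy", "sunny", "cloudy", "sunny", "rainy"])

# Relative frequency of each event -> estimated probability of each event
pmf = observations.value_counts(normalize=True)

# Sampling the discrete random variable: each draw returns a single event
rng = np.random.default_rng(0)
samples = rng.choice(pmf.index.to_numpy(), size=10, p=pmf.to_numpy())

print(pmf)
print(samples)
```

Here `pmf` holds 0.5 for `sunny`, about 0.333 for `rainy` and about 0.167 for `cloudy`, and every entry of `samples` is one of those three events.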

On the other hand, if we have a discrete multivariate random variable, which assigns a probability to each combination of values of several quantities, then each sample is a vector rather than a single value.
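As a sketch, assuming hypothetical paired observations of two discrete quantities (`weather` and `traffic` are made-up columns), we can estimate the joint probability of each combination and sample whole combinations, so every draw is a 2-element vector:

```python
import numpy as np
import pandas as pd

# Hypothetical paired observations of two discrete variables
obs = pd.DataFrame({
    "weather": ["sunny", "rainy", "sunny", "rainy", "sunny", "cloudy"],
    "traffic": ["low", "high", "low", "high", "high", "low"],
})

# Joint probabilities: relative frequency of each (weather, traffic) combination
joint_pmf = obs.value_counts(normalize=True)

# Each sample picks one combination, i.e. a vector (weather, traffic)
rng = np.random.default_rng(0)
idx = rng.choice(len(joint_pmf), size=5, p=joint_pmf.to_numpy())
samples = [joint_pmf.index[i] for i in idx]

print(joint_pmf)
print(samples)
```

Sampling an index into the joint table, rather than each column separately, keeps the dependence between the two quantities intact.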

The next example describes a continuous bivariate random variable modeling the height and weight of different people. The two values are correlated, so each sample is a 2D vector (height, weight) drawn jointly.
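Written out, the example assumes a bivariate normal distribution for height $H$ and weight $W$, with means $\mu_H, \mu_W$, standard deviations $\sigma_H, \sigma_W$ and correlation coefficient $\rho$. The covariance matrix has the variances on the diagonal and the covariance $\rho\,\sigma_H\sigma_W$ off the diagonal; with $\sigma_H = 10$, $\sigma_W = 15$ and $\rho = 0.7$ it evaluates to:

$$
\begin{pmatrix} H \\ W \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} \mu_H \\ \mu_W \end{pmatrix}, \Sigma\right),
\qquad
\Sigma = \begin{pmatrix} \sigma_H^2 & \rho\,\sigma_H\sigma_W \\ \rho\,\sigma_H\sigma_W & \sigma_W^2 \end{pmatrix}
= \begin{pmatrix} 100 & 105 \\ 105 & 225 \end{pmatrix}
$$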

In [2]:
# Define the means and standard deviations for heights and weights
mean_height = 170
std_height = 10
mean_weight = 70
std_weight = 15

# Define the correlation coefficient between heights and weights
correlation = 0.7

# Number of samples to generate
num_samples = 100

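# Independent (uncorrelated) samples for heights and weights.
# Note: these two arrays are not used below; drawing them only advances the random state.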
np.random.seed(42)  # Set a seed for reproducibility
heights = np.random.normal(loc=mean_height, scale=std_height, size=num_samples)
weights = np.random.normal(loc=mean_weight, scale=std_weight, size=num_samples)

# Covariance matrix: variances on the diagonal, covariance = correlation * std_height * std_weight off the diagonal
cov_matrix = np.array([[std_height ** 2, correlation * std_height * std_weight],
                       [correlation * std_height * std_weight, std_weight ** 2]])

# Generate correlated heights and weights
correlated_heights, correlated_weights = np.random.multivariate_normal(
    mean=[mean_height, mean_weight], cov=cov_matrix, size=num_samples).T

# Create a pandas DataFrame to represent the multivariate random variable
data = {'Height': correlated_heights, 'Weight': correlated_weights}
multivariate_random_variable = pd.DataFrame(data)

# Print the first few rows of the DataFrame
print(multivariate_random_variable.head())


       Height     Weight
0  163.921347  66.511495
1  155.152331  57.420154
2  186.664388  87.265252
3  162.869453  64.057491
4  144.441117  74.533933
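To check that the samples behave as intended, and to show one standard way correlation can be introduced, the sketch below transforms independent standard normals with the Cholesky factor `L` of the covariance matrix (so that `L @ L.T == cov`), giving `x = mean + L z`. This is a swapped-in illustration of the technique, not necessarily the factorization NumPy's `multivariate_normal` uses internally; it draws 10,000 samples so the sample statistics settle close to the target values.

```python
import numpy as np
import pandas as pd

# Same parameters as the example above
mean = np.array([170.0, 70.0])  # mean height, mean weight
std_h, std_w, rho = 10.0, 15.0, 0.7
cov = np.array([[std_h ** 2, rho * std_h * std_w],
                [rho * std_h * std_w, std_w ** 2]])

# Lower-triangular Cholesky factor: L @ L.T reproduces cov
L = np.linalg.cholesky(cov)

# Independent standard normals z, mapped to correlated samples x = mean + L z
rng = np.random.default_rng(42)
z = rng.standard_normal(size=(10_000, 2))
x = mean + z @ L.T

df = pd.DataFrame(x, columns=["Height", "Weight"])
print(df.corr())  # off-diagonal entry should be close to 0.7
print(df.cov())   # should be close to cov
```

With only 100 samples, as in the cell above, the sample correlation will fluctuate noticeably more around 0.7.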
