# Use Pandas to help visualize the random simulation

We will use Pandas methods to visualize the random numbers generated in our simulation and their summary statistics.

## Import Modules

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Generate random numbers

Let's start by creating 7 random numbers from a uniform distribution on the interval [0, 1).

First, we need to initialize the random number generator object. Passing a seed (here 2100) makes the results reproducible.

In [None]:
rg = np.random.default_rng(2100)
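Seeding `np.random.default_rng` is what makes every rerun of this notebook produce the same numbers. A minimal check, assuming only NumPy: two generators built with the same seed return identical values, while a different seed (2101 here is an arbitrary choice) does not.

```python
import numpy as np

# Two generators seeded identically produce the same stream of numbers
rg_a = np.random.default_rng(2100)
rg_b = np.random.default_rng(2100)
same_seed_equal = np.array_equal(rg_a.random(7), rg_b.random(7))

# A different (arbitrary) seed gives a different stream
rg_c = np.random.default_rng(2101)
rg_d = np.random.default_rng(2100)
diff_seed_equal = np.array_equal(rg_c.random(7), rg_d.random(7))

# Generator.random draws from the half-open interval [0, 1)
values = np.random.default_rng(2100).random(7)
in_unit_interval = bool(((values >= 0) & (values < 1)).all())

print(same_seed_equal, diff_seed_equal, in_unit_interval)
```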

Let's replicate generating 7 random numbers 25 times.

Assign the result to a 2D NumPy array with 7 rows (the random numbers) and 25 columns (the replications).

In [None]:
X = rg.random( (7, 25) )

In [None]:
X.shape

In [None]:
X.ndim
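Because each column is one replication, reducing with `axis=0` collapses the 7 rows and returns one sample average per replication (25 values), while `axis=1` would instead average each row position across the replications. A small sketch reusing the same shape and seed:

```python
import numpy as np

rg = np.random.default_rng(2100)
X = rg.random((7, 25))  # 7 random numbers (rows) x 25 replications (columns)

col_means = X.mean(axis=0)  # one sample average per replication
row_means = X.mean(axis=1)  # one average per row position, across replications

print(col_means.shape, row_means.shape)

# The first replication's average is just the mean of column 0
first_matches = bool(np.isclose(col_means[0], X[:, 0].sum() / 7))
print(first_matches)
```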

Convert the NumPy array into a Pandas DataFrame.

In [None]:
Xdf = pd.DataFrame( X )

In [None]:
%whos

In [None]:
Xdf.info()

In [None]:
Xdf.describe()

In [None]:
# Column averages: one sample average per replication (the 'mean' row of describe())
X.mean(axis=0)

In [None]:
Xdf.describe().index

In [None]:
Xdf.describe().loc['mean']

In [None]:
Xdf.describe().loc[ ['mean', 'std'] ]
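The `mean` and `std` rows of `describe()` can be reproduced with NumPy, with one detail worth knowing: pandas reports the *sample* standard deviation (`ddof=1`), whereas NumPy's `std` defaults to `ddof=0`. The sketch below shows that the two agree only when `ddof=1` is passed explicitly.

```python
import numpy as np
import pandas as pd

rg = np.random.default_rng(2100)
X = rg.random((7, 25))
Xdf = pd.DataFrame(X)

summary = Xdf.describe()

# Column means agree with NumPy's mean over axis=0
means_match = np.allclose(summary.loc['mean'], X.mean(axis=0))

# pandas uses ddof=1 (sample std); NumPy must be told to do the same
std_match_ddof1 = np.allclose(summary.loc['std'], X.std(axis=0, ddof=1))
std_match_ddof0 = np.allclose(summary.loc['std'], X.std(axis=0))

print(means_match, std_match_ddof1, std_match_ddof0)
```

With only 7 numbers per column, the `ddof=0` version is smaller by a factor of about √(6/7), which is why it fails the comparison.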

## Visualization

A boxplot can help visualize the CENTRAL behavior AND the VARIATION of the COLUMNS in a DataFrame!

Let's now use a boxplot to understand the variability in the samples across the REPLICATIONS.

Remember, each COLUMN in `Xdf` is one REPLICATION of generating 7 random numbers!

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

Xdf.boxplot(ax=ax)

plt.show()
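Each box spans the first to third quartile (Q1 to Q3) of a column, and the line inside it is the median. By default, matplotlib (which pandas uses to draw the plot) extends the whiskers to the furthest data points that lie within 1.5 × IQR of the box. A sketch computing those numbers for every replication with `DataFrame.quantile`; both libraries default to linear interpolation between data points, so these values should line up with the drawn boxes.

```python
import numpy as np
import pandas as pd

rg = np.random.default_rng(2100)
Xdf = pd.DataFrame(rg.random((7, 25)))

# Rows: 25th percentile (Q1), median, 75th percentile (Q3); one column per replication
quartiles = Xdf.quantile([0.25, 0.5, 0.75])
q1, med, q3 = quartiles.loc[0.25], quartiles.loc[0.5], quartiles.loc[0.75]
iqr = q3 - q1  # interquartile range = height of each box

# Whisker limits before snapping to actual data points (matplotlib default whis=1.5)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

box_summary = pd.DataFrame({'Q1': q1, 'median': med, 'Q3': q3, 'IQR': iqr,
                            'lower_fence': lower_fence, 'upper_fence': upper_fence})
print(box_summary.head())
```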

But the horizontal green line inside each box is the MEDIAN, NOT the AVERAGE.

Let's include the AVERAGE in the boxplot by setting `showmeans=True`!

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

Xdf.boxplot(ax=ax, showmeans=True)

plt.show()
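The triangle marks each column's mean and the line marks its median. With only 7 numbers per replication these can differ noticeably. A quick way to measure the gap per replication:

```python
import numpy as np
import pandas as pd

rg = np.random.default_rng(2100)
Xdf = pd.DataFrame(rg.random((7, 25)))

# One row per replication: its average (triangle) and its median (line)
comparison = pd.DataFrame({'mean': Xdf.mean(), 'median': Xdf.median()})
comparison['mean_minus_median'] = comparison['mean'] - comparison['median']

# Largest absolute gap between the triangle and the line across the 25 boxes
print(comparison['mean_minus_median'].abs().max())
```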

Let's change the GREEN triangle marking the average to a RED triangle so the sample average is easier to see.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

Xdf.boxplot(ax=ax, showmeans=True, meanprops={'markerfacecolor': 'red', 'markeredgecolor': 'red'})

plt.show()
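The same `meanprops` dictionary is repeated in every boxplot from here on, so one option is to define it once. The helper `boxplot_with_means` below is a hypothetical convenience function, not part of pandas; it only wraps `DataFrame.boxplot` with the arguments used above. The `Agg` backend is selected only so the sketch runs without a display. In the notebook itself, drop the `matplotlib.use` line and call `plt.show()` as before.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Red triangle for the sample average, defined once and reused
MEAN_PROPS = {'markerfacecolor': 'red', 'markeredgecolor': 'red'}

def boxplot_with_means(df, figsize=(12, 6)):
    """Hypothetical helper: boxplot of every column with its mean marked in red."""
    fig, ax = plt.subplots(figsize=figsize)
    df.boxplot(ax=ax, showmeans=True, meanprops=MEAN_PROPS)
    return fig, ax

rg = np.random.default_rng(2100)
fig, ax = boxplot_with_means(pd.DataFrame(rg.random((7, 25))))
```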

However, our simulations used far more than just 25 replications.

We saw that when thousands of replications were used, the SIMULATION behaved like the theory.

The CLT simulation, replicated thousands of times, produced an approximately GAUSSIAN (BELL CURVE) distribution for the sample averages!

Let's now use 500 replications to see the variation in the sample average of 7 random numbers!

In [None]:
rg = np.random.default_rng(2100)

In [None]:
Xdf_more_reps = pd.DataFrame( rg.random( (7, 500) ) )

In [None]:
Xdf_more_reps.shape

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

Xdf_more_reps.boxplot(ax=ax, showmeans=True, meanprops={'markerfacecolor': 'red', 'markeredgecolor': 'red'})

plt.show()
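The boxplot shows 500 separate boxes, but the CLT statement is about the distribution of the 500 sample averages themselves. One way to look at that directly, as a sketch, is to average each column and draw a histogram of those 500 values. The choice of 25 bins is arbitrary, and the `Agg` line is only there so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rg = np.random.default_rng(2100)
Xdf_more_reps = pd.DataFrame(rg.random((7, 500)))

# One sample average (of 7 numbers) per replication -> 500 values
sample_averages = Xdf_more_reps.mean()

fig, ax = plt.subplots(figsize=(8, 5))
counts, bin_edges, _ = ax.hist(sample_averages, bins=25)
ax.set_xlabel('sample average of 7 uniform random numbers')
ax.set_ylabel('count')
```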

What happens if we use FEWER random numbers to calculate the sample average?

Let's generate 3 random numbers per replication and replicate 500 times.
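Theory gives a prediction for this question. A Uniform(0, 1) random number has variance 1/12, and the average of N independent draws has variance σ²/N, so the spread of the sample average should shrink like 1/√N:

$$
\sigma^2 = \frac{1}{12}, \qquad
\operatorname{SD}(\bar{X}_N) = \frac{\sigma}{\sqrt{N}} = \sqrt{\frac{1}{12N}}
$$

$$
N = 3:\ \sqrt{\tfrac{1}{36}} \approx 0.167, \qquad
N = 7:\ \sqrt{\tfrac{1}{84}} \approx 0.109, \qquad
N = 700:\ \sqrt{\tfrac{1}{8400}} \approx 0.0109
$$

So the averages of 3 numbers should spread roughly 1.5 times as wide as the averages of 7 numbers.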

In [None]:
rg = np.random.default_rng(2100)

In [None]:
X_N003_df = pd.DataFrame( rg.random( (3, 500) ) )

In [None]:
X_N003_df.shape

In [None]:
type( X_N003_df )

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

X_N003_df.boxplot(ax=ax, showmeans=True, meanprops={'markerfacecolor': 'red', 'markeredgecolor': 'red'})

plt.show()

What if we generated many more random numbers per replication?

What if each sample average were calculated from 700 random numbers?

In [None]:
rg = np.random.default_rng(2100)

In [None]:
X_N700_df = pd.DataFrame( rg.random( (700, 500)) )

In [None]:
X_N700_df.shape

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

X_N700_df.boxplot(ax=ax, showmeans=True, meanprops={'markerfacecolor': 'red', 'markeredgecolor': 'red'})

plt.show()
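To check the boxplots against theory numerically, one can compute the standard deviation of the 500 sample averages for each sample size N and compare it with σ/√N, where σ² = 1/12 for a Uniform(0, 1) random number. A sketch that reuses the seed 2100 for each sample size, as the cells above do:

```python
import numpy as np
import pandas as pd

n_reps = 500
sigma = np.sqrt(1 / 12)  # standard deviation of a Uniform(0, 1) random number

rows = []
for n in (3, 7, 700):
    rg = np.random.default_rng(2100)        # same seed as in the cells above
    df = pd.DataFrame(rg.random((n, n_reps)))
    averages = df.mean()                    # 500 sample averages, each from n numbers
    rows.append({'N': n,
                 'observed_sd': averages.std(),          # pandas default ddof=1
                 'theoretical_sd': sigma / np.sqrt(n)})

se_table = pd.DataFrame(rows).set_index('N')
se_table['ratio'] = se_table['observed_sd'] / se_table['theoretical_sd']
print(se_table)
```

A `ratio` close to 1 for every N means the observed spread of the averages matches σ/√N, and the `observed_sd` column shrinks as N grows, mirroring how the boxes tighten around 0.5.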

In [None]:
X_N700_df.describe()