# Video: Calculating Basic Statistics with Python

In this video, you will learn how to compute means and standard deviations in Python.

## Calculating Basic Statistics with Python

* $X = [x_1, x_2, \ldots, x_n]$
* $\mathrm{mean}(X) = \mu_X = \bar{X}$
* $\mathrm{stdev}(X) = \sigma_X$

## Calculating the Mean with Python

* $X = [x_1, x_2, \ldots, x_n]$
* $\mathrm{mean}(X) = \mu_X = \bar{X} = \frac{\sum_i x_i}{|X|}$

In [None]:
X = [1, 2, 3, 4, 5]

In [None]:
1 + 2 + 3 + 4 + 5

15

In [None]:
sum(X)

15

In [None]:
len(X)

5

In [None]:
sum(X) / len(X)

3.0
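The steps above can be collected into a small reusable function. This is a minimal sketch: the name `mean` and the empty-input check are illustrative choices, not part of the video. The standard library's `statistics.mean` is used only as a cross-check.

```python
import statistics


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        # The formula divides by |X|, so an empty input has no mean.
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


X = [1, 2, 3, 4, 5]
result = mean(X)
print(result)  # 3.0

# Cross-check against the standard library implementation.
print(result == statistics.mean(X))  # True
```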

## Using NumPy to Calculate the Mean

Libraries give us code for common tasks that is:

* Pre-written
* Pre-tested
* Often faster

NumPy is the first library we will use for these common tasks.

In [None]:
import numpy as np

In [None]:
X

[1, 2, 3, 4, 5]

In [None]:
X_np = np.array(X)

In [None]:
X_np

array([1, 2, 3, 4, 5])

In [None]:
X_np.mean()

np.float64(3.0)

In [None]:
np.mean(X_np)

np.float64(3.0)

In [None]:
np.mean(X)

np.float64(3.0)
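One way to probe the "often faster" claim is to time both approaches on a larger input. The sketch below assumes a list of 200,000 integers and 20 repetitions, both arbitrary choices; the helper names `plain_mean` and `numpy_mean` are hypothetical. Actual timings depend on the machine, and for very short lists such as `X` the overhead of calling into NumPy can outweigh any gain.

```python
import timeit

import numpy as np

# A larger input than the five-element X, so the timing is measurable.
values = list(range(200_000))
values_np = np.array(values)


def plain_mean():
    # Pure Python: built-in sum() and len() on a list.
    return sum(values) / len(values)


def numpy_mean():
    # NumPy: mean() on an array that has already been converted.
    return values_np.mean()


plain_seconds = timeit.timeit(plain_mean, number=20)
numpy_seconds = timeit.timeit(numpy_mean, number=20)

print(f"plain Python: {plain_seconds:.4f} s for 20 runs")
print(f"NumPy:        {numpy_seconds:.4f} s for 20 runs")
```

Both functions return the same mean; only the time taken differs.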

## Using Pandas to Calculate the Mean

* pandas is a library for data management and visualization.
* We will use pandas to load data in module 1.
* More advanced pandas usage comes in module 2.

In [None]:
import pandas as pd

In [None]:
X_pd = pd.DataFrame(data={"X": X})

In [None]:
X_pd

| | X |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |


In [None]:
X_pd.mean()

| | |
|---|---|
| X | 3.0 |


In [None]:
np.mean(X_pd)

np.float64(3.0)

## Calculating the Standard Deviation with Python

$\mathrm{stdev}(X) = \sigma_X = \sqrt{\frac{\sum_i (x_i - \mu_X)^2}{|X|-1}}$


In [None]:
X

[1, 2, 3, 4, 5]

In [None]:
np.mean(X)

np.float64(3.0)

In [None]:
[x - 3 for x in X]

[-2, -1, 0, 1, 2]

In [None]:
[(x - 3) ** 2 for x in X]

[4, 1, 0, 1, 4]

In [None]:
sum([(x - 3) ** 2 for x in X])

10

In [None]:
sum([(x - 3) ** 2 for x in X]) / 4

2.5

In [None]:
import math

In [None]:
math.sqrt(sum([(x - 3) ** 2 for x in X]) / 4)

1.5811388300841898
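The step-by-step calculation above hard-codes the mean (3) and the divisor (4). A minimal sketch of the same formula as a general function follows; the name `sample_stdev` is hypothetical, and `statistics.stdev` from the standard library serves as a cross-check.

```python
import math
import statistics


def sample_stdev(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        # The divisor |X| - 1 would be zero or negative.
        raise ValueError("sample_stdev requires at least two values")
    mu = sum(values) / n
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


X = [1, 2, 3, 4, 5]
result = sample_stdev(X)
print(result)

# Cross-check against the standard library implementation.
print(math.isclose(result, statistics.stdev(X)))  # True
```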

In [None]:
np.std(X, ddof=1)

np.float64(1.5811388300841898)

In [None]:
np.std(X)

np.float64(1.4142135623730951)
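The two results differ because `np.std` divides by $|X| - \mathrm{ddof}$ and `ddof` defaults to 0 in NumPy, which gives the population standard deviation rather than the sample formula above. The sketch below reproduces both values by hand; the variable names are illustrative.

```python
import math

import numpy as np

X = [1, 2, 3, 4, 5]
n = len(X)
mu = sum(X) / n
sum_of_squares = sum((x - mu) ** 2 for x in X)

# ddof=0 (NumPy's default): divide by n.
population_sd = math.sqrt(sum_of_squares / n)

# ddof=1: divide by n - 1, matching the formula in this section.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(np.isclose(np.std(X), population_sd))      # True
print(np.isclose(np.std(X, ddof=1), sample_sd))  # True
```

Note that pandas uses `ddof=1` by default in `std()`, so `X_pd.std()` gives the sample value without the extra argument.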

In [None]:
X_np.std(ddof=1)

np.float64(1.5811388300841898)

In [None]:
X_pd.std(ddof=1)

| | |
|---|---|
| X | 1.581139 |


In [None]:
X_pd.describe()

| | X |
|---|---|
| count | 5.0 |
| mean | 3.0 |
| std | 1.581139 |
| min | 1.0 |
| 25% | 2.0 |
| 50% | 3.0 |
| 75% | 4.0 |
| max | 5.0 |
