# Chapter 3 - Descriptive and Inferential Statistics

Statistics is at the heart of many data-driven innovations. Machine learning is itself a statistical tool: it searches for hypotheses that relate the different variables in data.

## What Is Data?

Data can seem like the source of not just truth but intelligence. It is the fuel for artificial intelligence,
and a common belief holds that the more data you have, the more truth you have, so
you can never have enough of it. But data is not important in itself. It is the
analysis of data (and how the data was produced) that drives all these innovations and
solutions.

Always ask how the data was obtained, and then scrutinize
how that process could have biased the data.

## Descriptive Statistics

### Mean and Weighted Mean

The mean is the average of a set of values. The operation is simple to do: sum the
values and divide by the number of values. The mean is useful because it shows where
the “center of gravity” exists for an observed set of values.
The mean is calculated the same way for both populations and samples.

#### Calculating mean in Python

```python
samples = [1, 3, 2, 5, 0, 7, 2, 3]

# Sum the values and divide by how many there are
mean = sum(samples) / len(samples)
print(mean)  # 2.875
```

```
2.875
```
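One way to make the “center of gravity” idea concrete is to check that the deviations from the mean cancel out. The sketch below reuses the same `samples` list; the `deviations` name is only illustrative, and the cross-check uses `statistics.mean` from the Python standard library.

```python
import statistics

samples = [1, 3, 2, 5, 0, 7, 2, 3]
mean = sum(samples) / len(samples)

# The standard library gives the same value as the manual calculation
stdlib_mean = statistics.mean(samples)

# Distance of each value from the mean (negative means below the mean)
deviations = [x - mean for x in samples]

# Values below the mean balance the values above it,
# so the deviations sum to zero (up to floating-point rounding)
balance = sum(deviations)

print(stdlib_mean)
print(balance)
```

With other data the sum may come out as a tiny nonzero number because of floating-point rounding, so a comparison such as `math.isclose(balance, 0, abs_tol=1e-9)` is safer than testing for exact equality.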


There are two versions of the mean you will see: the sample mean $\bar{x}$ and the population mean μ, as expressed here:

$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \displaystyle\sum_{i=1}^{n} \frac{x_i}{n}$

$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \displaystyle\sum_{i=1}^{N} \frac{x_i}{N}$

The ***n*** and the ***N*** represent the sample and population size, respectively, but mathematically they
represent the same thing: the number of items. 

The same goes for calling the sample mean $\bar{x}$ (“x-bar”) and the population mean μ (“mu”). Both $\bar{x}$ and μ are the same calculation, just with different names depending on whether we are working with a sample or a population.
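As a rough illustration that the sample mean and the population mean are the same calculation, the sketch below applies one helper to both a population and a sample drawn from it. Everything here is an assumption made for the example: the population is synthetic (10,000 values from `random.gauss`), the sample size of 100 is arbitrary, and the helper name `mean` is illustrative.

```python
import random

random.seed(0)  # fixed seed so reruns draw the same values

# Hypothetical population: 10,000 synthetic values centered near 50
population = [random.gauss(50, 10) for _ in range(10_000)]

# A simple random sample of 100 items, drawn without replacement
sample = random.sample(population, 100)


def mean(values):
    """Sum the values and divide by the number of values."""
    return sum(values) / len(values)


mu = mean(population)  # population mean: divides by N = 10,000
x_bar = mean(sample)   # sample mean: divides by n = 100

print(f"mu    = {mu:.2f}")
print(f"x_bar = {x_bar:.2f}")
```

The two results differ only because they are computed over different sets of items. With a random sample, the sample mean usually lands close to the population mean, though not exactly on it.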

The mean we commonly use gives equal importance to each value. We can instead give each item a different weight, which produces a weighted mean:

weighted mean = $\frac{(x_1 \cdot w_1) + (x_2 \cdot w_2) + (x_3 \cdot w_3) + \dots + (x_n \cdot w_n)}{w_1 + w_2 + w_3 + \dots + w_n}$

#### Calculating a weighted mean in Python

```python
# Three exams of .20 weight each and final exam of .40 weight
sample = [90, 80, 63, 87]
weights = [.2, .2, .2, .4]

# Multiply each score by its weight, then divide by the total weight
weighted_mean = sum(s * w for s, w in zip(sample, weights)) / sum(weights)

print(weighted_mean)
```

```
81.4
```
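Because the formula divides by the sum of the weights, the weights do not have to add up to 1; only their proportions matter. The sketch below illustrates this with a small helper. The function name `weighted_mean` and its input checks are assumptions for the example, not part of the original notebook.

```python
def weighted_mean(values, weights):
    """Return sum(x_i * w_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


scores = [90, 80, 63, 87]

# Integer weights 1:1:1:2 keep the same proportions as .2/.2/.2/.4
proportional = weighted_mean(scores, [1, 1, 1, 2])

# With equal weights, the weighted mean reduces to the ordinary mean
equal = weighted_mean(scores, [1, 1, 1, 1])
plain = sum(scores) / len(scores)

print(proportional)
print(equal, plain)
```

Here the final exam counts twice as much as each of the other exams, which is the same relationship the fractional weights express, so `proportional` matches the result above.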
