# Statistics Refresher

![Grand-average-event-related-potential-ERP-across-all-participants-n13-under-the-646341849.png](attachment:Grand-average-event-related-potential-ERP-across-all-participants-n13-under-the-646341849.png)

## Descriptive Statistics
Descriptive statistics are statistics (quantities or measures) that describe the values in a data set.

## Mean, Median and Mode

Mean, median, and mode are measures of [central tendency](https://en.wikipedia.org/wiki/Central_tendency), that is, a central or typical value for a probability distribution.

For example: If someone asks you how long it takes you to get to school and you say, "about 20 minutes," 20 minutes is the central tendency of all of your trips to school.

## Mean
The [mean](https://en.wikipedia.org/wiki/Mean), usually referred to as the arithmetic mean, is one of the basic concepts in statistics. It represents a central value of a finite set of numbers. The mean is the sum of all the numbers divided by how many numbers there are, i.e.
$$\mu = \frac{\sum_{i=1}^{n}x_{i}}{n}$$

In [None]:
from statistics import mean

response_times = [403, 457, 403, 423, 395, 344, 403, 497]

mean_value = mean(response_times)
print('The mean response time is {:.2f} ms'.format(mean_value))
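To connect the formula to the code, here is a minimal sketch that computes the same mean by hand: add up the values, then divide by how many there are. The variable names `total`, `n`, and `manual_mean` are only for illustration.

```python
from statistics import mean

response_times = [403, 457, 403, 423, 395, 344, 403, 497]

# Numerator of the formula: the sum of all values
total = sum(response_times)

# Denominator of the formula: the number of values
n = len(response_times)

manual_mean = total / n
print('Sum = {}, n = {}, mean = {:.2f} ms'.format(total, n, manual_mean))

# The hand-computed value matches the library function
print(manual_mean == mean(response_times))
```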

## Median
The [median](https://en.wikipedia.org/wiki/Median) is the number separating the higher half of a data set from the lower half. It is often thought of as the midpoint value of a data set. If the data set has an odd number of values, the middle one of the sorted values is selected; if it has an even number of values, the median is usually defined as the average of the two middle values.
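The odd and even cases can be sketched in a few lines. This is an illustrative implementation, not the one used inside the `statistics` module; the function name `manual_median` and the example values are made up for this demonstration.

```python
from statistics import median

def manual_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # the median is defined on sorted data
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: take the single middle value
        return ordered[middle]
    # Even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = [5, 1, 9]        # sorted: 1, 5, 9
even_example = [5, 1, 9, 3]    # sorted: 1, 3, 5, 9

print(manual_median(odd_example), median(odd_example))
print(manual_median(even_example), median(even_example))
```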

The median is a better measure of the typical value of a dataset when outliers may cause the mean to not be representative of those data. For example:

There are five people at a party. Their net worths in thousands of dollars are:
34, 67, 123, 64, 78

Let's calculate the mean:

In [None]:
net_worths = [34, 67, 123, 64, 78]

mean_value = mean(net_worths)
print('The mean net worth is {:.2f} thousand dollars'.format(mean_value))

Now Bill Gates walks into the room. How does that affect the mean net worth of everyone at the party?

In [None]:
net_worths = [34, 67, 123, 64, 78, 130_000_000]

mean_value = mean(net_worths)
print('The mean net worth is {:,.2f} thousand dollars'.format(mean_value))

The mean is now over 21 million thousand dollars, which doesn't describe anyone in the room, so it isn't a good descriptive value for these data. To remedy this, we can use the median, the center value.

In [None]:
from statistics import median

net_worths = [34, 67, 123, 64, 78, 130_000_000]

median_value = median(net_worths)
print('The median net worth is {:,.2f} thousand dollars'.format(median_value))

If we sort the net worths, we can see that the median of 72.5 thousand dollars ($72,500) lies midway between the two middle values, 67 and 78, and is therefore a good measure of central tendency. Bill Gates, whose net worth is an outlier, barely affects the median value.

In [None]:
net_worths.sort()
net_worths

## Mode
The [mode](https://en.wikipedia.org/wiki/Mode_%28statistics%29) is the value that appears most often in a data set. A data set is said to be multimodal if more than one value ties for the highest count. In the extreme case where no number in the set appears more than once, every number in the set is a valid mode.

If we use the response times from above again, we can find the mode of the dataset.

In [None]:
from statistics import mode

response_times = [403, 457, 403, 423, 395, 344, 403, 497]

mode_value = mode(response_times)
print('The mode of the response times is {:.2f} ms'.format(mode_value))
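`mode` returns a single value, which hides ties. As a small sketch, `statistics.multimode` (available in Python 3.8 and later) returns every value that shares the highest count. The lists `tied_times` and `unique_times` are made-up values for illustration only.

```python
from statistics import multimode

# Made-up example: 403 and 457 each appear twice, so both are modes
tied_times = [403, 457, 403, 457, 395]
print(multimode(tied_times))

# Made-up example: no value repeats, so every value is returned as a mode
unique_times = [344, 395, 423]
print(multimode(unique_times))

# The response times used earlier have a single mode
response_times = [403, 457, 403, 423, 395, 344, 403, 497]
print(multimode(response_times))
```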

# 🧠 Thoughts

Look at the image at the top of this page.
The solid green line, for example, is a representation of all of the dashed green lines. At each timepoint the solid green line is a descriptive statistic of the dashed green lines. 

Tip: highlight the question below, press `b` to create a cell **b**elow, press `m` to make it a **m**arkdown cell, then press `Enter` to edit the cell. Type your thoughts there. When you're finished, press `Ctrl+Enter`, move down to the next cell, and do the same.

🤔 Which of the three descriptive statistics we've looked at (mean, median, mode) would be appropriate to create the solid green line?

🤔 In what situations would the other two descriptive statistics be more appropriate?

# Summary

This lesson introduced three measures of central tendency. Their pros and cons can be summarized as follows:

Measures | PROs | CONs
-------- | ---- | ----
Mean | All values are taken into account in the calculation. | Extreme values can strongly affect the result.
Median | Not affected by extreme values. | If the number of data points is even, the median may not be a value in the data; it is the average of the two center values.
Mode | Applicable to non-numeric data. | There can be several modes, or no value may repeat at all, so the mode does not always represent the data well.
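
The rows of the table can be checked with a short sketch. All of the values below (`values`, `with_outlier`, `responses`) are hypothetical and chosen only to illustrate each row.

```python
from statistics import mean, median, mode

# Hypothetical numeric data, with and without one extreme value
values = [10, 12, 12, 13, 15]
with_outlier = values + [1000]

# The mean moves a lot when the extreme value is added
print('mean:   {:.2f} -> {:.2f}'.format(mean(values), mean(with_outlier)))

# The median barely moves; with an even count it is the average
# of the two center values and need not appear in the data
print('median: {:.2f} -> {:.2f}'.format(median(values), median(with_outlier)))

# The mode also works on non-numeric data
responses = ['left', 'right', 'left', 'left', 'right']
print('mode:  ', mode(responses))
```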