## What are the 3 main types of descriptive statistics?
The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

#### Distribution refers to the frequencies of different responses.
#### Measures of central tendency give you the average for each response.
#### Measures of variability show you the spread or dispersion of your dataset.

reference = https://www.scribbr.com/frequently-asked-questions/main-types-of-descriptive-statistics/

## Measures of central tendency 

Measures of central tendency help you find the middle, or the average, of a data set. The 3 most common measures of central tendency are the mode, median, and mean.
  
Mode: the most frequent value.  
Median: the middle number in an ordered data set.  
Mean: the sum of all values divided by the total number of values.  
In addition to central tendency, the variability and distribution of your data set are important to understand when performing descriptive statistics.  
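The three definitions above can be tried out with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is a hypothetical data set invented for illustration and does not come from the text.

```python
from statistics import mean, median, mode

# Hypothetical scores, chosen only for illustration (not from the text).
scores = [2, 3, 3, 5, 7]

most_frequent = mode(scores)   # the value that occurs most often
middle = median(scores)        # the middle value of the ordered data
average = mean(scores)         # the sum of all values divided by their count

print(most_frequent, middle, average)
```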

## Distributions and central tendency
A data set is a distribution of n scores or values.

### Normal distribution
In a normal distribution, data is symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center. The mean, mode and median are exactly the same in a normal distribution.

Example: Normal distribution

You survey a sample in your local community on the number of books they read in the last year. A histogram of your data shows the frequency of responses for each possible number of books. From looking at the chart, you see that there is a normal distribution.

![Normal_Distribution.svg](attachment:Normal_Distribution.svg)

The mean, median and mode are all equal; the central tendency of this data set is 8.
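As a rough check of this claim, the sketch below builds a symmetric frequency table and computes all three measures. The counts in `books_to_count` are hypothetical: they were invented to be symmetric around 8 and are not the survey data behind the histogram.

```python
from statistics import mean, median, mode

# Hypothetical survey: number of books read -> number of respondents.
# The counts are invented to be symmetric around 8; they are not the source data.
books_to_count = {5: 1, 6: 2, 7: 4, 8: 6, 9: 4, 10: 2, 11: 1}

# Expand the frequency table into one value per respondent.
responses = [books for books, count in books_to_count.items() for _ in range(count)]

# For perfectly symmetric, single-peaked data the three measures coincide.
center = (mean(responses), median(responses), mode(responses))
print(center)
```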



### Skewed distributions

In skewed distributions, more values fall on one side of the center than the other, and the mean, median and mode all differ from each other. One side has a longer, more spread-out tail with fewer scores. The direction of this tail tells you the direction of the skew.

###### In a positively skewed distribution,
there’s a cluster of lower scores and a spread out tail on the right.
###### In a negatively skewed distribution,
there’s a cluster of higher scores and a spread out tail on the left.

In this histogram, your distribution is skewed to the right, and the central tendency of your data set is on the lower end of possible scores.

In a positively skewed distribution, mode < median < mean.

![Positive_Skew_Distribution.svg](attachment:Positive_Skew_Distribution.svg)

In this histogram, your distribution is skewed to the left, and the central tendency of your data set is towards the higher end of possible scores.
In a negatively skewed distribution, mean < median < mode.

![Negative_Skew_Distribution.svg](attachment:Negative_Skew_Distribution.svg)
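The two orderings can be illustrated with a small sketch. The `right_skewed` values are hypothetical and were chosen to have a long right tail; `left_skewed` is simply their mirror image. The ordering shown is the typical pattern described above, not something this code proves for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical scores with a cluster of low values and a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

# Mirror the data to get a cluster of high values and a long left tail.
left_skewed = [13 - x for x in right_skewed]


def summarize(values):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(values), median(values), mean(values)


pos_mode, pos_median, pos_mean = summarize(right_skewed)
neg_mode, neg_median, neg_mean = summarize(left_skewed)

print(pos_mode < pos_median < pos_mean)  # positively skewed ordering
print(neg_mean < neg_median < neg_mode)  # negatively skewed ordering
```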

### Mode
The mode is the most frequently occurring value in the data set. It’s possible to have no mode, one mode, or more than one mode.

To find the mode, sort your data set numerically or categorically and select the response that occurs most frequently.

Example: Finding the mode

In a survey, you ask 9 participants whether they identify as conservative, moderate, or liberal. To find the mode, sort your data by category and find which response was chosen most frequently.

To make it easier, you can create a frequency table to count up the values for each category.


| Political ideology | Frequency |
|---|---|
| Conservative | 2 |
| Moderate | 3 |
| Liberal | 4 |

Mode: Liberal  
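The same frequency table can be built with `collections.Counter` from the standard library. This is a minimal sketch: the `responses` list reconstructs the 9 answers from the counts in the table, and the order of the individual answers is an assumption.

```python
from collections import Counter

# Rebuild the 9 survey responses from the frequency table
# (the order of the individual answers is assumed).
responses = ["Conservative"] * 2 + ["Moderate"] * 3 + ["Liberal"] * 4

# Count how often each category occurs.
frequency_table = Counter(responses)

# most_common(1) returns a list holding the single most frequent (category, count) pair.
mode_category, mode_count = frequency_table.most_common(1)[0]

print(frequency_table)
print(mode_category, mode_count)
```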

###### When to use the mode
The mode is most applicable to data from a nominal level of measurement. Nominal data is classified into mutually exclusive categories, so the mode tells you the most popular category.

For continuous variables or ratio levels of measurement, the mode may not be a helpful measure of central tendency. That’s because there are many more possible values than there are in a nominal or ordinal level of measurement. It’s unlikely for a value to repeat in a ratio level of measurement.
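A short sketch of why the mode says little about ratio-level data, using `statistics.multimode`, which returns every value tied for the highest count. The reaction times are illustrative millisecond values with no repeats, and the `responses` list is a hypothetical nominal data set with a two-way tie.

```python
from statistics import multimode

# Illustrative reaction times in milliseconds; every value is unique.
reaction_times = [287, 298, 345, 365, 380]

# With no repeated values, every value ties for "most frequent",
# so the result singles out no typical value.
modes_continuous = multimode(reaction_times)

# Hypothetical nominal responses with a two-way tie (not from the text).
responses = ["Liberal", "Moderate", "Liberal", "Moderate", "Conservative"]
modes_nominal = multimode(responses)

print(modes_continuous)
print(modes_nominal)
```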

### Median
The median of a data set is the value that’s exactly in the middle when it is ordered from low to high.

Example: Finding the median

You measure the reaction times of 7 participants on a computer task and categorize them into 3 groups: slow, medium or fast.

| Participant | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Speed | Medium | Slow | Fast | Fast | Medium | Fast | Slow |

To find the median, you first order all values from low to high. Then, you find the value in the middle of the ordered data set – in this case, the value in the 4th position.

| Ordered data set | Slow | Slow | Medium | Medium | Fast | Fast | Fast |
|---|---|---|---|---|---|---|---|

Median: Medium
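Ordinal categories have no built-in numeric order in Python, so one way to sketch this example is to assign each category a rank and sort by it. The `rank` mapping is an assumption introduced only for sorting; the ranks carry no meaning beyond low-to-high order.

```python
# Rank each ordinal category so the data can be ordered from low to high.
# The numeric ranks are an assumption used for sorting only.
rank = {"Slow": 0, "Medium": 1, "Fast": 2}

speeds = ["Medium", "Slow", "Fast", "Fast", "Medium", "Fast", "Slow"]
ordered = sorted(speeds, key=rank.get)

# With an odd number of values, the median is at position (n + 1) / 2,
# which is index n // 2 when counting from zero.
median_speed = ordered[len(ordered) // 2]

print(ordered)
print(median_speed)
```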

In larger data sets, it’s easier to use simple formulas to figure out the position of the middle value in the distribution. You use different methods to find the median of a data set depending on whether the total number of values is even or odd.

###### Median of an odd-numbered data set
For an odd-numbered data set, find the value that lies at the (n+1)/2 position, where n is the number of values in the data set.

Example

You measure the reaction times in milliseconds of 5 participants and order the data set.

| Reaction time (milliseconds) | 287 | 298 | 345 | 365 | 380 |
|---|---|---|---|---|---|

The middle position is calculated using (n+1)/2, where n = 5.

(5+1)/2 = 3

That means the median is the 3rd value in your ordered data set.

Median: 345 milliseconds

###### Median of an even-numbered data set
For an even-numbered data set, find the two values in the middle of the data set: the values at the n/2 and (n/2) + 1 positions. Then, find their mean.

Example

You measure the reaction times of 6 participants and order the data set.

| Reaction time (milliseconds) | 287 | 298 | 345 | 357 | 365 | 380 |
|---|---|---|---|---|---|---|

The middle positions are calculated using n/2 and (n/2) + 1, where n = 6.

6/2 = 3

(6/2) + 1 = 4

That means the middle values are the 3rd value, which is 345, and the 4th value, which is 357.

To get the median, take the mean of the 2 middle values by adding them together and dividing by two.

(345 + 357)/2 = 351

Median: 351 milliseconds
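The odd and even position rules can be combined into one small function. This is a sketch; `median_by_position` is a hypothetical helper name, and in practice `statistics.median` from the standard library gives the same results.

```python
def median_by_position(values):
    """Return the median using the (n+1)/2 rule (odd n) or the n/2 and (n/2)+1 rule (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the median is the value at position (n + 1) / 2 (1-based).
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: average the values at positions n/2 and (n/2) + 1 (1-based).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_median = median_by_position([287, 298, 345, 365, 380])
even_median = median_by_position([287, 298, 345, 357, 365, 380])

print(odd_median, even_median)
```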

### Mean
The arithmetic mean of a data set is the sum of all values divided by the total number of values. It’s the most commonly used measure of central tendency because all values are used in the calculation.

Example: Finding the mean

| Participant | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Reaction time (milliseconds) | 287 | 345 | 365 | 298 | 380 |

First, you add up all the values:

∑x = 287 + 345 + 365 + 298 + 380 = 1675

Then you calculate the mean using the formula ∑x/n. There are 5 values in the data set, so n = 5.

Mean (x̄) = 1675/5 = 335

Mean: 335 milliseconds
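The same two steps, summing the values and dividing by their count, can be written directly in Python. This is a minimal sketch using the reaction times from the example; the variable names are illustrative.

```python
# Reaction times in milliseconds from the example.
reaction_times = [287, 345, 365, 298, 380]

total = sum(reaction_times)   # sum of all values
n = len(reaction_times)       # number of values
sample_mean = total / n       # sum divided by count

print(total, n, sample_mean)
```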

###### Outlier effect on the mean
An outlier is a value that differs significantly from the others in a data set. Because all values are used to calculate the mean, an extreme outlier can significantly increase or decrease it.

Example: Mean with an outlier

In this data set, we replace one value (287) with an extreme outlier (832).

| Participant | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Reaction time (milliseconds) | 832 | 345 | 365 | 298 | 380 |

∑x = 832 + 345 + 365 + 298 + 380 = 2220

Mean (x̄) = ∑x/n = 2220/5 = 444

Due to the outlier, the mean becomes much higher, even though all the other numbers in the data set stay the same.

Mean: 444 milliseconds
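To see how strongly one outlier moves the mean, the sketch below compares the two data sets from the examples. Computing the median alongside it is an addition for comparison only; it suggests the median shifts far less in this particular case.

```python
from statistics import mean, median

# Reaction times in milliseconds from the two examples.
original = [287, 345, 365, 298, 380]
with_outlier = [832, 345, 365, 298, 380]

# How far each measure moves when 287 is replaced by 832.
mean_shift = mean(with_outlier) - mean(original)
median_shift = median(with_outlier) - median(original)

print(mean_shift, median_shift)
```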

###### Population versus sample mean
A data set contains values from a sample or a population. A population is the entire group that you are interested in researching, while a sample is only a subset of that population.

While data from a sample can help you make estimates about a population, only full population data can give you the complete picture.

In statistics, the sample mean and the population mean have different notation and formulas, but the procedure for calculating them is the same.

Sample mean formula

The sample mean is written as M or x̄ (pronounced x-bar). For calculating the mean of a sample, use this formula:

x̄ = ∑x/n

x̄: sample mean  
∑x: sum of all values in the sample data set  
n: number of values in the sample data set  

### When should you use the mean, median or mode?
The 3 main measures of central tendency are best used in combination with each other because they have complementary strengths and limitations. But sometimes only 1 or 2 of them are applicable to your data set, depending on the level of measurement of the variable.

The mode can be used for any level of measurement, but it’s most meaningful for nominal and ordinal levels.  
The median can only be used on data that can be ordered – that is, from ordinal, interval and ratio levels of measurement.  
The mean can only be used on interval and ratio levels of measurement because it requires equal spacing between adjacent values or scores in the scale.  
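These rules can be summarized as a small lookup table. This is only a sketch: `APPLICABLE_MEASURES` and `applicable_measures` are hypothetical names, and the table lists which measures can be used at each level, not which one is most meaningful.

```python
# Which measures of central tendency can be used at each level of measurement,
# following the rules stated above.
APPLICABLE_MEASURES = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}


def applicable_measures(level):
    """Return the measures usable for a level of measurement (case-insensitive)."""
    return APPLICABLE_MEASURES[level.lower()]


print(applicable_measures("ordinal"))
```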



