# Measure of central tendency: Mean, Median, Mode

#### Topics covered:
- Calculate mean, median and mode of data from a CSV file

## Mean
- Mean = (sum of all values) / (number of values)
- Mean of 2, 4, 4, 5, 6, 7, 7, 120 is (2+4+4+5+6+7+7+120)/8 = 19.375 
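
The formula above can be checked by hand before reaching for pandas. This is a minimal sketch in plain Python; the variable names (`ages`, `manual_mean`) are illustrative and not part of the original notebook.

```python
# Mean computed by hand: the sum of the values divided by how many there are.
ages = [2, 4, 4, 5, 6, 7, 7, 120]

total = sum(ages)            # 2+4+4+5+6+7+7+120 = 155
count = len(ages)            # 8 values
manual_mean = total / count  # 155 / 8

print(manual_mean)
```

Note how the single value 120 pulls the mean well above most of the other values in the list.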

In [25]:
import pandas as pd
import numpy as np

data_file = "students_performance.csv"

In [17]:
# Let's create a DataFrame

data = {
    "age_kids": [2, 4, 4, 5, 6, 7, 7, 120]
}
df = pd.DataFrame(data)
print(df)

   age_kids
0         2
1         4
2         4
3         5
4         6
5         7
6         7
7       120


In [18]:
mean = df["age_kids"].mean()
print(mean)

19.375


In [14]:
# Let's read a CSV file and calculate the mean of the math score
df = pd.read_csv(data_file)
# print(df)

mean = df["math score"].mean()
print(mean)

66.089


## Median

```
The median is the middle value in a dataset when the values are arranged in order (ascending or descending).

If the dataset has an odd number of values → the median is the middle number.
If the dataset has an even number of values → the median is the average of the two middle numbers.

E.g. Find the median of: 4, 120, 7, 5, 7, 6, 4
Step 1: Arrange the data in ascending order: 4, 4, 5, 6, 7, 7, 120
Step 2: Take the middle value: 6

E.g. Find the median of: 4, 120, 7, 2, 5, 7, 6, 4
Step 1: Arrange the data in ascending order: 2, 4, 4, 5, 6, 7, 7, 120
Step 2: Take the mean of the two middle values: (5+6)/2 = 5.5
```
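
The two steps described above (sort, then pick the middle) can be sketched in plain Python as follows. The helper name `manual_median` is hypothetical and assumes a non-empty list of numbers; in practice `Series.median()` from pandas does the same job.

```python
def manual_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = manual_median([4, 120, 7, 5, 7, 6, 4])
even_result = manual_median([4, 120, 7, 2, 5, 7, 6, 4])
print(odd_result, even_result)
```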

In [19]:
# Let's create a DataFrame

data = {
    "age_kids": [4, 120, 7, 2, 5, 7, 6, 4]
}
df = pd.DataFrame(data)
print(df)

   age_kids
0         4
1       120
2         7
3         2
4         5
5         7
6         6
7         4


In [13]:
median = df["age_kids"].median()
print(median)

5.5


In [16]:
# Let's read a CSV file and calculate the median of the math score
df = pd.read_csv(data_file)
print(df)

median = df["math score"].median()
print(median)

     gender race/ethnicity parental level of education         lunch  \
0    female        group B           bachelor's degree      standard   
1    female        group C                some college      standard   
2    female        group B             master's degree      standard   
3      male        group A          associate's degree  free/reduced   
4      male        group C                some college      standard   
..      ...            ...                         ...           ...   
995  female        group E             master's degree      standard   
996    male        group C                 high school  free/reduced   
997  female        group C                 high school  free/reduced   
998  female        group D                some college      standard   
999  female        group D                some college  free/reduced   

    test preparation course  math score  reading score  writing score  
0                      none          72             72         

## Mode

```
In descriptive statistics, the mode is the value that appears most frequently in a dataset. Unlike the mean (average) or median (middle value), the mode tells us which item is the most common or popular.

A dataset can have one mode (unimodal), more than one mode (bimodal or multimodal), or no mode if all values occur with the same frequency.

Unimodal: [2, 2, 4, 6, 7, 7, 7, 7, 9]   Here there is only one mode, 7

Unimodal: ['L', 'L', 'XL', 'M', 'M', 'L', 'XL', 'M', 'L']   Here there is only one mode, L

Bimodal: [2, 2, 4, 6, 7, 7, 7, 7, 9, 9, 9, 9]   Here there are 2 modes, 7 and 9

The mode is well suited to categorical or discrete data, where the mean or median may not make sense.
```
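
The definition above amounts to counting each value and keeping the ones with the highest count. Here is a minimal sketch using the standard library's `collections.Counter`; the helper name `manual_modes` is hypothetical. It does not special-case the "no mode" situation: if every value is tied, every value is returned.

```python
from collections import Counter


def manual_modes(values):
    """Return a sorted list of the most frequent value(s) in a non-empty list."""
    counts = Counter(values)        # value -> number of occurrences
    highest = max(counts.values())  # the largest count seen
    return sorted(v for v, c in counts.items() if c == highest)


numeric_modes = manual_modes([2, 2, 4, 6, 7, 7, 7, 7, 9])
size_modes = manual_modes(['L', 'L', 'XL', 'M', 'M', 'L', 'XL', 'M', 'L'])
bimodal_modes = manual_modes([2, 2, 4, 6, 7, 7, 7, 7, 9, 9, 9, 9])
print(numeric_modes, size_modes, bimodal_modes)
```

Because the function returns a list, it works the same way for numbers and for strings, and a bimodal dataset simply yields two entries.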




In [46]:
# data in numbers
data = {
    "Score": [2, 2, 4, 6, 7, 7, 7, 7, 9]
}
df = pd.DataFrame(data)
# print(df)

mode = df["Score"].mode()
# print(mode)
print(mode[0])

7


In [43]:
# side note: You can also calculate the value counts
print(df["Score"].value_counts())

Score
7    4
2    2
4    1
6    1
9    1
Name: count, dtype: int64


In [32]:
# data in the form of strings
data = {
    "Size": ['L', 'L', 'XL', 'M', 'M', 'L', 'XL', 'M', 'L']
}
df = pd.DataFrame(data)
# print(df)

mode = df["Size"].mode()
print(mode)  # prints the whole Series of modes, as in the output below

0    L
Name: Size, dtype: object


In [47]:
# data has 2 modes
data = {
    "Score": [2, 2, 4, 6, 7, 7, 7, 7, 9, 9, 9, 9]
}
df = pd.DataFrame(data)
# print(df)

mode = df["Score"].mode()
print(mode)

0    7
1    9
Name: Score, dtype: int64
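
Since `mode()` returns a Series, indexing with `[0]` would report only the first mode here. One way to keep all of them is to convert the result to a list. This is a small sketch; the variable names `scores` and `modes` are illustrative.

```python
import pandas as pd

scores = pd.Series([2, 2, 4, 6, 7, 7, 7, 7, 9, 9, 9, 9], name="Score")

# Series.mode() returns every value tied for the highest count, in sorted order
modes = scores.mode().tolist()
is_multimodal = len(modes) > 1

print(modes)
print(is_multimodal)
```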


In [39]:
# Let's read a CSV file and calculate the mode of race/ethnicity (the most commonly occurring value)
df = pd.read_csv(data_file)
# print(df)

mode = df["race/ethnicity"].mode()
print(mode)

0    group C
Name: race/ethnicity, dtype: object


In [41]:
# side note: You can also calculate the value counts

print(df["race/ethnicity"].value_counts())

race/ethnicity
group C    319
group D    262
group B    190
group E    140
group A     89
Name: count, dtype: int64
