# Measures of Central Tendency
We will explore three key measures of central tendency: mean, median, and mode. These measures help us understand the **centre** or **typical** value in a collection of numbers. Each one summarises the data differently and is useful in different situations.

## Mean
The **mean** is the sum of all values in a dataset divided by the number of values, that is, $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$. This gives us a single value that represents the centre of the data. However, it can be **heavily affected by outliers**: very large or very small values that pull the mean away from where most of the data lies.
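
To see how strongly a single outlier can pull the mean, here is a minimal sketch. The scores are made up for illustration and are not taken from the dataset used later.

```python
# Hypothetical scores, made up for illustration (not taken from student_data.csv)
scores = [60, 62, 65, 68, 70]
mean_without_outlier = sum(scores) / len(scores)  # 325 / 5

# Add one very low score and recompute the mean
scores_with_outlier = scores + [5]
mean_with_outlier = sum(scores_with_outlier) / len(scores_with_outlier)  # 330 / 6

print('Mean without outlier:', mean_without_outlier)  # 65.0
print('Mean with outlier:', mean_with_outlier)        # 55.0
```

One extra value out of six moves the mean by ten points, even though the other five scores are unchanged.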

### Reading the Data
We will work with a CSV file named `'student_data.csv'`, which contains students' scores in math, reading, and writing. Let's read the dataset and store it as a dictionary before proceeding. Instead of using any external libraries, we will read the file manually using basic Python. This will give us a better understanding of how CSV files are structured and how data can be extracted using lists and dictionaries.

In [1]:
# Open the CSV file, read it, and store it as a dictionary
file_path = 'student_data.csv'

with open(file_path, mode = 'r') as file:  # opening the file in reading mode
    lines = file.readlines()  # reading all the lines in the file as a list of strings

data = [line.strip().split(',') for line in lines]  # use the commas in the lines to split the values for each line

df = {key: [int(row[i]) for row in data[1:]] for i, key in enumerate(data[0])}  # building the dictionary
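
To make the resulting structure easier to see, the same parsing steps can be tried on a tiny in-memory sample. This is only a sketch: the three rows below are made up, `parse_scores` and `sample_table` are hypothetical names, and skipping blank lines is an extra safeguard that the cell above does not have.

```python
import io

# Hypothetical CSV content, made up for illustration; the real file has many more rows
sample_text = "math score,reading score,writing score\n72,72,74\n69,90,88\n90,95,93\n"

def parse_scores(file_obj):
    """Parse a CSV of integer columns into {column name: list of ints}."""
    # Split every non-blank line on commas
    rows = [line.strip().split(',') for line in file_obj if line.strip()]
    header, body = rows[0], rows[1:]
    # One dictionary key per column, holding that column's values in row order
    return {key: [int(row[i]) for row in body] for i, key in enumerate(header)}

sample_table = parse_scores(io.StringIO(sample_text))
print(sample_table)
# {'math score': [72, 69, 90], 'reading score': [72, 90, 95], 'writing score': [74, 88, 93]}
```

Each key is a column header, and position `i` in every list belongs to the same student.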

In [2]:
# Take a look at the data
print(df)

{'math score': [72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54, 66, 65, 44, 69, 74, 73, 69, 67, 70, 62, 69, 63, 56, 40, 97, 81, 74, 50, 75, 57, 55, 58, 53, 59, 50, 65, 55, 66, 57, 82, 53, 77, 53, 88, 71, 33, 82, 52, 58, 0, 79, 39, 62, 69, 59, 67, 45, 60, 61, 39, 58, 63, 41, 61, 49, 44, 30, 80, 61, 62, 47, 49, 50, 72, 42, 73, 76, 71, 58, 73, 65, 27, 71, 43, 79, 78, 65, 63, 58, 65, 79, 68, 85, 60, 98, 58, 87, 66, 52, 70, 77, 62, 54, 51, 99, 84, 75, 78, 51, 55, 79, 91, 88, 63, 83, 87, 72, 65, 82, 51, 89, 53, 87, 75, 74, 58, 51, 70, 59, 71, 76, 59, 42, 57, 88, 22, 88, 73, 68, 100, 62, 77, 59, 54, 62, 70, 66, 60, 61, 66, 82, 75, 49, 52, 81, 96, 53, 58, 68, 67, 72, 94, 79, 63, 43, 81, 46, 71, 52, 97, 62, 46, 50, 65, 45, 65, 80, 62, 48, 77, 66, 76, 62, 77, 69, 61, 59, 55, 45, 78, 67, 65, 69, 57, 59, 74, 82, 81, 74, 58, 80, 35, 42, 60, 87, 84, 83, 34, 66, 61, 56, 87, 55, 86, 52, 45, 72, 57, 68, 88, 76, 46, 67, 92, 83, 80, 63, 64, 54, 84, 73, 80, 56, 59, 75, 85, 

### Example
Calculate the mean math score for all the students in the dataset, then repeat for the reading and writing scores.

In [3]:
# Mean math score
mean = sum(df['math score']) / len(df['math score'])
print('Mean of math scores:', mean)

Mean of math scores: 66.089


In [4]:
# Mean reading score
mean = sum(df['reading score']) / len(df['reading score'])
print('Mean of reading scores:', mean)

Mean of reading scores: 69.169


In [5]:
# Mean writing score
mean = sum(df['writing score']) / len(df['writing score'])
print('Mean of writing scores:', mean)

Mean of writing scores: 68.054


## Median
The **median** represents the **middle value** in **sorted** data. Unlike the mean, the median is barely affected by a few very high or very low values. This makes it a better measure of central tendency when the data contains outliers. The formula for the median depends on whether the number of values $n$ is odd or even. Below, $x_{(k)}$ denotes the $k$-th value in the sorted data, counting from 1.
- If the number of elements is **odd**, the median is the middle value, that is, $\text{Median} = x_{\left(\frac{n + 1}{2}\right)}$
- If the number of elements is **even**, the median is the mean of the two middle values, that is, $\text{Median} = \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2} + 1\right)}}{2}$
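
The two cases can be checked by hand on a pair of short lists. This is a minimal sketch with made-up values; note that the formulas count positions from 1, while Python indexes from 0.

```python
# Hypothetical values, made up for illustration
odd_values = sorted([7, 1, 5, 3, 9])        # [1, 3, 5, 7, 9], n = 5
even_values = sorted([7, 1, 5, 3, 9, 100])  # [1, 3, 5, 7, 9, 100], n = 6

# Odd n: position (n + 1) / 2 = 3 counting from 1, which is index 2 in Python
n = len(odd_values)
median_odd = odd_values[(n + 1) // 2 - 1]

# Even n: positions n / 2 = 3 and n / 2 + 1 = 4, which are indices 2 and 3 in Python
n = len(even_values)
median_even = (even_values[n // 2 - 1] + even_values[n // 2]) / 2

print('Median (odd n):', median_odd)    # 5
print('Median (even n):', median_even)  # 6.0
```

Adding the extreme value 100 moves the median only from 5 to 6.0, which illustrates why the median copes well with outliers.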

### Example
Calculate the median math score for all the students in the dataset, then repeat for the reading and writing scores.

In [6]:
def calculate_median(x):
    # Sort a copy of the list so that the caller's data keeps its original order
    x = sorted(x)  # The data must be sorted to compute the median

    # Count the number of elements
    n = len(x)

    # Calculate the median based on whether n is odd or even
    if n % 2 == 1:
        median = x[n // 2]  # Odd number of elements → take the middle value (position (n + 1) / 2 counting from 1, which is index n // 2 in Python)
    else:
        mid1 = x[n // 2 - 1]  # The lower of the two middle values
        mid2 = x[n // 2]  # The greater of the two middle values
        median = (mid1 + mid2) / 2   # Even number of elements → average of the two middle values

    return median

In [7]:
# Median math score
print('Median of math scores:', calculate_median(df['math score']))

Median of math scores: 66.0


In [8]:
# Median reading score
print('Median of reading scores:', calculate_median(df['reading score']))

Median of reading scores: 70.0


In [9]:
# Median writing score
print('Median of writing scores:', calculate_median(df['writing score']))

Median of writing scores: 69.0


## Mode
The **mode** is the value that appears most frequently. Unlike the mean or median, which are always single values, a dataset can have:
- **One mode** (unimodal)
- **Multiple modes** (multimodal)
- **No mode** (if all values occur only once)
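
The three cases can be told apart by counting how often each value occurs. The sketch below uses `collections.Counter` from the standard library on made-up lists; `describe_modes` is a hypothetical helper name, not part of the notebook.

```python
from collections import Counter

def describe_modes(values):
    """Return the most frequent value(s) and how many times they occur."""
    counts = Counter(values)   # maps each value to its frequency
    top = max(counts.values())  # the highest frequency in the data
    return [v for v, c in counts.items() if c == top], top

# Hypothetical lists, made up for illustration
print(describe_modes([2, 3, 3, 5]))     # ([3], 2): one mode
print(describe_modes([2, 2, 3, 3, 5]))  # ([2, 3], 2): two modes
print(describe_modes([2, 3, 5]))        # ([2, 3, 5], 1): every value occurs once, so no mode
```

When the highest frequency is 1, every value is tied, which is the "no mode" case described above.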

### Example
Calculate the mode(s) of the math, reading, and writing scores for all the students in the dataset, returning the result as a list of all modes.

In [10]:
def calculate_mode(x):
    # Count frequencies
    frequency = {}
    for value in x:
        frequency[value] = frequency.get(value, 0) + 1  # Add 1 to the current count, starting from 0 for a value not seen before

    # Identify max frequency
    max_count = max(frequency.values())  # The maximum count in the dictionary

    # Find all values with that frequency
    mode = [key for key, count in frequency.items() if count == max_count]  # Obtain all keys with maximum frequency

    # Check if every value occurs only once (no mode)
    if max_count == 1:
        return 'No mode'
    else:
        return mode

In [11]:
# Modal math score
print('Mode of math scores:', calculate_mode(df['math score']))

Mode of math scores: [65]


In [12]:
# Modal reading score
print('Mode of reading scores:', calculate_mode(df['reading score']))

Mode of reading scores: [72]


In [13]:
# Modal writing score
print('Mode of writing scores:', calculate_mode(df['writing score']))

Mode of writing scores: [74]
