# Numerical Summary
Large data sets can be difficult to interpret. A numerical summary condenses such data into a single value that is easy to understand. There are two types of numerical summaries:
* Measures of Central Tendency
* Measures of Dispersion

## 1. Measures of Central Tendency
Given a large data set, measures of central tendency estimate the middle of the data as a way of summarizing it. There are three major measures of central tendency:
- The Mean
- The Median
- The Mode

### i. The Mean
The mean is simply the average of the data. More precisely, this is the **arithmetic mean**. There are other types of mean as well, such as the weighted mean and the geometric mean, but our focus here is the arithmetic mean.

Given a set of observations, the arithmetic mean is derived by dividing the sum of the observations by the number of observations (N).

Observe:

In [27]:
# The following data shows the annual salaries of 7 employees in two companies:
company_a_salaries = [34500, 30700, 32900, 36000, 34100, 33800, 32500]
company_b_salaries = [34900, 27500, 31600, 39700, 35300, 33800, 31700]

# Find the mean salary in both companies
def get_mean(data: list):
    mean = (sum(data))/len(data)
    return mean

company_a_mean_salary = get_mean(company_a_salaries)
company_b_mean_salary = get_mean(company_b_salaries)

print(f"The average salary of Company A employees is ${company_a_mean_salary} and that of Company B employees is ${company_b_mean_salary}")

The average salary of Company A employees is $33500.0 and that of Company B employees is $33500.0


The arithmetic means of both data sets turned out to be identical. 

Above is an example of a population mean, that is, the mean of all the observations in the population. Sometimes, when the population is too large, we may instead calculate the mean of a sample, which is a subset of the population. The sample mean is calculated in the same way as the population mean.
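
To illustrate the idea of a sample mean, here is a minimal sketch that treats the seven Company A salaries as the population and draws a random sample from it. The sample size of 4 and the seed are arbitrary assumptions made only so the illustration is repeatable; they are not part of the original exercise.

```python
import random

# Population: the seven Company A salaries from the example above
population = [34500, 30700, 32900, 36000, 34100, 33800, 32500]

# Seed chosen arbitrarily so the illustration is repeatable (an assumption)
rng = random.Random(0)

# Draw 4 observations without replacement; the sample size is a hypothetical choice
sample = rng.sample(population, 4)

# Both means use the same formula: sum of observations / number of observations
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"Population mean: {population_mean}")
print(f"Sample {sample} has mean: {sample_mean}")
```

The sample mean will usually differ somewhat from the population mean, since it is based on only part of the data.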

### ii. The Median
The median is the middle observation when a set of observations is arranged in ascending order.
When calculating the median, one may run into one of these two cases:
1. The number of observations (N) is odd.
2. The number of observations (N) is even.
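
The two cases can be seen on a pair of small, made-up lists. This sketch uses `statistics.median` from Python's standard library rather than a hand-written function; the values in `odd_sample` and `even_sample` are hypothetical and chosen only for illustration.

```python
import statistics

odd_sample = [7, 1, 5]       # N = 3 (odd)
even_sample = [7, 1, 5, 3]   # N = 4 (even)

# Odd N: a single middle value exists once the data is sorted -> [1, 5, 7]
odd_median = statistics.median(odd_sample)

# Even N: the two middle values of [1, 3, 5, 7] are averaged -> (3 + 5) / 2
even_median = statistics.median(even_sample)

print(odd_median)   # 5
print(even_median)  # 4.0
```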

In [28]:
# Find the median of both companies:
def get_median(data: list):
    # Sort a copy so that the caller's list is left unchanged
    ordered = sorted(data)
    n = len(ordered)

    if n % 2 != 0:
        # Odd N: take the {(n+1)/2}th value; subtract 1 because list indexes start at 0
        median = ordered[(n + 1) // 2 - 1]
    else:
        # Even N: take the mean of the {n/2}th value and the {(n+2)/2}th value
        median = (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2

    return median

company_a_median_salary = get_median(company_a_salaries)
company_b_median_salary = get_median(company_b_salaries)

print(f"The median salary of Company A is {company_a_median_salary} and that of Company B is {company_b_median_salary}")

The median salary of Company A is 33800 and that of Company B is 33800


Coincidentally, just like the arithmetic means, the medians of these data sets are identical.

The if-else statement checks whether N is even or odd and calculates the median accordingly. Both our data sets had the same N, 7, which is odd. What if we had a data set where N is even?

In [30]:
'''
A sample of 8 US corporations showed the following percentage changes in earnings per share in the current year
compared with the previous year. Find the mean and the median of the percentage change in earnings per share.
'''

# percentage changes in earnings per share
percentage_changes = ['13.6%', '25.5%', '43.6%', '-19.8%', '-13.8%', '12.0%', '36.3%', '14.3%']

# Changing the format in order to compute the mean and the median
changes = [] # a list of the values with the % removed and the strings turned into numbers

for change in percentage_changes:
    remove_percentage = change[:-1]
    change_to_number = float(remove_percentage)
    changes.append(change_to_number)

# Compute the Mean
sample_mean = get_mean(changes)

# Compute the Median
median_change = get_median(changes)

print(f"The mean percentage change in earnings per share is {sample_mean}% while the median percentage change in earnings per share is {median_change}%")

The mean percentage change in earnings per share is 13.9625% while the median percentage change in earnings per share is 13.95%


This time, the mean and the median are not identical, but they are close. While the mean is the most popular measure of central tendency, there are times when the median is more appropriate, for example when a few extreme values pull the mean away from the typical observation.
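
One situation where the median is often preferred is data containing an extreme value. As a minimal sketch, suppose a hypothetical eighth salary of 250000 is added to the Company A data. This figure is invented purely for illustration and is not part of the original data set.

```python
import statistics

company_a_salaries = [34500, 30700, 32900, 36000, 34100, 33800, 32500]

# Hypothetical extreme salary, added only to illustrate the effect of an outlier
with_outlier = company_a_salaries + [250000]

mean_before = sum(company_a_salaries) / len(company_a_salaries)
mean_after = sum(with_outlier) / len(with_outlier)

median_before = statistics.median(company_a_salaries)
median_after = statistics.median(with_outlier)

print(f"Mean:   {mean_before} -> {mean_after}")      # 33500.0 -> 60562.5
print(f"Median: {median_before} -> {median_after}")  # 33800 -> 33950.0
```

The single extreme value nearly doubles the mean, while the median barely moves, so in this sketch the median stays closer to what a typical employee earns.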