## III. Describing Data with Python

### Finding the Mean

#### The mean is a common and intuitive way to summarize a set of numbers. It's what we might simply call the "average" in everyday use, although, as we'll see, there are other kinds of averages as well. Let's take a sample set of numbers and calculate the mean by dividing their sum by their count.

In [3]:
"""
Calculating the mean
"""

def calculate_mean(numbers):
    s = sum(numbers)
    N = len(numbers)
    # Calculate the mean
    mean = s / N
    
    return mean

if __name__ == '__main__':
    donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
    mean = calculate_mean(donations)
    N = len(donations)
    print('Mean donation over the last {0} days is {1}'.format(N, mean))


Mean donation over the last 12 days is 477.75
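
#### As a quick cross-check, the result above can be compared with the `statistics` module from Python's standard library. This is only a sketch that reuses the same donations data; the variable names `hand_rolled` and `library` are illustrative and not part of the original program.

```python
import statistics

donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]

def calculate_mean(numbers):
    # Same computation as the function above: sum divided by count
    return sum(numbers) / len(numbers)

hand_rolled = calculate_mean(donations)   # our own function
library = statistics.mean(donations)      # standard-library equivalent

# Both approaches should agree on this data
print(hand_rolled, library)  # 477.75 477.75
```

#### Note that the two differ on an empty list: the hand-written function raises `ZeroDivisionError`, while `statistics.mean` raises `statistics.StatisticsError`.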


### Finding the Median

#### The median of a collection of numbers is another kind of average. To find the median, we sort the numbers in ascending order. If the length of the list of numbers is odd, the number in the middle of the list is the median. If the length of the list of numbers is even, we get the median by taking the mean of the two middle numbers. Let's find the median of the previous list of donations: 100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, and 1200.

#### Before we write a program to find the median of a list of numbers, let's think about how we could automatically find the middle elements of a list in either case. If the length of a list (N) is odd, the middle number is the one in position (N+1)/2. If N is even, the two middle elements are in positions N/2 and (N/2) + 1. These positions count from 1, whereas Python list indices start at 0, so we subtract 1 when indexing into the list.
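
#### The position rule above can be sketched as a small helper. The function name `middle_positions` is hypothetical, and the sketch assumes the list has at least one element. It returns positions counted from 1, so subtract 1 from each before using it as a Python index.

```python
def middle_positions(n):
    """Return the 1-based position(s) of the middle element(s) in a sorted list of length n."""
    if n % 2 == 1:
        # Odd length: a single middle element at position (n + 1) / 2
        return [(n + 1) // 2]
    # Even length: two middle elements at positions n / 2 and n / 2 + 1
    return [n // 2, n // 2 + 1]

print(middle_positions(5))   # [3]
print(middle_positions(12))  # [6, 7]
```

#### For the 12 donations, the middle positions are 6 and 7, so the median is the mean of the 6th and 7th values in the sorted list.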

#### In order to write a function that calculates the median, we'll also need to sort a list in ascending order. The list's sort() method does this in place:

In [1]:
samplelist = [4, 1, 3]
samplelist.sort()
samplelist

[1, 3, 4]

In [3]:
"""
Calculating the median
"""

def calculate_median(numbers):
    N = len(numbers)
    numbers.sort()
    
    # Find the median
    if N % 2 == 0:      # if N is even
        m1 = N/2
        m2 = (N/2) + 1
        # Convert to integers and shift to 0-based indices (N/2 is a float, and lists do not accept floats as indices)
        m1 = int(m1) - 1
        m2 = int(m2) - 1
        median = (numbers[m1] + numbers[m2])/2   
        
    else:
        m = (N+1) / 2
        # Convert to an integer and shift to a 0-based index
        m = int(m) - 1
        median = numbers[m]
             
    return median

if __name__ == '__main__':
    donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
    median = calculate_median(donations)
    N = len(donations)
    print('Median donation over the last {0} days is {1}'.format(N, median))

Median donation over the last 12 days is 500.0
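
#### One thing to be aware of is that `numbers.sort()` reorders the caller's list as a side effect. A possible variant, sketched here under the hypothetical name `calculate_median_copy`, works on a sorted copy and uses integer division so that no float-to-int conversion is needed. It is an alternative, not a replacement for the program above.

```python
import statistics

def calculate_median_copy(numbers):
    ordered = sorted(numbers)   # sorted() returns a new list; the input is left untouched
    n = len(ordered)
    mid = n // 2                # 0-based index of the middle (or upper-middle) element
    if n % 2 == 0:
        # Even length: mean of the two middle elements
        return (ordered[mid - 1] + ordered[mid]) / 2
    # Odd length: the single middle element
    return ordered[mid]

donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
original_order = list(donations)

median = calculate_median_copy(donations)
print(median)                        # 500.0
print(donations == original_order)   # True: the input list was not reordered
print(statistics.median(donations))  # 500.0, the standard-library result for comparison
```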


### Finding the Mode and Creating a Frequency Table

#### Instead of finding the mean value or the median value of a set of numbers, what if you wanted to find the number that occurs most frequently? This number is called the mode.

#### There's no symbolic formula for calculating the mode: you simply count how many times each unique number occurs and find the one that occurs the most.
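
#### To make the counting step concrete, here is a minimal sketch that does it by hand with a plain dictionary. The names `count_occurrences` and `sample_numbers` are illustrative only.

```python
def count_occurrences(numbers):
    """Map each unique number to the number of times it appears."""
    counts = {}
    for number in numbers:
        # Start at 0 the first time a number is seen, then add 1
        counts[number] = counts.get(number, 0) + 1
    return counts

sample_numbers = [4, 2, 1, 3, 4]
counts = count_occurrences(sample_numbers)
print(counts)  # {4: 2, 2: 1, 1: 1, 3: 1}

# The mode is the key with the largest count
most_frequent = max(counts, key=counts.get)
print(most_frequent)  # 4
```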

In [13]:
from collections import Counter
simplelist = [4, 2, 1, 3, 4]
c = Counter(simplelist)
c.most_common()

[(4, 2), (2, 1), (1, 1), (3, 1)]

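#### The most_common() method returns a list of (number, count) tuples ordered from most to least frequent. The first element of the first tuple is the number that occurs most frequently, and the second element is the number of times it occurs. The second, third, and fourth tuples contain the other numbers along with the number of times they appear. Passing a number to most_common() limits the result to that many tuples, so most_common(1) gives just the mode along with its count.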

In [7]:
c.most_common(2)

[(4, 2), (2, 1)]

In [8]:
mode = c.most_common(1)
mode

[(4, 2)]

In [9]:
mode[0]

(4, 2)

In [10]:
mode[0][0]

4
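
#### The two indexing steps can also be combined with tuple unpacking, which reads a little more directly. This is only a stylistic sketch using the same sample list; the names `number` and `count` are illustrative.

```python
from collections import Counter

simplelist = [4, 2, 1, 3, 4]
c = Counter(simplelist)

# most_common(1) returns a one-element list; [0] picks out its (number, count) tuple
number, count = c.most_common(1)[0]
print(number, count)  # 4 2
```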

### Finding the Mode

In [14]:
"""
Calculating the mode
"""

from collections import Counter
def calculate_mode(numbers):
    c = Counter(numbers)
    mode = c.most_common(1)
    return mode[0][0]

if __name__ == '__main__':
    scores = [7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, 10]
    mode = calculate_mode(scores)
    print('The mode of the list of numbers is : {0}'.format(mode))

The mode of the list of numbers is : 9


#### What if you have a set of data where two or more numbers occur the same maximum number of times? For example, in the list of numbers 5, 5, 5, 4, 4, 4, 9, 1, and 3, both 4 and 5 are present three times. In such cases, the list of numbers is said to have multiple modes, and our program should find and print all the modes. The modified program follows:

In [15]:
"""
Calculating the mode when the list of numbers may
have multiple modes
"""

from collections import Counter

def calculate_mode(numbers):
    c = Counter(numbers)
    numbers_freq = c.most_common()
    max_count = numbers_freq[0][1]  # count of the most frequently occurring number
    
    modes = []
    for num in numbers_freq:
        if num[1] == max_count:  # keep every number whose count equals max_count
            modes.append(num[0])
    return modes

if __name__ == '__main__':
    scores = [5, 5, 5, 4, 4, 4, 9, 1, 3]
    modes = calculate_mode(scores)
    print('The mode(s) of the list of numbers are: ')
    for mode in modes:
        print(mode)

The mode(s) of the list of numbers are: 
5
4
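
#### For comparison, the standard library's `statistics.multimode` function (available from Python 3.8) returns all modes in the order they first appear in the data. The sketch below assumes Python 3.8 or later and reuses the same scores; the variable names are illustrative.

```python
import statistics

scores = [5, 5, 5, 4, 4, 4, 9, 1, 3]

# multimode returns every value that shares the highest count
modes = statistics.multimode(scores)
print(modes)  # [5, 4]

# With a single most frequent value, the list has one element
single = statistics.multimode([7, 8, 9, 9, 2])
print(single)  # [9]
```

#### Unlike the program above, which fails with an `IndexError` on an empty list, `statistics.multimode` returns an empty list in that case.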
