## Computing Summary Statistics

Given a sample of $N$ values $\{X_1, X_2, \ldots, X_N\}$ of some experimentally observed quantity $X$:

---
**Exercise:**

Define functions that compute:

1. sample **mean** $\bar{X}=\frac{1}{N}\sum_{i=1}^N X_i$, 
2. sample **variance** $\sigma^2=\frac{1}{N-1}\sum_{i=1}^N \left(X_i-\bar{X}\right)^2$,
3. **maximum** $X_\text{max}$ and **minimum** $X_\text{min}$ value, and
4. sample **median**.

Having implemented all functions (1)-(4):

5. Compare the results of your functions with those computed by the [`statistics`](https://docs.python.org/3/library/statistics.html) module from Python's standard library.
   
---
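
A small hand-computed example can help you check the formulas before coding them. The sample values below are made up purely for illustration, and $X_{(k)}$ denotes the $k$-th smallest value. Your functions should reproduce these numbers for this input:

$$
\begin{aligned}
X &= \{2, 4, 4, 4, 5, 5, 7, 9\}, \quad N = 8\\
\bar{X} &= \tfrac{1}{8}\,(2+4+4+4+5+5+7+9) = \tfrac{40}{8} = 5\\
\sigma^2 &= \tfrac{1}{7}\left[(-3)^2 + 3\cdot(-1)^2 + 2\cdot 0^2 + 2^2 + 4^2\right] = \tfrac{32}{7} \approx 4.571\\
X_\text{min} &= 2, \quad X_\text{max} = 9\\
\text{median} &= \tfrac{1}{2}\left(X_{(4)} + X_{(5)}\right) = \tfrac{4+5}{2} = 4.5
\end{aligned}
$$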

*Notes*:

- All functions should accept a list of numbers as a single input argument and return a single number, the computed value.
- Don't forget to reuse functions that you have already defined. For example, computing the variance (2) requires the mean: you can simply call an existing function inside the body of another function.
- Python provides built-in `min()` and `max()` functions that return the smallest/largest item of an iterable (list, tuple, etc.). Don't use them in task (3); write your own implementations instead, and choose function names that do not shadow these built-ins.
- Computing the *median* in task (4) requires you to first sort the observed values in ascending order. Then two cases need to be distinguished:
   - If the list contains an *odd* number of items, the median is the $(N+1)/2$-th item of the sorted list.
   - For an *even* number of items, the median is the average of the $(N/2)$-th item and the $(N/2+1)$-th item.

   Remember that Python uses 0-based indexing: the $k$-th item of a list is at index $k-1$!
- You are welcome to write your own sorting algorithm for (4), but you can also use the built-in [`sorted()`](https://docs.python.org/3/howto/sorting.html) function:

In [None]:
a_list = [3, 1, 0, -1, 2, 4, -5, -2, 5, -4, -2]        # this is just an example list
a_list_sorted = sorted(a_list)                         # `sorted` returns a sorted (ascending) 
                                                       # version of that list
print(a_list)
print(a_list_sorted)
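
The 1-based positions in the median note translate to 0-based indices as sketched below. `median_positions` is a hypothetical helper written only for this illustration; it returns the index (odd $N$) or the two indices (even $N$) to look up in the sorted list.

```python
def median_positions(n):
    """
    Hypothetical helper: returns the 0-based indices of the item(s) that
    define the median of a sorted list with n >= 1 items.
    """
    if n % 2 == 1:
        # odd: the (n+1)/2-th item (1-based) sits at index (n+1)//2 - 1 == n//2
        return (n // 2,)
    # even: the (n/2)-th and (n/2+1)-th items sit at indices n//2 - 1 and n//2
    return (n // 2 - 1, n // 2)

a_list_sorted = sorted([3, 1, 0, -1, 2, 4, -5, -2, 5, -4, -2])   # same example list as above
print(len(a_list_sorted), median_positions(len(a_list_sorted)))  # 11 items -> (5,)
print(a_list_sorted[5])                                          # middle item: 0
```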

- You can generate 'observations' for testing your functions using a random number generator from the [`random`](https://docs.python.org/3/library/random.html) module from Python's standard library.
  In the following example, we draw $N=100$ random numbers from a Gaussian distribution with mean $\mu=50$ and standard deviation $\sigma=15$:

In [None]:
import random                                          # module with random number generators

N = 100
mean_dist = 50
std_dist  = 15

observations = []                                      # an empty list
for i in range(N):                                     # loop with N iterations -> draw N numbers
    sample_value = random.gauss(mean_dist, std_dist)   # draw number from distribution
    observations.append(sample_value)                  # add number to our 'observations' list
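
While testing, it can help to get the same 'observations' on every run. A minimal sketch, assuming you are fine with a fixed seed (the value `42` and the names `seeded_obs`, `seeded_obs_again` are arbitrary choices for this illustration): calling `random.seed()` before drawing makes the sequence returned by `random.gauss` reproducible.

```python
import random

n_draws = 100

random.seed(42)    # arbitrary seed (an assumption): fixes the sequence of draws
seeded_obs = [random.gauss(50, 15) for _ in range(n_draws)]

random.seed(42)    # re-seeding with the same value reproduces the same draws
seeded_obs_again = [random.gauss(50, 15) for _ in range(n_draws)]

print(seeded_obs == seeded_obs_again)   # True
```

The list comprehension is equivalent to the explicit loop above; either form works.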
    

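- The reference values from the `statistics` module (and the built-in `min()`/`max()`) can be printed as follows; report the results returned by your own functions in the same way: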

In [None]:
import statistics

print("mean:     %.4f"% statistics.mean(observations))
print("variance: %.4f"% statistics.variance(observations))
print("median:   %.4f"% statistics.median(observations))
print("min:      %.4f"% min(observations))
print("max:      %.4f"% max(observations))
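
Note that the exercise asks for the *corrected* sample variance with the factor $1/(N-1)$. `statistics.variance` computes exactly this, whereas `statistics.pvariance` divides by $N$ (population variance). A quick check on a small, hand-checkable data set (the values are made up for illustration):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5; squared deviations sum to 9+1+1+1+0+0+4+16 = 32

print(statistics.variance(data))   # divides by N-1: 32/7 ≈ 4.571
print(statistics.pvariance(data))  # divides by N:   32/8 = 4
```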

## Solution

In [None]:
# (1)

def mean(obs_list):
    """
    Computes arithmetic mean of list of values.
    
    Args:
        - obs_list: list of 'observation' values
        
    Returns:
        - arithmetic mean of sample values in obs_list
    """
    sum_obs = 0
    for obs in obs_list:
        sum_obs += obs
    return sum_obs/len(obs_list)
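
The loop above fails with a `ZeroDivisionError` for an empty list. If using the built-in `sum()` is acceptable, a more compact variant could look like the sketch below; the name `mean_checked` and the choice to raise `ValueError` are assumptions for illustration (the `statistics` module raises its own `StatisticsError` in this case).

```python
def mean_checked(obs_list):
    """
    Arithmetic mean using the built-in sum(); raises ValueError for an
    empty list instead of failing with ZeroDivisionError.
    """
    if len(obs_list) == 0:
        raise ValueError("mean requires at least one value")
    return sum(obs_list) / len(obs_list)

print(mean_checked([1, 2, 3, 4]))   # 2.5
```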

In [None]:
# (2)
def variance(obs_list):
    """
    Computes (corrected) sample variance.
    
    Args:
        - obs_list: list of 'observation' values
        
    Returns:
        - corrected sample variance
    """
    sample_mean = mean(obs_list)
    var_tmp = 0
    for obs in obs_list:
        diff = obs - sample_mean
        var_tmp += diff**2
    return var_tmp/(len(obs_list)-1)
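
The solution above makes two passes over the data (one inside `mean()`, one for the squared deviations). Not required by the exercise, but for reference: Welford's algorithm computes the same corrected sample variance in a single pass by updating a running mean and a running sum of squared deviations. A minimal sketch (the name `variance_welford` is hypothetical):

```python
def variance_welford(obs_list):
    """
    Corrected sample variance in a single pass (Welford's algorithm).
    """
    if len(obs_list) < 2:
        raise ValueError("variance requires at least two values")
    n = 0
    running_mean = 0.0
    m2 = 0.0   # running sum of squared deviations from the current mean
    for obs in obs_list:
        n += 1
        delta = obs - running_mean          # deviation from the old mean
        running_mean += delta / n           # update the mean
        m2 += delta * (obs - running_mean)  # old deviation times new deviation
    return m2 / (n - 1)

print(variance_welford([2, 4, 4, 4, 5, 5, 7, 9]))   # close to 32/7 ≈ 4.571
```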
            

In [None]:
# (3)

def my_min(obs_list):
    """
    Returns smallest value from list of values.
    """
    min_val = obs_list[0]
    for obs in obs_list:
        if obs < min_val:
            min_val = obs
    return min_val

def my_max(obs_list):
    """
    Returns largest value from list of values.
    """
    max_val = obs_list[0]
    for obs in obs_list:
        if obs > max_val:
            max_val = obs
    return max_val
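
Both functions above scan the whole list, and each compares the first item with itself. If you need both extremes, a single pass is enough. This hypothetical `my_min_max` returns a tuple, so it deliberately steps outside the exercise's one-number-per-function convention:

```python
def my_min_max(obs_list):
    """
    Returns (smallest, largest) value of a non-empty list in a single pass.
    """
    if len(obs_list) == 0:
        raise ValueError("my_min_max requires a non-empty list")
    min_val = max_val = obs_list[0]
    for obs in obs_list[1:]:        # first item is already used as start value
        if obs < min_val:
            min_val = obs
        elif obs > max_val:         # cannot be both smaller and larger
            max_val = obs
    return min_val, max_val

print(my_min_max([3, 1, 0, -1, 2, 4, -5, -2, 5, -4, -2]))   # (-5, 5)
```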


In [None]:
# (4) median, once with a naive hand-written sorting algorithm and once with the (faster) built-in `sorted()`

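def sort_list(obs_list, ascending=True):
    """
    Returns a sorted copy of obs_list using a naive swap-and-restart algorithm:
    scan adjacent pairs, swap the first out-of-order pair found and restart
    the scan; stop once a full scan finds no out-of-order pair.
    """
    sorted_list = obs_list.copy()          # don't modify the caller's list
    continue_sort = True
    cnt = 0
    while continue_sort:
        continue_sort = False              # assume sorted until a swap happens
        for i in range(len(sorted_list) - 1):
            curr_obs = sorted_list[i]
            next_obs = sorted_list[i+1]
            if ((curr_obs > next_obs) and ascending) or \
               ((curr_obs < next_obs) and not ascending):
                sorted_list[i] = next_obs  # swap the out-of-order neighbours
                sorted_list[i+1] = curr_obs
                continue_sort = True       # ...and restart the scan
                break
        cnt += 1
    print("Sorted in %i iterations" % cnt)
    return sorted_list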

def median_slow(obs_list):
    """
    Computes median using naive sorting algorithm.
    
    Args:
        - obs_list: list of 'observation' values
        
    Returns:
        - median of sample values in obs_list
    """
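    N = len(obs_list)
    sorted_list = sort_list(obs_list, ascending=True)
    if N % 2 == 0:    # even -> average of the (N/2)-th and (N/2+1)-th item
        n_mid_1 = N // 2
        n_mid_2 = N // 2 + 1
        # subtract 1 to convert 1-based positions to 0-based indices
        median_value = (sorted_list[n_mid_1-1] + sorted_list[n_mid_2-1]) / 2
    else:             # odd -> the (N+1)/2-th item
        n_mid = (N + 1) // 2
        median_value = sorted_list[n_mid-1]
    return median_value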

def median(obs_list):
    """
    Computes median using built-in sorting algorithm.
    
    Args:
        - obs_list: list of 'observation' values
        
    Returns:
        - median of sample values in obs_list
    """
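    N = len(obs_list)
    sorted_list = sorted(obs_list)
    if N % 2 == 0:    # even -> average of the (N/2)-th and (N/2+1)-th item
        n_mid_1 = N // 2
        n_mid_2 = N // 2 + 1
        # subtract 1 to convert 1-based positions to 0-based indices
        median_value = (sorted_list[n_mid_1-1] + sorted_list[n_mid_2-1]) / 2
    else:             # odd -> the (N+1)/2-th item
        n_mid = (N + 1) // 2
        median_value = sorted_list[n_mid-1]
    return median_value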
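
`median_slow` and `median` differ only in the sorting step, so the index logic could be written once and reused, in the spirit of the note on reusing functions. A possible refactoring, sketched with hypothetical names `median_from_sorted` and `median_with`:

```python
def median_from_sorted(sorted_values):
    """
    Hypothetical helper: median of an already sorted, non-empty list.
    """
    n = len(sorted_values)
    mid = n // 2                          # 0-based index of the (upper) middle item
    if n % 2 == 1:                        # odd: single middle item
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2   # even: average of two

def median_with(obs_list, sort_func=sorted):
    """
    Hypothetical wrapper: sorts with sort_func, then takes the median.
    """
    return median_from_sorted(sort_func(obs_list))

print(median_with([3, 1, 2]))       # 2
print(median_with([4, 1, 3, 2]))    # 2.5
```

With the solution's own sorting function, `median_with(observations, sort_func=sort_list)` would behave like `median_slow`.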

In [None]:
# (5) Results of our own functions; compare them with the `statistics` output above

print("mean:     %.4f"% mean(observations))
print("variance: %.4f"% variance(observations))
print("median:   %.4f"% median(observations))
print("min:      %.4f"% my_min(observations))
print("max:      %.4f"% my_max(observations))
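
Instead of comparing the printed numbers by eye, the comparison in task (5) can also be automated. The sketch below assumes floating-point agreement within a small tolerance is good enough (`rel_tol` and `abs_tol` are arbitrary choices), and the name `compare_to_reference` is hypothetical. To keep the cell self-contained it uses simple stand-in functions; in the notebook you would pass `mean`, `variance`, `median`, `my_min` and `my_max` from the solution instead.

```python
import math
import statistics

def compare_to_reference(own_funcs, data, rel_tol=1e-9, abs_tol=1e-12):
    """
    Evaluates each function in own_funcs on data next to its reference
    (statistics module or built-in) and checks agreement with math.isclose.
    own_funcs maps "mean", "variance", "median", "min" or "max" to a
    function taking a list of numbers.
    Returns {name: (own_value, reference_value, agrees)}.
    """
    reference_funcs = {
        "mean": statistics.mean,
        "variance": statistics.variance,
        "median": statistics.median,
        "min": min,
        "max": max,
    }
    results = {}
    for name, own_func in own_funcs.items():
        own_value = own_func(data)
        ref_value = reference_funcs[name](data)
        agrees = math.isclose(own_value, ref_value, rel_tol=rel_tol, abs_tol=abs_tol)
        results[name] = (own_value, ref_value, agrees)
    return results

# stand-in "own" functions so this cell runs on its own
demo_funcs = {
    "mean": lambda xs: sum(xs) / len(xs),
    "min": lambda xs: sorted(xs)[0],
    "max": lambda xs: sorted(xs)[-1],
}
for name, (own, ref, ok) in compare_to_reference(demo_funcs, [2, 4, 4, 4, 5, 5, 7, 9]).items():
    print("%-9s own=%.4f  ref=%.4f  agree=%s" % (name, own, ref, ok))
```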