# Assignment 9.2

> Replace all TODOs with your code. Do not change any other code.

In [1]:
# Do not edit this cell

from typing import List
from timeit import timeit

## Descriptive statistics

In this assignment, we will write the functions to calculate the basic statistics from scratch, not using numpy.

### Task 1

Let's start simple: write a function `mean` that calculates the average of the list.

$$\mu = \frac{{\sum_{i=1}^n x_i}}{{n}}$$

In [2]:
def mean(li: List[float]) -> float:
    list_len = len(li)
    sum_list = sum(li)
    return sum_list/list_len


assert mean([1., 2., 3.]) == 2.
assert mean([1., 1., 2., 0.]) == 1.

### Task 2

Now let's calculate variance (dispersion). You may use the `mean` function implemented before.

$$V = \frac{{\sum_{i=1}^n (x_i - \mu)^2}}{{n}}$$

In [3]:
def variance(li: List[float]) -> float:
    mid = sum(li)/len(li)
    i = 0
    for x in li:
        j = (x - mid)**2
        i += j
        
    return i / len(li)


assert variance([1., 1., 1.]) == 0.
assert variance([1., 2., 3., 4.]) == 1.25

### Task 3

The standard deviation is easy once you get the variance:

$$\sigma = \sqrt{V}$$

In [4]:
def std(li: List[float]) -> float:
    return variance(li)**0.5


assert std([1., 1., 1.]) == 0.
assert std([1., 2., 3., 4.]) == 1.25**0.5

### Task 4

**Median**

The median is the middle value in a sorted dataset. If the dataset has an odd number of values, the median is the value at the center. If the dataset has an even number of values, the median is the average of the two middle values.

In [5]:
def median(li: List[float]) -> float:
    list_len = len(li)
    mid = list_len/2 
    if list_len%2 == 0:
        return (li[int(mid)] + li[int(mid)+1] )/ 2
    else:
        return li[int(mid)]
    


assert median([1., 1., 1.]) == 1.
assert median([1., 4., 3., 2.]) == 2.5

## Measure performance

Sometimes, apart from theoretical, algorithmic complexity, it's a good idea to compare the runtime of two algorithms empirically, i.e., run the code many times and time it.

In Python's standard library, we have [timeit](https://docs.python.org/3/library/timeit.html) module that does exactly that.

Let's compare the runtime of your implementations and numpy. Use the provided setup code:

In [32]:
# generate data for tests
setup = '''
import random
import numpy as np

arr = np.random.rand(10_000) * 100
li = [random.random() * 100 for _ in range(10_000)]
'''

# pass your function to timeit module
funcs = {
    'mean': mean,
    'variance': variance,
    'std': std,
    'median': median,
}

print(funcs)

{'mean': <function mean at 0x00000178075ADC60>, 'variance': <function variance at 0x00000178075ADEE0>, 'std': <function std at 0x00000178075ACAE0>, 'median': <function median at 0x00000178075ADDA0>}


### Task 5

Complete Python statements to compare your functions to numpy. Use `li` for your function and `arr` for numpy functions.

In [42]:
stmt_mean_custom = 'mean(li)'
stmt_mean_np = 'np.mean(arr)'

stmt_var_custom = 'variance(li)'
stmt_var_np = 'np.var(arr)'

stmt_std_custom = 'std(li)'
stmt_std_np = 'np.std(arr)'

stmt_median_custom = 'median(li)'
stmt_median_np = 'np.median(arr)'

### Task 6

Measure average exec time of your statements with `timeit` module. As your submission, fill out the table with results (rounded to 2 decimal places)

In [46]:
import timeit

timeit.timeit(stmt=stmt_mean_custom, setup=setup, globals=funcs, number=10_000)

0.6155050999950618

In [45]:
timeit.timeit(stmt=stmt_var_custom, setup=setup, globals=funcs, number=10_000)

13.262132699950598

In [47]:
timeit.timeit(stmt=stmt_std_custom, setup=setup, globals=funcs, number=10_000)

14.079374200082384

In [48]:
timeit.timeit(stmt=stmt_median_custom, setup=setup, globals=funcs, number=10_000)

0.0026636000256985426

In [49]:
timeit.timeit(stmt=stmt_mean_np, setup=setup, globals=funcs, number=10_000)

0.0677512000547722

In [50]:

timeit.timeit(stmt=stmt_var_np, setup=setup, globals=funcs, number=10_000)

0.1977121999952942

In [51]:
timeit.timeit(stmt=stmt_mean_np, setup=setup, globals=funcs, number=10_000)

0.07278580009005964

In [52]:
timeit.timeit(stmt=stmt_var_np, setup=setup, globals=funcs, number=10_000)

0.2677951999939978

In [53]:
timeit.timeit(stmt=stmt_std_np, setup=setup, globals=funcs, number=10_000)

0.2563281999900937

In [54]:
timeit.timeit(stmt=stmt_median_np, setup=setup, globals=funcs, number=10_000)

0.5592418999876827

Time per 10000 executions, secs

| Func       | Custom | Numpy |
| ---------- | ------ | ----- |
| mean       |        |       |
| var        |        |       |
| std        |        |       |
| median     |        |       |