# Chapter 4: Functional Programming: Rudimentary Statistics and Analytics

| New Concepts | Description |
| --- | --- |
| _return obj_ (from function) | A function may return an object, which can be saved by assigning the function call to a variable, e.g., var1 = function(obj1, obj2, . . .) |

In [3]:
# General form of a function definition (a template, not runnable code):
# def function_name(object1, object2, . . ., objectn):
#     <operations>
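
As a minimal sketch of the _return obj_ concept, the cell below contrasts a function that returns its result with one that only prints it. The names `add_and_return` and `add_and_print` are hypothetical, chosen only for this illustration.

```python
# hypothetical functions, used only to illustrate `return`
def add_and_return(a, b):
    # the result is handed back to the caller
    return a + b

def add_and_print(a, b):
    # the result is printed but not returned
    print(a + b)

saved = add_and_return(2, 3)     # saved holds 5
not_saved = add_and_print(2, 3)  # prints 5; not_saved holds None
print(saved, not_saved)
```

Only the first function gives the caller something to store; a function without a `return` statement hands back `None`.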

### Total 

$\sum_{i=0}^{n-1} x_{i}$

In [10]:
total = 0
values = list(range(10))

print("total\t","value")
for value in values:
    total += value
    print(total,"\t", value)

total	 value
0 	 0
1 	 1
3 	 2
6 	 3
10 	 4
15 	 5
21 	 6
28 	 7
36 	 8
45 	 9


In [11]:
# Don't keep copying and pasting old code; wrap it in a function instead

In [12]:
def total(lst):
    total_ = 0
    # the original version looped over the list by index:
    #     n = len(lst)
    #     for i in range(n):
    # looping over the values directly is simpler
    for val in lst:
        total_ += val
    return total_
total(values)

45

In [13]:
total([i for i in range(-1000,100000,53)])

94313645

In [14]:
import random
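X1 = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
# X2 is drawn at random, so its values (and the outputs that depend on it) change on each run
X2 = [random.randint(0, 100) for i in range(10)]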
total(X1), total(X2)

(165, 593)
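
Python's built-in `sum()` performs the same accumulation, so it offers a quick check on a hand-written total. The sketch below copies the loop under the hypothetical name `total_check` so that it runs on its own.

```python
# standalone copy of the total() logic, renamed so it does not clash with the chapter's function
def total_check(lst):
    total_ = 0
    for val in lst:
        total_ += val
    return total_

X1 = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
big = list(range(-1000, 100000, 53))

# both comparisons should print True
print(total_check(X1) == sum(X1))
print(total_check(big) == sum(big))
```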

### Mean

Let $X_1, X_2,...,X_n$ represent $n$ values of a random variable. For a given dataset, useful descriptive statistics of central tendency include the mean, median, and mode, which we built as functions in a previous chapter. 

We define the mean of a set of numbers as:
$\bar{X} = \frac{\sum_{i=1}^{n} X_{i}}{n}$

In [18]:
def mean(lst):
    n = len(lst)
    mean_ = total(lst) / n
    return mean_
mean(X1), mean(X2)

(16.5, 59.3)

Now let's build the rest of the summary statistics functions:
1. median
2. mode
3. variance
4. standard deviation
5. standard error
6. covariance
7. correlation

### Median

In [38]:
def median(lst):
    n = len(lst)
    lst = sorted(lst)
    
    # two cases: 
    # 1. list of odd length
    # i % j gives the remainder upon dividing i by j
    if n % 2 != 0:
        middle_index = (n - 1) // 2
        median_ = lst[middle_index]
    # 2. list of even length
    else:
        upper_middle_index = n // 2
        lower_middle_index = upper_middle_index - 1
        # show the two middle values, then pass that slice to mean()
        print(lst[lower_middle_index: upper_middle_index + 1])
        median_ = mean(lst[lower_middle_index: upper_middle_index + 1])
    return median_

print(X1)
median(X1), median(X2)

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
[15, 18]
[55, 62]


(16.5, 58.5)

In [40]:
# make X1 odd in length by dropping its last element
# this tests the first case in the median() function
median(X1[:-1])

15

In [24]:
sorted(X2)

[29, 31, 32, 35, 55, 62, 78, 85, 90, 96]
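
As an independent check, the standard library's `statistics.median` should agree with the outputs above for both the even-length and odd-length cases. `X2_sorted` is a hypothetical name holding the sorted draw shown above, since `X2` itself changes from run to run.

```python
import statistics

X1 = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
# the sorted random draw printed above, copied so the check is reproducible
X2_sorted = [29, 31, 32, 35, 55, 62, 78, 85, 90, 96]

# even length: the two middle values are averaged
print(statistics.median(X1), statistics.median(X2_sorted))
# odd length: the single middle value is returned
print(statistics.median(X1[:-1]))
```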

### Mode

In [56]:
def mode(lst):
    count_dct = {}
    # create an entry for each value, starting at 0
    for key in lst:
        count_dct[key] = 0
    # add up each occurrence
    for key in lst:
        count_dct[key] += 1
    # calculate max_count up front
    max_count = max(count_dct.values())
    # now we can compare each count to the max count
    # mode_ is always a list, since several values may tie for the max count
    mode_ = []
    for key, count in count_dct.items():
        if count == max_count:
            mode_.append(key)
    
    return mode_

lst = [1,1,1,1,1,1,2,3,4,5,5,5,5,5,5,242]
mode(lst)

[1, 5]
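
The counting step can also be delegated to `collections.Counter` from the standard library. The sketch below is an alternative under the hypothetical name `mode_counter`, not a replacement for the function above; it should return the same list of tied values.

```python
from collections import Counter

def mode_counter(lst):
    # Counter tallies the occurrences of each value
    counts = Counter(lst)
    max_count = max(counts.values())
    # keep every value that ties for the highest count
    return [key for key, count in counts.items() if count == max_count]

lst = [1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 242]
print(mode_counter(lst))
```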

In [58]:
if 1 == 2:
    print("run code")

In [60]:
if 1 == 1:
    print("run code")

run code


### Variance


$$ \sigma^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$$

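When we are dealing with a sample, which is a subset of a population of observations, we instead divide by $n - 1$, the **Degrees of Freedom**. This corrects the downward bias that arises because deviations are measured from the sample mean rather than the population mean. 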

$$DoF = n - 1$$


$$ S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$
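
To see why dividing by $n - 1$ removes the bias, one can enumerate every possible sample from a tiny population and average the resulting estimates. The sketch below assumes a hypothetical population of three values and samples of size 2 drawn with replacement; it uses `fractions.Fraction` so the averages are exact.

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4]  # hypothetical population, chosen for easy arithmetic
N = len(population)
pop_mean = Fraction(sum(population), N)
pop_var = sum((x - pop_mean) ** 2 for x in population) / N

n = 2  # sample size
# every ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))

def sum_sq_diff(sample):
    # squared deviations are measured from the sample's own mean
    sample_mean = Fraction(sum(sample), len(sample))
    return sum((x - sample_mean) ** 2 for x in sample)

# average estimate across all samples, dividing by n and by n - 1
avg_div_n = sum(sum_sq_diff(s) / n for s in samples) / len(samples)
avg_div_dof = sum(sum_sq_diff(s) / (n - 1) for s in samples) / len(samples)

print("population variance:", pop_var)
print("average when dividing by n:", avg_div_n)
print("average when dividing by n - 1:", avg_div_dof)
```

In this example, dividing by $n$ averages to 4/3, below the population variance of 8/3, while dividing by $n - 1$ recovers 8/3 exactly.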


In [61]:
# variance measures how widely the values are spread around their mean

In [71]:
def variance(lst, sample = True):
    list_mean = mean(lst)
    n = len(lst)
    DoF = n - 1
    sum_sq_diff = 0
    
    for val in lst:
        diff = val - list_mean
        sum_sq_diff += (diff) ** 2
        # print(val, list_mean, diff, sum_sq_diff)
    if not sample:
        variance_ = sum_sq_diff / n
    else:
        variance_ = sum_sq_diff / DoF
    return variance_

        
variance(X1, sample = True), variance(X1, sample = False)

(82.5, 74.25)

In [72]:
variance(X2, sample = True), variance(X2, sample = False)

(708.9, 638.01)

### Standard Deviation

In [74]:
def SD(lst, sample = True):
    SD_ = variance(lst, sample) ** (1/2)
    return SD_
SD(X1, sample = True), SD(X1, sample = False)

(9.082951062292475, 8.616843969807043)

In [75]:
SD(X2, sample = True), SD(X2, sample = False)

(26.62517605575595, 25.258859831750126)
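
The standard library provides sample and population versions of both statistics, which should agree with the values above for `X1` up to floating-point rounding: `statistics.variance` and `statistics.stdev` divide by $n - 1$, while `statistics.pvariance` and `statistics.pstdev` divide by $n$.

```python
import statistics

X1 = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

# sample versions (divide by n - 1)
print(statistics.variance(X1), statistics.stdev(X1))
# population versions (divide by n)
print(statistics.pvariance(X1), statistics.pstdev(X1))
```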