# Key points of functional programming
- Modular code
- Emphasis on functions and models to solve problems
- Avoidance of state and data mutation, instead opting to create new data to avoid side effects

For repeated calculations, you are usually better off abstracting the work into a function. Multiplication is a good analogy: the `*` operator is shorthand for repeated addition. `1 * 90` amounts to starting at 0 and adding 1 in a for loop until the counter reaches 90. That shorthand is itself a function. Just recall your days in algebra.
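To make the repeated-addition idea concrete, here is a minimal sketch. `multiply` is a hypothetical helper written only for illustration, and it assumes the count is a non-negative integer.

```python
def multiply(value, times):
    """Multiply by repeated addition; assumes times is a non-negative integer."""
    result = 0
    for _ in range(times):
        result += value
    return result

print(multiply(1, 90))   # same as 1 * 90
print(multiply(7, 6))    # same as 7 * 6
```

The loop is hidden behind a name, which is the whole point of abstracting repeated work into a function.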

All you have to do is declare the function's existence, specify its arguments, and say how it computes what it returns. Remember y = mx + b: you pass m, x, and b as arguments and get y back.

1. declare existence of a function
2. declare its name
3. declare what it takes in (var names will be used locally within the function's operations)
4. in the body of the function, write how it operates (calculates)
5. Return some value
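
The five steps can be sketched with the y = mx + b example. The function name `linear` and the sample inputs are assumptions chosen for illustration.

```python
# 1-3: declare the function, name it, and list what it takes in
def linear(m, x, b):
    # 4: the body describes how the result is calculated
    y = m * x + b
    # 5: return some value
    return y

y = linear(2, 3, 1)
print(y)  # 2 * 3 + 1
```

The names m, x, and b exist only inside the function; the caller just sees the returned value.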

## Statistical Functions
| New Concepts | Description |
| --- | --- |
| Operators e.g., !=, %, +=, \*\* | The operator != tests whether the values on either side of the operator are not equal; _a % b_ returns the remainder of $a / b$; _a += b_ sets a equal to $a + b$; _a ** b_ raises a to the b power ($a^b$). |
| Dictionary | A dictionary is a data structure that uses keys instead of index values. Each unique key references an object linked to that key. |
| Dictionary Methods e.g., _dct.values()_ | dct.values() returns a view of the objects that are referenced by the dictionary's keys; pass it to list() to get a list. |
| Default Function Values | A function may assume a default value for an argument that is not passed to it, e.g., _def function(val1 = 0, val2 = 2, …)_ |
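
A quick sketch of each concept in the table. The variable names, the `ages` dictionary, and the `power` function are made up for illustration.

```python
# Operators
print(7 != 3)     # True, because 7 and 3 are not equal
print(7 % 3)      # remainder of 7 / 3
count = 5
count += 2        # same as count = count + 2
print(count)
print(2 ** 5)     # 2 raised to the 5th power

# Dictionary: keys instead of index positions
ages = {"ann": 31, "bob": 27}
print(ages["ann"])
print(list(ages.values()))

# Default function values
def power(base, exponent=2):
    return base ** exponent

print(power(3))      # uses the default exponent
print(power(3, 3))   # overrides the default
```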

### Average Statistics

# Cumulative Multiplication Function

In [20]:
def cum_mult(numbers):
    answer = 1
    for number in numbers:
        answer *= number
    return answer

In [21]:
numbersLITERALLY = [i for i in range(1, 6)]
numbersLITERALLY

[1, 2, 3, 4, 5]

In [22]:
cum_mult(numbersLITERALLY)

120
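
As a sanity check, the hand-written loop can be compared with the standard library. This sketch assumes Python 3.8 or newer, where `math.prod` is available; `cum_mult` is repeated here so the block runs on its own.

```python
import math

def cum_mult(numbers):
    answer = 1
    for number in numbers:
        answer *= number
    return answer

numbers = list(range(1, 6))
# Both approaches multiply every element together
print(cum_mult(numbers), math.prod(numbers))
```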

If the function allows, you pass an object to it by placing the object in the parentheses that follow the function name. The first function that we build will be the total() function. We define the function algebraically as the sum of all values in a list of length n:

$\sum_{i=0}^{n-1} x_{i}$

Since list indices start with the integer 0, we will write our functions as starting with _i = 0_ and processing elements up to the index _n - 1_. Since the range function in Python automatically counts to one less than the value identified, the for-loop used will take the form:

In [23]:
n = 0
total = 0
values = [i for i in range(20)]


# This sucks since you'd have to rewrite this every time
print("Total", "Value")
for value in values:
    total  += value
    print(total, "\t", value)

Total Value
0 	 0
1 	 1
3 	 2
6 	 3
10 	 4
15 	 5
21 	 6
28 	 7
36 	 8
45 	 9
55 	 10
66 	 11
78 	 12
91 	 13
105 	 14
120 	 15
136 	 16
153 	 17
171 	 18
190 	 19


In [24]:
lst = [i for i in range(30)]

In [25]:
def cum_total(numbers, target_index=None):
    total = 0
    cum_totals = []
    for number in numbers:
        total += number
        cum_totals.append(total)
    if target_index is None:
        return cum_totals
    else:
        return cum_totals[target_index]

In [26]:
print(cum_total(lst,9))
print("\n",cum_total(lst))
print("\n",cum_total(i for i in range(0, 29, 3)), type((i for i in range(0, 29, 3))))

45

 [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435]

 [0, 3, 9, 18, 30, 45, 63, 84, 108, 135] <class 'generator'>


### Be more abstract using built-in functions. 

In [27]:
def total(elements):
    total = 0
    totals = []
    for element in elements:
        if isinstance(element, (int, float)):
            total += element
        totals.append(total)
    return totals

In [28]:
lst[5] = "This is a string"
total(lst)

[0,
 1,
 3,
 6,
 10,
 10,
 16,
 23,
 31,
 40,
 50,
 61,
 73,
 86,
 100,
 115,
 131,
 148,
 166,
 185,
 205,
 226,
 248,
 271,
 295,
 320,
 346,
 373,
 401,
 430]
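
In the spirit of leaning on built-ins, running totals can also be sketched with `itertools.accumulate` from the standard library. Unlike total() above, this version assumes every element is numeric and does not skip strings.

```python
from itertools import accumulate

values = list(range(10))
running = list(accumulate(values))
print(running)       # running totals, one per input element
print(running[-1])   # the final entry equals sum(values)
```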

#### Mean


Let $X_1, X_2,...,X_n$ represent $n$ random variables. For a given dataset, useful descriptive statistics of central tendency include mean, median, and mode, which we built as functions in a previous chapter. 

We define the mean of a set of numbers:
$\bar{X} = \frac{\sum_{i=0}^{n-1} x_{i}} {n}$

The **mean** gives the expected value, often denoted $E(X)$ or $\bar{X}$, of a series $X$ by summing all of the observations and dividing by the number of them. The series may be a sample or may include the full population of interest, in which case we would identify the mean by the symbol $\mu_x$.

The numerator of the formula is the same as the notation that represents the sum of a list of numbers. Thus, in mean(), we call the built-in sum() and divide the result by the length of the list. Then, we use the function to calculate a value and save that value as an object:

# Mean function definition

In [29]:
def mean(numbers):
    return sum(numbers) / len(numbers)

In [30]:
# Random number generator
import random
def gen_rand100(n):
    return [random.randint(0, 100) for i in range(n)]
def gen_rand10(n):
    return [random.randint(0, 10) for i in range(n)]

In [31]:
x1 = gen_rand100(10)
x2 = gen_rand10(15)
x1, x2

([76, 83, 83, 70, 82, 67, 25, 45, 100, 54],
 [6, 9, 2, 10, 6, 5, 7, 1, 2, 3, 8, 2, 2, 2, 0])

In [32]:
mean(x1), mean(x2)

(68.5, 4.333333333333333)

#### Median

The **median** is the middle value in a sorted list. It is less sensitive to outliers than the mean. For a series of *odd length* indexed over the range [i, n] starting with index $i=0$, the median is the value at index $\frac{n}{2}$, that is, $x_\frac{n}{2}$.

For a series that is of *even length* but otherwise the same, the median is the mean of the two values that make up the middle of the list. The indices of these values are defined as:

$$i_1 = \frac{n + 1}{2}; i_2 = \frac{n - 1}{2}$$

The median is thus defined:
$$\frac{x_\frac{n + 1}{2}+x_\frac{n-1}{2}}{2}$$

We can restate that:

$$k = x_\frac{n + 1}{2}+x_\frac{n-1}{2}$$

Thus, the median is defined as $\frac{k}{2}$.

# Median Function Definition

In [33]:
def median(numbers):
    sorted_numbers = sorted(numbers)
    length = len(sorted_numbers)
    
    # Two cases: odd length or even length
    # Even Case
    
    if length % 2 == 0:
        mid = length // 2
        return (sorted_numbers[mid - 1] + sorted_numbers[mid]) / 2
    # Else, it is odd, just take that middle number
    else:
        mid = length // 2
        return sorted_numbers[mid]

In [34]:
median(x1), median(x2)

(73.0, 3)

# Mode Function Definition

### Pseudocode

1. Declare dict to count numbers
2. for every number in the list:
    3. If that number is in the dict, add 1 to that key value
    4. Otherwise, it hasn't been counted before. Set that key value to 1
5. calculate the maximum among the key values using .values() function
6. return the list of keys corresponding to the max count

In [35]:
def mode(nums):
    # 1
    count = {}
    # 2
    for num in nums:
        # 3
        if num in count:
            count[num] += 1
        else:
            # 4
            count[num] = 1
    # 5
    max_count = max(count.values())
    print(count.keys(), count.values())
    # 6
    return [num for num in count if count[num] == max_count]

In [36]:
mod1 = [1,1,2,3,3,3,3,4,4,5,5,5,5]
mod2 = [1,1,2,2, 3,3,4,4]

In [37]:
mode(mod1), mode(mod2)

dict_keys([1, 2, 3, 4, 5]) dict_values([2, 1, 4, 2, 4])
dict_keys([1, 2, 3, 4]) dict_values([2, 2, 2, 2])


([3, 5], [1, 2, 3, 4])
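
The hand-built functions can be cross-checked against Python's standard `statistics` module. This is only a sketch: it reuses the x1 and mod1 lists shown above and assumes Python 3.8 or newer, where `statistics.multimode` exists.

```python
import statistics

x1 = [76, 83, 83, 70, 82, 67, 25, 45, 100, 54]
mod1 = [1, 1, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5]

print(statistics.mean(x1))
print(statistics.median(x1))
print(statistics.multimode(mod1))  # every value tied for the highest count
```

If the results match mean(), median(), and mode() from the cells above, the hand-written versions are behaving as intended on these inputs.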

#### Variance

Average values do not provide a robust description of the data. An average does not tell us the shape of a distribution. In this section, we will build functions to calculate statistics describing distribution of variables and their relationships. The first of these is the variance of a list of numbers.

We define population variance as:

$$ \sigma^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$$

When we are dealing with a sample, which is a subset of a population of observations, then we divide by $n - 1$, the **Degrees of Freedom**, to unbias the calculation. 

$$DoF = n - 1$$

The degrees of freedom is the number of independent observations that go into the estimate of a parameter (sample size $n$), minus the number of parameters used as intermediate steps in the estimation of the parameter itself. Because we estimate $\bar{x}$ once, a single parameter is used up, leaving $n - 1$. (We will see that more parameters are estimated when we use Ordinary Least Squares regression.) The sample variance is therefore:


$$ S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$

Next, we build functions that calculate a population's variance and standard deviation. We will include an option for calculating sample variance and sample standard deviation.

# Variance Function Definition

### Pseudocode
1. Get the length of the list of numbers once, to avoid recomputing it
2. Get the mean
3. Calculate the variance, dividing by n - 1 if sample is True and by n otherwise

In [38]:
def variance(nums, sample = True):
    n = len(nums)
    mean = sum(nums) / n
    
    # If sample is True, divide by n - 1 (degrees of freedom); otherwise divide by n
    var = sum((i - mean)**2 for i in nums) / ((n - 1) if sample else n)
    return var

In [39]:
variance((1, 2, 3, 4, 5, 6, 7, 8), sample = True), variance((1, 2, 3, 4, 5, 6, 7, 8), sample = False)

(6.0, 5.25)

In [40]:
int(True)

1
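
Two small checks, offered as a sketch. The first compares the variance results for the numbers 1 through 8 with the standard `statistics` module, where `variance` divides by n - 1 and `pvariance` divides by n. The second shows that `bool` is a subclass of `int`, which is why True behaves like 1 in comparisons.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(statistics.variance(data))    # sample variance, divides by n - 1
print(statistics.pvariance(data))   # population variance, divides by n

# bool is a subclass of int, so True == 1 and False == 0
print(True == 1, isinstance(True, int))
```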

#### Standard Deviation

From a list's variance, we calculate its standard deviation as the square root of the variance. Standard deviation is regularly used in data analysis, primarily because it has the same units of measurement as the mean. It undoes the squaring of the individual observations' deviations from the mean that was done when calculating variance. It is denoted $s$ when working with a sample with an unknown population mean $\mu$. $s$ is an _estimator_ of $\sigma$, which is the standard deviation when $\mu$ is known:

$s = \sqrt{S^2}$

This is true for both the population and sample standard deviations. The function and its employment are listed below:

# Standard Deviation Function

### Pseudocode
1. get length of list of numbers
2. get mean
3. calculate variance
4. raise the variance to the 1/2 power, which is the same as taking the square root

In [41]:
def stddev(nums, sample = True):
    stddev = variance(nums, sample) ** (1/2)
    return stddev

In [42]:
random_ints = gen_rand100(300)

In [43]:
stddev(random_ints, sample = False), stddev(random_ints, sample = True)

(29.138059113278103, 29.186744291945857)
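
The two results are close because, for the same data, the sample and population standard deviations differ only by the factor $\sqrt{n / (n - 1)}$, which is about 1.0017 for $n = 300$. The sketch below checks this on a stand-in list, since the random list above is not reproducible; variance() and stddev() are restated compactly so the block runs on its own.

```python
import math

def variance(nums, sample=True):
    n = len(nums)
    mean = sum(nums) / n
    return sum((x - mean) ** 2 for x in nums) / (n - 1 if sample else n)

def stddev(nums, sample=True):
    return variance(nums, sample) ** 0.5

data = list(range(300))             # stand-in data: any 300 non-constant numbers work
ratio = stddev(data, sample=True) / stddev(data, sample=False)
print(ratio, math.sqrt(300 / 299))  # the two agree up to floating-point rounding
```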

# Is random really all that random?

In [44]:
def listgen(num_lists_wanted):
    randomints = {}
    for i in range(num_lists_wanted):
        randomints[i] = gen_rand100(100)
    return randomints

In [45]:
listgendict = listgen(100)

In [46]:
listgendict

{0: [5, 50, 3, 23, 35, 94, 78, 10, 42, 74, ...],
 1: [14, 1, 15, 78, 96, 60, 32, 63, 72, 26, ...],
 ...}

In [47]:
randomornah = {}
for i in listgendict:
    print(stddev(listgendict[i]))
    randomornah[i] = stddev(listgendict[i])
    
var(randomornah)  # raises NameError: no function named var has been defined

30.50149193237948
28.61615705234448
28.562983744770797
29.816103031751155
28.2761204888328
29.140488685128123
28.31200444969582
28.791993485844095
29.138568283568112
30.020967420262874
30.63041335376038
28.461024208378365
29.138471220201403
29.11491686126518
28.039616634506633
28.95628191992651
28.267488552945583
27.073226552437298
29.198400053026294
26.518718374808852
29.859001992698957
29.606380693657535
29.658078430026727
29.25626238754515
29.294221514513982
29.769444032090995
27.636330575471984
28.258937980001576
29.16673766225125
32.98576368769724
28.884785062049676
29.633687832354834
26.425842493951855
29.34513536681965
30.024730547407827
30.506331517018932
28.418252421763558
28.55598789123648
29.03960374830528
28.691199861542028
29.947217877222897
28.374835913733733
28.268739203608213
31.196469302946717
28.59258221245079
30.20778379855434
27.81783999922032
28.460080140719242
28.700506805788127
28.185122158882173
26.92094674667616
30.685968892454525
27.82864413411897
30.606852074

NameError: name 'var' is not defined

In [48]:
def dictvar(dct, sample=True):
    # dct avoids shadowing the built-in dict type
    n = len(dct)
    mean = sum(dct.values()) / n
    var = sum((i - mean)**2 for i in dct.values()) / (n - 1 if sample else n)
    return var

In [49]:
dictvar(randomornah)

1.5577109331727004

In [50]:
r = random.random()

In [51]:
random.seed(1)

random.random()

0.13436424411240122

In [52]:
random.seed(1)
random.random()

0.13436424411240122

In [53]:
random.seed(random.random())
random.random()

0.13403453044814118

In [54]:
random.seed(random.random())
random.random()

0.17399915189709225
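
The seed cells above show that reseeding with the same value reproduces the same number: the generator is deterministic once the seed is fixed. A minimal sketch of using that for reproducible lists follows. The `rng` parameter is an assumption added for illustration and is not part of the gen_rand100 defined earlier.

```python
import random

def gen_rand100(n, rng):
    """Return n integers drawn uniformly from 0..100 using the given generator."""
    return [rng.randint(0, 100) for _ in range(n)]

first = gen_rand100(5, random.Random(42))
second = gen_rand100(5, random.Random(42))
print(first == second)   # same seed, same sequence
```

Passing a separate `random.Random` instance keeps the experiment repeatable without touching the module-level generator.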

### Standard Error

Next, we will calculate the **standard error** of the sample mean. This describes how far a given random sample mean $\bar{x_i}$ is expected to deviate from the population mean $\mu$. It is the standard deviation of the probability distribution of the random variable $\bar{X}$, which represents the means of all possible samples of a single given sample size $n$. As $n$ increases, $\bar{X}$ can be expected to deviate less from $\mu$, so the standard error decreases. Because the population standard deviation $\sigma$ is rarely known, we again use an _estimator_ for the standard error, denoted $s_\bar{x}$ and computed as $s_\bar{x} = \frac{s}{\sqrt{n}}$. Population data has no standard error, as $\mu$ can only take on a single value.

As n increases, the standard error of the sample mean should decrease, and vice versa.

In [58]:
def stderr(lst, sample = True):
    nums = len(lst)
    return stddev(lst,sample) / nums ** (1/2)

In [59]:
print(stderr(x1), stderr(x2))

6.9205812215770175 0.8145502215898436
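
To see the standard error shrink as the sample grows, here is a small simulation sketch. The seed, the sample sizes, and the compact restatement of stderr() are assumptions made so the block runs on its own.

```python
import random

def stderr(nums):
    """Estimated standard error of the mean: sample std dev / sqrt(n)."""
    n = len(nums)
    mean = sum(nums) / n
    sample_var = sum((x - mean) ** 2 for x in nums) / (n - 1)
    return sample_var ** 0.5 / n ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (25, 400, 6400):
    sample = [rng.randint(0, 100) for _ in range(n)]
    print(n, stderr(sample))
```

Each sample is 16 times larger than the last, so the printed standard error should fall by roughly a factor of 4 each time.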


### What's left? 

##### Covariance, correlation, skewness and kurtosis. 

Covariance measures how two variables move together, on average. Correlation normalizes the covariance statistic to a value between -1 and 1.

To calculate covariance, we sum the products of each pair of deviations (the difference between an observed value and the mean of its list) for index _i = 0_ through _n - 1_, where _n_ is the number of observations, and divide by _n_:

$cov_{pop}(x,y) = \frac{\sum_{i=0}^{n-1} (x_{i} - x_{mean})(y_{i} - y_{mean})} {n}$

We pass two lists through the covariance() function. As with the _variance()_ and _stddev()_ functions, we can also take the sample covariance:

$cov_{sample}(x,y) = \frac{\sum_{i=0}^{n-1} (x_{i} - x_{mean})(y_{i} - y_{mean})} {n - 1}$

In order for covariance to be calculated, the lists passed to the function must be of equal length. So we check this condition with an if statement, returning 0.0 when the lengths differ or there are fewer than two observations:

In [79]:
def covariance(x, y, population=True):
    nums = len(x)
    if nums != len(y) or nums <= 1:
        return 0.0
    
    # Calculate the means of x and y
    mean_x = sum(x) / nums
    mean_y = sum(y) / nums
    
    # Calculate the covariance
    cov = 0.0
    for i in range(nums):
        cov += (x[i] - mean_x) * (y[i] - mean_y)
    if population:
        cov /= nums
    else:
        cov /= (nums - 1)
    
    return cov

In [82]:

X_1 = [3, 6, 9, 12, 15,18,21,24,27,30]
X_2 = [10, 56, 34, 47, 41, 54, 95, 67, 69, 98]
print(round(covariance(X_1, X_2), 2))

180.75
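
Correlation, mentioned above as the normalized form of covariance, can be sketched by dividing the covariance by the product of the two standard deviations. This is an illustrative version, not the notebook's own: the `sample` flag, the ValueError on bad input, and the `correlation` name are assumptions, and it assumes neither list is constant (otherwise the denominator is zero).

```python
def covariance(x, y, sample=False):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n - 1 if sample else n)

def correlation(x, y):
    # The n (or n - 1) divisors cancel, so population and sample versions agree.
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

X_1 = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
X_2 = [10, 56, 34, 47, 41, 54, 95, 67, 69, 98]
print(correlation(X_1, X_2))
print(correlation(X_1, X_1))   # a list is perfectly correlated with itself
```

Note that covariance(x, x) is just the variance of x, so its square root is the standard deviation.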
