# Key points of functional programming
- Modular code
- Emphasis on functions and models to solve problems
- Avoidance of state and data mutation, instead opting to create new data to avoid side effects

For repeated calculations, you're probably better off abstracting that work away. Think of the multiplication operator `*`: it is shorthand for telling the computer to add a value to itself a given number of times. `1 * 90` is essentially starting at 0 and adding 1 in a for loop until the counter reaches 90. That in itself is a function. Just recall your days in algebra.
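
The repeated-addition idea can be written out as a small function. This is only a sketch: the name `multiply` and its parameters are made up for illustration, and it assumes `times` is a non-negative integer.

```python
def multiply(value, times):
    """Multiply by repeated addition; assumes `times` is a non-negative integer."""
    result = 0
    for _ in range(times):
        result += value   # add `value` once per pass through the loop
    return result

print(multiply(1, 90))  # 90
print(multiply(7, 6))   # 42
```

Once the loop lives inside a function, you call it by name instead of rewriting the loop each time.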

All you have to do is declare the function's existence, specify its arguments, and say how it computes what it returns. Remember y = mx + b: you pass m, x, and b as arguments and get y back.

1. declare existence of a function
2. declare its name
3. declare what it takes in (var names will be used locally within the function's operations)
4. in the body of the function, write how it operates (calculates)
5. Return some value
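
The five steps above map onto y = mx + b roughly like this. It is a minimal sketch, and the function name `line` is an arbitrary choice for illustration.

```python
# 1-3. declare the function, its name, and what it takes in
def line(m, x, b):
    # 4. the body says how the result is calculated
    y = m * x + b
    # 5. return some value
    return y

print(line(2, 3, 1))  # 7
```

The names `m`, `x`, and `b` only exist inside the function; the caller just sees the returned value.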

## Statistical Functions
| New Concepts | Description |
| --- | --- |
| Operators e.g., !=, %, +=, \*\* | The operator != tests whether the values on either side of the operator are not equal; _a % b_ returns the remainder of $a / b$; _a += b_ sets a equal to $a + b$; _a ** b_ raises a to the b power ($a^b$). |
| Dictionary | A dictionary is a data structure that uses keys instead of index values. Each unique key references an object linked to that key. |
| Dictionary Methods e.g., _dct.values()_ | dct.values() returns a view of the objects that are referenced by the dictionary's keys; wrap it in list() to get a list. |
| Default Function Values | A function may assume a default value for an argument that is not passed to it, e.g., _def function(val1 = 0, val2 = 2, …)_ |
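
The concepts in the table can be tried out in a few lines. The names `count`, `ages`, and `power` below are hypothetical examples, not part of the notebook's later code.

```python
# Operators
print(3 != 4)    # True: the two values are not equal
print(17 % 5)    # 2: remainder of 17 / 5
count = 10
count += 5       # same as count = count + 5
print(count)     # 15
print(2 ** 10)   # 1024

# Dictionary: values are looked up by key rather than by position
ages = {"ann": 31, "bob": 27}   # hypothetical example data
print(ages["bob"])              # 27
print(list(ages.values()))      # [31, 27]

# Default function values: `exponent` falls back to 2 when omitted
def power(base, exponent=2):
    return base ** exponent

print(power(3))      # 9
print(power(3, 3))   # 27
```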

### Average Statistics

# Cumulative Multiplication Function

In [4]:
def cum_mult(numbers):
    # Start from 1, the multiplicative identity
    answer = 1
    for number in numbers:
        answer *= number
    return answer

In [5]:
numbersLITERALLY = [i for i in range(1, 6)]
numbersLITERALLY

[1, 2, 3, 4, 5]

In [6]:
cum_mult(numbersLITERALLY)

120

If the function accepts arguments, you pass an object to it by placing the object in the parentheses that follow the function name. The first function that we build will be the total() function. We define the function algebraically as the sum of all values in a list of length n:

$\sum_{i=0}^{n-1} x_{i}$

Since list indices start with the integer 0, we write our functions as starting with _i = 0_ and processing elements up to the index _n - 1_. Since the range function in Python stops one short of the value passed to it, the for-loop used will take the form:

In [7]:
n = 0
total = 0
values = [i for i in range(20)]


# This sucks since you'd have to rewrite this every time
print("Total", "Value")
for value in values:
    total  += value
    print(total, "\t", value)

Total Value
0 	 0
1 	 1
3 	 2
6 	 3
10 	 4
15 	 5
21 	 6
28 	 7
36 	 8
45 	 9
55 	 10
66 	 11
78 	 12
91 	 13
105 	 14
120 	 15
136 	 16
153 	 17
171 	 18
190 	 19


In [8]:
lst = [i for i in range(30)]

In [9]:
def cum_total(numbers, target_index=None):
    total = 0
    cum_totals = []
    for number in numbers:
        total += number
        cum_totals.append(total)
    if target_index is None:
        return cum_totals
    else:
        return cum_totals[target_index]

In [10]:
print(cum_total(lst,9))
print("\n",cum_total(lst))
print("\n",cum_total(i for i in range(0, 29, 3)), type((i for i in range(0, 29, 3))))

45

 [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435]

 [0, 3, 9, 18, 30, 45, 63, 84, 108, 135] <class 'generator'>


### Be more abstract using built-in functions

In [11]:
def total(elements):
    total = 0
    totals = []
    for element in elements:
        if isinstance(element, (int, float)):
            total += element
        totals.append(total)
    return totals

In [12]:
lst[5] = "This is a string"
total(lst)

[0,
 1,
 3,
 6,
 10,
 10,
 16,
 23,
 31,
 40,
 50,
 61,
 73,
 86,
 100,
 115,
 131,
 148,
 166,
 185,
 205,
 226,
 248,
 271,
 295,
 320,
 346,
 373,
 401,
 430]
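
The standard library already covers running totals and products. The sketch below assumes Python 3.8 or later (for `math.prod`) and uses a made-up list `values`. It is an alternative to the hand-written loops above, not a replacement for understanding them.

```python
from itertools import accumulate
from math import prod
import operator

values = [1, 2, 3, 4, 5]

running_totals = list(accumulate(values))                  # cumulative sums
running_products = list(accumulate(values, operator.mul))  # cumulative products

print(running_totals)             # [1, 3, 6, 10, 15]
print(running_products)           # [1, 2, 6, 24, 120]
print(sum(values), prod(values))  # 15 120
```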

#### Mean


Let $X_1, X_2,...,X_n$ represent $n$ random variables. For a given dataset, useful descriptive statistics of central tendency include mean, median, and mode, which we built as functions in a previous chapter. 

We define the mean of a set of numbers:
$\bar{X} = \frac{\sum_{i=0}^{n-1} x_{i}} {n}$

The **mean** gives the expected value - often denoted $E(X)$ or $\bar{X}$ - of a series, $X$, by summing all of the observations and dividing by the number of them. The series may be a sample or may include the full population of interest, in which case we would identify the mean by the symbol $\mu_x$.

The numerator is the same as the notation that represents the sum of a list of numbers. Thus, in mean(), we call the built-in sum() and divide the result by the length of the list. Then, we use the function to calculate a value and save that value as an object:

# Mean function definition

In [13]:
def mean(numbers):
    return sum(numbers) / len(numbers)

In [14]:
# Random number generator
import random
def gen_rand100(n):
    return [random.randint(0, 100) for i in range(n)]
def gen_rand10(n):
    return [random.randint(0, 10) for i in range(n)]

In [15]:
x1 = gen_rand100(10)
x2 = gen_rand10(15)
x1, x2

([18, 53, 72, 71, 89, 9, 53, 19, 41, 91],
 [9, 3, 7, 8, 9, 6, 4, 1, 5, 7, 2, 2, 4, 0, 3])

In [16]:
mean(x1), mean(x2)

(51.6, 4.666666666666667)

#### Median

The **median** is the middle value of a sorted list. It is less sensitive to outliers than the mean. For a sorted series of *odd length* whose indices run over the range [i, n], starting with index $i=0$ and ending with the last index $n$ (so the list holds $n + 1$ values), the median is the value at index $\frac{n}{2}$, that is, $x_\frac{n}{2}$.

For a series of *even length* that is otherwise the same, the median is the mean of the two values that comprise the middle of the list. The indices of these numbers are defined:

$$i_1 = \frac{n + 1}{2}; i_2 = \frac{n - 1}{2}$$

The median is thus defined:
$$\frac{x_\frac{n + 1}{2}+x_\frac{n-1}{2}}{2}$$

We can restate that:

$$k = x_\frac{n + 1}{2}+x_\frac{n-1}{2}$$

Thus, the median is defined as $\frac{k}{2}$.
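
To check the index arithmetic, here is a short sketch that takes $n$ to be the last index of the sorted list, as in the formulas above. The lists `odd` and `even` are made-up examples.

```python
odd = [3, 8, 9, 12, 20]        # length 5, last index n = 4
n = len(odd) - 1
print(odd[n // 2])             # 9: the single middle value

even = [3, 8, 9, 12, 20, 31]   # length 6, last index n = 5
n = len(even) - 1
i_1, i_2 = (n + 1) // 2, (n - 1) // 2
k = even[i_1] + even[i_2]      # sum of the two middle values
print(i_1, i_2, k / 2)         # 3 2 10.5
```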

# Median Function Definition

In [17]:
def median(numbers):
    sorted_numbers = sorted(numbers)
    length = len(sorted_numbers)
    
    # Two cases: odd length or even length
    # Even Case
    
    if length % 2 == 0:
        mid = length // 2
        return (sorted_numbers[mid - 1] + sorted_numbers[mid]) / 2
    # Else, it is odd, just take that middle number
    else:
        mid = length // 2
        return sorted_numbers[mid]

In [18]:
median(x1), median(x2)

(53.0, 4)

# Mode Function Definition

### Pseudocode

1. Declare dict to count numbers
2. for every number in the list:
    3. If that number is in the dict, add 1 to that key value
    4. Otherwise, it hasn't been counted before. Set that key value to 1
5. calculate the maximum among the key values using .values() function
6. return the list of keys corresponding to the max count

In [19]:
def mode(nums):
    # 1. dict that maps each number to how often it appears
    count = {}
    # 2
    for num in nums:
        # 3
        if num in count:
            count[num] += 1
        else:
            # 4
            count[num] = 1
    # 5
    max_count = max(count.values())
    print(count.keys(), count.values())
    # 6
    return [num for num in count if count[num] == max_count]

In [20]:
mod1 = [1,1,2,3,3,3,3,4,4,5,5,5,5]
mod2 = [1,1,2,2, 3,3,4,4]

In [21]:
mode(mod1), mode(mod2)

dict_keys([1, 2, 3, 4, 5]) dict_values([2, 1, 4, 2, 4])
dict_keys([1, 2, 3, 4]) dict_values([2, 2, 2, 2])


([3, 5], [1, 2, 3, 4])
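
The hand-written functions can be cross-checked against Python's standard `statistics` module. This sketch assumes Python 3.8 or later, since `statistics.multimode` was added in that version. The list `data` repeats the values of `mod1`.

```python
import statistics

data = [1, 1, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5]  # same values as mod1 above

print(statistics.mean(data))       # 44 / 13
print(statistics.median(data))     # 3
print(statistics.multimode(data))  # [3, 5]
```

Like mode() above, `multimode` returns every value tied for the highest count rather than picking one.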

#### Variance

Average values do not provide a robust description of the data. An average does not tell us the shape of a distribution. In this section, we will build functions to calculate statistics describing distribution of variables and their relationships. The first of these is the variance of a list of numbers.

We define population variance as:

$$ \sigma^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$$

When we are dealing with a sample, which is a subset of a population of observations, then we divide by $n - 1$, the **Degrees of Freedom**, to unbias the calculation. 

$$DoF = n - 1$$

The degrees of freedom is the number of independent observations that go into the estimate of a parameter (sample size $n$), minus the number of parameters used as intermediate steps in the estimation of the parameter itself. Since calculating the sample variance requires first estimating $\bar{X}$, one parameter is used as an intermediate step, leaving $n - 1$ degrees of freedom. (We will see that more than one parameter is estimated when we use Ordinary Least Squares regression.) The sample variance is therefore:


$$ S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$
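
One way to see why dividing by $n - 1$ removes the bias is to enumerate every possible sample from a tiny population and average the results. The sketch below uses a made-up population `[1, 2, 3]`, samples of size 2 drawn with replacement, and a hypothetical helper `variance_of`. Exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

def variance_of(values, ddof):
    """Sum of squared deviations divided by (n - ddof), computed exactly."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

population = [1, 2, 3]                        # hypothetical toy population
pop_var = variance_of(population, ddof=0)     # divide by n

# Every possible sample of size 2, drawn with replacement
samples = list(product(population, repeat=2))
avg_n = sum(variance_of(s, ddof=0) for s in samples) / len(samples)
avg_n_minus_1 = sum(variance_of(s, ddof=1) for s in samples) / len(samples)

print(pop_var, avg_n, avg_n_minus_1)  # 2/3 1/3 2/3
```

Averaged over all samples, dividing by $n$ underestimates the population variance, while dividing by $n - 1$ recovers it.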

Next, we build functions that calculate a population's variance and standard deviation. We will include an option for calculating sample variance and sample standard deviation.

# Variance Function Definition

### Pseudocode
1. Get len list of numbers to reduce computation
2. get the mean
3. calculate the variance, dividing by n - 1 if sample is True, else by n

In [22]:
def variance(nums, sample = True):
    n = len(nums)
    mean = sum(nums) / n
    
    # Divide by n - 1 for a sample, otherwise by n for a population
    var = sum((i - mean)**2 for i in nums) / ((n - 1) if sample else n)
    return var

In [23]:
variance((1, 2, 3, 4, 5, 6, 7, 8), sample = True), variance((1, 2, 3, 4, 5, 6, 7, 8), sample = False)

(6.0, 5.25)

In [24]:
int(True)

1

#### Standard Deviation

From a list's variance, we calculate its standard deviation as the square root of the variance. Standard deviation is regularly used in data analysis, primarily because it has the same units of measurement as the mean. It undoes the squaring of the individual observations' deviations from the mean that is done when calculating variance. It is denoted $s$ when working with a sample with an unknown population mean $\mu$. $s$ is an _estimator_ of $\sigma$, which is the standard deviation when $\mu$ is known:

$s = \sqrt{S^2}$

This is true for both the population and sample standard deviations. The function and its employment are listed below:

# Standard Deviation Function

### Pseudocode
1. get length of list of numbers
2. get mean
3. calculate variance
4. raise the variance to the 1/2 power, which is the same as taking the square root

In [67]:
def stddev(nums, sample = True):
    stddev = variance(nums, sample) ** (1/2)
    return stddev

In [68]:
random_ints = gen_rand100(300)

In [69]:
stddev(random_ints, sample = False), stddev(random_ints, sample = True)

(29.550343295618216, 29.599717337226394)
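
As a sanity check, the standard `statistics` module offers both population and sample versions of these measures. The sketch below compares them with the $n$ and $n - 1$ formulas on a small made-up list, so it does not depend on random data.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]
n = len(data)
mean = sum(data) / n
squared_devs = sum((x - mean) ** 2 for x in data)

# Population versions divide by n; sample versions divide by n - 1
assert math.isclose(statistics.pvariance(data), squared_devs / n)
assert math.isclose(statistics.variance(data), squared_devs / (n - 1))
assert math.isclose(statistics.pstdev(data), (squared_devs / n) ** 0.5)
assert math.isclose(statistics.stdev(data), (squared_devs / (n - 1)) ** 0.5)

print(statistics.pvariance(data), statistics.variance(data))
```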

# Is random really all that random?

In [76]:
def listgen(num_lists_wanted):
    randomints = {}
    for i in range(num_lists_wanted):
        randomints[i] = gen_rand100(100)
    return randomints

In [84]:
listgendict = listgen(100)

In [85]:
listgendict

{0: [71,
  87,
  87,
  23,
  51,
  83,
  30,
  41,
  30,
  53,
  68,
  75,
  90,
  29,
  97,
  77,
  60,
  50,
  66,
  12,
  49,
  61,
  48,
  4,
  56,
  88,
  37,
  1,
  13,
  30,
  4,
  68,
  62,
  89,
  23,
  85,
  17,
  2,
  18,
  21,
  9,
  44,
  5,
  12,
  47,
  15,
  28,
  34,
  31,
  23,
  16,
  42,
  39,
  33,
  96,
  39,
  73,
  50,
  44,
  57,
  4,
  74,
  9,
  99,
  92,
  37,
  88,
  93,
  2,
  45,
  51,
  97,
  56,
  47,
  45,
  23,
  52,
  52,
  73,
  21,
  31,
  53,
  35,
  34,
  8,
  50,
  0,
  74,
  6,
  3,
  45,
  10,
  31,
  26,
  44,
  59,
  53,
  22,
  82,
  100],
 1: [12,
  54,
  86,
  4,
  51,
  6,
  57,
  18,
  41,
  95,
  8,
  6,
  32,
  6,
  37,
  98,
  17,
  23,
  8,
  83,
  93,
  24,
  69,
  99,
  11,
  59,
  61,
  43,
  28,
  13,
  29,
  39,
  31,
  68,
  96,
  54,
  23,
  49,
  11,
  14,
  97,
  62,
  17,
  10,
  89,
  12,
  89,
  28,
  4,
  6,
  61,
  30,
  17,
  74,
  1,
  79,
  62,
  68,
  76,
  64,
  24,
  78,
  3,
  77,
  13,
  23,
  67,
  41,
  86,
 

In [94]:
randomornah = {}
for i in listgendict:
    print(stddev(listgendict[i]))
    randomornah[i] = stddev(listgendict[i])
    
var(randomornah)

28.24127468723032
29.7238128527457
28.890030086239747
31.792056373956825
29.071925782219324
29.117380002016287
26.834568012249427
27.30170194865018
29.244293235923596
28.410872509792377
29.19484526246011
28.72193403915369
28.60702746183217
29.05913684787341
30.20321074644117
29.952654559346076
28.2127899214437
27.091276690611547
29.763206897921545
31.000637008769598
30.07534478000815
30.40269126843274
27.652042502235798
29.502232409967174
30.171996850380882
28.028960492009446
28.997420644609136
28.73828087897375
29.49897276074869
27.9969316500609
30.28523659401649
29.02659699884459
30.82596799826898
28.80087681044632
29.108051925003274
29.77377837607915
28.487550000842553
30.25755436183823
28.565890516665164
30.01146413952475
28.220481136145708
29.293526709338142
28.732334933086644
27.621926219596055
31.04892847742028
26.020774528647436
29.650891277250903
29.925017404617606
30.26332080024573
29.076595121134513
28.595832146689066
30.398491456456888
30.045210041945893
30.09148340872171
3

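NameError: name 'var' is not defined

No function named var() exists in this notebook, so the call fails. The next cell defines dictvar() to compute the variance of the dictionary's values.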

In [112]:
def dictvar(dct, sample=True):
    # Variance of the dictionary's values; `dct` avoids shadowing the built-in dict
    n = len(dct)
    mean = sum(dct.values()) / n
    var = sum((i - mean)**2 for i in dct.values()) / (n - 1 if sample else n)
    return var

In [113]:
dictvar(randomornah)

1.348077675311782
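
The standard deviations printed above cluster around 29, and there is a reason for that. Assuming `random.randint(0, 100)` picks each of the 101 integers from 0 to 100 with equal probability, the population standard deviation can be computed directly from the possible outcomes:

```python
outcomes = range(0, 101)           # every value random.randint(0, 100) can return
n = len(outcomes)
mean = sum(outcomes) / n
pop_var = sum((x - mean) ** 2 for x in outcomes) / n
pop_std = pop_var ** 0.5

print(mean, pop_var)               # 50.0 850.0
print(round(pop_std, 3))           # 29.155
```

Each list of 100 draws gives a slightly different estimate of this value, which is why the variance of those estimates is small but not zero.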

In [114]:
r = random.random()

In [102]:
random.seed(1)

random.random()

0.13436424411240122

In [104]:
random.seed(1)
random.random()

0.13436424411240122

In [108]:
random.seed(random.random())
random.random()

0.3774943635453354

In [109]:
random.seed(random.random())
random.random()

0.9088270083827942
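
The cells above suggest that the output is fully determined by the seed. A minimal sketch of that idea, using `random.Random` instances so the module-level generator is left alone at first (the seed 42 is an arbitrary choice):

```python
import random

# Two independent generators built from the same seed
gen_a = random.Random(42)   # 42 is an arbitrary, hypothetical seed
gen_b = random.Random(42)

draws_a = [gen_a.randint(0, 100) for _ in range(5)]
draws_b = [gen_b.randint(0, 100) for _ in range(5)]
print(draws_a == draws_b)   # True: same seed, same sequence

# Reseeding the module-level generator replays its sequence too
random.seed(1)
first = random.random()
random.seed(1)
second = random.random()
print(first == second)      # True
```

So the numbers are pseudorandom: unpredictable without the seed, reproducible with it.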