# Statistics: Basic terms that you need to know

<b>Population</b>: an entire group of people, objects or items from which a sample is drawn. In the example below we have a population of <b>28 people</b>.<br>
<b>Sample</b>: a group of people, objects or items that are taken from a larger population for measurement. The sample should be representative of the population. In the example below, a sample of <b>5 people</b> is shown.

Descriptive vs. Inferential Statistics:
- Descriptive statistics: present, organise and summarise data
- Inferential statistics: draw conclusions about a population based on data from a sample

![sample-size-definition.png](attachment:sample-size-definition.png)
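Drawing a sample from a population can be sketched with Python's built-in `random` module. This is a minimal illustration, assuming the 28 people are represented by hypothetical IDs 1 to 28. A simple random sample gives every person the same chance of being picked, which is one common way to aim for a representative sample.

```python
import random

# Hypothetical population: 28 people identified by the numbers 1..28
population = list(range(1, 29))

# Draw a simple random sample of 5 people without replacement
random.seed(42)  # fixed seed only so that the result is reproducible
sample = random.sample(population, k=5)

print(f'Population size: {len(population)}')
print(f'Sample size: {len(sample)}')
print(f'Sampled people: {sample}')
```

Without `random.seed(...)`, each run draws a different sample.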

Let's assume we want to investigate the age of the people in our population.

In our sample:

Person 1 is 6 years old.<br>
Person 2 is 20 years old.<br>
Person 3 is 76 years old.<br>
Person 4 is 18 years old.<br>
Person 5 is 6 years old.<br>

<b>Mean</b> : the average of a data set, found by adding all values in the data set and dividing by the number of values.<br>
<b>Median</b> : the middle value when a data set is ordered from smallest to largest; if the data set has an even number of values, the median is the mean of the two middle values.<br>
<b>Mode</b> : the value that occurs most often in a data set.<br>

In [None]:
print(f'The mean is: ( 6 + 20 + 76 + 18 + 6 )/ 5 = {(6 + 20 + 76 + 18 + 6)/5} ')

In [None]:
print(f'The median is: 18 ( 6 <= 6 < 18 < 20 < 76 )')

In [None]:
print(f'The mode is: 6 ')

![unnamed.png](attachment:unnamed.png)

<b>Standard deviation (sd)</b> : an ABSOLUTE measure of the amount of variation or dispersion of values of a data set<br>
<b>Coefficient of variation (cv)</b> : a RELATIVE measure of the amount of variation or dispersion of values of a data set; cv = sd/mean * 100<br>
Question: what is the advantage of cv over sd?
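To make the two formulas concrete, here is a small sketch that computes the standard deviation and the coefficient of variation of our five ages by hand. It uses the sample standard deviation (squared deviations from the mean summed and divided by n - 1, then square-rooted), which is what `statistics.stdev` computes, and compares the two results.

```python
import math
import statistics

ages = [6, 20, 76, 18, 6]

# Mean of the sample
mean = sum(ages) / len(ages)

# Sample standard deviation: square root of the summed squared deviations
# from the mean, divided by (n - 1)
squared_deviations = [(age - mean) ** 2 for age in ages]
sd = math.sqrt(sum(squared_deviations) / (len(ages) - 1))

# Coefficient of variation in percent: sd relative to the mean
cv = sd / mean * 100

print(f'mean = {mean}')
print(f'sd   = {sd:.2f}')
print(f'cv   = {cv:.2f} %')

# The manual result agrees with the statistics module
print(math.isclose(sd, statistics.stdev(ages)))  # True
```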

The <b>y% quantile</b> of a distribution is the value x such that (approximately) y% of the data are equal to or lower than x.<br>
25% quantile = lower quartile = 0.25 quantile = 25th percentile<br>
50% quantile = median = 0.50 quantile = 50th percentile<br>
75% quantile = upper quartile = 0.75 quantile = 75th percentile

![3-s2.0-B9780123814791000022-f02-02-9780123814791.jpg](attachment:3-s2.0-B9780123814791000022-f02-02-9780123814791.jpg)

<b>Random variable (X)</b> : a variable whose value is determined by the outcome of a random process, so it is not known with certainty in advance <br>
<b>Discrete random variable</b> : has a countable number of possible values <br>
Examples: rolling a die (6 possible outcomes), tossing a coin (2 possible outcomes) <br>
<b>Continuous random variable</b> : can take any value within an interval, so its possible values cannot be counted<br>
Examples: a person's weight and height <br>
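The difference between discrete and continuous random variables can be illustrated by simulating them with the `random` module. This is a sketch only: the normal distribution with mean 170 cm and standard deviation 10 cm used for the height is a made-up model, not real data. A computer can only represent finitely many floating-point numbers, so the simulated height merely approximates a continuous variable.

```python
import random

random.seed(1)  # fixed seed so that repeated runs give the same values

# Discrete random variable: rolling a die has 6 countable outcomes
die_roll = random.randint(1, 6)  # one of 1, 2, 3, 4, 5, 6

# Discrete random variable: tossing a coin has 2 outcomes
coin_toss = random.choice(['heads', 'tails'])

# Continuous random variable (hypothetical model): a height in cm drawn
# from a normal distribution with mean 170 and standard deviation 10
height = random.gauss(170, 10)

print(f'die roll:  {die_roll}')
print(f'coin toss: {coin_toss}')
print(f'height:    {height:.3f} cm')
```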

## Statistics Library

The statistics module is part of Python's standard library and includes basic statistical functions to calculate metrics such as the mean, median and mode.<br>
(see https://docs.python.org/3/library/statistics.html)

In [None]:
import statistics  # always import your libraries!

list_of_age = [6, 20, 76, 18, 6]

### Mean, Median and Mode

In [None]:
print('mean age is', statistics.mean(list_of_age))
print('median age is', statistics.median(list_of_age))
print('mode age is', statistics.mode(list_of_age))


#list_of_age.sort()
#print(f'{list_of_age}')
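To check that the library follows the definitions given earlier, the three measures can also be written by hand. The helper names `my_mean`, `my_median` and `my_mode` are hypothetical and only for illustration; `collections.Counter` is used to count how often each value occurs.

```python
from collections import Counter

def my_mean(values):
    # add all values and divide by how many there are
    return sum(values) / len(values)

def my_median(values):
    # order the values from smallest to largest
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # odd number of values: take the single middle value
        return ordered[middle]
    # even number of values: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

def my_mode(values):
    # count how often each value occurs and return the most frequent one
    # (on a tie, the value seen first is returned)
    return Counter(values).most_common(1)[0][0]

ages = [6, 20, 76, 18, 6]
print('mean age is', my_mean(ages))      # 25.2
print('median age is', my_median(ages))  # 18
print('mode age is', my_mode(ages))      # 6
```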

### Exercise 2.1

Investigate how the mean, median and mode change when the sample list is set to list_of_age_2 = [6,20,76,18] <br>
Which values do you expect?

In [None]:
#your code

### Quantile and Standard Deviation

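The quantiles function can take the following input parameters (see documentation):

- _data_ can be any iterable containing sample data
- _n_ divides the data into n continuous intervals with equal probability; the function returns a list of the n-1 cut points separating the intervals. The default n = 4 gives the quartiles
- _method_ defines whether the data is assumed to include or exclude the lowest and highest possible values of the population: "exclusive" suits samples from a population that may contain more extreme values than the sample, "inclusive" suits population data or samples known to contain the most extreme values. Note that the default is exclusive!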

In [None]:
#inclusive:
quartiles = statistics.quantiles(list_of_age, method="inclusive")
print(f'The 25%, 50% and 75% quantiles are: {quartiles}')
print(f'Meaning: about 25% of the ages in our sample are equal to or lower than {quartiles[0]}')
print(f'Meaning: about 50% of the ages in our sample are equal to or lower than {quartiles[1]}')
print(f'Meaning: about 75% of the ages in our sample are equal to or lower than {quartiles[2]}')

#import pandas as pd
#df = pd.DataFrame({"Age":[6,20,76,18,6]})
#print(df.quantile([.1, .25, .5, .75], axis = 0))
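Because the default method is exclusive, calling `statistics.quantiles` without `method="inclusive"` can give different cut points. A short comparison on the same ages, plus `n=10` to show that n-1 cut points are returned:

```python
import statistics

list_of_age = [6, 20, 76, 18, 6]

# default: method='exclusive'
exclusive = statistics.quantiles(list_of_age)
# explicitly inclusive
inclusive = statistics.quantiles(list_of_age, method='inclusive')

print('exclusive (default):', exclusive)  # [6.0, 18.0, 48.0]
print('inclusive:          ', inclusive)  # [6.0, 18.0, 20.0]

# n=10 splits the data into 10 intervals and returns 9 cut points (deciles)
deciles = statistics.quantiles(list_of_age, n=10, method='inclusive')
print('number of cut points for n=10:', len(deciles))  # 9
```

With only five values, the upper quartile depends strongly on the method: the exclusive method interpolates between 20 and 76, while the inclusive method lands exactly on 20.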

In [None]:
# stdev() computes the sample standard deviation (divides by n - 1)
print('the standard deviation of age is', statistics.stdev(list_of_age))
print('the coefficient of variation of age in % is', statistics.stdev(list_of_age) / statistics.mean(list_of_age) * 100)
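The statistics module offers two standard deviation functions: `stdev` divides by n - 1 (sample standard deviation) and `pstdev` divides by n (population standard deviation). A small comparison on our ages:

```python
import statistics

list_of_age = [6, 20, 76, 18, 6]

# Sample standard deviation: divides by (n - 1); use it when the data are a
# sample and you want to estimate the spread of the whole population
sample_sd = statistics.stdev(list_of_age)

# Population standard deviation: divides by n; use it when the data are
# the whole population
population_sd = statistics.pstdev(list_of_age)

print(f'sample sd (stdev):      {sample_sd:.2f}')
print(f'population sd (pstdev): {population_sd:.2f}')
```

Since our five people are a sample, `stdev` is the appropriate choice here; dividing by n - 1 makes it slightly larger than `pstdev`.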

## Bonus content: Lambda as an anonymous function in Python

A lambda function, also referred to as an 'anonymous function', can have the same functionality as a regular Python function, but it
- can be defined without a name
- is defined using the lambda keyword instead of the def keyword
- is restricted to a single expression
- can only contain an expression, not statements (such as assert or return); the value of the expression is returned automatically
- can be immediately invoked (IIFE)

However, like a regular function, a lambda function can take multiple parameters.

<b>Syntax</b>: lambda _arguments_ : expression

In [None]:
#regular function
def my_reg_function(x):
    y = 3*x + 10
    return y

#lambda function
my_lambda_function = lambda x : 3*x + 10 


In [None]:
print(f'My regular function returns {my_reg_function(5)}')

print(f'My lambda function returns {my_lambda_function(5)}')

But you can also execute a Python lambda function immediately. This is known as an <b>Immediately Invoked Function Expression (IIFE)</b>:

In [None]:
(lambda x : 3*x + 10) (5)

Some further examples:

In [None]:
my_function = lambda x, y : x * y + x + y
print(my_function(2, 5))

In [None]:
compare = lambda x, y : x == y
print(compare(2, 5))

In [None]:
is_odd = lambda x : x % 2 == 1
print(is_odd(7))
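Lambdas are most often passed directly to built-in functions that expect a function as an argument, such as `filter()`, `map()` and `sorted()`. A short sketch reusing the ages from earlier:

```python
ages = [6, 20, 76, 18, 6]

# filter(): keep only the ages for which the lambda returns True
adults = list(filter(lambda age: age >= 18, ages))

# map(): apply the lambda to every element
ages_in_months = list(map(lambda age: age * 12, ages))

# sorted(): the lambda computes the sort key, here the distance to age 20
closest_to_20 = sorted(ages, key=lambda age: abs(age - 20))

print(adults)          # [20, 76, 18]
print(ages_in_months)  # [72, 240, 912, 216, 72]
print(closest_to_20)   # [20, 18, 6, 6, 76]
```

Because the lambda is used only once, giving it a name with def would add little here.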

In [None]:
import math #hint for exercise 2.3
math.log(4,2)

### Exercise 2.2: Rewrite your code from Exercise 1.1 as a lambda function
- Return True if the given number is even 

In [None]:
#your solution here

In [None]:
#%load solutions/even_or_odd_lambda.py

### Exercise 2.3: Write a lambda function that takes two integers x and y:
- Return True if x is a power of y (like 9 and 3, 16 and 2, etc.)
- Return False if x is not a power of y (like 10 and 5, 6 and 7, etc.)

In [None]:
#your solution here

In [None]:
#%load solutions/is_power.py