# NOTEBOOK 3 Statistics with Python
---

Today, data analysis relies heavily on computers. Many statistical parameters that describe your data, such as the mean and the standard deviation, are cumbersome to compute by hand but quick and easy to evaluate with a computer. In this notebook we look at how to generate random numbers and how to perform some basic statistical analysis.

## Random number generation

As the name implies, a random number generator creates random numbers. Python has its own built-in module for this (`random`), but we will use the much more versatile NumPy random generators. To access these, you create a generator with the `default_rng` function. It is not important at this moment to know what a class or an instance is; this will be discussed later in the course. The `default_rng` function is part of the NumPy random module:

In [None]:
import numpy as np 

# create instance of the random number generator
rng = np.random.default_rng()

You can use `dir()` to see all methods of `rng`:

In [None]:
print(dir(rng))  # print is not required but gives a bit more compact output
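
The output of `dir()` also contains many internal names that start with an underscore. As a minimal sketch, the list can be filtered so that only the public names remain. The names `rng_demo` and `public_names` are chosen here for illustration and are not used elsewhere in this notebook.

```python
import numpy as np

# separate generator so the notebook's own `rng` is left untouched
rng_demo = np.random.default_rng()

# keep only the names that do not start with an underscore
public_names = [name for name in dir(rng_demo) if not name.startswith("_")]
print(public_names)
```

The resulting list includes `normal` and `integers`, both of which are used in this notebook.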

For example, the code below draws 10 numbers from a normal distribution with mean $\mu=100$ and standard deviation $\sigma=15$.

In [None]:
data = rng.normal(loc=100, scale=15, size=10)
print(data)

The variable `data` is a numpy array and contains 10 random numbers from the normal distribution.
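
Because the numbers are random, every run gives different values. If reproducible numbers are needed, for instance while debugging, a seed can be passed to `default_rng`. The sketch below assumes an arbitrary seed of 42 and uses the illustrative names `rng_seeded`, `sample`, `rng_again` and `sample_again`; it also checks the type and shape of the result.

```python
import numpy as np

# Assumption: the seed value 42 is arbitrary and only used for illustration
rng_seeded = np.random.default_rng(seed=42)
sample = rng_seeded.normal(loc=100, scale=15, size=10)

print(type(sample))   # the result is a numpy array
print(sample.shape)   # (10,)

# a second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
sample_again = rng_again.normal(loc=100, scale=15, size=10)
print(np.array_equal(sample, sample_again))  # True
```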

---
**Assignment**

Simulate throwing a standard six-sided die. Define a variable `throws` that contains 25 integers chosen randomly from the interval [1, 6].

HINT: Use the `integers` method of `rng`.  
HINT: Use help() or Contextual help to check how the method is used.

In [2]:
import numpy as np

# create the generator here so this cell also runs on its own
rng = np.random.default_rng()

# 25 die throws; the upper bound 7 is exclusive, so values lie in [1, 6]
throws = rng.integers(1, 7, size=25)
print(throws)

In [None]:
# =============== YOUR CODE GOES HERE =================


## Statistical functions

NumPy and SciPy have quite a few useful functions that help to describe your data using statistical parameters. The most important ones are summarized in the table below:

|function|description|
|---|---|
|`np.max(x)`|Returns the largest value in array `x`|
|`np.min(x)`|Returns the smallest value in array `x`|
|`np.mean(x)`|Returns the mean of array `x`|
|`np.std(x, ddof=1)`|Returns the ***sample*** standard deviation of array `x`|
|`np.std(x, ddof=0)`|Returns the ***population*** standard deviation of array `x` (this is the default)|
|`np.sum(x)`|Returns the sum of all values in array `x`|
|`len(x)` or `np.size(x)`|Returns the number of values in array `x`|
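
As a short illustration, the functions from the table can be applied to a small array whose statistics are easy to check by hand. The array `x` below is a made-up example, not data from this notebook. Note the difference between `ddof=0` (divide by $N$) and `ddof=1` (divide by $N-1$).

```python
import numpy as np

# made-up example values, chosen so the results are easy to verify by hand
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.max(x), np.min(x))   # 9.0 2.0
print(np.mean(x))             # 5.0
print(np.std(x, ddof=0))      # 2.0 (population standard deviation)
print(np.std(x, ddof=1))      # about 2.138 (sample standard deviation)
print(np.sum(x))              # 40.0
print(len(x), np.size(x))     # 8 8
```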

---
**Assignment**

For the two variables `data` and `throws` compute:
- the maximum and minimum value
- the mean
- the sample standard deviation

In [None]:
# =============== YOUR CODE GOES HERE =================



---
**Assignment**

We use the thin lens equation to compute the focal distance $f$ from 400 measured values of the object distance $v$ and the image distance $b$:

$$\frac{1}{f} = \frac{1}{v} + \frac{1}{b}$$
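
Solving the equation for $f$ gives $f = \dfrac{v\,b}{v + b}$. As a minimal sketch, both forms can be evaluated for a single pair of distances. The values below are simply the mean distances given in this assignment, and the names `v_single`, `b_single`, `f_single` and `f_alt` are illustrative only.

```python
# a single pair of distances in cm (taken as the mean values of the assignment)
v_single = 46.0   # object distance
b_single = 31.5   # image distance

# focal distance directly from 1/f = 1/v + 1/b
f_single = 1.0 / (1.0 / v_single + 1.0 / b_single)

# equivalent rearranged form f = v*b / (v + b)
f_alt = v_single * b_single / (v_single + b_single)

print(f_single, f_alt)   # both about 18.70 cm
```

The same expressions work unchanged when `v` and `b` are numpy arrays, because numpy applies the arithmetic element by element.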

First we define the sample of measured values by drawing them from a normal distribution. Write code that:
- creates a numpy array `b` with 400 values drawn from a normal distribution with mean 31.5 cm and standard deviation 1.2 cm
- creates a numpy array `v` with 400 values drawn from a normal distribution with mean 46.0 cm and standard deviation 0.8 cm
- computes the mean and standard deviation of the samples `b` and `v`. 

In [None]:
# =============== YOUR CODE GOES HERE =================



We compute the focal distance for each measured pair of object and image distance (so we get 400 focal distances). Write code that:
- computes $f$ for each pair and stores the result in the array `f` (so `f` has 400 values)
- computes the mean of `f`
- computes the sample standard deviation of `f`
- computes the standard error of the mean of $f$
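
The standard error of the mean is the sample standard deviation divided by the square root of the number of values, $s/\sqrt{N}$. The sketch below shows this for a small made-up array; the names `values`, `n`, `s` and `sem` are illustrative and not part of the assignment.

```python
import numpy as np

# made-up example values, unrelated to the lens measurements
values = np.array([10.0, 12.0, 14.0, 16.0])

n = len(values)               # number of values
s = np.std(values, ddof=1)    # sample standard deviation
sem = s / np.sqrt(n)          # standard error of the mean

print(s, sem)   # about 2.582 and 1.291
```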

In [None]:
# =============== YOUR CODE GOES HERE =================

