# ***Randomness and Reproducibility***

As we learned at the beginning of this week, randomness is a cornerstone of statistical inference when drawing samples from larger populations.

In this tutorial, we are going to cover the following:

* Randomness and its uses in Python.

* Using seeds in Python to reproduce an analysis.

* Generating random variables from a probability distribution.

* Random sampling from a population.


$ \ $

----

# ***What is Randomness?***

At the beginning of this week's lectures, we touched on the significance of randomness when performing statistical inference on population samples. If units are selected completely at random, our estimates of means, proportions, and totals are unbiased. This means our estimates are equal to the population values on average.

* In Python, we refer to randomness as the ability to generate data, strings, or, more generally, numbers at random.

* However, when conducting analysis it is important to consider reproducibility.  If we are creating random data, how can we enable reproducible analysis?

* We do this by using **pseudo-random number generators (PRNGs)**. A PRNG starts from an initial value, known as the seed, and then uses a deterministic algorithm to generate a pseudo-random sequence from it. This means that we can replicate the output of a random number generator in Python simply by knowing which seed was used.

We can showcase this by using the functions in the Python library *__random__*.


In [5]:
import random
import numpy as np

In [None]:
help(random.seed)

Help on method seed in module random:

seed(a=None, version=2) method of random.Random instance
    Initialize internal state from hashable object.
    
    None or no argument seeds from current time or from an operating
    system specific randomness source if available.
    
    If *a* is an int, all bits are used.
    
    For version 2 (the default), all of the bits are used if *a* is a str,
    bytes, or bytearray.  For version 1 (provided for reproducing random
    sequences from older versions of Python), the algorithm for str and
    bytes generates a narrower range of seeds.



In [None]:
help(random.random) 

Help on built-in function random:

random(...) method of random.Random instance
    random() -> x in the interval [0, 1).



In [None]:
random.seed(1234)
random.random()

0.9664535356921388

In [None]:
random.seed(4927)
random.random()

0.03277783697251979

In [None]:
np.random.seed(4927)
np.random.random()

0.10007637631516819

In [None]:
np.random.seed(1234)
np.random.random()

0.1915194503788923
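
The cells above give different numbers for the same seed because `random` and `numpy.random` keep separate internal states. The following minimal sketch (variable names are illustrative, not from the lecture) suggests that seeding one generator does not reset the other:

```python
import random
import numpy as np

# Seed both generators with the same integer and draw once from each.
random.seed(1234)
np.random.seed(1234)
first_py = random.random()
first_np = np.random.random()

# Re-seed only the standard-library generator, then draw from both again.
random.seed(1234)
second_py = random.random()      # replays the first standard-library draw
second_np = np.random.random()   # NumPy's stream simply continues

print(first_py == second_py)   # True
print(first_np == second_np)   # False
```

In practice this means an analysis that uses both libraries needs a seed for each of them.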

In [None]:
# If we know the seed, then in a sense there is no randomness.
np.random.seed(121)
np.random.choice(1500,18) 

array([1346,  469, 1288,   65,  339,  608,   46,   34,  766,  948,   60,
         54,  990,   52, 1081, 1099, 1007,  486])

In [None]:
# Here we see that new calls return different values, yet there is a certain regularity in this randomness.
print(np.random.choice(1500,18))
print("-----------------------------------------------------------------------------------------------")
print(np.random.choice(1500,18))

[ 295  789  297  823  979  888 1331  756   10 1411 1348  503 1111  990
  542  494 1444  260]
-----------------------------------------------------------------------------------------------
[ 937  636   88  204  528 1127  903  575  385  691 1314  381    1  228
 1479   13 1038  512]
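
To make the reproducibility claim concrete, the sketch below re-seeds the generator and checks that the same draws come back. It also shows NumPy's `default_rng`, which returns a generator object with its own state; using it here is an assumption about style, not something the lecture prescribes, and its numbers differ from those produced by `np.random.seed`.

```python
import numpy as np

# Seeding twice with the same value replays the same pseudo-random sequence.
np.random.seed(121)
first_draw = np.random.choice(1500, 18)

np.random.seed(121)
second_draw = np.random.choice(1500, 18)

print(np.array_equal(first_draw, second_draw))  # True

# Generator objects carry their own state, so two generators built from
# the same seed produce the same draws without touching any global state.
rng_a = np.random.default_rng(121)
rng_b = np.random.default_rng(121)
draw_a = rng_a.choice(1500, 18)
draw_b = rng_b.choice(1500, 18)

print(np.array_equal(draw_a, draw_b))  # True
```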


$ \ $

-----

# ***Random Numbers from Real-Valued Distributions***
## ***`Uniform`***

In [None]:
help(random.uniform)

Help on method uniform in module random:

uniform(a, b) method of random.Random instance
    Get a random number in the range [a, b) or [a, b] depending on rounding.



In [None]:
random.uniform(25,50)

38.140304771180496

In [None]:
unif_Numbers = [random.uniform(0,1) for x in range(1000)]
unif_Numbers[0:10]

[0.17923224274358018,
 0.11485465854937504,
 0.524316743391649,
 0.3613157761210012,
 0.09928392168667677,
 0.33436188027971414,
 0.6003584124573883,
 0.19524005357522511,
 0.6373200399623756,
 0.2031530729617287]
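
As the help text hints, `random.uniform(a, b)` rescales a draw from `random.random()` with the expression `a + (b - a) * random()`. The sketch below checks this for one arbitrary seed and then draws 1000 uniform values in a single NumPy call instead of a list comprehension. The seed and the variable names are assumptions made for illustration.

```python
import random
import numpy as np

a, b = 25, 50

# Draw from [0, 1) and rescale by hand.
random.seed(2024)  # arbitrary seed, used only in this example
u = random.random()
scaled = a + (b - a) * u

# Re-seed and let random.uniform() do the rescaling.
random.seed(2024)
replayed = random.uniform(a, b)
print(scaled == replayed)  # True

# Vectorized alternative: 1000 uniform draws on [0, 1) in one call.
rng = np.random.default_rng(2024)
unif_array = rng.uniform(0, 1, size=1000)
print(unif_array.mean())  # should be close to 0.5
```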

$ \ $

------

## ***`Normal`***

In this part, we look at how to use the normal random variable in Python.

In [None]:
mu = 0

sigma = 1

random.normalvariate(mu, sigma)

-1.2793468408805477

In [None]:
mu = 0

sigma = 1

np.random.normal(mu, sigma)

-0.27655115201838254

In [None]:
mu = 5

sigma = 2

random.normalvariate(mu, sigma)

6.034439249303812

In [None]:
mu = 5

sigma = 2

np.random.normal(mu, sigma)

6.05594018735044

In [None]:
mu = 0

sigma = 1

lista_1=[random.normalvariate(mu, sigma) for x in range(10000)]
lista_1[0:10]

[0.478367500187624,
 -1.9533710216263718,
 -0.8451339272722066,
 -0.9157328657396834,
 1.094441198428395,
 -0.4657100067259372,
 -0.2517687187176313,
 0.7824321164029229,
 -1.5312511327902152,
 0.11602770495427425]

In [None]:
mu = 0

sigma = 1

lista_1=[np.random.normal(mu, sigma) for x in range(10000)]
lista_1[0:10]

[-2.3577179647885136,
 -0.3068966781052552,
 -0.251079429944957,
 -0.6067961240318827,
 0.4154987762091236,
 -1.3059273735949106,
 -0.3131878989802593,
 0.3551051509422392,
 0.46204450272376185,
 -0.6402871209865286]
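
A normal variable with mean `mu` and standard deviation `sigma` can be written as `mu + sigma * Z`, where `Z` is standard normal. The sketch below uses that relationship and also draws all 10000 values in one NumPy call rather than in a Python loop. The seed and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, used only in this example

mu, sigma = 5, 2

# Shift and scale standard normal draws.
z = rng.standard_normal(10_000)
x = mu + sigma * z
print(x.mean(), x.std())  # should be close to 5 and 2

# Equivalent in distribution: pass mu and sigma directly, with a size argument.
draws = rng.normal(mu, sigma, size=10_000)
print(draws.mean(), draws.std())  # should also be close to 5 and 2
```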

$ \ $

-------


# ***Random Sampling from a Population***

From lecture, we know that **Simple Random Sampling (SRS)** has the following properties:

* Start with a known list of *N* population units, and randomly select *n* units from the list
* Every unit has **equal probability of selection = _n/N_**
* All possible samples of size *n* are equally likely
* Estimates of means, proportions, and totals based on SRS are **UNBIASED** (meaning they are equal to the population values on average)
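
The selection probability *n/N* can be checked empirically. The following sketch, with a small hypothetical population of `N = 20` labelled units and `n = 5`, counts how often one particular unit ends up in the sample. The numbers and names are chosen only for illustration.

```python
import math
import random

random.seed(11)  # arbitrary seed, used only in this example

N, n = 20, 5
population = list(range(N))   # units labelled 0, 1, ..., N-1
trials = 20_000

# How often is unit 0 selected across many simple random samples?
hits = sum(0 in random.sample(population, n) for _ in range(trials))
inclusion_rate = hits / trials
print(inclusion_rate, n / N)  # the two values should be close

# Number of distinct samples of size n, each equally likely under SRS.
n_samples = math.comb(N, n)
print(n_samples)  # 15504
```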

In [None]:
import random
import numpy as np

In [None]:
mu = 0    
sigma = 1

Population = [random.normalvariate(mu, sigma) for x in range(10000)]
Population[0:5]

[-1.474792843060049,
 -0.671047788196512,
 0.9541047272293272,
 -0.8703878013921571,
 -0.11623433754975275]

In [None]:
Sample_A = random.sample(Population, 500)
Sample_B = random.sample(Population, 500)

In [None]:
np.mean(Sample_A) 

-0.005047073275017123

In [None]:
np.mean(Sample_B)

-0.08429097214151754

In [None]:
np.std(Sample_A)

1.0129150373601776

In [None]:
np.std(Sample_B)

1.0158254176187758

In [None]:
means = [np.mean(random.sample(Population, 1000)) for x in range(100)]

np.mean(means)

-0.002442996527135533

In [None]:
standar_dev = [np.std(random.sample(Population, 1000)) for x in range(100)]

np.std(standar_dev)

0.017524384716284484
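
To connect these numbers to the unbiasedness property, the sketch below compares the average of many sample means with the population mean. It also compares their spread with the textbook standard error for SRS without replacement, `sqrt(1 - n/N) * S / sqrt(n)`. The seed, the number of repetitions, and the variable names are assumptions made for illustration.

```python
import random
import numpy as np

random.seed(5)  # arbitrary seed, used only in this example

N, n = 10_000, 1_000
population = [random.normalvariate(0, 1) for _ in range(N)]

# Means of 500 simple random samples drawn without replacement.
sample_means = [np.mean(random.sample(population, n)) for _ in range(500)]

pop_mean = np.mean(population)
S = np.std(population, ddof=1)  # population standard deviation, N-1 divisor
theoretical_se = np.sqrt(1 - n / N) * S / np.sqrt(n)
empirical_se = np.std(sample_means, ddof=1)

print(np.mean(sample_means), pop_mean)  # should be close to each other
print(empirical_se, theoretical_se)     # should be close to each other
```

The factor `1 - n/N` is the finite population correction: sampling 10% of the population without replacement makes the sample mean slightly less variable than `S / sqrt(n)` alone would suggest.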

$ \ $

------

"RB Leipzig":1

"Manchester City":2

"Brujas":3

"Benfica":4

"Liverpool":5

"Real Madrid":6

"AC Milan":7

"Tottenham":8

"Eintracht Frankfurt":9

"Nápoles":10

"Borussia Dortmund":11

"Chelsea":12

"Inter de Milán":13

"Porto":14

"PSG":15

"Bayern Munich":16



In [65]:
equipos={"RB Leipzig","Manchester City","Brujas", "Benfica","Liverpool","Real Madrid","AC Milan","Tottenham","Eintracht Frankfurt","Nápoles","Borussia Dortmund","Chelsea","Inter de Milán","Porto","PSG","Bayern Munich"}
equipos


{'AC Milan',
 'Bayern Munich',
 'Benfica',
 'Borussia Dortmund',
 'Brujas',
 'Chelsea',
 'Eintracht Frankfurt',
 'Inter de Milán',
 'Liverpool',
 'Manchester City',
 'Nápoles',
 'PSG',
 'Porto',
 'RB Leipzig',
 'Real Madrid',
 'Tottenham'}

In [66]:
# seed chosen by Taty
random.seed(29)

# teams chosen at random for Taty
taty=random.sample(equipos,8)

In [67]:
# Julian's teams: the ones not drawn for Taty
julian=[x for x in equipos if x  not in taty]

In [68]:
# final results
print("Julian's results =", julian)
print("\n \n \n")
print("Taty's results =", taty)

Julian's results = ['Chelsea', 'Borussia Dortmund', 'Eintracht Frankfurt', 'Porto', 'Tottenham', 'Manchester City', 'Bayern Munich', 'Benfica']

 
 

Taty's results = ['AC Milan', 'Nápoles', 'Real Madrid', 'Inter de Milán', 'Brujas', 'PSG', 'RB Leipzig', 'Liverpool']
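
Two caveats apply to the draw above. From Python 3.11 on, `random.sample()` raises a `TypeError` when given a set, and the iteration order of a set of strings can change between interpreter runs, so the seed alone may not reproduce the split. A minimal sketch of one workaround is to sort the teams into a list first. The names `teams`, `taty_teams`, and `julian_teams` are hypothetical, and the resulting split will generally differ from the one printed above.

```python
import random

# Sorting gives a sequence with a fixed order, which random.sample() accepts.
teams = sorted({
    "RB Leipzig", "Manchester City", "Brujas", "Benfica",
    "Liverpool", "Real Madrid", "AC Milan", "Tottenham",
    "Eintracht Frankfurt", "Nápoles", "Borussia Dortmund", "Chelsea",
    "Inter de Milán", "Porto", "PSG", "Bayern Munich",
})

random.seed(29)  # same seed as in the cell above
taty_teams = random.sample(teams, 8)
julian_teams = [team for team in teams if team not in taty_teams]

# Re-seeding replays exactly the same draw.
random.seed(29)
replayed_draw = random.sample(teams, 8)
print(taty_teams == replayed_draw)  # True

print("Taty:", taty_teams)
print("Julian:", julian_teams)
```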
