# Assignment - Programming for Data Analysis - Due: last commit on or before November 22nd

The following assignment concerns the numpy.random package in Python. You are
required to create a Jupyter notebook explaining the use of the package, including
detailed explanations of at least five of the distributions provided for in the package.
There are four distinct tasks to be carried out in your Jupyter notebook.

1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions.
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.

## 1. Explain the overall purpose of the package.

##### What is NumPy?
NumPy stands for Numerical Python.

*NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant.* https://www.w3schools.com/python/numpy_intro.asp

> - In Python we have lists that serve the purpose of arrays, but they are slow to process.
> - NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
> - Arrays are very frequently used in data science, where speed and resources are very important.
> - Why is NumPy faster than lists? NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can access and manipulate them very efficiently. This behaviour is called locality of reference in computer science, and it is the main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures.
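
The points above can be illustrated with a minimal sketch. It does not measure speed (the "up to 50x" figure depends on the operation and the hardware); it only shows that an array is held in one contiguous block and that an element-wise operation needs no explicit Python loop. The variable names and the array size are arbitrary choices for illustration.

```python
import numpy as np

values = list(range(100_000))        # plain Python list
array = np.arange(100_000)           # NumPy array holding the same values

# Element-wise work on a list needs an explicit Python-level loop.
doubled_list = [v * 2 for v in values]

# The same operation on an array is a single vectorised expression.
doubled_array = array * 2

print(array.flags["C_CONTIGUOUS"])            # True: one contiguous block of memory
print(doubled_array.tolist() == doubled_list) # True: both approaches agree
```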

##### What is the numpy.random package?

The numpy.random package has many different functions for creating arrays of random numbers.
They include:

rand<br>
randn<br>
randint<br>
random_integers<br>
random_sample<br>
random<br>
ranf<br>
sample<br>
choice<br>
bytes<br>

More details on how each of the NumPy functions above generates arrays of random numbers can be found at the link below:
https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.random.html

The overall purpose of the numpy.random package is to serve the needs of data scientists and programmers who require different randomisation models.

Randomisation is a specialist field of computer science, and there is no one-size-fits-all way of doing it. Over time, the contributors and developers of numpy.random have therefore added more and more randomisation functions, so that data analysts and developers can simulate many different random datasets for many different scenarios.

NumPy serves as an add-on to Python: *The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on.* https://en.wikipedia.org/wiki/NumPy

## 2. Explain the use of the “Simple random data” and “Permutations” functions.

### Simple random data

#### integers
#### random
#### choice
#### bytes

| Function | Description |
| --- | --- |
| `rand(d0, d1, …, dn)` | Random values in a given shape. |
| `randn(d0, d1, …, dn)` | Return a sample (or samples) from the “standard normal” distribution. |
| `randint(low[, high, size, dtype])` | Return random integers from low (inclusive) to high (exclusive). |
| `random_integers(low[, high, size])` | Random integers of type np.int between low and high, inclusive. |
| `random_sample([size])` | Return random floats in the half-open interval [0.0, 1.0). |
| `random([size])` | Return random floats in the half-open interval [0.0, 1.0). |
| `ranf([size])` | Return random floats in the half-open interval [0.0, 1.0). |
| `sample([size])` | Return random floats in the half-open interval [0.0, 1.0). |
| `choice(a[, size, replace, p])` | Generates a random sample from a given 1-D array. |
| `bytes(length)` | Return random bytes. |
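
A short sketch of several of these functions in use may help. The seed value, shapes, ranges and the colour list are arbitrary choices made for illustration; `random_integers` is left out because it is deprecated in later NumPy releases, and `random`, `ranf` and `sample` are left out because they behave like `random_sample`.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only so that reruns give the same values

uniform = np.random.rand(2, 3)               # 2x3 array of floats in [0.0, 1.0)
normal = np.random.randn(4)                  # 4 samples from the standard normal
integers = np.random.randint(1, 7, size=10)  # 10 integers from 1 to 6 (7 is excluded)
floats = np.random.random_sample(5)          # 5 floats in [0.0, 1.0)
picks = np.random.choice(["red", "green", "blue"], size=4)  # drawn with replacement
raw = np.random.bytes(8)                     # 8 random bytes

print(uniform.shape, normal.shape, integers.min() >= 1, integers.max() <= 6)
print(picks, len(raw))
```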

### Permutations

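The Permutations group contains two functions, `shuffle(x)` and `permutation(x)`. `shuffle` modifies a sequence in place by shuffling its contents. `permutation` randomly permutes a sequence, or returns a permuted range, and leaves the original unchanged.
If x is a multi-dimensional array, it is only shuffled along its first index.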
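
The behaviour described above can be sketched with `np.random.permutation` and `np.random.shuffle`. The array contents and the seed are arbitrary illustration values.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed so the example is repeatable

deck = np.arange(10)
shuffled_copy = np.random.permutation(deck)  # returns a new array; deck is unchanged
permuted_range = np.random.permutation(5)    # an integer n means "permute np.arange(n)"

np.random.shuffle(deck)                      # shuffles deck in place and returns None

grid = np.arange(12).reshape(4, 3)           # 4 rows, 3 columns
np.random.shuffle(grid)                      # only the order of the rows changes

print(shuffled_copy, permuted_range, deck)
print(grid)                                  # each row still reads n, n+1, n+2
```

Each row of `grid` keeps its original contents after the shuffle, which is what "only shuffled along its first index" means in practice.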

## 3. Explain the use and purpose of at least five “Distributions” functions.

## 4. Explain the use of seeds in generating pseudorandom numbers.

What is the advantage of seed in randomization?
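A seed fixes the starting point of the pseudorandom sequence. The same seed always reproduces the same sequence of results, while a different seed gives a different sequence. The seed does not change the distribution being sampled, only which particular values are drawn from it.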


If you need to generate a randomization list for a clinical trial, do some simulations or perhaps perform a huge bootstrap analysis, you need a way to draw random numbers. Putting many pieces of paper in a hat and drawing them is possible in theory, but you will probably be using a computer for doing this. The computer, however, does not generate random numbers. It generates pseudo random numbers. They look and feel almost like real random numbers, but they are not random. Each number in the sequence is calculated from its predecessor, so the sequence has to begin somewhere; **it begins in the seed, the first number in the sequence.**

https://www.r-bloggers.com/2019/03/how-to-select-a-seed-for-simulation-or-randomization/
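
In numpy.random this can be seen with `np.random.seed`. The sketch below assumes the seed values 42 and 43, which are arbitrary examples: reseeding with the same value replays the same numbers, while a different seed starts a different sequence.

```python
import numpy as np

np.random.seed(42)            # 42 is an arbitrary example seed
first_run = np.random.rand(3)

np.random.seed(42)            # reset the generator to the same starting point
second_run = np.random.rand(3)

np.random.seed(43)            # a different seed starts a different sequence
third_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))  # True: same seed, same numbers
print(np.array_equal(first_run, third_run))   # False: different seed
```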


#### Seeds and randomization

##### Online Casinos
'When you play a card game like blackjack at a live casino, every hand is random. That’s because the dealer physically shuffles the deck to mix up the cards. If you’re the shooter playing craps, where the dice ultimately land depends on the laws of motion. Yet what happens in most online casino games are statistical approximations of what might take place in reality, which is where random number generators come into play.'

The whole idea of random number generation becomes complicated if you understand how computers operate. As most people know, computers are programmed by humans. In other words, they merely follow the instructions we give them. Even though that might not appear to be problematic, it does create a major technical dilemma. If computers rely on humans for input, the output can’t really be random. Any mathematical formula you could possibly create would be predictable and could theoretically be decoded by hackers. Naturally, programmers have come up with a foolproof solution, which involves using a seed number. ....

While there is a pattern, it’s impossible to decode if you don’t know the initial seed number, which is heavily guarded.
https://www.casino.co.uk/guides/random-number-generators/

##### Computer Security and Encryption
A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator. ... The choice of a good random seed is crucial in the field of computer security. When a secret encryption key is pseudorandomly generated, having the seed will allow one to obtain the key.

If the same random seed is deliberately shared, it becomes a secret key, so two or more systems using matching pseudorandom number algorithms and matching seeds can generate matching sequences of non-repeating numbers which can be used to synchronize remote systems, such as GPS satellites and receivers.
https://en.wikipedia.org/wiki/Random_seed
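
The idea of two systems producing matching sequences from a shared seed can be sketched with two independent `np.random.RandomState` objects. This is an illustration only: the names `sender` and `receiver` and the seed value are hypothetical, and NumPy's generator is not cryptographically secure, so it should not be used to create real keys.

```python
import numpy as np

shared_seed = 2020  # hypothetical value agreed on by both systems in advance

# Each "system" has its own generator, initialised with the shared seed.
sender = np.random.RandomState(shared_seed)
receiver = np.random.RandomState(shared_seed)

sender_codes = sender.randint(0, 256, size=8)
receiver_codes = receiver.randint(0, 256, size=8)

print(np.array_equal(sender_codes, receiver_codes))  # True: the sequences match
```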

##### Seeds in Clinical Trials
Knowing the seed is a good idea. It enables reproducing the analysis, the simulation or the randomization list. If you run a clinical trial, reproducibility is crucial. You must know at the end of the trial which patient was randomized to each treatment; otherwise you will throw all your data to the garbage. During the years I worked at Teva Pharmaceuticals, we took every possible safety measure: We burnt the randomization lists, the randomization SAS code and the randomization seed on a CD and kept it in a fire-proof safe. We also kept all this information in analog media. Yes, we printed the lists, the SAS code and the seed on paper, and these were also kept in the safe.

Using the same seed every time is not a good idea. If you use the same seed every time, you get the same sequence of pseudo-random numbers every time, and therefore your numbers are not pseudo-random anymore. Selecting a different seed every time is good practice.

How do you select the seed? Taking a number out of your head is still not a good idea. Our heads are biased. Like passwords, people tend to use some seeds more often than other possible seeds. Don’t be surprised if you see codes with seeds like 123, 999 or 31415.

The best practice is to choose a random seed, but this creates a magic circle. You can still turn to your old hat and some pieces of paper, but there is a simple alternative: generate the seed from the computer clock.
https://www.r-bloggers.com/2019/03/how-to-select-a-seed-for-simulation-or-randomization/
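
A minimal sketch of the clock-based approach in Python, assuming the standard `time` module as the clock source. The treatment labels and the sample size are made up for illustration. The important step is recording the seed, so that the randomization list can be reproduced later.

```python
import time
import numpy as np

# np.random.seed accepts integers from 0 to 2**32 - 1, so reduce the clock value.
seed = int(time.time()) % (2**32)
print("seed used:", seed)  # record this value alongside the results

np.random.seed(seed)
allocation = np.random.choice(["treatment", "placebo"], size=6)

# Reseeding with the recorded value reproduces the same allocation list.
np.random.seed(seed)
replayed = np.random.choice(["treatment", "placebo"], size=6)

print(allocation)
print(np.array_equal(allocation, replayed))  # True
```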
