## Programming for Data Analysis - Assignment

This notebook is for the assignment for ProgDA - Author: Sheldon D'Souza (email: g00387857@gmit.ie)

#### Problem statement

The assignment concerns the numpy.random package in Python. You are required to create a Jupyter notebook explaining the use of the package, including detailed explanations of at least five of the distributions provided for in the package. There are four distinct tasks to be carried out in your Jupyter notebook.

1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions.
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.


Project Plan:
- Research the numpy.random package via official documentation and online tutorial videos.
- Research the Simple random data and Permutations functions via official documentation and online tutorial videos.
- Choose five Distributions functions. Choose a dataset or generate data to show the differences between the distributions.
- Research seed generation and give a coded example.
- Research creative methods to "explain" the above narratively and/or visually.


<img src=https://live.staticflickr.com/748/20445410340_c1a0fe6a6a_b.jpg alt="Randomness" width="500"/>


#### Part 1 - The purpose of the Numpy.random package

##### The problem with generating randomness

*What is a random number/sequence?*

A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll or the digits of π exhibit statistical randomness.[1]

*Why is it so difficult to generate a random number/sequence?*

Computers need instructions or programs to carry out any task, including generating random numbers. This means that an algorithm is needed to generate a random number. Because there is an algorithm behind the generation, the output is not completely random, and these numbers are therefore called pseudo random numbers ("PRN").[2]

##### What does numpy.random do

As per official documentation:

> Numpy’s random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions [3]

In other words, any pseudo random number generator such as numpy.random uses an algorithm to produce a stream of numbers that exhibits statistical randomness. A 'seed' is often used as a 'starting point' which is 'fed' into a software algorithm to produce a sequence of seemingly random numbers. The seed can either be specified or be selected automatically, e.g. from the current date/time of the system running the generator.[4]
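
To make the BitGenerator/Generator split from the quote more concrete, here is a minimal sketch using numpy itself. The seed value `12345` and the variable names are arbitrary choices for illustration and are not part of the assignment. `PCG64` is used because, as far as the official documentation states, it is the BitGenerator behind `default_rng` at the time of writing.

```python
import numpy as np
from numpy.random import Generator, PCG64

# Build two Generators explicitly from a BitGenerator (PCG64) and the same seed.
# The seed 12345 is an arbitrary illustrative value.
rng_a = Generator(PCG64(12345))
rng_b = Generator(PCG64(12345))

sample_a = rng_a.random(5)  # five floats in [0.0, 1.0)
sample_b = rng_b.random(5)

print(sample_a)
print(np.array_equal(sample_a, sample_b))  # True: same seed, same stream

# default_rng() is the convenience constructor. With no seed given,
# numpy pulls fresh entropy from the operating system.
rng_c = np.random.default_rng()
sample_c = rng_c.random(5)
print(sample_c)  # differs from run to run
```

The two seeded generators produce identical values, while the unseeded one should give a different result on every run.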


A very simple way of looking at this would be in the example below:

In [119]:
def random_gen(how_many_nos=10, seed=400):  # demo limitation: at most 20 items, seed value between 400 and 800
    l = []
    for n in range(how_many_nos):
        seed = (seed/7) + (n * 5) + ((2 * 3/50) % 4) + (22/7)  # an example of a software algorithm number generator
        y = str(seed)  # convert number to string for slicing
        z = y[4:8]  # use slicing to further 'randomise' the numbers
        l.append(int(z))  # convert back to integer

    print(l)  # used instead of return so that all values can be seen below

In [120]:
random_gen(5,560)
random_gen(4,442)
random_gen(8,650)
random_gen(5,560)

[6285, 5755, 4250, 6892, 127]
[571, 4938, 9848, 1978]
[2, 9428, 489, 641, 663, 666, 3523, 6789]
[6285, 5755, 4250, 6892, 127]


The above function is an example of an algorithm-based generator which, based on the seed, generates a list of seemingly random numbers. The key is that, for a sufficiently long sequence of numbers generated from a 'seed' number, the result has the appearance of being randomly generated.

In the function above, the user supplies the number of random items to be generated along with a seed number (to reproduce results). Both inputs are then included in an equation to generate a random number. In the above example I have converted the resultant number to a text string and then taken a slice of that string, to further randomise the number.

As can be seen above, if the same seed is used, the same sequence of random numbers will be generated.

The above is a very crude way of illustrating how a generator like numpy generates pseudo random numbers.
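
For comparison, a more conventional textbook generator is the linear congruential generator (LCG), sketched below. This is a different technique from the string-slicing function above and is not what numpy uses by default. The function name `lcg` and its parameters are hypothetical, and the constants are a commonly quoted set (attributed to Numerical Recipes), assumed here purely for illustration.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` pseudo random integers in [0, m) from a simple LCG."""
    state = seed
    values = []
    for _ in range(count):
        # Each new state depends only on the previous one,
        # so the whole sequence is fixed by the seed.
        state = (a * state + c) % m
        values.append(state)
    return values

first = lcg(seed=560, count=5)
second = lcg(seed=560, count=5)
other = lcg(seed=442, count=5)

print(first)
print(first == second)  # True: same seed reproduces the sequence
print(first == other)   # False: a different seed gives a different sequence
```

As with the crude example, the output only looks random: anyone who knows the seed and the constants can reproduce it exactly.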

References
[1] - https://en.wikipedia.org/wiki/Statistical_randomness

[2] - https://www.w3schools.com/python/numpy_random.asp

[3] - https://numpy.org/doc/stable/reference/random/index.html?highlight=random#what-s-new-or-different

[4] - https://realpython.com/python-random/


Image Credits:
[5] "Binary code" by Christiaan Colen is licensed under CC BY-SA 2.0

#### Part 2 - Use of 'Simple Random Data' and 'Permutations' functions

###### 'Simple Random Data' functions


A 'Simple Random Data' function within numpy is, as the name suggests, a function that generates a 'simple' random value or array of values. Here 'simple' means that each number within the range has an equal chance of being selected at every draw. As a real-world example, if we roll a die, each number from 1 to 6 has an equal chance of being selected on each roll.

This is as opposed to data generated by a 'Distribution', which may be weighted towards certain values or ranges of values that are expected to occur more frequently than others. For example, if we generate random values for the heights of a population, we would expect more of the generated values to fall close to a 'mean' height.
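
The contrast between the die and the heights can be sketched in a few lines. The seed, the sample size, and the height parameters (mean 170 cm, standard deviation 10 cm) are assumptions made up for this illustration, not real population data.

```python
import numpy as np

rng = np.random.default_rng(2020)  # arbitrary seed so the run is repeatable

# 'Simple' random data: 60,000 die rolls, each face equally likely.
rolls = rng.integers(1, 7, size=60_000)  # high=7 is exclusive
faces, counts = np.unique(rolls, return_counts=True)
print(dict(zip(faces.tolist(), counts.tolist())))  # each count is close to 10,000

# Distribution: heights clustered around an assumed mean of 170 cm.
heights = rng.normal(loc=170, scale=10, size=60_000)
within_one_sd = np.mean(np.abs(heights - 170) <= 10)
print(round(float(within_one_sd), 3))  # roughly two thirds of values lie within 10 cm of the mean
```

The die counts come out roughly level across all six faces, whereas the heights pile up around the mean and thin out further away from it.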


There are a number of functions built into numpy to generate 'simple' random data:

| Function | Description |
| -------- | ----------- |
| integers | Return random integers from low (inclusive) to high (exclusive) |
| random | Return random floats in the half-open interval [0.0, 1.0) |
| choice | Generates a random sample from a given 1-D array |
| bytes | Return random bytes |
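
A short sketch of the four functions in the table, called on a `Generator` created with `default_rng`. The seed, sizes, and the colour list are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for repeatability

ints = rng.integers(low=1, high=7, size=10)            # die faces 1..6 (high is exclusive)
floats = rng.random(size=3)                            # floats in [0.0, 1.0)
picks = rng.choice(["red", "green", "blue"], size=4)   # sampled with replacement by default
raw = rng.bytes(8)                                     # 8 random bytes

print(ints)
print(floats)
print(picks)
print(raw)
```

Each call draws from the same underlying stream, so re-running the cell with the same seed should reproduce all four results.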
