# Programming for Data Analysis

# Assignment 2019 



## What are random numbers?

![random numbers](new.png)



Random numbers have been used for thousands of years. Whether we flip a coin or roll a die, the goal is to leave the result to chance. The same is true of random number generators in a computer: the aim is an unpredictable result that cannot be guessed. Computers generate random numbers for everything from cryptography to gambling.

There are two categories of random numbers: “true” random numbers and pseudorandom numbers. The difference between them is important for the security of encryption systems. To generate a “true” random number, the computer measures some physical phenomenon that takes place outside the computer. For example, it could rely on atmospheric noise, or simply use the exact times at which you press keys on your keyboard as a source of unpredictable data (www.howtogeek.com). This type of random number is mostly used in programs in which data security is the main priority.

This work focuses on pseudorandom numbers: samples of numbers that look close to true random numbers but are generated by a deterministic process (machinelearningmastery.com). A pseudorandom number generator (PRNG) is an algorithm that uses mathematical formulas to produce sequences of numbers that appear random (www.geeksforgeeks.org). One way to generate such numbers is with the numpy.random package, which provides various routines that use a particular algorithm to generate pseudorandom numbers. The next few paragraphs look at the NumPy library and its features in more detail.
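
The deterministic nature of a pseudorandom generator can be illustrated by seeding it. The following is a minimal sketch, assuming NumPy is installed; the seed value 42 is an arbitrary choice made for this illustration.

```python
import numpy as np

# Seed the global generator with an arbitrary value (42 is an assumption).
np.random.seed(42)
first = np.random.rand(3)

# Re-seeding with the same value restarts the same deterministic sequence.
np.random.seed(42)
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```

Because the sequence is fully determined by the seed, the two arrays are identical, which is what distinguishes pseudorandom numbers from “true” random numbers.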


## NumPy package
  
NumPy (Numerical Python) is a library for the Python programming language (en.wikipedia.org). It is the successor to two earlier scientific libraries, Numeric and Numarray (https://docs.scipy.org/doc/numpy-1.14.0/index.html). Under the name NumPy, it was first released in 2006. Its primary creator is the American data scientist and businessman Travis Oliphant. NumPy is a distributed, volunteer, open-source project, and everyone is welcome to contribute.

Its features include a powerful N-dimensional array object, sophisticated functions, and tools for integrating C/C++ and Fortran code, among others. Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and quickly with a wide variety of databases (https://docs.scipy.org/doc/numpy-1.14.0/about.html).

At the core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

- NumPy arrays have a fixed size at creation, while Python lists can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original. Compared with Python lists, NumPy arrays are more compact, faster to read from and write to, and more convenient and efficient to work with (https://towardsdatascience.com).

- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. 

- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

- To use most of today’s scientific/mathematical Python-based software efficiently, knowing how to use Python’s built-in sequence types is not enough; it is also crucial to know how to use NumPy arrays (https://numpy.org).
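
The differences listed above can be sketched with a short comparison. This is only an illustration; the variable names are chosen for this example and do not come from the NumPy documentation.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# A list can grow in place, and "+" concatenates two lists.
py_list.append(4)
doubled_list = py_list + py_list      # [1, 2, 3, 4, 1, 2, 3, 4]

# An array has a fixed size: np.append returns a new, longer array
# and leaves the original unchanged.
longer = np.append(np_array, 4)

# Arithmetic on an array is applied to every element.
doubled_array = np_array * 2          # array([2, 4, 6])

# Mixed input is converted to one common data type.
mixed = np.array([1, 2.5])            # both elements are stored as float64
```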

 

The most important attributes of an ndarray object are:

##### 1. ndarray.ndim

This attribute gives the number of axes (dimensions) of the array. To use it, the numpy package must be imported first. The example below creates a one-dimensional array holding the numbers from 4 to 7.

In [32]:
#Import numpy as np
import numpy as np
#numpy.array creates an array from a list of numbers
a = np.array([4, 5, 6, 7])

In [33]:
#display the value of a
a

array([4, 5, 6, 7])

In [34]:
#number of axes of the array    
a.ndim

1

##### 2. ndarray.shape

This attribute gives the dimensions of the array as a tuple holding the size along each axis. Here the output is (4,), which means the array is one-dimensional and has 4 elements.

In [35]:
#dimensions of the array
a.shape

(4,)

##### 3. ndarray.size

This attribute gives the total number of elements in the array. It is equal to the product of the elements of ndarray.shape.

In [36]:
#total number of elements in the array
a.size

4

##### 4. ndarray.dtype

This attribute describes the type of the elements in the array, for example whether they are integers or floating-point numbers. NumPy provides types of its own (int32, int64, float64...). In this example the element type is int32. The default integer type depends on the platform, so the same code may report int64 on other systems.

In [37]:
#type of elements in the array
a.dtype

dtype('int32')

##### 5. ndarray.itemsize

This attribute gives the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8). Here the element type is int32, so the itemsize is 4 bytes (32/8=4).

In [38]:
#size in bytes of each element of the array
a.itemsize

4
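
The same five attributes can be read from a two-dimensional array. The sketch below assumes a small 2x3 array made up for illustration, with the dtype set explicitly so that the result does not depend on the platform.

```python
import numpy as np

# A hypothetical 2x3 array; dtype is given explicitly to avoid
# platform-dependent defaults (int32 on some systems, int64 on others).
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(m.ndim)      # 2        two axes: rows and columns
print(m.shape)     # (2, 3)   2 rows, 3 columns
print(m.size)      # 6        2 * 3 elements
print(m.dtype)     # int64
print(m.itemsize)  # 8        64 bits / 8
```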

## Why is NumPy fast?

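Vectorization describes the absence of any explicit looping, indexing, etc., in the code; these things take place “behind the scenes” in pre-compiled C code. Because the loops run in compiled code rather than in the Python interpreter, vectorized operations are typically much faster. The advantages of vectorized code are: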

- vectorized code is more concise and easier to read

- fewer lines of code generally means fewer bugs

- the code resembles standard mathematical notation 

- vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult-to-read for loops.
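
As a rough illustration of the points above, the following sketch multiplies two small arrays element by element, first with an explicit Python loop and then with a single vectorized expression. The array values are made up for this example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Explicit loop: every multiplication runs in the Python interpreter.
looped = []
for x, y in zip(a, b):
    looped.append(x * y)

# Vectorized: one expression, with the loop executed in compiled code.
vectorized = a * b    # array([ 4., 10., 18.])
```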

Broadcasting is the term used to describe the implicit element-by-element behavior of operations. In NumPy, all operations (not just arithmetic operations, but also logical, bit-wise, functional, etc.) behave in this implicit element-by-element fashion; that is, they broadcast. You can read more about the NumPy package here: https://numpy.org/devdocs/user/whatisnumpy.html
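
Broadcasting also applies when the operands have different but compatible shapes. The sketch below is a small illustration with made-up values: a scalar is combined with every element of an array, and a column is combined with a row to produce a grid.

```python
import numpy as np

row = np.array([1, 2, 3])          # shape (3,)
column = np.array([[10], [20]])    # shape (2, 1)

# The scalar 2 is applied to every element of row.
scaled = row * 2                   # array([2, 4, 6])

# Shapes (2, 1) and (3,) are stretched to a common shape (2, 3).
grid = column + row
print(grid)
# [[11 12 13]
#  [21 22 23]]
```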

## Section 1 - Numpy.random package



As mentioned before, this project focuses on pseudorandom numbers: samples of numbers that look close to true random numbers but are generated by a deterministic process. Because a computer program is deterministic, it cannot produce random numbers by itself; it needs algorithms and packages to do so. The numpy.random package is one of the Python libraries that generate pseudorandom data.

Within the package there are many different functions and distributions. You can find more information about the numpy.random package and its features here: https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.random.html#

The numpy.random package consists of 4 parts:

##### 1. Simple random data
##### 2. Permutations
##### 3. Distributions
##### 4. Random generator


Each of these parts is discussed in more detail below. The next section covers Simple random data and Permutations.

## Section 2 - Simple random data and Permutations

Simple random data functions:


- rand(d0, d1, …, dn) - Random values in a given shape.
- randn(d0, d1, …, dn) - Return a sample (or samples) from the “standard normal” distribution.
- randint(low[, high, size, dtype]) - Return random integers from low (inclusive) to high (exclusive).
- random_integers(low[, high, size]) - Random integers of type np.int between low and high, inclusive (deprecated in favour of randint).
- random_sample([size]) - Return random floats in the half-open interval [0.0, 1.0).
- random([size]) - Return random floats in the half-open interval [0.0, 1.0).
- ranf([size]) - Return random floats in the half-open interval [0.0, 1.0).
- sample([size]) - Return random floats in the half-open interval [0.0, 1.0).
- choice(a[, size, replace, p]) - Generates a random sample from a given 1-D array.
- bytes(length) - Return random bytes.

##### numpy.random.rand

This function creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1). In the example below, the arguments 4 and 2 specify 4 rows and 2 columns.

In [39]:
#import the numpy package as np
import numpy as np
#draw a 4x2 array of uniform random floats and assign it to b
b = np.random.rand(4, 2)

In [40]:
#display the value of b
b

array([[0.22427945, 0.85654774],
       [0.6211916 , 0.6261309 ],
       [0.67500644, 0.09665956],
       [0.3199594 , 0.61037469]])

##### numpy.random.randn

This function returns a sample (or samples) from the “standard normal” distribution, which has mean 0 and standard deviation 1. If no argument is provided, a single randomly sampled float is returned.

In [41]:
#import the numpy package as np
import numpy as np
#draw a single sample from the standard normal distribution and assign it to b
b = np.random.randn()

In [42]:
#display the value of b
b

-1.1721652650631251
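
When dimensions are passed, randn returns an array of that shape, and samples from a normal distribution with another mean and standard deviation can be obtained by scaling and shifting. The sketch below assumes hypothetical values mu = 10 and sigma = 2, chosen only for illustration.

```python
import numpy as np

# A 2x3 array of samples from the standard normal distribution.
samples = np.random.randn(2, 3)

# Hypothetical mean and standard deviation, chosen for illustration.
mu, sigma = 10.0, 2.0

# Scaling by sigma and shifting by mu gives samples from N(mu, sigma^2).
scaled = mu + sigma * np.random.randn(1000)

print(samples.shape)   # (2, 3)
print(scaled.mean())   # close to 10, varies from run to run
```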

##### numpy.random.randint

This function returns random integers from the discrete uniform distribution over the interval from low (inclusive) to high (exclusive). If only one bound is given, as in the example below, it is used as high and low defaults to 0. The output is a two-dimensional array with 2 rows and 4 columns, and all numbers are in the range from 0 to 5 (not including 5).

In [43]:
#import the numpy package as np
import numpy as np
#draw a 2x4 array of random integers from 0 to 4 and assign it to b
b = np.random.randint(5, size=(2, 4))

In [44]:
#display the value of b
b

array([[0, 3, 3, 3],
       [4, 4, 4, 2]])

##### numpy.random.random_integers

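This function is very similar to randint. The difference is that randint excludes the high value, whereas random_integers includes it; when only one bound is given, values are drawn from 1 up to and including that bound. In the example below the output has 3 rows with 2 elements each, and the high value 5 can appear. Note that random_integers has been deprecated since NumPy 1.11 in favour of randint, so calling it emits a DeprecationWarning.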

In [2]:
#import the numpy package as np
import numpy as np
#draw a 3x2 array of random integers from 1 to 5 (inclusive) and assign it to c
c = np.random.random_integers(5, size=(3,2))


In [3]:
#display the value of c
c

array([[2, 5],
       [1, 3],
       [1, 2]])
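
The same inclusive range can be drawn with randint by adding 1 to the upper bound, which is the replacement the NumPy documentation suggests for the deprecated random_integers. A minimal sketch, with the variable name d chosen for this example:

```python
import numpy as np

# randint excludes the upper bound, so 5 + 1 makes the value 5 possible.
d = np.random.randint(1, 5 + 1, size=(3, 2))

print(d.shape)  # (3, 2)
```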

##### numpy.random.random_sample, numpy.random.random, numpy.random.ranf, numpy.random.sample

These 4 functions return floats drawn from the “continuous uniform” distribution over the half-open interval [0.0, 1.0). Two examples follow. In the first, random_sample is called without arguments and returns a single random float, here 0.4779...

In [46]:
#import the numpy package as np
import numpy as np
#draw a single uniform random float from [0.0, 1.0) and assign it to b
b = np.random.random_sample()

In [47]:
#display the value of b
b

0.4779352719841624

In the second example, the output is a single array of 5 floats, all in the range from 0.0 to 1.0. These values are also drawn from the “continuous uniform” distribution.

In [48]:
#import the numpy package as np
import numpy as np
#draw an array of 5 uniform random floats from [0.0, 1.0) and assign it to b
b = np.random.random_sample((5,))

In [49]:
#display the value of b
b

array([0.85525413, 0.72798211, 0.68681135, 0.90478692, 0.5735883 ])
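
To sample uniformly from an interval other than [0.0, 1.0), the output can be scaled and shifted. The sketch below assumes hypothetical bounds of -5 and 0, chosen only for illustration.

```python
import numpy as np

# Hypothetical interval bounds, chosen for illustration.
low, high = -5.0, 0.0

# (high - low) * u + low maps u in [0.0, 1.0) onto [low, high).
values = (high - low) * np.random.random_sample(4) + low

print(values.shape)  # (4,)
```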

##### numpy.random.choice

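This function generates a random sample from a given 1-D array. It takes 4 parameters:

- a : 1-D array-like or int. If an int is given, the sample is drawn from np.arange(a).
- size : int or tuple of ints, optional. The output shape.
- replace : boolean, optional. Whether the sample is drawn with replacement.
- p : 1-D array-like, optional. The probabilities associated with each entry in a.

In the example below, 3 values are drawn from the integers 0 to 4.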

In [50]:
#import the numpy package as np
import numpy as np
#draw 3 values, with replacement, from the integers 0 to 4 and assign them to b
b = np.random.choice(5, 3)

In [51]:
#display the value of b
b

array([0, 0, 4])
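
The output above contains 0 twice because choice samples with replacement by default. The sketch below shows the replace and p parameters; the list of colours and the probabilities are hypothetical values made up for this example.

```python
import numpy as np

# replace=False guarantees that no value is drawn twice.
no_repeats = np.random.choice(5, size=3, replace=False)

# Hypothetical data and weights; the probabilities must sum to 1.
colours = ["red", "green", "blue"]
weighted = np.random.choice(colours, size=10, p=[0.5, 0.3, 0.2])

print(len(set(no_repeats.tolist())))  # 3, all values are distinct
```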

##### numpy.random.bytes

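This function returns random bytes. The argument is the number of bytes to return, so the example below returns a bytes object of length 10.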

In [52]:
#import the numpy package as np
import numpy as np
#generate 10 random bytes and assign them to b
b = np.random.bytes(10)

In [53]:
#display the value of b
b

b'<R\xb9\x1f\xaf\xc0\xa3r4G'