![Logos](https://s3.amazonaws.com/com.twilio.prod.twilio-docs/images/jupyter_python_numpy.width-808.png)

# Practical Assignment - Programming for Data Analysis 2018
By Simona Vasiliauskaite G00263352



# Numpy.Random Package

In this notebook I will answer and discuss the following points:
1. The purpose of Numpy.Random package
2. Explanation of 'Simple random data' and 'Permutations' functions
3. Use and purpose of 'Distributions' functions such as uniform, normal, logistic, geometric, exponential and more.
4. Why use seeds in generating pseudorandom numbers.

## 1. Purpose of Numpy.Random package

Before diving deep into the numpy.random package, here is some background information on Numpy as a package.
It is a Python package specialized for building and manipulating large, multidimensional arrays. NumPy has built-in functions for linear algebra and random number generation.
It is an important library because many other Python packages, such as SciPy and Matplotlib, depend on NumPy to a considerable extent.

The numpy.random module supplements the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.  Source [Python for Data Analysis by Wes McKinney](https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html)


NumPy arrays have some benefits over Python lists: they are more compact, faster to read and write, and more convenient to work with.

A **Numpy array** is a powerful N-dimensional array object; a two-dimensional array, for example, is laid out in rows and columns.

In [10]:
# Import Numpy library 
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline 

## 1.1 Types of Arrays

**Single-dimensional Numpy Array**

In [2]:
a=np.array([1,2,3])
print(a)

[1 2 3]


**Multi-dimensional Numpy Array**

In [3]:
a=np.array([(1,2,3),(4,5,6)])
print(a)

[[1 2 3]
 [4 5 6]]
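
To make the difference between the two arrays above concrete, here is a small sketch that inspects their number of dimensions and shape. The variable names `one_d` and `two_d` are my own and are only used for illustration.

```python
import numpy as np

# The same two arrays as in the cells above, under illustrative names
one_d = np.array([1, 2, 3])
two_d = np.array([(1, 2, 3), (4, 5, 6)])

# ndim is the number of axes, shape is the length along each axis
print(one_d.ndim, one_d.shape)   # 1 (3,)
print(two_d.ndim, two_d.shape)   # 2 (2, 3)
```

The second array has 2 rows and 3 columns, which is what the shape `(2, 3)` describes.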


## 2. Permutations & Simple Random Data

## 2.1
#### What is permutation?

A permutation is an arrangement of items in a particular order. Because order matters, (1, 2, 3) and (3, 2, 1) count as two different permutations of the same items.

Use of Permutations
* **permutation(x)**	Randomly permute a sequence, or return a permuted range.
* **shuffle(x)**	Modify a sequence in-place by shuffling its contents.
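
The two functions listed above can be tried directly from `numpy.random`. This is only a minimal sketch: the seed value and the variable names are my own assumptions, chosen to make the example repeatable.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so the sketch is repeatable

arr = np.arange(5)                          # array([0, 1, 2, 3, 4])

# permutation() returns a shuffled copy and leaves the input untouched
permuted = np.random.permutation(arr)
before_shuffle = arr.copy()                 # still in the original order
print("Permuted copy:", permuted)

# Passing an integer n permutes np.arange(n)
permuted_range = np.random.permutation(5)
print("Permuted range:", permuted_range)

# shuffle() reorders the array in place and returns None
np.random.shuffle(arr)
print("Shuffled in place:", arr)
```

The main design difference is that `permutation` is safe to use when the original order is still needed, while `shuffle` avoids making a copy.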

In [4]:
# A Python program to print all permutations using itertools.permutations from the standard library
from itertools import permutations 
  
# Get all permutations of [1, 2, 3] 
p = permutations([1, 2, 3]) 
  
# Print the obtained permutations 
for i in list(p): 
    print(i) 

(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)


#### What does Shuffle function do?

The method shuffle() randomizes the items of a list in place.

In [5]:
# Shuffle a list in place with the standard-library random module
from random import shuffle

x = [12, 15, 77, 298]

# Shuffle and print the outcome
shuffle(x)
print("Reshuffled values : ", x)

Reshuffled values :  [77, 12, 298, 15]


In [6]:
# Testing shuffle function with strings

names = ["Simona", "Elena", "Pat", "Dave"]

shuffle(names) # shuffle all strings

print ("New shuffled name order : ", names) # print shuffled strings


New shuffled name order :  ['Dave', 'Simona', 'Elena', 'Pat']


## 2.2 

#### Simple Random Data

In [12]:
# Import the random module
import random

In [13]:
from random import randint
print(randint(0, 9))

4


**random.sample**

sample() is a built-in function of Python's random module that returns a list of a given length, with items chosen from a sequence such as a list, tuple or string. It is used for random sampling without replacement, so no item is picked twice.

In [14]:

# Print list of random items of length 3 from the given list. 
list1 = [5, 6, 7, 8, 9, 11]
print("With list:", random.sample(list1, 3)) 
  
# Print list of random items of length 4 from the given string.  
string = "Computer"
print("With string:", random.sample(string, 4)) 
  
# Print list of random items of length 2 from the given tuple. 
tuple1 = ("college" , "work" , "pc" , "study" , "science") 
print("With tuple:", random.sample(tuple1, 2)) 
   
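# Print list of random items of length 3 from the given set.
# A set must be converted to a sequence first: passing a set directly
# is deprecated since Python 3.9 and raises TypeError from Python 3.11.
set1 = {"a", "b", "c", "d", "e"}
print("With set:", random.sample(sorted(set1), 3))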

With list: [7, 11, 8]
With string: ['r', 'e', 'u', 'p']
With tuple: ['work', 'college']
With set: ['a', 'd', 'b']
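
For comparison, sampling without replacement can also be sketched with `numpy.random.choice`. The array `values` and the sizes used here are my own illustrative choices.

```python
import numpy as np

values = np.array([5, 6, 7, 8, 9, 11])

# replace=False means no element can be picked twice
picked = np.random.choice(values, size=3, replace=False)
print("Without replacement:", picked)

# replace=True (the default) allows repeats, so size may exceed len(values)
with_repeats = np.random.choice(values, size=10, replace=True)
print("With replacement:", with_repeats)
```

Unlike `random.sample`, the NumPy version returns an array rather than a list.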


**random.randint**

Returns a random integer N such that a <= N <= b.

Randint accepts two parameters: a lowest and a highest number. Both endpoints are included in the range.

In [15]:
# Generate a random integer between 0 and 5 (both included)

a = random.randint(0, 5)
print(a)

2
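
One detail worth checking is how the upper bound is treated. The sketch below compares `random.randint`, which includes the upper bound, with `numpy.random.randint`, which excludes it. The number of draws (1000) is an arbitrary choice of mine.

```python
import random
import numpy as np

# Standard library: both endpoints are included, so 5 is a possible value
py_draws = [random.randint(0, 5) for _ in range(1000)]

# NumPy: the upper bound is excluded, so values are drawn from 0 to 4
np_draws = np.random.randint(0, 5, size=1000)

print("random.randint range:", min(py_draws), "to", max(py_draws))
print("np.random.randint range:", np_draws.min(), "to", np_draws.max())
```

To get the same range from NumPy, the call would need to be `np.random.randint(0, 6, size=1000)`.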


**random.choice**

In [22]:
# Pick a single random element from a non-empty sequence (raises IndexError if the sequence is empty)

letters = ['a', 'b', 'c', 'd', 'e']
print("Random choice:", random.choice(letters)) # Print generated random sample

Random choice: a


## 3. Examples of Distributions Functions

Probability distributions are a fundamental concept in statistics. They are used both on a theoretical level and a practical level.

## 3.1

**Common Data Types**

The data can be discrete or continuous.

* Discrete Data can take only specified values.

* Continuous Data can take any value within a given range. The range may be finite or infinite.
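
The difference between discrete and continuous data can be sketched with three of the `numpy.random` distribution functions. The seed, the sample size and the parameter values below are my own assumptions, picked only for illustration.

```python
import numpy as np

np.random.seed(1)  # assumed seed, only so the sketch is repeatable

# Discrete: number of trials until the first success, always a whole number >= 1
discrete = np.random.geometric(p=0.5, size=1000)

# Continuous with a finite range: any value in [0, 1)
continuous = np.random.uniform(low=0.0, high=1.0, size=1000)

# Continuous with an infinite range: any real number, centred on loc
normal = np.random.normal(loc=0.0, scale=1.0, size=1000)

print("Distinct geometric values:", np.unique(discrete))
print("First uniform values:", continuous[:5])
print("First normal values:", normal[:5])
```

The geometric sample contains only a handful of distinct whole numbers, while the uniform and normal samples are floating-point values.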

Below is a chart of some of the common distributions; the arrows indicate that some of them are very similar and relate to each other.

I will now compare some common distributions and their relationships with one another.


![Distribution Chart](https://www.johndcook.com/distribution_chart.gif) [Source](https://www.johndcook.com)


I also found this graph below very interesting and I will try to compare my analysis of these distribution relationships to what this graph suggests and draw a conclusion. 

![DistributionProbabilityGraph](https://analyticsbuddhu.files.wordpress.com/2017/02/overview-prob-distr.png)

[Source](https://analyticsbuddhu.com/2017/02/26/how-many-types-of-continuous-probability-distribution/)

## 4. Use of Seed

Random number generation (RNG) is the process by which a string of random numbers may be drawn. The numbers are not completely random for several reasons.

1. They are drawn from a probability distribution. The most common one is the uniform distribution on the domain 0 ≤ x < 1, i.e., random numbers between zero and one.

2. In most computer applications, the random numbers are actually pseudorandom. They depend entirely on an input seed and are then generated by a deterministic algorithm from that seed. [Source](http://justinbois.github.io/bootcamp/2016/lessons/l26_random_number_generation.html)


To demonstrate that random number generation is deterministic, we will explicitly seed the random number generator.

In [12]:
# Seed the RNG
np.random.seed(25)

# Generate random numbers
np.random.random(size=5)

array([ 0.87012414,  0.58227693,  0.27883894,  0.18591123,  0.41110013])

In [13]:
# Re-seed the RNG
np.random.seed(25)

# Generate random numbers
np.random.random(size=5)

array([ 0.87012414,  0.58227693,  0.27883894,  0.18591123,  0.41110013])

The random numbers are exactly the same. If we choose a different seed, we get totally different random numbers.
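
Both claims can be checked in one place. In this sketch the second seed value (26) is an arbitrary choice of mine, and the last part assumes NumPy 1.17 or later, where the newer `default_rng` interface is available.

```python
import numpy as np

# Same seed twice: identical numbers
np.random.seed(25)
first = np.random.random(size=5)
np.random.seed(25)
second = np.random.random(size=5)

# Different seed: different numbers
np.random.seed(26)
third = np.random.random(size=5)

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False

# The same idea with the newer Generator interface (assumes NumPy >= 1.17)
rng_a = np.random.default_rng(25)
rng_b = np.random.default_rng(25)
print(np.array_equal(rng_a.random(5), rng_b.random(5)))  # True
```

Seeding this way is useful when results need to be reproduced, for example when sharing a notebook or debugging.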

### Import Numpy

In [14]:
import numpy as np
import matplotlib.pyplot as plt

In [15]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))

<class 'numpy.ndarray'>


In [16]:
print(a.shape)

(3,)


In [17]:
print(a[0], a[1], a[2])

1 2 3


In [18]:
a[0] = 5                  # Change an element of the array
print(a)

[5 2 3]


In [19]:

# Useful website for simple examples


[CS231n Python Numpy Tutorial](http://cs231n.github.io/python-numpy-tutorial/)