# Numpy III

## Generating arrays of random numbers - the numpy.random module

The NumPy library has a sub-module called 'random', which generates random numbers from a given distribution. It is especially useful for randomly sampling data for experiments.

The module contains functions for
1. random sampling from simple data
2. random sampling from well-known data distributions
3. permutations and related operations such as shuffling and seeding

## 1. Sampling random numbers from simple data

#### rand() function

The rand() function draws numbers from a 'uniform distribution' over the range [0,1), i.e., 0 inclusive and 1 exclusive. A uniform distribution means that every number between 0 and 1 is equally likely to be selected. The arguments passed to rand() set the shape of the returned array.

#### randn() function

The randn() function draws numbers from a 'standard normal distribution' with a mean of 0 and a variance of 1. Positive and negative numbers are equally likely, values close to 0 are the most likely, and the probability falls off as values move farther from the mean.

#### randint() function

The randint() function draws integers from a given range, with the lower bound included and the upper bound excluded. Since every integer in the range is equally likely to be selected, this is called a 'discrete uniform distribution'.

Examples:

```python
# Importing numpy library
import numpy as np

np.random.rand(5)
>>> array([ 0.93371582,  0.82386466,  0.34771991,  0.59338646,  0.41190981])

# randint() works with integer bounds; the upper bound (5000) is excluded
np.random.randint(1000, 5000, 10)
>>> array([4825, 1466, 4025, 2931, 1693, 2385, 2857, 1767, 2902, 1759])
```
Another example:

```python
# Importing numpy library
import numpy as np

# Creating an array of 12 random numbers
shape_shifter = np.random.rand(12)
shape_shifter
>>> array([ 0.906423  ,  0.55807204,  0.28928162,  0.47020116,  0.27403332,
>>>         0.94178672,  0.81342077,  0.5859645 ,  0.63569185,  0.84614272,
>>>         0.36454835,  0.63664789])

shape_shifter.max()
>>> 0.94178671566784411

shape_shifter.min()
>>> 0.27403331882439208

shape_shifter.argmax()
>>> 5

shape_shifter.argmin()
>>> 4
```
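The ranges described for rand(), randn() and randint() can be checked empirically. The sketch below is only illustrative: the variable names and the sample size of 100,000 are arbitrary choices, not part of the original examples.

```python
import numpy as np

# Draw large samples so that any range violation would be likely to show up.
uniform_sample = np.random.rand(100_000)
normal_sample = np.random.randn(100_000)
integer_sample = np.random.randint(3, 8, 100_000)

# rand(): every value lies in [0, 1)
print(uniform_sample.min() >= 0, uniform_sample.max() < 1)

# randn(): sample mean close to 0, sample standard deviation close to 1
print(normal_sample.mean(), normal_sample.std())

# randint(): the lower bound 3 is included, the upper bound 8 is excluded
print(np.unique(integer_sample))
```

Because the samples are random, the printed mean and standard deviation vary slightly from run to run.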

### Exercise 1

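Call the rand(), randn() and randint() functions with the values 3 and 8 and store the results in the variables 'rand_a', 'randn_a' and 'randint_a'. Note that rand() and randn() interpret these values as the shape of the output array, whereas randint() interprets them as the range [3, 8). Print the results.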

```python
import numpy as np

# Modify the code below

rand_a = np.random.rand(3,4)
print("Result from performing rand:\n", rand_a)
randn_a = np.random.randn(3,5)
print("Result from performing randn:\n", randn_a)
randint_a = np.random.randint(3,8,(2,3))
print("Result from performing randint:\n", randint_a)

# OUTPUT:
>>> Result from performing rand:
>>>  [[0.21779482 0.93473731 0.59637643 0.29573454]
>>>  [0.25740106 0.25733747 0.45701928 0.14911857]
>>>  [0.38294081 0.50829077 0.11020239 0.22333851]]
>>> Result from performing randn:
>>>  [[-0.5934605  -1.22243917  0.28503449 -1.64911353  0.73855491]
>>>  [-0.03145254  0.3604185  -0.3176907  -1.56399525 -0.6930708 ]
>>>  [ 1.18775714 -1.00733733 -1.21765386 -0.05414041  1.01633378]]
>>> Result from performing randint:
>>>  [[7 3 7]
>>>  [4 3 3]]
```


### Solution code

```python
import numpy as np
rand_a = np.random.rand(3,8)
print("Result from performing rand:", rand_a)
randn_a = np.random.randn(3,8)
print("Result from performing randn:", randn_a)
randint_a = np.random.randint(3, 8)
print("Result from performing randint:", randint_a)
```

### Exercise 2

Generate an array of 5 floating point values using numpy's random functions. Each value in the array should be greater than 1.

```python
import numpy as np

# Modify the code below
mixed_array = np.random.randint(1,7,5) + np.random.rand(5)
mixed_array

# OUTPUT:
>>> array([2.79720563, 4.26338421, 3.92972406, 4.93423908, 2.29873506])
```

### Solution code

```python
import numpy as np

floats = np.random.rand(5)
ints = np.random.randint(1,5,5)

mixed_array = ints + floats
print(mixed_array)
```

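Floats in a chosen interval can also be drawn directly with np.random.uniform (covered in the next section):

```python
import numpy as np

uni_dist = np.random.uniform(1,3,10)
uni_dist

# OUTPUT:
>>> array([1.81848053, 1.08792191, 1.33606212, 2.31592748, 2.69174894,
>>>        1.49448647, 1.2088024 , 1.22804363, 2.26191528, 1.66954055])
```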

## 2. Sampling data from data distributions

#### Binomial function

A binomial distribution gives the probability of observing a given number of successes in a fixed number of independent trials, where each trial has exactly two possible outcomes: success or failure. For example, when a new drug is tested on a patient, the result is either a success (the patient is cured) or a failure (the patient is not), and the number of cured patients among all those tested follows a binomial distribution.

The np.random.binomial function draws the number of samples specified by the user from a binomial distribution.

```python
# Importing numpy library
import numpy as np

# Sampling data from a binomial distribution with 'n' (=1) trials and 'p' (=.3) as probability of success, 3 is number of samples
np.random.binomial(1,0.3,3)
 
# OUTPUT: 
>>> array([0, 0, 1])
```
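With more than one trial per sample, each drawn value is the number of successes out of 'n' trials. The following sketch, which assumes illustrative values of n = 10 and p = 0.3 that are not taken from the example above, shows that the average number of successes comes out close to n * p.

```python
import numpy as np

n_trials = 10     # trials per experiment (illustrative value)
p_success = 0.3   # probability of success in a single trial

# Each entry is the number of successes observed in one experiment of 10 trials.
successes = np.random.binomial(n_trials, p_success, 100_000)

print(successes[:10])
print(successes.mean())   # close to n_trials * p_success
```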

#### Chisquare function

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Its curve is centred at zero, and the standard deviation measures how far values typically deviate from the mean.
The chi-square distribution with df degrees of freedom is the distribution of the sum of the squares of df independent standard normal variables.

The np.random.chisquare function draws the number of samples specified by the user from a chi-square distribution.

```python
# General syntax to use chisquare function
numpy.random.chisquare(df, size=None)
```

```python
# Importing numpy library
import numpy as np

# Sampling data from a chisquare distribution with 3 degrees of freedom, where 2 is the number of samples to be drawn
np.random.chisquare(3,2)

# OUTPUT:
>>> array([0.33570358, 3.53852457])
```
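The definition of the chi-square distribution as a sum of squared standard normal values can be sketched directly with randn(). The names 'manual' and 'direct' are illustrative. Both samples should have a mean close to the number of degrees of freedom.

```python
import numpy as np

df = 3
n_samples = 100_000

# Build chi-square values by hand: sum the squares of df standard normal values.
manual = (np.random.randn(n_samples, df) ** 2).sum(axis=1)

# Draw from the same distribution with the dedicated function.
direct = np.random.chisquare(df, n_samples)

print(manual.mean(), direct.mean())   # both close to df
```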

#### Exponential function


The exponential distribution is a continuous distribution, i.e., its possible values form an uncountably infinite set (here, all non-negative real numbers). It is one of the most commonly used continuous distributions.
The exponential distribution models the time interval between successive events that occur independently at a constant average rate, e.g., the waiting time between two arrivals.

The np.random.exponential function draws the number of samples specified by the user from an exponential distribution.


```python
# Importing numpy library
import numpy as np

# Sampling data from exponential distribution where 1 is scale and 2 is number of samples to be drawn
np.random.exponential(1, 2)
     
# OUTPUT:
>>> array([0.18400851, 0.34144262])
```
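The scale parameter is the average waiting time between events. As a rough check, assuming an illustrative scale of 2.0, the mean of a large sample should land close to that value.

```python
import numpy as np

scale = 2.0   # average waiting time between events (illustrative value)
waiting_times = np.random.exponential(scale, 100_000)

print(waiting_times.min() >= 0)   # waiting times are never negative
print(waiting_times.mean())       # close to scale
```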

#### Geometric function

A binomial distribution counts the successes in a fixed number of trials, each of which ends in either success or failure. A geometric distribution instead counts the number of trials performed until the first success occurs. Its values are therefore positive integers (1, 2, 3, ...).

The np.random.geometric function draws the number of samples specified by the user from a geometric distribution.

```python
# Importing numpy 
import numpy as np

# Sampling data from a geometric distribution, where p is the probability of success
# in a single trial and size is the number of samples to draw from the distribution
geo = np.random.geometric(p=0.5, size=10)
geo

# OUTPUT:
>>> array([1, 1, 2, 9, 2, 2, 5, 1, 2, 1])  
```
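Since each value is the number of trials needed to reach the first success, the smallest possible value is 1 and the average is about 1 / p. A minimal sketch with an illustrative sample size:

```python
import numpy as np

p_success = 0.5
trials_needed = np.random.geometric(p=p_success, size=100_000)

print(trials_needed.min())    # at least one trial is always needed
print(trials_needed.mean())   # close to 1 / p_success
```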

#### Poisson function

The Poisson distribution is a discrete distribution that gives the probability of a given number of events occurring in a fixed interval of time, when the events occur independently at a constant average rate.

The np.random.poisson function draws samples from a Poisson distribution.

```python
# Importing numpy library
import numpy as np

# Sampling data from a poisson distribution where 3 is the expected number of events per interval (lam) and 8 is the number of samples to be drawn
pois = np.random.poisson(3, 8)
pois

# OUTPUT:
>>> array([2, 4, 3, 3, 3, 3, 7, 1])
```
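A characteristic property of the Poisson distribution is that its mean and its variance are both equal to lam. The sketch below checks this on a large sample; the sample size is an arbitrary choice.

```python
import numpy as np

lam = 3   # expected number of events per interval
counts = np.random.poisson(lam, 100_000)

print(counts.min() >= 0)             # counts are never negative
print(counts.mean(), counts.var())   # both close to lam
```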


#### Uniform function

The uniform distribution is a continuous distribution whose probability density is constant over an interval. Every value between the lower bound (inclusive) and the upper bound (exclusive) is equally likely to be drawn.

The np.random.uniform function draws the number of samples specified by the user from a uniform distribution.


```python
# Importing numpy library
import numpy as np
    
# Sampling data from a uniform distribution with 1 as lower bound and 3 as upper bound of the data interval, where 10 is the number of samples
uni_dist = np.random.uniform(1,3,10)
uni_dist

# OUTPUT:
>>> array([1.98277548, 1.47078083, 2.97779683, 2.13713806, 1.8976292 ,
>>>    2.63856215, 1.04455004, 1.15975864, 2.80673453, 1.45120926])
```    
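Drawing from a uniform distribution on an interval is equivalent to stretching and shifting the output of rand(). The following sketch compares the two approaches; 'direct' and 'rescaled' are illustrative names.

```python
import numpy as np

low, high = 1, 3
n_samples = 100_000

direct = np.random.uniform(low, high, n_samples)

# Rescaling rand(): stretch [0, 1) to width (high - low), then shift by low.
rescaled = low + (high - low) * np.random.rand(n_samples)

print(direct.min() >= low, direct.max() <= high)
print(direct.mean(), rescaled.mean())   # both close to (low + high) / 2
```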

#### F function 

The F distribution is the distribution of the ratio of two independent chi-square variables, each divided by its degrees of freedom. It is used, for example, to test whether two samples have the same variance and, in the analysis of variance (ANOVA), whether the means of several groups differ notably. The values of F are only positive.

The np.random.f function draws the number of samples specified by the user from an F distribution.


```python
# Importing numpy library
import numpy as np

# Sampling data from an F distribution where 1 is the degree of freedom in the numerator, 4 is the degree of freedom in the denominator and 10 is the number of samples
s = np.random.f(1, 4, 10)
s

# OUTPUT:
>>> array([11.8876891 ,  0.91116938,  1.27070929,  1.46581719,  4.11247805,
>>>    28.53869887,  1.99272372,  0.18135954,  0.066219  ,  1.06589472])
```
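An F-distributed value can be built by hand as a ratio of two chi-square values, each divided by its degrees of freedom. The sketch below assumes illustrative degrees of freedom of 5 and 10 (chosen so that the mean, dfden / (dfden - 2), is well defined and stable) and compares the manual construction with np.random.f.

```python
import numpy as np

dfnum, dfden = 5, 10   # illustrative degrees of freedom
n_samples = 200_000

# Ratio of two scaled chi-square samples.
numerator = np.random.chisquare(dfnum, n_samples) / dfnum
denominator = np.random.chisquare(dfden, n_samples) / dfden
manual = numerator / denominator

# Draw from the same distribution with the dedicated function.
direct = np.random.f(dfnum, dfden, n_samples)

print(manual.mean(), direct.mean())   # both close to dfden / (dfden - 2)
print(direct.min() >= 0)              # F values are never negative
```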


Refer to the following for a better understanding of degrees of freedom.

http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-degrees-of-freedom-in-statistics


#### Normal function

The normal distribution is a continuous probability distribution, also called the bell curve, whose mean, median and mode are equal. The curve is symmetric about the mean and the total area under it is 1, so the area under the curve between two values gives the percentage of data expected to fall between them.

The np.random.normal function draws the number of samples specified by the user from a normal distribution.

```python
# Importing numpy library
import numpy as np

# Sampling data from normal distribution where 2 is mean, 0.2 is standard deviation and 3 is the size of the sample we want
s = np.random.normal(2, 0.2, 3)
s

# OUTPUT:
>>> array([2.23873731, 1.99493943, 1.98627681])
```   
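The link between area under the curve and percentage of data can be illustrated with the well-known rule that about 68% of normally distributed values fall within one standard deviation of the mean. A minimal sketch, reusing the mean and standard deviation from the example above with an arbitrary sample size:

```python
import numpy as np

mean, std_dev = 2, 0.2
sample = np.random.normal(mean, std_dev, 100_000)

# Boolean array: True where a value lies within one standard deviation of the mean.
within_one_sd = np.abs(sample - mean) <= std_dev

print(sample.mean(), sample.std())   # close to 2 and 0.2
print(within_one_sd.mean())          # close to 0.68
```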


### Exercise

Use the poisson function to draw samples from a distribution.

Draw 50 samples with the lambda value as '30' and store them in a variable 'dist'.

```python
import numpy as np

# Modify the code below
dist = np.random.poisson(lam=(30), size=(50))
dist

# OUTPUT:
>>> array([32, 28, 35, 26, 34, 29, 36, 32, 27, 30, 38, 25, 32, 23, 35, 34, 37,
>>>        41, 26, 32, 37, 34, 34, 28, 36, 25, 33, 27, 28, 30, 39, 26, 27, 31,
>>>        24, 22, 23, 33, 34, 25, 31, 33, 34, 39, 34, 38, 36, 25, 42, 32])
```

### Solution code

```python
import numpy as np

dist = np.random.poisson(lam=(30), size=(50))
```

## 3. Permutations and other functions like shuffling and seeding


#### shuffle() function

The shuffle function rearranges the contents of an array in place, much like shuffling a deck of playing cards.
For a multi-dimensional array, this function only shuffles along the first axis, i.e., the order of the rows changes, but the contents of each row stay the same.

In the program below, a 3x3 array is created and passed through the shuffle function.

```python
# Importing numpy library
import numpy as np

# Creating array
arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])

# Shuffling array
np.random.shuffle(arr)
arr

# OUTPUT:
>>> array([[6, 7, 8],
>>>        [3, 4, 5],
>>>        [0, 1, 2]])
```

The output is obtained by reordering the rows of the array, i.e., by shuffling along its first axis.
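Two properties of shuffle() are easy to overlook: it modifies the array in place and returns None, and it never changes the contents of a row. The sketch below checks both on an illustrative 4x3 array.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)
original = arr.copy()

# shuffle() works in place, so its return value is None.
result = np.random.shuffle(arr)
print(result)
print(arr)

# Every original row is still present; only the row order may differ.
rows_before = sorted(map(tuple, original.tolist()))
rows_after = sorted(map(tuple, arr.tolist()))
print(rows_before == rows_after)
```

Assigning the return value of shuffle() to a variable is a common mistake, because that variable ends up holding None rather than the shuffled array.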


#### permutation() function

A permutation is an arrangement of a set of items in a particular order. The permutation function returns a randomly permuted copy of a sequence. If it is given an integer n instead of a sequence, it returns a randomly permuted version of the range 0 to n-1.

In a permutation, order plays a major role. For example, suppose we want to assign the first three ranks in a class of 100 students. Once the first-rank student (A) is found, A is left out and the second rank is chosen from the remaining 99 students. Once the second-rank student is found, both names are left out and the third rank is chosen from the remaining 98.

Similar to the shuffle function, for a multi-dimensional array the permutation is done along its first axis.

Both functions rearrange the elements in random order. The difference is that shuffle modifies the array itself and returns nothing, whereas permutation leaves the original untouched and returns a shuffled copy. This is useful, for example, when the rows of a dataset need to be randomly reordered before it is split for a classification task.

In the program below, an array is stored in the variable 'arr' and passed to the permutation function.

```python
# Importing numpy library
import numpy as np

# Creating array
arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])

# Permuting the array
np.random.permutation(arr)

# OUTPUT:
>>> array([[0, 1, 2],
>>>        [3, 4, 5],
>>>        [6, 7, 8]])
```

The output above is one possible random ordering of the rows; running the code again will usually give a different order. The original array 'arr' is left unchanged.

Though the results of the shuffle and permutation operations may look the same, shuffle changes the array in place whereas permutation returns a new array.
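The difference between the two functions can be made visible by checking the original array after calling permutation(). The sketch below also shows the integer form of the call; the name 'permuted' is illustrative.

```python
import numpy as np

arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])

# permutation() returns a new array and leaves arr as it was.
permuted = np.random.permutation(arr)
print(arr)
print(permuted)

# Given an integer n, permutation() shuffles the range 0 to n-1.
print(np.random.permutation(5))
```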

#### seed() function

Each time a function from np.random is called, it generates new random numbers. During testing, or in a large program, this can cause confusion, because the results differ from run to run. To avoid that, the seed function can be used.

The seed() function resets the random number generator to a fixed starting state, so the same sequence of random numbers is produced each time the code is run with the same seed. See the code below:

```python
# Importing numpy library
import numpy as np

# Setting the seed. It can accept any integer
np.random.seed(0)

# Generating random data
np.random.rand(5)

# OUTPUT:
>>> array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
```

When the seed is set to the same value and np.random.rand is called again, it generates the same set of values as the previous result.

```python
# Importing numpy library
import numpy as np

np.random.seed(0)
np.random.rand(5)

# OUTPUT:
>>> array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
```

In the program below we do not set the seed, so when np.random.rand is called, it generates a different set of random values on every run.

```python
# Importing numpy library
import numpy as np

np.random.rand(5)

# OUTPUT:
>>> array([0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606])
```

And when np.random is called again, it generates a different set of values.

```python
# Importing numpy library
import numpy as np

np.random.rand(5)

# OUTPUT:
>>> array([0.0871293 , 0.0202184 , 0.83261985, 0.77815675, 0.87001215])
```
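The behaviour described above can be verified in a single script by comparing arrays instead of reading the printed values. The last part of the sketch uses np.random.default_rng, which is not covered in this section and is assumed to be available (NumPy 1.17 or later); it creates independent generator objects that each carry their own seed.

```python
import numpy as np

# Re-seeding with the same value restarts the same sequence.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))

# Without re-seeding, the generator simply continues its sequence.
third = np.random.rand(3)
print(np.array_equal(first, third))

# Assumption: NumPy 1.17 or later, which provides independent Generator objects.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))
```

Setting the seed once at the top of a script is usually enough to make the whole run repeatable.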

### Exercise

Solve the following questions:

1. Create a list [1,2,3,4,5] and store it in a variable 'a'. Shuffle 'a' and print the result.

2. Permute 'a' and print the result.

3. Generate a random number from 1 to 10 such that the same number is generated every time the code is executed.

```python
import numpy as np

# Modify the code below
np.random.seed(3)
a = [1,2,3,4,5]
np.random.shuffle(a)
print ('A after shuffling is:',a)

np.random.seed(5)
b = np.random.permutation(a)
print('A after permutation is:',b)

np.random.seed(7)
c = np.random.randint(1,10)
print('C with seed:',c)

# OUTPUT:
>>> A after shuffling is: [4, 5, 2, 1, 3]
>>> A after permutation is: [3 4 5 2 1]
>>> C with seed: 5
```


### Solution code

```python
import numpy as np
a = [1,2,3,4,5]
np.random.shuffle(a)
print ('A after shuffling is: ' , a )
b = np.random.permutation(a)
print('A after permutation is:', b)
np.random.seed(0)
# The upper bound of randint() is exclusive, so 11 is needed to include 10
c = np.random.randint(1, 11)
print('C with seed:', c)
```

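```python
import numpy as np

# seed(None) re-seeds the generator from an unpredictable source,
# so this value can change from run to run
np.random.seed(None)
print(np.random.randint(1,5))

# With a fixed seed, this value is the same on every run
np.random.seed(3)
print(np.random.randint(1,5))

# OUTPUT:
>>> 3
>>> 3
```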
