# Numpy II

## 1. Generating array with random numbers

NumPy has a sub-module called 'random', which generates random numbers from a given distribution. It is especially useful for randomly sampling data for experiments.

Two functions, rand() and randint(), are a good starting point, as they are the most common and intuitive to use.
* **rand()** function takes an integer as an argument and generates that many random values between 0 and 1. The values generated by this function are floating point values.
* **randint()** function takes 3 parameters - lower limit, upper limit and number of values to be generated. As the name suggests, it generates the given number of random integers in the specified range; the lower limit is included and the upper limit is excluded.

Examples:

```python
np.random.rand(5)
>>> array([ 0.93371582,  0.82386466,  0.34771991,  0.59338646,  0.41190981])

# randint() works with integer limits
np.random.randint(1000, 5000, 10)
>>> array([4825, 1466, 4025, 2931, 1693, 2385, 2857, 1767, 2902, 1759])
```
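The ranges described above can be checked with a small sketch. It assumes the legacy `np.random` functions used throughout this notebook; the seed value 42, the limits 10 and 20, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only so the run is repeatable

floats = np.random.rand(1000)            # uniform floats, 0 included, 1 excluded
ints = np.random.randint(10, 20, 1000)   # integers from 10 up to 19

# rand() stays below 1.0, and randint() never returns the upper limit
print(floats.min() >= 0.0, floats.max() < 1.0)
print(ints.min() >= 10, ints.max() <= 19)
```

Drawing many values instead of five makes it easier to see where the limits lie.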


### Exercise

Generate an array of 5 floating point values using numpy's random functions. Each value in the array should be greater than 1. (Please set a seed value.)

In [1]:
import numpy as np

# Modify the code below
mixed_array = []

# hint

Use both the rand() and randint() functions and sum their results to create an array of random floating point values greater than 1. Make sure the lower limit passed to the randint() function is at least 1. Use np.random.seed(0) to fix the seed value.

In [2]:
# solution
np.random.seed(0)
floats = np.random.rand(5)
ints = np.random.randint(1,5,5)

mixed_array = ints + floats
print(mixed_array)

[2.5488135  3.71518937 1.60276338 4.54488318 3.4236548 ]


In [3]:
from refactored import unittest
np.random.seed(0)

ref_tmp_var = False
value_check = True

for i in mixed_array:
    if i<=1:
        value_check = False

ref_tmp_var = unittest.test_value(np.random.rand(5)) and value_check
assert ref_tmp_var

## 2. Re-shaping an array

The reshape() function in numpy reshapes a given array into an array with a specified new shape. The new shape must hold the same number of elements as the original array. For example,

<img src="../images/numpy_2-reshaping_array.png" width="500">

```python
shape_shifter = np.random.rand(12)
shape_shifter
>>> array([ 0.906423  ,  0.55807204,  0.28928162,  0.47020116,  0.27403332,
>>>         0.94178672,  0.81342077,  0.5859645 ,  0.63569185,  0.84614272,
>>>         0.36454835,  0.63664789])

shape_shifter.shape
>>> (12,)

shape_shifter.reshape(3,4)
>>> array([[ 0.906423  ,  0.55807204,  0.28928162,  0.47020116],
>>>        [ 0.27403332,  0.94178672,  0.81342077,  0.5859645 ],
>>>        [ 0.63569185,  0.84614272,  0.36454835,  0.63664789]])

shape_shifter.reshape(4,3)
>>> array([[ 0.906423  ,  0.55807204,  0.28928162],
>>>        [ 0.47020116,  0.27403332,  0.94178672],
>>>        [ 0.81342077,  0.5859645 ,  0.63569185],
>>>        [ 0.84614272,  0.36454835,  0.63664789]])
```
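As a minimal sketch of two details not shown above: reshape() can infer one dimension when it is given as -1, and it raises a ValueError when the requested shape does not match the number of elements. The array `values` and the flag `reshape_failed` are names chosen only for this illustration.

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer that dimension from the array size
inferred = values.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# 12 elements cannot fill a 5 x 3 grid, so reshape raises ValueError
try:
    values.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```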

### Exercise

Change the shape of the given array to 2 rows, 5 columns

In [4]:
# Modify the code below

twor_fivec = np.arange(10)

# hint

Use the **reshape()** function and refer to the usage above

In [5]:
twor_fivec = twor_fivec.reshape(2,5)
twor_fivec

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [6]:
ref_tmp_var = False

ref_tmp_var = unittest.test_value(np.arange(10).reshape(2,5))

assert ref_tmp_var

## 3. Indexing and Selection

The numpy array works like the list data structure, and elements can be accessed by using their respective indices. The first element of an array has index '0' and subsequent elements are indexed 1, 2, 3 and so on, so the nth element in the array has an index of 'n-1'.

<img src="../images/numpy_2-indexing_array.png" width="400">

``` python
n_arr = np.array([1, 7, 4, 3, 3])

n_arr[3:5]
# Selects elements from index '3' to '4' (i.e. up to, but not including, the specified end value)
>>> array([3, 3])

n_arr[:3]
# Absence of start value defaults to index '0' (i.e the first element)
>>> array([1, 7, 4])

n_arr[2:]
# Absence of end value means the slice runs through the last element (index 'n-1')
>>> array([4, 3, 3])

n_arr[:]
>>> array([1, 7, 4, 3, 3])

n_arr[-1]
# Negative indexing corresponds to counting from the last
>>> 3
```

Elements of a numpy array can also be selected (or conditionally retrieved) by using a condition in place of an index. When a condition is applied to an array (as shown below), each element is tested against the condition and a boolean array is generated, recording whether each element satisfies it. When such an 'array condition' is used in place of an index, the boolean array is passed to the outer array, and all elements that lie in the 'True' positions of the boolean array are retrieved. The examples below illustrate this concept.

```python
array_one = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_one > 10
>>> array([ True, False, False, False, True, False, False, True, False, True])

array_one[array_one>10]
>>> array([16, 17, 15, 14])

array_two = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array_two < 5
>>> array([ True,  True,  True,  True,  True, False, False, False, False, False])

array_two[array_two < 5]
>>> array([0, 1, 2, 3, 4])
```
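Conditions can also be combined. The sketch below reuses `array_one` from the example above; the thresholds and the names `between` and `positions` are arbitrary and only for illustration.

```python
import numpy as np

array_one = np.array([16, 1, 8, 1, 17, 10, 8, 15, 6, 14])

# Combine conditions element-wise with & (and) or | (or);
# each condition needs its own parentheses
between = array_one[(array_one > 5) & (array_one < 15)]
print(between)  # [ 8 10  8  6 14]

# np.where returns the index positions where the condition holds
positions = np.where(array_one > 10)[0]
print(positions)  # [0 4 7 9]
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.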

### Exercise

Retrieve all elements in the given array that are greater than or equal to 25.63

In [7]:
array_three = np.array([46.56311588, 49.66285409, 28.01145694, 15.4632352, 16.36194605, 23.26915095, 36.77562698, 41.97868793, 35.6520983, 24.85098496])

# hint

Refer to the examples above

In [8]:
array_three[array_three >= 25.63]

array([46.56311588, 49.66285409, 28.01145694, 36.77562698, 41.97868793,
       35.6520983 ])

In [9]:
ref_tmp_var = False

ref_tmp_var = unittest.test_value(array_three[array_three >= 25.63])

assert ref_tmp_var

## 4. Re-casting, Broadcasting and Duplicating arrays

Re-casting and broadcasting are two ways to change the values of an array. If one or more values (but not all) of an array are changed, it is called **re-casting**. If all values of the array are changed, it is called **broadcasting**. (In the NumPy documentation, broadcasting more generally describes how a scalar or smaller array is stretched to match the shape of a larger one; assigning a single value to every element is the simplest case.) The conditional selection shown earlier can also be used to conditionally re-cast certain elements of an array. Refer to the examples below:

<img src="../images/numpy_2-recasting_array.png" width="600">

* <b>Re-casting:</b>
```python
array_rec = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_rec[3:6] = 100
array_rec
>>> array([ 16,   1,   8, 100, 100, 100,   8,  15,   6,  14])
```

* <b>Broadcasting:</b>
```python
array_rec = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_rec[:] = 100
array_rec
>>> array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100])
```

* <b>Conditional re-casting:</b>
```python
array_rec = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_rec[array_rec>10] = 100
array_rec
>>> array([100,   1,   8,   1, 100,  10,   8, 100,   6, 100])
```

**Duplicating** a numpy array is a tricky thing. In ordinary programming, the value of one variable can be assigned to another variable, thus creating a copy. See the example below:
```python
a = 10
b = a
print(b)
>>> 10
```
However, when the same logic is used to assign arrays, the values are not copied. Both variable names refer to the same underlying array, so any change made through the second name is also reflected in the first.
```python
arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_2 = arr_1
arr_2[3:6] = 4444
print(arr_1,arr_2)
>>> [1, 2, 3, 4444, 4444, 4444, 7, 8, 9] [1, 2, 3, 4444, 4444, 4444, 7, 8, 9]

# or

arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_2 = arr_1
arr_2[:] = [1,22,333,4444,55555,666666,7777777,88888888,999999999]
print(arr_1,arr_2)
>>> [1, 22, 333, 4444, 55555, 666666, 7777777, 88888888, 999999999] [1, 22, 333, 4444, 55555, 666666, 7777777, 88888888,
>>>  999999999]
```

Hence, when a separate copy of an array is needed, the .copy() function must be used. It creates a new copy of the array which can be changed without affecting the original array.

```python
arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_2 = arr_1.copy()
arr_2[3:6] = 4444
print(arr_1,arr_2)
>>> [1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4444, 4444, 4444, 7, 8, 9]
```
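One way to check whether two names share data is sketched below, using the `is` operator and `np.shares_memory`. It also shows that a slice behaves like a reference rather than a copy. The names `alias`, `view` and `independent` are chosen only for this illustration.

```python
import numpy as np

arr_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
alias = arr_1               # second name for the same array object
view = arr_1[3:6]           # a slice is a view onto the same data
independent = arr_1.copy()  # separate data

print(alias is arr_1)                         # True
print(np.shares_memory(arr_1, view))          # True
print(np.shares_memory(arr_1, independent))   # False

view[:] = 0                 # writes through to arr_1
print(arr_1)                # [1 2 3 0 0 0 7 8 9]
print(independent)          # [1 2 3 4 5 6 7 8 9]
```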

### Exercise

Given the array 'tran_arr', create two copies of it: one that just references the original array ('tran_arr') and another that duplicates its values using the .copy() function. Set the 5th element of the first copy to 25, and the 5th element of the second copy to 50. Print all 3 arrays and observe the changes.

In [10]:
tran_arr = np.arange(1,21)
copy_1 = []
copy_2 = []

# hint

In [11]:
copy_1 = tran_arr
copy_2 = tran_arr.copy()
copy_1[4] = 25
copy_2[4] = 50
print(tran_arr,"\n",copy_1,"\n",copy_2)

[ 1  2  3  4 25  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20] 
 [ 1  2  3  4 25  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20] 
 [ 1  2  3  4 50  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]


In [12]:
ref_tmp_var = False

a1 = [1, 2, 3, 4, 25, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
a2 = [1, 2, 3, 4, 50, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

pass_val = False
if np.array_equal(a1,copy_1) and np.array_equal(a2,copy_2):
    pass_val = True

ref_tmp_var = pass_val

assert ref_tmp_var

## 5. Max, Min, ArgMax, ArgMin

Four simple functions that help a great deal when performing numerical computations on a large array of data are max(), min(), argmax() and argmin().

* max() - returns the maximum value in a given array
* min() - returns the minimum value in a given array
* argmax() - returns the index position of the maximum value in the given array
* argmin() - returns the index position of the minimum value in the given array

```python
shape_shifter
>>> array([ 0.906423  ,  0.55807204,  0.28928162,  0.47020116,  0.27403332,
>>>         0.94178672,  0.81342077,  0.5859645 ,  0.63569185,  0.84614272,
>>>         0.36454835,  0.63664789])

shape_shifter.max()
>>> 0.94178671566784411

shape_shifter.min()
>>> 0.27403331882439208

shape_shifter.argmax()
>>> 5

shape_shifter.argmin()
>>> 4
```
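Two behaviours worth knowing are sketched below: when the maximum occurs more than once, argmax() returns the first position, and on a 2-D array these functions accept an `axis` argument. The arrays `scores` and `grid` are made up for this illustration.

```python
import numpy as np

scores = np.array([3, 9, 2, 9])
print(scores.argmax())      # 1, the first of the two 9s

grid = np.array([[1, 8, 3],
                 [7, 2, 9]])
print(grid.max())           # 9
print(grid.argmax())        # 5, index into the flattened array
print(grid.max(axis=0))     # [7 8 9], maximum of each column
print(grid.argmax(axis=1))  # [1 2], position of the maximum in each row
```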

### Exercise

An array is created below. Use the max, min, argmax and argmin functions on the given array and print the results.

In [13]:
# Edit the code below

X = np.array([70, 81, 80, 55, 48, 17, 60, 80, 20, 46])
# max_X = 
# min_X = 
# argmax_X = 
# argmin_X = 

# hint

In [14]:
max_X = X.max()
min_X = X.min()
argmax_X = X.argmax()
argmin_X = X.argmin()
print("Max value is %d,\nMin value is %d,\nMax value index is %d,\nMin value index is %d"
      %(max_X,min_X,argmax_X,argmin_X))

Max value is 81,
Min value is 17,
Max value index is 1,
Min value index is 5


In [15]:
ref_tmp_var = False

ref_tmp_var = unittest.test_value(X.max()) and unittest.test_value(X.min()) and unittest.test_value(X.argmax()) and unittest.test_value(X.argmin())

assert ref_tmp_var

## 6. Numpy.Random module

The 'random' module in the numpy library is used for random sampling of data. The module contains functions for
1. random sampling from simple data
2. random sampling from well known data distributions
3. permutations and other operations like shuffling and seeding

### Sampling random numbers from simple data

#### rand() function

The rand() function randomly selects numbers from a 'uniform distribution' over the range [0, 1), i.e., zero inclusive and 1 exclusive. When we say the distribution is uniform, we mean that every number between 0 and 1 has an equal probability of being selected.

#### randn() function

The randn() function randomly selects numbers from a 'standard normal distribution' with a mean of 0 and a variance of 1, i.e., there is an equal chance of a positive or negative number being selected, with numbers closer to 0 more likely and the chances falling off as the value gets farther from the mean of 0.

#### randint() function

The randint() function randomly selects integers from a given range of numbers, including the lower limit and excluding the upper limit. Since only integers in that range are selected, and every integer has an equal chance of being selected, it is called a 'discrete uniform distribution'.


###  Exercise

Use rand() and randn() to generate arrays with 3 rows and 8 columns, and use randint() to generate an integer between 3 and 8. Store the results in the variables 'rand_a', 'randn_a' and 'randint_a'. Print the results.



In [None]:
import numpy as np

# Modify the code below

rand_a = 
print("Result from performing rand:")
randn_a = 
print("Result from performing randn:")
randint_a = 
print("Result from performing randint:")



# Hint

1. use random.rand function.
2. use random.randn. 
3. use random.randint.

In [29]:
# Solution
import numpy as np
rand_a = np.random.rand(3,8)
print("Result from performing rand:", rand_a)
randn_a = np.random.randn(3,8)
print("Result from performing randn:", randn_a)
randint_a = np.random.randint(3, 8)
print("Result from performing randint:", randint_a)


Result from performing rand: [[0.85794562 0.84725174 0.6235637  0.38438171 0.29753461 0.05671298
  0.27265629 0.47766512]
 [0.81216873 0.47997717 0.3927848  0.83607876 0.33739616 0.64817187
  0.36824154 0.95715516]
 [0.14035078 0.87008726 0.47360805 0.80091075 0.52047748 0.67887953
  0.72063265 0.58201979]]
Result from performing randn: [[ 1.59456053  0.23043417 -0.06491034 -0.96898025  0.59124281 -0.7827755
  -0.44423283 -0.34518616]
 [-0.88180055 -0.44265324 -0.5409163  -1.32322737 -0.11279892  0.90734594
   0.81526991  0.22909795]
 [-1.02617878  0.47752547  1.29269823 -0.73145824 -1.60540226  0.98947618
   0.11081461 -0.38093141]]
Result from performing randint: 7


In [None]:
# unit testing

### Permutations and other functions like shuffling and seeding


#### shuffle() function

The shuffle function rearranges the contents of an array, much like shuffling a deck of playing cards.
It shuffles the array in place and only along the first axis of a multi-dimensional array, i.e. the order of the rows changes, but the contents of each row stay the same.

In the program below, a 3*3 array is passed through the shuffle function.

```python

   import numpy as np
   arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])
   np.random.shuffle(arr)
   arr


OUTPUT:
    
   array([[6, 7, 8],
          [3, 4, 5],
          [0, 1, 2]])

```

The output is obtained by just reordering the rows of the matrix, i.e. shuffling along its first axis.
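The in-place behaviour can be verified with a short sketch. The seed value is arbitrary, and `rows_before`, `rows_after` and `result` are names chosen only for this illustration.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for a repeatable run
arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])
rows_before = {tuple(row) for row in arr}

result = np.random.shuffle(arr)  # shuffles arr itself

print(result)                     # None: nothing is returned
rows_after = {tuple(row) for row in arr}
print(rows_before == rows_after)  # True: same rows, possibly in a new order
```

Because shuffle() returns None, writing `arr = np.random.shuffle(arr)` would lose the array.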


#### permutation() function

A permutation is a rearrangement of the elements of a sequence into a particular order. The permutation function randomly rearranges a sequence and returns the permuted result.

Permutations matter whenever order is important. For example, suppose we want to pick the first three ranks from a class of 100 students. Once the first-rank student (A) is found, A is left out and the second rank is chosen from the remaining 99 students; once the second rank is found, both students are left out and the third rank is chosen from the remaining 98. So in a permutation, order plays a major role.

Similar to the shuffle function, for a multi-dimensional array the permutation is done along its first axis.

Permutation and shuffle both rearrange the elements in random order. The difference is that permutation returns a rearranged copy and leaves the original array unchanged, whereas shuffle rearranges the original array in place. This is useful, for example, when the order of a data set has to be randomised before a classification task without modifying the original data.

In the program below, an array is assigned to the variable 'arr' and the permutation function is applied to it.

```python

    import numpy as np
    arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])
    np.random.permutation(arr)
OUTPUT:
    array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
```

The output above is a randomly reordered copy of the rows of 'arr'; 'arr' itself is unchanged.

Though the results of the shuffle and permutation operations may look similar, the two work differently: shuffle modifies the array in place, while permutation returns a new array.
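A minimal sketch of the permutation behaviour, assuming the same 3*3 array as above. The seed is arbitrary and the names `original`, `permuted` and `order` are illustrative only.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for a repeatable run
arr = np.array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])
original = arr.copy()

permuted = np.random.permutation(arr)   # returns a new, row-shuffled array

print(np.array_equal(arr, original))    # True: arr is untouched
print(np.shares_memory(arr, permuted))  # False: the result is a separate array

# Given an integer n, permutation() shuffles the numbers 0 .. n-1
order = np.random.permutation(5)
print(sorted(order.tolist()))           # [0, 1, 2, 3, 4]
```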

#### seed() function

Each time a function from np.random is called, it generates new random integers or floating point values. During testing, or in a large program, this can cause confusion because the results differ from run to run. To avoid that, the seed function can be used.

When seed() is called with the same value, the random number generator restarts from the same state, so the same sequence of random numbers is produced each time. See the code below:

```python

    import numpy as np
    np.random.seed(0)
    np.random.rand(5)

OUTPUT:

    array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
    
    # After re-seeding with the same value, rand() generates the same set of values as before.

    import numpy as np
    np.random.seed(0)
    np.random.rand(5)

OUTPUT:

    array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])


    # Without a seed, rand() generates an arbitrary set of random values.

    import numpy as np
    np.random.rand(5)

OUTPUT:

    array([0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606])

    # When rand() is called again, it generates a different set of values.

    import numpy as np
    np.random.rand(5)

OUTPUT:
    
    array([0.0871293 , 0.0202184 , 0.83261985, 0.77815675, 0.87001215])


```
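The reproducibility described above can be checked directly, as in the sketch below. The second half uses `np.random.default_rng`, the Generator interface available in NumPy 1.17 and later; it is not used elsewhere in this notebook and is shown only as an alternative. The seed 123 and all variable names are arbitrary.

```python
import numpy as np

# Legacy interface: re-seeding restarts the same sequence
np.random.seed(0)
first = np.random.rand(5)
np.random.seed(0)
second = np.random.rand(5)
same_legacy = np.array_equal(first, second)
print(same_legacy)  # True

# Generator interface: each generator carries its own seeded state
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
same_generator = np.array_equal(rng_a.random(5), rng_b.random(5))
print(same_generator)  # True
```

A Generator object keeps its state to itself, so seeding one part of a program does not affect random numbers drawn elsewhere.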

### Exercise

Solve the following questions:

1. Create a list [1,2,3,4,5] and store it in a variable 'a'. Shuffle 'a' and print the result

2. Permute 'a' and print the result

3. Generate a random number from 1 to 10; the generated number should be the same every time the code is executed.

In [74]:
import numpy as np

# Modify the code below

a= 

print ('A after shuffling is:')

b = 
print('A after permutation is:')

c = 
print('C with seed:')



# Hint

1. Use the shuffle function on 'a' and print the result.
2. Use the permutation function, store the result in a variable 'b' and print it as in step 1.
3. Use seed and randint to generate one integer.


In [27]:
# Solution
import numpy as np
a = [1,2,3,4,5]
np.random.shuffle(a)
print ('A after shuffling is: ' , a )
b = np.random.permutation(a)
print('A after permutation is:', b)
np.random.seed(0)
c = np.random.randint(1, 10)
print('C with seed:', c)




A after shuffling is:  [5, 3, 2, 4, 1]
A after permutation is: [3 4 1 5 2]
C with seed: 6


In [None]:
#unit testing


### Sampling random numbers from well known data distributions

#### Binomial function

A binomial distribution gives the probability of the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes: success or failure.
For example, when a new drug is tested on a patient, the result is either a success or a failure of the cure.

The np.random.binomial function draws the number of samples specified by the user from the binomial distribution. Its arguments are the number of trials, the probability of success in each trial, and the number of samples to draw.

 ```python   

    np.random.binomial(1,0.3 , 3)
 
 OUTPUT: 
 
     array([0, 0, 1])

```
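As a rough check of what the arguments mean, the sketch below draws many samples with 10 trials each and a success probability of 0.3. The average number of successes should then be close to 10 * 0.3 = 3. The seed, the parameter values and the name `successes` are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable run
n, p = 10, 0.3     # 10 trials per sample, 30% chance of success per trial

# Each sample is the number of successes out of n trials
successes = np.random.binomial(n, p, 100000)

print(successes.min(), successes.max())  # always between 0 and n
print(successes.mean())                  # expected to be close to n * p = 3
```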

#### Chisquare function

The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of 1. Its curve is centred at zero, and the standard deviation measures how far values deviate from the mean.
The chi-square distribution with df degrees of freedom is the distribution of the sum of the squares of df independent standard normal variables.

The np.random.chisquare function draws the number of samples specified by the user from a chi-square distribution. Its signature is `numpy.random.chisquare(df, size=None)`.

```python
 
    import numpy as np
    np.random.chisquare(3,2)

OUTPUT:
    
    array([0.33570358, 3.53852457])
    
```
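The definition above can be illustrated by building chi-square samples by hand from squared standard normal values and comparing them with np.random.chisquare. This is only a sketch: the seed, the sample count and the names `direct` and `manual` are arbitrary, and the two sample means are expected to be close to df rather than identical.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable run
df = 3
n_samples = 100000

# Samples drawn directly from the chi-square distribution
direct = np.random.chisquare(df, n_samples)

# Samples built by hand: sum of df squared standard normal values
normals = np.random.randn(n_samples, df)
manual = (normals ** 2).sum(axis=1)

print(direct.mean(), manual.mean())  # both expected to be close to df
```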

#### Exponential function


The exponential distribution is a continuous distribution. A continuous distribution takes values from an infinite, uncountable set, such as an interval of real numbers. The exponential distribution is one of the most commonly used continuous distributions.
It is used to model the time interval between events, i.e. the waiting time from one event to the next.

The np.random.exponential function draws the number of samples specified by the user from an exponential distribution.


```python

     np.random.exponential(1, 2)
     
 OUTPUT:
     
     array([0.18400851, 0.34144262])


```

#### Geometric function

A binomial trial has one of two outcomes, either success or failure. A geometric distribution describes the number of trials needed until the first success occurs. So this distribution is supported on the positive integers.

The np.random.geometric function draws the number of samples specified by the user from a geometric distribution.

```python
    import numpy as np
    # p is the probability of success in a single trial; size is the number of samples to draw
    geo = np.random.geometric(p=0.5, size=10)
    geo

OUTPUT:
    
    array([1, 1, 2, 9, 2, 2, 5, 1, 2, 1])
    
```

#### Poisson function

The Poisson distribution describes the number of events occurring in a fixed interval of time.
The np.random.poisson function draws samples from a Poisson distribution.

```python

    import numpy as np
    # 3 is the expected number of events per interval (lam) and 8 is the number of samples that we need
    pois = np.random.poisson(3, 8)
    pois

OUTPUT
    
    array([ 1,  1, 11,  1,  3,  3,  3,  3])
    

```
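A property of the Poisson distribution is that its mean and its variance both equal the expected number of events. The sketch below checks this approximately on a large sample; the seed, the sample count and the name `counts` are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable run
lam = 3            # expected number of events per interval

counts = np.random.poisson(lam, 100000)

print(counts.min())   # counts are never negative
print(counts.mean())  # expected to be close to lam
print(counts.var())   # also expected to be close to lam
```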


#### Uniform function

The uniform distribution is a continuous distribution whose probability density is constant over its range, so every value between the lower and upper limits is equally likely to occur.

The np.random.uniform function draws the number of samples specified by the user from a uniform distribution.


```python
    uni_dist = np.random.uniform(1,3,10)
    uni_dist

OUTPUT:
    array([1.98277548, 1.47078083, 2.97779683, 2.13713806, 1.8976292 ,
       2.63856215, 1.04455004, 1.15975864, 2.80673453, 1.45120926])
    
```    

#### F function 

The F distribution is the distribution of the ratio of two chi-square variables, each divided by its degrees of freedom, which arises as the ratio of two sample variances. It is used to check whether two samples have the same variance and, in the analysis of variance, whether the means of several groups differ notably. The values of F are only positive. This distribution is helpful in judging whether one experiment differs from the others by the variation they show.

The np.random.f function draws the number of samples specified by the user from an F distribution.


```python

    import numpy as np
    # 1 is the degrees of freedom in the numerator and 4 is the degrees of freedom in the denominator
    s = np.random.f(1, 4, 10)
    s

OUTPUT:

    array([11.8876891 ,  0.91116938,  1.27070929,  1.46581719,  4.11247805,
       28.53869887,  1.99272372,  0.18135954,  0.066219  ,  1.06589472])

```


Refer to the following for a better understanding of degrees of freedom.

http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-degrees-of-freedom-in-statistics

#### Normal function

The normal distribution is a continuous probability distribution, also called the bell curve, whose mean, median and mode are equal. The curve is symmetric and the area under the curve is 1. This distribution helps in finding the percentage of data that falls within a given range under the curve.

The np.random.normal function draws the number of samples specified by the user from a normal distribution.

```python

    # 2 is the mean, 0.2 is the standard deviation and 3 is the size of the sample we want
    s = np.random.normal(2, 0.2, 3)
    
 OUTPUT:
 
    array([2.23873731, 1.99493943, 1.98627681])

 ```   
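To see how the mean and standard deviation shape the samples, the sketch below draws a large sample and measures the share of values that fall within one standard deviation of the mean, which for a normal distribution is about 68%. The seed, the sample count and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable run
mean, std = 2, 0.2

samples = np.random.normal(mean, std, 100000)

# Fraction of samples lying within one standard deviation of the mean
within_one_std = np.mean(np.abs(samples - mean) < std)

print(samples.mean())   # expected to be close to 2
print(samples.std())    # expected to be close to 0.2
print(within_one_std)   # expected to be about 0.68
```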


### Exercise

Identify the type of distribution that needs to be used and write the code:

For lambda values of 20, 30 and 40, draw 50 values each and store them in a variable 'dist'.

In [76]:
import numpy as np

# Modify the code below
dist = 



# Hint
Use the Poisson distribution.
Group lambda as lam=(20., 30., 40.) and size as size=(50, 3).

In [None]:
# Solution

dist = np.random.poisson(lam=(20., 30., 40.), size=(50, 3))

In [None]:
#unit testing