## <center><b>Python for Data Science</b></center>
## <center><b>Lesson 26</b></center>
## <center><b>NumPy -- Part Five</b></center>
## <center><b>Math & Statistics Topics (Notes)</b></center>

<hr style="border:1px solid gray">

![image.png](attachment:image.png)

### <center>[**Link: NumPy Documentation**](https://numpy.org/)</center>

##  <span style="color:green">TABLE OF CONTENTS</span>

1. [**Linear Algebra**](#1)<br> 
a. [**Element-wise operations**](#1a)<br> 
b. [**Operations involving multiple arrays**](#1b)<br> 
c. [**Assignment operators and arrays**](#1c)<br> 
&emsp;<br>
2. [**Trig Functions of Arrays**](#2)<br>
&emsp;<br>
3. [**More Linear Algebra Concepts**](#3)<br>
a. [**Matrix multiplication (row-by-column method)**](#3a)<br>
b. [**Determinants**](#3b)<br>
c. [**Matrix inverses**](#3c)<br>
&emsp;<br>
4. [**Statistics**](#4)<br>
a. [**Maximums and Minimums**](#4a)<br>
b. [**Sums**](#4b)<br>
c. [**Means**](#4c)<br>
d. [**Medians**](#4d)<br>
&emsp;<br>
5. [**NumPy Random Choice**](#5)<br>
&emsp;<br>
6. [**NumPy Random Normal**](#6)<br>

In [1]:
# set up the notebook to display multiple outputs in one cell

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

print('The notebook is set up to display multiple outputs in one cell.')

The notebook is set up to display multiple outputs in one cell.


In [2]:
# import NumPy

import numpy as np

<hr style="border:1px solid gray">

<a class="anchor" id="1"></a>
<div class="alert alert-block alert-info">
<b><font size="4">1. Linear Algebra</font></b>
</div>

<a class="anchor" id="1a"></a>
### <span style="color:green"><b>a. Element-wise operations</b></span>

In [4]:
# create/initialize an array

a = np.array([1,2,3,4,5])
print(a)

[1 2 3 4 5]


In [7]:
# element-wise addition
a + 10

array([11, 12, 13, 14, 15])


In [6]:
# element-wise subtraction
a - 3

array([-2, -1,  0,  1,  2])

In [8]:
# element-wise multiplication

a * 5

array([ 5, 10, 15, 20, 25])

In [9]:
# element-wise division

a / 2

array([0.5, 1. , 1.5, 2. , 2.5])

In [13]:
# element-wise powers

a ** 3

array([  1,   8,  27,  64, 125], dtype=int32)

<a class="anchor" id="1b"></a>
### <span style="color:green"><b>b. Operations involving multiple arrays</b></span>

In [14]:
# operations involving multiple arrays

a = np.array([1,2,3,4,5])
print(a)

b = np.array([10, 20, 30, 40, 50])
print(b)

a + b
b - a
a * b
b / a
b ** a
(b + a) * (2 ** a)

[1 2 3 4 5]
[10 20 30 40 50]


array([11, 22, 33, 44, 55])

array([ 9, 18, 27, 36, 45])

array([ 10,  40,  90, 160, 250])

array([10., 10., 10., 10., 10.])

array([       10,       400,     27000,   2560000, 312500000], dtype=int32)

array([  22,   88,  264,  704, 1760])
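
As a rough sketch of what "element-wise" means for two arrays: each output entry comes from the pair of entries at the same position, so the shapes have to be compatible. The names `u`, `v`, and `short` are illustrative and not part of the lesson's cells.

```python
import numpy as np

u = np.array([1, 2, 3, 4, 5])
v = np.array([10, 20, 30, 40, 50])

# element-wise product: same as multiplying position by position
by_numpy = u * v
by_loop = np.array([x * y for x, y in zip(u, v)])
print(np.array_equal(by_numpy, by_loop))

# arrays whose shapes cannot be matched up raise a ValueError
short = np.array([1, 2])
try:
    u + short
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```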

<a class="anchor" id="1c"></a>
### <span style="color:green"><b>c. Assignment operators and arrays</b></span>

In [8]:
# Use same assignment operators as in Python

a = np.array([1,2,3,4,5])
print(a)

print()

a+=50
print(a)

print()

a*=2
print(a)

[1 2 3 4 5]

[51 52 53 54 55]

[102 104 106 108 110]
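
One detail worth a quick sketch: an in-place operator writes the result back into the existing array, so the result has to fit the array's dtype. The example below assumes a recent NumPy version, where in-place true division on an integer array is rejected with a TypeError; the variable names are illustrative.

```python
import numpy as np

counts = np.array([1, 2, 3, 4, 5])   # integer dtype
counts += 50                          # stays integer
print(counts.dtype.kind)              # 'i' means signed integer

# in-place true division would need a float result, which does not fit
try:
    counts /= 2
    inplace_failed = False
except TypeError:
    inplace_failed = True
print(inplace_failed)

# a plain (not in-place) division creates a new float array instead
halves = counts / 2
print(halves.dtype.kind)              # 'f' means floating point
```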


<a class="anchor" id="2"></a>
<div class="alert alert-block alert-info">
<b><font size="4">2. Trig Functions of Arrays</font></b>
</div>

In [11]:
# take the sine / take the cosine

a = np.array([1,2,3,4,5])
print(a)              # radian measures

print()

x = np.sin(a)
print(x)

print()

y = np.cos(a)
print(y)

[1 2 3 4 5]

[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]

[ 0.54030231 -0.41614684 -0.9899925  -0.65364362  0.28366219]
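
Because np.sin and np.cos expect radians, angles given in degrees need converting first. A minimal sketch using np.deg2rad (the `degrees` values are made up for illustration):

```python
import numpy as np

degrees = np.array([0, 30, 90, 180])
radians = np.deg2rad(degrees)     # convert degrees to radians first
sines = np.sin(radians)

# rounding hides the tiny floating-point error in sin(pi)
print(np.round(sines, 6))
```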


<b>For a lot more ...</b>

[NumPy Mathematical Functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html)

<a class="anchor" id="3"></a>
<div class="alert alert-block alert-info">
<b><font size="4">3. More Linear Algebra Concepts</font></b>
</div>

<a class="anchor" id="3a"></a>
### <span style="color:green"><b>a. Matrix multiplication (row-by-column method)</b></span>

In [12]:
# matrix multiplication ... use np.matmul()
# recall: for matrix multiplication to be defined -- # of columns in 1st matrix must equal # of rows in 2nd matrix
# recall: use the "row by column" method

a = np.ones((2,3))
print(a)

print()

b = np.full((3,2), 2)
print(b)

print()

x = np.matmul(a,b)
print(x)

[[1. 1. 1.]
 [1. 1. 1.]]

[[2 2]
 [2 2]
 [2 2]]

[[6. 6.]
 [6. 6.]]


In [13]:
# another matrix multiplication example

c = np.array([[4, 7, 2], [5, 1, 8]])
d = np.array([[1, 4, 8, 9], [6, 6, 2, 5], [7, 3, 2, 8]])

print(c)
print()

print(d)
print()

c.shape
d.shape

print()

x = np.matmul(c, d)
print(x)

[[4 7 2]
 [5 1 8]]

[[1 4 8 9]
 [6 6 2 5]
 [7 3 2 8]]



(2, 3)

(3, 4)


[[ 60  64  50  87]
 [ 67  50  58 114]]
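
To make the "row by column" rule concrete, here is a small sketch that rebuilds one entry of the product by hand, reusing the matrices c and d from the cell above. It also uses the @ operator, which is shorthand for matrix multiplication on NumPy arrays.

```python
import numpy as np

c = np.array([[4, 7, 2], [5, 1, 8]])
d = np.array([[1, 4, 8, 9], [6, 6, 2, 5], [7, 3, 2, 8]])

# entry in row 0, column 0: row 0 of c times column 0 of d, summed
top_left = np.sum(c[0, :] * d[:, 0])   # 4*1 + 7*6 + 2*7
print(top_left)

# the @ operator gives the same result as np.matmul
product = c @ d
print(np.array_equal(product, np.matmul(c, d)))
print(product.shape)                    # (rows of c, columns of d)
```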


<a class="anchor" id="3b"></a>
### <span style="color:green"><b>b. Determinants</b></span>

In [32]:
# determinants ... a special number associated with a square matrix

x = np.array([[2, 4, 1], [8, 3, -1], [0, 3, 2]])
print(x)

np.linalg.det(x)

[[ 2  4  1]
 [ 8  3 -1]
 [ 0  3  2]]


-21.999999999999986

In [14]:
# second example for determinants

y = np.identity(4)
print(y)

np.linalg.det(y)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


1.0
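
The first determinant above prints as -21.999999999999986 rather than -22 because of floating-point rounding. A hedged sketch of how one might compare determinants with a tolerance (the `singular` matrix is an illustrative example, not from the lesson):

```python
import numpy as np

x = np.array([[2, 4, 1], [8, 3, -1], [0, 3, 2]])
det_x = np.linalg.det(x)

# compare with a tolerance rather than with ==
print(np.isclose(det_x, -22))

# a matrix with proportional rows is singular: determinant (near) zero
singular = np.array([[1, 2], [2, 4]])
det_singular = np.linalg.det(singular)
print(np.isclose(det_singular, 0))
```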

<a class="anchor" id="3c"></a>
### <span style="color:green"><b>c. Matrix inverses</b></span>

In [34]:
# Inverse of a matrix

a = np.array([[2, 4], [6, 2]])
print(a)

print()

a_inv = np.linalg.inv(a)
print(a_inv)

print()

np.matmul(a, a_inv)

print()

np.matmul(a_inv, a)

[[2 4]
 [6 2]]

[[-0.1  0.2]
 [ 0.3 -0.1]]



array([[1., 0.],
       [0., 1.]])




array([[1.00000000e+00, 0.00000000e+00],
       [2.77555756e-17, 1.00000000e+00]])
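
The 2.77555756e-17 entry above is rounding error, not a real off-diagonal value. One way to check an inverse, sketched under the assumption that a tolerance-based comparison is acceptable, is np.allclose against the identity matrix. The singular example at the end is illustrative.

```python
import numpy as np

a = np.array([[2, 4], [6, 2]])
a_inv = np.linalg.inv(a)
identity = np.eye(2)

# both products match the identity up to floating-point rounding
print(np.allclose(a @ a_inv, identity))
print(np.allclose(a_inv @ a, identity))

# a singular matrix (determinant 0) has no inverse
try:
    np.linalg.inv(np.array([[1, 2], [2, 4]]))
    inverse_failed = False
except np.linalg.LinAlgError:
    inverse_failed = True
print(inverse_failed)
```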

## Linear Algebra reference docs ...

[Linear Algebra](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
 
- Determinant
- Trace
- Singular Value Decomposition
- Eigenvalues
- Matrix Norm
- Inverse
- Etc...

<a class="anchor" id="4"></a>
<div class="alert alert-block alert-info">
<b><font size="4">4. Statistics</font></b>
</div>

In [15]:
stats = np.array([[1,2,3],[4,5,6]])

print(stats)

[[1 2 3]
 [4 5 6]]


<a class="anchor" id="4a"></a>
### <span style="color:green"><b>a. Maximums and Minimums</b></span>

In [16]:
np.min(stats)

1

In [17]:
np.max(stats)

6

In [41]:
np.min(stats, axis = 1)

np.max(stats, axis = 0)

array([1, 4])

array([4, 5, 6])
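
The axis argument can be read as "the dimension that gets collapsed". A short sketch with the same stats array (the names `col_max` and `row_min` are illustrative):

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

# axis=0 collapses the rows: one result per column
col_max = np.max(stats, axis=0)
print(col_max, col_max.shape)

# axis=1 collapses the columns: one result per row
row_min = np.min(stats, axis=1)
print(row_min, row_min.shape)
```

The same reading applies to np.sum, np.mean, and np.median.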

<a class="anchor" id="4b"></a>
### <span style="color:green"><b>b. Sums</b></span>

In [18]:
print(stats)

np.sum(stats)

[[1 2 3]
 [4 5 6]]


21

In [19]:
print(stats)

np.sum(stats, axis = 0)

[[1 2 3]
 [4 5 6]]


array([5, 7, 9])

In [20]:
print(stats)

np.sum(stats, axis = 1)

[[1 2 3]
 [4 5 6]]


array([ 6, 15])

<a class="anchor" id="4c"></a>
### <span style="color:green"><b>c. Means</b></span>

In [21]:
print(stats)

np.mean(stats)

[[1 2 3]
 [4 5 6]]


3.5

In [46]:
print(stats)

np.mean(stats, axis = 0)

[[1 2 3]
 [4 5 6]]


array([2.5, 3.5, 4.5])

In [47]:
print(stats)

np.mean(stats, axis = 1)

[[1 2 3]
 [4 5 6]]


array([2., 5.])

<a class="anchor" id="4d"></a>
### <span style="color:green"><b>d. Medians</b></span>

In [48]:
print(stats)

np.median(stats)

[[1 2 3]
 [4 5 6]]


3.5

In [22]:
print(stats)

np.median(stats, axis = 0)

[[1 2 3]
 [4 5 6]]


array([2.5, 3.5, 4.5])

In [50]:
print(stats)

np.median(stats, axis = 1)

[[1 2 3]
 [4 5 6]]


array([2., 5.])

<a class="anchor" id="5"></a>
<div class="alert alert-block alert-info">
<b><font size="4">5. NumPy Random Choice</font></b>
</div>

<b>NUMPY RANDOM CHOICE HELPS YOU CREATE RANDOM SAMPLES</b>

One common task in data analysis, statistics, and related fields is taking random samples of data. You’ll see random samples in probability, Bayesian statistics, machine learning, and other subjects.

If you’re working in Python and doing any sort of data work, chances are you’ll have to create a random sample at some point. NumPy random choice provides a way of creating those samples within NumPy.

<hr style="border:1px solid gray">

You pass np.random.choice a collection of items, and the function randomly chooses one or more of them as the output.

<b> Syntax: numpy.random.choice(a, size=None, replace=True, p=None)</b>

Generates a random sample from a given 1-D array

![randomchoice.PNG](attachment:randomchoice.PNG)

<b>THE PARAMETERS OF NUMPY RANDOM CHOICE</b>

There are four parameters for the NumPy random choice function:

1. a
2. size
3. replace
4. p

Let’s discuss each of these individually.

<b>a</b> (REQUIRED)

The a parameter specifies the array of input values, typically a NumPy array.

This is essentially the set of input elements from which we will generate the random sample.

Note that the a parameter is required: you need to provide some array-like structure that contains the inputs to the random selection process.

Also note that the a parameter is flexible in terms of the inputs that it will accept. Typically, we’ll supply a NumPy array of numbers, but it will also accept Python lists, tuples, and other Python sequences.

Moreover, instead of supplying a sequence like a NumPy array, you can also just provide a number (i.e., an integer). If you provide an integer n, the function samples from the integers 0 up to but excluding n, as if you had supplied the NumPy array np.arange(n).
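
A quick way to see the integer shorthand in action is to draw with the same seed both ways. This is a sketch that assumes the legacy np.random interface used throughout this lesson; the names `from_int` and `from_array` are illustrative.

```python
import numpy as np

np.random.seed(0)
from_int = np.random.choice(5, size=10)

np.random.seed(0)
from_array = np.random.choice(np.arange(5), size=10)

# same seed, same population: the two samples should match
print(np.array_equal(from_int, from_array))

# every drawn value lies in 0..4, never 5
print(from_int.min() >= 0, from_int.max() <= 4)
```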

<b>SIZE</b>

The size parameter describes the size and shape of the output.

Remember that the NumPy random choice function accepts an input of elements and chooses randomly from those elements.

If you leave size at its default of None, the function returns a single value. If you provide an integer, it returns a 1-dimensional NumPy array with that many values, and if you provide a tuple such as (2, 3), it returns a NumPy array with that shape.
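
A small sketch of how the size argument shapes the result (the variable names are illustrative):

```python
import numpy as np

np.random.seed(0)

single = np.random.choice(10)                 # size=None: one value, not an array
flat = np.random.choice(10, size=4)           # 1-D array with 4 values
grid = np.random.choice(10, size=(2, 3))      # 2-D array, 2 rows by 3 columns

print(np.ndim(single), flat.shape, grid.shape)
```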


<b>REPLACE</b>

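The replace parameter specifies whether or not you want to sample with replacement. This parameter will take on a value of True or False, with the default value being True. With replace = True, the same item can be selected more than once; with replace = False, each item can be selected at most once, so the sample cannot be larger than the input.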
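
To illustrate sampling without replacement, here is a minimal sketch that draws all six faces of a die exactly once and then shows what happens when too many values are requested. The name `faces` is illustrative.

```python
import numpy as np

np.random.seed(0)
faces = np.arange(1, 7)

# without replacement, no value can appear twice
no_repeat = np.random.choice(faces, size=6, replace=False)
print(sorted(no_repeat.tolist()))

# asking for more values than exist is an error without replacement
try:
    np.random.choice(faces, size=7, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True
print(too_many_failed)
```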


<b>P</b>

Finally, the p parameter controls the probability of selecting a given item.

By default, each item in the input array has an equal probability of being selected.

But we can change that by manually specifying the probabilities of the different outcomes.

Note that the p parameter is optional, and if we don’t provide anything, NumPy treats each outcome as equally likely.

If we do provide something to the p parameter, it must be an “array like” object, such as a NumPy array, list, or tuple, with one probability per input element, and the probabilities must sum to 1.
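
One way to sanity-check a set of weights, sketched here with an illustrative loaded die, is to draw a large sample and compare the observed share of an outcome with its probability. The sample size of 100,000 is an arbitrary choice for the illustration.

```python
import numpy as np

np.random.seed(0)
faces = np.arange(1, 7)
weights = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]    # one weight per face, summing to 1

rolls = np.random.choice(faces, size=100_000, p=weights)
share_of_ones = np.mean(rolls == 1)          # should land near 0.5
print(round(share_of_ones, 2))

# probabilities that do not sum to 1 are rejected
try:
    np.random.choice(faces, p=[0.5, 0.5, 0.5, 0.1, 0.1, 0.1])
    bad_p_failed = False
except ValueError:
    bad_p_failed = True
print(bad_p_failed)
```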

In [23]:
# SELECT A SINGLE RANDOM NUMBER WITH NP.RANDOM.CHOICE

class_sample = np.arange(start = 1, stop = 28)
print(class_sample)

np.random.seed(0)
np.random.choice(a = class_sample)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27]


13

<b>WHY DO WE NEED TO USE NP.RANDOM.SEED?</b>


Before we ran the line of code np.random.choice(a = class_sample), we ran the code np.random.seed(0).

We use np.random.seed because it “seeds” the random number generator that numpy.random.choice draws from.

But what is a “seed” anyway?

The NumPy random choice function operates on the principle of pseudorandom number generation.

When we use a pseudorandom number generator, the numbers in the output approximate random numbers, but are not exactly “random.” In fact, the output is deterministic: it is determined by an initializing value called a “seed.”

Let me say that again: when we set a seed for a pseudorandom number generator, the output is completely determined by the seed.

What that means is that if we use the same seed, a pseudorandom number generator will produce the same output.

<b>NP.RANDOM.CHOICE IS ONLY PSEUDO RANDOM</b>

What this means is that np.random.choice is random-ish. It’s sort of random, in the sense that there will be no discernible relationship between the seed and the output. But you have to remember that using the same seed will produce the same output.

This is actually good, because it makes the results of a pseudorandom function reproducible. If I share my code with you, and you run it with the same seed, you will get the exact same result. This is good for code testing, among other things.

To read more about this concept, check out [NumPy Random Seed](https://www.sharpsightlabs.com/blog/numpy-random-seed/).
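
The reproducibility claim is easy to check directly. In this sketch, reseeding with the same value repeats the sample, while continuing without reseeding moves on to new values (the names `first`, `second`, and `third` are illustrative):

```python
import numpy as np

np.random.seed(0)
first = np.random.choice(100, size=5)

np.random.seed(0)                      # reset to the same seed
second = np.random.choice(100, size=5)

third = np.random.choice(100, size=5)  # no reseed: the generator has moved on

print(np.array_equal(first, second))
print(first, third)
```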

In [25]:
# SELECT A SINGLE RANDOM NUMBER WITH NP.RANDOM.CHOICE ... (SHORTHAND SYNTAX)

# np.random.seed(0)
np.random.choice(28)

# When we provide a number to np.random.choice this way, it samples from np.arange(28), i.e. the integers 0 through 27.

21

In [29]:
# SELECT A RANDOM SAMPLE FROM A NUMPY ARRAY

# CREATE A NUMPY ARRAY
array_0_to_99 = np.arange(100)
print(array_0_to_99)

# SELECT A RANDOM SAMPLE FROM THE NUMPY ARRAY
np.random.seed(1)
np.random.choice(array_0_to_99, size = 6)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]


array([37, 12, 72,  9, 75,  5])

In [30]:
# PERFORM RANDOM SAMPLING WITH REPLACEMENT

# CREATE A NUMPY ARRAY
array_1_to_6 = np.arange(start = 1, stop = 7)
print(array_1_to_6)

# CREATE A RANDOM SAMPLE FROM THE INPUT
np.random.seed(77)
np.random.choice(a = array_1_to_6, size = 3, replace = True)

[1 2 3 4 5 6]


array([5, 5, 4])

In [31]:
# CHANGE THE PROBABILITIES ASSOCIATED WITH OUTCOMES
array_1_to_6 = np.arange(start = 1, stop = 7)
print(array_1_to_6)


# weight the probabilities and generate a single value
np.random.choice(a = array_1_to_6, p = [.5,.1,.1,.1,.1,.1])


# GENERATE A RANDOM SAMPLE FROM WEIGHTED INPUTS ... weight the probabilities and generate multiple values
np.random.seed(42)
np.random.choice(a = array_1_to_6, p = [.5,.1,.1,.1,.1,.1], size = 20)

[1 2 3 4 5 6]


1

array([1, 6, 4, 2, 1, 1, 1, 5, 3, 4, 1, 6, 5, 1, 1, 1, 1, 2, 1, 1])

In [33]:
# SELECT A RANDOM SAMPLE FROM LIST OF ITEMS

simple_cards = ['Diamond','Spade','Heart','Club']

# RANDOMLY SELECT ONE ITEM FROM THE PYTHON LIST
np.random.seed(0)
np.random.choice(simple_cards)

np.random.seed(55)
np.random.choice(a = simple_cards, size = 2, replace = False)

'Diamond'

array(['Diamond', 'Club'], dtype='<U7')

<b>RANDOM SAMPLING IS REALLY IMPORTANT FOR DATA SCIENCE</b>

Random sampling is a key concept and technique in probability, and it’s also very important in statistics. Moreover, sampling is applicable to machine learning and deep learning.

In short, random sampling matters across a variety of sub-disciplines of data science.

<a class="anchor" id="6"></a>
<div class="alert alert-block alert-info">
<b><font size="4">6. NumPy Random Normal</font></b>
</div>

<b>Syntax: numpy.random.normal(loc=0.0, scale=1.0, size=None)</b>

If you’re doing any sort of statistics or data science in Python, you’ll often need to work with random numbers. And in particular, you’ll often need to work with normally distributed numbers.

The NumPy random normal function generates a sample of numbers drawn from the normal distribution, otherwise called the Gaussian distribution.
    
The NumPy random normal function enables you to create a NumPy array that contains normally distributed data.

![syntax%20random%20normal.PNG](attachment:syntax%20random%20normal.PNG)

<b>THE PARAMETERS OF THE NP.RANDOM.NORMAL FUNCTION</b>
    
The np.random.normal function has three primary parameters that control the output: 
    
1. loc
2. scale
3. size

<b>LOC</b>

The loc parameter controls the mean of the distribution.
    
![random%20normal%20mean.PNG](attachment:random%20normal%20mean.PNG)

This parameter defaults to 0, so if you don’t use this parameter to specify the mean of the distribution, the mean will be at 0.

<b>SCALE</b>

The scale parameter controls the standard deviation of the normal distribution.

![random%20normal%20sd.PNG](attachment:random%20normal%20sd.PNG)

By default, the scale parameter is set to 1.

<b>SIZE</b>

The size parameter controls the size and shape of the output.

Remember that the output will be a NumPy array. NumPy arrays can be 1-dimensional, 2-dimensional, or multi-dimensional (i.e., 3 or more).

The argument that you provide to the size parameter will dictate the size and shape of the output array.

If you provide a single integer, x, np.random.normal will provide x random normal values in a 1-dimensional NumPy array.

You can also specify a more complex output.

For example, if you specify size = (2, 3), np.random.normal will produce a NumPy array with 2 rows and 3 columns. It will be filled with numbers drawn from a normal distribution. You can also create output arrays with more than 2 dimensions.
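
A brief sketch of the three parameters working together: size sets the shape, and with a large enough sample the observed mean and standard deviation should land near loc and scale. The values 90 and 5 and the sample size of 100,000 are illustrative choices.

```python
import numpy as np

np.random.seed(0)

table = np.random.normal(loc=90, scale=5, size=(2, 3))
print(table.shape)

# with many draws, the sample mean and standard deviation
# should land near loc and scale
big = np.random.normal(loc=90, scale=5, size=100_000)
print(round(big.mean(), 1), round(big.std(), 1))
```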

<b>THE NP.RANDOM.RANDN FUNCTION</b>

There’s another function that’s similar to np.random.normal. It’s called np.random.randn.

Just like np.random.normal, the np.random.randn function produces numbers that are drawn from a normal distribution.

The major difference is that np.random.randn is like a special case of np.random.normal. np.random.randn operates like np.random.normal with loc = 0 and scale = 1, i.e., np.random.randn generates a sample of numbers drawn from the standard normal distribution. It also takes the output dimensions as separate arguments, e.g. np.random.randn(2, 3), rather than as a size tuple.

![image.png](attachment:image.png)
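
As a hedged check of that relationship, the sketch below draws from both functions with the same seed. It assumes the legacy np.random interface used in this lesson, where both calls draw from the same underlying standard normal stream.

```python
import numpy as np

np.random.seed(0)
via_randn = np.random.randn(2, 3)              # dimensions as separate arguments

np.random.seed(0)
via_normal = np.random.normal(size=(2, 3))     # shape as a single tuple

print(np.allclose(via_randn, via_normal))
```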

In [74]:
# DRAW A SINGLE NUMBER FROM THE STANDARD NORMAL DISTRIBUTION

np.random.normal(loc = 0, scale = 1, size = 1)

np.random.normal(size = 1)

# This code works the same as np.random.normal(loc = 0, scale = 1, size = 1). 
# Remember, if we don’t specify values for the loc and scale parameters, they will default to loc = 0 and scale = 1.

array([-0.93206632])

array([-0.73604081])

In [75]:
# MULTIPLE NUMBERS FROM THE STANDARD NORMAL DISTRIBUTION

np.random.normal(loc = 0, scale = 1, size = (1,5))
np.random.normal(loc = 0, scale = 1, size = (3,2))


array([[-0.71070172,  0.84740619,  0.02706108,  0.51391235, -0.04892722]])

array([[-1.12265789,  0.42764355],
       [ 0.20863975,  1.22363634],
       [-0.06293781, -1.19916794]])

In [76]:
# DRAW A SINGLE NUMBER FROM A NON-STANDARD NORMAL DISTRIBUTION

np.random.normal(loc = 90, scale = 5, size = 1)

# np.random.normal(90, 5, 1)

array([89.74010253])

In [77]:
# DRAW MULTIPLE NUMBERS FROM A NON-STANDARD NORMAL DISTRIBUTION

np.random.normal(loc = 90, scale = 5, size = (3, 3))

# np.random.normal(90, 5, (3, 3))

array([[ 82.474028  ,  76.98793107,  87.28557681],
       [ 91.80678834, 103.43293852,  90.93078011],
       [ 95.51773874,  83.87256644,  85.64254265]])

In [78]:
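# APPROXIMATE SAMPLING DISTRIBUTIONS
# note: each call below draws a fresh sample, so the mean and standard deviation
# shown are for new samples, not for the array printed first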

np.random.normal(loc = 90, scale = 5, size = 200)

np.random.normal(loc = 90, scale = 5, size = 200).mean()

np.random.normal(loc = 90, scale = 5, size = 200).std()


array([ 79.63899683,  90.39578442,  91.74386542,  87.31160634,
        93.36783079,  98.30750493,  87.40253868,  92.50888729,
        95.28204272,  89.34754808,  85.74603598,  90.6023652 ,
        88.54524458,  88.78851479,  85.12635654,  78.91280243,
        89.99220792,  90.11489144,  98.11671253,  82.78135383,
        94.99860044,  87.01705994,  98.0505379 ,  94.33558371,
        91.33151352,  90.82518157,  89.62309705,  89.91117292,
        87.65611194,  90.08691295,  89.64623268,  93.01655916,
        87.32826133,  93.51085791,  79.80973411,  88.91440566,
        88.68101976,  97.37287958,  94.29220356,  87.15279314,
        95.73145761,  87.04598864,  93.84824513,  97.71321813,
        86.78049698,  94.55609247,  94.93673283,  81.17721226,
        93.18805026, 101.2299048 ,  86.63762253,  97.819036  ,
        90.93367722,  85.63921376,  90.64627285,  86.53231891,
        86.18405321,  92.94957104,  90.57074399,  92.03735447,
        92.86839916,  87.16267996,  92.89462899,  95.78

89.98191998831778

4.887330084674507
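
To connect this with the idea of a sampling distribution, here is a minimal sketch that draws many samples at once and looks at how their means spread out. The choice of 1,000 samples is arbitrary, and the comparison value scale / sqrt(n) is the standard error of the mean from basic statistics rather than something stated in the lesson.

```python
import numpy as np

np.random.seed(0)

# 1,000 independent samples, each holding 200 draws
samples = np.random.normal(loc=90, scale=5, size=(1000, 200))
sample_means = samples.mean(axis=1)      # one mean per sample

print(sample_means.shape)
print(round(sample_means.mean(), 2))     # should sit close to loc
# spread of the sample means next to the theoretical standard error
print(round(sample_means.std(), 2), round(5 / np.sqrt(200), 2))
```

Storing the draws in a variable, as done here, also means every summary statistic refers to the same data.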