# NumPy - Math

In [2]:
import numpy as np
import time

- Start by importing numpy and time. We use time to seed NumPy's random number generator so that each run produces different random arrays.
- Below is the syntax for seeding the generator with the current time, followed by the syntax for generating a table of random integers.

In [3]:
np.random.seed(seed=int(time.time()))
A = np.random.randint(0, 10, [2, 3])
A

array([[5, 2, 3],
       [6, 6, 4]])
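
A time-based seed gives different numbers on every run. When a result needs to be repeatable, for example while debugging, a fixed seed can be used instead. The sketch below is a minimal illustration of that idea; the seed value 42 and the names `first` and `second` are arbitrary choices for this example and are not from the tutorial.

```python
import numpy as np

# Seeding with the same fixed value (42 is an arbitrary choice) restarts
# the generator from the same state, so the draws repeat exactly.
np.random.seed(42)
first = np.random.randint(0, 10, [2, 3])

np.random.seed(42)
second = np.random.randint(0, 10, [2, 3])

print(np.array_equal(first, second))  # True
```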

## N-Dimensional Array Methods

- Unsurprisingly, the sum method will give us the sum of the whole array, the rows, the columns, etc.

In [4]:
A.sum()

26

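- Passing a value to the axis parameter indicates the axis to sum along. The result is a new array with that axis collapsed: axis=0 adds down the rows and gives one total per column, while axis=1 adds across the columns and gives one total per row.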

In [5]:
A.sum(axis=0)

array([11,  8,  7])

In [6]:
A.sum(axis=1)

array([10, 16])
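
To make the axis behaviour easier to check by hand, here is a small sketch that rebuilds the table with the values shown in the output above (hard-coded rather than random) and compares the shapes before and after summing. The variable names are illustrative, and `keepdims=True` is an optional extra that the notes do not cover.

```python
import numpy as np

# Hard-coded copy of the values displayed in the notes.
A = np.array([[5, 2, 3],
              [6, 6, 4]])

col_totals = A.sum(axis=0)  # collapses axis 0 (rows): one total per column
row_totals = A.sum(axis=1)  # collapses axis 1 (columns): one total per row

print(A.shape, col_totals.shape, row_totals.shape)  # (2, 3) (3,) (2,)

# keepdims=True keeps the collapsed axis with length 1.
row_totals_2d = A.sum(axis=1, keepdims=True)
print(row_totals_2d.shape)  # (2, 1)
```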

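- We can take a "cumulative sum" with .cumsum, which returns the running total: the first value, then the first plus the second, then the first three, and so on. With no axis given, the array is flattened first.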

In [7]:
A.cumsum()

array([ 5,  7, 10, 16, 22, 26])

- Product: every element multiplied together.

In [8]:
A.prod()

4320

- We can also take a "cumulative product", which works like the cumulative sum.

In [9]:
A.cumprod()

array([   5,   10,   30,  180, 1080, 4320])
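
The cumulative methods also accept an axis argument, in which case the running total restarts for each row or column instead of running over the flattened array. A minimal sketch, assuming the same hard-coded values as the displayed table (the variable names are made up for this example):

```python
import numpy as np

A = np.array([[5, 2, 3],
              [6, 6, 4]])

running_by_row = A.cumsum(axis=1)  # running total within each row
running_by_col = A.cumsum(axis=0)  # running total down each column

print(running_by_row)
print(running_by_col)
```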

- Find the minimum value in the table.
- We can also find the minimum along a specific axis.

In [10]:
A.min()

2

In [11]:
A.min(axis=0)

array([5, 2, 3])

- If we need the position of the minimum value rather than the value itself, we have "argmin". With no axis given, it returns the index into the flattened array.
    - "argmax" exists as well.

In [12]:
A.argmin()

1
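
Because argmin without an axis returns a position in the flattened array, it can be useful to turn that position back into a (row, column) pair. One way to do this, sketched here with hard-coded values, is `np.unravel_index`; the notes themselves do not use this function, so treat it as an optional addition.

```python
import numpy as np

A = np.array([[5, 2, 3],
              [6, 6, 4]])

flat_index = A.argmin()                           # position in the flattened array
row, col = np.unravel_index(flat_index, A.shape)  # convert to (row, column)

print(flat_index, row, col)  # 1 0 1
print(A[row, col])           # 2
```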

- Likewise, there are sort methods.
    - sort sorts the elements in place, along the last axis by default.
    - argsort does not change the array; it returns the indices that would sort it.
    - In other words, the first number returned is the index of the smallest number, the second number returned is the index of the second smallest number, etc.

In [13]:
A.sort()
A

array([[2, 3, 5],
       [4, 6, 6]])

In [14]:
A.argsort(axis=1)

array([[0, 1, 2],
       [0, 1, 2]])
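
In the cell above, argsort is called after the table has already been sorted, so the result is just 0, 1, 2. The sketch below uses an unsorted one-dimensional array (the first row of the original table, hard-coded) so the effect is visible, and it contrasts the in-place `sort` method with `np.sort`, which returns a copy.

```python
import numpy as np

row = np.array([5, 2, 3])

order = row.argsort()  # indices that would sort the array
print(order)           # [1 2 0]
print(row[order])      # [2 3 5]
print(row)             # [5 2 3], argsort left the array unchanged

sorted_copy = np.sort(row)  # returns a sorted copy
row.sort()                  # sorts in place
print(np.array_equal(row, sorted_copy))  # True
```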

## Methods for ND Arrays

- Here we return the index of the minimum value along the given axis.

In [15]:
A.argmin(axis=0)

array([0, 0, 0])

- To calculate the exponential f(x) = e^x {{https://en.wikipedia.org/wiki/Exponential_function}}
    - Call exp(). It is a NumPy function rather than a method of A, so the syntax is a little different: np.exp(A) instead of A.exp().

In [16]:
np.exp(A)

array([[  7.3890561 ,  20.08553692, 148.4131591 ],
       [ 54.59815003, 403.42879349, 403.42879349]])

- Similarly, you can take the natural log of an array with the same syntax.
- Note: If there is a 0 in your array, log(0) is undefined, so NumPy emits a RuntimeWarning and returns -inf for that element. The rest of the array is still calculated.

In [17]:
np.log(A)

array([[0.69314718, 1.09861229, 1.60943791],
       [1.38629436, 1.79175947, 1.79175947]])
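
The behaviour with a zero can be checked on a tiny array. This is a minimal sketch with made-up values; `np.errstate` is used only to keep the warning out of the output and is not part of the tutorial.

```python
import numpy as np

x = np.array([0.0, 1.0, np.e])

# np.errstate silences the divide-by-zero warning inside this block only.
with np.errstate(divide="ignore"):
    result = np.log(x)

print(result)                  # first element is -inf, the others are finite
print(np.isneginf(result[0]))  # True
```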

- The same syntax works for other element-wise math functions, such as np.sin.

In [18]:
np.sin(A)

array([[ 0.90929743,  0.14112001, -0.95892427],
       [-0.7568025 , -0.2794155 , -0.2794155 ]])

## Statistics

- Some statistics-related functions we have for tables include:
    - A.min(axis), A.argmin(axis)
    - A.max(axis), A.argmax(axis)
    - A.mean(axis)
    - A.var(axis)
    - A.std(axis)

In [19]:
A

array([[2, 3, 5],
       [4, 6, 6]])

In [20]:
A.min()

2

In [21]:
A.argmax()      # This says, "Which index is the largest value?"

4

In [22]:
A.mean(axis=1)

array([3.33333333, 5.33333333])

In [23]:
A.var(axis=1)       # This is for the variance {{https://en.wikipedia.org/wiki/Variance}}

array([1.55555556, 0.88888889])

In [24]:
A.std()             # This finds the standard deviation of a set {{https://en.wikipedia.org/wiki/Standard_deviation}}

1.4907119849998596
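
One detail worth knowing: by default, var and std divide by N (the population formulas). Passing `ddof=1` divides by N - 1 instead, which gives the sample variance. The sketch below uses the first row of the table, hard-coded; the variable names are illustrative.

```python
import numpy as np

row = np.array([2, 3, 5])

population_var = row.var()    # default ddof=0: divide by N
sample_var = row.var(ddof=1)  # ddof=1: divide by N - 1

print(population_var, sample_var)
```

The first value matches the 1.55555556 shown for this row in the A.var(axis=1) output.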

- Of note is the function corrcoef - "Correlation Coefficient"
    - More can be found here: {{https://en.wikipedia.org/wiki/Pearson_correlation_coefficient}}
- This is useful for machine learning because it measures how strongly the rows of a table are correlated with each other. By default, corrcoef treats each row as a variable and each column as an observation.

In [25]:
np.corrcoef(A)

array([[1.        , 0.75592895],
       [0.75592895, 1.        ]])

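- What do we do with this?
- This method generates a new table of correlation coefficients that tell us the correlation between each pair of rows.
- Note that the diagonal is all 1's (each row is perfectly correlated with itself) and that the off-diagonal values mirror each other, because the correlation of row L1 with row L2 equals the correlation of L2 with L1.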
np.corrcoef(A) =    [[  ,   L1,   L2]
                     [L1,    1, L1L2]
                     [L2, L2L1,    1]]
    - Some resources to understand this:
        {{https://www.youtube.com/watch?v=ugd4k3dC_8Y}}
        {{https://youtu.be/RwFiNlL4Q8g?list=PLO_fdPEVlfKqMDNmCFzQISI2H_nJcEDJq&t=513}}
        
- Of note, we can pull out a single value from this table by indexing it, for example with [0, 1].

In [26]:
np.corrcoef(A)[0, 1]

0.7559289460184545
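
To see where this number comes from, the Pearson coefficient for the two rows can be computed by hand and compared with corrcoef. This is a sketch of the standard formula using hard-coded copies of the rows; the names `dx`, `dy`, and `r` are chosen for this example.

```python
import numpy as np

A = np.array([[2, 3, 5],
              [4, 6, 6]])
x, y = A[0], A[1]

# Deviations of each row from its own mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r: covariance divided by the product of the standard deviations.
# The 1/N factors cancel, so plain sums are enough.
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r)
print(np.isclose(r, np.corrcoef(A)[0, 1]))  # True
```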

- Another very useful function in machine learning is np.unique, which finds the distinct values present in an array and, optionally, how many times each one appears.
    - With return_counts=True, this returns 2 arrays.
    - The first is a sorted array in which each unique element appears once. We assign this one to the variable "values".
    - The second is an array that indicates how many times each element shows up in the data. We assign this one to the variable "counts".

In [27]:
values, counts = np.unique(A, return_counts=True)
print(f"values: {values}")
print(f"counts: {counts}")

values: [2 3 4 5 6]
counts: [1 1 1 1 2]


- As we saw earlier, we can use argsort on this table to get the index of each value in sorted order.
- This gives us the indexes of the values in counts from smallest to largest, so indexing with them effectively sorts the counts.

In [28]:
counts.argsort()

array([0, 1, 2, 3, 4])

In [29]:
print(f"A is: {A}")
print(f"values is: {values}")
values[counts.argsort()]

A is: [[2 3 5]
 [4 6 6]]
values is: [2 3 4 5 6]


array([2, 3, 4, 5, 6])

- We can use this to print each value with its count, ordered from least frequent to most frequent.

In [30]:
for i, j in zip(values[counts.argsort()], counts[counts.argsort()]):
    print(f'value {i} appears {j} times.')

value 2 appears 1 times.
value 3 appears 1 times.
value 4 appears 1 times.
value 5 appears 1 times.
value 6 appears 2 times.
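
If the most frequent value should come first, the index array from argsort can simply be reversed. A minimal sketch with hard-coded data; `kind="stable"` is an assumption added here so that tied counts keep a predictable order, and it is not used in the tutorial.

```python
import numpy as np

A = np.array([[2, 3, 5],
              [4, 6, 6]])
values, counts = np.unique(A, return_counts=True)

# argsort is ascending; reversing the index array gives descending order.
order = counts.argsort(kind="stable")[::-1]

for value, count in zip(values[order], counts[order]):
    print(f"value {value} appears {count} times.")
```

The first line printed is for the value 6, which appears twice.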


## NaN Corrections

- Sometimes we get real-world data that has missing entries or has pieces of data that are not neatly formatted the way we need them.
- These can show up as NaN values, or "Not a Number".
- When we have these in our data set, we can delete them, find another way to manage them in the data, or use specific NumPy functions that are built to work around NaN values.

In [31]:
np.random.seed(seed=int(time.time()))
A = np.random.randn(5, 5)
A[4, 3] = np.nan
A[2, 2] = np.nan
A

array([[-0.1657069 ,  0.58098198, -0.79595588, -0.13082515, -0.57211599],
       [-0.73131704,  0.18485121, -0.22003161, -0.53161091, -0.90212297],
       [-0.37383206,  1.6221193 ,         nan,  0.65048878, -0.48808581],
       [ 0.82960262,  0.54680405, -0.08788553,  0.38077557,  2.34141552],
       [ 0.54172713, -0.09122517,  0.35512869,         nan, -1.66352157]])

In [32]:
print(f"NaN mean: {np.nanmean(A)}")
print(f"NaN var: {np.nanvar(A)}")
print(f"NaN std: {np.nanstd(A)}")

NaN mean: 0.055637316177178646
NaN var: 0.7058867633986917
NaN std: 0.8401706751599295
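
The reason the nan-prefixed functions are needed is that a single NaN makes the ordinary statistics return NaN. A small sketch with made-up values shows the difference:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

plain_mean = data.mean()     # NaN propagates through the calculation
nan_mean = np.nanmean(data)  # NaN entries are skipped: (1 + 3) / 2

print(plain_mean, nan_mean)  # nan 2.0
```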


- Sometimes we'll want to know how many times a NaN is in our data.
- We can count them like this.
    - We can use isnan() to create a boolean mask.
    - With booleans we know that False is 0 and True is 1, so we can just take the sum of this mask and find the total number of Trues therein.

In [33]:
np.isnan(A).sum()

2

- We can also divide by the total size, as shown below, to find the fraction of our data that is NaN (multiply by 100 for a percentage).

In [34]:
np.isnan(A).sum()/A.size

0.08
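
The same mask can be summarized per column by passing an axis, which helps spot columns that are mostly missing. This is a sketch on a small made-up array; taking the mean of a boolean mask gives the fraction of True values.

```python
import numpy as np

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, np.nan, np.nan]])

mask = np.isnan(data)
nan_per_column = mask.sum(axis=0)  # count of NaN in each column
nan_share = mask.mean(axis=0)      # fraction of NaN in each column

print(nan_per_column)  # [0 2 1]
print(nan_share)
```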

- Now that we have a mask, we can use it to replace all the NaN values with a 0 if we want.
- We do this by indexing A with our mask and assigning 0 to the selected positions.

In [35]:
A[np.isnan(A)] = 0
A

array([[-0.1657069 ,  0.58098198, -0.79595588, -0.13082515, -0.57211599],
       [-0.73131704,  0.18485121, -0.22003161, -0.53161091, -0.90212297],
       [-0.37383206,  1.6221193 ,  0.        ,  0.65048878, -0.48808581],
       [ 0.82960262,  0.54680405, -0.08788553,  0.38077557,  2.34141552],
       [ 0.54172713, -0.09122517,  0.35512869,  0.        , -1.66352157]])
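
Replacing NaN with 0 changes the mean of the data. One alternative, sketched here as an assumption rather than something the tutorial does, is to fill with the mean of the non-NaN values. `np.where` builds a new array, so the original is left untouched.

```python
import numpy as np

data = np.array([[1.0, np.nan],
                 [3.0, 5.0]])

fill_value = np.nanmean(data)  # mean of the non-NaN entries: (1 + 3 + 5) / 3
filled = np.where(np.isnan(data), fill_value, data)  # new array; data is unchanged

print(filled)
```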

## Linear Algebra

- The instructor indicated that this part of the video is for people who already know a little linear algebra and want to learn how to use NumPy for it, so it is not required.
- I'm going to skip this section for the notes and leave a time link in case I want to come back in the future.
    - Link to tutorial: {{https://youtu.be/RwFiNlL4Q8g?list=PLO_fdPEVlfKqMDNmCFzQISI2H_nJcEDJq&t=1174}}
    - Link to NumPy library on Linear Algebra: {{https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html}}