# Numpy Section Intro

- The core library
- Central object: The NumPy Array
- Be familiar with Python Lists
- One sentence summary: "Linear algebra and a bit of probability"

## Vectors and Matrices

![01-VectorAndMatrices.jpg](attachment:01-VectorAndMatrices.jpg)

- Vectors (as Series): Are one dimensional
- Matrices (as DataFrames): Are two dimensional
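
As a small illustration of the two bullets above, the sketch below builds one
vector and one matrix as NumPy arrays and inspects their number of dimensions.
The names `v` and `M` are chosen here for illustration only.

```python
import numpy as np

v = np.array([1, 2, 3])          # vector: one axis
M = np.array([[1, 2], [3, 4]])   # matrix: two axes (rows and columns)

print(v.ndim, v.shape)  # 1 (3,)
print(M.ndim, M.shape)  # 2 (2, 2)
```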

More at: [Data Structures in Pandas](https://github.com/Victoralm/Data-Manipulation-in-Python---Master-Python-NumPy-and-Pandas/blob/main/Ch05%20-%20%20Python%20Pandas%20DataFrames%20and%20Series/01-PythonPandasDataFramesAndSeries.ipynb)

### Dot Product / Inner Product

![01-DotProduct.jpg](attachment:01-DotProduct.jpg)

[Ref](https://en.wikipedia.org/wiki/Dot_product)

The dot product multiplies corresponding elements of the two input vectors and
sums the results. For this to be a valid operation, both input vectors must have
the same shape.
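
A minimal sketch of the shape requirement, assuming small example vectors picked
for illustration (`a`, `b` and `c` are not from the figure): the first call pairs
up the elements and sums the products, the second has nothing to pair the third
element with and fails.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])
c = np.array([1, 2, 3])

# Same shape: 1*3 + 2*4
print(np.dot(a, b))  # 11

# Different lengths: there are no corresponding elements to pair up
try:
    np.dot(a, c)
except ValueError as err:
    print("shape mismatch:", err)
```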

### Matrix multiplication (C = AB)

- "Generalized" dot product
  
![01-MatrixMultiplication.jpg](attachment:01-MatrixMultiplication.jpg)

Although it may not look like it, matrix multiplication is a more general form
of the vector inner product. When you multiply two matrices together, you end up
doing a bunch of mini dot products: you take row one of A and dot it with column
one of B, row one of A and dot it with column two of B, and so on.

Just like with the vector dot product, there are some limitations on the shapes
of the matrices that you want to multiply. In particular, the number of columns
in A must equal the number of rows in B. Another way of saying that is the inner
dimensions must match. If A has shape M by N and B has shape N by P, then the
matrix multiplication of A and B will have shape M by P.

[Ref](https://en.wikipedia.org/wiki/Matrix_multiplication#:~:text=For%20matrix%20multiplication%2C%20the%20number,B%20is%20denoted%20as%20AB.)
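
The shape rule and the "mini dot products" view can be checked with a short
sketch. The matrices below are arbitrary examples (M=2, N=3, P=2), and the
explicit double loop is only there to show what `@` computes; it is not how you
would normally write it.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3): M=2, N=3
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])      # shape (3, 2): N=3, P=2

C = A @ B
print(C.shape)  # (2, 2), that is M by P
print(C)        # [[ 7  8]
                #  [16 17]]

# Each entry C[i, j] is row i of A dotted with column j of B
C_manual = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C_manual[i, j] = A[i, :].dot(B[:, j])

print(np.array_equal(C, C_manual))  # True
```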

### Element-Wise Product

- Not so common in Linear Algebra, very common in ML

![01-ElementWiseProduct.jpg](attachment:01-ElementWiseProduct.jpg)

You take each (i, j) element of A and multiply it by the (i, j) element of B.
This requires that both A and B have the same shape, which will also be the
shape of the output.

[Ref](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)#:~:text=In%20mathematics%2C%20the%20Hadamard%20product,of%20the%20multiplied%20corresponding%20elements.)
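
To make the contrast with matrix multiplication concrete, here is a small sketch
with two example 2x2 matrices (values chosen for illustration). In NumPy the `*`
operator is the element-wise product, while `@` is matrix multiplication.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

H = A * B   # element-wise: H[i, j] = A[i, j] * B[i, j]
print(H)    # [[ 10  40]
            #  [ 90 160]]

P = A @ B   # matrix multiplication, for comparison
print(P)    # [[ 70 100]
            #  [150 220]]
```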

### And many more

- Linear systems: Ax = b
- Inverse: A<sup>-1</sup>
- Determinant: | A |
- Choosing random numbers (e.g. Uniform, Gaussian)
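
Each item in the list above has a counterpart in NumPy. The sketch below is one
possible way to try them out; the 2x2 system is made up for illustration, and
`np.random.default_rng` with seed `0` is an assumption of this example (the
notebook itself uses `np.random.randn` later on).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # solves Ax = b without forming the inverse
A_inv = np.linalg.inv(A)    # inverse of A
det = np.linalg.det(A)      # determinant: 2*3 - 1*1, so approximately 5

print(x)                    # approximately [0.8 1.4]
print(np.allclose(A @ x, b))  # True

rng = np.random.default_rng(0)         # seed chosen arbitrarily
u = rng.uniform(0.0, 1.0, size=3)      # Uniform samples in [0, 1)
g = rng.normal(0.0, 1.0, size=3)       # Gaussian samples, mean 0 and std 1
```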

### Applications

1. Linear Regression
2. Logistic Regression
3. Deep Neural Networks
4. K-Means Clustering
5. Density Estimation
6. Principal Components Analysis
7. Matrix Factorization (Recommender Systems)
8. Support Vector Machines (SVM)
9. Markov Models, HMM
10. Control Systems
11. Game Theory
12. Operations Research
13. Portfolio Optimization

## Numpy Arrays vs Python Lists

In [10]:
import numpy as np

L = [1, 2, 3]

A = np.array([1, 2, 3])

display(L)
display(type(L))

display(A)
display(type(A))

for e in L:
    print(e)

print("Same as above")

for e in A:
    print(e)

[1, 2, 3]

list

array([1, 2, 3])

numpy.ndarray

1
2
3
Same as above
1
2
3


In [11]:
# Adding an element
L.append(4)
display(L)

[1, 2, 3, 4]

In [12]:
# Adding an element (cannot be done with numpy arrays)
A.append(4)
display(A)

AttributeError: 'numpy.ndarray' object has no attribute 'append'

Generally speaking, the size of a list can change, but the size of an array is
fixed. There are other ways to add items to an array, but these actually create
a new array rather than growing the existing one. If you have more programming
experience, you may recognize why: an array occupies a fixed block of memory,
which is part of what makes it efficient.
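
As a sketch of those "other ways", `np.append` and `np.concatenate` both return a
new array and leave the original untouched. The names `arr`, `arr_plus` and
`arr_more` are used here only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3])

arr_plus = np.append(arr, 4)                      # returns a new array
arr_more = np.concatenate([arr, np.array([4, 5])])

print(arr)       # [1 2 3], the original is unchanged
print(arr_plus)  # [1 2 3 4]
print(arr_more)  # [1 2 3 4 5]
print(np.shares_memory(arr, arr_plus))  # False: the data was copied
```

Because every call copies the data, appending inside a loop gets expensive;
collecting values in a list and converting once with `np.array` is usually the
cheaper pattern.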

In [15]:
# Adding an element to a List

L + [5] # Not permanent

[1, 2, 3, 4, 5]

In [16]:
display(L)

[1, 2, 3, 4]

In [17]:
# Adding an element to a numpy array

A + np.array([4]) # Broadcasting: The value 4 was added to each element in the array. Not permanent

array([5, 6, 7])

In [18]:
display(A)

array([1, 2, 3])

In [19]:
# Adding multiple elements to a numpy array

A + np.array([4, 5, 6]) # Element-wise: each value was added to the respective element in the array. Not permanent

array([5, 7, 9])

In [20]:
display(A)

array([1, 2, 3])

In [21]:
# Adding multiple elements to a numpy array, but with different amount of
# elements

A + np.array([4, 5]) # Cannot be done for arrays with different sizes (other than 1 element)

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [22]:
# Multiplying a numpy array

2 * A # Not permanent

array([2, 4, 6])

In [23]:
display(A)

array([1, 2, 3])

In [25]:
# Multiplying a Python List

2 * L # Repeats the List, same as "L + L". Not permanent

[1, 2, 3, 4, 1, 2, 3, 4]

In [26]:
display(L)

[1, 2, 3, 4]

In [27]:
L2 = []

for e in L:
    L2.append(e + 3)

In [28]:
display(L2)

[4, 5, 6, 7]

[ref: Python lists comprehension](https://www.w3schools.com/Python/python_lists_comprehension.asp)

In [33]:
# List comprehension

L2 = [e + 3 for e in L]

In [35]:
display(L2)

[4, 5, 6, 7]

In [36]:
# List Squared

L**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [38]:
# Numpy array Squared

A**2 # Not permanent

array([1, 4, 9])

In [39]:
display(A)

array([1, 2, 3])

In [42]:
# List squared (correct way)

L2 = []
for e in L:
    L2.append(e**2)

display(L2)

L2 = []
L2 = [e**2 for e in L]
display(L2)

[1, 4, 9, 16]

[1, 4, 9, 16]

Most math operations applied to a NumPy array operate element-wise.

In [43]:
# Square root of a numpy array

np.sqrt(A)

array([1.        , 1.41421356, 1.73205081])

In [44]:
np.log(A)

array([0.        , 0.69314718, 1.09861229])

In [45]:
np.exp(A)

array([ 2.71828183,  7.3890561 , 20.08553692])

Another common function in deep learning is the [hyperbolic tangent](https://en.wikipedia.org/wiki/Hyperbolic_functions).

In [46]:
np.tanh(A)

array([0.76159416, 0.96402758, 0.99505475])

A neat way to summarize this lecture is that: a list looks like an array, but it
works more like a generic data structure. On the other hand, a NumPy array exists
specifically to do math.
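
One way to see the "generic data structure" versus "math" distinction is to look
at element types. This is a small sketch with made-up values: a list accepts any
mix of objects, while an array stores a single dtype so that arithmetic can be
applied to all elements at once.

```python
import numpy as np

mixed = [1, "two", 3.0]        # a list can mix types freely
arr = np.array([1, 2.5, 3])    # an array has one dtype; the ints become floats

print(arr.dtype)  # float64
print(arr * 2)    # [2. 5. 6.]
```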

## Dot Product in Numpy

In [49]:
a = np.array([1, 2])
b = np.array([3, 4])

dot = 0

- Direct definition: multiply the two arrays element-wise and sum the results.
  See more [here](https://byjus.com/maths/direct-proportion/#:~:text=Direct%20proportion%20or%20direct%20variation,other%20quantity%20is%20inverted%20here.)

In [50]:
for e, f in zip(a, b):
    dot += e * f

display(dot)

11

- Indexing

In [53]:
dot = 0

for i in range(len(a)): # both, a and b, are expected to have the same size anyway
    dot += a[i] * b[i]

display(dot)

11

In [55]:
# Testing direct multiplication of the arrays

a * b # Does element-wise multiplication; the dot product is the sum of these results

array([3, 8])

In [56]:
np.sum(a * b)

11

One nice thing about NumPy is that a lot of these functions are also instance
methods, so you can call them on the NumPy array object directly.

In [57]:
(a * b).sum()

11

In [61]:
# The specific Numpy function to calculate Dot Products

np.dot(a, b)

11

In [62]:
a.dot(b)

11

In [65]:
# Dot Product using the @ operator
a @ b

11

Now, if you've studied linear algebra or geometry, you know that there's an
alternative definition of the dot product. Namely, the dot product of **a** and
**b** is equal to the magnitude of **a** times the magnitude of **b** times the
cosine of the angle between **a** and **b**. Of course, we don't usually know
the angle between **a** and **b** in advance. But we do know all the other
values, so programmatically we normally rearrange this definition to calculate
the cosine, and from it the angle.

![alt text](../img/01-DotProductAlternative.jpg)

[Magnitude](https://en.wikipedia.org/wiki/Magnitude_(mathematics))

[Sine and cosine](https://en.wikipedia.org/wiki/Sine_and_cosine)

In [67]:
# Magnitude of the vector

amag = np.sqrt((a * a).sum())

display(amag)

2.23606797749979

Now, being such a fundamental operation, NumPy does already have a function to
do this. It's located in the module `linalg`, which, as you can tell by its
name, contains linear algebra tools.

In [68]:
np.linalg.norm(a)

2.23606797749979

In [71]:
# Calculating the cosine

cosangle = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

display(cosangle)

0.9838699100999074

In [74]:
# Calculating the angle using the Numpy function

angle = np.arccos(cosangle)

display(angle)

0.17985349979247847
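
The steps above can be collected into one function. This is a sketch, not part
of the original notebook: `angle_between` is a hypothetical helper name, and the
`np.clip` call is an added safeguard, since rounding can push the ratio slightly
outside the range where `np.arccos` is defined.

```python
import numpy as np

def angle_between(a, b):
    """Angle in radians between two non-zero vectors (hypothetical helper)."""
    cos_angle = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Guard against values like 1.0000000000000002 caused by rounding
    cos_angle = np.clip(cos_angle, -1.0, 1.0)
    return np.arccos(cos_angle)

a = np.array([1, 2])
b = np.array([3, 4])

angle = angle_between(a, b)
print(angle)              # about 0.1799 radians
print(np.degrees(angle))  # about 10.3 degrees
```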

## Speed test

How fast is NumPy compared to regular Python?

In [75]:
from datetime import datetime as dt

In [76]:
a = np.random.randn(100)
b = np.random.randn(100)

display(a)
display(b)

T = 100000

array([ 1.26935888,  0.48228084, -0.54872797, -0.65684671,  0.09747269,
       -0.30615744, -2.78584955, -0.38216932, -1.5776625 ,  0.22773272,
        0.64136167, -0.77902198, -0.12344927, -0.0468022 ,  0.02909097,
       -1.29991968,  0.01572334, -1.45622831,  0.47521154,  1.50567134,
       -0.971895  ,  2.31256619, -0.71637905,  0.21440713, -0.36769442,
        0.07881698,  0.47581542,  0.90522198, -0.77460878,  0.03076403,
        0.6940784 , -0.23730942,  1.38718436,  1.44320604,  1.33082132,
       -0.69309083, -0.96805051, -0.76089616,  0.4051621 ,  0.38556569,
       -0.13965165,  0.35201255, -2.20549076,  0.11796191, -1.03439096,
        0.72545449, -0.09340884, -0.59448945, -0.62033226,  0.66302756,
       -1.82607607,  2.18554356,  0.9704127 , -0.29580462, -0.26262151,
        1.56036623, -2.76626777, -1.84109727, -1.06008561, -0.11752354,
       -2.14641587,  0.86392497, -0.24471224, -0.93944584,  2.29730572,
        0.52129066,  0.25359198,  1.80742438, -0.00808354, -0.34

array([-0.01466052, -0.93561081,  0.96840952,  1.61138035,  0.57857073,
       -1.05822985, -1.58563528,  0.72085634, -1.04640349, -0.30963628,
        0.56133979, -0.61385162, -0.86627114,  1.05787145,  1.11079161,
       -1.15537172,  0.09231209, -1.62360388,  1.14858164,  0.24476251,
        0.14174996,  1.48065861, -0.77626174, -0.93314707, -0.10933996,
        0.28246746,  0.0278593 ,  1.18528401, -0.3953894 ,  0.66765185,
       -1.48009919, -0.54238607,  0.68277714,  0.27718801, -0.91687257,
       -0.1666232 ,  0.56990372, -1.02885067,  0.58212789, -0.06429155,
       -0.86147003,  1.07326972, -1.62643121,  1.36398362, -0.2958341 ,
       -0.47702933,  0.67165135,  0.65236672,  0.28593581,  1.3952247 ,
        1.59854134,  0.38141371, -1.08552505, -0.80679218,  0.89074496,
       -0.91747391, -0.75897614,  2.89911544,  1.57475849,  0.57419405,
        1.29027315, -0.49826946,  2.37409895,  0.49536152,  1.31181621,
       -0.66248045,  0.4865664 ,  1.13405111, -0.76901658, -1.60

In [77]:
def slow_dot_product(a, b):
    result = 0
    for e, f in zip(a, b):
        result += e * f
    return result

In [91]:
t0 = dt.now()

for t in range(T):
    slow_dot_product(a, b)

dt1 = dt.now() - t0

display(dt1)

t0 = dt.now()

for t in range(T):
    a.dot(b)

dt2 = dt.now() - t0

display(dt2)

print("dt1 / dt2 =", dt1.total_seconds() / dt2.total_seconds())

datetime.timedelta(microseconds=908326)

datetime.timedelta(microseconds=36945)

dt1 / dt2 = 24.585897956421707


Almost 25x faster.

As an exercise, you might want to try list comprehensions to see if they improve the results for regular Python.

In [98]:
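def slow_dot_productLC(a, b):
    # Sum the element-wise products so the function returns the dot product,
    # not just the list of products
    result = sum([e * f for e, f in zip(a, b)])
    return result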

In [102]:
t0 = dt.now()

for t in range(T):
    slow_dot_productLC(a, b)

dt1 = dt.now() - t0

display(dt1)

t0 = dt.now()

for t in range(T):
    a.dot(b)

dt2 = dt.now() - t0

display(dt2)

print("dt1 / dt2 =", dt1.total_seconds() / dt2.total_seconds())

datetime.timedelta(microseconds=733151)

datetime.timedelta(microseconds=41149)

dt1 / dt2 = 17.816982186687405
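
As an alternative to timing with `datetime`, the standard library's `timeit`
module can run the same comparison. This is a sketch under a few assumptions: the
function names `loop_dot` and `comprehension_dot` are made up here, the
comprehension version sums its products so that all three variants compute the
same number, and the repetition count is smaller than `T` to keep the run short.
The ratios depend on the machine, so no expected values are shown.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
a = rng.standard_normal(100)
b = rng.standard_normal(100)

def loop_dot(a, b):
    result = 0
    for e, f in zip(a, b):
        result += e * f
    return result

def comprehension_dot(a, b):
    return sum([e * f for e, f in zip(a, b)])

n = 10_000  # number of repetitions per measurement

t_loop = timeit.timeit(lambda: loop_dot(a, b), number=n)
t_comp = timeit.timeit(lambda: comprehension_dot(a, b), number=n)
t_numpy = timeit.timeit(lambda: a.dot(b), number=n)

print(f"loop / numpy          = {t_loop / t_numpy:.1f}")
print(f"comprehension / numpy = {t_comp / t_numpy:.1f}")
```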


Now, after all this, you might wonder: if I have two vectors, I'm obviously not
going to write a for loop when calling the dot function is clearly easier. Sure,
it's faster, and that's nice, but why the heck would you write this big long for
loop in the first place when you can just call a single function that does the
same thing? Well, the key is you have to think ahead. In this case, it's pretty
clear that you wouldn't use a for loop. But what if you have an equation like
this?

![alt text](../img/01-AvoidingForLoop.jpg)

The natural approach when you see a summation is to write a loop. In fact, if
that's not your natural inclination, then something is probably unusual about
the way you program. So the lesson here isn't "if you see two vectors and you
want to take the dot product, don't use a for loop." That's already pretty
obvious. Instead, the lesson is really for these more complicated situations,
where it's not obvious at all how you can avoid a for loop. The real challenge
is keeping in mind that you want to avoid for loops, even if it means a
significant amount of extra work for you.
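
To illustrate the idea, here is a sketch with a stand-in equation. It is an
assumption of this example, not the equation from the figure above: the double
summation s = Σᵢ Σⱼ xᵢ Aᵢⱼ yⱼ looks like it needs two nested loops, but it is
the same thing as the matrix expression xᵀAy. The sizes and random values are
arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
x = rng.standard_normal(50)
A = rng.standard_normal((50, 40))
y = rng.standard_normal(40)

# Loop version: follows the summation signs literally
s_loop = 0.0
for i in range(len(x)):
    for j in range(len(y)):
        s_loop += x[i] * A[i, j] * y[j]

# Vectorized version: the same double sum written as x^T A y
s_vec = x @ A @ y

print(np.isclose(s_loop, s_vec))  # True
```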