# Lab 1: Getting started with Numpy
I am excited about getting going and working on cool stuff, but first let's spend a bit of time learning the basics of the tools we will be using.

## Numpy

Let's spend a few minutes just learning some of the fundamentals of numpy (pronounced in most circles as num-pie).

### what is numpy
"Definition"

Let's look at an example. Suppose I start with a little table:

| a  | b | c  |  d | e |
| :---: | :---: | :---: | :---: | :---: |
| 0 | 1 | 2 | 3 | 4 |
|10| 11| 12 | 13 | 14|
|20| 21 | 22 | 23 | 24 |
|30 | 31 | 32 | 33 | 34 |
|40 |41 | 42 | 43 | 44 |

and I simply want to add 10 to each cell:

| a  | b | c  |  d | e |
| :---: | :---: | :---: | :---: | :---: |
| 10 | 11 | 12 | 13 | 14 |
|20| 21| 22 | 23 | 24|
|30| 31 | 32 | 33 | 34 |
|40 | 41 | 42 | 43 | 44 |
|50 |51 | 52 | 53 | 54 |

To make things interesting, instead of a 5 x 5 array, let's make it 1,000 x 1,000 -- so 1 million cells!

So I construct it in generic Python

In [None]:
a = [[x + y * 1000 for x in range(1000)] for y in range(1000)]


and then write a little function that will make the addition

In [None]:
def addToArr(sizeof):
    for i in range(sizeof):
        for j in range(sizeof):
            a[i][j] = a[i][j] + 10


Here's how much time it takes to run that function:

In [None]:
%time addToArr(1000)

So about 1/8 of a second. 

Now using Numpy

#### in Numpy
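
A minimal sketch of the same task in NumPy, assuming the array holds the same values as the nested list `a` above. The name `arr` and the use of `np.arange(...).reshape(...)` to build it are my choices for illustration.

```python
import numpy as np

# Build the same 1,000 x 1,000 grid of values as the nested list above:
# the cell in row y, column x holds x + y * 1000.
arr = np.arange(1000 * 1000).reshape(1000, 1000)

# Adding a scalar applies to every cell at once; no explicit loops needed.
arr = arr + 10

print(arr[0, :5])   # first five cells of the first row
print(arr.shape)    # (1000, 1000)
```

In a notebook you could time this with `%time arr + 10` and compare it against the loop version; the exact numbers will depend on your machine.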

### built in functions
In addition to being faster, numpy has a wide range of built in functions. So, for example, instead of you writing code to calculate the mean or sum or standard deviation of a multidimensional array you can just use numpy:

In [None]:
arr.mean()

In [None]:
arr.sum()

In [None]:
 arr.std()
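
Here is a small, self-contained sketch of those three calls on a tiny made-up array (the values are illustrative only), along with the optional `axis` argument.

```python
import numpy as np

# A small 2 x 3 array with made-up values.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.mean())        # mean of all six values -> 3.5
print(arr.sum())         # sum of all six values  -> 21
print(arr.std())         # population standard deviation of all six values

# With axis=0 the function works down each column instead.
print(arr.sum(axis=0))   # -> [5 7 9]
```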

So not only is it faster, but it minimizes the code you have to write. A win, win.

Let's continue with some basics.

## numpy examined 
So Numpy is a library containing a super-fast n-dimensional array object and a load of functions that can operate on those arrays. To use numpy, we must first load the library into our code and we do that with the statement:


In [None]:
 import numpy as np

Perhaps most of you are saying "fine, fine, I know this already", but let me catch others up to speed. This is just one of several ways we can load a library into Python. We could just say:

In [None]:
 import numpy

and every time we need to use one of the functions built into numpy we would need to preface that function with `numpy`. So for example, we could create an array with


In [None]:
arr = numpy.array([1, 2, 3, 4, 5])

If we got tired of writing `numpy` in front of every function instead of

In [None]:
import numpy

we could write:

In [None]:
from numpy import *

(where that * means 'everything' and the whole expression means import everything from the numpy library).  Now we can use any numpy function without putting numpy in front of it:

In [None]:
np.subtract(arr1, arr2)

In [None]:
np.multiply(arr1, arr2)

In [None]:
np.divide(arr1, arr2)

#### maximum / minimum


In [None]:
arr1 = np.array([[10, 2], [3, 40]])
arr2 = np.array([[1, 20], [30, 4]])
np.maximum(arr1, arr2)
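
`np.minimum` works the same way, keeping the smaller value of each pair. A short sketch using the same two arrays (the names `larger` and `smaller` are mine):

```python
import numpy as np

arr1 = np.array([[10, 2], [3, 40]])
arr2 = np.array([[1, 20], [30, 4]])

# Element-by-element comparison: keep the larger / smaller of each pair.
larger = np.maximum(arr1, arr2)
smaller = np.minimum(arr1, arr2)

print(larger)    # rows [10 20] and [30 40]
print(smaller)   # rows [1 2] and [3 4]
```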

#### these are just examples. There are more unary and binary functions

## Numpy Uber
Let's say I have Uber drivers at various intersections around Austin. I will represent that as a set of x,y coordinates.

 | Driver |xPos | yPos |
 | :---: | :---: | :---: |
 | Ann | 4 | 5 |
 | Clara | 6 | 6 |
 | Dora | 3 | 1 |
 | Erica | 9 | 5 |
 
 
 Now I would like to find the closest driver to a customer who is at 6, 3.
And to further define *closest* I am going to use what is called **Manhattan Distance**. Roughly put, Manhattan distance is the distance you would travel if you followed streets. Ann, for example, is two blocks west of our customer and two blocks north. So the Manhattan distance from Ann to our customer is `2+2` or `4`.
 
 First, to make things easy (and because the data in a numpy array must be of the same type), I will represent the x and y positions in one numpy array and the driver names in another:

In [None]:
locations = np.array([[4, 5], [6, 6], [3, 1], [9,5]])
locations

In [None]:
drivers = np.array(["Ann", "Clara", "Dora", "Erica"])

Our customer is at

In [None]:
cust = np.array([6, 3])

Now we are going to figure out the distance between each of our drivers and the customer.

We don't need to name arrays `arr`; we can name them anything we want.

In [None]:
ratings = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [None]:
ratings

So far, we've been creating numpy arrays by using Python lists. I can make that more explicit by first creating the Python list and then using it to create the ndarray:

In [None]:
pythonArray = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
sweet = np.array(pythonArray)
sweet

I can also create an array of all zeros or all ones directly:

In [None]:
np.zeros(10)

In [None]:
np.ones((5, 2))

### indexing
Indexing elements in ndarrays works pretty much the same as it does in Python. We have already seen one example, here is another example with a one dimensional array:


In [None]:
temperatures = np.array([48, 44, 37, 35, 32, 29, 33, 36, 42])
temperatures[0]

In [None]:
temperatures[3]

and a two dimensional one:

In [None]:
sample = np.array([[10, 20, 30], [40, 50, 60]])
sample[0][1]

For numpy ndarrays we can also use a comma to separate the indices of multi-dimensional arrays:

In [None]:
sample[1,2]
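
A brief sketch showing that the two index styles agree, and that a colon can stand in for a whole row or column. It re-creates `sample` so it runs on its own.

```python
import numpy as np

sample = np.array([[10, 20, 30], [40, 50, 60]])

# Both forms pick row 0, column 1.
print(sample[0][1])   # 20
print(sample[0, 1])   # 20

# A colon means "everything along this dimension".
print(sample[1, :])   # row 1    -> [40 50 60]
print(sample[:, 2])   # column 2 -> [30 60]
```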

And, like Python, you can also get a slice of an array. First, here is the basic Python example:

In [None]:
a = [10, 20, 30, 40, 50, 60]
b = a[1:4]
b

and the similar numpy example:

In [None]:
aarr = np.array(a)
barr = aarr[1:4]
barr

### maybe wacky
But there is a difference between Python lists and numpy ndarrays. If I alter the list `b` in Python the original `a` list is not altered:

In [None]:
b[1] = b[1] + 5

In [None]:
b

In [None]:
a

but if we do the same in numpy:

In [None]:
xydiff = locations - cust
xydiff

NOTE: displaying the results with `xydiff` isn't a necessary step. I just like seeing intermediate results.
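
The subtraction above relies on what NumPy calls broadcasting: the one dimensional `cust` array is subtracted from every row of the two dimensional `locations` array. A small sketch that spells this out, re-creating the two arrays so it runs on its own (the name `by_hand` is mine):

```python
import numpy as np

locations = np.array([[4, 5], [6, 6], [3, 1], [9, 5]])
cust = np.array([6, 3])

# Broadcasting: cust (shape (2,)) is applied to each row of locations (shape (4, 2)).
xydiff = locations - cust

# The same result written as an explicit loop over the rows, for comparison.
by_hand = np.array([row - cust for row in locations])

print(xydiff)
print(np.array_equal(xydiff, by_hand))   # True
```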

OK, now I am going to sum the absolute values:

In [None]:
distances = np.abs(xydiff).sum(axis=1)
distances

So the output is the array `[4, 3, 5, 5]` which shows that Ann is 4 away from our customer; Clara is 3 away and so on.
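
The `axis` argument controls which direction the sum runs in. A short sketch, using the difference values from this example typed in directly (the name `absdiff` is mine):

```python
import numpy as np

xydiff = np.array([[-2, 2], [0, 3], [-3, -2], [3, 2]])
absdiff = np.abs(xydiff)

print(absdiff.sum(axis=1))   # one total per row (per driver) -> [4 3 5 5]
print(absdiff.sum(axis=0))   # one total per column           -> [8 9]
print(absdiff.sum())         # everything added together      -> 17
```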

Now I am going to sort these using `argsort`:

This is a one dimensional array. The position of an element in the array is called the index. The first element of the array is at index 0, the next at index 1 and so on. We can get the item at a particular index by using the syntax:

In [None]:
 arr[0]

In [None]:
arr[3]


We can create a 2 dimensional array that looks like

      10  20  30
      40  50  60
 
by:


In [None]:
 arr = np.array([[10, 20, 30], [40, 50, 60]])

and we can show the contents of that array just by using the name of the array, `arr`


In [None]:
arr

In [None]:
barr[1] = barr[1] + 5

In [None]:
barr

In [None]:
aarr

we see that the original array is altered if we modify the slice. This may seem wacky to you, or maybe it doesn't. In any case, it is something you will get used to. For now, just be aware of this. 
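
If you do want an independent slice, one option is the array's `.copy()` method. A minimal sketch (the variable names `view` and `independent` are mine):

```python
import numpy as np

aarr = np.array([10, 20, 30, 40, 50, 60])

view = aarr[1:4]                 # shares its data with aarr
independent = aarr[1:4].copy()   # has its own data

independent[1] = independent[1] + 5
print(aarr)   # unchanged: [10 20 30 40 50 60]

view[1] = view[1] + 5
print(aarr)   # changed:   [10 20 35 40 50 60]
```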

## functions on arrays

Numpy has a wide range of array functions. Here is just a sample.

### Unary functions

#### absolute value

In [None]:
arr = np.array([-2, 12, -25, 0])
arr2 = np.abs(arr)
arr2

In [None]:
arr = np.array([[-2, 12], [-25, 0]])
arr2 = np.abs(arr)
arr2               

#### square

In [None]:
arr = np.array([-1, 2, -3, 4])
arr2 = np.square(arr)
arr2

#### square root

In [None]:
arr = np.array([[4, 9], [16, 25]])
arr2 = np.sqrt(arr)
arr2

### Binary functions

#### add / subtract / multiply / divide


In [None]:
arr1 = np.array([[10, 20], [30, 40]])
arr2 = np.array([[1, 2], [3, 4]])
np.add(arr1, arr2)

In [None]:
arr = np.array([1, 2, 3, 4, 5])

and to display what `arr` equals

In [None]:
arr = array([1, 2, 3, 4, 5])

This may at first seem like a good idea, but it is considered bad form by Python developers. 
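
One commonly cited reason is name clashes. In the NumPy versions I have used, NumPy defines its own `sum`, and a star import quietly replaces Python's built-in `sum` with it. The sketch below avoids the star import itself and just compares the two functions, which treat their second argument differently:

```python
import numpy as np

# Built-in sum: the second argument is a starting value.
print(sum(range(5), -1))      # 0+1+2+3+4 + (-1) = 9

# NumPy's sum: the second argument is the axis to sum over.
print(np.sum(range(5), -1))   # 0+1+2+3+4 = 10
```

After a star import, a bare `sum(...)` would be expected to behave like the second call, which is an easy source of confusing bugs.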

The solution is to use what we initially introduced:

This makes `np` an alias for numpy, so now we put *np* in front of numpy functions.

In [None]:
 arr = np.array([1, 2, 3, 4, 5])

Of course I could use anything as an alias for numpy:

In [None]:
order = np.argsort(distances)
order

`argsort` returns the indices that would sort the array. So the element at position 1 is the smallest followed by the element at position 0 and so on.

Next, I am going to get the first element of that array (in this case 1) and find the name of the driver at that position in the `drivers` array.

In [None]:
drivers[order[0]]
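
To see what `argsort` does in isolation, here is a small sketch with made-up data (the `scores` and `labels` arrays are hypothetical and not part of the Uber example):

```python
import numpy as np

scores = np.array([30, 10, 20])
labels = np.array(["x", "y", "z"])

order = np.argsort(scores)   # indices that would sort scores
print(order)                 # [1 2 0]
print(scores[order])         # [10 20 30]
print(labels[order])         # ['y' 'z' 'x']
```

When values tie, as the two 5s do in the distances above, which index comes first can depend on the sort algorithm; passing `kind="stable"` to `np.argsort` keeps tied items in their original order.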

<h3 style="color:red">Q2. You Try</h3>
<span style="color:red">Can you put all of the above in a function that takes 3 arguments: the location array, the array containing the names of the drivers, and the array containing the location of the customer? It should return the name of the closest driver.</span>


In [None]:
def findDriver(locationArr, driversArr, customerArr):
    result = ''
    ### put your code here
    return result
print(findDriver(locations, drivers, cust)) # this should print Clara

### CONGRATULATIONS

Even though this is just an intro to Numpy, I am going to throw some math at you. So far we have been looking at a two dimensional example, x and y (or North-South and East-West), and our formula for the distance, Dist, between Ann, A, and the customer, C, is

$$ DIST_{AC} = |A_x - C_x | + |A_y - C_y | $$

Now I am going to warp this a bit. In this example, each driver is represented by an array (as is the customer). So Ann is represented by `[4, 5]` and the customer by `[6, 3]`. Ann's 0th element is 4 and the customer's 0th element is 6. And, sorry, computer science people start counting at 0 but math people (and all other normal people) start at 1, so we can rewrite the above formula as:

$$ DIST_{AC} = |A_1 - C_1 | + |A_2 - C_2 | $$

That's the distance formula for Ann and the customer. We can generalize the formula by saying the distance between any two people, let's call them *x* and *y*, is


$$ DIST_{xy} = |x_1 - y_1 | + |x_2 - y_2 | $$

That is the formula for 2 dimensional Manhattan Distance. We can imagine a three dimensional case:

$$ DIST_{xy} = |x_1 - y_1 | + |x_2 - y_2 | + |x_3 - y_3 | $$

and we can generalize the formula to the n-dimensional case.
 
$$ DIST_{xy}=\sum_{i=1}^n |x_i - y_i| $$
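
The n-dimensional formula translates almost directly into NumPy. A minimal sketch, assuming both inputs are arrays of the same length; the helper name `manhattan` and the points `p` and `q` are made up for illustration:

```python
import numpy as np

def manhattan(x, y):
    """Sum of absolute differences between two equal-length arrays."""
    return np.abs(x - y).sum()

# Two made-up 3-dimensional points.
p = np.array([1, 2, 3])
q = np.array([4, 0, 3])
print(manhattan(p, q))   # |1-4| + |2-0| + |3-3| = 5

# The 2-dimensional case from earlier: Ann at [4, 5], customer at [6, 3].
print(manhattan(np.array([4, 5]), np.array([6, 3])))   # 4
```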

Just in time for a five dimensional example:


# The Amazing 5D Music example

Guests went into a listening booth and rated the following tunes:

* Janelle Monae - Tightrope
* Major Lazer - Cold Water 
* Tim McGraw - Humble & Kind
* Maren Morris - My Church
* Hailee Steinfeld - Starving


Here are the results:

| Guest  | Janelle Monae  | Major Lazer  | Tim McGraw  |  Maren Morris | Hailee Steinfeld| 
|---|---|---|---|---|---|
|  Ann | 4  |  5 | 2  |  1 | 3 |
| Ben  |  3 |  1 |  5 | 4  | 2|
| Jordyn  | 5  |  5 | 2  | 2  | 3|
|  Sam | 4 | 1 | 4 | 4 | 1|
| Hyunseo | 1 | 1 | 5 | 4 | 1 |
| Ahmed | 4 | 5 | 3 |  3 | 1 |

My task is to find out who is closest in musical taste to Ahmed. I'll set up a few numpy arrays.


In [None]:
customers = np.array([[4, 5, 2, 1, 3],
                      [3, 1, 5, 4, 2],
                      [5, 5, 2, 2, 3],
                      [4, 1, 4, 4, 1], 
                      [1, 1, 5, 4, 1]])

customerNames = np.array(["Ann", "Ben", 'Jordyn', "Sam", "Hyunseo"])

ahmed = np.array([4, 5, 3, 3, 1])

<h3 style="color:red">Q3. You Try</h3>
<span style="color:red">Can you write a function findClosest that takes 3 arguments: customers, customerNames, and an array representing one customer's ratings and returns the name of the closest customer?</span>


In [None]:
def findClosest(customers, customerNames, x):
   result = ''
   ### your code here   
   return result
findClosest(customers, customerNames, ahmed) # Should return Ann

## Numpy Amazon

We are going to start with the same array:

 
 | Drone |xPos | yPos |
 | :---: | :---: | :---: |
 | Ann | 4 | 5 |
 | Clara | 6 | 6 |
 | Dora | 3 | 1 |
 | Erica | 9 | 5 |
 
But this time, instead of Uber drivers, think of these as positions of Amazon delivery drones (and of course Amazon gives its delivery drones female names).
Now we would like to find the closest drone to a customer who is at 7, 1.
 
With the previous example we used Manhattan Distance.  With drones, we can compute the distance as the crow flies -- or Euclidean Distance.  We probably learned how to do this way back in 7th grade when we learned the Pythagorean Theorem which states:

$$c^2 = a^2 + b^2$$

Where *c* is the hypotenuse and *a* and *b* are the two other sides. So, if we want to find *c*:

$$c = \sqrt{a^2 + b^2}$$


If we want to find the distance between two people, *x* and *y* then the formula becomes

$$Dist_{xy} = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2}$$

and for Ann who is at `[4,5]` and our customer who is at `[7,1]` then the formula becomes:

$$Dist_{xy} = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2} = \sqrt{(4-7)^2 + (5-1)^2} = \sqrt{(-3)^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5$$

Sweet!  And to generalize this distance formula:

$$Dist_{xy} = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2}$$

to n-dimensions:

$$Dist_{xy} = \sqrt{\sum_{i=1}^n (x_i-y_i)^2}$$
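
Euclidean distance is the square root of the sum of the squared differences. A minimal sketch that checks the worked example for Ann and the customer; the helper name `euclidean_distance` is my own, and `np.linalg.norm` is shown only as a cross-check:

```python
import numpy as np

def euclidean_distance(x, y):
    """Square root of the sum of squared differences."""
    return np.sqrt(np.sum((x - y) ** 2))

ann = np.array([4, 5])
customer = np.array([7, 1])

print(euclidean_distance(ann, customer))   # 5.0
print(np.linalg.norm(ann - customer))      # 5.0, NumPy's built-in vector norm
```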


<h3 style="color:red">Q4. You Try</h3>
<span style="color:red">Can you write a function euclidean that takes 3 arguments: customers, customerNames, and an array representing one customer's ratings and returns the name of the closest customer using Euclidean distance?</span>
