![title](https://bids.berkeley.edu/sites/default/files/styles/400x225/public/projects/numpy_project_page.jpg?itok=flrdydei)

# Working with vectors using NumPy

In data science, and especially in machine learning, vectors are a fundamental building block. A vector lets you represent data with many dimensions, more than a traditional graph or plot can show.
The study of vectors and their operations belongs to linear algebra, so I highly recommend having some knowledge of that field before going on. In any case, in this notebook you'll learn some basic NumPy functionality, with a long homework exercise at the end. To start using NumPy, we first have to import the package. To import a package that is already installed in our Python environment, we use the **import** keyword. For convenience, we can give the import a shorter name with the **as** keyword, so it's easier to access later.

In [2]:
import numpy as np

The first step towards working with NumPy arrays is, well, to create them. There are many ways to do this:

In [12]:
#Creating a numpy array from a list.
thelist = [1,2,5,7]
nplist = np.array(thelist)
print(nplist)
print(type(nplist))

#Create a numpy array from random numbers.
random_list = np.random.random(5)
print(random_list)

#Create a numpy array of integers of equally spaced numbers.
int_list = np.arange(5,20,5,dtype=int) #The np.int alias was removed from recent NumPy releases; the built-in int works.
print(int_list)

[1 2 5 7]
<class 'numpy.ndarray'>
[ 0.65912476  0.17512482  0.02149847  0.88961566  0.19205749]
[ 5 10 15]
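A few other creation functions are worth knowing. The snippet below is a minimal sketch; the variable names are only illustrative and the shapes were picked arbitrarily.

```python
import numpy as np

# An array of three zeros (floats by default).
zeros = np.zeros(3)

# A 2x2 array filled with ones; the shape is passed as a tuple.
ones = np.ones((2, 2))

# Five evenly spaced points from 0 to 1, with both endpoints included.
evenly = np.linspace(0, 1, 5)

print(zeros)
print(ones)
print(evenly)
```

Unlike `np.arange`, which takes a step size, `np.linspace` takes the number of points you want.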


But now you may be wondering: what's the use of this? Well, remember how you had to iterate with a for loop to calculate the max or the min of a list? NumPy has a lot of built-in functions that do those calculations in a single call. Let's look at some of them:

In [15]:
built_list = np.random.random(10)
print(built_list)
print(built_list.max()) #Returns the maximum value of the array
print(built_list.min()) #Returns the minimum value of the array
print(built_list.mean())#Returns the mean of the values of the array

[ 0.94678458  0.1371171   0.02074209  0.62969723  0.55500494  0.31965473
  0.28206598  0.72580073  0.04594773  0.30045881]
0.946784577455
0.0207420942195
0.396327390775
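To see what the built-in methods save you, here is a small sketch that finds the maximum by hand and then with `max()`. The sample values are made up for illustration.

```python
import numpy as np

values = np.array([3.0, 7.5, 1.2, 9.8, 4.4])  # hypothetical sample data

# Manual approach: walk through the array and keep the largest value seen so far.
largest = values[0]
for v in values:
    if v > largest:
        largest = v

# Built-in approach: a single method call does the same work.
print(largest)
print(values.max())
print(values.std())  # the standard deviation is available the same way
```

Both approaches return the same maximum, but the built-in version is shorter and runs in compiled code instead of a Python loop.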


And the list goes on. Some other functions include (taken from the numpy documentation):

**ndarray.ndim** : the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.

**ndarray.shape**: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

**ndarray.size**: the total number of elements of the array. This is equal to the product of the elements of shape.

**ndarray.dtype**: an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
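As a quick sketch of the attributes above, consider a small 2x3 array. The array itself is made up for illustration.

```python
import numpy as np

# Two rows and three columns, stored as 32-bit integers.
matrix = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(matrix.ndim)   # number of axes
print(matrix.shape)  # (rows, columns)
print(matrix.size)   # total number of elements
print(matrix.dtype)  # type of each element
```

Note that these are attributes, not methods, so they are read without parentheses.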

There are also mathematical functions such as sine, cosine, dot product, and cross product.
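Here is a short sketch of those mathematical functions in action. The vectors and angles are arbitrary examples.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

# Trigonometric functions are applied to every element of the array.
sines = np.sin(angles)
cosines = np.cos(angles)

# Two perpendicular unit vectors in 3D.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

dot = np.dot(a, b)      # perpendicular vectors have a dot product of zero
cross = np.cross(a, b)  # the cross product is perpendicular to both inputs

print(sines)
print(cosines)
print(dot)
print(cross)
```

Because of floating point rounding, values such as the sine of pi come out as a very small number rather than exactly zero.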

#### Your turn now.
Create an array of random numbers using NumPy. Then calculate the mean and the standard deviation of the array. Finally, apply mean normalization to each element of the array.

**Hint:** To mean-normalize an array, replace each element with

    (the original value - the mean)/ the standard deviation.

In [16]:
#Your code here

## Other operations with NumPy arrays

Some other important operations that we can perform using numpy arrays are:

#### Element-wise mathematical operations

In [27]:
sum_list = np.zeros(10) #Array filled with zeroes
sum_list = sum_list + 1 #Add 1 to every element of the array.

print(sum_list)

sum_list = sum_list * 2 #Multiply every element of the array by 2.
print(sum_list)

print(sum_list**2) #Calculates the square of every element of the array

print(sum_list%2) #Calculates the modulo for each element of the array

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
[ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
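Element-wise behavior is one of the main differences between NumPy arrays and plain Python lists. The following sketch, with made-up values, shows the contrast.

```python
import numpy as np

plain = [1, 2, 3]
arr = np.array(plain)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = plain * 2
scaled = arr * 2

# Adding two arrays of the same shape adds them element by element.
summed = arr + np.array([10, 20, 30])

print(repeated)
print(scaled)
print(summed)
```

The list comes back twice as long, while the array keeps its length and has every value doubled.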


#### Subsetting, like in lists. 

In [36]:
random_2d =  np.random.random((5,5)) #Make a 2d array of random numbers.

print("The original array")
print(random_2d)

print("All elements after the second row")
print(random_2d[2:])

print("All elements after the third column")
print(random_2d[:,3:])

print("All elements after the second row and column")
print(random_2d[2:,2:])

print("The first element")
print(random_2d[0,0])

The original array
[[ 0.64281489  0.56225797  0.3380681   0.87958594  0.43520121]
 [ 0.39613631  0.50423416  0.74635644  0.90014113  0.60509439]
 [ 0.03660969  0.99404361  0.31592843  0.51309934  0.72402514]
 [ 0.74108907  0.35581193  0.20995168  0.92571765  0.24200304]
 [ 0.99195839  0.08377799  0.05384151  0.68187788  0.49275358]]
All elements after the second row
[[ 0.03660969  0.99404361  0.31592843  0.51309934  0.72402514]
 [ 0.74108907  0.35581193  0.20995168  0.92571765  0.24200304]
 [ 0.99195839  0.08377799  0.05384151  0.68187788  0.49275358]]
All elements after the third column
[[ 0.87958594  0.43520121]
 [ 0.90014113  0.60509439]
 [ 0.51309934  0.72402514]
 [ 0.92571765  0.24200304]
 [ 0.68187788  0.49275358]]
All elements after the second row and column
[[ 0.31592843  0.51309934  0.72402514]
 [ 0.20995168  0.92571765  0.24200304]
 [ 0.05384151  0.68187788  0.49275358]]
The first element
0.642814886529


As a data scientist, you'll be working a lot with 2D arrays like this (basically Excel tables and CSV files), so knowing how to subset them is crucial to your success.
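A few more subsetting patterns come up often with table-like data. This is a minimal sketch on a small made-up table; negative indices and boolean masks work much like you might expect from lists and comparisons.

```python
import numpy as np

# A 3x4 table holding the numbers 0 through 11.
table = np.arange(12).reshape(3, 4)

first_column = table[:, 0]   # every row, first column
last_row = table[-1]         # negative indices count from the end
large = table[table > 6]     # boolean mask: keep only values greater than 6

print(first_column)
print(last_row)
print(large)
```

The boolean mask returns a flat 1D array, because the selected values no longer form a rectangle.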

#### Simulations

NumPy also provides probability distributions, such as binomial and uniform, for running event simulations. For example, simulate how many times a coin lands on heads in 500 tosses, and repeat that experiment 500 times:

In [45]:
#Each entry is the number of heads in 500 fair coin tosses; size=500 repeats the experiment 500 times.
sims = np.random.binomial(500, .5, size=500)

print(sims)

[245 257 231 248 254 242 255 229 252 243 267 230 251 259 238 252 256 261
 263 237 246 266 261 247 260 252 251 260 278 248 248 238 264 253 238 243
 286 239 249 242 243 244 241 241 244 241 264 236 259 247 259 257 238 241
 244 249 240 247 241 246 249 243 251 246 260 274 262 243 247 246 243 247
 262 262 224 242 255 239 246 252 228 256 239 246 235 264 247 252 248 250
 245 256 247 239 246 268 239 265 253 239 265 240 267 253 264 259 237 272
 254 247 262 246 249 247 253 243 242 234 248 262 241 248 268 247 260 254
 251 244 259 238 248 258 253 244 249 250 257 245 255 256 243 245 238 250
 263 259 242 254 257 256 263 257 243 252 244 264 235 263 233 276 241 243
 250 263 243 268 237 255 262 236 248 261 256 270 261 258 252 260 251 239
 262 230 258 247 244 260 241 254 229 253 247 254 246 245 249 237 267 249
 271 238 258 252 247 223 241 242 241 242 249 224 246 257 255 237 252 241
 255 266 238 249 247 260 255 243 239 229 232 237 246 264 250 247 242 254
 261 249 252 254 260 256 236 236 266 230 250 271 24
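Once you have the simulated counts, the array methods from earlier summarize them. The sketch below reruns a simulation of the same kind; the fixed seed is an assumption added only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so that reruns give the same numbers

# 500 experiments, each counting heads in 500 fair coin tosses.
sims = np.random.binomial(500, 0.5, size=500)

# Average share of heads across all experiments.
fraction_heads = sims.mean() / 500

print(fraction_heads)
print(sims.min(), sims.max())
```

The average share of heads should land very close to 0.5, even though individual experiments vary noticeably.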

## Exercise

Taken from Harvard's CS109 course. Feel free to consult the numpy documentation if needed: https://docs.scipy.org/doc/numpy-dev/user/index.html

In a game show, contestants try to guess which of 3 closed doors contains a cash prize (goats are behind the other two doors). Of course, the odds of choosing the correct door are 1 in 3. As a twist, the host of the show occasionally opens a door after a contestant makes his or her choice. This door is always one of the two the contestant did not pick, and is also always one of the goat doors (note that it is always possible to do this, since there are two goat doors). At this point, the contestant has the option of keeping his or her original choice, or switching to the other unopened door. The question is: is there any benefit to switching doors?

First, write a function called simulate_prizedoor. This function will simulate the location of the prize in many games; see the detailed specification below.



In [46]:
"""
Function
--------
simulate_prizedoor

Generate a random array of 0s, 1s, and 2s, representing
hiding a prize between door 0, door 1, and door 2

Parameters
----------
nsim : int
    The number of simulations to run

Returns
-------
sims : array
    Random array of 0s, 1s, and 2s

Example
-------
>>> simulate_prizedoor(3)
array([0, 0, 2])
"""
#Your code here.


Next, write a function that simulates the contestant's guesses for nsim simulations. Call this function simulate_guess. The specs:

In [47]:
"""
Function
--------
simulate_guess

Return any strategy for guessing which door a prize is behind. This
could be a random strategy, one that always guesses 2, whatever.

Parameters
----------
nsim : int
    The number of simulations to generate guesses for

Returns
-------
guesses : array
    An array of guesses. Each guess is a 0, 1, or 2

Example
-------
>>> simulate_guess(5)
array([0, 0, 0, 0, 0])
"""
#Your code here


Next, write a function, goat_door, to simulate randomly revealing one of the goat doors that a contestant didn't pick.

In [49]:
"""
Function
--------
goat_door

Simulate the opening of a "goat door" that doesn't contain the prize,
and is different from the contestant's guess

Parameters
----------
prizedoors : array
    The door that the prize is behind in each simulation
guesses : array
    The door that the contestant guessed in each simulation

Returns
-------
goats : array
    The goat door that is opened for each simulation. Each item is 0, 1, or 2, and is different
    from both prizedoors and guesses

Examples
--------
>>> goat_door(np.array([0, 1, 2]), np.array([1, 1, 1]))
array([2, 2, 0])
"""
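def possibilities(door, guess):
    #Doors that differ from both arguments. Keying on door + guess alone is
    #ambiguous: (0, 2) and (1, 1) both sum to 2 but leave different doors.
    options = [d for d in (0, 1, 2) if d != door and d != guess]
    #Two options remain when door == guess, otherwise exactly one.
    return np.random.choice(options)

def fillArray(doors,guesses):
    answer = np.empty(len(doors), dtype=int)
    for i in range(len(doors)):
        answer[i] = possibilities(doors[i], guesses[i])
    return answer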
        
#Your code here


Write a function, switch_guess, that represents the strategy of always switching a guess after the goat door is opened.

In [50]:
"""
Function
--------
switch_guess

The strategy that always switches a guess after the goat door is opened

Parameters
----------
guesses : array
     Array of original guesses, for each simulation
goatdoors : array
     Array of revealed goat doors for each simulation

Returns
-------
The new door after switching. Should be different from both guesses and goatdoors

Examples
--------
>>> switch_guess(np.array([0, 1, 2]), np.array([1, 2, 1]))
array([2, 0, 0])

def switch_guess(guesses,goatdoors):
    return fillArray(goatdoors,guesses)
"""
#Your code here


Last function: write a win_percentage function that takes an array of guesses and an array of prizedoors, and returns the percentage of correct guesses.

In [52]:
"""
Function
--------
win_percentage

Calculate the percent of times that a simulation of guesses is correct

Parameters
-----------
guesses : array
    Guesses for each simulation
prizedoors : array
    Location of prize for each simulation

Returns
--------
percentage : number between 0 and 100
    The win percentage

Examples
---------
>>> print(win_percentage(np.array([0, 1, 2]), np.array([0, 0, 0])))
33.333
"""
#Your code here



Now, put it together. Simulate 10000 games where the contestant keeps his or her original guess, and 10000 games where the contestant switches doors after a goat door is revealed. Compute the percentage of games the contestant wins under each strategy. Is one strategy better than the other?

In [None]:
#Your code here