## Notebook Topic: Numpy Arrays

<ins>Learning Objectives</ins>

1. To create numpy arrays, apply element-wise operations to them, and subset (slice) them

One of the most important Python libraries in scientific computing is the **numpy** library.  Moreover, the numpy array is one of the most important data structures in Python.  Therefore, it is worth at least one notebook to become comfortable with its features.  

Luckily, we've been working with both the **numpy** library and numpy arrays in previous notebooks! So you already have some familiarity. In this notebook, we'll be more intentional about that learning and follow along with Chapter 4 in our textbook, *Python for Data Analysis*.

**Section I: Element-Wise Operations**

In [1]:
# import numpy
import numpy as np

# create two lists
x = [1.5, -0.1, 3]
y = [0, -3, 6.5]

# can make a list a vector (or 1-dimensional numpy array)
x_np = np.array(x)

# or, can make a matrix from multiple lists (or 2-dimensional numpy array)
mat = np.array([x,y])

In a new code chunk, write out *x_np.shape* and *mat.shape*.  

1. What do you get for each one?  
2. What do you think the *shape* attribute gives you, more generally?

Insert your answers here or in a new text box below.

Often, we want to apply operations to each element of a matrix.  

For example, 

* Let **A** be a dataset where each row represents a home for sale in Berkshire County and each column represents a feature that contributes to the sale of the home (for example, total number of square feet, number of bedrooms, the age of the house, the age of the roof, etc.). The last column of **A** lists the price.

* Currently the prices are outdated. We know inflation has caused the sale price of each home to skyrocket: current prices are 10 times what the dataset lists.

* If there are more than 10 homes listed, updating prices one at a time is tedious. We want a single, easy operation that changes all of those prices at once!
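Here is a minimal sketch of the home-price scenario. It assumes a tiny made-up array *A* of three hypothetical homes, with columns for square feet, bedrooms, house age, and price. Multiplying the last column by 10 updates every price in one element-wise step, with no loop needed.

```python
import numpy as np

# Hypothetical data (not from a real listing): each row is a home; columns are
# square feet, bedrooms, house age (years), price (dollars)
A = np.array([[1500.0, 3.0, 40.0, 200000.0],
              [2200.0, 4.0, 15.0, 350000.0],
              [ 900.0, 2.0, 70.0, 120000.0]])

# Work on a copy so the original data stays untouched
A_updated = A.copy()

# A_updated[:, -1] selects the last column (every row);
# *= 10 multiplies each of those entries by 10 in place
A_updated[:, -1] *= 10

print(A_updated)
```

The slice `A_updated[:, -1]` uses the indexing covered in Section III. The multiplication itself is the same element-wise idea as `10*mat`.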

In [None]:
# element-wise addition
mat + mat

In [None]:
# element-wise multiplication by a scalar
10*mat

3. Based on your observations for the outputs in the previous two code chunks, what do you think "element-wise" means?
4. Do you think *mat\*mat* will produce element-wise multiplication too?  Test your hypothesis!

Insert your answers here or in a new text box below.

These operations are the basic arithmetic operations. Many other mathematical functions (numpy's universal functions, or "ufuncs") are also applied element by element. Check out Table 4.4. Give me at least four of your favorite functions and describe them for me here!
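For instance, here are four universal functions applied to *mat* from Section I, recreated so the cell stands alone. Each returns a new array with the same shape as *mat*, computed entry by entry. These four are only a sample; Table 4.4 lists many more.

```python
import numpy as np

# mat from Section I, recreated so this cell runs on its own
mat = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

abs_vals = np.abs(mat)      # absolute value of each entry
roots = np.sqrt(abs_vals)   # square root of each entry (np.sqrt of a negative gives nan, so take abs first)
exps = np.exp(mat)          # e raised to the power of each entry
floors = np.floor(mat)      # largest integer less than or equal to each entry

for name, result in [("abs", abs_vals), ("sqrt", roots), ("exp", exps), ("floor", floors)]:
    print(name, result.shape)
    print(result)
```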

**Section II: Create Your Own**

Based on what you observed with the creation of *mat*, try to create your own numpy array with 3 rows and 2 columns. Call it *data1*. Use the numbers {1,1,1,1,2,3} in your dataset.

Sometimes you may need "dummy" data.  We can make a dataset of 0's or 1's or random numbers, depending on what serves our needs best.

In [None]:
# create an array of 0's
Z = np.zeros((10,3)) # what are the dimensions of this data?


In [None]:
# create an array of 1's
O = np.ones((3,15)) # how many ones total are in this data?

In [None]:
# create an array of random values from N(0,1) like before
R = np.random.normal(loc = 0, scale = 1, size = (3,5))

Creating an array of random values from N(0,1) can be confusing. Now that you are more familiar with numpy arrays, try to explain what is happening in the previous code chunk!
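One way to explore the call is to check each argument's role. In `np.random.normal`, *loc* is the mean of the normal distribution, *scale* is its standard deviation, and *size* is the shape of the output array. The sketch below also draws a much larger sample (100,000 values, an arbitrary choice) so the sample mean and standard deviation settle near 0 and 1. Your exact numbers will differ on every run because no random seed is set.

```python
import numpy as np

R = np.random.normal(loc=0, scale=1, size=(3, 5))
print(R.shape)  # size sets the shape: 3 rows, 5 columns

# Draw many values to see loc (the mean) and scale (the standard deviation) at work
big = np.random.normal(loc=0, scale=1, size=100_000)
print(big.mean())  # close to 0
print(big.std())   # close to 1
```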

**Section III: Subsetting aka Slicing**

In some instances we may only want a subset of a larger dataset. It helps to know different ways of slicing the data to access just the portion we need.

In [None]:
# I want just the second and last elements of x_np, so I try the slice below.
# What do you notice about the output?
x_np[1:2]

In [None]:
# Now I try
x_np[1:3]
# What can you say about slicing the array?
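To see the slicing rule on a longer array, here is a sketch using `np.arange(10)`, which holds the integers 0 through 9. The array is chosen only for illustration. In a slice, the start index is included, the stop index is excluded, and either one may be left out.

```python
import numpy as np

arr = np.arange(10)      # the integers 0, 1, 2, ..., 9

first_three = arr[0:3]   # indices 0, 1, 2 (the stop index 3 is excluded)
middle = arr[3:7]        # indices 3, 4, 5, 6
from_five = arr[5:]      # stop omitted: index 5 through the end
up_to_four = arr[:4]     # start omitted: the beginning through index 3
last = arr[-1]           # negative indices count back from the end
last_two = arr[-2:]      # the final two elements

print(first_three, middle, from_five, up_to_four, last, last_two)
```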

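For a two-dimensional array, we slice each dimension separately. Inside the brackets, the row selection comes first and the column selection second, separated by a comma.

State which rows and columns of *R* (3 rows, 5 columns) are selected by each line of code.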

In [None]:
R[0:2,1:3]

In [None]:
R[:3,0:1]

In [None]:
R[0:3,3:]

In [None]:
R[2]

In [None]:
R[:]

In [None]:
R[0,2]

In [None]:
R[0][2]
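Because *R* is random, it can be hard to see which entries each slice picks out. The sketch below uses a predictable 3 x 5 array, `np.arange(15).reshape(3, 5)`, chosen only so that each entry encodes its own position. This makes the row-then-column rule visible.

```python
import numpy as np

# Entry at row i, column j equals 5*i + j
M = np.arange(15).reshape(3, 5)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]]

block = M[0:2, 1:3]   # rows 0-1, columns 1-2
col0 = M[:3, 0:1]     # all three rows, column 0 (a slice keeps it 2-D: shape (3, 1))
row2 = M[2]           # all of row 2 (shape (5,))
single = M[0, 2]      # one entry: row 0, column 2
same = M[0][2]        # the same entry, by taking row 0 first, then position 2

print(block)
print(col0.shape, row2, single, same)
```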

We can also subset with Boolean arrays! This selects elements based on a particular condition.

For example, given a data set of student names, we can use Booleans to find which students share a name, such as every student named Mario.

In [None]:
students = np.array(["Mario", "Luigi", "Peach", "Wario", "Mario", "Bowser", "Mario", "Wario"])
gpa = np.array([1.1, 2.0, 3.2, 2.8, 3.3, 3.6, 4.8, 1.9])
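# .T transposes (swaps rows and columns): the 2 x 8 array becomes 8 x 2, one row per student.
# Because names and numbers are mixed, numpy stores every entry, including the GPAs, as a string.
data = np.array([students, gpa]).T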

indx = students == "Mario"
np.where(indx)

In the code chunk above, *indx* is a Boolean array that is ```True``` wherever the name *Mario* appears and ```False``` otherwise. The numpy function ```np.where``` then returns the locations (indices) of the ```True``` values.
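Here is a quick check of what `np.where` returns, using the same *students* array (recreated so the cell runs on its own). With a single Boolean argument, `np.where` returns a tuple that holds one array of indices per dimension. For a 1-D array, the positions are in element `[0]`.

```python
import numpy as np

students = np.array(["Mario", "Luigi", "Peach", "Wario", "Mario", "Bowser", "Mario", "Wario"])

indx = students == "Mario"
print(indx)                  # one True/False value per student

positions = np.where(indx)   # a tuple containing one array of indices
print(positions[0])          # the indices where indx is True
```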

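Now we can use that Boolean array to pull out the rows of *data*, and therefore the GPAs, for the different Marios! Because *data* mixes names and numbers, its GPAs come back as strings. Applying the same mask to *gpa* directly, as in ```gpa[indx]```, keeps them numeric.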

In [None]:
data[indx]  # the rows where the name is Mario (name and GPA, both stored as strings)
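The original goal was to find how many students share a name. Here is a minimal sketch of two ways to count. Summing a Boolean array counts its `True` values. Alternatively, `np.unique` with `return_counts=True` tallies every distinct name at once.

```python
import numpy as np

students = np.array(["Mario", "Luigi", "Peach", "Wario", "Mario", "Bowser", "Mario", "Wario"])
gpa = np.array([1.1, 2.0, 3.2, 2.8, 3.3, 3.6, 4.8, 1.9])

indx = students == "Mario"
n_mario = indx.sum()        # True counts as 1 and False as 0
mario_gpas = gpa[indx]      # the same mask applied to gpa keeps the values numeric

names, counts = np.unique(students, return_counts=True)  # names come back sorted
shared = names[counts > 1]  # names that appear more than once

print(n_mario, mario_gpas)
print(names, counts)
print(shared)
```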

You could get more creative, but we'll wait until we're dealing with **pandas** data sets.  However, please read Chapter 4 if you are finding this really interesting.

## Conclusion

What is one thing you find useful about numpy arrays that you didn't know before?