# Introduction to Numpy

**Outcomes**
- Understand basics about numpy arrays
- Index into multi-dimensional arrays
- Use universal functions/broadcasting to do element-wise operations on arrays

# Numpy Arrays
Now that we have learned the fundamentals of programming in Python, we will learn how we can use Python to perform the computations required in data science and economics. We call these the “scientific Python tools”.

The foundational library that helps us perform these computations is known as `numpy` (numerical Python). Numpy’s core contribution is a new data-type called an array. An array is similar to a list, but numpy imposes some additional restrictions on how the data inside is organized.

These restrictions allow numpy to
1. Be more efficient in performing mathematical and scientific computations.
1. Expose functions that perform the linear algebra needed for machine learning and statistics.

Before we get started, please note that the convention for importing the numpy package is to use the nickname `np`.

In [1]:
import numpy as np

## What is an Array?
An array is a multi-dimensional grid of values. What does this mean? It is easier to demonstrate than to explain.

In this block of code, we build a 1-dimensional array.

In [2]:
# create an array from a list
x_1d = np.array([1, 2, 3])
print(x_1d)

[1 2 3]
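
As a minimal sketch of one of the restrictions mentioned earlier: every element of an array shares a single data type, recorded in the array's `dtype` attribute, whereas a list can mix types freely. The names `mixed_list` and `mixed_array` are illustrative and do not appear elsewhere in this lecture.

```python
import numpy as np

# A Python list can hold values of different types side by side.
mixed_list = [1, 2.5, 3]
print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'int']

# Converting to an array forces one common type for all elements;
# here the integers are promoted to floats.
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64
```

Storing one type per array is part of what lets numpy lay the values out compactly and operate on them quickly.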


In [3]:
# You can think of a 1-dimensional array as a list of numbers.

# We can index like we did with lists
print(x_1d[0])
print(x_1d[0:2])

1
[1 2]


In [4]:
# Note that a slice excludes its end-point: 0:3 covers the whole array, while 0:2 stops before index 2

print(x_1d[0:3] == x_1d[:])
print(x_1d[0:2])

[ True  True  True]
[1 2]


In [5]:
# The differences between lists and arrays emerge as we move into higher dimensions.
# Next, we define a 2-dimensional array (a matrix)

x_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x_2d)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Notice that the data is no longer represented as something flat, but rather, as three rows and three columns of numbers.
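
The "three rows and three columns" can also be read off the array itself. The short sketch below uses the `ndim`, `shape`, and `size` attributes; it rebuilds `x_2d` so that it runs on its own.

```python
import numpy as np

x_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(x_2d.ndim)   # 2 -> number of dimensions
print(x_2d.shape)  # (3, 3) -> (rows, columns)
print(x_2d.size)   # 9 -> total number of elements
```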

The first question that you might ask yourself is: “how do I access the values in this array?”

You access each element by specifying a row first and then a column. For example, if we wanted to access the `6`, we would ask for the (1, 2) element.

In [6]:
print(x_2d[1, 2])  # Indexing into two dimensions!

6


Or to get the top left corner…

In [7]:
print(x_2d[0, 0])  # Indexing into two dimensions!

1


To get the first, and then second rows…

In [8]:
print(x_2d[0, :])
print(x_2d[1, :])

[1 2 3]
[4 5 6]


Or the columns…

In [9]:
print(x_2d[:, 0])
print(x_2d[:, 1])

[1 4 7]
[2 5 8]


This continues to generalize, since numpy gives us as many dimensions as we want in an array. For example, we build a 3-dimensional array below.

In [10]:
x_3d_list = [[[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]]
x_3d = np.array(x_3d_list)
print(x_3d)

[[[ 1  2  3]
  [ 4  5  6]]

 [[10 20 30]
  [40 50 60]]]
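
One way to keep track of the dimensions, sketched here under the assumption that the nested list is rectangular as in this example, is to compare the array's `shape` with the lengths of the lists at each level of nesting. The name `nesting` is illustrative.

```python
import numpy as np

x_3d_list = [[[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]]
x_3d = np.array(x_3d_list)

# Length of the outer list, of one of its elements, and of an inner-most list
nesting = (len(x_3d_list), len(x_3d_list[0]), len(x_3d_list[0][0]))

print(nesting)     # (2, 2, 3)
print(x_3d.shape)  # (2, 2, 3)
```

The first entry of `shape` corresponds to the outer-most list and the last entry to the inner-most lists.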


**Array Indexing**

Now that there are multiple dimensions, indexing might feel somewhat non-obvious. Do the rows or columns come first? In higher dimensions, what is the order of the index? 

Notice that the array is built using a list of lists (you could also use tuples!). Indexing into the array will correspond to choosing elements from each list. First, notice that the dimensions give two stacked matrices, which we can access with

In [11]:
print(x_3d[0])
print(x_3d[1])

[[1 2 3]
 [4 5 6]]
[[10 20 30]
 [40 50 60]]


The first of these, `x_3d[0]`, is synonymous with

In [12]:
print(x_3d[0, :, :])

[[1 2 3]
 [4 5 6]]
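
The equivalence can be checked directly rather than by eye. This sketch uses `np.array_equal`, which returns `True` when two arrays have the same shape and the same elements.

```python
import numpy as np

x_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]])

# Omitting trailing indices is the same as writing `:` for each of them
print(np.array_equal(x_3d[0], x_3d[0, :, :]))  # True
print(np.array_equal(x_3d[1], x_3d[1, :, :]))  # True

# The two stacked matrices are, of course, different from each other
print(np.array_equal(x_3d[0], x_3d[1]))        # False
```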


Let’s work through another example to further clarify this concept with our 3-dimensional array.

Our goal will be to find the index that retrieves the `4` out of `x_3d`.

Recall that when we created `x_3d`, we used the list `[[[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]]`.

Notice that the 0 element of that list is `[[1, 2, 3], [4, 5, 6]]`. This is the list that contains the `4` so the first index we would use is a 0.

In [13]:
print(f"The 0 element is {x_3d_list[0]}")
print(f"The 1 element is {x_3d_list[1]}")

The 0 element is [[1, 2, 3], [4, 5, 6]]
The 1 element is [[10, 20, 30], [40, 50, 60]]


We then move to the lists inside the 0 element of the outer-most dimension. Notice that the two lists at this level are `[1, 2, 3]` and `[4, 5, 6]`.

The 4 is in the second of these lists (index `1`), so the second index we would choose is 1.

In [14]:
print(f"The 0 element of the 0 element is {x_3d_list[0][0]}")
print(f"The 1 element of the 0 element is {x_3d_list[0][1]}")

The 0 element of the 0 element is [1, 2, 3]
The 1 element of the 0 element is [4, 5, 6]


Finally, we move to the inner-most dimension, which has a list of numbers `[4, 5, 6]`.

The 4 is element 0 of this list, so the third, or inner-most, index would be `0`.

In [15]:
print(f"The 0 element of the 1 element of the 0 element is {x_3d_list[0][1][0]}")

The 0 element of the 1 element of the 0 element is 4


Now we can use these same indices to index into the array. With an array, we can index using a single operation rather than the repeated indexing we used with the list, `x_3d_list[0][1][0]`.

Let’s test it to see whether we did it correctly!

In [16]:
print(x_3d[0, 1, 0])

4


Success!
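
To make the contrast with lists concrete, the sketch below shows that the comma-separated index works only on the array. Passing the same index to the nested list raises a `TypeError`, which is caught here so the example runs to completion.

```python
import numpy as np

x_3d_list = [[[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]]
x_3d = np.array(x_3d_list)

# Repeated indexing on the list and a single index on the array agree
from_list = x_3d_list[0][1][0]
from_array = x_3d[0, 1, 0]
print(from_list, from_array)  # 4 4

# A list does not accept a comma-separated index
try:
    x_3d_list[0, 1, 0]
    list_accepts_tuple = True
except TypeError as err:
    list_accepts_tuple = False
    print(f"Indexing the list failed: {err}")
```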

We can also select multiple elements at a time – this is called slicing.

If we wanted to have an array with just `[1, 2, 3]` then we would do

In [17]:
print(x_3d[0, 0, :])

[1 2 3]


Notice that we put a `:` on the dimension where we want to select all of the elements. We can also slice out subsets of the elements with `start:stop`. The element at index `stop` is excluded, so to include it you would write `start:stop+1`.

Notice how the following arrays differ.

In [18]:
print(x_3d[:, 0, :])
print(x_3d[:, 0, 0:2])
print(x_3d[:, 0, :2])  # the 0 in 0:2 is optional

[[ 1  2  3]
 [10 20 30]]
[[ 1  2]
 [10 20]]
[[ 1  2]
 [10 20]]
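
As a further sketch of how slices behave, the example below checks the shape of each result and shows a slice that does not start at 0. The names `all_cols`, `first_two`, and `last_two` are illustrative.

```python
import numpy as np

x_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]])

all_cols = x_3d[:, 0, :]     # row 0 of each matrix, every column
first_two = x_3d[:, 0, 0:2]  # columns 0 and 1 (index 2 is excluded)
last_two = x_3d[:, 0, 1:3]   # columns 1 and 2 (index 3 is excluded)

print(all_cols.shape)   # (2, 3)
print(first_two.shape)  # (2, 2)
print(last_two.tolist())  # [[2, 3], [20, 30]]
```

Indexing a dimension with a single integer removes that dimension from the result, while slicing with `:` or `start:stop` keeps it.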
