# 1. Setting Up


## Imports

We have used numpy before, importing it as np. This week we'll really dive into it.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from scipy.sparse import csr_matrix

%matplotlib inline

# 2. Overview

`numpy` is a popular Python library for numerical processing.

`numpy`'s primary data structure is the `numpy.array`. An array will store a sequence of values *of the same type*.

A good analog to a `numpy.array` is the `pandas.Series`. However, an array is always indexed by integer positions, like a `list`.

Here is an example of the syntax to create a `numpy.array`.

In [2]:
x = np.array([1, 2, 3, 4])
print(x)

# If there are mixed types, the array coerces values to the same type (int -> float)
y = np.array([3.14, 2, 3])
print(y)

[1 2 3 4]
[3.14 2.   3.  ]


In [3]:
# The type of the object returned after calling np.array() is a numpy class called an ndarray (n-dimensional array)
print(type(x))
print(type(y))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
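Because every element shares one type, each array records that type in its `dtype` attribute. A minimal check, redefining the two arrays from above so the snippet runs on its own:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([3.14, 2, 3])

# dtype is the single element type shared by every value in the array
print(x.dtype)  # an integer type; the exact width (e.g. int64) depends on the platform
print(y.dtype)  # float64: the 3.14 forced the integers 2 and 3 to become floats
```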


# 3. Specialized Arrays

There are some quick ways to create commonly used arrays.

In [4]:
# array initializer takes a list of values
x = np.array([1,2,3])

# arange works like range to make an array of numbers from start (inclusive) to stop (exclusive)
y = np.arange(1,5)

# Make an array of all 1s of some length
z = np.ones(5)

print(x)
print(y)
print(z)

[1 2 3]
[1 2 3 4]
[1. 1. 1. 1. 1.]
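A few other constructors follow the same pattern. The sketch below uses `np.zeros`, `np.full`, `np.arange` with a step, and `np.linspace`, all standard NumPy functions; the specific values are arbitrary choices for illustration.

```python
import numpy as np

# All 0s, like np.ones (also float by default)
zeros = np.zeros(3)

# Every entry set to the same fill value
sevens = np.full(3, 7)

# arange also accepts a step: start 0 (inclusive), stop 10 (exclusive), step 2
evens = np.arange(0, 10, 2)

# linspace asks for a *count* of evenly spaced points; both endpoints are included
points = np.linspace(0, 1, 5)  # 5 points: 0, 0.25, 0.5, 0.75, 1

print(zeros)   # [0. 0. 0.]
print(sevens)  # [7 7 7]
print(evens)   # [0 2 4 6 8]
print(points)
```

The practical difference: `arange` fixes the spacing and lets the count follow, while `linspace` fixes the count and lets the spacing follow.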


# 4. Increasing Dimensionality

Numpy arrays are great for representing multi-dimensional data efficiently.

In [5]:
# np.ones also accepts a shape tuple (rows, columns), producing an all-ones matrix (2D array)
x = np.ones((3,4))
print(x)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
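The shape tuple is not limited to two entries. As a sketch, passing three sizes gives a 3D array, and the `ndim` attribute reports how many dimensions an array has:

```python
import numpy as np

# 2 "layers", each a 3x4 grid of zeros
cube = np.zeros((2, 3, 4))

print(cube.ndim)   # 3
print(cube.shape)  # (2, 3, 4)
print(cube.size)   # 24 = 2 * 3 * 4 total entries
```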


The `reshape` function allows us to take an array and change its shape while maintaining its data.

In [6]:
# Create an array of the values 0 to 20 (exclusive)
x = np.arange(20)
print('Before reshape')
print(x)
print()

# Reshape the array such that it has dimensions 5x4 (5 rows, 4 columns)
y = x.reshape((5,4))
print('After reshape')
print(y)

Before reshape
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

After reshape
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
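`reshape` can also infer one dimension: passing `-1` for a single axis tells NumPy to compute that size from the total number of elements. A short sketch on the same 20-element array:

```python
import numpy as np

x = np.arange(20)

# -1 means "whatever size makes the element count work out": 20 / 5 = 4 columns
a = x.reshape((5, -1))

# Same idea with the rows inferred: 20 / 10 = 2 rows
b = x.reshape((-1, 10))

print(a.shape)  # (5, 4)
print(b.shape)  # (2, 10)

# reshape returns a new array object; x itself keeps its original 1D shape
print(x.shape)  # (20,)
```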


In [7]:
# What happens if we reshape to a different number of entries?

# Fewer entries (6 x 3 = 18, but x has 20)
z = x.reshape((6,3))
print(z)

ValueError: cannot reshape array of size 20 into shape (6,3)

In [8]:
# What happens if we reshape to a different number of entries?

# More entries (6 x 4 = 24, but x has 20)
z = x.reshape((6,4))
print(z)

ValueError: cannot reshape array of size 20 into shape (6,4)
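Both failures come from the same rule: the product of the new dimensions must equal the array's element count, `x.size`. A minimal sketch of checking that rule up front; the helper name `can_reshape` is hypothetical, not part of NumPy, and it does not handle `-1`:

```python
import numpy as np

def can_reshape(arr, shape):
    """Hypothetical helper: True if arr can be reshaped to `shape` (no -1 support)."""
    # np.prod multiplies the requested dimensions together
    return int(np.prod(shape)) == arr.size

x = np.arange(20)

for shape in [(5, 4), (6, 3), (6, 4)]:
    print(shape, can_reshape(x, shape))
# (5, 4) True   -> 20 entries
# (6, 3) False  -> 18 entries, too few
# (6, 4) False  -> 24 entries, too many
```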

# 5. Accessing Data

For 2D arrays, we index with `[row, column]`, much like pandas' `.iloc`, to access a specific value, row, or column.

We can use Python's "slice" syntax to access multiple rows and columns.

In [9]:
# TODO: create a 5 by 4 array of the integers 1 to 20 (inclusive)
x = np.arange(1, 21).reshape(5, 4)
print('x')
print(x)
print()

x
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]



In [10]:
# What will this cell output?

# Access one value
print('First: x[1,2]')
x1 = x[1,2]
print(x1)
print()

First: x[1,2]
7
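Two related forms are worth knowing. Chained indexing `x[1][2]` reaches the same value in two steps (first the row, then the element), and negative indices count back from the end, as with Python lists. A sketch on the same 5x4 array:

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)

# One indexing operation with a (row, column) pair
print(x[1, 2])    # 7
# Two steps: x[1] is the row [5 6 7 8], then [2] picks its third entry
print(x[1][2])    # 7
# Negative indices count from the end: last row, last column
print(x[-1, -1])  # 20
```

The single `[row, column]` form is usually preferred, since it does the lookup in one operation.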



In [11]:
# Use slice notation [a:b, c:d]; in each a:b, the start a is inclusive and the stop b is exclusive

# Slice only the first index (rows)
print('Second: x[3:5, 1]')
x2 = x[3:5, 1]
print(x2)
print()

# Slice only the second index (columns)
print('Third: x[3, 1:3]')
x3 = x[3, 1:3]
print(x3)
print()

# Slice both indices; bounds past the end are clipped (x has only 5 rows and 4 columns)
print('Fourth: x[0:6, 2:5]')
x4 = x[0:6, 2:5]
print(x4)

Second: x[3:5, 1]
[14 18]

Third: x[3, 1:3]
[14 15]

Fourth: x[0:6, 2:5]
[[ 3  4]
 [ 7  8]
 [11 12]
 [15 16]
 [19 20]]
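Slices also accept an optional step, `start:stop:step`, just like Python lists. As a sketch, a step of 2 on both axes keeps every other row and every other column:

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)

# Rows 0, 2, 4 and columns 0, 2
every_other = x[::2, ::2]
print(every_other)
# [[ 1  3]
#  [ 9 11]
#  [17 19]]
```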


In [12]:
# A bare colon in a slice means "everything along this axis"
print('Fifth: x[1, :]')
x5 = x[1, :]
print(x5) # Everything in row 1 (the second row)
print()

print('Sixth: x[:, 3]')
x6 = x[:, 3]
print(x6) # Everything in column 3 (the fourth column)
print()

# Reshape x6 to look like a column
print('Reshaped to look like a column')
print(x6.reshape(5,1))

Fifth: x[1, :]
[5 6 7 8]

Sixth: x[:, 3]
[ 4  8 12 16 20]

Reshaped to look like a column
[[ 4]
 [ 8]
 [12]
 [16]
 [20]]
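Reshaping is one way to get a column. Another, sketched below, is to slice the column with a one-element range (`3:4`) instead of a single index, which keeps the second dimension; `reshape(-1, 1)` does the same as `reshape(5, 1)` without hard-coding the row count.

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)

# A single index drops the column axis: shape (5,)
flat = x[:, 3]
# A one-element slice keeps it: shape (5, 1)
col = x[:, 3:4]
# reshape with -1 infers the row count (5) from the data
col2 = flat.reshape(-1, 1)

print(flat.shape, col.shape, col2.shape)  # (5,) (5, 1) (5, 1)
```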


# 6. Shape

Because an array's shape matters so much, it is useful to inspect it directly. Every `numpy.array` has a `shape` attribute: a tuple giving the array's length along each dimension.

In [13]:
# What does each of these return? How do you interpret the result?
# Type your predictions in chat, but wait to send them all at once

print('x.shape')
print(x.shape)
print()

print('x1.shape')
print(x1.shape)
print()

print('x2.shape')
print(x2.shape)
print()

print('x3.shape')
print(x3.shape)
print()

print('x4.shape')
print(x4.shape)
print()

x.shape
(5, 4)

x1.shape
()

x2.shape
(2,)

x3.shape
(2,)

x4.shape
(5, 2)
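`shape` pairs naturally with two other attributes: `ndim`, the number of dimensions (the length of the shape tuple), and `size`, the total number of elements (the product of the shape). A sketch using the same `x`, plus Python's built-in `len`, which reports only the first dimension:

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)

print(x.ndim)  # 2   == len(x.shape)
print(x.size)  # 20  == 5 * 4
print(len(x))  # 5   (rows only, the first entry of x.shape)

# A single element has an empty shape: zero dimensions, one element
v = x[1, 2]
print(np.ndim(v), np.size(v))  # 0 1
```

This is why `x1.shape` printed `()`: a single value has no dimensions at all.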



In [14]:
# What happens when we reshape the x4 array?

print('x4')
print(x4)
print()

print('x4.shape')
print(x4.shape)
print()


# Reshape it
x4_r = x4.reshape(2,5)

print('x4_r')
print(x4_r)
print()

print('x4_r.shape')
print(x4_r.shape)
print()

x4
[[ 3  4]
 [ 7  8]
 [11 12]
 [15 16]
 [19 20]]

x4.shape
(5, 2)

x4_r
[[ 3  4  7  8 11]
 [12 15 16 19 20]]

x4_r.shape
(2, 5)



Note that this is *not* the same as the transpose operation! Reshaping preserves the order of the elements, read left to right and top to bottom, and simply refills them into the new shape.
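To see the difference concretely, here is a sketch comparing `x4.T` (the transpose, which turns columns into rows) with `x4.reshape(2, 5)`. Both results have shape `(2, 5)`, but only the transpose keeps each original column together:

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)
x4 = x[0:6, 2:5]  # the same 5x2 slice as above

transposed = x4.T            # row i of the result is column i of x4
reshaped = x4.reshape(2, 5)  # elements re-read in row-major order

print(transposed)
# [[ 3  7 11 15 19]
#  [ 4  8 12 16 20]]
print(reshaped)
# [[ 3  4  7  8 11]
#  [12 15 16 19 20]]
```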

# 7. Functions

Numpy provides functions that can be applied to whole arrays or to slices of them, and most of the standard ones we might want (`mean`, `max`, `min`, and so on) are supported.

In [24]:
# Reusing x from above (integers 1 - 20 inclusive, in a 5x4 array)
print('x')
print(x)
print()

print('Mean x')
print(np.mean(x))
print()

print('Max third row of x')
# TODO: Find the max of the third row of x. Get ready to type it in chat.
print(np.max(x[2,:]))
print()

print('Min second column of x')
# TODO: Find the min of the second column of x. Get ready to type it in chat.
print(np.min(x[:,1]))
print()

# Technically, we could have also used our knowledge of the data to answer this question without computation.
# We know how the data is distributed across the array; in particular, elements increase left to right and top to bottom.
# Leveraging this knowledge would save us computation on very large, high-dimensional arrays.

x
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]

Mean x
10.5

Max third row of x
12

Min second column of x
2
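Most of these functions are also available as array methods, so `np.max(a)` and `a.max()` are interchangeable. The sketch below shows that, plus `np.argmax`, which returns the *position* of the maximum rather than its value, and the shortcut mentioned in the comments above: because `x` increases left to right, the last entry of a row is its maximum.

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)
row = x[2, :]  # third row: [ 9 10 11 12]

print(np.max(row), row.max())  # 12 12
print(np.argmax(row))          # 3, the index of the max within the row
print(x.sum())                 # 210, the sum of 1..20

# Shortcut that only holds because each row of x is sorted ascending
assert row[-1] == row.max()
```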



In [25]:
# Another syntax: pass axis= to apply the function along one dimension of the array
print(np.max(x, axis=0))
print(np.max(x, axis=1))

# What are we returning here?

[17 18 19 20]
[ 4  8 12 16 20]


`axis=0` reduces along the rows (down each column), giving one result per column; `axis=1` reduces along the columns (across each row), giving one result per row.
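One way to remember it: the axis you pass is the one that disappears from the result's shape. A sketch with `np.sum` on the same 5x4 `x`, where `keepdims=True` keeps the collapsed axis with length 1:

```python
import numpy as np

x = np.arange(1, 21).reshape(5, 4)

col_sums = np.sum(x, axis=0)  # collapse the 5 rows: one total per column
row_sums = np.sum(x, axis=1)  # collapse the 4 columns: one total per row

print(col_sums, col_sums.shape)  # [45 50 55 60] (4,)
print(row_sums, row_sums.shape)  # [10 26 42 58 74] (5,)

# keepdims=True leaves the reduced axis in place with length 1
print(np.sum(x, axis=1, keepdims=True).shape)  # (5, 1)
```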