In [1]:
import numpy as np

# Numpy

Numpy is a library for representing and working with large and multi-dimensional arrays. Most other libraries in the data-science ecosystem depend on numpy, making it one of the fundamental data science libraries.

Numpy provides a number of useful tools for scientific programming, and in this lesson, we'll take a look at some of the most common.

By convention, the `numpy` module is imported as `np`.

`import numpy as np`

## Indexing

Numpy provides an array type that goes above and beyond what Python's built-in lists can do.

We can create a numpy array by passing a list to the `np.array` function:

In [2]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

In [3]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

At its most basic, referencing elements in numpy arrays is the same as referencing elements in Python lists.

`a[0]`

1

In [4]:
print('a    == {}'.format(a))
print('a[0] == {}'.format(a[0]))
print('a[1] == {}'.format(a[1]))
print('a[2] == {}'.format(a[2]))

a    == [1 2 3]
a[0] == 1
a[1] == 2
a[2] == 3


However, multidimensional numpy arrays are easier to index into than nested lists: a single pair of brackets takes one index per dimension, separated by commas. To obtain the element at the second column in the second row, we would write:

`matrix[1, 1]`

5

To get the first 2 elements of the last 2 rows:

`matrix[1:, :2]`

array([[4, 5],
       [7, 8]])
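For comparison, here is a small sketch of the same two lookups done with a plain nested list and with a numpy array. The variable names (`nested`, `list_block`, and so on) are made up for illustration.

```python
import numpy as np

nested = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
matrix = np.array(nested)

# A nested list needs one pair of brackets per dimension.
list_element = nested[1][1]
# Selecting columns from a nested list takes a comprehension over the rows.
list_block = [row[:2] for row in nested[1:]]

# A numpy array takes all of the indices inside one pair of brackets.
array_element = matrix[1, 1]
array_block = matrix[1:, :2]

print(list_element, array_element)
print(list_block)
print(array_block)
```

Both approaches select the same values; the numpy version just expresses the row and column selection in a single step.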

Arrays can also be indexed with a boolean sequence used to indicate which values should be included in the resulting array.

should_include_elements = [True, False, True]
a[should_include_elements]

array([1, 3])

Note that the boolean sequence must be the same length as the array being indexed.
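A quick way to see this rule in action is to try a mask that is too short. The sketch below assumes a reasonably recent numpy version, where a mismatched boolean mask raises an `IndexError`; the names `good_mask` and `short_mask` are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

# A mask with one entry per element works as expected.
good_mask = np.array([True, False, True])
selected = a[good_mask]
print(selected)

# A mask with only two entries does not match the three elements of `a`.
short_mask = np.array([True, False])
try:
    a[short_mask]
    mismatch_raised = False
except IndexError as e:
    mismatch_raised = True
    print(f'IndexError: {e}')
```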

## Vectorized Operations

Another useful feature of numpy arrays is vectorized operations.

Suppose we want to add 1 to every element in a list. Without numpy, we can't simply add 1 to the list, as that results in a `TypeError`.



In [5]:
original_array = [1, 2, 3, 4, 5]
try:
    original_array + 1
except TypeError as e:
    print('An Error Occurred!')
    print(f'TypeError: {e}')

An Error Occurred!
TypeError: can only concatenate list (not "int") to list


Instead, we might write a for loop or a list comprehension:

In [6]:
original_array = [1, 2, 3, 4, 5]
array_with_one_added = []
for n in original_array:
    array_with_one_added.append(n + 1)
print(array_with_one_added)

[2, 3, 4, 5, 6]


In [7]:
original_array = [1, 2, 3, 4, 5]
array_with_one_added = [n + 1 for n in original_array]
print(array_with_one_added)

[2, 3, 4, 5, 6]


Vectorizing operations means that operations are automatically applied to every element in a vector, which in our case will be a numpy array. So if we are working with a numpy array, we can simply add 1:

In [8]:
original_array = np.array([1, 2, 3, 4, 5])
original_array + 1

array([2, 3, 4, 5, 6])
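To confirm that the vectorized version gives the same answer as the list comprehension, we can compare the two results directly. This is only a sketch: `np.array_equal` is used here for the element-by-element comparison, and the variable names are illustrative. Note that adding 1 produces a new array and leaves the original untouched.

```python
import numpy as np

original_list = [1, 2, 3, 4, 5]
original_array = np.array(original_list)

# The plain Python approach and the vectorized numpy approach.
from_comprehension = [n + 1 for n in original_list]
from_numpy = original_array + 1

# True when both hold the same values in the same order.
results_match = np.array_equal(from_numpy, from_comprehension)
print(results_match)

# The original array is unchanged by the addition.
print(original_array)
```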

This works the same way for the other basic arithmetic operators as well.


In [9]:
my_array = np.array([-3, 0, 3, 16])

print('my_array      == {}'.format(my_array))
print('my_array - 5  == {}'.format(my_array - 5))
print('my_array * 4  == {}'.format(my_array * 4))
print('my_array / 2  == {}'.format(my_array / 2))
print('my_array ** 2 == {}'.format(my_array ** 2))
print('my_array % 2  == {}'.format(my_array % 2))

my_array      == [-3  0  3 16]
my_array - 5  == [-8 -5 -2 11]
my_array * 4  == [-12   0  12  64]
my_array / 2  == [-1.5  0.   1.5  8. ]
my_array ** 2 == [  9   0   9 256]
my_array % 2  == [1 0 1 0]


Not only are the arithmetic operators vectorized, but the same applies to the comparison operators.

In [10]:
my_array = np.array([-3, 0, 3, 16])

print('my_array       == {}'.format(my_array))
print('my_array == -3 == {}'.format(my_array == -3))
print('my_array >= 0  == {}'.format(my_array >= 0))
print('my_array < 10  == {}'.format(my_array < 10))

my_array       == [-3  0  3 16]
my_array == -3 == [ True False False False]
my_array >= 0  == [False  True  True  True]
my_array < 10  == [ True  True  True False]


Knowing what we know about indexing numpy arrays, we can use the comparison operators to select a certain subset of an array.

For example, we can get all the positive numbers in `my_array` like so:

`my_array[my_array > 0]`

array([ 3, 16])
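It can help to name the intermediate boolean array and inspect it before using it as an index. The following is a minimal sketch; `mask`, `positives`, and `positives_from_list` are illustrative names, and the list comprehension is included only to show the equivalent plain Python filter.

```python
import numpy as np

my_array = np.array([-3, 0, 3, 16])

# The comparison yields a boolean array with one entry per element.
mask = my_array > 0
print(mask)

# Indexing with the mask keeps only the elements where the mask is True.
positives = my_array[mask]
print(positives)

# The same filter written with a list comprehension.
positives_from_list = [n for n in my_array.tolist() if n > 0]
print(positives_from_list)
```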

### In-Depth Example

As another example, we could obtain all the even numbers like this:

`my_array[my_array % 2 == 0]`

array([ 0, 16])

To better understand how this is all working let's go through the above example in a little more detail.

The first expression that gets evaluated is this:

`my_array % 2`

array([1, 0, 1, 0])

This results in an array of 1s and 0s. That array is then compared to 0 with the `==` operator, producing an array of `True` or `False` values.

result = my_array % 2
result == 0

array([False,  True, False,  True])

Lastly, we use this array of boolean values to index into the original array, giving us only the values that are evenly divisible by 2.

step_1 = my_array % 2
step_2 = step_1 == 0
step_3 = my_array[step_2]

step_3

array([ 0, 16])

Put another way, here is how the expression is evaluated:

In [12]:
print('1. my_array[my_array % 2 == 0]')
print('    - the original expression')
print('2. my_array[{} % 2 == 0]'.format(my_array))
print('    - variable substitution')
print('3. my_array[{} == 0]'.format(my_array % 2))
print('    - result of performing the vectorized modulus 2')
print('4. my_array[{}]'.format(my_array % 2 == 0))
print('    - result of comparing to 0')
print('5. {}[{}]'.format(my_array, my_array % 2 == 0))
print('    - variable substitution')
print('6. {}'.format(my_array[my_array % 2 == 0]))
print('    - our final result')

1. my_array[my_array % 2 == 0]
    - the original expression
2. my_array[[-3  0  3 16] % 2 == 0]
    - variable substitution
3. my_array[[1 0 1 0] == 0]
    - result of performing the vectorized modulus 2
4. my_array[[False  True False  True]]
    - result of comparing to 0
5. [-3  0  3 16][[False  True False  True]]
    - variable substitution
6. [ 0 16]
    - our final result


## Array Creation

Numpy provides several functions for creating arrays. We'll take a look at a few of them.

`np.random.randn` can be used to create an array of specified length of random numbers drawn from the standard normal distribution.

`np.random.randn(10)`

array([-1.63526513,  0.4437877 , -0.026761  ,  0.91365701, -0.19552803,
        0.65391594, -1.3590744 ,  0.01449514, -1.22718349, -0.48087435])

We can also pass a second argument to this function to create a two-dimensional array; the two arguments give the number of rows and the number of columns.

`np.random.randn(3, 4)`

array([[-0.67528597, -1.44504125,  0.63126959,  1.0732026 ],
       [ 1.58057546,  0.67135057,  1.49905094,  0.26424952],
       [-0.21247359,  0.38302284,  0.51563093,  0.23534614]])
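Since the values themselves change on every run, the `shape` attribute is a convenient way to check what was created. A short sketch, with illustrative variable names:

```python
import numpy as np

one_dimensional = np.random.randn(10)
two_dimensional = np.random.randn(3, 4)

# One argument gives a flat array; two arguments give rows and columns.
print(one_dimensional.shape)
print(two_dimensional.shape)
```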

If we wish to draw from a normal distribution with mean μ and standard deviation σ, we'll need to apply some arithmetic. Recall that to convert from the standard normal distribution, we'll need to multiply by the standard deviation, and add the mean.

In [13]:
mu = 100
sigma = 30

sigma * np.random.randn(20) + mu

array([ 86.28715548, 139.09629272,  50.13060488,  66.42652683,
       122.86004651, 115.3886892 ,  86.8567625 ,  74.48863732,
       149.83909496,  84.68704377,  88.02943748, 100.65759118,
        69.19293333, 131.01162715, 114.89445081, 135.14889143,
       132.32794712, 103.32209517, 149.14116326,  98.34992872])
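One way to sanity-check this transformation is to draw a much larger sample and look at its mean and standard deviation, which should land close to `mu` and `sigma`. The sketch below makes a few assumptions of its own: the sample size of 100,000 and the seed value are arbitrary choices, and `np.random.seed` is used only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary fixed seed so the run is repeatable

mu = 100
sigma = 30

# Scale the standard normal draws by sigma, then shift them by mu.
samples = sigma * np.random.randn(100_000) + mu

sample_mean = samples.mean()
sample_std = samples.std()
print(f'mean is roughly {sample_mean:.2f}')
print(f'std is roughly {sample_std:.2f}')
```

With this many samples, the printed values should differ from 100 and 30 by only a small fraction of a unit.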

The `zeros` and `ones` functions create arrays of a specified size full of either 0s or 1s, and the `full` function creates an array of the specified size filled with a value of our choosing.

In [14]:
print('np.zeros(3)    == {}'.format(np.zeros(3)))
print('np.ones(3)     == {}'.format(np.ones(3)))
print('np.full(3, 17) == {}'.format(np.full(3, 17)))

np.zeros(3)    == [0. 0. 0.]
np.ones(3)     == [1. 1. 1.]
np.full(3, 17) == [17 17 17]


We can also use these methods to create multi-dimensional arrays by passing a tuple of the dimensions of the desired array, instead of a single integer value.

`np.zeros((2, 3))`

array([[0., 0., 0.],
       [0., 0., 0.]])
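The trailing dots in the `zeros` and `ones` output above (`0.`, `1.`) indicate floating point values, while `np.full(3, 17)` printed plain integers. As a rough sketch of why: `zeros` and `ones` default to a float data type, whereas `full` takes its data type from the fill value. The variable names below are illustrative.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))
ones_1d = np.ones(3)
full_2d = np.full((2, 2), 17)

# The shape matches the tuple (or integer) that was passed in.
print(zeros_2d.shape, ones_1d.shape, full_2d.shape)

# zeros and ones produce floats by default; full follows the fill value.
print(zeros_2d.dtype, ones_1d.dtype, full_2d.dtype)
```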


Numpy's `arange` function is very similar to Python's built-in `range` function. It can take a single argument and generate a range from zero up to, but not including, the passed number.

`np.arange(4)`

array([0, 1, 2, 3])


We can also specify a starting point for the range:

`np.arange(1, 4)`

array([1, 2, 3])

As well as a step:

`np.arange(1, 4, 2)`

array([1, 3])


Unlike Python's built-in `range`, numpy's `arange` can handle decimal numbers:

`np.arange(3, 5, 0.5)`

array([3. , 3.5, 4. , 4.5])
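To make the contrast with `range` concrete, the sketch below tries the same decimal step with both functions. The variable names are illustrative; the integer case is included to show that `arange` and `range` agree when both can be used.

```python
import numpy as np

# The built-in range only accepts integers, so a step of 0.5 fails.
try:
    range(3, 5, 0.5)
    range_accepts_floats = True
except TypeError as e:
    range_accepts_floats = False
    print(f'TypeError: {e}')

# numpy's arange handles the same arguments without complaint.
decimal_steps = np.arange(3, 5, 0.5)
print(decimal_steps)

# With integer arguments, arange produces the same values as range.
integer_steps = np.arange(1, 4, 2)
print(integer_steps.tolist() == list(range(1, 4, 2)))
```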