In [1]:
import numpy as np

# Numpy

Numpy is a library for representing and working with large and multi-dimensional arrays. Most other libraries in the data-science ecosystem depend on numpy, making it one of the fundamental data science libraries.

Numpy provides a number of useful tools for scientific programming, and in this lesson, we'll take a look at some of the most common.

By convention, the `numpy` module is imported as `np`:

`import numpy as np`

## Indexing

Numpy provides an array type that goes above and beyond what Python's built-in lists can do.

We can create a numpy array by passing a list to the `np.array` function:

In [2]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

In [3]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

At its most basic, referencing elements in a numpy array works the same way as referencing elements in a Python list.

`a[0]`

1

In [4]:
print('a    == {}'.format(a))
print('a[0] == {}'.format(a[0]))
print('a[1] == {}'.format(a[1]))
print('a[2] == {}'.format(a[2]))

a    == [1 2 3]
a[0] == 1
a[1] == 2
a[2] == 3


However, multidimensional numpy arrays are easier to index into than nested lists. To obtain the element in the second row and second column, we would write:

`matrix[1, 1]`

5

To get the first 2 elements of the last 2 rows:

`matrix[1:, :2]`

array([[4, 5],
       [7, 8]])
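
To make the comparison with plain Python concrete, here is a minimal sketch that performs the same two lookups on a nested list and on the numpy array. The `nested` variable and the `*_from_list` / `*_from_array` names are introduced here for illustration only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
# Hypothetical nested-list copy of the same data, for comparison only.
nested = matrix.tolist()

# A nested list needs one pair of brackets per dimension.
element_from_list = nested[1][1]
# Slicing both rows and columns of a nested list takes a comprehension.
block_from_list = [row[:2] for row in nested[1:]]

# The numpy array accepts every index inside a single pair of brackets.
element_from_array = matrix[1, 1]
block_from_array = matrix[1:, :2]

print(element_from_list, element_from_array)
print(block_from_list)
print(block_from_array)
```

Both approaches select the same values; the numpy form just avoids the extra comprehension.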

Arrays can also be indexed with a boolean sequence used to indicate which values should be included in the resulting array.

should_include_elements = [True, False, True]
a[should_include_elements]

array([1, 3])

Note that the boolean sequence must be the same length as the array being indexed.
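
The length requirement can be checked with a short sketch. It assumes a reasonably recent NumPy release, where a boolean mask of the wrong length raises an `IndexError`. The `too_short` and `mismatch_raised` names are made up for this example.

```python
import numpy as np

a = np.array([1, 2, 3])

# A mask of matching length selects the elements marked True.
selected = a[np.array([True, False, True])]
print(selected)

# A mask that is too short is rejected instead of being padded.
too_short = np.array([True, False])
try:
    a[too_short]
    mismatch_raised = False
except IndexError as e:
    mismatch_raised = True
    print(f'IndexError: {e}')
```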

## Vectorized Operations

Another useful feature of numpy arrays is vectorized operations.

Suppose we want to add 1 to every element in a list. Without numpy, we can't simply add 1 to the list itself, as that raises a `TypeError`.



In [5]:
original_array = [1, 2, 3, 4, 5]
try:
    original_array + 1
except TypeError as e:
    print('An Error Occurred!')
    print(f'TypeError: {e}')

An Error Occurred!
TypeError: can only concatenate list (not "int") to list


Instead, we might write a for loop or a list comprehension:

In [6]:
original_array = [1, 2, 3, 4, 5]
array_with_one_added = []
for n in original_array:
    array_with_one_added.append(n + 1)
print(array_with_one_added)

[2, 3, 4, 5, 6]


In [7]:
original_array = [1, 2, 3, 4, 5]
array_with_one_added = [n + 1 for n in original_array]
print(array_with_one_added)

[2, 3, 4, 5, 6]


A vectorized operation is applied automatically to every element of a vector, which in our case is a numpy array. So if we are working with a numpy array, we can simply add 1:

In [8]:
original_array = np.array([1, 2, 3, 4, 5])
original_array + 1

array([2, 3, 4, 5, 6])
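
As a quick check, the following sketch shows that the vectorized addition returns a new array, leaves the original untouched, and agrees with the list-comprehension approach. The `matches_comprehension` name is introduced here for illustration.

```python
import numpy as np

original_array = np.array([1, 2, 3, 4, 5])

# The addition builds a new array; original_array itself is not modified.
array_with_one_added = original_array + 1

print(original_array)
print(array_with_one_added)

# Compare against the plain-Python list comprehension.
matches_comprehension = (
    array_with_one_added.tolist() == [n + 1 for n in [1, 2, 3, 4, 5]]
)
print(matches_comprehension)
```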

This works the same way for the other basic arithmetic operators as well.


In [9]:
my_array = np.array([-3, 0, 3, 16])

print('my_array      == {}'.format(my_array))
print('my_array - 5  == {}'.format(my_array - 5))
print('my_array * 4  == {}'.format(my_array * 4))
print('my_array / 2  == {}'.format(my_array / 2))
print('my_array ** 2 == {}'.format(my_array ** 2))
print('my_array % 2  == {}'.format(my_array % 2))

my_array      == [-3  0  3 16]
my_array - 5  == [-8 -5 -2 11]
my_array * 4  == [-12   0  12  64]
my_array / 2  == [-1.5  0.   1.5  8. ]
my_array ** 2 == [  9   0   9 256]
my_array % 2  == [1 0 1 0]
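
One detail worth noticing in this output is that `/` produced floats while the other operators kept integers. The sketch below looks at the result types more closely. It also uses the floor-division operator `//`, which is not part of the cell above and is included only for contrast.

```python
import numpy as np

my_array = np.array([-3, 0, 3, 16])

# True division always produces floating-point results.
halved = my_array / 2
print(halved, halved.dtype)

# Floor division rounds down and keeps an integer dtype.
floor_halved = my_array // 2
print(floor_halved, floor_halved.dtype)

# The remainder takes the sign of the divisor, so -3 % 2 is 1.
remainders = my_array % 2
print(remainders)
```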


Not only are the arithmetic operators vectorized, but so are the comparison operators.

In [10]:
my_array = np.array([-3, 0, 3, 16])

print('my_array       == {}'.format(my_array))
print('my_array == -3 == {}'.format(my_array == -3))
print('my_array >= 0  == {}'.format(my_array >= 0))
print('my_array < 10  == {}'.format(my_array < 10))

my_array       == [-3  0  3 16]
my_array == -3 == [ True False False False]
my_array >= 0  == [False  True  True  True]
my_array < 10  == [ True  True  True False]


Because a comparison returns a boolean array, and numpy arrays can be indexed with a boolean sequence, we can use the comparison operators to select a subset of an array.

For example, we can get all the positive numbers in `my_array` like so:

`my_array[my_array > 0]`

array([ 3, 16])
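
It can help to split that expression into its two steps. This is a minimal sketch; the names `is_positive`, `positives`, and `positive_count` are chosen here for illustration.

```python
import numpy as np

my_array = np.array([-3, 0, 3, 16])

# Step 1: the comparison produces a boolean array as long as my_array.
is_positive = my_array > 0
print(is_positive)

# Step 2: indexing with that mask keeps only the elements marked True.
positives = my_array[is_positive]
print(positives)

# Summing a boolean mask counts its True values.
positive_count = int(is_positive.sum())
print(positive_count)
```

Storing the mask in a variable makes it reusable, for example to count the matches without building the filtered array.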