In [1]:
%autosave 0

Autosave disabled


NumPy makes it easy to work with large amounts of numerical data in arrays. Many other libraries (including Pandas) are built on NumPy, so being comfortable with it makes the functionality of those libraries much easier to understand.

In [1]:
#standard import
import numpy as np

We create NumPy arrays with the np.array function, passing a list as the argument.

In [4]:
array = np.array([1, 2, 3, 4, 5])
array

array([1, 2, 3, 4, 5])
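As a small sketch of what np.array gives back, the cell below builds a one-dimensional array from a flat list and a two-dimensional array from a list of lists, then inspects their shape and ndim attributes. The names flat and grid are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np

# A flat list becomes a one-dimensional array.
flat = np.array([1, 2, 3, 4, 5])

# A list of equal-length lists becomes a two-dimensional array.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(flat.shape)  # (5,)
print(grid.shape)  # (2, 3)
print(grid.ndim)   # 2
```

The shape tuple lists the length of each dimension, which becomes useful once we start creating arrays with more than one dimension.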

We can index and slice a NumPy array, much like a list!

In [5]:
array[0]

1

In [7]:
array[2:]

array([3, 4, 5])

In [9]:
array[-3:]

array([3, 4, 5])
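The cell below is a short sketch of a few more indexing patterns, plus one difference from lists that is easy to miss: a basic slice of a NumPy array is a view onto the same data, not a copy. The variable name tail is illustrative only.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

print(array[-1])    # 5, the last element
print(array[1:4])   # [2 3 4], the stop index is excluded
print(array[::2])   # [1 3 5], every second element

# Unlike a list slice, a basic NumPy slice is a view of the same data,
# so writing through the slice also changes the original array.
tail = array[3:]
tail[0] = 40
print(array)        # [ 1  2  3 40  5]
```

If an independent copy is needed, calling .copy() on the slice avoids modifying the original array.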

Vectorized operations allow us to perform a mathematical operation on EVERY number in an array at once, without writing a loop.

Let's compare how we could do the same thing with a NumPy array and with a list.

In [10]:
array * 10

array([10, 20, 30, 40, 50])
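Vectorized operations are not limited to multiplying by a single number. As a sketch (the second array, other, is made up for illustration), the cell below applies element-wise arithmetic between two arrays of the same length and shows what the * operator does to a plain list instead.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])
other = np.array([10, 20, 30, 40, 50])  # illustrative second array

# Arithmetic between equal-length arrays works element by element.
print(array + other)   # [11 22 33 44 55]
print(array ** 2)      # [ 1  4  9 16 25]

# Multiplying a plain list repeats it rather than scaling its values.
print([1, 2, 3] * 2)   # [1, 2, 3, 1, 2, 3]
```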

In [11]:
my_list = [1, 2, 3, 4, 5]

In [14]:
# with a list, we have to loop and build up the result ourselves
empty_list = []
for num in my_list:
    empty_list.append(num * 10)

empty_list

[10, 20, 30, 40, 50]

Boolean masking is an important concept we will apply consistently with Pandas. Comparing an array to a value gives back an array filled with Boolean values, and we can use it as a MASK that keeps only the values where the mask is True and hides the rest from the output.

In [20]:
mask = array > 3
mask

array([False, False, False,  True,  True])

In [21]:
array[mask]

array([4, 5])

In [17]:
array[array < 3]

array([1, 2])
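Masks can also be combined. The cell below is a minimal sketch using the same array as above: & means "and", | means "or", and ~ inverts a mask. Each comparison needs its own parentheses because & and | bind more tightly than > and <. The names middle and ends are illustrative only.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# Keep values that satisfy both conditions.
middle = array[(array > 1) & (array < 5)]
print(middle)                # [2 3 4]

# Keep values that satisfy either condition.
ends = array[(array < 2) | (array > 4)]
print(ends)                  # [1 5]

# ~ flips True and False in a mask.
print(array[~(array > 3)])   # [1 2 3]

# True counts as 1, so summing a mask counts the matching values.
print((array > 3).sum())     # 2
```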

We can create arrays of random samples drawn from the standard normal distribution (mean of 0, standard deviation of 1) using np.random.randn. We specify the shape of the resulting array by passing the length of each dimension as a separate argument.

In [26]:
st_norm = np.random.randn(5)
st_norm

array([-0.782601  , -0.25583967, -2.02145789,  1.02494015,  0.18926467])

In [28]:
np.random.randn(3, 3, 3)   # 3x3 is a square, 3x3x3 is a cube

array([[[ 0.64574315,  0.45216551,  0.31223875],
        [ 0.55082516, -0.71972943,  0.35958444],
        [-2.08820717, -0.15161525,  0.06007698]],

       [[-0.71926706, -0.64587312,  0.8723028 ],
        [-0.4577982 ,  1.54516534,  0.2254637 ],
        [-0.31352998,  0.11425595, -1.45779671]],

       [[ 1.42352746,  0.92018624,  1.46902375],
        [ 1.24933652, -1.82546651, -0.86076004],
        [-1.45615643, -0.02633984,  1.7665374 ]]])
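Because the values are random, rerunning the cells above gives different numbers each time. One way to make the results repeatable, sketched below, is to set a seed with np.random.seed before drawing. The seed value 42 and the names first and second are arbitrary choices for illustration.

```python
import numpy as np

# Setting the same seed before each draw reproduces the same numbers.
np.random.seed(42)
first = np.random.randn(2, 3)

np.random.seed(42)
second = np.random.randn(2, 3)

print(first.shape)                    # (2, 3)
print(np.array_equal(first, second))  # True
```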

We can create arrays of samples from other normal distributions using np.random.normal. We pass this function three arguments, in order: the mean, the standard deviation, and the size (shape) of the resulting array.

In [29]:
my_norm = np.random.normal(30, 5, 5)
my_norm

array([32.16897671, 32.62823488, 31.21602062, 23.69182042, 34.68501953])
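With only five values it is hard to see that the mean and standard deviation were respected. As a rough check, the sketch below draws a much larger sample and compares its statistics to the requested ones. It passes the same three arguments by their keyword names (loc, scale, size); the seed and the sample size are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the cell is repeatable

# loc is the mean, scale is the standard deviation, size is the shape.
sample = np.random.normal(loc=30, scale=5, size=100_000)

print(sample.shape)   # (100000,)
print(sample.mean())  # should land close to 30
print(sample.std())   # should land close to 5
```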

We can create arrays from a range using np.arange. We specify the starting value, the stopping value, and the step size. As with Python's range, the stopping value itself is not included in the result.

In [30]:
range_array = np.arange(0, 12, 3)
range_array

array([0, 3, 6, 9])
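The cell below sketches a few variations on np.arange, including a stop value chosen so that 12 is included and a negative step. It also shows np.linspace, a related function that takes the number of points instead of a step size and includes the stopping value by default.

```python
import numpy as np

# With one argument, arange starts at 0 and steps by 1.
print(np.arange(5))           # [0 1 2 3 4]

# Raising the stop value past 12 brings 12 into the result.
print(np.arange(0, 13, 3))    # [ 0  3  6  9 12]

# A negative step counts down; the stop value is still excluded.
print(np.arange(10, 0, -2))   # [10  8  6  4  2]

# linspace returns 5 evenly spaced points from 0 to 1, inclusive.
print(np.linspace(0, 1, 5).tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```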

We can call a number of array methods to return descriptive statistics, such as min, max, mean, std (standard deviation), and sum.

In [31]:
range_array.sum()

18

In [32]:
range_array.mean()

4.5

In [33]:
range_array.max()

9
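The remaining methods work the same way. The sketch below calls min and std on the same range_array, then uses the axis argument on a small two-dimensional array (grid, made up for illustration) to compute statistics per column or per row.

```python
import numpy as np

range_array = np.arange(0, 12, 3)   # array([0, 3, 6, 9])

print(range_array.min())   # 0
print(range_array.std())   # about 3.354 (population standard deviation)

# On a 2D array, axis=0 works down the columns and axis=1 across the rows.
grid = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
print(grid.sum(axis=0))    # [3 5 7]
print(grid.mean(axis=1))   # [1. 4.]
```

By default std divides by the number of values; passing ddof=1 gives the sample standard deviation instead.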