In [1]:
%autosave 0

Autosave disabled


NumPy makes it easy to work with large amounts of numerical data in arrays. Many other libraries (including Pandas) are built on NumPy, so being comfortable with it makes the functionality of those libraries easier to understand.

In [2]:
#standard import
import numpy as np

We create NumPy arrays with the np.array function, passing a list as the argument.

In [3]:
array = np.array([1, 2, 3, 4, 5])
array

array([1, 2, 3, 4, 5])
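
As a small sketch that goes slightly beyond the cell above, we can inspect what np.array builds. The names matrix and mixed, and the values in them, are illustrative and not part of the original notebook.

```python
import numpy as np

# Same 1-D array as in the cell above
array = np.array([1, 2, 3, 4, 5])

# Illustrative 2-D array built from a nested list (one inner list per row)
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Mixing integers and floats gives an array of floats,
# because every element of an array shares one data type
mixed = np.array([1, 2.5, 3])

print(array.shape)    # (5,)
print(matrix.shape)   # (2, 3)
print(mixed.dtype)    # float64
```

The shape attribute is a quick way to check the dimensions of an array before indexing into it.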

We can index a NumPy array, much like a list!

In [4]:
array[0] 

1

In [5]:
array[2:]  # lower boundary: start at index 2 and run to the end

array([3, 4, 5])

In [6]:
array[-3:]  # a negative start counts back from the end of the array

# negative slicing: this returns the last three elements

array([3, 4, 5])
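
Here is a short sketch of a few more slicing patterns, plus one way arrays differ from lists. The name tail and the value 99 are illustrative assumptions, not from the original notebook.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

print(array[-1])    # 5, the last element
print(array[1:4])   # [2 3 4], the stop index 4 is excluded
print(array[::2])   # [1 3 5], every second element

# Unlike a list slice, a basic NumPy slice is a view onto the same data,
# so writing through the slice also changes the original array.
tail = array[-3:]
tail[0] = 99
print(array)        # [ 1  2 99  4  5]
```

When an independent copy is needed, calling .copy() on the slice avoids modifying the original.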

Vectorized operations allow us to perform a mathematical operation on EVERY number in an array at once, without writing a loop.

Let's compare how we could do the same thing with a list and a NumPy array.

In [7]:
array * 10

array([10, 20, 30, 40, 50])

In [8]:
my_list = [1, 2, 3, 4, 5]

In [9]:
empty_list = []

for num in my_list:
    empty_list.append(num * 10)

empty_list

# in a nutshell, this is why we use NumPy: the loop above does what array * 10 does in a single expression

[10, 20, 30, 40, 50]
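
To make the contrast concrete, here is a minimal sketch placing list and array behavior side by side. The variable names other than array and my_list are illustrative.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])
my_list = [1, 2, 3, 4, 5]

# With a list, * means repetition, not multiplication
repeated = my_list * 2                        # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# A list comprehension is the compact list equivalent of the loop above
scaled_list = [num * 10 for num in my_list]   # [10, 20, 30, 40, 50]

# With an array, arithmetic is applied element by element
scaled_array = array * 10                     # array([10, 20, 30, 40, 50])
summed = array + array                        # array([ 2,  4,  6,  8, 10])
squared = array ** 2                          # array([ 1,  4,  9, 16, 25])
```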

Boolean masking is an important concept we will apply consistently with Pandas. A comparison on an array gives back an array filled with Boolean values, and we use it to MASK (hide or remove) certain values from the output: only the elements where the mask is True are kept.

In [10]:
mask = array > 3
mask

array([False, False, False,  True,  True])

In [11]:
array[mask]
# indexing the array with the mask keeps only the elements where the mask is True

array([4, 5])
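
Masks can also be combined. The following is a small sketch using the same array; the names middle, ends, not_big, and capped are illustrative.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# Combine conditions with & (and), | (or), and ~ (not).
# Each comparison needs its own parentheses.
middle = array[(array > 1) & (array < 5)]    # array([2, 3, 4])
ends = array[(array == 1) | (array == 5)]    # array([1, 5])
not_big = array[~(array > 3)]                # array([1, 2, 3])

# A mask can also select the positions to overwrite
capped = array.copy()
capped[capped > 3] = 3                       # array([1, 2, 3, 3, 3])
```

Python's and / or keywords do not work element by element on arrays, which is why the symbol operators are used here.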

In [12]:
# the comparison array < 3 produces a Boolean mask, applied here directly to the original array
array[array < 3]

# the mask must have the same number of values as the array it indexes; otherwise NumPy raises an error

array([1, 2])

We can create arrays from the standard normal distribution (mean of 0, standard deviation of 1) using np.random.randn. We pass the size of each dimension of the resulting array as the arguments to the function.

In [13]:
st_norm = np.random.randn(5)
st_norm

array([ 0.23602795,  0.39598559,  0.6576173 , -0.86412502,  0.17373996])

In [14]:
np.random.randn(3, 3, 3)

array([[[-0.12659876,  0.11623698,  0.54396262],
        [-1.16081314,  1.42789419,  0.09976167],
        [-1.75226336,  0.09761362, -0.6978081 ]],

       [[-0.21214874, -1.58788892,  2.06043188],
        [ 0.32533879,  0.77796133, -0.72263354],
        [-2.72333141,  0.36786179, -0.44663389]],

       [[ 1.12028307, -1.22883273,  1.37358844],
        [ 0.45035699, -1.01848304,  0.73874556],
        [-0.37444629,  0.94950357,  0.27314189]]])
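
Because these values are random, rerunning the cells above gives different numbers each time. As a hedged sketch, seeding the generator makes the draws repeatable; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

np.random.seed(42)              # arbitrary seed, chosen only for illustration
first = np.random.randn(2, 3)   # 2 rows, 3 columns

np.random.seed(42)              # resetting the seed replays the same draws
second = np.random.randn(2, 3)

print(first.shape)                     # (2, 3)
print(np.array_equal(first, second))   # True
```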

We can create arrays from other normal distributions using np.random.normal. To this function, **we will pass in arguments specifying the mean, the standard deviation, and the size of the resulting array.**

In [15]:
my_norm = np.random.normal(30, 5, 5)
my_norm

array([34.75304002, 30.07842218, 34.42069703, 36.78330911, 33.56603299])
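
With only five values it is hard to see the mean and standard deviation we asked for. A rough sketch with a larger sample makes them visible; the seed and the sample size are arbitrary assumptions. The keyword names loc, scale, and size are the np.random.normal parameters for the mean, the standard deviation, and the output shape.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable
sample = np.random.normal(loc=30, scale=5, size=100_000)

print(sample.mean())  # close to 30
print(sample.std())   # close to 5

# size can also be a tuple to request a multi-dimensional array
grid = np.random.normal(30, 5, size=(2, 3))
print(grid.shape)     # (2, 3)
```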

We can create arrays from a range using np.arange. **We specify the starting value, the stopping value, and the step size.** The stopping value itself is excluded, just as with Python's built-in range.

In [16]:
range_array = np.arange(0, 12, 3)
range_array

array([0, 3, 6, 9])
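
A few variations on np.arange, as a minimal sketch. The variable names are illustrative, and np.linspace is shown only as a related function for when we want the endpoint included.

```python
import numpy as np

counting = np.arange(5)             # array([0, 1, 2, 3, 4]), start defaults to 0
by_threes = np.arange(0, 12, 3)     # array([0, 3, 6, 9]), 12 is excluded
downward = np.arange(10, 0, -2)     # array([10,  8,  6,  4,  2]), negative step

# np.linspace takes the number of points instead of a step,
# and includes the stopping value by default
evenly = np.linspace(0, 12, 5)      # array([ 0.,  3.,  6.,  9., 12.])
```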

We can call a number of array methods to return descriptive statistics, such as min, max, mean, standard deviation, and sum.

In [17]:
range_array.sum()

18

In [18]:
range_array.mean()

4.5

In [20]:
range_array.max()

9
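
The remaining statistics work the same way. As a sketch, here are min and std on the same array, plus the axis argument on a 2-D version of it; the names table, col_sums, and row_sums are illustrative.

```python
import numpy as np

range_array = np.arange(0, 12, 3)   # array([0, 3, 6, 9])

print(range_array.min())   # 0
print(range_array.std())   # about 3.354 (population standard deviation by default)

# On a 2-D array, axis chooses the direction of the calculation
table = range_array.reshape(2, 2)   # [[0, 3], [6, 9]]
col_sums = table.sum(axis=0)        # array([ 6, 12]), one sum per column
row_sums = table.sum(axis=1)        # array([ 3, 15]), one sum per row
```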