# Python for Data: Numpy Arrays

Python's built-in data structures are great for general-purpose programming, but they lack some specialized features we'd like for data analysis. For example, adding rows or columns of data in an element-wise fashion and performing math operations on two-dimensional tables are common tasks that aren't readily available with Python's base data types.

# Numpy and Array Basics
Numpy implements a data structure called the N-dimensional array, or ndarray. ndarrays are similar to lists in that they contain a collection of items that can be accessed via indexes. Unlike lists, ndarrays are homogeneous, meaning every item in an array has the same type, and they can be multi-dimensional, making it easy to store 2-dimensional tables or matrices.

To work with ndarrays, we need to load the numpy library. It is standard practice to load numpy with the alias "np" like so:

In [1]:
import numpy as np

In [2]:
#Create an ndarray by passing a list to the np.array() function:
my_list = [1,2,3,4]            #Define a list
my_array = np.array(my_list)   #Pass the list to np.array()
type(my_array)                 #Check the object's type


numpy.ndarray

In [3]:
#To create an array with more than one dimension, pass a nested list to np.array():
second_list = [5,6,7,8]
two_d_array = np.array([my_list, second_list])
print(two_d_array)

[[1 2 3 4]
 [5 6 7 8]]


An ndarray is defined by the number of dimensions it has, the size of each dimension and the type of data it holds. Check the number and size of dimensions of an ndarray with the shape attribute.

In [4]:
two_d_array.shape

(2, 4)

The output above shows that this ndarray is 2-dimensional, since there are two values listed, and that the dimensions have lengths 2 and 4. Check the total number of items in an ndarray with the size attribute:

In [5]:
two_d_array.size

8
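
As a small sketch of how these attributes relate to one another (the variable names below are illustrative, not part of the original notebook): the number of dimensions equals the length of the shape tuple, and the size is the product of the dimension lengths.

```python
import numpy as np

# Same 2 x 4 layout as the array in the example above
table = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])

n_dims = table.ndim             # number of dimensions (length of the shape tuple)
n_rows, n_cols = table.shape    # length of each dimension
n_items = table.size            # total number of elements

print(n_dims, (n_rows, n_cols), n_items)   # 2 (2, 4) 8
print(n_items == n_rows * n_cols)          # True
```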

Check the type of the data in an ndarray with the dtype attribute:

In [6]:
two_d_array.dtype

dtype('int64')
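
Because ndarrays are homogeneous, numpy converts mixed input to a single common type when it builds an array. The following is a minimal sketch of that behavior with illustrative variable names. The exact integer width (for example int64 versus int32) can depend on the platform and numpy version, so the example checks only the kind of the integer dtype.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # integers and a float -> upcast to float64
text = np.array([1, "two", 3])    # numbers and a string -> everything stored as strings

print(ints.dtype.kind)    # i  (signed integer; the width depends on the platform)
print(mixed.dtype)        # float64
print(text.dtype.kind)    # U  (unicode string)

# Request a specific type explicitly with the dtype argument
floats = np.array([1, 2, 3], dtype=np.float64)
print(floats)             # [1. 2. 3.]
```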

Numpy has a variety of special array creation functions. Some handy ones include:

In [7]:
#np.identity() to create a square 2d array with 1's across the diagonal
np.identity(n=10)    #Size of the array

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [8]:
#np.eye() to create a 2d array with 1's across a specified diagonal

np.eye(N = 3,         #Number of rows
       M = 5,         #Number of columns
       k = 1)         #Index of the diagonal (Main diagonal (0) is default)

array([[0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])

In [9]:
#np.ones() to create an array filled with ones:
np.ones(shape = [2,4])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [10]:
#np.zeros() to create an array filled with zeros:
np.zeros(shape = [4,6])


array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
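
The creation functions above return floating point arrays by default, as the trailing dots in the output show. A short sketch, using illustrative variable names, of how they relate to each other and how the dtype argument changes the element type:

```python
import numpy as np

# np.eye() with a single size and the default diagonal matches np.identity()
same = np.array_equal(np.identity(3), np.eye(3))
print(same)    # True

# Pass dtype to get integers instead of the default floats
int_ones = np.ones(shape=(2, 4), dtype=int)
print(int_ones)

# Multiplying an array of ones is one simple way to fill with another constant
sevens = 7 * np.ones(shape=(2, 2), dtype=int)
print(sevens)
```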

# Array Indexing and Slicing

Numpy ndarrays offer numbered indexing and slicing syntax that mirrors the syntax for Python lists:

In [11]:
one_d_array = np.array([1,2,3,4,5,6])
one_d_array[3]    #Get the item at index 3

4

In [12]:
one_d_array[3:]  #Get a slice from index 3 to the end

array([4, 5, 6])

In [13]:
one_d_array[::-1]    #Slice backwards to reverse the array

array([6, 5, 4, 3, 2, 1])
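
Although the syntax matches, one behavior differs from lists: a basic slice of an ndarray is a view onto the same data rather than a new copy, so writing to the slice changes the original array. A minimal sketch with illustrative variable names:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
as_list = [1, 2, 3, 4, 5, 6]

list_slice = as_list[3:]     # slicing a list creates a new list
array_slice = values[3:]     # slicing an ndarray returns a view

list_slice[0] = 99
array_slice[0] = 99

print(as_list)    # [1, 2, 3, 4, 5, 6]   (original list unchanged)
print(values)     # [ 1  2  3 99  5  6]  (original array changed)

# Call .copy() when an independent array is needed
independent = values[3:].copy()
independent[0] = -1
print(values[3])  # 99
```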

If an ndarray has more than one dimension, separate indexes for each dimension with a comma:

In [14]:
#Create a new 2d array
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])
print(two_d_array)

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]]


In [15]:
#Slice from the second row (index 1) and the fifth column (index 4) onward
two_d_array[1:,4:]

array([[11, 12],
       [17, 18]])

In [16]:
#Reverse both dimensions (180 degree rotation)
two_d_array[::-1, ::-1]

array([[18, 17, 16, 15, 14, 13],
       [12, 11, 10,  9,  8,  7],
       [ 6,  5,  4,  3,  2,  1]])
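
The same comma-separated syntax also selects single items, whole rows and whole columns. The sketch below rebuilds the 3 x 6 array under the illustrative name `grid` so that it runs on its own:

```python
import numpy as np

grid = np.array([[ 1,  2,  3,  4,  5,  6],
                 [ 7,  8,  9, 10, 11, 12],
                 [13, 14, 15, 16, 17, 18]])

item = grid[1, 4]      # row index 1, column index 4
row = grid[1]          # the entire second row (same as grid[1, :])
column = grid[:, 4]    # the entire fifth column, returned as a 1d array
last = grid[-1, -1]    # negative indexes count from the end

print(item)      # 11
print(row)       # [ 7  8  9 10 11 12]
print(column)    # [ 5 11 17]
print(last)      # 18
```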

# Reshaping Arrays

Numpy has a variety of built-in functions to help you manipulate arrays quickly without having to use complicated indexing operations.

Reshape an array into a new array with the same data but different structure with np.reshape():

In [17]:
np.reshape(two_d_array,      #Array to reshape
           (6,3))            #Dimensions of the new array

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])
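
A reshape only works when the new shape holds exactly the same number of elements as the original. The sketch below illustrates this and also shows that one dimension can be given as -1 so that numpy infers it. It assumes np.arange() as a shortcut to build the values 1 through 18, which the original notebook does not use, and the variable names are illustrative.

```python
import numpy as np

grid = np.arange(1, 19).reshape(3, 6)   # values 1..18 in 3 rows and 6 columns

# -1 asks numpy to infer that dimension from the total size (18 / 3 = 6 rows)
inferred = grid.reshape(-1, 3)
print(inferred.shape)    # (6, 3)

# A shape whose total size differs from 18 raises a ValueError
try:
    grid.reshape(4, 5)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```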

Unravel a multi-dimensional array into 1 dimension with np.ravel():

In [18]:
np.ravel(a=two_d_array,
        order="C")                  #Use C-style unraveling (by rows)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])

In [19]:
np.ravel(a=two_d_array,
        order="F")                   #Use Fortran-style unraveling (by columns)

array([ 1,  7, 13,  2,  8, 14,  3,  9, 15,  4, 10, 16,  5, 11, 17,  6, 12,
       18])

Alternatively, use ndarray.flatten() to flatten a multi-dimensional array into 1 dimension and return a copy of the result:

In [20]:
two_d_array.flatten()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])
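
The practical difference between the two is that np.ravel() returns a view of the original data when it can, while flatten() always returns a copy. A minimal sketch on a small illustrative array, using np.shares_memory() to check whether two arrays share the same underlying data:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

raveled = np.ravel(grid)      # a view here, because grid is contiguous in memory
flattened = grid.flatten()    # always a copy

print(np.shares_memory(grid, raveled))     # True
print(np.shares_memory(grid, flattened))   # False

flattened[0] = 100    # does not touch grid
raveled[1] = 200      # writes through to grid
print(grid)
```

Prefer flatten() when the result will be modified and the original array must stay unchanged.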

Get the transpose of an array with ndarray.T:

In [21]:
two_d_array.T

array([[ 1,  7, 13],
       [ 2,  8, 14],
       [ 3,  9, 15],
       [ 4, 10, 16],
       [ 5, 11, 17],
       [ 6, 12, 18]])

In [22]:
#Flip an array vertically or horizontally with np.flipud() and np.fliplr() respectively:
np.flipud(two_d_array)

array([[13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6]])

In [23]:
np.fliplr(two_d_array)

array([[ 6,  5,  4,  3,  2,  1],
       [12, 11, 10,  9,  8,  7],
       [18, 17, 16, 15, 14, 13]])
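
Both flips can also be written with the reverse slicing syntax from the indexing section. A short sketch on a small illustrative array confirming that the results match:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Reversing the row axis matches np.flipud()
same_ud = np.array_equal(np.flipud(grid), grid[::-1, :])

# Reversing the column axis matches np.fliplr()
same_lr = np.array_equal(np.fliplr(grid), grid[:, ::-1])

print(same_ud, same_lr)    # True True
```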

In [24]:
#Rotate an array counterclockwise in 90 degree steps with np.rot90():
np.rot90(two_d_array,
        k=1)      #Number of 90 degree rotations

array([[ 6, 12, 18],
       [ 5, 11, 17],
       [ 4, 10, 16],
       [ 3,  9, 15],
       [ 2,  8, 14],
       [ 1,  7, 13]])
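
The k argument counts quarter-turns, so other values of k give other rotations. A brief sketch on a small illustrative array:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Two quarter-turns equal the 180 degree rotation made by reversing both axes
half_turn = np.rot90(grid, k=2)
print(np.array_equal(half_turn, grid[::-1, ::-1]))   # True

# A negative k rotates clockwise instead of counterclockwise
clockwise = np.rot90(grid, k=-1)
print(clockwise)

# Four quarter-turns bring the array back to where it started
full_turn = np.rot90(grid, k=4)
print(np.array_equal(full_turn, grid))               # True
```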

In [25]:
#Shift elements in an array along a given dimension with np.roll():

np.roll(a = two_d_array,
       shift = 2,
       axis = 1
       )

array([[ 5,  6,  1,  2,  3,  4],
       [11, 12,  7,  8,  9, 10],
       [17, 18, 13, 14, 15, 16]])

Omit the axis argument to shift a flattened version of the array and then restore its original shape, so elements wrap from the end of one row to the start of the next:

In [26]:
np.roll( a = two_d_array,
       shift = 2)

array([[17, 18,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16]])
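
A few more variations on np.roll(), sketched on a small illustrative array: rolling along axis 0 moves whole rows, a negative shift moves elements the other way, and a roll without an axis gives the same result as rolling the raveled array and reshaping it back.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=0 shifts whole rows down; the last row wraps around to the top
rows_down = np.roll(grid, shift=1, axis=0)
print(rows_down)

# A negative shift moves elements toward the start of the axis
cols_left = np.roll(grid, shift=-1, axis=1)
print(cols_left)

# Without axis: flatten, roll, then restore the original shape
manual = np.roll(grid.ravel(), 2).reshape(grid.shape)
print(np.array_equal(manual, np.roll(grid, shift=2)))   # True
```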

Join arrays along an axis with np.concatenate():

In [27]:
array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])

np.concatenate((two_d_array, array_to_join),         #Arrays to join
              axis = 1)                              #Axis to join upon

array([[ 1,  2,  3,  4,  5,  6, 10, 20, 30],
       [ 7,  8,  9, 10, 11, 12, 40, 50, 60],
       [13, 14, 15, 16, 17, 18, 70, 80, 90]])
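
The join above works because both arrays have 3 rows. In general, the arrays must match in every dimension except the one being joined. A minimal sketch with small illustrative arrays:

```python
import numpy as np

left = np.array([[1, 2, 3],
                 [4, 5, 6]])            # shape (2, 3)
extra_rows = np.array([[7, 8, 9]])      # shape (1, 3)
extra_cols = np.array([[10], [20]])     # shape (2, 1)

stacked = np.concatenate((left, extra_rows), axis=0)   # add rows
widened = np.concatenate((left, extra_cols), axis=1)   # add columns
print(stacked.shape, widened.shape)    # (3, 3) (2, 4)

# Row counts differ (2 vs 1), so joining along axis=1 is rejected
try:
    np.concatenate((left, extra_rows), axis=1)
    join_failed = False
except ValueError:
    join_failed = True
print(join_failed)    # True
```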

# Array Math Operations

Creating and manipulating arrays is nice, but the true power of numpy arrays is the ability to perform mathematical operations on many values quickly and easily. Unlike built-in Python lists, ndarrays apply math operators like +, -, / and * element-wise, so you can use them to perform basic math operations on every value at once:

In [28]:
two_d_array + 100   #Add 100 to each element

array([[101, 102, 103, 104, 105, 106],
       [107, 108, 109, 110, 111, 112],
       [113, 114, 115, 116, 117, 118]])
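
To make the contrast with built-in lists concrete, here is a small sketch (the variable names are illustrative) that applies the same operators to a list and to an ndarray holding the same values:

```python
import numpy as np

numbers = [1, 2, 3]
array = np.array(numbers)

print(numbers + numbers)   # [1, 2, 3, 1, 2, 3]  (lists concatenate)
print(numbers * 2)         # [1, 2, 3, 1, 2, 3]  (lists repeat)

print(array + array)       # [2 4 6]        (element-wise addition)
print(array * 2)           # [2 4 6]        (element-wise multiplication)
print(array / 2)           # [0.5 1.  1.5]  (division returns floats)
print(array - 1)           # [0 1 2]        (element-wise subtraction)
```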