<h1>Chapter III</h1>
<h2>Numerical Python (NumPy)</h2>

<h3>Numerical Python (NumPy) is an open source Python library for scientific computing.</h3>

<p>NumPy provides a host of features that allow a Python programmer to work with high performance
arrays and matrices. NumPy arrays are stored more efficiently than Python
lists and allow mathematical operations to be vectorized, which results in significantly
higher performance than with looping constructs in Python.
pandas builds upon functionality provided by NumPy. The pandas library relies heavily on
the NumPy array for the implementation of the pandas Series and DataFrame objects, and
shares many of its features such as being able to slice elements and perform vectorized
operations. It is therefore useful to spend some time going over NumPy arrays before
diving into pandas.</p>
<p>
In this chapter, we will cover the following topics about NumPy arrays:
<ul>

<li>Installing and importing NumPy</li>
<li>Benefits and characteristics of NumPy arrays</li>
<li>Creating NumPy arrays and performing basic array operations</li>
<li>Selecting array elements</li>
<li>Logical operations on arrays</li>
<li>Slicing arrays</li>
<li>Reshaping arrays</li>
<li>Combining arrays</li>
<li>Splitting arrays</li>
<li>Useful numerical methods of NumPy arrays</li>
</ul>
</p>


In [5]:
import numpy as np

In [3]:
# a function that squares all values
def squares(values):
    result = []
    for v in values:
        result.append(v * v)
    return result

# create 100,000 numbers using python range
to_square = range(100000)

# time how long it takes to repeatedly square them all
%timeit squares(to_square)

16 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


<p>Using NumPy and vectorized arrays, the above example can be rewritten as follows.</p>

In [6]:
# now let's do this with a NumPy array
array_to_square = np.arange(0, 100000)

# and time using a vectorized operation
%timeit array_to_square ** 2


89 µs ± 4.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [23]:
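loop_time = 16E-03        # 16 ms for the Python loop
vectorized_time = 89E-06  # 89 µs for the vectorized operation
result = loop_time / vectorized_time
result  # the vectorized version is about 180 times faster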

179.77528089887642
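<p>The %timeit magic used above only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module. The repetition count of 20 is an arbitrary choice, and the array is created as int64 on the assumption that the default integer type may be 32-bit on some platforms, where the largest squares here would overflow.</p>

```python
import timeit

import numpy as np


def squares(values):
    """Square each value with an explicit Python loop."""
    return [v * v for v in values]


n = 100_000
to_square = range(n)
# int64 is requested explicitly so that 99999 ** 2 fits
array_to_square = np.arange(n, dtype=np.int64)

# Both approaches should produce the same values
loop_result = squares(to_square)
vector_result = array_to_square ** 2
same_values = np.array_equal(np.array(loop_result, dtype=np.int64), vector_result)
print("same values:", same_values)

# Time each approach; number=20 repetitions is an arbitrary choice
loop_time = timeit.timeit(lambda: squares(to_square), number=20)
vector_time = timeit.timeit(lambda: array_to_square ** 2, number=20)
print(f"loop: {loop_time:.4f}s, vectorized: {vector_time:.4f}s")
print(f"speedup: about {loop_time / vector_time:.0f}x")
```

<p>The measured speedup depends on the machine and the NumPy build, so it will not necessarily match the figure computed above.</p>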


<h2>Creating NumPy arrays and performing basic array operations</h2>


<p>A NumPy array can be created using multiple techniques. The following code creates a new NumPy array object from a Python list:</p>

In [3]:
import numpy as np

a1 = np.array([1, 2, 3, 4, 5])
a1

array([1, 2, 3, 4, 5])

In [28]:
# Type of array
type(a1)

numpy.ndarray

In [29]:
# Number of elements
np.size(a1)

5

In [42]:
# 2nd Example
myList = list(range(5))
a2 = np.array(myList)
a2

array([0, 1, 2, 3, 4])

In [43]:
# 3rd Example
myRange = range(5)
a2 = np.array(myRange)
a2

array([0, 1, 2, 3, 4])

In [40]:
# Type of elements of the array
a2.dtype

dtype('int32')

In [41]:
# Initialize array of N elements
a3 = np.array([0] * 10)
a3

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

<h3>A more efficient way to initialize arrays with NumPy</h3>

In [44]:
# To efficiently create an array of a specific size that is initialized with
# zeros, use the np.zeros() function as shown in the following code:

np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [45]:
# The default is to create floating-point numbers. This can be changed to
# integers using the dtype parameter, as shown in the following example

np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

<p>NumPy provides the np.arange() function to create a NumPy array consisting of
sequential values from a specified start value up to, but not including, the specified end
value:</p>

In [47]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [48]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

<p>The np.linspace() function is similar to np.arange(), but instead of a step size it takes the
number of items to generate, evenly spaced between the specified start and stop values. Unlike
np.arange(), it includes the stop value by default:
    </p>

In [58]:
np.linspace(0, 10, 90)

array([ 0.        ,  0.11235955,  0.2247191 ,  0.33707865,  0.4494382 ,
        0.56179775,  0.6741573 ,  0.78651685,  0.8988764 ,  1.01123596,
        1.12359551,  1.23595506,  1.34831461,  1.46067416,  1.57303371,
        1.68539326,  1.79775281,  1.91011236,  2.02247191,  2.13483146,
        2.24719101,  2.35955056,  2.47191011,  2.58426966,  2.69662921,
        2.80898876,  2.92134831,  3.03370787,  3.14606742,  3.25842697,
        3.37078652,  3.48314607,  3.59550562,  3.70786517,  3.82022472,
        3.93258427,  4.04494382,  4.15730337,  4.26966292,  4.38202247,
        4.49438202,  4.60674157,  4.71910112,  4.83146067,  4.94382022,
        5.05617978,  5.16853933,  5.28089888,  5.39325843,  5.50561798,
        5.61797753,  5.73033708,  5.84269663,  5.95505618,  6.06741573,
        6.17977528,  6.29213483,  6.40449438,  6.51685393,  6.62921348,
        6.74157303,  6.85393258,  6.96629213,  7.07865169,  7.19101124,
        7.30337079,  7.41573034,  7.52808989,  7.64044944,  7.75
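
<p>A smaller call makes the behaviour of np.linspace() easier to see. The sketch below uses the endpoint and retstep parameters of np.linspace(); the specific values 0, 1, and 5 are chosen only for illustration.</p>

```python
import numpy as np

# 5 evenly spaced values from 0 to 1; the stop value is included by default
pts = np.linspace(0, 1, 5)
print(pts)        # values 0, 0.25, 0.5, 0.75, 1

# endpoint=False leaves the stop value out, as np.arange() does
open_pts = np.linspace(0, 1, 5, endpoint=False)
print(open_pts)   # values 0, 0.2, 0.4, 0.6, 0.8

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 10, 90, retstep=True)
print(len(values), step)
```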

<p>To create a two-dimensional NumPy array, you can pass in a list of lists as shown in the
following example:</p>

In [59]:
# create a 2-dimensional array (2x2)
np.array([[1,2], [3,4]])

array([[1, 2],
       [3, 4]])

A more convenient and efficient means is to use the NumPy array’s .reshape() method to reorganize a one-dimensional array into two dimensions.

In [5]:
m = np.arange(0, 20).reshape(5, 4)
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [61]:
# In a multidimensional array the size counts each and every element
np.size(m)

20

<p>To determine the number of rows in a two-dimensional array, we can pass 0 as another
parameter:</p>

In [63]:
# can ask the size along a given axis (0 is rows)
np.size(m, 0)

5

<p>To determine the number of columns in a two-dimensional array, we can pass 1 as another parameter:</p>

In [64]:
# and 1 is the columns
np.size(m, 1)

4
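
<p>The same information is available in one step from the array's .shape property, which is covered later in this chapter. The following sketch rebuilds the 5 x 4 array m and checks that the two approaches agree.</p>

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

rows, cols = m.shape            # the shape tuple is (5, 4)
print(rows, cols)               # 5 4
print(m.ndim)                   # 2 dimensions
print(m.size == rows * cols)    # True: 20 elements in total

# np.size along an axis matches the corresponding entry of .shape
print(np.size(m, 0) == rows, np.size(m, 1) == cols)   # True True
```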

In [69]:
a1, a2

(array([1, 2, 3, 4, 5]), array([0, 1, 2, 3, 4]))

In [72]:
a1[0], a2[1]

(1, 1)

In [73]:
# multiply numpy array by 2
a1 = np.arange(0, 10)
a1 * 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [75]:
# It is also possible to apply a mathematical operator across two arrays:
a1 = np.arange(0,10)
a2 = np.arange(10,20)
a1 + a2


array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

In [6]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

<p>Get an entire row:</p>

In [80]:
m[1,:]

array([4, 5, 6, 7])

Get an entire column:

In [79]:
m[:,2]

array([ 2,  6, 10, 14, 18])

In [124]:
# The following code selects the columns at zero-based positions 1 and 2 (the end index 3 is excluded):
m[:,1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14],
       [17, 18]])

In [9]:
# row positions 3 up to but not including 5, column position 1 only
m[3:5,1:2]

array([[13],
       [17]])

In [8]:
# using a Python list of indexes, we can select
# non-contiguous rows or columns
m[[1,3,4],:]

array([[ 4,  5,  6,  7],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
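
<p>One practical difference between the two selection styles above is worth a short sketch: a slice returns a view onto the original data, while selecting with a list of indexes returns a copy. The variable names row_view and picked are illustrative only.</p>

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

# Basic slicing returns a view: writing to it changes m
row_view = m[1, :]
row_view[0] = -1
print(m[1, 0])            # -1

# Indexing with a list returns a copy: writing to it leaves m alone
picked = m[[1, 3, 4], :]
picked[0, 0] = 999
print(m[1, 0])            # still -1

# A list of rows can be combined with a column slice
sub = m[[0, 2], 1:3]      # rows 0 and 2, columns 1 and 2
print(sub)
```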

<h2>Logical operators on arrays</h2>

In [82]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [83]:
a < 2

array([ True,  True, False, False, False])

In [84]:
(a<2) | (a>3)

array([ True,  True, False, False,  True])

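<p>NumPy provides the np.vectorize() function, which wraps a Python function so that it can be
called on an array and applied to each item. It is provided mainly for convenience: internally it
still loops over the items, so it does not bring the speed of built-in vectorized operations. The
following code demonstrates the use of np.vectorize() to apply a function named exp() to each
item in the array:</p>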

In [88]:
# create a function that is applied to all array elements
# (despite its name, it tests "not equal to 3"; it does not compute an exponential)
def exp(x):
    return x < 3 or x > 3

# np.vectorize applies the function to every item in the array
np.vectorize(exp)(a)

array([ True,  True,  True, False,  True])
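
<p>For a simple test like the one in exp(), the same result can be written directly with NumPy's element-wise operators. The sketch below compares the two forms; the names via_vectorize and via_operators are illustrative only.</p>

```python
import numpy as np

a = np.arange(5)


def exp(x):
    # same test as above: true for every value except 3
    return x < 3 or x > 3


via_vectorize = np.vectorize(exp)(a)

# The same test with element-wise operators;
# note | in place of the Python keyword 'or'
via_operators = (a < 3) | (a > 3)

print(np.array_equal(via_vectorize, via_operators))   # True
print(np.array_equal(via_operators, a != 3))          # True
```

<p>The Python keyword or cannot be used between two arrays because it expects a single truth value, which is why the operator form uses | with parentheses around each comparison.</p>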

In [89]:
a

array([0, 1, 2, 3, 4])

In [94]:
# boolean select items < 3
r = (a < 3)
a[r]

array([0, 1, 2])

In [95]:
# count how many items are less than 3
np.sum(a<3)

3

In [100]:
a1 = np.arange(0, 5)
a2 = np.arange(5, 0, -1)
a1, a2

(array([0, 1, 2, 3, 4]), array([5, 4, 3, 2, 1]))

In [101]:
a1 < a2

array([ True,  True,  True, False, False])

<p>This also works across multi-dimensional arrays:</p>

In [102]:
# and even multi dimensional arrays
a1 = np.arange(9).reshape(3, 3)
a2 = np.arange(9, 0 , -1).reshape(3, 3)
a1 < a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]])

<h2>Slicing arrays</h2>

<p>NumPy arrays support a feature called slicing. Slicing retrieves zero or more items from
an array, and with a step value the items do not need to be adjacent, whereas indexing with a
single integer in [] retrieves only one value. This is very convenient as it provides an ability to
efficiently select multiple items from an array without the need to implement Python
loops.</p>
<p>Slicing overloads the normal array [] operator to accept what is referred to as a slice
object. A slice object is created using a syntax of start:end:step. Each component of the
slice is optional and, as we will see, this provides convenient means to select entire rows
or columns by omitting the component of the slice.</p>
<p>To begin with the demonstrations, the following code creates a nine-element array and
selects items in zero-based positions from 3 up to, but not including, position 8:</p>

In [10]:
a1 = np.arange(1,10)
a1

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
a1[3:8]

array([4, 5, 6, 7, 8])

In [12]:
# The same example with step value
a1[3:8:2]

array([4, 6, 8])

In [13]:
# Take every other value from the first four items
a1[:4:2]

array([1, 3])

In [111]:
# Reverse order of array
a1[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1])

<p>Note that the following example is not equivalent to the preceding
example:</p> <p>the value 1 is missing from its result.</p>
<p>In this scenario, the value 1 at position 0 was not retrieved. This is because the end value is
not inclusive, so when stepping by -1 from position 8, NumPy stops before it reaches position 0
and never returns the value stored there.</p>

In [120]:
a1[8:0:-1]

array([9, 8, 7, 6, 5, 4, 3, 2])

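<p>To select all the items starting at a position until the end of the array, simply specify the
start position and leave end unspecified. With a negative step, the slice runs back to the
beginning of the array, so position 0 is included:</p>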

In [121]:
a1[8::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1])
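
<p>The start:end:step syntax described earlier is shorthand for Python's built-in slice object, so the same selections can be written out explicitly. The following is a small sketch; the variable names s and rev are illustrative only.</p>

```python
import numpy as np

a1 = np.arange(1, 10)

# a1[3:8:2] is shorthand for indexing with a slice object
s = slice(3, 8, 2)
print(a1[s])                  # [4 6 8]

# Omitted components are None in the slice object
rev = slice(None, None, -1)
print(a1[rev])                # the array in reverse order

# In two dimensions, one slice per axis selects rows and columns
m = np.arange(0, 20).reshape(5, 4)
block = m[slice(3, 5), slice(1, 3)]   # same as m[3:5, 1:3]
print(block)
```

<p>Storing a slice in a variable can be handy when the same selection is applied to several arrays.</p>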

In [7]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

<h2>Reshaping arrays</h2>

<p>NumPy makes it simple to change the shape of your arrays. Earlier in this chapter, we
briefly saw the .reshape() method of the NumPy array and how it can be used to reshape
a one-dimensional array into a matrix. It is also possible to convert from a matrix back to
an array. The following example demonstrates this by creating a nine-element array,
reshaping it into a 3 x 3 matrix, and then back to a one-dimensional array of nine elements:</p>

In [128]:
# create a 9-element one-dimensional array
a = np.arange(0, 9)
# and reshape to a 3x3 2-d array
m = a.reshape(3, 3)
a,m

(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 array([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]))

In [132]:
# and we can reshape downward in dimensions too
reshaped = m.reshape(9)
# or reshaped = m.reshape(np.size(m))
reshaped

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

<p>The .reshape() method is not the only means of reorganizing data. Another means is the
.ravel() method that will flatten a matrix to one dimension as shown in the following
example:</p>


In [130]:
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [131]:
m.ravel()

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

The preceding code has performed the same operation as using the previous .reshape()
example, but without the need to pass the number of items in the matrix.
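
The .reshape() method can also avoid an explicit item count: passing -1 for one dimension asks NumPy to infer its length from the size of the array. A brief sketch, with the names flat and wide chosen only for illustration:

```python
import numpy as np

m = np.arange(0, 9).reshape(3, 3)

# -1 means "infer this length from the number of elements"
flat = m.reshape(-1)
print(flat.shape)               # (9,)

# -1 can stand in for one dimension of a multi-dimensional shape
wide = np.arange(12).reshape(2, -1)
print(wide.shape)               # (2, 6)
```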

Although .reshape() and .ravel() do not change the shape of the original array or
matrix, they also avoid copying its data whenever they can. In examples like these they return
a view into the specified array or matrix, so if you change an element in this view, the value in
the original array or matrix is changed as well.

In [133]:
reshaped = m.reshape(np.size(m))
# ravel into an array
raveled = m.ravel()
# change values in either
reshaped[2] = 1000
raveled[5] = 2000
# and they show as changed in the original
m

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

The .flatten() method functions similarly to .ravel(), but it always returns a new array
with copied data rather than a view. Changes to the result do not change the original matrix:

In [135]:
# flattened is like ravel, but a copy of the data,
# not a view into the source
m2 = np.arange(0, 9).reshape(3,3)
flattened = m2.flatten()
# change in the flattened object
flattened[0] = 1000
flattened

array([1000,    1,    2,    3,    4,    5,    6,    7,    8])

In [136]:
# but not in the original
m2

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
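
Whether a result is a view or a copy can be checked without modifying anything by using np.shares_memory(). The sketch below applies it to the three methods discussed in this section; the transpose case is included as an assumption-free illustration that .ravel() falls back to a copy when the data is not laid out contiguously in the requested order.

```python
import numpy as np

m = np.arange(0, 9).reshape(3, 3)

reshape_shares = np.shares_memory(m, m.reshape(9))
ravel_shares = np.shares_memory(m, m.ravel())
flatten_shares = np.shares_memory(m, m.flatten())
print(reshape_shares)    # True: a view in this case
print(ravel_shares)      # True: a view in this case
print(flatten_shares)    # False: flatten always copies

# For a transposed array, ravel() cannot return a view and copies instead
transposed_shares = np.shares_memory(m.T, m.T.ravel())
print(transposed_shares)  # False
```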

The .shape property returns a tuple representing the shape of the array:

In [138]:
m2.shape

(3, 3)

The property can also be assigned a tuple, which will force the array to reshape itself as
specified:

In [140]:
m2.shape = (1,9)
m2

array([[0, 1, 2, 3, 4, 5, 6, 7, 8]])

In [143]:
m2.shape = (3, 3)
m2

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In linear algebra, it is common to transpose a matrix. This can be performed with the
.transpose() method, as shown here:

In [144]:
m2.transpose()

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

Alternatively, this can also be performed with the .T property:

In [145]:
# .T returns the transposed array; it does not modify m2 in place
m2.T
# so m2 itself is unchanged
m2

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

The .resize() method functions similarly to the .reshape() method, except that while
.reshape() returns a new array object (a view where possible) and leaves the shape of the
original alone, .resize() performs an in-place reshaping of the array:

In [147]:
m = np.arange(0, 9).reshape(3,3)
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [149]:
m.resize(1, 9)
m

array([[0, 1, 2, 3, 4, 5, 6, 7, 8]])

<h2>Combining Arrays</h2>

Arrays can be combined in various ways. This process in NumPy is referred to as
stacking. Stacking can take various forms, including horizontal, vertical, and depth-wise
stacking. To demonstrate this, we will use the following two arrays (a and b):

In [151]:
# creating two arrays for examples
a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [152]:
b

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

Horizontal stacking combines two arrays in a manner where the columns of the second
array are placed to the right of those in the first array. The function actually stacks the two
items provided in a two-element tuple. The result is a new array with data copied from the
two that are specified:


In [153]:
np.hstack((a, b))

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

This is functionally equivalent to using the np.concatenate() function while specifying
axis = 1:

In [154]:
np.concatenate((a, b), axis = 1)

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

Vertical stacking returns a new array with the contents of the second array as appended
rows of the first array:

In [156]:
# vertical stack, adding b as rows after a's rows
np.vstack((a, b))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

Like np.hstack(), this is equivalent to using the concatenate function, except specifying
axis=0:

In [157]:
# concatenate along axis=0 is the same as vstack
np.concatenate((a, b), axis = 0)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])
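
The equivalences above can be checked directly rather than by comparing printed output. The following sketch uses np.array_equal() for the comparison and also confirms that the stacked result is a copy of the inputs; the names h and v are illustrative only.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

h = np.hstack((a, b))
v = np.vstack((a, b))

print(h.shape)   # (3, 6): same rows, columns side by side
print(v.shape)   # (6, 3): same columns, rows appended

print(np.array_equal(h, np.concatenate((a, b), axis=1)))   # True
print(np.array_equal(v, np.concatenate((a, b), axis=0)))   # True

# The inputs are copied, so changing the result leaves a untouched
h[0, 0] = -1
print(a[0, 0])   # 0
```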

Depth stacking takes a list of arrays and arranges them in order along an additional axis
referred to as the depth:

In [158]:
# dstack pairs up the elements of a and b at each position along a new third axis
np.dstack((a, b))

array([[[ 0, 10],
        [ 1, 20],
        [ 2, 30]],

       [[ 3, 40],
        [ 4, 50],
        [ 5, 60]],

       [[ 6, 70],
        [ 7, 80],
        [ 8, 90]]])
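
The nested output is easier to interpret through the shape of the result. As a sketch, the code below shows that depth stacking two 3 x 3 arrays gives a 3 x 3 x 2 array, and that indexing the last axis recovers each input. The comparison with np.stack() is an addition for illustration and is not used elsewhere in this chapter.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

d = np.dstack((a, b))
print(d.shape)                          # (3, 3, 2)

# Index along the last axis to recover each input
print(np.array_equal(d[:, :, 0], a))    # True
print(np.array_equal(d[:, :, 1], b))    # True

# d[i, j] pairs the elements found at position (i, j) in a and b
pair = d[1, 2]
print(pair)                             # 5 from a, 60 from b

# For 2-d inputs this matches stacking along a new axis 2
print(np.array_equal(d, np.stack((a, b), axis=2)))   # True
```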

Column stacking performs a horizontal stack of two one-dimensional arrays, making each
array a column in the resulting array:

In [159]:
# set up 1-d array
one_d_a = np.arange(5)
one_d_a

array([0, 1, 2, 3, 4])

In [160]:
# another 1-d array
one_d_b = (one_d_a + 1) * 10
one_d_b

array([10, 20, 30, 40, 50])

In [161]:
# stack the two columns
np.column_stack((one_d_a, one_d_b))

array([[ 0, 10],
       [ 1, 20],
       [ 2, 30],
       [ 3, 40],
       [ 4, 50]])

In [164]:
# stack the two columns, then select row 0, column 1 of the result
np.column_stack((one_d_a, one_d_b))[0][1]

10

Row stacking returns a new array where each one-dimensional array forms one of the
rows of the new array:

In [167]:
# stack along rows
# (np.row_stack is an alias of np.vstack and is deprecated as of NumPy 2.0)
np.row_stack((one_d_a, one_d_b))

array([[ 0,  1,  2,  3,  4],
       [10, 20, 30, 40, 50]])
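
Row stacking and column stacking of one-dimensional arrays are transposes of each other. The sketch below shows this using np.vstack(), which gives the same result as np.row_stack() for these inputs; the names rows and cols are illustrative only.

```python
import numpy as np

one_d_a = np.arange(5)
one_d_b = (one_d_a + 1) * 10

rows = np.vstack((one_d_a, one_d_b))          # each input becomes a row
cols = np.column_stack((one_d_a, one_d_b))    # each input becomes a column

print(rows.shape, cols.shape)                 # (2, 5) (5, 2)

# Column stacking 1-d arrays is the transpose of row stacking them
print(np.array_equal(cols, rows.T))           # True
```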

<h2>Splitting arrays</h2>

<p>Arrays can also be split into multiple arrays along the horizontal, vertical, and depth axes
using the np.hsplit(), np.vsplit(), and np.dsplit() functions. We will only look at
the np.hsplit() function as the others work similarly.
The np.hsplit() function takes the array to split as a parameter, and either a scalar value
to specify the number of arrays to be returned, or a list of column indexes to split the array
upon.</p>
<p>If splitting into a number of arrays, each array returned will have the same count of
columns. The source array must have a number of columns that is a multiple of the
specified value.</p>
<p>To demonstrate this, we will use the following array with four columns and three rows:</p>

In [169]:
# sample array
a = np.arange(12).reshape(3, 4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [170]:
# horiz split the 2-d array into 4 array columns
np.hsplit(a, 4)

[array([[0],
        [4],
        [8]]),
 array([[1],
        [5],
        [9]]),
 array([[ 2],
        [ 6],
        [10]]),
 array([[ 3],
        [ 7],
        [11]])]

In [171]:
# horiz split the 2-d array into 4 array columns and take the first one
np.hsplit(a, 4)[0]

array([[0],
       [4],
       [8]])

In [173]:
# horiz split the 2-d array into 2 array columns
np.hsplit(a, 2)

[array([[0, 1],
        [4, 5],
        [8, 9]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11]])]

Also, the following code splits an array along specific columns:

In [174]:
# split at columns 1 and 3
np.hsplit(a, [1, 3])

[array([[0],
        [4],
        [8]]),
 array([[ 1,  2],
        [ 5,  6],
        [ 9, 10]]),
 array([[ 3],
        [ 7],
        [11]])]
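
Splitting at a list of column indexes corresponds to taking consecutive slices between those indexes. The following sketch checks this and also shows what happens when the column count is not a multiple of the requested number of pieces; the names left, middle, right, and uneven_ok are illustrative only.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

left, middle, right = np.hsplit(a, [1, 3])
print(np.array_equal(left, a[:, :1]))       # True
print(np.array_equal(middle, a[:, 1:3]))    # True
print(np.array_equal(right, a[:, 3:]))      # True

# Four columns cannot be divided evenly into three pieces
try:
    np.hsplit(a, 3)
    uneven_ok = True
except ValueError as err:
    uneven_ok = False
    print("ValueError:", err)
```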

Vertical splitting works similarly to horizontal splitting, except that it splits along the
vertical axis (rows), as can be seen here:

In [175]:
# new array for examples
a = np.arange(12).reshape(4, 3)
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

We can split this by 4 and get the four arrays representing the rows:

In [176]:
# split into four rows of arrays
np.vsplit(a, 4)

[array([[0, 1, 2]]),
 array([[3, 4, 5]]),
 array([[6, 7, 8]]),
 array([[ 9, 10, 11]])]
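
Splitting and stacking are inverse operations, which gives a simple way to check a split. As a sketch, the code below splits the 4 x 3 array into rows, stacks the pieces back together, and then splits it into two blocks; the names pieces, top, and bottom are illustrative only.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

pieces = np.vsplit(a, 4)
print(len(pieces), pieces[0].shape)            # 4 (1, 3)

# Stacking the pieces vertically rebuilds the original array
print(np.array_equal(np.vstack(pieces), a))    # True

# Splitting into 2 gives two blocks of two rows each
top, bottom = np.vsplit(a, 2)
print(top.shape, bottom.shape)                 # (2, 3) (2, 3)
```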