NumPy is a Python library of tools for numerical computing. Its most widely used feature is a high-performance multidimensional array object: a data structure that supports efficient computation on arrays and matrices.

SciPy is built on top of NumPy. It is another core package for scientific computing, providing mathematical algorithms and convenience functions that operate on NumPy arrays, so SciPy and NumPy are often used together.

# NumPy

In [1]:
import numpy as np

Creating A NumPy Array

In [2]:
# Create a 2X2 identity matrix
print(np.eye(2))

# Create a 3X3 identity matrix
print(np.identity(3))

# Evenly spaced values with a fixed step (stop value excluded)
print(np.arange(3, 7, 2))

# Evenly spaced values with a fixed number of samples (stop value included)
print(np.linspace(2, 3, 5))

# Values evenly spaced on a log scale: 4 samples from 10**2 to 10**3
print(np.logspace(2, 3, 4))

[[ 1.  0.]
 [ 0.  1.]]
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
[3 5]
[ 2.    2.25  2.5   2.75  3.  ]
[  100.           215.443469     464.15888336  1000.        ]
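The three spacing helpers differ mainly in how they treat the step and the endpoint. The following is a small sketch of those differences; the variable names are illustrative and not part of the notebook.

```python
import numpy as np

# arange: fixed step, stop value excluded
by_step = np.arange(3, 7, 2)

# linspace: fixed number of samples, stop value included by default
by_count = np.linspace(2, 3, 5)

# logspace: 10**x for x evenly spaced between the two exponents
by_log = np.logspace(2, 3, 4)
same_as_log = 10 ** np.linspace(2, 3, 4)

print(by_step)
print(by_count)
print(np.allclose(by_log, same_as_log))
```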


In [3]:
zero_array = np.zeros((5,4)) # 5x4 array filled with 0.0
zero_array

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [4]:
# Setting the seed makes the random array reproducible across runs; without it, the array changes every time.
np.random.seed(100)
rand_array = np.random.rand(5,4)
print('shape of data: [%d, %d]' % rand_array.shape)
rand_array

shape of data: [5, 4]


array([[ 0.54340494,  0.27836939,  0.42451759,  0.84477613],
       [ 0.00471886,  0.12156912,  0.67074908,  0.82585276],
       [ 0.13670659,  0.57509333,  0.89132195,  0.20920212],
       [ 0.18532822,  0.10837689,  0.21969749,  0.97862378],
       [ 0.81168315,  0.17194101,  0.81622475,  0.27407375]])
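The effect of the seed can be checked directly. This is a minimal sketch, with illustrative names `first`, `second`, and `third`: reseeding with the same value reproduces the array, while drawing again without reseeding does not.

```python
import numpy as np

np.random.seed(100)
first = np.random.rand(5, 4)

np.random.seed(100)            # reset the generator to the same seed
second = np.random.rand(5, 4)

third = np.random.rand(5, 4)   # no reseed: the generator has moved on

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```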

Randomly permute a sequence

In [5]:
a = [1, 2, 3, 4, 5, 6, 7, 8]
print(np.random.permutation(a)) # return a permuted copy; the input list is unchanged
print(a)

np.random.shuffle(a) # shuffle the list in place
print(a)

[1 3 5 6 2 7 8 4]
[1, 2, 3, 4, 5, 6, 7, 8]
[5, 6, 8, 2, 3, 1, 4, 7]


Indexing and Slicing NumPy Arrays

In [6]:
print(rand_array[2,3]) # single element at row 2, column 3 (zero-based)
print('first row', rand_array[0, :]) # slice out the first row
print('first col', rand_array[:, 0]) # slice out the first column

0.209202122117
first row [ 0.54340494  0.27836939  0.42451759  0.84477613]
first col [ 0.54340494  0.00471886  0.13670659  0.18532822  0.81168315]


Assigning Values

In [7]:
rand_array[0,0] = .4
rand_array

array([[ 0.4       ,  0.27836939,  0.42451759,  0.84477613],
       [ 0.00471886,  0.12156912,  0.67074908,  0.82585276],
       [ 0.13670659,  0.57509333,  0.89132195,  0.20920212],
       [ 0.18532822,  0.10837689,  0.21969749,  0.97862378],
       [ 0.81168315,  0.17194101,  0.81622475,  0.27407375]])

Accessing data type

In [8]:
rand_array.dtype

dtype('float64')

In [9]:
rand_array.astype(int) # change the data type to int

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
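The all-zero result above follows from `astype(int)` discarding the fractional part rather than rounding; every value in `rand_array` lies between 0 and 1. A short sketch with made-up values shows the difference between truncating and rounding first:

```python
import numpy as np

values = np.array([0.9, -0.9, 1.5, 2.7])

truncated = values.astype(int)          # fractional part is dropped
rounded = np.rint(values).astype(int)   # round to the nearest integer first

print(truncated)
print(rounded)
```

If rounded integers are wanted, rounding before the cast is one option.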

NumPy Array Operations

In [10]:
rand_array[:, -1] + 1 # add 1 to each item in the last column

array([ 1.84477613,  1.82585276,  1.20920212,  1.97862378,  1.27407375])

In [11]:
rand_array[:, 1] + rand_array[:, 3] # add the values of two columns 

array([ 1.12314552,  0.94742188,  0.78429545,  1.08700068,  0.44601476])

Broadcasting

Element-wise operations normally require arrays of exactly the same shape. When the shapes differ, NumPy tries to broadcast the arrays so that their elements line up. Broadcasting compares the shapes in a few steps:

The last dimension of each array is compared.

If the dimension lengths are equal, or one of them is 1, the check moves on to the next dimension to the left.

If the dimension lengths aren’t equal and neither of them is 1, an error is raised.

The check continues until the array with fewer dimensions runs out of dimensions.

In [12]:
rand_array1 = np.ones((1,4)) * 1.5 # array with shape [1, 4]
rand_array + rand_array1

array([[ 1.9       ,  1.77836939,  1.92451759,  2.34477613],
       [ 1.50471886,  1.62156912,  2.17074908,  2.32585276],
       [ 1.63670659,  2.07509333,  2.39132195,  1.70920212],
       [ 1.68532822,  1.60837689,  1.71969749,  2.47862378],
       [ 2.31168315,  1.67194101,  2.31622475,  1.77407375]])
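The shape comparison described in the broadcasting steps can be sketched in plain Python. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function. It assumes that a missing leading dimension behaves like a dimension of length 1, and it is compared against the shape NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape two arrays broadcast to, or ValueError."""
    result = []
    n_dims = max(len(shape_a), len(shape_b))
    # Compare dimensions starting from the last one and moving left.
    for i in range(1, n_dims + 1):
        # An array that has run out of dimensions is treated as length 1 there.
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError('cannot broadcast %s with %s' % (shape_a, shape_b))
    return tuple(reversed(result))

print(broadcast_shape((5, 4), (1, 4)))
print(np.broadcast(np.empty((5, 4)), np.empty((1, 4))).shape)
```

For the notebook's shapes, (5, 4) and (1, 4), the last dimensions match and the first pair contains a 1, so the single row is reused for every row of `rand_array`.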

NumPy Array Methods

In [13]:
print('sum of the array is %.2f' % rand_array.sum())
# axis = 0 sums down the rows (one result per column); axis = 1 sums across the columns (one result per row)
print('sum of each row', rand_array.sum(axis = 1))
print('sum of each column', rand_array.sum(axis = 0))

sum of the array is 8.95
sum of each row [ 1.94766311  1.62288982  1.812324    1.49202639  2.07392266]
sum of each column [ 1.53843681  1.25534974  3.02251087  3.13252854]
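The `axis` argument names the dimension that is collapsed, which is easy to mix up. A small sketch on a made-up 2x3 array (`small` is illustrative) may make this clearer:

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])      # shape (2, 3)

col_sums = small.sum(axis=0)       # collapses the rows: one value per column
row_sums = small.sum(axis=1)       # collapses the columns: one value per row

# keepdims=True keeps the collapsed axis with length 1
row_sums_2d = small.sum(axis=1, keepdims=True)

print(col_sums)
print(row_sums)
print(row_sums_2d.shape)
```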


Normalize each column in the array

In [14]:
mean = rand_array.mean(axis = 0)
std = rand_array.std(axis = 0)
norm = (rand_array - mean) / std
norm

array([[ 0.32697604,  0.15807268, -0.72057297,  0.68361961],
       [-1.07312982, -0.74985218,  0.26522123,  0.62435188],
       [-0.60562257,  1.87620145,  1.14829046, -1.30698841],
       [-0.43340229, -0.82623951, -1.54057557,  1.1028284 ],
       [ 1.78517865, -0.45818244,  0.84763684, -1.10381148]])
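One way to sanity-check this kind of column normalization is to confirm that each column ends up with mean 0 and standard deviation 1. The sketch below uses its own random array (`data`, seeded with 0) rather than the notebook's `rand_array`. The per-column `mean` and `std` have shape (4,), so they broadcast against the (5, 4) array.

```python
import numpy as np

np.random.seed(0)
data = np.random.rand(5, 4)

col_mean = data.mean(axis=0)   # shape (4,): one mean per column
col_std = data.std(axis=0)     # shape (4,): one standard deviation per column
norm = (data - col_mean) / col_std

print(np.allclose(norm.mean(axis=0), 0.0))
print(np.allclose(norm.std(axis=0), 1.0))
```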

NumPy Array Comparisons

In [15]:
rand_array > .5

array([[False, False, False,  True],
       [False, False,  True,  True],
       [False,  True,  True, False],
       [False, False, False,  True],
       [ True, False,  True, False]], dtype=bool)

Subsetting

In [16]:
# boolean mask: rows whose first-column value is above .8 or below .2
large_values = (rand_array[:, 0] > .8) | (rand_array[:, 0] < .2)
print(large_values)
rand_array[large_values, :]

[False  True  True  True  True]


array([[ 0.00471886,  0.12156912,  0.67074908,  0.82585276],
       [ 0.13670659,  0.57509333,  0.89132195,  0.20920212],
       [ 0.18532822,  0.10837689,  0.21969749,  0.97862378],
       [ 0.81168315,  0.17194101,  0.81622475,  0.27407375]])

Reshaping

In [17]:
np.transpose(rand_array)

array([[ 0.4       ,  0.00471886,  0.13670659,  0.18532822,  0.81168315],
       [ 0.27836939,  0.12156912,  0.57509333,  0.10837689,  0.17194101],
       [ 0.42451759,  0.67074908,  0.89132195,  0.21969749,  0.81622475],
       [ 0.84477613,  0.82585276,  0.20920212,  0.97862378,  0.27407375]])

In [18]:
# turn an array into a one-dimensional representation
rand_array.ravel()

array([ 0.4       ,  0.27836939,  0.42451759,  0.84477613,  0.00471886,
        0.12156912,  0.67074908,  0.82585276,  0.13670659,  0.57509333,
        0.89132195,  0.20920212,  0.18532822,  0.10837689,  0.21969749,
        0.97862378,  0.81168315,  0.17194101,  0.81622475,  0.27407375])

In [19]:
# reshape an array to a certain shape we specify.
rand_array.reshape((2,10))

array([[ 0.4       ,  0.27836939,  0.42451759,  0.84477613,  0.00471886,
         0.12156912,  0.67074908,  0.82585276,  0.13670659,  0.57509333],
       [ 0.89132195,  0.20920212,  0.18532822,  0.10837689,  0.21969749,
         0.97862378,  0.81168315,  0.17194101,  0.81622475,  0.27407375]])

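Reshaping changes the shape of the array but not the data it holds, so the total number of elements must stay the same. Resizing can change the data, depending on the requested shape: if the new shape needs more elements than the original has, np.resize repeats the original data to fill it, and if it needs fewer, the remaining elements are dropped.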

In [20]:
# resize an array to the shape we specify; the data repeats to fill the extra row
np.resize(rand_array, (3,10))

array([[ 0.4       ,  0.27836939,  0.42451759,  0.84477613,  0.00471886,
         0.12156912,  0.67074908,  0.82585276,  0.13670659,  0.57509333],
       [ 0.89132195,  0.20920212,  0.18532822,  0.10837689,  0.21969749,
         0.97862378,  0.81168315,  0.17194101,  0.81622475,  0.27407375],
       [ 0.4       ,  0.27836939,  0.42451759,  0.84477613,  0.00471886,
         0.12156912,  0.67074908,  0.82585276,  0.13670659,  0.57509333]])
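The difference between reshaping and resizing can be seen on a small array. This is a sketch with illustrative names: `reshape` keeps the same six elements (and here shares memory with the original), while `np.resize` builds a new array and repeats the data to fill a larger shape.

```python
import numpy as np

original = np.arange(6)                  # [0 1 2 3 4 5]

reshaped = original.reshape((2, 3))      # same 6 elements, new shape
resized = np.resize(original, (3, 4))    # 12 elements needed: data repeats

# reshape refuses a shape with a different number of elements
try:
    original.reshape((3, 4))
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(np.shares_memory(original, reshaped))
print(resized)
print(reshape_failed)
```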

Combining

In [21]:
rand_array1 = np.random.rand(3, rand_array.shape[1]) # generate a new random array
combined = np.vstack((rand_array, rand_array1)) # stack the rows of the two arrays
combined

array([[ 0.4       ,  0.27836939,  0.42451759,  0.84477613],
       [ 0.00471886,  0.12156912,  0.67074908,  0.82585276],
       [ 0.13670659,  0.57509333,  0.89132195,  0.20920212],
       [ 0.18532822,  0.10837689,  0.21969749,  0.97862378],
       [ 0.81168315,  0.17194101,  0.81622475,  0.27407375],
       [ 0.14206538,  0.58138896,  0.47918994,  0.38641911],
       [ 0.44046495,  0.40475733,  0.44225404,  0.03012328],
       [ 0.77600531,  0.55095838,  0.3810734 ,  0.52926578]])

In [22]:
np.concatenate((rand_array, rand_array1), axis=0) # same result: join the two arrays along the rows

array([[ 0.4       ,  0.27836939,  0.42451759,  0.84477613],
       [ 0.00471886,  0.12156912,  0.67074908,  0.82585276],
       [ 0.13670659,  0.57509333,  0.89132195,  0.20920212],
       [ 0.18532822,  0.10837689,  0.21969749,  0.97862378],
       [ 0.81168315,  0.17194101,  0.81622475,  0.27407375],
       [ 0.14206538,  0.58138896,  0.47918994,  0.38641911],
       [ 0.44046495,  0.40475733,  0.44225404,  0.03012328],
       [ 0.77600531,  0.55095838,  0.3810734 ,  0.52926578]])

Linear transformation of Arrays

In [23]:
# element-wise operation
rand_array * 2

array([[ 0.8       ,  0.55673877,  0.84903518,  1.68955226],
       [ 0.00943771,  0.24313824,  1.34149817,  1.65170551],
       [ 0.27341318,  1.15018666,  1.78264391,  0.41840424],
       [ 0.37065644,  0.21675378,  0.43939499,  1.95724757],
       [ 1.6233663 ,  0.34388203,  1.6324495 ,  0.54814749]])

In [24]:
# matrix multiplication
multiplication = np.dot(rand_array, rand_array.T)
multiplication

array([[ 1.13135141,  1.01813415,  0.76988182,  1.02428356,  0.9505691 ],
       [ 1.01813415,  1.14673843,  0.84118222,  0.96961086,  0.7985595 ],
       [ 0.76988182,  0.84118222,  1.18764138,  0.48821379,  0.99470041],
       [ 1.02428356,  0.96961086,  0.48821379,  1.0520636 ,  0.61659984],
       [ 0.9505691 ,  0.7985595 ,  0.99470041,  0.61659984,  1.42973251]])

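NumPy matrix

NumPy matrices (np.matrix) are strictly 2-dimensional, while NumPy arrays (ndarray) are N-dimensional. The NumPy documentation no longer recommends np.matrix for new code; it is covered here because older code still uses it.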
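The 2-dimensional restriction shows up as soon as a matrix is indexed. In this sketch (names are illustrative), selecting a row from an ndarray gives a 1-D result, while the same selection on a matrix stays 2-D, and a 3-D input is rejected.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
mat = np.matrix(arr)

print(arr[0].shape)   # a row of an ndarray is 1-D
print(mat[0].shape)   # a row of a matrix is still 2-D

# A 3-D array is fine as an ndarray but cannot become a matrix
cube = np.arange(24).reshape(2, 3, 4)
try:
    np.matrix(cube)
    three_d_ok = True
except ValueError:
    three_d_ok = False
print(three_d_ok)
```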

# Linear Algebra With NumPy and SciPy

In [25]:
import scipy.linalg

Matrix: two-dimensional array

Differences between matrices and arrays:
- A matrix is always 2-D, while an array can have any number of dimensions,
- matrix is a subclass of ndarray,
- Both arrays and matrices have the .T attribute, but only matrices have .H and .I,
- The * operator performs matrix multiplication on matrices but element-wise multiplication on arrays, and
- The ** operator computes a matrix power on matrices but an element-wise power on arrays.

In [26]:
rand_mat = np.matrix(rand_array)
rand_mat

matrix([[ 0.4       ,  0.27836939,  0.42451759,  0.84477613],
        [ 0.00471886,  0.12156912,  0.67074908,  0.82585276],
        [ 0.13670659,  0.57509333,  0.89132195,  0.20920212],
        [ 0.18532822,  0.10837689,  0.21969749,  0.97862378],
        [ 0.81168315,  0.17194101,  0.81622475,  0.27407375]])

Operations are different for matrix and array

In [27]:
a=np.array([[4, 3], [2, 1]])
b=np.array([[1, 2], [3, 4]])
print(a * b) # element-wise
print(np.dot(a, b)) # matrix multiplication
print(a**2) # each component squared element-wise

[[4 6]
 [6 4]]
[[13 20]
 [ 5  8]]
[[16  9]
 [ 4  1]]


In [28]:
c=np.matrix([[4, 3], [2, 1]]) # np.matrix instead of the np.mat alias, which NumPy 2.0 removed
d=np.matrix([[1, 2], [3, 4]])
print(c * d) # matrix multiplication
print(c**2) # matrix product c * c

[[13 20]
 [ 5  8]]
[[22 15]
 [10  7]]
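The matrix results above can also be obtained from plain ndarrays, without the matrix class. A minimal sketch using the `@` operator and `np.linalg.matrix_power`, with the same values as `a` and `b` from the earlier cell:

```python
import numpy as np

a = np.array([[4, 3], [2, 1]])
b = np.array([[1, 2], [3, 4]])

product = a @ b                          # matrix product of two ndarrays
squared = np.linalg.matrix_power(a, 2)   # matrix power, i.e. a @ a

print(product)
print(squared)
```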


In [29]:
# Transposition
print(rand_mat.T)

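# Conjugate transpose (same as .T for a real-valued matrix)
print(rand_mat.H)

# Inverse (a pseudo-inverse here, because rand_mat is 5x4 and not square)
print(rand_mat.I)

# Underlying data as a plain ndarray
print(rand_mat.A)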

[[ 0.4         0.00471886  0.13670659  0.18532822  0.81168315]
 [ 0.27836939  0.12156912  0.57509333  0.10837689  0.17194101]
 [ 0.42451759  0.67074908  0.89132195  0.21969749  0.81622475]
 [ 0.84477613  0.82585276  0.20920212  0.97862378  0.27407375]]
[[ 0.4         0.00471886  0.13670659  0.18532822  0.81168315]
 [ 0.27836939  0.12156912  0.57509333  0.10837689  0.17194101]
 [ 0.42451759  0.67074908  0.89132195  0.21969749  0.81622475]
 [ 0.84477613  0.82585276  0.20920212  0.97862378  0.27407375]]
[[ 0.65001493 -1.08681523 -0.40081632  0.17704704  0.94507933]
 [ 1.433743   -2.01102475  1.7169227   0.3931375  -1.07379534]
 [-0.97254181  1.45370134  0.09624299 -0.57169011  0.58514234]
 [ 0.3898441   0.26563312 -0.27269157  0.59913231 -0.28453038]]
[[ 0.4         0.27836939  0.42451759  0.84477613]
 [ 0.00471886  0.12156912  0.67074908  0.82585276]
 [ 0.13670659  0.57509333  0.89132195  0.20920212]
 [ 0.18532822  0.10837689  0.21969749  0.97862378]
 [ 0.81168315  0.17194101  0.81622475  0.27407375]]
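For a non-square matrix such as the 5x4 `rand_mat`, `.I` returns the Moore-Penrose pseudo-inverse rather than a true inverse, which is why the third printed block above has shape 4x5. The sketch below uses a small made-up matrix (`tall`) and `np.linalg.pinv` to show what that means in practice.

```python
import numpy as np

tall = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])            # shape (3, 2), not square

pseudo_inverse = np.linalg.pinv(tall)    # shape (2, 3)

# For a matrix with full column rank, pinv(A) @ A is the identity ...
left_ok = np.allclose(pseudo_inverse @ tall, np.eye(2))
# ... but A @ pinv(A) is only a projection, not the identity.
right_ok = np.allclose(tall @ pseudo_inverse, np.eye(3))

print(pseudo_inverse.shape, left_ok, right_ok)
```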

Compute the inverse of a square matrix with SciPy.

In [30]:
arr = np.array([[1, 2], [3, 4]])
iarr = scipy.linalg.inv(arr)
iarr

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [31]:
arr = np.array([[3, 2], [6, 4]])
scipy.linalg.inv(arr) # arr is singular (its determinant is zero), so this raises LinAlgError

LinAlgError: singular matrix
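In a script, the error can be caught instead of stopping the program. This is a sketch of one possible way to handle it, assuming SciPy is installed as in the notebook. The names `singular`, `inverse`, and `rank` are illustrative.

```python
import numpy as np
import scipy.linalg

singular = np.array([[3.0, 2.0],
                     [6.0, 4.0]])   # second row is 2 * first row

try:
    inverse = scipy.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    # scipy.linalg raises the same LinAlgError class that NumPy defines
    inverse = None
    print('cannot invert:', err)

# The rank is below 2, which confirms the rows are linearly dependent
rank = np.linalg.matrix_rank(singular)
print('rank', rank)
```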

Norm and Determinant

In [32]:
square_mat = np.matrix(np.random.rand(4, 4))
print('norm of the matrix is', scipy.linalg.norm(square_mat))
print('determinant of the matrix is', scipy.linalg.det(square_mat))

norm of the matrix is 2.26384106282
determinant of the matrix is -0.2149201838167175
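On a small matrix the two quantities can be checked by hand. In this sketch (the matrix `m` is made up), `scipy.linalg.norm` with default arguments gives the Frobenius norm, the square root of the sum of squared entries, and the determinant of a 2x2 matrix is ad - bc.

```python
import numpy as np
import scipy.linalg

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

frobenius = scipy.linalg.norm(m)       # default for a 2-D input: Frobenius norm
by_hand = np.sqrt((m ** 2).sum())      # sqrt(1 + 4 + 9 + 16)
determinant = scipy.linalg.det(m)      # 1*4 - 2*3

print(frobenius, by_hand, determinant)
```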


Eigenvalues and Eigenvectors

In [33]:
la, v = scipy.linalg.eig(square_mat)
print('eigenvalues', la)
print('first eigenvector', v[:, 0])

eigenvalues [ 1.51508859+0.j  0.92412322+0.j  0.71073537+0.j -0.21597395+0.j]
first eigenvector [-0.78102921 -0.42565283 -0.2466372  -0.38468577]
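The eigenvectors are returned as the columns of `v`, and column `i` belongs to eigenvalue `la[i]`. A sketch on a small symmetric matrix (made up for this example) checks the defining relation A v = λ v for every pair:

```python
import numpy as np
import scipy.linalg

mat = np.array([[2.0, 1.0],
                [1.0, 2.0]])

eigenvalues, eigenvectors = scipy.linalg.eig(mat)

# Column i of eigenvectors pairs with eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(eigenvalues[i], np.allclose(mat @ v, eigenvalues[i] * v))
```

The eigenvalues come back with a complex dtype even when they are real, which matches the `+0.j` suffix in the notebook output.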


Singular Value Decomposition (SVD)

In [34]:
# rand_array is 5x4, so s holds its 4 singular values in descending order
U,s,Vh = scipy.linalg.svd(rand_array)
print(s)

[ 2.14438576  0.95167517  0.60771547  0.27227438]
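The three factors can be multiplied back together to recover the original array. The sketch below uses its own random 5x4 array (`data`, seeded with 0) and `scipy.linalg.diagsvd` to place the singular values on the diagonal of a 5x4 matrix.

```python
import numpy as np
import scipy.linalg

np.random.seed(0)
data = np.random.rand(5, 4)

U, s, Vh = scipy.linalg.svd(data)        # U: (5, 5), s: (4,), Vh: (4, 4)
Sigma = scipy.linalg.diagsvd(s, 5, 4)    # (5, 4) matrix with s on its diagonal
reconstructed = U @ Sigma @ Vh

print(U.shape, s.shape, Vh.shape)
print(np.allclose(reconstructed, data))
```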
