## Overview

- How to import NumPy
- How to create multidimensional NumPy ndarrays using various methods
- How to access and change elements in ndarrays
- How to load and save ndarrays
- How to use slicing to select or change subsets of an ndarray
- Understand the difference between a view and a copy of an ndarray
- How to use Boolean indexing and set operations to select or change subsets of an ndarray
- How to sort ndarrays
- How to perform element-wise operations on ndarrays
- Understand how NumPy uses broadcasting to perform operations on ndarrays of different sizes.

NumPy is included with Anaconda.

For this article I'll be using NumPy 1.13.3.

- The core of NumPy is an N-dimensional array type, the ndarray.
- ndarrays are multidimensional data structures that can represent vectors and matrices.
- NumPy provides a large number of optimized built-in mathematical functions.

In [2]:
!conda list numpy

# packages in environment at /Users/cjimti/anaconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.13.3           py36ha9ae307_4  
numpy-base                1.14.3           py36ha9ae307_2  
numpydoc                  0.8.0                    py36_0  


## Resources

- [NumPy Manual](https://docs.scipy.org/doc/numpy-1.13.0/contents.html)
- [NumPy User Guide](https://docs.scipy.org/doc/numpy-1.13.0/user/index.html)
- [NumPy Reference](https://docs.scipy.org/doc/numpy-1.13.0/reference/index.html#reference)
- [Scipy Lectures](http://www.scipy-lectures.org/intro/numpy/index.html)


In [3]:
import numpy as np

Generate one hundred million random numbers.

In [4]:
x = np.random.random(100000000)

Find the mean with Python's built-in `sum` and `len`.

In [5]:
%%timeit -n 1
sum(x) / len(x)

6.62 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Find the mean with NumPy.

In [6]:
%%timeit -n 1
np.mean(x)

46 ms ± 4.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
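The `%%timeit` cells above only work inside IPython or Jupyter. As a rough sketch, the same comparison can be run from a plain script with the standard-library `timeit` module. The array size of one million is an arbitrary choice, smaller than the one above so the script finishes quickly, and the name `sample` is just for illustration. Timings will differ from machine to machine.

```python
import timeit

import numpy as np

# Smaller sample than the one hundred million values used above,
# so the script runs quickly.
sample = np.random.random(1_000_000)

# Time the pure-Python mean (built-in sum and len), run once.
python_seconds = timeit.timeit(lambda: sum(sample) / len(sample), number=1)

# Time NumPy's vectorized mean, run once.
numpy_seconds = timeit.timeit(lambda: np.mean(sample), number=1)

print(f"built-in: {python_seconds:.4f} s")
print(f"   numpy: {numpy_seconds:.4f} s")
```

A single run is noisy. Raising `number` and dividing the total by it gives a steadier estimate.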


### Creating NumPy Arrays

In [7]:
import numpy as np

x = np.array([1,2,3,4,5])

In [8]:
print(x)
print(type(x))

[1 2 3 4 5]
<class 'numpy.ndarray'>


In [9]:
x.dtype

dtype('int64')

In [10]:
x.shape

(5,)

In [11]:
y = np.array([[1,1,1],[2,2,2],[3,3,3]])

In [12]:
y.shape

(3, 3)

In [13]:
y.size

9

- The rank of an array is its number of dimensions.
- A rank 1 array is flat: it has a single axis.
- All elements of an ndarray must be of the same type.
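The points above can be checked directly. This is a minimal sketch with illustrative array names (`flat`, `grid`, `mixed`). It uses the `ndim` attribute to read the rank and shows that NumPy converts mixed inputs to a single common dtype.

```python
import numpy as np

flat = np.array([1, 2, 3])               # rank 1
grid = np.array([[1, 2, 3], [4, 5, 6]])  # rank 2

print(flat.ndim, flat.shape)  # 1 (3,)
print(grid.ndim, grid.shape)  # 2 (2, 3)

# Mixing ints and floats: NumPy upcasts every element to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)
```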

Save and load an array.

In [14]:
np.save('some_array', y)

In [15]:
yy = np.load('some_array.npy')
print(yy)

[[1 1 1]
 [2 2 2]
 [3 3 3]]
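As a small sketch of what happens on disk, the round trip below writes to a temporary directory (an assumption made here so that no stray files are left behind). It shows that `np.save` appends the `.npy` extension and that the loaded array matches the original.

```python
import os
import tempfile

import numpy as np

original = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # np.save appends ".npy" when the file name has no extension.
    base_path = os.path.join(tmp_dir, "some_array")
    np.save(base_path, original)
    saved_files = os.listdir(tmp_dir)
    restored = np.load(base_path + ".npy")

print(saved_files)                         # ['some_array.npy']
print(np.array_equal(original, restored))  # True
print(restored.dtype == original.dtype)    # True
```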


### Built-in Functions to Create ndarrays

In [16]:
zeros = np.zeros((3,4))
print(zeros)

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]


In [17]:
print(zeros.dtype)

float64


In [18]:
myn = np.full((2,2), 5)
print(myn)

[[5 5]
 [5 5]]


In [19]:
# the dtype is inferred from the fill value we passed (an int)
print(myn.dtype)

int64


An identity matrix is a square matrix with ones along its main diagonal and zeros everywhere else.

In [20]:
ident = np.eye(8)
print(ident)

[[ 1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.]]
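To illustrate why the identity matrix matters, here is a short sketch: a matrix product with the identity returns the original matrix unchanged. The names `A` and `identity` are illustrative. `np.dot` is used for the matrix product so the example also works on older NumPy and Python versions.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
identity = np.eye(3)

# For rank 2 arrays, np.dot performs matrix multiplication.
product = np.dot(A, identity)
print(np.array_equal(product, A))  # True

# np.eye returns floats by default; pass dtype to get integers instead.
int_identity = np.eye(3, dtype=int)
print(int_identity.dtype)
```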


In [21]:
diag = np.diag([2,4,5,6,8,10,12])
print(diag)

[[ 2  0  0  0  0  0  0]
 [ 0  4  0  0  0  0  0]
 [ 0  0  5  0  0  0  0]
 [ 0  0  0  6  0  0  0]
 [ 0  0  0  0  8  0  0]
 [ 0  0  0  0  0 10  0]
 [ 0  0  0  0  0  0 12]]


In [22]:
# arange with stop only
ar_stop = np.arange(14)
print(ar_stop)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13]


In [23]:
# arange rank 1 array with start and stop
ar_ss = np.arange(5,10)
print(ar_ss)

[5 6 7 8 9]


In [24]:
# arange rank 1 array with start, stop and step
ar_sss = np.arange(2,100,10)
print(ar_sss)

[ 2 12 22 32 42 52 62 72 82 92]


In [25]:
# rank 1 array of evenly spaced floats
lsp = np.linspace(0, 20, 30)
print(lsp)

[  0.           0.68965517   1.37931034   2.06896552   2.75862069
   3.44827586   4.13793103   4.82758621   5.51724138   6.20689655
   6.89655172   7.5862069    8.27586207   8.96551724   9.65517241
  10.34482759  11.03448276  11.72413793  12.4137931   13.10344828
  13.79310345  14.48275862  15.17241379  15.86206897  16.55172414
  17.24137931  17.93103448  18.62068966  19.31034483  20.        ]


In [26]:
# rank 1 array of evenly spaced floats excluding endpoint
lspe = np.linspace(0, 20, 30, endpoint=False)
print(lspe)

[  0.           0.66666667   1.33333333   2.           2.66666667
   3.33333333   4.           4.66666667   5.33333333   6.           6.66666667
   7.33333333   8.           8.66666667   9.33333333  10.          10.66666667
  11.33333333  12.          12.66666667  13.33333333  14.          14.66666667
  15.33333333  16.          16.66666667  17.33333333  18.          18.66666667
  19.33333333]


In [27]:
# reshape
r1 = np.arange(10)
print(r1)

r2 = np.reshape(r1, (2,5))
print(r2)

[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
 [5 6 7 8 9]]


In [28]:
# chain arange and reshape in a single expression
r2b = np.arange(10).reshape(2,5)
print(r2b)

[[0 1 2 3 4]
 [5 6 7 8 9]]


In [29]:
# create a 4x4 rank 2 array 
np.random.random((4,4))

array([[ 0.88594678,  0.0419105 ,  0.20020519,  0.57954225],
       [ 0.30186062,  0.34069731,  0.01526394,  0.8937293 ],
       [ 0.40411583,  0.55458393,  0.96808681,  0.66675887],
       [ 0.40085971,  0.94131935,  0.97772401,  0.58866584]])

In [30]:
xx = np.random.randint(0,100,(5,5))
print(xx)

[[12 82 95 76 84]
 [55 38 66 87  8]
 [81 26 87 90 79]
 [88 37 27 16 97]
 [75 37 59 98 29]]


In [31]:
# random numbers drawn from probability distributions
# mean of 0 and standard deviation of .5
xa = np.random.normal(0,0.5, size=(10,4))

print(xa)
print(f'mean: {xa.mean()}')
print(f' std: {xa.std()}')

[[-0.24382788  0.17027355  0.38391686 -0.12594951]
 [-0.16702321  0.07295548 -0.05549999  0.52401363]
 [-0.40960325  0.05987577  0.18253092 -0.37751977]
 [ 0.74185349 -0.45420612 -0.22627199 -0.06697274]
 [ 0.3491049  -0.00443179 -0.03751445  0.59195286]
 [-0.17702606  0.49502859 -0.5362266  -0.10000391]
 [ 0.25976116 -1.26516021  0.83992969 -0.03165471]
 [-1.00654172  0.88203221  0.04580769  0.46579482]
 [ 0.55703551  0.80667716  0.06339081 -0.3753196 ]
 [-0.68192791  0.3825767   0.10822668  0.35510455]]
mean: 0.04987904062889002
 std: 0.4719668968868103


## Accessing, Deleting, and Inserting Elements Into ndarrays

In [32]:
# access value
print(xa[0][0])

# numpy style (rows, columns)
print(xa[0,0])

# delete column 0
xad = np.delete(xa, 0, axis=1)
print(xad)
print(f' size: {xad.size}')
print(f'shape: {xad.shape}')

# delete rows 0 and 2
xadr = np.delete(xad, [0,2], axis=0)
print("\n", xadr)
print(f' size: {xadr.size}')
print(f'shape: {xadr.shape}')

-0.243827876697
-0.243827876697
[[ 0.17027355  0.38391686 -0.12594951]
 [ 0.07295548 -0.05549999  0.52401363]
 [ 0.05987577  0.18253092 -0.37751977]
 [-0.45420612 -0.22627199 -0.06697274]
 [-0.00443179 -0.03751445  0.59195286]
 [ 0.49502859 -0.5362266  -0.10000391]
 [-1.26516021  0.83992969 -0.03165471]
 [ 0.88203221  0.04580769  0.46579482]
 [ 0.80667716  0.06339081 -0.3753196 ]
 [ 0.3825767   0.10822668  0.35510455]]
 size: 30
shape: (10, 3)

 [[ 0.07295548 -0.05549999  0.52401363]
 [-0.45420612 -0.22627199 -0.06697274]
 [-0.00443179 -0.03751445  0.59195286]
 [ 0.49502859 -0.5362266  -0.10000391]
 [-1.26516021  0.83992969 -0.03165471]
 [ 0.88203221  0.04580769  0.46579482]
 [ 0.80667716  0.06339081 -0.3753196 ]
 [ 0.3825767   0.10822668  0.35510455]]
 size: 24
shape: (8, 3)


In [33]:
# simple append
ar = np.arange(10)
print(ar)

ar2 = np.append(ar, [10,11])
print("\n",ar2)

# reshape
arnd = ar2.reshape(3,4)
print("\n",arnd)

# append a row
arnd2 = np.append(arnd, [[12,13,14,15]], axis=0)
print("\nappend a row:\n",arnd2)

# append a column
arnd3 = np.append(arnd2, [[0],[0],[0],[0]], axis=1)
print("\nappend a col:\n",arnd3)

[0 1 2 3 4 5 6 7 8 9]

 [ 0  1  2  3  4  5  6  7  8  9 10 11]

 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

append a row:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

append a col:
 [[ 0  1  2  3  0]
 [ 4  5  6  7  0]
 [ 8  9 10 11  0]
 [12 13 14 15  0]]


In [34]:
# insert into array
params = np.array([0,0,0,0])
print(params)

# insert at index 2
params2 = np.insert(params, 2, [1,1,1])
print(params2)

[0 0 0 0]
[0 0 1 1 1 0 0]


In [35]:
zeros = np.full((5,5), 0)
print(zeros)

# insert a row at index 1
zr = np.insert(zeros, 1, [1,2,3,4,5], axis=0)
print("\ninsert a row:\n",zr)

# insert a column of 5s at index 2
zr2 = np.insert(zr, 2, 5, axis=1)
print("\ninsert a column:\n",zr2)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]

insert a row:
 [[0 0 0 0 0]
 [1 2 3 4 5]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]

insert a column:
 [[0 0 5 0 0 0]
 [1 2 5 3 4 5]
 [0 0 5 0 0 0]
 [0 0 5 0 0 0]
 [0 0 5 0 0 0]
 [0 0 5 0 0 0]]


In [40]:
# stacking arrays

x = np.full((5,5), 6)
y = np.full((5,5), 7)

vs = np.vstack((x, y))

print("\nvstack:\n",vs)

vstack:
 [[6 6 6 6 6]
 [6 6 6 6 6]
 [6 6 6 6 6]
 [6 6 6 6 6]
 [6 6 6 6 6]
 [7 7 7 7 7]
 [7 7 7 7 7]
 [7 7 7 7 7]
 [7 7 7 7 7]
 [7 7 7 7 7]]


In [42]:
hs = np.hstack((x, y))

print("\nhstack:\n",hs)


hstack:
 [[6 6 6 6 6 7 7 7 7 7]
 [6 6 6 6 6 7 7 7 7 7]
 [6 6 6 6 6 7 7 7 7 7]
 [6 6 6 6 6 7 7 7 7 7]
 [6 6 6 6 6 7 7 7 7 7]]


## Slicing Arrays

In [43]:
import numpy as np

1. ndarray[start:end]
2. ndarray[start:]
3. ndarray[:end]

In forms 1 and 3 the `end` index is excluded, so `ndarray[start:end]` selects indices `start` through `end - 1`. Because ndarrays can be multidimensional, a slice is usually given for each dimension, separated by commas. Any trailing dimension without a slice is selected in full.
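The three slice forms can be sketched on a small rank 2 array. The name `grid` is illustrative, and the expected results in the comments follow from the excluded end index.

```python
import numpy as np

grid = np.arange(1, 21).reshape(4, 5)

# start:end on rows, :end on columns. The end index is excluded,
# so this selects rows 1 and 2 and columns 0 through 2.
top_left = grid[1:3, :3]
print(top_left)
# [[ 6  7  8]
#  [11 12 13]]

# Omitting the column slice selects every column, same as grid[1:3, :].
print(np.array_equal(grid[1:3], grid[1:3, :]))  # True

# start: runs to the end of the axis.
bottom_right = grid[2:, 3:]
print(bottom_right)
# [[14 15]
#  [19 20]]
```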

In [44]:
X = np.arange(1, 21).reshape(4,5)
print(X)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]


In [46]:
z = X[1:4, 2:5]
print(z)

[[ 8  9 10]
 [13 14 15]
 [18 19 20]]


In [51]:
zz = X[:, 1:2] # zz is a view of X
print(zz)

[[ 2]
 [ 7]
 [12]
 [17]]


In [55]:
zz[0][0] = 0
print(zz)

[[ 0]
 [ 7]
 [12]
 [17]]


In [56]:
print(X)  # X was also changed

[[ 1  0  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]


In [58]:
# copy an array
Y = X.copy()
Y[0][0] = 0
print(Y)

[[ 0  0  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]


In [59]:
print(X)

[[ 1  0  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]
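When it is unclear whether an expression produced a view or a copy, `np.shares_memory` can answer the question. The sketch below uses illustrative names (`base_array`, `column_view`, `independent`, `fancy`) and compares a basic slice, an explicit copy, and a selection made with an index array.

```python
import numpy as np

base_array = np.arange(1, 21).reshape(4, 5)

column_view = base_array[:, 1:2]         # basic slicing returns a view
independent = base_array.copy()          # an explicit copy owns its own data
fancy = base_array[:, np.array([1, 3])]  # indexing with an index array returns a copy

print(np.shares_memory(base_array, column_view))  # True
print(np.shares_memory(base_array, independent))  # False
print(np.shares_memory(base_array, fancy))        # False

# A view keeps a reference to the array that owns its data in .base;
# an array created by copy() owns its data, so .base is None.
print(column_view.base is not None)  # True
print(independent.base is None)      # True
```

This is why writing to a slice changes the original array, while writing to a copy or to the result of index-array selection does not.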


In [63]:
indices = np.array([1,3])
print(indices)

[1 3]


In [65]:
z = X[:, indices] # all rows for columns 1 and 3
print(z)

[[ 0  4]
 [ 7  9]
 [12 14]
 [17 19]]


In [66]:
YY = np.diag(X)
print(YY)

[ 1  7 13 19]


In [67]:
YY = np.diag(X, k=1)
print(YY)

[ 0  8 14 20]


In [68]:
YY = np.diag(X, k=-1)
print(YY)

[ 6 12 18]


In [70]:
X = np.array([[2,2,1],[3,3,1],[5,5,5]])
print(X)

[[2 2 1]
 [3 3 1]
 [5 5 5]]


In [75]:
print(np.unique(X))

[1 2 3 5]


## Boolean Indexing, Set Operations, and Sorting

In [76]:
import numpy as np

In [77]:
# boolean indexing
X = np.arange(25).reshape(5,5)
print(X)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


In [79]:
print(X > 10)

[[False False False False False]
 [False False False False False]
 [False  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]


In [83]:
print(X[(X > 10) & (X < 20)])

[11 12 13 14 15 16 17 18 19]


In [110]:
T = np.arange(1,26).reshape(5,5)
print(T)

print(T[T % 2 == 0])  # select the even values

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]
[ 2  4  6  8 10 12 14 16 18 20 22 24]
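The set operations and sorting named in this section's title can be sketched as follows. The arrays `a`, `b`, and `M` are made-up inputs. The functions used (`np.intersect1d`, `np.setdiff1d`, `np.union1d`, `np.sort`, and `ndarray.sort`) are standard NumPy calls.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 2, 8, 4])

print(np.intersect1d(a, b))  # values in both:        [2 4]
print(np.setdiff1d(a, b))    # values in a but not b: [1 3 5]
print(np.union1d(a, b))      # values in either:      [1 2 3 4 5 6 7 8]

# np.sort returns a sorted copy and leaves its input unchanged.
sorted_copy = np.sort(b)
print(sorted_copy)  # [2 4 6 7 8]
print(b)            # [6 7 2 8 4]

# ndarray.sort sorts in place and returns None.
b.sort()
print(b)  # [2 4 6 7 8]

# For rank 2 arrays, axis=0 sorts within each column and axis=1 within each row.
M = np.array([[3, 1], [2, 4]])
print(np.sort(M, axis=0))  # [[2 1], [3 4]]
print(np.sort(M, axis=1))  # [[1 3], [2 4]]
```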


## Arithmetic Operations and Broadcasting

In [116]:
import numpy as np

x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
print(x)
print(y)

[1 2 3 4]
[5 6 7 8]


In [117]:
# elementwise operation
print(x + y)
print(np.add(x,y))

[ 6  8 10 12]
[ 6  8 10 12]


In [167]:
print(x - y)
print(np.subtract(x,y))
print(x * y)
print(np.multiply(x,y))
print(x / y)
print(np.divide(x,y))


[-4 -4 -4 -4]
[-4 -4 -4 -4]
[ 5 12 21 32]
[ 5 12 21 32]
[ 0.2         0.33333333  0.42857143  0.5       ]
[ 0.2         0.33333333  0.42857143  0.5       ]


In [119]:
X = np.array([1,2,3,4]).reshape(2,2)
print(X)

[[1 2]
 [3 4]]


In [120]:
Y = np.array([5,6,7,8]).reshape(2,2)
print(Y)

[[5 6]
 [7 8]]


In [121]:
print(X / Y)

[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]


In [123]:
print(np.sqrt(x))
print(np.exp(x))
print(np.power(x,2))

[ 1.          1.41421356  1.73205081  2.        ]
[  2.71828183   7.3890561   20.08553692  54.59815003]
[ 1  4  9 16]


In [125]:
print(X)

[[1 2]
 [3 4]]


In [127]:
print("Average of all numbers:", X.mean())

Average of all numbers: 2.5


In [128]:
print("Average of each column:", X.mean(axis=0))
print("Average of each row:", X.mean(axis=1))

Average of each column: [ 2.  3.]
Average of each row: [ 1.5  3.5]


In [129]:
# add to every element (broadcasting)
print(2 + X)

[[3 4]
 [5 6]]


In [146]:
ff = np.arange(4).reshape(2,2)
gg = np.array([1,1])
print(ff)
print(gg)

print("Broadcasting:")
print(ff + gg)

[[0 1]
 [2 3]]
[1 1]
Broadcasting:
[[1 2]
 [3 4]]


In [152]:
ii = np.array([1,1,1,1,1,1,1,1,1]).reshape(3,3)
print(ii)

[[1 1 1]
 [1 1 1]
 [1 1 1]]


In [156]:
jj = np.array([1,2,3]).reshape(3,1)
print(jj)

[[1]
 [2]
 [3]]


In [157]:
print(ii + jj)

[[2 2 2]
 [3 3 3]
 [4 4 4]]


In [162]:
X = np.full((4,4), 0) + np.array([1,2,3,4])
print(X)

[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]


### Mean Normalization

Mean normalization subtracts each column's mean from its values and divides the result by the column's standard deviation. The normalized data is therefore not only scaled but also has an average of zero.

In [180]:
X = np.random.randint(5000, size=(10,10))
print(X)

[[4281 2373 2764 3742  381 2246 2547 4704 2485  662]
 [4858 2164 3738 1470 3643 4116 2863 3891 1485 3426]
 [4205  737 3509 3375 3226 3972 1215 4522 1885 1711]
 [4497 1948 2841 2307 4465 4804 3735  928 2845 3533]
 [1582 3670 4196 4086  838 2407 1100  573 4562 3149]
 [2489 3336 3144  441 3746 4258 3840  745 3197 2616]
 [4131 2957 3821  465 1321 2692 1065 4465 1154 4063]
 [3576 3800 2843 1097 1111  234 1430 2881  184 3344]
 [1773 3177   54 1536 3900 4371 4676 1773 4391 2519]
 [ 674 1577 3205  435 2401 1858 1097 4856 4444  181]]


In [181]:
print(X.mean(axis=0)) # avg of columns

[ 3206.6  2573.9  3011.5  1895.4  2503.2  3095.8  2356.8  2933.8  2663.2
  2520.4]


In [182]:
# Average of the values in each column of X
ave_cols = X.mean(axis=0)

# Standard Deviation of the values in each column of X
std_cols = X.std(axis=0)

print(ave_cols)
print(std_cols)

[ 3206.6  2573.9  3011.5  1895.4  2503.2  3095.8  2356.8  2933.8  2663.2
  2520.4]
[ 1384.12103517   936.17246808  1084.16615424  1334.50209442  1406.7242658
  1370.58504297  1294.93148854  1683.72639107  1432.81539634  1220.21524331]


In [183]:
# subtract the mean from each column, then divide by the standard deviation
X_norm = (X - ave_cols) / std_cols
print(X_norm)
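As a quick check that normalized data averages to zero, the sketch below applies the same steps to a small fixed array. The values in `data` are arbitrary and chosen only so the result is reproducible. Broadcasting applies the per-column statistics to every row.

```python
import numpy as np

# Small fixed input; the values are arbitrary.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

col_mean = data.mean(axis=0)  # one mean per column
col_std = data.std(axis=0)    # one standard deviation per column

# Shapes (3, 2) and (2,) broadcast, so each column uses its own statistics.
data_norm = (data - col_mean) / col_std

print(data_norm.mean(axis=0))  # approximately 0 for each column
print(data_norm.std(axis=0))   # approximately 1 for each column
```

Because of floating-point rounding, the column means may print as tiny values such as `1e-17` rather than exactly zero.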

In [190]:
X = np.array([[1,2,3,4],[2,3,4,5],[1,2,3,4],[2,3,4,5]])
print(X)

[[1 2 3 4]
 [2 3 4 5]
 [1 2 3 4]
 [2 3 4 5]]


In [191]:
# min value of each column
print(X.min(axis=0))

[1 2 3 4]


In [192]:
# max value of each row
print(X.max(axis=1))

[4 5 4 5]


In [194]:

print("Average of the column minimums:", X.min(axis=0).mean())
print("Average of the column maximums:", X.max(axis=0).mean())

Average of the column minimums: 2.5
Average of the column maximums: 3.5
