<div style="text-align: right">
by Igor Vustianiuk <br>
Odessa ML Club
</div>

<center> <h1>Numpy Cheat Sheet</h1> </center>

* [Built-in Constants](#Built-in-Constants)<br>
* [Array creation](#Array-creation)<br>
* [Basic attributes](#Basic-attributes)<br>
* [Broadcasting and universal functions](#Broadcasting-and-universal-functions)<br>
* [Mathematical functions](#Mathematical-functions)<br>
* [Logical functions](#Logical-functions)<br>
    * [Basic logical operators](#Basic-logical-operators)<br>
    * [array_equal and array_equiv](#array_equal-and-array_equiv)<br>
    * [isclose and allclose](#isclose-and-allclose)<br>
    * [all and any](#all-and-any)<br>
* [Other useful functions](#Other-useful-functions)<br>
    * [reshape](#reshape)<br>
    * [count_nonzero](#count_nonzero)<br>
    * [clip](#clip)<br>
    * [ravel and squeeze](#ravel-and-squeeze)<br>
    * [nditer](#nditer)<br>
* [Random sampling](#Random-sampling)<br>
    * [Setting the seed for random number generator](#Setting-the-seed-for-random-number-generator)<br>
    * [Shuffling the array](#Shuffling-the-array)<br>
    * [Sampling from an array](#Sampling-from-an-array)<br>
* [Indexing](#Indexing)<br>

In [1]:
import numpy as np

<center> <h3>Built-in Constants</h3> </center>

In [2]:
np.e     # Euler's number: 2.71828...
np.inf   # positive infinity as defined by IEEE 754
np.nan   # Not A Number as defined by IEEE 754
np.pi    # Good old pi: 3.1415...

3.141592653589793

<center> <h3>Array creation</h3> </center>

In [3]:
# Arrays from lists
a1 = np.array([0, 1, 2])
a2 = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])

# Empty arrays of given shape (filled with whatever lives in memory)
a3 = np.empty((3,))
a4 = np.empty((2, 4))
a5 = np.empty_like(a1)  # same shape and dtype as a1

# Zeros and ones
a6 = np.zeros((2, 4))
a7 = np.ones((2, 4))
a8 = np.zeros_like(a1)  # same shape and dtype as a1
a9 = np.ones_like(a1)   # same shape and dtype as a1

# Identity matrix (square matrix, ones on diagonal, zeros everywhere else)
a10 = np.identity(n=3)

# Random arrays
N = 10**4  # resulting shape: can be int or tuple of ints
a11 = np.random.random(size=N)  # same as uniform(low=0, high=1)
a12 = np.random.uniform(low=-1, high=1, size=N)
a13 = np.random.normal(loc=0, scale=3, size=N)
a14 = np.random.binomial(n=100, p=0.3, size=N)
a15 = np.random.randint(low=0, high=5, size=N) # high value is exclusive

# Explicit data type specification
a16 = np.zeros(10, dtype=np.int32)

# Python's `range` extended to floats
# right end might be included in some cases when step is float
a17 = np.arange(3)               # [0, 1, 2]
a18 = np.arange(1, 5, 2)         # [1, 3]
a19 = np.arange(0.5, 0.8, 0.1)   # [0.5, 0.6, 0.7, 0.8]
a20 = np.arange(0.5, 0.8, 0.17)  # [0.5, 0.67]

# Mesh grid
X = np.array([1, 2, 3])
Y = np.array([4, 5])
M = np.meshgrid(X, Y)
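
Two points from the cell above can be illustrated with a short sketch (not part of the original notebook): `np.linspace` as a way to sidestep the floating-point end-point ambiguity of `np.arange`, and what `np.meshgrid` actually returns. The names `pts`, `grid_x` and `grid_y` are illustrative only.

```python
import numpy as np

# linspace takes the number of points, so the end point is unambiguous
pts = np.linspace(0.5, 0.8, num=4)   # 4 evenly spaced points, both ends included
print(pts)

# meshgrid returns one coordinate array per input; with the default
# indexing='xy' each of them has shape (len(Y), len(X))
X = np.array([1, 2, 3])
Y = np.array([4, 5])
grid_x, grid_y = np.meshgrid(X, Y)
print(grid_x)   # X repeated in every row
print(grid_y)   # Y repeated in every column
```

Pairing `grid_x[i, j]` with `grid_y[i, j]` enumerates every combination of a value from `X` with a value from `Y`.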

<center> <h3>Basic attributes</h3> </center>

[top](#Numpy-Cheat-Sheet)

```
>>> a = np.array([0, 1, 2, 3, 4, 5], dtype=np.float16)
>>> b = np.array([
      [ 0, 1,   2, 3],
      [ 4, 5,   6, 7],
      [ 8, 9,  10, 11],
      [12, 13, 14, 15]
    ])

>>> print(a.dtype)     # float16
>>> print(a.ndim)      # 1
>>> print(a.shape)     # (6,)
>>> print(a.size)      # 6
>>> print(a.itemsize)  # 2
>>> print(a.nbytes)    # 12

>>> print(b.dtype)     # int32 here; platform-dependent (int64 on most 64-bit systems)
>>> print(b.ndim)      # 2
>>> print(b.shape)     # (4, 4)
>>> print(b.size)      # 16
>>> print(b.itemsize)  # 4 (8 for int64)
>>> print(b.nbytes)    # 64 (128 for int64)
```
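
The relation between these attributes can be checked directly. The following is a small sketch; the name `a64` is illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5], dtype=np.float16)

# float16 uses 2 bytes per element
print(a.itemsize, a.nbytes)

# nbytes is always size * itemsize
print(a.nbytes == a.size * a.itemsize)

# changing the dtype changes the memory footprint but not the shape
a64 = a.astype(np.float64)
print(a64.shape, a64.itemsize, a64.nbytes)
```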

<center> <h3>Broadcasting and universal functions</h3> </center>

A **universal function** (or **ufunc** for short) is a function that operates on ndarrays in an element-by-element fashion, supporting **array broadcasting**.

Each universal function takes array inputs and produces array outputs by performing the core function element-wise on the inputs. Standard broadcasting rules are applied so that inputs not sharing exactly the same shapes can still be usefully operated on.

Broadcasting effectively unifies the shape of all input arrays. The *final* shape is defined by two rules:
1. the final `ndim` is equal to the maximum `ndim` among the input arrays; every input array with a smaller `ndim` has 1’s prepended to its shape until it has that many dimensions;
2. the size in each dimension of the final shape is the maximum of all the input sizes in that dimension.

Suppose the input arrays have shapes (4,), (1, 4) and (3, 1). Then $\text{ndim} = \max(1, 2, 2) = 2$. According to the 1st rule, the first array's shape will be transformed to (1, 4). The *final shape* as defined by the 2nd rule is $\left( \max(1, 1, 3), \max(4, 4, 1) \right) = (3, 4)$.

The third rule defines whether input arrays can be transformed to arrays of the final shape:
3. an input array can be transformed if its size in each dimension either matches the final size for that dimension or equals 1.

Finally the fourth rule defines how the transformation should be done:
4. if an input array has a dimension size of 1 in its shape (after prepending ones if required by the 1st rule), the first data entry in that dimension will be used for all calculations along that dimension.
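
Rules 1 to 3 can be written out as a few lines of plain Python. This is only a sketch: `broadcast_shape` is a hypothetical helper, not a NumPy function, and it ignores zero-length dimensions. Its result is compared with the shape NumPy itself computes.

```python
import numpy as np

def broadcast_shape(*shapes):
    """Hypothetical helper: apply rules 1-3 to a collection of shapes."""
    ndim = max(len(s) for s in shapes)
    # Rule 1: prepend ones so that every shape has `ndim` entries
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    # Rule 2: the final size in each dimension is the maximum input size
    final = tuple(max(sizes) for sizes in zip(*padded))
    # Rule 3: every size must equal the final size or be 1
    for s in padded:
        for size, target in zip(s, final):
            if size != target and size != 1:
                raise ValueError(f"shape {s} cannot be broadcast to {final}")
    return final

shapes = [(4,), (1, 4), (3, 1)]
result = broadcast_shape(*shapes)
print(result)

# compare with what NumPy computes for arrays of these shapes
expected = np.broadcast(*[np.empty(s) for s in shapes]).shape
print(expected)
```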

In [4]:
A = np.array([1, 2, 3, 4])
print(A.ndim, A.shape)

B = np.array([[1, 2, 3, 4]])
print(B.ndim, B.shape)

C = np.array([[1], [2], [3]])
print(C.ndim, C.shape)

Z = np.broadcast(A, B, C)
print(Z.ndim, Z.shape)

1 (4,)
2 (1, 4)
2 (3, 1)
2 (3, 4)


In [5]:
U, V, W = np.broadcast_arrays(A, B, C)
print(A)
print(U)
print()
print(B)
print(V)
print()
print(C)
print(W)
print()

[1 2 3 4]
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]

[[1 2 3 4]]
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]

[[1]
 [2]
 [3]]
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]



In [6]:
np.broadcast_to(A, (3, 4))

array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])

Let's see an example of arrays that can't be broadcast. For arrays with shapes (3,) and (1, 4) the final shape would have to be (1, 4), while the intermediate shape of the first array is (1, 3): its last dimension is 3, which neither matches 4 nor equals 1, so the 3rd rule of broadcasting is violated.

In [7]:
# A = np.array([1, 2, 3])
# B = np.array([[1, 2, 3, 4]])
# np.broadcast(A, B)
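
Uncommenting the cell above raises an exception. A minimal sketch that catches it so the cell can still run (the flag `failed` is illustrative):

```python
import numpy as np

A = np.array([1, 2, 3])        # shape (3,)
B = np.array([[1, 2, 3, 4]])   # shape (1, 4)

try:
    np.broadcast(A, B)
    failed = False
except ValueError as err:
    # NumPy reports incompatible shapes with a ValueError
    failed = True
    print("broadcast failed:", err)
```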

Note that universal functions are usually much faster than equivalent code written with Python's built-in `for` loops.

In [8]:
N = 10**5
A = np.random.random(size=N)
B = np.random.random(size=N)

C1 = np.zeros(N)
for i in range(N):
    C1[i] = A[i] + B[i]
C2 = A + B
assert np.array_equal(C1, C2)  # Yeah, the results are the same. What about speed?

In [9]:
%%timeit
for i in range(N):
    C1[i] = A[i] + B[i]

49.8 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [10]:
%%timeit
C2 = A + B

109 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


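To avoid problems with broadcasting, reductions such as `sum`, `mean`, `min` and `max` accept a `keepdims` parameter. With `keepdims=True` the reduced axes are kept in the result as dimensions of size 1, so the output has the same `ndim` as the input and broadcasts correctly against it.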

In [11]:
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A.sum(axis=0))
print(A.sum(axis=0, keepdims=True))
print()
print(A.sum(axis=1))
print(A.sum(axis=1, keepdims=True))

[5 7 9]
[[5 7 9]]

[ 6 15]
[[ 6]
 [15]]
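
A typical use of `keepdims` is normalizing each row by its own sum. The sketch below uses illustrative names (`row_share`, `mismatch`) and also shows what goes wrong without it.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

# row sums with shape (2, 1) broadcast against A of shape (2, 3)
row_share = A / A.sum(axis=1, keepdims=True)
print(row_share.sum(axis=1))   # every row now sums to 1

# without keepdims the row sums have shape (2,), which does not
# broadcast against (2, 3)
try:
    A / A.sum(axis=1)
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)
```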


<center> <h3>Mathematical functions</h3> </center>

[top](#Numpy-Cheat-Sheet)
    
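Here's a list of the most common mathematical functions available in `numpy`. Most of them work element-wise, i.e. they are applied to each element of an array; such functions are called **universal functions** or **ufuncs**. The exceptions are reductions such as `min` and `max`, which collapse an array to a single value (or to one value per slice along an axis).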

* standard arithmetic:

| short operator | equivalent ufunc |
| --- | --- |
| + | `add` |
| - | `subtract` (binary), `negative` (unary) |
| * | `multiply` |
| / | `divide` |
| ** | `power` |

* absolute value: `abs`
* trigonometry: `sin`, `cos`, `tan`, `arcsin`, `arccos`, `arctan`
* hyperbolic: just add 'h': `sinh`, `cosh`, `tanh`, `arcsinh`, `arccosh`, `arctanh`
* angles: `arctan2`, `deg2rad`, `rad2deg` (1 rad = 180 deg / pi)
* min/max in array: `min(x)`, `max(x)`; same as `x.min()`, `x.max()`
* min/max between arrays: `minimum(x, y)`, `maximum(x, y)` where `x` and `y` are arrays: `minimum([1, 2], [2, 1]) --> [1, 1]`, `maximum([1, 2], [2, 1]) --> [2, 2]`
* standard logarithms: `log` (base-e), `log2`, `log10`
* exponent: `exp`
* roots: `sqrt`, `cbrt`
* tests for nan/inf: `isfinite`, `isinf`, `isnan`, `isneginf`, `isposinf`
* for available linear algebra functions see [np.linalg](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
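
A few of the functions from the list, in a short sketch with made-up inputs. It contrasts the reductions `min`/`max` with the element-wise `minimum`/`maximum` and shows `arctan2` together with `rad2deg`.

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([4, 2, 3])

# reductions: a single value per array
print(np.min(x), np.max(x))

# element-wise comparison between two arrays
lo = np.minimum(x, y)
hi = np.maximum(x, y)
print(lo, hi)

# arctan2(y, x) takes the quadrant into account, unlike arctan(y / x)
angle = np.rad2deg(np.arctan2(1.0, -1.0))
print(angle)
```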

In [12]:
# Dot product
x = np.array([1, 2, 3])
y = np.array([-3, 2, 1])
print(np.dot(x, y))

4


In [13]:
# Matrix-vector and matrix-matrix multiplication
A = np.array([[1, 2, 3], [4, 5, 6]])  
x_left = np.array([1, 1])                
x_right = np.array([1, 1, 1])         
B = np.identity(3, dtype=np.int32)

print(x_left @ A)
print(np.matmul(x_left, A))
print(np.dot(x_left, A))
print()
print(A @ x_right)
print(np.matmul(A, x_right))
print(np.dot(A, x_right))
print()
print(A @ B)
print(np.matmul(A, B))
print(np.dot(A, B))  # the two methods above are recommended instead of this

[5 7 9]
[5 7 9]
[5 7 9]

[ 6 15]
[ 6 15]
[ 6 15]

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]


<center> <h3>Logical functions</h3> </center>

[top](#Numpy-Cheat-Sheet)

<center> <h4>Basic logical operators</h4> </center>

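| short operator | equivalent ufunc |
| --- | --- |
| < | less |
| <= | less_equal |
| > | greater |
| >= | greater_equal |
| == | equal |
| != | not_equal |
| ~ | logical_not |
| & | logical_and |
| \| | logical_or |
| ^ | logical_xor |

For boolean arrays `~`, `&`, `|` and `^` give the same results as the logical ufuncs listed above; strictly speaking they are bitwise operators (`invert`, `bitwise_and`, `bitwise_or`, `bitwise_xor`), so on integer arrays they behave differently.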
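
One practical point when combining comparisons: the operators `&`, `|` and `^` bind tighter than `<`, `>` or `==`, so each comparison needs its own parentheses. A short sketch (variable names are illustrative):

```python
import numpy as np

a = np.array([1, 4, 1, 3, 3, 2])

# & binds tighter than comparisons, so each comparison needs parentheses
mask = (a > 1) & (a < 4)
print(mask)

# the same mask written with the named function
same = np.logical_and(a > 1, a < 4)
print(np.array_equal(mask, same))

# Python's `and` asks for the truth value of a whole array, which is ambiguous
try:
    (a > 1) and (a < 4)
    ambiguous = False
except ValueError:
    ambiguous = True
print(ambiguous)
```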

In [14]:
np.random.seed(33)

a = np.random.randint(1, 5, size=6)
b = np.random.randint(1, 5, size=6)
print('a:', a)                            
print('b:', b)
print()

print('a == b:        ', a == b)
print('a < b:         ', a < b)
print('a % 2 == 0:    ', a % 2 == 0)
print('a % 2 != b % 2:', a % 2 != b % 2)
print()

c1 = np.array([False, False, True, True])
c2 = np.array([False, True, False, True])
print('c1:', c1)
print('c2:', c2)
print('~c1:', ~c1)
print('c1 | c2:', c1 | c2)
print('c1 & c2:', c1 & c2)
print('c1 ^ c2:', c1 ^ c2)

a: [1 4 1 3 3 2]
b: [2 4 3 2 4 3]

a == b:         [False  True False False False False]
a < b:          [ True False  True False  True  True]
a % 2 == 0:     [False  True False False False  True]
a % 2 != b % 2: [ True False False  True  True  True]

c1: [False False  True  True]
c2: [False  True False  True]
~c1: [ True  True False False]
c1 | c2: [False  True  True  True]
c1 & c2: [False False False  True]
c1 ^ c2: [False  True  True False]


<center> <h4>array_equal and array_equiv</h4> </center>

In [15]:
x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
print(x is y)                 # two different objects!
print(x == y)                 # element-wise comparison (equal/unequal)
print(np.array_equal(x, y))   # are all elements equal?

False
[ True  True  True]
True


`array_equiv(a1, a2)` returns True if the input arrays are shape consistent and all their elements are equal.

**Shape consistent** means they either have the same shape, or one input array can be broadcast to the shape of the other one.

In [16]:
x = np.array([1, 2, 3])
y1 = x.reshape(1, -1)
y2 = x.reshape(-1, 1)

print(np.array_equiv(x, y1))
print(np.array_equiv(x, y2))

True
False


<center> <h4>isclose and allclose</h4> </center>

Rounding errors in floating-point arithmetic can break exact equality. `numpy` compares floating-point arrays with the `isclose` and `allclose` functions.

`isclose` works element-wise and the `i`-th element of its result is True if

$$ \left|x[i] - y[i]\right| \le \text{atol} + \text{rtol} \left|y[i]\right| $$

where the user can specify the values of `atol` and `rtol`; the default values are

$$ \text{rtol} = 10^{-5}, \text{atol} = 10^{-8} $$

The default `atol` is not appropriate when the compared numbers are much smaller than one: pass a smaller `atol` in that case.

<br>

`allclose` returns True if all values returned by `isclose` are True.
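
The tolerance test can be written out by hand and compared with `np.isclose`. This is a sketch with made-up numbers; `rtol=1e-5` and `atol=1e-8` are NumPy's defaults, and the last element shows why the default `atol` is a poor fit for very small values.

```python
import numpy as np

x = np.array([1.0, 100.0, 1e-10])
y = np.array([1.0 + 1e-6, 100.0 + 1e-2, 1e-12])

rtol, atol = 1e-5, 1e-8   # NumPy's defaults

# the documented test, written out by hand
manual = np.abs(x - y) <= atol + rtol * np.abs(y)
print(manual)
print(np.isclose(x, y))

# 1e-10 and 1e-12 differ by a factor of 100 but still pass because of atol;
# setting atol=0 leaves only the relative test
strict = np.isclose(x, y, atol=0)
print(strict)
```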

In [17]:
np.random.seed(33)

x = 10 * np.random.random(size=5)
y1 = 10 * np.random.random(size=5)
y2 = (1 + 1e-5) * x
y3 = y2.copy()
y3[2] += 1
print(x)
print(y1)
print(y2)
print()

print(np.isclose(x, y1))
print(np.allclose(x, y1))
print()
print(np.isclose(x, y2))
print(np.allclose(x, y2))
print()
print(np.isclose(x, y3))
print(np.allclose(x, y3))

[2.48510127 4.49975421 4.10940803 2.60299691 8.70395688]
[1.85039927 0.19661425 9.53252032 6.80450805 4.86588127]
[2.48512613 4.49979921 4.10944912 2.60302294 8.70404392]

[False False False False False]
False

[ True  True  True  True  True]
True

[ True  True False  True  True]
False


<center><h4>all and any</h4></center>

`all` tests whether all array elements along a given axis evaluate to True.

`any` tests whether at least one array element along a given axis evaluates to True.

If `axis=None` then both functions work with all array elements (not any particular axis).
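
The effect of `axis` is easiest to see on a 2-D array. A minimal sketch (the array `m` is made up for illustration):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 6, 8]])
even = m % 2 == 0

all_even = np.all(even)             # over all elements
any_per_col = np.any(even, axis=0)  # one value per column
all_per_row = np.all(even, axis=1)  # one value per row

print(all_even)
print(any_per_col)
print(all_per_row)
```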

In [18]:
a = np.array([1, 2, 2, 2, 3])
print(np.all(a % 2 == 0))
print(np.any(a % 2 == 0))

False
True


<center> <h3>Other useful functions</h3> </center>

[top](#Numpy-Cheat-Sheet)

<center><h4>reshape</h4></center>

In [19]:
a = np.array([0, 1, 2, 3, 4, 5])
b = np.array([
      [ 0, 1,   2, 3],
      [ 4, 5,   6, 7],
      [ 8, 9,  10, 11],
      [12, 13, 14, 15]
    ])

print(a.reshape((2, 3)), end='\n\n')  # same as np.reshape(a, (2, 3))
print(a.reshape(-1, 1), end='\n\n')   
print(a.reshape((1, -1)), end='\n\n')

[[0 1 2]
 [3 4 5]]

[[0]
 [1]
 [2]
 [3]
 [4]
 [5]]

[[0 1 2 3 4 5]]
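
Two details of `reshape` that the cell above relies on, as a short sketch: `-1` lets NumPy infer one dimension, and for a contiguous array the result is a view of the same data. The names `r` and `bad` are illustrative.

```python
import numpy as np

a = np.arange(6)

# -1 asks NumPy to infer that dimension from the total size
r = a.reshape(3, -1)
print(r.shape)

# for a contiguous array, reshape returns a view, so writes show up in `a`
r[0, 0] = 99
print(a)

# a shape that does not match the size raises ValueError
try:
    a.reshape(4, -1)
    bad = False
except ValueError:
    bad = True
print(bad)
```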



<center><h4>count_nonzero</h4></center>

In [20]:
x = np.random.binomial(n=1, p=0.3, size=10)
print(x)
print(np.count_nonzero(x))

[1 0 0 0 0 1 1 0 0 0]
3


<center><h4>clip</h4></center>

In [21]:
a = np.array([0, 3, 4, 5, 8])
print(a.clip(2, 6))

[2 3 4 5 6]
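
`clip` is equivalent to an element-wise `maximum` followed by `minimum`. A small sketch that also shows one-sided clipping:

```python
import numpy as np

a = np.array([0, 3, 4, 5, 8])

clipped = np.clip(a, 2, 6)
# same result built from element-wise maximum and minimum
manual = np.minimum(np.maximum(a, 2), 6)
print(clipped, manual)

# one-sided clipping: pass None for the bound that should not apply
lower_only = np.clip(a, 2, None)
print(lower_only)
```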


<center><h4>ravel and squeeze</h4></center>

`ravel(x)` flattens the array to 1-dimensional (returns a view when possible and a copy otherwise; `flatten` always returns a copy).<br>
`squeeze(x)` removes single-dimensional entries from the shape of an array (returns view).
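
Whether a result shares data with the original array can be checked with `np.shares_memory`. The sketch below compares `ravel`, `flatten` and `squeeze`; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

ravel_view = np.shares_memory(x, x.ravel())       # contiguous input: a view
ravel_copy = np.shares_memory(x, x.T.ravel())     # transposed input: a copy is needed
flatten_copy = np.shares_memory(x, x.flatten())   # flatten always copies
squeeze_view = np.shares_memory(x, x[np.newaxis].squeeze())  # squeeze gives a view

print(ravel_view, ravel_copy, flatten_copy, squeeze_view)
```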

In [22]:
x1 = np.array([[1, 2, 3], [4, 5, 6]])
x2 = x1.reshape(1, 1, 2, 3, 1)

print(x1)
print(x1.ravel())
print()

print(x2)
print(x2.ravel())
print(x2.squeeze())

[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]

[[[[[1]
    [2]
    [3]]

   [[4]
    [5]
    [6]]]]]
[1 2 3 4 5 6]
[[1 2 3]
 [4 5 6]]


<center><h4>nditer</h4></center>

In [23]:
a = np.array([[1, 2, 3], [4, 5, 6]])

for x in a:
    print(x)
print()

for x in np.nditer(a):  # for 1d arrays it's faster to use `for x in a`
    print(x)

[1 2 3]
[4 5 6]

1
2
3
4
5
6


<center> <h3>Random sampling</h3> </center>

[top](#Numpy-Cheat-Sheet)

<center> <h4>Setting the seed for random number generator</h4> </center>

In [24]:
# Always make your research reproducible! For random sampling this means setting the `seed`

# Without setting the seed
print(np.random.randint(1, 6, size=5))
print(np.random.randint(1, 6, size=5))
print()

# With setting the seed
np.random.seed(33)
print(np.random.randint(1, 6, size=5))
np.random.seed(33)
print(np.random.randint(1, 6, size=5))

[3 4 3 2 5]
[4 4 4 2 3]

[5 1 3 3 2]
[5 1 3 3 2]
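
NumPy 1.17 and later also provide the `Generator` interface, created with `np.random.default_rng`, which keeps the random state in an object instead of a global seed. A minimal sketch, assuming such a NumPy version is installed (the names `rng_a`, `rng_b`, `draw_a`, `draw_b` are illustrative):

```python
import numpy as np

# two generators created with the same seed produce the same stream
rng_a = np.random.default_rng(33)
rng_b = np.random.default_rng(33)

draw_a = rng_a.integers(1, 6, size=5)   # high is exclusive, like randint
draw_b = rng_b.integers(1, 6, size=5)
print(draw_a)
print(draw_b)
```

The values differ from those produced by `np.random.seed(33)` followed by `np.random.randint`, because the two interfaces use different algorithms.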


<center> <h4>Shuffling the array</h4> </center>

In [25]:
np.random.seed(42)
x = np.random.randint(1, 11, size=5)
print(x)
print()
y = np.random.randint(1, 11, size=(4, 3))
print(y)

[7 4 8 5 7]

[[10  3  7]
 [ 8  5  4]
 [ 8  8  3]
 [ 6  5  2]]


In [26]:
# `shuffle(x)` performs in-place random shuffling of `x` along the first axis

np.random.shuffle(x)
print(x)
print()

np.random.shuffle(y)  # Notice that rows are intact, only their order is changed
print(y)

[7 8 7 4 5]

[[ 8  5  4]
 [ 8  8  3]
 [10  3  7]
 [ 6  5  2]]


In [27]:
# `permutation(x)` performs random shuffling of `x` along the first axis and returns new array

x1 = np.random.permutation(x)
print(x1)
print(x)  # Notice that `x` is the same as before
print()

print(np.random.permutation(5))  # same as passing np.arange(5)

[7 7 5 8 4]
[7 8 7 4 5]

[1 2 4 0 3]
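
A common use of `permutation(n)` is shuffling several arrays in the same order, for example features and their labels. A sketch with made-up data (`features` and `labels` are illustrative names):

```python
import numpy as np

np.random.seed(0)
features = np.arange(10).reshape(5, 2)   # row i is [2*i, 2*i + 1]
labels = np.arange(5)                    # label i belongs to row i

# one random order of indices, applied to both arrays
idx = np.random.permutation(len(labels))
features_shuffled = features[idx]
labels_shuffled = labels[idx]

print(labels_shuffled)
print(features_shuffled)
```

Each row still sits next to its own label after shuffling, which two independent `shuffle` calls would not guarantee.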


<center> <h4>Sampling from an array</h4> </center>

In [28]:
# Generating poker combinations
import itertools

all_suits = ['♦', '♣', '♥', '♠']
all_ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
cards = itertools.product(all_suits, all_ranks)  # cartesian product of suits and ranks
cards = [x + y for (x, y) in cards]              # all possible cards

for i in range(5):
    print(np.random.choice(cards, replace=False, size=7))

['♦6' '♥K' '♦Q' '♦2' '♥J' '♥8' '♥4']
['♦5' '♠J' '♥7' '♠A' '♣2' '♥6' '♠2']
['♣8' '♠4' '♠10' '♦9' '♠Q' '♣4' '♣K']
['♥10' '♣2' '♣6' '♦J' '♦8' '♦A' '♦9']
['♥9' '♥K' '♣9' '♠Q' '♠A' '♦J' '♣3']


In [29]:
# Sampling with replacement
data = np.arange(1, 6)
for i in range(5):
    print(np.random.choice(data, size=10))  # replace=True is the default value

[1 3 1 5 2 2 2 3 5 1]
[4 1 4 1 5 4 3 1 1 4]
[3 3 5 3 3 3 2 5 1 4]
[1 5 4 5 3 4 3 1 1 4]
[4 5 5 3 4 1 5 5 1 5]


In [30]:
# Sampling with non-uniform probabilities

# Two coins are thrown independently. Head is 0, tail is 1.
# Probabilities of sum: P(0) = P(2) = 0.25, P(1) = 0.5

vals = np.array([0, 1, 2])
prob = np.array([0.25, 0.5, 0.25])
N = 10000
for i in range(5):
    s = np.random.choice(vals, size=N, p=prob)
    p0 = np.count_nonzero(s == 0) / N
    p1 = np.count_nonzero(s == 1) / N
    p2 = np.count_nonzero(s == 2) / N
    print('p0 = {:.4f},  p1 = {:.4f},  p2 = {:.4f}'.format(p0, p1, p2))

p0 = 0.2445,  p1 = 0.5085,  p2 = 0.2470
p0 = 0.2495,  p1 = 0.5079,  p2 = 0.2426
p0 = 0.2422,  p1 = 0.5036,  p2 = 0.2542
p0 = 0.2504,  p1 = 0.5026,  p2 = 0.2470
p0 = 0.2448,  p1 = 0.5021,  p2 = 0.2531
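
The probabilities used above can be cross-checked by simulating the two coins directly instead of sampling the sum. A sketch under the same assumptions (fair, independent coins); the names `coins` and `freq` are illustrative.

```python
import numpy as np

np.random.seed(0)
N = 100000

# each row is one experiment: two independent fair coins (0 = head, 1 = tail)
coins = np.random.randint(0, 2, size=(N, 2))
s = coins.sum(axis=1)

# empirical frequencies of the sums 0, 1 and 2
freq = np.bincount(s, minlength=3) / N
print(freq)
```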


<center> <h3>Indexing</h3> </center>

[top](#Numpy-Cheat-Sheet)

```
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> b = np.array([
      [ 0, 1,   2, 3],
      [ 4, 5,   6, 7],
      [ 8, 9,  10, 11],
      [12, 13, 14, 15]
    ])
```

* element-wise:
    ```
    >>> a[1]     # 1
    
    >>> b[1][1]  # 5
    >>> b[1, 1]  # 5
    >>> b[1]     # [4, 5, 6, 7]
    >>> b[4]     # IndexError: out of bounds
    ```
* slices:
    ```
    >>> a[::-1]      # [5, 4, 3, 2, 1, 0]
    >>> a[1::2]      # [1, 3, 5]
    
    >>> b[:, 0]      # [0, 4, 8, 12], 1d-array
    >>> b[0, :]      # [0, 1, 2, 3], 1d-array
    >>> b[1:3, :2]   # [[4, 5], [8, 9]], 2d-array
    >>> b[-2:, -2:]  # [[10, 11], [14, 15]], 2d-array
    ```
* logical masks:
    ```
    >>> a % 2 == 0        # [True, False, True, False, True, False]
    >>> a[a % 2 == 0]     # [0, 2, 4]
    
    >>> b[b**2 % 7 == 1]  # [1, 6, 8, 13, 15]
    ```
* fancy:
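    * accepts one index array (or list) for each dimension of the original array
    * broadcasts their shapes to a common shape S
    * the result of indexing has shape S
    * each element of the result is taken from the original array at the position given by the corresponding elements of the index arrays (see the examples below)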
    ```
    >>> ix = np.array([-1, 1, 1, -1])
    >>> a[ix]   # [5, 1, 1, 5]    
    
    >>> ix = np.array([[1, 2, 3], [-1, -2, -3]])
    >>> a[ix]   # [[1, 2, 3], [5, 4, 3]]    
    
    >>> ix_1 = [1, 1, 2]
    >>> ix_2 = [1, 1, 2]
    >>> b[ix_1, ix_2]   # [b[1,1], b[1,1], b[2,2]] = [5, 5, 10]
    
    >>> ix_1 = [[0, 1], [2, 3]]
    >>> ix_2 = [1, 2]   # broadcast to [[1, 2], [1, 2]]
    >>> b[ix_1, ix_2]   # [[b[0,1], b[1,2]], [b[2,1], b[3,2]]] = [[1, 6], [9, 14]]
    ```
* four methods described above can be mixed: each axis can have its own method
    * play with it
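
A few mixed combinations, as a sketch on the same 4x4 array `b` (the result names are illustrative):

```python
import numpy as np

b = np.arange(16).reshape(4, 4)

# slice on rows, list of indices on columns
slice_and_list = b[1:3, [0, 3]]
print(slice_and_list)

# boolean mask on rows, slice on columns
rows = b[:, 0] > 4
mask_and_slice = b[rows, 1:3]
print(mask_and_slice)

# single index on rows, mask on columns
index_and_mask = b[0, b[0] % 2 == 1]
print(index_and_mask)
```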

In [31]:
# Indexing can be used not only to extract values but also to modify an array:
x = np.array([[1, 2, 3], [4, 5, 6]])
x[x % 2 == 1] = 50
print(x)

[[50  2 50]
 [ 4 50  6]]


In [32]:
# Note that slicing returns a `view` of the original array,
# so modifying the `view` also modifies the original array
x = np.array([[1, 2, 3], [4, 5, 6]])
y = x[0, :2]
y[0] = 100
print(y)
print(x)

[100   2]
[[100   2   3]
 [  4   5   6]]


In [34]:
# To get an independent array from a slice, call `.copy()` on the slice
x = np.array([[1, 2, 3], [4, 5, 6]])
y = x[0, :2].copy()
y[0] = 100
print(y)
print(x)

[100   2]
[[1 2 3]
 [4 5 6]]
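
Unlike slicing, indexing with integer arrays or boolean masks returns a copy. A short sketch that checks this with `np.shares_memory` (the names `view`, `fancy` and `masked` are illustrative):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

view = x[0, :2]             # basic slicing: a view
fancy = x[[0, 1], [0, 1]]   # integer-array indexing: a copy
masked = x[x > 2]           # boolean-mask indexing: a copy

print(np.shares_memory(x, view))
print(np.shares_memory(x, fancy))
print(np.shares_memory(x, masked))

fancy[0] = 100   # does not touch x
print(x[0, 0])
```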
