In [None]:
![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "boxed ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [3]:
import sys
import numpy as np

## Basic NumPy Arrays

In [4]:
a = np.array([1,2,3,4])
b = np.array([0,0.5,1,1.5,2])

In [19]:
a[::2]

array([1, 3])

In [23]:
a[::3]

array([1, 4])

In [20]:
b[::2]

array([0., 1., 2.])

In [24]:
b[::3]

array([0. , 1.5])

In [27]:
a[1::]

array([2, 3, 4])

In [28]:
b[3::]

array([1.5, 2. ])

In [6]:
a[0], a[1]

(1, 2)

In [9]:
a[:]

array([1, 2, 3, 4])

In [11]:
a[0:4]

array([1, 2, 3, 4])

In [29]:
b[0], b[3], b[-3]

(0.0, 1.5, 1.0)

In [12]:
b[[0, 2, -1]]

array([0., 1., 2.])
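
The cells above all use the same `start:stop:step` pattern. As a small recap sketch (the variable names are only illustrative), here are the main forms side by side:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# start:stop:step, where stop is exclusive and any part may be omitted
middle = a[1:3]        # elements at positions 1 and 2
reversed_a = a[::-1]   # a negative step walks the array backwards
last_two = a[-2:]      # negative indices count from the end

# A list of positions ("fancy indexing") picks arbitrary elements
picked = a[[0, 2, -1]]

print(middle, reversed_a, last_two, picked)
```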

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [5]:
a

array([1, 2, 3, 4])

In [6]:
a.dtype

dtype('int64')

In [8]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [9]:
b.dtype

dtype('float64')

In [11]:
np.array([1, 2, 3, 4], dtype=np.float64)

array([1., 2., 3., 4.])

In [13]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [15]:
c = np.array(['a', 'b', 'c'])

In [16]:
c.dtype

dtype('<U1')

In [21]:
d = np.array([{'a': 1}, sys])

In [22]:
d.dtype

dtype('O')
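
NumPy picks a single dtype for the whole array. The sketch below (names are illustrative) shows what happens when values are mixed and how `astype` converts between types:

```python
import numpy as np

# Mixing ints and floats promotes the whole array to a float dtype
mixed = np.array([1, 2.5, 3])

# astype returns a converted copy; float -> int drops the fractional part
floats = np.array([1.7, 2.2, 3.9])
as_ints = floats.astype(np.int64)

# Smaller dtypes use less memory per element
small = np.array([1, 2, 3, 4], dtype=np.int8)

print(mixed.dtype, as_ints, small.itemsize)
```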

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [17]:
A = np.array([
    [1,2,3],
    [4,5,6]
])

In [18]:
A.shape

(2, 3)

In [19]:
A.ndim

2

In [20]:
A.size

6

In [21]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [22]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [29]:
B.shape

(2, 2, 3)

In [30]:
B.ndim

3

In [31]:
B.size

12

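If the shape isn't consistent, NumPy can't build a regular numeric array and falls back to an array of regular Python objects. Older versions did this silently (later with a deprecation warning); since NumPy 1.24 you have to ask for it with `dtype=object`, otherwise a `ValueError` is raised:

In [32]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)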

In [33]:
C.dtype

dtype('O')

In [34]:
C.shape

(2,)

In [35]:
C.size

2

In [None]:
type(C[0])
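
A smaller version of the same situation, as a sketch: passing `dtype=object` explicitly should behave the same across NumPy versions, and each element of the result is a plain Python list.

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5]]  # rows of different lengths

# Being explicit about dtype=object avoids version-dependent behavior
obj_arr = np.array(ragged, dtype=object)
print(obj_arr.shape, obj_arr.dtype)   # one dimension holding two Python lists
print(type(obj_arr[0]))

# The elements are lists, so per-row work falls back to ordinary Python
row_lengths = [len(row) for row in obj_arr]
print(row_lengths)
```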

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [32]:
# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [37]:
A[1]

array([4, 5, 6])

In [23]:
A[1][0]

4

In [None]:
# General form: one index per dimension, A[d1, d2, d3, d4]

In [36]:
A[2,0]

7

In [25]:
A[1, 1]

5

In [40]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [37]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [42]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [43]:
A[:2, 2:]

array([[3],
       [6]])

In [44]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [45]:
A[1] = np.array([10, 10, 10])

In [46]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [47]:
A[2] = 99

In [48]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])
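
One detail worth knowing when assigning through slices: a basic slice is a view onto the original data, not a copy. A minimal sketch (the names `block` and `safe` are illustrative):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# A[i][j] and A[i, j] return the same element; the second form does it in one step
assert A[1][0] == A[1, 0]

# A basic slice is a view onto the same memory...
block = A[:2, :2]
block[0, 0] = 100
print(A[0, 0])        # the original changed too

# ...while .copy() gives independent data
safe = A[:2, :2].copy()
safe[0, 0] = -1
print(A[0, 0])        # unaffected by the change to `safe`
```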

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [49]:
a = np.array([1, 2, 3, 4])

In [50]:
a.sum()

10

In [51]:
a.mean()

2.5

In [52]:
a.std()

1.118033988749895

In [53]:
a.var()

1.25

In [54]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [55]:
A.sum()

45

In [56]:
A.mean()

5.0

In [57]:
A.std()

2.581988897471611

In [58]:
A.sum(axis=0)

array([12, 15, 18])

In [59]:
A.sum(axis=1)

array([ 6, 15, 24])

In [60]:
A.mean(axis=0)

array([4., 5., 6.])

In [61]:
A.mean(axis=1)

array([2., 5., 8.])

In [62]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [63]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...
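
The `axis` argument names the dimension that gets collapsed. The following sketch spells that out and also shows two options not used above, `keepdims` and `ddof`:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_sums = A.sum(axis=0)   # collapse the rows: one result per column
row_sums = A.sum(axis=1)   # collapse the columns: one result per row

# keepdims=True keeps the reduced axis with length 1, handy for later arithmetic
row_means = A.mean(axis=1, keepdims=True)   # shape (3, 1)
centered = A - row_means                    # each row now has mean 0

# std() and var() divide by N by default; ddof=1 divides by N - 1 instead
a = np.array([1, 2, 3, 4])
population_std = a.std()
sample_std = a.std(ddof=1)
print(population_std, sample_std)
```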

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [66]:
a = np.arange(4)

In [67]:
a

array([0, 1, 2, 3])

In [68]:
a + 10

array([10, 11, 12, 13])

In [69]:
a * 10

array([ 0, 10, 20, 30])

In [70]:
a

array([0, 1, 2, 3])

In [71]:
a += 100

In [73]:
a

array([100, 101, 102, 103])

In [74]:
l = [0, 1, 2, 3]

In [75]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [76]:
a = np.arange(4)

In [77]:
a

array([0, 1, 2, 3])

In [78]:
b = np.array([10, 10, 10, 10])

In [79]:
b

array([10, 10, 10, 10])

In [80]:
a + b

array([10, 11, 12, 13])

In [81]:
a * b

array([ 0, 10, 20, 30])
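
Broadcasting also works between arrays of different shapes, not just with scalars. A short sketch of the rule (shapes are compared from the right, and a dimension of size 1 is stretched):

```python
import numpy as np

a = np.arange(4)

# A scalar is stretched to match the array, so these two are equivalent
assert np.array_equal(a + 10, a + np.array([10, 10, 10, 10]))

# A (3, 1) column combined with a (4,) row gives a (3, 4) grid
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)
grid = col + row
print(grid)

# Incompatible shapes raise an error instead of guessing
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("cannot broadcast:", err)
```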

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [38]:
a = np.arange(4)

In [39]:
a

array([0, 1, 2, 3])

In [44]:
a.dtype

dtype('int64')

In [46]:
a.sum()

6

In [48]:
a.shape

(4,)

In [42]:
a[0], a[-4]

(0, 0)

In [49]:
a.ndim

1

In [50]:
a.mean()

1.5

In [84]:
a[[0, -1]]

array([0, 3])

In [86]:
a[[True, False, False, True]]

array([0, 3])

In [89]:
a

array([0, 1, 2, 3])

In [88]:
a >= 2

array([False, False,  True,  True])

In [90]:
a[a >= 2]

array([2, 3])

In [91]:
a.mean()

1.5

In [92]:
a[a > a.mean()]

array([2, 3])

In [93]:
a[~(a > a.mean())]

array([0, 1])

In [94]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [95]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])
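
Masks are useful for more than selection. The sketch below shows counting, in-place assignment and `np.where`; note that conditions are combined with `&`, `|` and `~` (each wrapped in parentheses), not with `and`, `or` and `not`.

```python
import numpy as np

a = np.arange(4)
mask = a >= 2

n_matches = mask.sum()                        # True counts as 1
any_match, all_match = mask.any(), mask.all()

# Boolean indexing on the left-hand side modifies the selected elements in place
clipped = a.copy()
clipped[clipped > 1] = 1

# np.where picks between two values element by element
labels = np.where(a % 2 == 0, "even", "odd")

# Selecting with a mask returns a copy, so changing it leaves `a` untouched
selected = a[mask]
selected[:] = -1
print(n_matches, clipped, labels, a)
```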

In [51]:
A = np.random.randint(100, size=(3, 3))

In [52]:
A

array([[62, 11, 63],
       [84, 91, 40],
       [32, 18, 99]])

In [53]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([62, 63, 91, 32, 99])

In [99]:
A > 30

array([[ True, False,  True],
       [ True,  True,  True],
       [ True, False,  True]])

In [100]:
A[A > 30]

array([62, 63, 84, 91, 40, 32, 99])

In [55]:
B = np.random.randint(1000,size=(10,10))

In [56]:
B

array([[492, 291, 354, 878, 218, 429, 773, 285, 887, 271],
       [555, 593, 941, 559, 521, 414, 675, 435, 394, 146],
       [873, 770, 946, 316, 360, 138, 695, 412, 349, 840],
       [244, 989, 356, 266, 299, 701, 645, 947, 226, 613],
       [234, 723, 151, 739, 766, 607, 760, 815, 161, 810],
       [484, 445, 591, 565, 473, 550, 746, 910, 288, 473],
       [564, 832, 393, 802, 865, 387, 615,  52,  65, 891],
       [658, 346, 712,  48, 583,  67, 899, 939, 835, 490],
       [880, 927, 679, 450, 882, 870, 786,  25, 891, 472],
       [678, 735, 336, 476, 324, 518, 334, 506, 961, 539]])

In [57]:
B.mean()

556.99

In [58]:
B > B.mean()

array([[False, False, False,  True, False, False,  True, False,  True,
        False],
       [False,  True,  True,  True, False, False,  True, False, False,
        False],
       [ True,  True,  True, False, False, False,  True, False, False,
         True],
       [False,  True, False, False, False,  True,  True,  True, False,
         True],
       [False,  True, False,  True,  True,  True,  True,  True, False,
         True],
       [False, False,  True,  True, False, False,  True,  True, False,
        False],
       [ True,  True, False,  True,  True, False,  True, False, False,
         True],
       [ True, False,  True, False,  True, False,  True,  True,  True,
        False],
       [ True,  True,  True, False,  True,  True,  True, False,  True,
        False],
       [ True,  True, False, False, False, False, False, False,  True,
        False]])

In [60]:
C = B[B > B.mean()]

In [63]:
d = C / B.mean()

In [66]:
e = d.max() / d.mean() / (d.min() / d.max())

In [67]:
e

2.2524770613130856

In [69]:
d.var()

0.049333074935040785

In [70]:
d.std()

0.22211050163159954

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [71]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [72]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [103]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [104]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [105]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [106]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [107]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
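
It is easy to confuse `*` with `@`. This sketch, using two small illustrative matrices, contrasts element-wise and matrix multiplication and shows the shape rule for `@`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

elementwise = A * B    # multiplies matching entries
matrix_prod = A @ B    # rows of A times columns of B

# For M @ N the inner dimensions must agree: (n, k) @ (k, m) -> (n, m)
M = np.ones((3, 2))
N = np.ones((2, 5))
print((M @ N).shape)

# Transposing swaps the axes, which is often what makes two shapes compatible
print((N.T @ M.T).shape)
```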

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [73]:
# An integer in Python takes more than 24 bytes
sys.getsizeof(1)

28

In [109]:
# Large integers take even more space
sys.getsizeof(10**100)

72

In [110]:
# Numpy size is much smaller
np.dtype(int).itemsize

8

In [112]:
# A smaller dtype needs even less: one byte per int8
np.dtype(np.int8).itemsize

1

In [111]:
np.dtype(float).itemsize

8

### Lists are even larger

In [None]:
# A one-element list
sys.getsizeof([1])

In [None]:
# An array of one element in numpy
np.array([1]).nbytes
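
To compare whole collections rather than single elements, here is a rough estimate (exact byte counts depend on the Python build, so none are shown). Note that `sys.getsizeof` on a list does not include the objects it points to:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# getsizeof(list) only counts the list's own array of pointers,
# so the int objects it refers to are added separately.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the size of the raw data buffer: n * itemsize
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```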

### And performance is also important

In [117]:
l = list(range(100000))

In [118]:
a = np.arange(100000)

In [119]:
%time np.sum(a ** 2)

CPU times: user 1.06 ms, sys: 279 µs, total: 1.34 ms
Wall time: 701 µs


333328333350000

In [120]:
%time sum([x ** 2 for x in l])

CPU times: user 36.1 ms, sys: 0 ns, total: 36.1 ms
Wall time: 35.5 ms


333328333350000
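
A single `%time` run can be noisy. As a sketch outside of IPython, the standard `timeit` module repeats each statement; actual timings vary by machine, so none are shown here. The `dtype=np.int64` is an added precaution for platforms where the default integer is 32 bits wide.

```python
import timeit
import numpy as np

n = 100_000
l = list(range(n))
# int64 is requested explicitly so the squares cannot overflow a 32-bit default
a = np.arange(n, dtype=np.int64)

numpy_result = int(np.sum(a ** 2))
python_result = sum(x ** 2 for x in l)
assert numpy_result == python_result

# timeit runs each statement several times, which is steadier than one measurement
numpy_secs = timeit.timeit(lambda: np.sum(a ** 2), number=20)
python_secs = timeit.timeit(lambda: sum(x ** 2 for x in l), number=20)
print(f"NumPy: {numpy_secs:.4f}s  pure Python: {python_secs:.4f}s")
```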

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful NumPy functions

### `random` 

In [74]:
np.random.random(size=2)

array([0.64438262, 0.31996962])

In [75]:
np.random.normal(size=2)

array([-1.58907959, -0.50394716])

In [76]:
np.random.rand(2, 4)

array([[0.84626735, 0.99626253, 0.86320977, 0.20565857],
       [0.32819679, 0.64771704, 0.22499288, 0.6609838 ]])
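
The outputs above change on every run. When reproducible numbers are needed, one option is a seeded `Generator` from `np.random.default_rng` (available in NumPy 1.17 and later); the seed `42` below is an arbitrary choice.

```python
import numpy as np

# A Generator created with a fixed seed produces the same stream every time
rng = np.random.default_rng(42)
uniform = rng.random(size=2)               # floats in [0, 1)
normal = rng.normal(size=2)                # standard normal samples
ints = rng.integers(0, 100, size=(3, 3))   # ints in [0, 100)

# Re-creating the generator with the same seed repeats the sequence
again = np.random.default_rng(42).random(size=2)
assert np.array_equal(uniform, again)
```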

---
### `arange`

In [77]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [78]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [79]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

In [81]:
np.arange(23,1000,33)

array([ 23,  56,  89, 122, 155, 188, 221, 254, 287, 320, 353, 386, 419,
       452, 485, 518, 551, 584, 617, 650, 683, 716, 749, 782, 815, 848,
       881, 914, 947, 980])

In [84]:
np.arange(-34,-2, 0.73928)

array([-34.     , -33.26072, -32.52144, -31.78216, -31.04288, -30.3036 ,
       -29.56432, -28.82504, -28.08576, -27.34648, -26.6072 , -25.86792,
       -25.12864, -24.38936, -23.65008, -22.9108 , -22.17152, -21.43224,
       -20.69296, -19.95368, -19.2144 , -18.47512, -17.73584, -16.99656,
       -16.25728, -15.518  , -14.77872, -14.03944, -13.30016, -12.56088,
       -11.8216 , -11.08232, -10.34304,  -9.60376,  -8.86448,  -8.1252 ,
        -7.38592,  -6.64664,  -5.90736,  -5.16808,  -4.4288 ,  -3.68952,
        -2.95024,  -2.21096])
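
With a fractional step it is not obvious how many elements `arange` will produce. The sketch below applies the documented length rule, `ceil((stop - start) / step)`, to the last call above and shows `linspace` as the alternative when the number of points is what matters.

```python
import numpy as np

start, stop, step = -34, -2, 0.73928

# arange excludes stop; its length follows from ceil((stop - start) / step)
f = np.arange(start, stop, step)
expected_len = int(np.ceil((stop - start) / step))
print(f.size, expected_len)

# When the number of points matters more than the step, linspace is explicit
g = np.linspace(start, stop, 44, endpoint=False)
print(g[1] - g[0])   # the step linspace had to use
```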

---
### `reshape`

In [85]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [86]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [87]:
f = np.arange(-34,-2, 0.73928)

In [88]:
f.size

44

In [89]:
f.shape

(44,)

In [95]:
g = f.reshape(4,11)

In [99]:
g.reshape(11,4)

array([[-34.     , -33.26072, -32.52144, -31.78216],
       [-31.04288, -30.3036 , -29.56432, -28.82504],
       [-28.08576, -27.34648, -26.6072 , -25.86792],
       [-25.12864, -24.38936, -23.65008, -22.9108 ],
       [-22.17152, -21.43224, -20.69296, -19.95368],
       [-19.2144 , -18.47512, -17.73584, -16.99656],
       [-16.25728, -15.518  , -14.77872, -14.03944],
       [-13.30016, -12.56088, -11.8216 , -11.08232],
       [-10.34304,  -9.60376,  -8.86448,  -8.1252 ],
       [ -7.38592,  -6.64664,  -5.90736,  -5.16808],
       [ -4.4288 ,  -3.68952,  -2.95024,  -2.21096]])
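
Two more `reshape` behaviors are worth a quick sketch: `-1` lets NumPy infer one dimension, and the result normally shares memory with the original array.

```python
import numpy as np

x = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
m = x.reshape(3, -1)      # shape (3, 4)

# reshape normally returns a view: both names refer to the same data
m[0, 0] = 99
print(x[0])

# The new shape must account for every element
try:
    x.reshape(5, 3)
except ValueError as err:
    print("bad shape:", err)

flat = m.ravel()          # back to one dimension
```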

---
### `linspace`

In [100]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [101]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [102]:
np.linspace(0, 1, 20, endpoint=False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])

In [103]:
np.linspace(2,10,50)

array([ 2.        ,  2.16326531,  2.32653061,  2.48979592,  2.65306122,
        2.81632653,  2.97959184,  3.14285714,  3.30612245,  3.46938776,
        3.63265306,  3.79591837,  3.95918367,  4.12244898,  4.28571429,
        4.44897959,  4.6122449 ,  4.7755102 ,  4.93877551,  5.10204082,
        5.26530612,  5.42857143,  5.59183673,  5.75510204,  5.91836735,
        6.08163265,  6.24489796,  6.40816327,  6.57142857,  6.73469388,
        6.89795918,  7.06122449,  7.2244898 ,  7.3877551 ,  7.55102041,
        7.71428571,  7.87755102,  8.04081633,  8.20408163,  8.36734694,
        8.53061224,  8.69387755,  8.85714286,  9.02040816,  9.18367347,
        9.34693878,  9.51020408,  9.67346939,  9.83673469, 10.        ])

In [104]:
np.linspace(2, 10, 50, endpoint=False)

array([2.  , 2.16, 2.32, 2.48, 2.64, 2.8 , 2.96, 3.12, 3.28, 3.44, 3.6 ,
       3.76, 3.92, 4.08, 4.24, 4.4 , 4.56, 4.72, 4.88, 5.04, 5.2 , 5.36,
       5.52, 5.68, 5.84, 6.  , 6.16, 6.32, 6.48, 6.64, 6.8 , 6.96, 7.12,
       7.28, 7.44, 7.6 , 7.76, 7.92, 8.08, 8.24, 8.4 , 8.56, 8.72, 8.88,
       9.04, 9.2 , 9.36, 9.52, 9.68, 9.84])

In [105]:
h = np.linspace(666, 1033, 32, endpoint=False)

In [106]:
h.size

32

In [108]:
h.dtype

dtype('float64')

In [110]:
h = h.reshape(8,4)

In [115]:
i = h[h<h.mean()]

In [120]:
i[i>i.mean()] = i.mean()

In [125]:
i.mean()

729.078125
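
The fourth positional argument of `linspace` is `endpoint`. As a recap sketch, this is how it changes the spacing, together with `retstep`, which also returns the step that was used:

```python
import numpy as np

# By default both ends are included: 5 points give 4 equal gaps
closed = np.linspace(0, 1, 5)

# endpoint=False leaves out the stop value, so 5 points give 5 equal gaps
half_open = np.linspace(0, 1, 5, endpoint=False)

# retstep=True also returns the spacing that was used
points, step = np.linspace(0, 1, 5, retstep=True)
print(closed, half_open, step)
```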

---
### `zeros`, `ones`, `empty`

In [126]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [127]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [128]:
np.zeros((3, 3), dtype=int)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [129]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [130]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [131]:
# np.empty does not initialize memory, so the values shown are whatever was there before
np.empty(5)

array([1., 1., 1., 1., 1.])

In [132]:
np.empty((2, 2))

array([[7.74860419e-304, 7.74860419e-304],
       [0.00000000e+000, 0.00000000e+000]])
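
A few related constructors follow the same pattern. This sketch shows `np.full` and the `*_like` variants, plus the safe way to use `np.empty`:

```python
import numpy as np

# np.empty only allocates memory; always assign before reading
buf = np.empty(5)
buf[:] = 0.0

# np.full fills with any constant, and dtype works as in the other constructors
sevens = np.full((2, 3), 7)
int_zeros = np.zeros((3, 3), dtype=int)

# The *_like variants copy shape and dtype from an existing array
template = np.array([[1, 2], [3, 4]])
same_shape = np.ones_like(template)
print(sevens, same_shape)
```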

---
### `identity` and `eye`

In [133]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [135]:
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [136]:
np.identity(2)

array([[1., 0.],
       [0., 1.]])

In [134]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [137]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [139]:
np.eye(8,3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [138]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [140]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
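
To summarize the difference: `identity` is always square with ones on the main diagonal, while `eye` also accepts a column count and a diagonal offset `k`. A short sketch:

```python
import numpy as np

# identity(n) is always square; eye(n) without extra arguments matches it
assert np.array_equal(np.identity(3), np.eye(3))

# k shifts the diagonal: positive values move it above the main diagonal,
# negative values move it below
upper = np.eye(3, k=1)
lower = np.eye(3, k=-1)

# Multiplying by the identity leaves a matrix unchanged
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
assert np.array_equal(A @ np.eye(3, dtype=int), A)
```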

In [142]:
"Hello World"[3]

'l'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)