![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Numpy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "boxed ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [13]:
import sys
import numpy as np

## Basic Numpy Arrays

In [2]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [4]:
a = np.array([1, 2, 3, 4])

In [3]:
b = np.array([0, .5, 1, 1.5, 2])

In [None]:
a[0], a[1]

(1, 2)

In [None]:
a[0:]

array([1, 2, 3, 4])

In [None]:
a[1:3]

array([2, 3])

In [None]:
a[1:-1]

array([2, 3])

In [None]:
a[::2]

array([1, 3])

In [None]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [6]:
b[0], b[2], b[-1]  # Each index returns a single element (a NumPy scalar)

(np.float64(0.0), np.float64(1.0), np.float64(2.0))

In [32]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [31]:
b[[0, 2, -1]]  # Indexing with a list creates a new NumPy array holding the elements of b at positions 0, 2 and -1

array([0., 1., 2.])
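The difference between slicing and indexing with a list goes beyond syntax. The sketch below, which reuses `b` from above and introduces the illustrative names `view` and `picked`, suggests that a basic slice shares memory with the original array while list indexing produces an independent copy.

```python
import numpy as np

b = np.array([0, .5, 1, 1.5, 2])

# Basic slicing returns a view: it shares memory with b.
view = b[1:3]
view[0] = 99.0          # also changes b[1]

# Indexing with a list of positions ("fancy indexing") returns a copy.
picked = b[[0, 2, -1]]
picked[0] = -1.0        # b is left untouched

print(b)                             # b[1] was changed through the view
print(np.shares_memory(b, view))     # the slice shares b's buffer
print(np.shares_memory(b, picked))   # the fancy-indexed result does not
```

If an independent slice is needed, calling `.copy()` on it avoids modifying the original by accident.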

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [8]:
a

array([1, 2, 3, 4])

In [9]:
a.dtype

dtype('int64')

In [10]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [11]:
b.dtype

dtype('float64')

In [15]:
np.array([1, 2, 3, 4], dtype=np.float64)  # The np.float alias used in the video was removed in NumPy 1.24; use np.float64 (or the builtin float)

array([1., 2., 3., 4.])

In [None]:
np.array([1, 2, 3, 4], dtype=np.int8)  # int8 uses 1 byte per element, which saves memory when the values fit

array([1, 2, 3, 4], dtype=int8)

In [17]:
c = np.array(['a', 'b', 'c'])

In [18]:
c.dtype

dtype('<U1')

In [21]:
d = np.array([{'a': 1}, sys])  # No practical use, just a demonstration. NumPy is usually used for numbers, dates and booleans.

In [20]:
d.dtype

dtype('O')
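Choosing a smaller dtype has consequences worth seeing once. The following is a minimal sketch (the variable names are illustrative) of converting between dtypes with `astype` and of what happens when a value no longer fits in `int8`.

```python
import numpy as np

# astype returns a new array converted to the requested dtype.
floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)      # the fractional part is dropped

# int8 holds values from -128 to 127; array arithmetic wraps around silently.
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)

print(ints)
print(wrapped)
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)
```

`np.iinfo` reports the valid range of an integer dtype, which is a quick way to check whether the data fits before downsizing.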

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [22]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [23]:
A.shape

(2, 3)

In [None]:
A.ndim

2

In [None]:
A.size

6

In [25]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [26]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [27]:
B.shape

(2, 2, 3)

In [28]:
B.ndim

3

In [29]:
B.size

12

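If the shape isn't consistent, NumPy can't build a regular multidimensional array. Older versions fell back to an array of regular Python objects. Since NumPy 1.24, the same call raises `ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.` unless you ask for that fallback explicitly with `dtype=object`:

In [30]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)  # Ragged nesting: stored as a 1-D array of 2 Python lists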

In [None]:
C.dtype

dtype('O')

In [None]:
C.shape

(2,)

In [None]:
C.size

2

In [None]:
type(C[0])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [33]:
# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [34]:
A[1]  # Select the row at index 1

array([4, 5, 6])

In [35]:
A[1][0]

np.int64(4)

In [None]:
# A[d1, d2, d3, d4]

In [36]:
A[1, 0]

np.int64(4)

In [37]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [38]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [39]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [None]:
A[:2, 2:]

array([[3],
       [6]])

In [40]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [41]:
A[1] = np.array([10, 10, 10])

In [42]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [43]:
A[2] = 99

In [44]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [45]:
a = np.array([1, 2, 3, 4])

In [46]:
a.sum()

np.int64(10)

In [47]:
a.mean()

np.float64(2.5)

In [48]:
a.std()

np.float64(1.118033988749895)

In [49]:
a.var()

np.float64(1.25)

In [50]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [51]:
A.sum()

np.int64(45)

In [52]:
A.mean()

np.float64(5.0)

In [53]:
A.std()

np.float64(2.581988897471611)

In [54]:
A.sum(axis=0)

array([12, 15, 18])

In [55]:
A.sum(axis=1)

array([ 6, 15, 24])

In [56]:
A.mean(axis=0)

array([4., 5., 6.])

In [57]:
A.mean(axis=1)

array([2., 5., 8.])

In [58]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [59]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])
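The `axis` argument is easy to mix up, so here is a small sketch of how it can be read: the axis you pass is the one that gets collapsed. It also shows the `ddof` parameter of `std`, which the cells above leave at its default of 0 (population standard deviation). The variable names are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# axis=0 collapses the rows: one result per column.
col_sums = A.sum(axis=0)
# axis=1 collapses the columns: one result per row.
row_sums = A.sum(axis=1)

a = np.array([1, 2, 3, 4])
population_std = a.std()        # ddof=0 (default): divide by N
sample_std = a.std(ddof=1)      # ddof=1: divide by N - 1

print(col_sums, row_sums)
print(population_std, sample_std)
```

Pandas defaults to `ddof=1` for `std`, so the two libraries can give different numbers for the same data unless `ddof` is set explicitly.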

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [61]:
a = np.arange(4)

In [62]:
a

array([0, 1, 2, 3])

In [63]:
a + 10

array([10, 11, 12, 13])

In [64]:
a * 10

array([ 0, 10, 20, 30])

In [None]:
a

In [65]:
a += 100

In [None]:
a

In [66]:
l = [0, 1, 2, 3]

In [67]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [68]:
a = np.arange(4)

In [69]:
a

array([0, 1, 2, 3])

In [None]:
b = np.array([10, 10, 10, 10])

In [None]:
a + b

In [None]:
a * b
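The examples above combine an array with a scalar or with an array of the same shape. Broadcasting also covers arrays of different but compatible shapes. The sketch below, using the illustrative names `col`, `row` and `grid`, shows one such case and one incompatible case.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)

# Shapes (3, 1) and (3,) broadcast to (3, 3):
# col is stretched across the columns, row is stretched across the rows.
grid = col + row
print(grid.shape)
print(grid)

# Incompatible shapes raise a ValueError instead of guessing.
try:
    np.arange(4) + np.arange(3)
except ValueError as exc:
    print("cannot broadcast:", exc)
```

Roughly, two dimensions are compatible when they are equal or when one of them is 1, comparing shapes from the last dimension backwards.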

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [70]:
a = np.arange(4)

In [71]:
a

array([0, 1, 2, 3])

In [72]:
a[[0, -1]]

array([0, 3])

In [73]:
a[[True, False, False, True]]

array([0, 3])

In [74]:
a >= 2

array([False, False,  True,  True])

In [75]:
a[a >= 2]

array([2, 3])

In [76]:
a.mean()

np.float64(1.5)

In [77]:
a[a > a.mean()]

array([2, 3])

In [78]:
a[~(a > a.mean())]

array([0, 1])

In [79]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [80]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [81]:
A = np.random.randint(100, size=(3, 3))

In [82]:
A

array([[41, 61, 10],
       [65, 75, 68],
       [27, 16, 56]])

In [86]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]  # Writing a mask out by hand is tedious; in practice you write a condition and use the resulting boolean array as the mask.

array([41, 10, 75, 27, 56])

In [87]:
A > 30

array([[ True,  True, False],
       [ True,  True,  True],
       [False, False,  True]])

In [88]:
A[A > 30]

array([41, 61, 65, 75, 68, 56])
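Selecting with a mask flattens the result to one dimension. As a sketch of a few other common uses, the block below fixes `A` to the values printed above (the original was random) and counts matches, keeps the 2-D shape with `np.where`, and assigns through a mask. The names `mask`, `kept` and `capped` are illustrative.

```python
import numpy as np

A = np.array([
    [41, 61, 10],
    [65, 75, 68],
    [27, 16, 56]
])

mask = A > 30

# True counts as 1, so sum() counts the matching elements.
n_matching = mask.sum()

# np.where keeps the 2-D shape: take A where the mask is True, 0 elsewhere.
kept = np.where(mask, A, 0)

# A mask can also be the target of an assignment (done on a copy here).
capped = A.copy()
capped[capped > 60] = 60

print(n_matching)
print(kept)
print(capped)
```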

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [89]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [90]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [91]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [92]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [93]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [95]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [96]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
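A few points are implicit in the cells above: `@` requires the inner dimensions to match, `*` is element-wise rather than a matrix product, and the transpose of a product reverses the order of the factors. The following sketch checks each of these with the same `A` and `B`; the other names are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

# (3, 3) @ (3, 2) -> (3, 2): the inner dimensions (3 and 3) match.
product = A @ B
print(product.shape)

# The transpose of a product reverses the factors: (A @ B).T == B.T @ A.T
transpose_ok = np.array_equal(product.T, B.T @ A.T)

# `*` multiplies element by element; `@` is the matrix product.
elementwise = A * A
matrix_square = A @ A

# (3, 2) @ (3, 3): the inner dimensions (2 and 3) differ, so this fails.
try:
    B @ A
except ValueError as exc:
    print("shape mismatch:", exc)
```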

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [98]:
# A small integer in Python takes more than 24 bytes
sys.getsizeof(1)  # 28 bytes are needed to store the integer 1

28

In [99]:
# Large integers take even more space
sys.getsizeof(10**100)

72

In [103]:
np.dtype(np.int8).itemsize

1

In [104]:
# Numpy size is much smaller
np.dtype(int).itemsize

8

In [101]:
np.dtype(float).itemsize

8

### Lists are even larger

In [105]:
# A one-element list
sys.getsizeof([1])

64

In [106]:
# An array of one element in numpy
np.array([1]).nbytes

8
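The single-element comparison understates the gap, because `sys.getsizeof` on a list counts only the list object and its pointers, not the integers it refers to. The sketch below is a rough estimate for 1000 elements under CPython; the exact list figure depends on the interpreter version, and the names are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# getsizeof(list) covers the list object and its pointer slots only,
# so the int objects it points to are added separately.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the size of the raw data buffer: n elements * itemsize.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```

Note that `nbytes` excludes the small fixed overhead of the array object itself, which `sys.getsizeof(np_array)` would include.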

### And performance is also important

In [113]:
l = list(range(1000))  # A plain Python list

In [112]:
a = np.arange(1000)  # A NumPy array

In [111]:
%time np.sum(a ** 2)  # NumPy runs this in less time than the list version below

CPU times: user 111 µs, sys: 0 ns, total: 111 µs
Wall time: 103 µs


np.int64(332833500)

In [109]:
%time sum([x ** 2 for x in l])

CPU times: user 157 µs, sys: 18 µs, total: 175 µs
Wall time: 247 µs


332833500
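A single `%time` run on a 1000-element input is noisy. As a sketch of a steadier measurement outside IPython, the standard-library `timeit` module can repeat each statement many times; the repeat counts below are arbitrary choices, and the timings will vary by machine.

```python
import timeit
import numpy as np

l = list(range(1000))
a = np.arange(1000)

# Both approaches compute the same sum of squares.
numpy_result = np.sum(a ** 2)
list_result = sum([x ** 2 for x in l])

# Run each statement 200 times per measurement and keep the best of 3.
runs = 200
numpy_time = min(timeit.repeat(lambda: np.sum(a ** 2), number=runs, repeat=3))
list_time = min(timeit.repeat(lambda: sum([x ** 2 for x in l]), number=runs, repeat=3))

print(f"numpy: {numpy_time / runs * 1e6:.2f} µs per call")
print(f"list:  {list_time / runs * 1e6:.2f} µs per call")
```

In a notebook, the `%timeit` magic does the same kind of repetition automatically.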

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful Numpy functions

### `random`

In [114]:
np.random.random(size=2)

array([0.93727873, 0.04297916])

In [115]:
np.random.normal(size=2)

array([-0.61360731, -0.83895094])

In [116]:
np.random.rand(2, 4)

array([[0.91818678, 0.01985932, 0.68388668, 0.38797139],
       [0.29939467, 0.83413884, 0.97645327, 0.99073055]])

---
### `arange`

In [117]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [118]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [119]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

---
### `reshape`

In [120]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [121]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
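Two details of `reshape` are worth a quick sketch: one dimension can be given as `-1` and inferred from the others, and the total number of elements must stay the same. The example also shows that, for a contiguous array like this one, the reshaped result is a view of the original data. Variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

# -1 asks NumPy to infer that dimension: 10 elements / 2 rows = 5 columns.
m = a.reshape(2, -1)
print(m.shape)

# Here reshape returns a view, so writing to m also changes a.
m[0, 0] = 100
print(a[0])

# The element count must match: 10 elements cannot fill a 3x3 array.
try:
    a.reshape(3, 3)
except ValueError as exc:
    print("cannot reshape:", exc)
```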

---
### `linspace`

In [122]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [123]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [124]:
np.linspace(0, 1, 20, endpoint=False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
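The fourth argument of `linspace` is `endpoint`, which controls whether the stop value is included. The sketch below names it explicitly and also uses `retstep=True` to get the spacing back alongside the points; the variable names are illustrative.

```python
import numpy as np

# retstep=True returns the array together with the spacing between points.
points, step = np.linspace(0, 1, 5, retstep=True)
print(points, step)

# endpoint=False excludes the stop value, so 20 points cover [0, 1).
half_open = np.linspace(0, 1, 20, endpoint=False)
print(len(half_open), half_open[-1])
```

When the number of points matters more than the step, `linspace` tends to be a safer choice than `arange` with a floating-point step.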

---
### `zeros`, `ones`, `empty`

In [None]:
np.zeros(5)

In [None]:
np.zeros((3, 3))

In [None]:
np.zeros((3, 3), dtype=int)  # The np.int alias was removed in NumPy 1.24; use the builtin int

In [None]:
np.ones(5)

In [None]:
np.ones((3, 3))

In [None]:
np.empty(5)

In [None]:
np.empty((2, 2))
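The cells above have no recorded output, so here is a minimal sketch of how these constructors differ. The key point is that `np.empty` only allocates memory and its initial contents are arbitrary. `np.full` and `np.zeros_like` are related helpers not used above, and all variable names are illustrative.

```python
import numpy as np

z = np.zeros((3, 3), dtype=int)     # integer zeros
o = np.ones(5)                      # float64 by default
f = np.full((2, 2), 7)              # a constant other than 0 or 1

# np.empty allocates without initializing: the values are whatever was in
# memory, so the array should be filled before it is read.
e = np.empty((2, 2))
e[:] = 0.0

# *_like helpers copy the shape and dtype of an existing array.
like = np.zeros_like(f)

print(z.dtype, o.dtype, f.dtype)
print(like)
```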

---
### `identity` and `eye`

In [125]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [126]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [127]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [128]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [129]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
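As a short sketch tying these to the linear algebra section: for square matrices `identity` and `eye` give the same result, multiplying by the identity leaves a matrix unchanged, and `k` shifts the diagonal up (positive) or down (negative). The names below are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
I = np.identity(3)

# For square matrices, identity(n) and eye(n) are the same.
same = np.array_equal(np.identity(3), np.eye(3))

# Multiplying by the identity matrix leaves A unchanged.
unchanged = np.array_equal(A @ I, A)

# k moves the diagonal: k=1 is one above the main diagonal, k=-1 one below.
upper = np.eye(3, k=1)
lower = np.eye(3, k=-1)

print(same, unchanged)
print(upper)
print(lower)
```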

In [130]:
"Hello World"[6]

'W'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)