<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, statsmodels, and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make an object work. We call them "boxed ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />
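
To make the boxed-versus-primitive difference concrete, here is a minimal sketch comparing the memory used by one Python `int` object with one element of a NumPy array. It assumes CPython (where `sys.getsizeof` reports the size of the whole object) and an explicitly chosen `int64` dtype; the variable names are illustrative and not part of the original notebook.

```python
import sys
import numpy as np

# One boxed Python int: the value plus object overhead
# (reference count, type pointer, ...).
boxed_int_bytes = sys.getsizeof(1)

# One element of a NumPy int64 array: just the raw 8-byte integer.
arr = np.array([1, 2, 3, 4], dtype=np.int64)
primitive_int_bytes = arr.itemsize

print("boxed int:    ", boxed_int_bytes, "bytes")
print("NumPy element:", primitive_int_bytes, "bytes")
```

The exact size of the boxed int varies between Python builds, but it is always larger than the 8 bytes NumPy stores per `int64` element.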


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)



In [4]:
import sys
import numpy as np

## Creating NumPy Arrays from Python Lists

In [7]:
# Create an array with the elements 1, 2, 3, 4
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

Unlike Python lists, NumPy arrays are constrained to hold elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [11]:
np.array([3.14, 4, 2, 3])
# When floats and ints are mixed, every element is converted to float

array([3.14, 4.  , 2.  , 3.  ])

In [13]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)
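
The upcasting rule can be checked directly by inspecting `dtype`. The following is a small sketch, with illustrative variable names that are not part of the original notebook; it also shows that converting back to an integer type with `astype` truncates the fractional part rather than rounding.

```python
import numpy as np

ints_only = np.array([1, 2, 3])             # stays an integer array
mixed_float = np.array([1, 2.5, 3])         # ints are upcast to float64
mixed_complex = np.array([1, 2.5, 3 + 0j])  # everything is upcast to complex128

print(ints_only.dtype.kind)   # 'i' for signed integer
print(mixed_float.dtype)      # float64
print(mixed_complex.dtype)    # complex128

# Going the other way is an explicit conversion and loses information:
# astype truncates toward zero.
truncated = np.array([1.9, 2.1]).astype(np.int64)
print(truncated)              # [1 2]
```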

Unlike Python lists, NumPy arrays can explicitly be **multi-dimensional**:

In [16]:
[range(i, i + 3) for i in [2, 4, 6]]

[range(2, 5), range(4, 7), range(6, 9)]

In [18]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

## Creating Arrays from Scratch

### `zeros`, `ones`, `full`, `arange`, `linspace`

In [22]:
np.zeros(10, dtype=int)
# Create an array of length 10
# zeros(): every element is 0

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [24]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [28]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [30]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [34]:
# Create an array of six values evenly spaced between 0 and 1 (both endpoints included)
np.linspace(0, 1, 6)

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
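
The main practical difference between `arange` and `linspace` is how the stop value is treated. As a short sketch (variable names are illustrative): `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value unless `endpoint=False` is passed.

```python
import numpy as np

stepped = np.arange(0, 20, 2)     # step of 2, stop value 20 is excluded
spaced = np.linspace(0, 20, 11)   # 11 points, stop value 20 is included

print(stepped)   # [ 0  2  4  6  8 10 12 14 16 18]
print(spaced)    # [ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20.]

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 20, 10, endpoint=False)
print(np.array_equal(half_open, stepped))   # True
```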

### `random` 

In [36]:
np.random.seed(0)  
# seed for reproducibility
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411],
       [0.43758721, 0.891773  , 0.96366276]])

In [38]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[ 1.26611853, -0.50587654,  2.54520078],
       [ 1.08081191,  0.48431215,  0.57914048],
       [-0.18158257,  1.41020463, -0.37447169]])

In [40]:
np.random.randint(10, size=3)

array([0, 1, 9])

In [42]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[9, 0, 4],
       [7, 3, 2],
       [7, 2, 0]])

In [44]:
# np.random.random takes the shape as a single tuple;
# np.random.rand takes each dimension as a separate argument
np.random.random((3,5))
np.random.rand(3,5)

array([[0.36371077, 0.57019677, 0.43860151, 0.98837384, 0.10204481],
       [0.20887676, 0.16130952, 0.65310833, 0.2532916 , 0.46631077],
       [0.24442559, 0.15896958, 0.11037514, 0.65632959, 0.13818295]])
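
The cells above use the legacy global-state functions (`np.random.seed`, `np.random.random`, ...). As an alternative that is not used in the original notebook, newer NumPy versions (1.17 and later) also provide a `Generator` object via `np.random.default_rng`. The sketch below shows that two generators created with the same seed produce identical values; the variable names are illustrative.

```python
import numpy as np

# Two independent generators with the same seed
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)

uniform_a = rng_a.random((3, 3))   # uniform values in [0, 1)
uniform_b = rng_b.random((3, 3))
print(np.array_equal(uniform_a, uniform_b))   # True: same seed, same stream

# Counterparts of np.random.normal and np.random.randint
normals = rng_a.normal(0, 1, (3, 3))
integers = rng_a.integers(0, 10, size=(3, 3))  # integers in [0, 10)
print(normals.shape, integers.shape)
```

Because each generator carries its own state, seeding one does not affect random numbers drawn elsewhere in the program.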

### `eye`, `empty`

In [46]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [48]:
np.eye(3, dtype='int8')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int8)

In [52]:
# Create an uninitialized array of 5 elements (float64 by default)
# The values are whatever already happens to be at that memory location
np.empty(5)

array([0.2, 0.4, 0.6, 0.8, 1. ])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## NumPy Array Attributes

In [58]:
x3 = np.random.randint(10, size=(3, 4, 5))
# Three-dimensional array
x3

array([[[9, 7, 3, 2, 3],
        [9, 7, 7, 5, 1],
        [2, 2, 8, 1, 5],
        [8, 4, 0, 2, 5]],

       [[5, 0, 8, 1, 1],
        [0, 3, 8, 8, 4],
        [4, 0, 9, 3, 7],
        [3, 2, 1, 1, 2]],

       [[1, 4, 2, 5, 5],
        [5, 2, 5, 7, 7],
        [6, 1, 6, 7, 2],
        [3, 1, 9, 5, 9]]])

In [60]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [62]:
print("dtype:", x3.dtype)

dtype: int32


- `itemsize` lists the size (in bytes) of each array element, and
- `nbytes` lists the total size (in bytes) of the array

In [64]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 4 bytes
nbytes: 240 bytes
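
The relationship between these attributes is `nbytes == size * itemsize`. The sketch below checks this with explicitly chosen dtypes, because the default integer type (`int32` in the output above) can differ between platforms and NumPy versions. The variable names are illustrative.

```python
import numpy as np

x32 = np.zeros((3, 4, 5), dtype=np.int32)
x64 = x32.astype(np.int64)   # same shape, wider elements

for arr in (x32, x64):
    # nbytes is always the element count times the bytes per element
    print(arr.dtype, arr.size, arr.itemsize, arr.nbytes)
```

Choosing a narrower dtype halves the memory here: 240 bytes for `int32` versus 480 bytes for `int64`.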


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Indexing & Slicing
### One-dimensional subarray

In [68]:
x1 = np.random.randint(20, size = 6) # One-dimensional array

In [70]:
x1

array([18,  0,  9, 11, 17,  9])

In [72]:
x1[4], x1[-1]

(17, 9)

### Slicing:
`x[start:stop:step]`

In [78]:

x1[:3]  # first three elements

array([18,  0,  9])

In [82]:
x1[4:] 

array([17,  9])

In [84]:
x1[::2]  # every other element (step of 2)

array([18,  9, 17])
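
Two further points about `x[start:stop:step]` are worth a sketch: a negative step walks the array backwards, and a slice is a view onto the original data rather than a copy. The array values below reuse the `x1` output shown above; the other names are illustrative.

```python
import numpy as np

x = np.array([18, 0, 9, 11, 17, 9])

reversed_x = x[::-1]        # negative step: all elements, reversed
print(reversed_x)           # [ 9 17 11  9  0 18]

middle = x[1:4]             # a view, not a copy
middle[0] = 99              # writes through to x
print(x)                    # [18 99  9 11 17  9]

independent = x[1:4].copy() # an explicit copy owns its own data
independent[0] = -1         # x is not affected
print(x[1])                 # 99
```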

### Multi-dimensional array

In [86]:
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array

In [88]:
x2

array([[0, 6, 0, 4],
       [8, 4, 3, 3],
       [8, 8, 7, 0]])

In [90]:
x2[2,0]

8

In [92]:
x2[2,0] = 11
# Assign 11 to the element at row 2, column 0

In [94]:
x2

array([[ 0,  6,  0,  4],
       [ 8,  4,  3,  3],
       [11,  8,  7,  0]])

In [96]:
x2[:2, :3]  # two rows, three columns

array([[0, 6, 0],
       [8, 4, 3]])

In [98]:
print(x2[:, 0])  # first column of x2

[ 0  8 11]


In [100]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [106]:
x = np.array([1, 2, 3])
x.shape

(3,)

In [104]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])
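
There are a few equivalent ways to turn a one-dimensional array into a column or row vector. The following sketch (names are illustrative) compares `reshape` with an inferred dimension (`-1`) and indexing with `np.newaxis`.

```python
import numpy as np

x = np.array([1, 2, 3])

col_a = x.reshape((3, 1))   # explicit shape
col_b = x.reshape(-1, 1)    # -1 lets NumPy infer the row count
col_c = x[:, np.newaxis]    # insert a new axis of length 1
row = x[np.newaxis, :]      # row vector with shape (1, 3)

print(col_a.shape, col_b.shape, col_c.shape, row.shape)
```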

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Concatenation and Splitting

In [109]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
# join x and y end to end

array([1, 2, 3, 3, 2, 1])

In [111]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [113]:
# concatenate along the first axis (axis=0, the default)
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [115]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [117]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vstack(): stack the arrays vertically
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [119]:
# hstack(): stack the arrays horizontally
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

### Splitting of arrays

In [121]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [123]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
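
The same `grid` can be split along either axis. As a sketch of how that might look, `np.vsplit` and `np.hsplit` take split indices just like `np.split`; the names `upper`, `lower`, `left`, and `right` are illustrative.

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split into rows 0-1 and rows 2-3
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

# Split into columns 0-1 and columns 2-3
left, right = np.hsplit(grid, [2])
print(left)
print(right)
```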

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [125]:
a = np.array([1, 2, 3, 4])

In [129]:
print('sum =',a.sum())

sum = 10


In [131]:
print('mean =', a.mean())

mean = 2.5


In [133]:
a.std()

1.118033988749895

In [139]:
a.var()

1.25
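
Note that `std` and `var` compute the population statistics by default (dividing by `N`). A minimal sketch of the difference, using the `ddof` parameter to get the sample versions (dividing by `N - 1`); the variable names are illustrative:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

population_var = a.var()        # sum of squared deviations / N
sample_var = a.var(ddof=1)      # sum of squared deviations / (N - 1)
population_std = a.std()
sample_std = a.std(ddof=1)

print(population_var, sample_var)
print(population_std, sample_std)
```

For this array the squared deviations from the mean sum to 5, so the population variance is 5/4 = 1.25 and the sample variance is 5/3.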

In [143]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [145]:
A.sum()

45

In [147]:
A.mean()

5.0

In [149]:
A.std()

2.581988897471611

In [151]:
# Sum of each column
A.sum(axis=0)

array([12, 15, 18])

In [153]:
# Sum of each row
# axis=1 gives one result per row; axis=0 gives one result per column
A.sum(axis=1)

array([ 6, 15, 24])

In [155]:
A.mean(axis=0)

array([4., 5., 6.])

In [157]:
A.mean(axis=1)

array([2., 5., 8.])

In [159]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [161]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])
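
The `axis` argument works the same way for other reductions. The sketch below applies `max` and `min` along each axis and uses `keepdims=True` (a standard NumPy parameter not shown in the original cells) so that the row sums keep a shape that can be divided back into `A`. The variable names are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_max = A.max(axis=0)   # largest value in each column
row_min = A.min(axis=1)   # smallest value in each row
print(col_max)            # [7 8 9]
print(row_min)            # [1 4 7]

# keepdims=True keeps the reduced axis with length 1: shape (3, 1)
row_sums = A.sum(axis=1, keepdims=True)
row_fractions = A / row_sums   # each row now sums to 1
print(row_fractions.sum(axis=1))
```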

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition (`+`), subtraction (`-`), multiplication (`*`), etc.) on arrays of different sizes.

![image-broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)
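
The three cases in the figure can be reproduced in a few lines. This is a sketch with illustrative variable names: a scalar is stretched across an array, a one-dimensional array is stretched across the rows of a matrix, and a row and a column are stretched against each other. Shapes that cannot be matched raise a `ValueError`.

```python
import numpy as np

a = np.arange(3)                   # shape (3,):   [0 1 2]
M = np.ones((3, 3))                # shape (3, 3)
b = np.arange(3)[:, np.newaxis]    # shape (3, 1)

scalar_sum = a + 5     # scalar stretched to shape (3,)
row_sum = M + a        # a stretched across every row of M
outer_sum = a + b      # both stretched to shape (3, 3)

print(scalar_sum)      # [5 6 7]
print(row_sum)
print(outer_sum)

# Trailing dimensions must be equal or 1; (3, 2) and (3,) are incompatible
try:
    np.ones((3, 2)) + a
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)   # True
```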

In [None]:
a = np.array([1, 2, 3, 4])

In [164]:
a

array([1, 2, 3, 4])

In [166]:
a + 5  # broadcasting and vectorized operations
# adds 5 to every element

array([6, 7, 8, 9])

In [168]:
a * 10
# multiplies every element by 10

array([10, 20, 30, 40])

In [170]:
a

array([1, 2, 3, 4])

In [172]:
a += 100

In [174]:
a

array([101, 102, 103, 104])

In [178]:
l = [0, 1, 2, 3]

In [180]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [182]:
a  # unchanged since `a += 100` above

array([101, 102, 103, 104])

In [186]:
b = np.array([10, 10, 10, 10])

In [188]:
b

array([10, 10, 10, 10])

In [190]:
a + b

array([111, 112, 113, 114])

In [192]:
a * b

array([1010, 1020, 1030, 1040])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Sorting Arrays

By default, `np.sort` uses a quicksort algorithm (`kind='quicksort'`).


In [194]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

In [196]:
# A related function is argsort,
# which instead returns the indices of the sorted elements
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

[1 0 3 2 4]
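
The indices returned by `argsort` can be used for indexing, which is what makes the function useful in practice. In the sketch below, `labels` is a hypothetical second array that should be reordered to follow `x`; the other names are also illustrative.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
order = np.argsort(x)            # [1 0 3 2 4]

sorted_x = x[order]              # same result as np.sort(x)
descending = x[order[::-1]]      # reverse the order for a descending sort
print(sorted_x)                  # [1 2 3 4 5]
print(descending)                # [5 4 3 2 1]

# Reorder a second array so it follows the sort order of x
labels = np.array(["b", "a", "d", "c", "e"])
labels_by_x = labels[order]
print(labels_by_x)               # ['a' 'b' 'c' 'd' 'e']
```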


### Sorting along rows or columns
A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the `axis` argument.

In [198]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]


In [200]:
# sort each column of X
np.sort(X, axis=0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [202]:
# sort each row of X
np.sort(X, axis=1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

### Partial Sorts: Partitioning

In [205]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)

array([2, 1, 3, 4, 6, 5, 7])

In [207]:
np.partition(X, 2, axis=1)

array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])
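
To spell out what `np.partition(x, k)` guarantees: the element at index `k` is the one that would be there in a fully sorted array, everything before it is no larger, and everything after it is no smaller. The order within each side is arbitrary. A minimal sketch with illustrative names follows, including `np.argpartition`, which returns indices instead of values.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

part = np.partition(x, k)
smallest = part[:k]          # the k smallest values, in no particular order
print(np.sort(smallest))     # [1 2 3]
print(part[k])               # 4, the value at sorted position k

# argpartition gives the positions of those values in the original array
smallest_idx = np.argpartition(x, k)[:k]
print(np.sort(x[smallest_idx]))   # [1 2 3]
```

This is cheaper than a full sort when only the k smallest values are needed and their order does not matter.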

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [212]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [214]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [216]:
# Option 1: the dot() method
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [218]:
# Option 2: the @ matrix-multiplication operator
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [220]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [226]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [228]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
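
The shapes explain why `B.T @ A` works above: matrix multiplication requires the inner dimensions to match, so `(2, 3) @ (3, 3)` is valid and gives a `(2, 3)` result. The sketch below, with illustrative variable names, checks the shape rule, the transpose identity `(A @ B).T == B.T @ A.T`, and the error raised when the inner dimensions do not match.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

product = A @ B                    # (3, 3) @ (3, 2) -> (3, 2)
print(product.shape)

# Transposing a product reverses the order of the factors
identity_holds = np.array_equal((A @ B).T, B.T @ A.T)
print(identity_holds)              # True

# (3, 2) @ (3, 3): inner dimensions 2 and 3 do not match
try:
    B @ A
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)             # True
```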