<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even a simple int is an object that carries all the machinery an object requires (type information, a reference count, and so on). We call these "boxed ints". In contrast, NumPy stores primitive numeric types (floats, ints) directly, which makes both storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />
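The storage difference can be made concrete with a rough measurement. The sketch below is only an illustration: it assumes CPython (where `sys.getsizeof` reports per-object overhead), and the names `py_list`, `np_arr`, `list_bytes` and `array_bytes` are made up for this example.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers; every element is a separate, full Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array stores raw 8-byte integers in one contiguous buffer.
array_bytes = np_arr.nbytes

print(f"list:  {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```

The exact list size depends on the Python version and platform, but the array needs exactly `n * 8` bytes for its data.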


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [24]:
import numpy as np

## <b> Creating NumPy arrays from Python lists

In [25]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [26]:
[3.14, 4, 2, 3] # A plain Python list keeps each item's own type

[3.14, 4, 2, 3]

In [27]:
np.array([3.14, 2, 3, 4]) # A NumPy array converts every item to float
# NumPy upcasts to the more general data type, here from integer to float

array([3.14, 2.  , 3.  , 4.  ])

In [28]:
np.array([3, 2, 3, 4], dtype = "float32")

array([3., 2., 3., 4.], dtype=float32)
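The upcasting shown above can be inspected directly. This is a small sketch, not an exhaustive description of NumPy's promotion rules; the variable names `mixed` and `truncated` are illustrative.

```python
import numpy as np

# Mixing ints and floats: the ints are upcast to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                            # float64

# result_type reports the dtype NumPy would pick when combining two dtypes
print(np.result_type(np.int32, np.float32))   # float64

# Converting floats to integers truncates toward zero; it does not round
truncated = np.array([3.14, 2.9, -1.7]).astype(np.int64)
print(truncated)                              # [ 3  2 -1]
```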

In [29]:
a1 = np.array([1, 2, 3, 4])

In [30]:
type(a1)

numpy.ndarray

In [31]:
a2 = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [32]:
type(a2)

numpy.ndarray

In [33]:
a2.shape

(2, 3)

In [34]:
a2.ndim

2

In [35]:
a2.size

6
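The same attributes extend to any number of dimensions. As a quick sketch, here is a three-dimensional array (the name `a3` is chosen just for this example) together with two related attributes, `itemsize` and `nbytes`:

```python
import numpy as np

a3 = np.zeros((2, 3, 4), dtype=np.float64)

print(a3.ndim)      # 3 axes
print(a3.shape)     # (2, 3, 4)
print(a3.size)      # 2 * 3 * 4 = 24 elements
print(a3.itemsize)  # 8 bytes per float64 element
print(a3.nbytes)    # size * itemsize = 192 bytes
```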

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## <b> Creating NumPy arrays from scratch

### <b> zeros, ones, full, arange, linspace

In [36]:
np.zeros((2, 4), dtype="int32")

array([[0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int32)

In [37]:
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [38]:
# Create an array filled with a linear sequence
# starting at 0, ending before 20 (the stop value is excluded), stepping by 2.
# (This is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [39]:
np.full((3, 5), 6.9)

array([[6.9, 6.9, 6.9, 6.9, 6.9],
       [6.9, 6.9, 6.9, 6.9, 6.9],
       [6.9, 6.9, 6.9, 6.9, 6.9]])

In [40]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
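`np.arange` and `np.linspace` cover similar ground but differ in what you specify and in whether the end value is included. A short comparison, with illustrative variable names:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
by_step = np.arange(0, 1, 0.25)

# linspace: you choose the number of points; the stop value is included by default
by_count, step = np.linspace(0, 1, 5, retstep=True)

# endpoint=False makes linspace exclude the stop value, like arange
half_open = np.linspace(0, 1, 4, endpoint=False)

print(by_step)      # 0 to 0.75
print(by_count)     # 0 to 1, spacing returned in `step`
print(half_open)    # 0 to 0.75
```

When the step is not an integer, `linspace` is usually the safer choice, because the number of elements is explicit rather than derived from floating-point arithmetic.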

### <b> random

In [41]:
np.random.random((4, 4))

array([[7.29589830e-02, 6.18297619e-01, 5.11216056e-01, 2.29895619e-01],
       [1.88384292e-01, 9.59294688e-01, 6.55825273e-04, 3.26072260e-01],
       [8.59009093e-01, 9.26942395e-01, 5.79534177e-02, 4.45192391e-01],
       [9.35522809e-01, 9.31969963e-01, 2.98652750e-01, 4.45921828e-02]])

In [42]:
np.random.random((4, 4))

array([[0.5239957 , 0.1175687 , 0.10716581, 0.07711544],
       [0.99698404, 0.05724748, 0.6553778 , 0.69008679],
       [0.66341729, 0.23386164, 0.88719276, 0.24740501],
       [0.39293621, 0.91246331, 0.74823733, 0.42055716]])

In [43]:
# Seed for reproducibility
np.random.seed(0)
np.random.random((4, 4))

array([[0.5488135 , 0.71518937, 0.60276338, 0.54488318],
       [0.4236548 , 0.64589411, 0.43758721, 0.891773  ],
       [0.96366276, 0.38344152, 0.79172504, 0.52889492],
       [0.56804456, 0.92559664, 0.07103606, 0.0871293 ]])

In [None]:
# np.random.normal(loc=0.0, scale=1.0, size=None)
# loc: mean of the distribution.
# scale: standard deviation of the distribution.
# size: shape of the output array.
np.random.normal(0, 1, (3, 3))

array([[ 0.04575852, -0.18718385,  1.53277921],
       [ 1.46935877,  0.15494743,  0.37816252],
       [-0.88778575, -1.98079647, -0.34791215]])

In [47]:
# np.random.randint(low, high=None, size=None)
# low: lowest value (inclusive).
# high: upper bound (exclusive).
# size: shape of the output array.
np.random.randint(0, 10, (4, 5))

array([[5, 5, 0, 1, 5],
       [9, 3, 0, 5, 0],
       [1, 2, 4, 2, 0],
       [3, 2, 0, 7, 5]], dtype=int32)

In [None]:
# np.random.rand() draws random floats from a uniform distribution over [0, 1).
# np.random.rand(d0, d1, ..., dn)
# d0, d1, ..., dn: dimensions of the output array.
np.random.rand(2, 3)

array([[0.73926358, 0.03918779, 0.28280696],
       [0.12019656, 0.2961402 , 0.11872772]])
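The functions above use NumPy's legacy global random state. Recent NumPy versions also provide a `Generator` object, created with `np.random.default_rng`, which keeps the seed local to one object. The sketch below mirrors the calls above with that interface; the seed value `0` and the variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily for the example

uniform = rng.random((2, 3))                   # floats in [0, 1)
gaussian = rng.normal(0, 1, size=(3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, size=(4, 5))    # ints in [0, 10)

# Two generators created with the same seed produce the same stream
same_again = np.random.default_rng(seed=0).random((2, 3))
print(np.array_equal(uniform, same_again))     # True
```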

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## <b> Array indexing and Slicing

### <b> One-dimensional array

In [50]:
x1 = np.random.randint(20, size = 6)

In [51]:
x1

array([17, 18, 14,  9,  1,  4], dtype=int32)

In [53]:
x1[4], x1[0], x1[-1]

(np.int32(1), np.int32(17), np.int32(4))

### <b> Multi-dimensional array

In [54]:
x2 = np.random.randint(10, size = (3, 4))

In [55]:
x2

array([[6, 8, 2, 3],
       [0, 0, 6, 0],
       [6, 3, 3, 8]], dtype=int32)

In [57]:
x2[0, 2]

np.int32(2)

In [58]:
x2[1, 2] = 9
x2

array([[6, 8, 2, 3],
       [0, 0, 9, 0],
       [6, 3, 3, 8]], dtype=int32)

### <b> Slicing
Slices use the syntax `x[start:stop:step]`; the element at `stop` is not included.

In [59]:
x1

array([17, 18, 14,  9,  1,  4], dtype=int32)

In [60]:
x1[0:3]

array([17, 18, 14], dtype=int32)

In [61]:
x1[2:4]

array([14,  9], dtype=int32)

In [62]:
# Every other element (step of 2)
x1[::2]

array([17, 14,  1], dtype=int32)

In [63]:
x2

array([[6, 8, 2, 3],
       [0, 0, 9, 0],
       [6, 3, 3, 8]], dtype=int32)

In [64]:
# First two rows, first three columns
x2[:2, :3]

array([[6, 8, 2],
       [0, 0, 9]], dtype=int32)

In [65]:
x2[:, :2]

array([[6, 8],
       [0, 0],
       [6, 3]], dtype=int32)
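One detail worth knowing about the slices above: unlike list slices, NumPy slices are views that share memory with the original array. A minimal sketch, using a made-up array named `demo`:

```python
import numpy as np

demo = np.arange(12).reshape(3, 4)

view = demo[:2, :2]        # a view: shares memory with demo
view[0, 0] = 99
print(demo[0, 0])          # 99, the original changed too

independent = demo[:2, :2].copy()   # an explicit copy owns its data
independent[0, 0] = -1
print(demo[0, 0])          # still 99
```

Call `.copy()` whenever a sub-array must be modified without touching the source array.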

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

### <b> Reshaping of arrays and transpose

In [76]:
# np.arange(start, stop, step, dtype=None)
# start: first value (default 0)
# stop: end value (exclusive)
# step: spacing between values (default 1)
# dtype: data type (optional)
grid = np.arange(1, 10)
grid

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [77]:
grid.shape # A 1-D array with 9 elements

(9,)

In [78]:
grid.reshape((3,3))

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [79]:
x = np.array([1, 2, 3])

In [80]:
x.shape

(3,)

In [81]:
x.reshape((1, 3)).shape

(1, 3)
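Two common shortcuts accomplish the same kind of reshaping: indexing with `np.newaxis` and passing `-1` to `reshape`. The following sketch uses illustrative names (`vec`, `row`, `col`, `inferred`, `flat`):

```python
import numpy as np

vec = np.array([1, 2, 3])

row = vec[np.newaxis, :]     # shape (1, 3), same as vec.reshape((1, 3))
col = vec[:, np.newaxis]     # shape (3, 1)

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = np.arange(12).reshape((3, -1))   # shape (3, 4)
flat = inferred.reshape(-1)                 # back to shape (12,)

print(row.shape, col.shape, inferred.shape, flat.shape)
```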

In [86]:
# Transpose
x = grid.reshape((3,3))
x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [85]:
x.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## <b> Array concatenation and Splitting

### <b> Array concatenation

In [87]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

In [89]:
np.concatenate((x, y))

array([1, 2, 3, 4, 5, 6])

In [90]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [96]:
x=np.concatenate((grid, grid)) # axis= 0 by default
x

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [95]:
x.shape

(4, 3)

In [97]:
y = np.concatenate((grid, grid), axis=1)
y

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [98]:
y.shape

(2, 6)

In [99]:
#vstack
x= np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

In [100]:
np.vstack((x, grid))

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [102]:
y

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [None]:
# hstack
np.hstack((y, grid))

array([[1, 2, 3, 1, 2, 3, 9, 8, 7],
       [4, 5, 6, 4, 5, 6, 6, 5, 4]])
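`np.concatenate`, `np.vstack` and `np.hstack` all join along an axis that already exists (with `vstack` first promoting 1-D inputs to rows). When the inputs should instead be placed along a brand-new axis, `np.stack` is an option. A brief sketch with made-up arrays `p` and `q`:

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

joined = np.concatenate((p, q))      # shape (6,): the existing axis gets longer
as_rows = np.stack((p, q))           # shape (2, 3): new leading axis
as_cols = np.stack((p, q), axis=1)   # shape (3, 2): new trailing axis

print(joined.shape, as_rows.shape, as_cols.shape)
```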

### <b> Splitting array

In [104]:
x = np.array([1, 2, 3, 99, 69, 3, 2, 1])
x

array([ 1,  2,  3, 99, 69,  3,  2,  1])

In [109]:
x1, x2, x3 = np.split(x, [3, 5])

In [106]:
x1

array([1, 2, 3])

In [110]:
x2

array([99, 69])

In [111]:
x3

array([3, 2, 1])
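For two-dimensional arrays, `np.vsplit` and `np.hsplit` split along rows and columns respectively. A small sketch on a made-up 4x4 array named `table`:

```python
import numpy as np

table = np.arange(16).reshape((4, 4))

upper, lower = np.vsplit(table, [2])   # split rows at index 2
left, right = np.hsplit(table, [2])    # split columns at index 2

# An integer instead of a list asks for that many equal parts
quarters = np.split(np.arange(8), 4)

print(upper.shape, lower.shape, left.shape, right.shape)
```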

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

![image-broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)

In [112]:
a = np.arange(3)
a

array([0, 1, 2])

In [113]:
a + 5 # Broadcasting

array([5, 6, 7])

In [115]:
b = np.ones((3, 3), dtype="int")
b

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [117]:
a + b

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In [119]:
c = np.arange(3).reshape((3, 1))
c

array([[0],
       [1],
       [2]])

In [121]:
a + c

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
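The rule behind the three examples above can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the array `data` is made up for the illustration.

```python
import numpy as np

# Shapes are compared from the trailing dimension backwards;
# two dimensions are compatible when they are equal or one of them is 1.
print(np.broadcast_shapes((3,), (3, 1)))     # (3, 3)
print(np.broadcast_shapes((3, 3), (3,)))     # (3, 3)

try:
    np.ones((3, 2)) + np.arange(3)           # trailing dims 2 and 3 do not match
except ValueError as err:
    print("cannot broadcast:", err)

# A common use: subtract each column's mean from a matrix
data = np.array([[1.0, 10.0],
                 [3.0, 30.0]])
centered = data - data.mean(axis=0)          # shape (2, 2) minus shape (2,)
print(centered)
```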

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Manipulating and Comparing Arrays

### Aggregation
Aggregation means applying one operation across many values to produce a summary, such as a sum, mean, minimum or maximum.


In [122]:
listNumber = [1, 2, 3, 5]

In [123]:
ll = np.array(listNumber)
ll

array([1, 2, 3, 5])

In [124]:
sum(ll)

np.int64(11)

In [125]:
np.sum(ll)

np.int64(11)

In [128]:
# Create a massive numpy array
massive_array = np.random.random(10000)
massive_array[:5]
massive_array.shape

(10000,)

In [129]:
%timeit sum(massive_array) # python built-in function sum()
%timeit np.sum(massive_array) # Numpy's np.sum()

787 μs ± 3.78 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
4.73 μs ± 9.21 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
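`%timeit` is an IPython magic and only works inside a notebook or IPython shell. In a plain Python script, a comparable measurement can be sketched with the standard `timeit` module. The repeat count of 100 is an arbitrary choice, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

values = np.random.random(10_000)

# Time 100 calls of each function; absolute numbers depend on the machine
builtin_time = timeit.timeit(lambda: sum(values), number=100)
numpy_time = timeit.timeit(lambda: np.sum(values), number=100)

print(f"built-in sum: {builtin_time:.4f} s")
print(f"np.sum:       {numpy_time:.4f} s")

# Both compute the same total, up to floating-point rounding
print(np.isclose(sum(values), np.sum(values)))
```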


In [130]:
np.mean(massive_array)

np.float64(0.5052346124189445)

In [131]:
np.max(massive_array)

np.float64(0.999685002639177)

In [132]:
np.min(massive_array)

np.float64(0.0002078643980524264)

In [133]:
dogHeight = [600, 470, 170, 430, 300]
dogHeight = np.array(dogHeight)
np.std(dogHeight)

np.float64(147.32277488562318)

In [134]:
np.var(dogHeight)

np.float64(21704.0)

In [135]:
np.sqrt(np.var(dogHeight))

np.float64(147.32277488562318)
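Two options are worth illustrating here. Aggregations accept an `axis` argument to reduce along rows or columns, and `np.var` / `np.std` compute the population statistic by default (dividing by n), with `ddof=1` switching to the sample statistic (dividing by n - 1). The array `scores` is made up; `heights` reuses the dog height values from above.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())          # 21, all elements
print(scores.sum(axis=0))    # [5 7 9], one value per column
print(scores.sum(axis=1))    # [ 6 15], one value per row

heights = np.array([600, 470, 170, 430, 300])
population_var = np.var(heights)          # divides by n      -> 21704.0
sample_var = np.var(heights, ddof=1)      # divides by n - 1  -> 27130.0
print(population_var, sample_var)
```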

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Sorting Arrays


In [136]:
x = np.array([2, 1, 4, 3, 5])

In [137]:
np.sort(x)

array([1, 2, 3, 4, 5])

In [138]:
# A related function is argsort, which instead returns the indices of the sorted elements
np.argsort(x)

array([1, 0, 3, 2, 4])
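The indices returned by `argsort` are most useful for reordering a second, related array in the same way. A small sketch with made-up scores and labels:

```python
import numpy as np

scores = np.array([72, 95, 58, 88])
names = np.array(["Ann", "Bob", "Cy", "Dee"])   # made-up labels

order = np.argsort(scores)          # indices that would sort scores ascending
print(scores[order])                # [58 72 88 95]
print(names[order])                 # ['Cy' 'Ann' 'Dee' 'Bob']
print(names[order[::-1]])           # descending: ['Bob' 'Dee' 'Ann' 'Cy']
```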

### Sorting along rows or columns
A useful feature of NumPy's sorting functions is the ability to sort along specific rows or columns of a multidimensional array using the `axis` argument.

In [144]:
rand = np.random.RandomState(42)
matA = rand.randint(0, 10, (4, 6))
matA

array([[6, 3, 7, 4, 6, 9],
       [2, 6, 7, 4, 3, 7],
       [7, 2, 5, 4, 1, 7],
       [5, 1, 4, 0, 9, 5]], dtype=int32)

In [None]:
np.sort(matA, axis=0) # sort the values within each column

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]], dtype=int32)

In [None]:
np.sort(matA, axis=1) # sort the values within each row

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]], dtype=int32)
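Note that sorting along an axis treats every row or column independently, so values that started in the same row may end up apart. To reorder whole rows by the values in one column, one possible approach is to combine `argsort` with indexing. The array `records` is made up for this sketch.

```python
import numpy as np

records = np.array([[3, 10],
                    [1, 30],
                    [2, 20]])

# Sorting along axis 0 sorts each column separately and breaks up the rows
print(np.sort(records, axis=0))

# Indexing with argsort of the first column moves whole rows instead
row_order = np.argsort(records[:, 0])
by_first_column = records[row_order]
print(by_first_column)
```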

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [149]:
A = np.array([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]])

In [151]:
B = np.array([[6, 5],
             [4, 3],
             [2, 1]])

In [None]:
# Matrix product of A (3x3) and B (3x2)
# The result is a 3x2 matrix
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [153]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [154]:
# B (3x2) cannot be multiplied by A (3x3) directly,
# so transpose B to (2x3) first
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [155]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
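The shape rule used above (the inner dimensions must match) and one function from `np.linalg` can be sketched as follows. The matrices `left`, `right` and `coeffs` are made up for the illustration; `np.linalg.solve` expects a square, non-singular coefficient matrix.

```python
import numpy as np

left = np.arange(9).reshape((3, 3))    # shape (3, 3)
right = np.arange(6).reshape((3, 2))   # shape (3, 2)

print((left @ right).shape)            # (3, 2): inner dimensions 3 and 3 match

try:
    right @ left                       # (3, 2) @ (3, 3): inner dimensions 2 and 3 differ
except ValueError as err:
    print("shape mismatch:", err)

# Solve the linear system coeffs @ unknowns = rhs
coeffs = np.array([[2.0, 1.0],
                   [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
unknowns = np.linalg.solve(coeffs, rhs)
print(np.allclose(coeffs @ unknowns, rhs))   # True
```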

### Dot product example

In [159]:
# Number of jars sold
np.random.seed(0)
salesAmount = np.random.randint(20, size = (5, 3))
salesAmount

array([[12, 15,  0],
       [ 3,  3,  7],
       [ 9, 19, 18],
       [ 4,  6, 12],
       [ 1,  6,  7]], dtype=int32)

In [160]:
import pandas as pd

weeklySales = pd.DataFrame(salesAmount, 
                           index=["Mon", "Tues", "Wed", "Thurs", "Fri"],
                           columns= ["Almond Butter", "Peanut Butter", "Cashew Butter"])
weeklySales

| | Almond Butter | Peanut Butter | Cashew Butter |
|---|---|---|---|
| Mon | 12 | 15 | 0 |
| Tues | 3 | 3 | 7 |
| Wed | 9 | 19 | 18 |
| Thurs | 4 | 6 | 12 |
| Fri | 1 | 6 | 7 |


In [161]:
# Create price array
prices = np.array([10, 8, 12])

In [162]:
butterPrice = pd.DataFrame(prices.reshape(1, 3), index=["Price"], columns= ["Almond Butter", "Peanut Butter", "Cashew Butter"])
butterPrice

| | Almond Butter | Peanut Butter | Cashew Butter |
|---|---|---|---|
| Price | 10 | 8 | 12 |


In [164]:
weeklySales.shape, butterPrice.T.shape

((5, 3), (3, 1))

In [165]:
totalPrice = weeklySales.dot(butterPrice.T)
totalPrice

| | Price |
|---|---|
| Mon | 240 |
| Tues | 138 |
| Wed | 458 |
| Thurs | 232 |
| Fri | 142 |


In [166]:
weeklySales["Total Price"] = totalPrice
weeklySales

| | Almond Butter | Peanut Butter | Cashew Butter | Total Price |
|---|---|---|---|---|
| Mon | 12 | 15 | 0 | 240 |
| Tues | 3 | 3 | 7 | 138 |
| Wed | 9 | 19 | 18 | 458 |
| Thurs | 4 | 6 | 12 | 232 |
| Fri | 1 | 6 | 7 | 142 |
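The same totals can be computed with NumPy alone, without going through pandas. The sketch below copies the values from the `salesAmount` and `prices` arrays above into illustrative variables `sales` and `unit_prices`.

```python
import numpy as np

# Values copied from the salesAmount and prices arrays above
sales = np.array([[12, 15, 0],
                  [3, 3, 7],
                  [9, 19, 18],
                  [4, 6, 12],
                  [1, 6, 7]])
unit_prices = np.array([10, 8, 12])

daily_totals = sales @ unit_prices        # (5, 3) @ (3,) -> (5,)
print(daily_totals)                       # [240 138 458 232 142]

# Broadcasting gives the per-product revenue before it is summed
revenue = sales * unit_prices             # (5, 3) * (3,) -> (5, 3)
print(revenue.sum(axis=1))                # same totals
```

Each total is the dot product of one day's sales row with the price vector, for example 12 * 10 + 15 * 8 + 0 * 12 = 240 for Monday.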
