
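# NumPy

* Reference: Python Data Science Handbook, Chapter 2
* NumPy is a Python package for numerical computing. It offers a wide range of functionality and is especially well suited to large datasets and matrix operations. It is widely used in scientific computing, machine learning, and data analysis.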
* https://numpy.org/
* https://numpy.org/doc/stable/user/index.html

In [1]:
# Install the package
! pip install numpy



In [2]:
# Import the packages
import numpy as np
import random

# NumPy's Speed

In [3]:
# Generate a large amount of test data
x = random.sample(range(0, 1000000), 1000000)

### Time needed for a large computation in pure Python

In [5]:
import statistics

In [6]:
%%time
statistics.mean(x)

CPU times: user 789 ms, sys: 0 ns, total: 789 ms
Wall time: 826 ms


499999.5

### Time needed for the same computation with NumPy

In [7]:
nx = np.array(x)

In [8]:
%%time
nx.mean()

CPU times: user 2.47 ms, sys: 993 µs, total: 3.47 ms
Wall time: 3.39 ms


499999.5
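The `%%time` cells above depend on the notebook and on one particular run. As a rough sketch of the same comparison that also runs outside a notebook, the standard-library `timeit` module can time both approaches. The 100,000-element list and the variable names are illustrative assumptions, not part of the original notebook, and the measured times will differ from machine to machine.

```python
import statistics
import timeit

import numpy as np

# Smaller than the one-million-element list above so the sketch runs quickly.
values = list(range(100_000))
array = np.array(values)

# timeit.timeit calls the function `number` times and returns the total seconds.
python_seconds = timeit.timeit(lambda: statistics.mean(values), number=5)
numpy_seconds = timeit.timeit(lambda: array.mean(), number=5)

print(f"statistics.mean: {python_seconds:.4f} s for 5 runs")
print(f"ndarray.mean:    {numpy_seconds:.4f} s for 5 runs")

# Both approaches agree on the result; only the running time differs.
python_mean = statistics.mean(values)
numpy_mean = array.mean()
print(python_mean, numpy_mean)
```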

# Creating NumPy Arrays

In [9]:
# Integer array ("array" is the C-language counterpart of a Python list)
nx1 = np.array([1,2,3,4,5])
nx1

array([1, 2, 3, 4, 5])

In [10]:
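# Float array
# As in C, an array holds a single data type; if any element is a float, every element is converted to float
# If any element is a str, every element becomes a str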
x = [1.0, 2, 3.2, 4, 5]
nx2 = np.array(x)
nx2

array([1. , 2. , 3.2, 4. , 5. ])
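The conversion rules described in the comments above can be checked directly. This is a small sketch with made-up values; the exact integer dtype and the width of the string dtype depend on the platform and NumPy version.

```python
import numpy as np

ints_only = np.array([1, 2, 3])
with_float = np.array([1, 2, 3.2])
with_text = np.array([1, 2.5, "three"])

print(ints_only.dtype)   # an integer dtype (int64 on most platforms)
print(with_float.dtype)  # float64: the integers were converted to floats
print(with_text.dtype)   # a Unicode string dtype
print(with_text)         # every element is now a string
```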

In [13]:
# Exercise
# Create an array of all the even numbers from 2 to 12
import numpy as np
even_number = np.arange(2, 13, 2)
even_number

array([ 2,  4,  6,  8, 10, 12])

Each element of a Python list can have a different data type.

In [14]:
type(x[0])

float

In [15]:
type(x[1])

int

NumPy, however, stores its data in a native C array under the hood, so all elements must share the same data type.

In [16]:
# The dtype names the underlying machine type: more bytes per element give more range or precision but use more memory
nx1.dtype

dtype('int64')

In [17]:
nx2.dtype

dtype('float64')

### Storing a single variable: C vs. Python
![%E5%9C%96%E7%89%87.png](attachment:%E5%9C%96%E7%89%87.png)
Image source: Python Data Science Handbook




### Storing array data: NumPy vs. Python
![%E5%9C%96%E7%89%87.png](attachment:%E5%9C%96%E7%89%87.png)
Image source: Python Data Science Handbook




In [22]:
# Specify the data type
x = [1,2,3,4,5]
nx = np.array(x,dtype='float64')
nx, nx.dtype

(array([1., 2., 3., 4., 5.]), dtype('float64'))

In [21]:
x = [1,2.6333,3,4,5]
nx = np.array(x,dtype='int64') # forced conversion: the fractional part is truncated, not rounded
nx, nx.dtype

(array([1, 2, 3, 4, 5]), dtype('int64'))
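To make the difference between truncation and rounding concrete, here is a small sketch with illustrative values. It uses `astype` for the conversion and `np.rint` for rounding; note that `np.rint` rounds ties to the nearest even integer.

```python
import numpy as np

values = np.array([2.6333, -2.6333, 0.5, 1.5])

# Converting to an integer dtype drops the fractional part (truncates toward zero).
truncated = values.astype("int64")

# Rounding first gives the nearest integer; ties such as 0.5 and 1.5 go to the even neighbour.
rounded = np.rint(values).astype("int64")

print(truncated)  # [ 2 -2  0  1]
print(rounded)    # [ 3 -3  0  2]
```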

In [23]:
# Exercise
# Modify the code below so that every value in the nx array is an integer
x = [1.2, 3.5, 4.2, 2.1]
nx = np.array(x)
nx

array([1.2, 3.5, 4.2, 2.1])

### NumPy's Available Numeric Types
Source: Python Data Science Handbook

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)|
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)|
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)|
| ``int8``      | Byte (-128 to 127)|
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)|
| ``uint8``     | Unsigned integer (0 to 255)|
| ``uint16``    | Unsigned integer (0 to 65535)|
| ``uint32``    | Unsigned integer (0 to 4294967295)|
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)|
| ``float_``    | Shorthand for ``float64`` (this alias was removed in NumPy 2.0; use ``float64``).|
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa|
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa|
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa|
| ``complex_``  | Shorthand for ``complex128`` (this alias was removed in NumPy 2.0; use ``complex128``).|
| ``complex64`` | Complex number, represented by two 32-bit floats|
| ``complex128``| Complex number, represented by two 64-bit floats|
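The ranges in the table do not need to be memorized: `np.iinfo` and `np.finfo` report them for any integer or floating-point type. A short sketch, with the selection of types chosen only for illustration:

```python
import numpy as np

# Integer types: np.iinfo reports the representable range.
for int_type in (np.int8, np.uint8, np.int32, np.int64):
    info = np.iinfo(int_type)
    size = np.dtype(int_type).itemsize
    print(f"{np.dtype(int_type).name:>7}: {info.min} to {info.max} ({size} bytes)")

# Floating-point types: np.finfo reports the bit width and approximate precision.
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(f"{np.dtype(float_type).name:>7}: {info.bits} bits, ~{info.precision} decimal digits")

int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
```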

In [24]:
# Two-dimensional array
x = [[1,2,3], [4,5,6], [7,8,9]]
nx = np.array(x)
nx

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [25]:
# Create arrays filled with a fixed value; often used for default (initial) values
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [26]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [27]:
np.full(5, 3.1415)

array([3.1415, 3.1415, 3.1415, 3.1415, 3.1415])

In [28]:
# Create a multidimensional array filled with a fixed value
np.full((3,4), 2.17)

array([[2.17, 2.17, 2.17, 2.17],
       [2.17, 2.17, 2.17, 2.17],
       [2.17, 2.17, 2.17, 2.17]])

In [None]:
# Exercise
# Create a 5 x 3 array in which every value is 1

In [29]:
# Generate a sequence
np.arange(0, 30, 3)

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [30]:
# Generate an evenly spaced sequence
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [31]:
# Exercise
# Create the array [0, 10, 20, 30, ..., 90, 100]
x = np.arange(0, 100+1, 10 )
x

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

In [32]:
# Generate a random array
# random.random(size=None)
np.random.random((3, 3)) # values fall in the half-open interval [0, 1)

array([[0.59293245, 0.83880256, 0.49892759],
       [0.7522317 , 0.22679733, 0.22699851],
       [0.6986727 , 0.1523193 , 0.42452753]])

In [35]:
# Exercise
# Create a 4x5 array whose values are random numbers between 0.2 and 0.5
result_array = np.random.random((4, 5)) * 0.3 + 0.2
result_array

array([[0.48914265, 0.46139674, 0.22986846, 0.36310502, 0.30127705],
       [0.46613303, 0.28773296, 0.42265914, 0.25484352, 0.49527554],
       [0.26131834, 0.35548527, 0.2861921 , 0.31213122, 0.29758994],
       [0.43926763, 0.33365357, 0.31368043, 0.46635903, 0.44976782]])

In [34]:
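# Exercise (alternative solution)
# Create a 4x5 array whose values are random numbers between 0.2 and 0.5
# np.random.uniform already draws from [low, high), so no further scaling is needed
result_array = np.random.uniform(low=0.2, high=0.5, size=(4, 5))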
result_array

array([[0.26857285, 0.34309822, 0.29034538, 0.30903014, 0.2624314 ],
       [0.34242522, 0.28784416, 0.31341518, 0.29257304, 0.29075514],
       [0.30451848, 0.2833976 , 0.3208955 , 0.2679586 , 0.34269875],
       [0.30226295, 0.28729794, 0.34848966, 0.32313484, 0.34657858]])
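The two solutions above rest on the same idea: a sample `r` from [0, 1) maps to `low + (high - low) * r`. The sketch below shows both forms side by side. It uses `np.random.default_rng`, NumPy's newer generator interface, which is a substitution for the legacy `np.random.*` functions used in this notebook; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable

low, high = 0.2, 0.5

# Option 1: rescale samples from [0, 1) by hand.
scaled = low + (high - low) * rng.random((4, 5))

# Option 2: uniform() applies the same scaling internally.
direct = rng.uniform(low, high, size=(4, 5))

print(scaled.min(), scaled.max())
print(direct.min(), direct.max())
```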

In [36]:
# Generate a random array drawn from a Gaussian (normal) distribution
# First argument: mean
# Second argument: standard deviation
# random.normal(loc=0.0, scale=1.0, size=None)
np.random.normal(0, 1, (3, 3))

array([[-0.14349152,  1.28766096,  0.98141609],
       [-1.18328183, -0.24258016, -0.08766168],
       [ 0.78117749, -2.46366468, -0.85534567]])

![%E5%9C%96%E7%89%87-4.png](attachment:%E5%9C%96%E7%89%87-4.png)

In [40]:
# Exercise
# Assume the average male height is 170 cm with a standard deviation of 5 cm
# Generate 7 random male heights drawn from a normal distribution
np.random.normal(170, 5, (7))

array([167.75960066, 172.07777271, 167.72726178, 169.47181897,
       164.90979161, 164.40490143, 167.48201368])

In [41]:
# Generate a random integer array within a given range (low is inclusive, high is exclusive)
# random.randint(low, high=None, size=None, dtype=int)
np.random.randint(0, 10, (3, 3))

array([[0, 5, 8],
       [9, 8, 7],
       [5, 6, 6]])

In [57]:
# Exercise
# Simulate rolling a six-sided die 100 times
np.random.randint(1, 7, (100), int)

array([4, 2, 4, 6, 4, 5, 1, 1, 6, 5, 6, 4, 4, 4, 4, 3, 2, 2, 2, 2, 4, 3,
       3, 1, 6, 5, 1, 5, 5, 6, 5, 5, 2, 1, 3, 1, 1, 6, 4, 6, 1, 6, 6, 5,
       5, 6, 6, 2, 4, 4, 5, 3, 4, 1, 3, 3, 6, 3, 3, 5, 2, 3, 4, 1, 5, 6,
       4, 2, 1, 5, 4, 5, 6, 1, 4, 2, 1, 4, 4, 1, 3, 6, 5, 1, 4, 5, 5, 2,
       4, 6, 3, 5, 3, 4, 6, 6, 2, 1, 3, 2])

In [55]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [58]:
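# np.eye(n) is the n x n identity matrix I; in matrix terms, X * (1/X) = I, where 1/X denotes the matrix inverse of X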
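Reading the note above as "a matrix times its inverse gives the identity", the following sketch checks that relation numerically. The matrix `X` is a made-up invertible example. In NumPy, the expression `1 / X` is element-wise, so the matrix inverse has to come from `np.linalg.inv`.

```python
import numpy as np

X = np.array([[2.0, 1.0],
              [1.0, 3.0]])

X_inv = np.linalg.inv(X)   # matrix inverse, not the element-wise 1 / X
product = X @ X_inv        # matrix product

# Compare with a tolerance because of floating-point rounding.
print(np.allclose(product, np.eye(2)))

# The element-wise version gives a matrix of ones instead of the identity.
elementwise = X * (1 / X)
print(elementwise)
```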

## Basic Attributes of NumPy Arrays

In [59]:
nx = np.random.randint(10, size=(3, 4))
nx

array([[0, 0, 2, 2],
       [0, 9, 5, 4],
       [3, 9, 4, 8]])

In [60]:
nx.ndim # number of dimensions

2

In [61]:
nx.shape # size of the array along each dimension

(3, 4)

In [62]:
nx.size # total number of elements

12

In [63]:
nx.itemsize # bytes per element

8

In [64]:
nx.nbytes # total bytes used by the data (size * itemsize)

96
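The attributes above are related: `size` is the product of the entries in `shape`, and `nbytes` equals `size * itemsize`. A brief sketch, using an explicitly typed array so the numbers do not depend on the platform's default integer:

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.int64)
print(a.ndim, a.shape, a.size, a.itemsize, a.nbytes)  # 2 (3, 4) 12 8 96

# A smaller dtype keeps the same shape but needs fewer bytes.
b = a.astype(np.int8)
print(b.itemsize, b.nbytes)  # 1 12
```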

## Accessing Values in NumPy Arrays

In [65]:
x = np.random.randint(0, 10, (3, 4))
x

array([[7, 3, 0, 3],
       [5, 7, 3, 3],
       [9, 9, 2, 4]])

In [66]:
x[0]

array([7, 3, 0, 3])

In [67]:
x[0,2]

0

In [68]:
x[-1]

array([9, 9, 2, 4])

In [69]:
x[1,1] = -4.5 # x has an integer dtype, so -4.5 is truncated to -4
x

array([[ 7,  3,  0,  3],
       [ 5, -4,  3,  3],
       [ 9,  9,  2,  4]])

In [70]:
x[1:3,2:4]

array([[3, 3],
       [2, 4]])

In [71]:
x[1:3,:]

array([[ 5, -4,  3,  3],
       [ 9,  9,  2,  4]])

In [77]:
# Exercise
# xt holds the whole class's scores on this exam; extract the three highest scores
# You can sort with np.sort(xt)
xt = np.random.randint(0, 100, 10)
print(xt)
xt = np.sort(xt)
print(xt)
xt[-3:]

[55 26 13 99 18  4 78 87 28 34]
[ 4 13 18 26 28 34 55 78 87 99]


array([78, 87, 99])

### Modifying a slice changes the original array

In [80]:
print(x)
y = x[1:3,2:4]
y

[[ 7  3  0  3]
 [ 5 -4  3  3]
 [ 9  9  2  4]]


array([[3, 3],
       [2, 4]])

In [81]:
# Modifying a value in the slice changes the original data directly!
y[0,0] = 1
y

array([[1, 3],
       [2, 4]])

In [82]:
# The original array has changed as well!
x

array([[ 7,  3,  0,  3],
       [ 5, -4,  1,  3],
       [ 9,  9,  2,  4]])

### To leave the original data untouched, make a copy

In [83]:
y2 = y.copy()
y2[0,0] = 7
y2

array([[7, 3],
       [2, 4]])

In [84]:
y

array([[1, 3],
       [2, 4]])
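Whether two arrays are backed by the same data can be checked with `np.shares_memory`. The sketch below repeats the slice-versus-copy experiment on a small made-up array; the variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

view = x[1:3, 2:4]            # basic slicing returns a view of x
copied = x[1:3, 2:4].copy()   # copy() allocates independent storage

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copied))  # False

view[0, 0] = -1     # writes through to x[1, 2]
copied[0, 0] = 99   # leaves x untouched
print(x[1, 2])      # -1
```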

### Reshaping Arrays

In [85]:
x = np.random.randint(0, 10, (3, 4))
x

array([[3, 8, 9, 9],
       [2, 1, 8, 6],
       [0, 0, 6, 7]])

In [86]:
y = x.reshape(12)
y

array([3, 8, 9, 9, 2, 1, 8, 6, 0, 0, 6, 7])

In [87]:
y = x.reshape(2, 6)
y

array([[3, 8, 9, 9, 2, 1],
       [8, 6, 0, 0, 6, 7]])

In [93]:
# Exercise
# Reshape x into a 4x3 matrix and into a 2x6 matrix
x = np.random.randint(0, 10, (3, 4))
print(x)
ans1 = x.reshape(4, 3)
print(ans1)
ans2 = x.reshape(2, 6)
print(ans2)

[[1 9 7 9]
 [5 4 2 6]
 [9 1 1 7]]
[[1 9 7]
 [9 5 4]
 [2 6 9]
 [1 1 7]]
[[1 9 7 9 5 4]
 [2 6 9 1 1 7]]


In [None]:
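# Even after reshaping, modifying the result still affects the original data
# (x was reassigned in the exercise above, so reshape it again to get a view of the current x)
y = x.reshape(2, 6)
y[0] = -1
y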

In [None]:
x

In [None]:
x.reshape((2,3,2))

In [None]:
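# The reshaped array must have the same number of elements, so this raises ValueError (12 elements cannot fill 4 x 4)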
x.reshape((4,4))
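Two details of `reshape` are worth seeing in code: passing `-1` lets NumPy infer one dimension, and a mismatched total size raises `ValueError`. This is a minimal sketch on a made-up 3 x 4 array.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

flat = x.reshape(-1)      # -1 is inferred as 12
pairs = x.reshape(-1, 2)  # -1 is inferred as 6
print(flat.shape, pairs.shape)

# For a contiguous array like this one, reshape returns a view of the same data.
print(np.shares_memory(x, pairs))

message = None
try:
    x.reshape(4, 4)  # 16 slots for 12 elements
except ValueError as error:
    message = str(error)
    print("reshape failed:", message)
```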

## Joining Arrays

In [None]:
x1 = np.array([1,2,3])
x2 = np.array([4,5,6])
x3 = np.array([x1,x2])
x3

In [None]:
x1 = np.array([1,2,3])
x2 = np.array([4,5,6])
x3 = np.concatenate([x1,x2])
x3

In [None]:
# Modify the array produced by concatenate
x3[0] = -1
x3

In [None]:
# The original array is not affected
x1

In [None]:
# Exercise
# Question:
# If x3 is built as shown below, does modifying its values affect x1?
x1 = np.array([1,2,3])
x2 = np.array([4,5,6])
x3 = np.array([x1,x2])

In [None]:
# Joining multidimensional arrays
x1 = np.array([[1, 2, 3], [4, 5, 6]])
x3 = np.concatenate([x1,x1])
x3

In [None]:
# Change the direction (axis, dimension) of the join
x3 = np.concatenate([x1,x1], axis=1)
x3

In [None]:
# Exercise
# Join x1 and x2 into a 2 x 2 x 7 array
x1 = np.random.randint(0,10,(2,2,3))
print('x1 = ', x1)
x2 = np.random.randint(0,10,(2,2,4))
print('x2 = ', x2)


In [None]:
# concatenate cannot join arrays with different numbers of dimensions; this raises ValueError
x1 = np.array([[1, 2, 3], [4, 5, 6]])
x2 = np.array([7, 8, 9])
print('x1 shape = ', x1.shape)
print('x2 shape = ', x2.shape)
x3 = np.concatenate([x1,x2])
x3

In [None]:
# Exercise
# How can x1 and x2 be joined into a 3x3 matrix?
# Hint: you can use reshape()
x1 = np.array([[1, 2, 3], [4, 5, 6]])
x2 = np.array([7, 8, 9])
print('x1 shape = ', x1.shape)
print('x2 shape = ', x2.shape)


In [None]:
x3 = np.vstack([x1,x2])
x3

In [None]:
x1 = np.array([[1, 2, 3], [4, 5, 6]])
x2 = np.array([[7], [8]])
print('x1 shape = ', x1.shape)
print('x2 shape = ', x2.shape)
x3 = np.hstack([x1,x2])
x3
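`np.vstack` and `np.hstack` can be read as `np.concatenate` along axis 0 and axis 1 for 2-D inputs, with `vstack` first promoting a 1-D array to a single row. The sketch below checks that equivalence; the names `row` and `col` are illustrative.

```python
import numpy as np

x1 = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([7, 8, 9])              # shape (3,)
col = np.array([[7], [8]])             # shape (2, 1)

# Joining rows: give the 1-D array a leading axis so both inputs are 2-D.
stacked_rows = np.concatenate([x1, row.reshape(1, 3)], axis=0)

# Joining columns: the shapes already agree on axis 0.
stacked_cols = np.concatenate([x1, col], axis=1)

print(stacked_rows.shape, stacked_cols.shape)  # (3, 3) (2, 4)
```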

## Splitting Arrays

In [None]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
x1, x2, x3 = np.split(x, [3, 7])
print(x1, x2, x3)

In [None]:
grid = np.arange(16).reshape((4, 4))
grid

In [None]:
upper, lower = np.vsplit(grid, [2])
print('upper = \n',upper)
print('lower = \n',lower)

In [None]:
left, right = np.hsplit(grid, [2])
print('left = \n', left)
print('right = \n', right)
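In the split functions, the list argument gives the indices at which to cut, so N split points produce N + 1 pieces. A short sketch that also checks that joining the pieces restores the original:

```python
import numpy as np

x = np.arange(1, 10)

# Cut before index 3 and before index 7: pieces of length 3, 4, and 2.
first, middle, last = np.split(x, [3, 7])
print(first, middle, last)  # [1 2 3] [4 5 6 7] [8 9]

grid = np.arange(16).reshape(4, 4)
upper, lower = np.vsplit(grid, [2])  # split the rows
left, right = np.hsplit(grid, [2])   # split the columns

restored = np.vstack([upper, lower])
```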

# Universal Functions

### Speed difference
Even when NumPy is used to process the data, speed can still vary greatly depending on how the code is written.

In [None]:
x = np.random.random(10000)

In [None]:
%%time
for i in range(100):
    y = np.empty(len(x))
    for n in range(len(x)):
        y[n] = 1/x[n]

In [None]:
%%time
for i in range(100):
    y = 1 / x
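The two cells above can be wrapped into functions and timed with the standard-library `timeit` module, which also works outside a notebook. This is a rough sketch: the function names are made up for illustration, the input is shifted away from zero to keep the reciprocals well behaved, and the measured times depend on the machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10_000) + 0.5  # values in [0.5, 1.5)


def reciprocal_loop(values):
    """Compute 1 / values one element at a time in a Python loop."""
    out = np.empty(len(values))
    for n in range(len(values)):
        out[n] = 1 / values[n]
    return out


def reciprocal_vectorized(values):
    """Compute 1 / values with a single vectorized operation."""
    return 1 / values


loop_seconds = timeit.timeit(lambda: reciprocal_loop(x), number=3)
vector_seconds = timeit.timeit(lambda: reciprocal_vectorized(x), number=3)
print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
```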


### Computing with NumPy's built-in functions

In [None]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

In [None]:
# Arithmetic operations can be combined
-(0.5*x + 1) ** 2

In [None]:
np.add(x, 2)

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
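Each operator in the table is shorthand for the ufunc next to it, which the sketch below verifies on a small array. It also shows the `out` argument that ufuncs accept for writing into an existing array; the variable names are illustrative.

```python
import numpy as np

x = np.arange(1, 5)  # [1 2 3 4]

checks = {
    "+": np.array_equal(x + 5, np.add(x, 5)),
    "-": np.array_equal(x - 5, np.subtract(x, 5)),
    "neg": np.array_equal(-x, np.negative(x)),
    "*": np.array_equal(x * 2, np.multiply(x, 2)),
    "/": np.array_equal(x / 2, np.divide(x, 2)),
    "//": np.array_equal(x // 2, np.floor_divide(x, 2)),
    "**": np.array_equal(x ** 2, np.power(x, 2)),
    "%": np.array_equal(x % 2, np.mod(x, 2)),
}
print(checks)

# Writing the result into a preallocated array avoids creating a temporary.
result = np.empty(4)
np.multiply(x, 10, out=result)
print(result)  # [10. 20. 30. 40.]
```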

In [None]:
# Exercise
# Compute the mean and the standard deviation of the array x
# Hint: np.mean()
x = np.random.normal(0,1,100)
x

### Python's built-in functions can be used directly

In [None]:
x = np.random.normal(0,1,10000)
x

In [None]:
# Python built-in function
%timeit abs(x)
abs(x)

In [None]:
# NumPy function
%timeit np.abs(x)
np.abs(x)

For more ufuncs, see the API reference:<BR>
https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs

# Index Operations

In [None]:
x = np.arange(10)
x

In [None]:
# Picking out values one at a time like this is not recommended
[x[1],x[3],x[5]]

In [None]:
# The proper way: index with a list of positions
x[[1,3,5]]

In [None]:
# Pass the indices through a variable
index = [1,3,5]
x[index]
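Indexing with a list or array of positions is often called fancy indexing. Two properties are easy to miss: the result takes the shape of the index array, and it is a copy rather than a view. A small sketch with made-up values:

```python
import numpy as np

x = np.arange(10) * 10  # [ 0 10 20 ... 90]

# The result has the shape of the index array, not of x.
index = np.array([[1, 3], [5, 7]])
picked = x[index]
print(picked)  # [[10 30]
               #  [50 70]]

# Unlike a slice, the result is a copy: changing it leaves x untouched.
picked[0, 0] = -1
print(x[1])  # 10
```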

In [None]:
# Exercise
# Find the even numbers in x
x = np.random.randint(0,10,20)
print(x)


In [None]:
# Exercise
# Find all the scores in the class that reached the passing mark
score = np.random.randint(0,100,50)
print(score)

### Changing values through indexing

In [None]:
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
x

In [None]:
x[i] -= 10
x

In [None]:
# Exercise
# Find all the scores in the class that reached the passing mark
# Based on the values in score, fill ispass with 'pass' or 'fail'
score = np.random.randint(0,100,50)
print(score)
ispass = np.full(50, 'unknown')
ispass

### Repeated indices can lead to unexpected results

In [None]:
x = np.zeros(10)
x

In [None]:
x[[0,0]] = [1,2]
x

In [None]:
x = np.zeros(10)
i = [2, 3, 3, 4, 4, 4]
x[i] += 1
x

In [None]:
x = np.zeros(10)
np.add.at(x, i, 1)
x
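The cells above have no recorded output, so here is a sketch of what to expect. With repeated indices, `x[i] += 1` increments each distinct position only once, while `np.add.at` applies every occurrence. `np.bincount` is included as another way to count occurrences of non-negative integers; it is not used in the original notebook.

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

# x[i] += 1 computes x[i] + 1 first and then assigns, so repeats overwrite each other.
buffered = np.zeros(10)
buffered[i] += 1
print(buffered)    # [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]

# np.add.at accumulates once per occurrence of each index.
unbuffered = np.zeros(10)
np.add.at(unbuffered, i, 1)
print(unbuffered)  # [0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]

# For plain counting, np.bincount gives the same totals.
counts = np.bincount(i, minlength=10)
print(counts)      # [0 0 1 2 3 0 0 0 0 0]
```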

In [None]:
# Example using at()
np.random.seed(42)
x = np.random.randn(100)

# compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

# find the appropriate bin for each x
i = np.searchsorted(bins, x)

# add 1 to each of these bins
np.add.at(counts, i, 1)

import matplotlib.pyplot as plt
plt.step(bins, counts)
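The hand-built histogram can be cross-checked against `np.histogram`. In this sketch, which assumes every sample lies strictly inside the outer bin edges, `np.searchsorted` returns index `j + 1` for a value in the `j`-th interval, so the manual counts are shifted by one position relative to `np.histogram`. The data come from `np.random.default_rng` rather than the `np.random.seed` call used above, and plotting is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100)

bins = np.linspace(-5, 5, 20)    # 20 edges define 19 intervals
counts = np.zeros_like(bins)

# Index of the first edge that is >= each sample.
i = np.searchsorted(bins, x)
np.add.at(counts, i, 1)

# np.histogram returns one count per interval (19 values here).
hist, edges = np.histogram(x, bins)

print(counts[1:])
print(hist)
```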

## Sorting

In [None]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)  # returns a sorted copy
x           # the original array is unchanged

In [None]:
x.sort()  # sorts the array in place
x

In [None]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
i

In [None]:
x[i]
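Because `np.argsort` returns positions rather than values, the same ordering can be applied to a second array. A small sketch with hypothetical names and scores:

```python
import numpy as np

# Hypothetical data: names[k] earned scores[k].
names = np.array(["Ann", "Bob", "Cid", "Dee"])
scores = np.array([72, 95, 58, 88])

order = np.argsort(scores)  # positions that sort scores in ascending order
print(order)                # [2 0 3 1]
print(names[order])         # ['Cid' 'Ann' 'Dee' 'Bob']

# Reverse the order to list the highest score first.
ranking = names[order[::-1]]
print(ranking)              # ['Bob' 'Dee' 'Ann' 'Cid']
```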

### Sorting multidimensional arrays

In [None]:
rand = np.random.RandomState(42)
x = rand.randint(0, 10, (4, 6))
print(x)

In [None]:
np.sort(x)

In [None]:
# Specify the axis to sort along
np.sort(x, axis=0)
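On a small fixed array the effect of `axis` is easier to follow than on random data. By default `np.sort` sorts along the last axis, that is, within each row of a 2-D array, while `axis=0` sorts within each column. The values below are made up for illustration.

```python
import numpy as np

x = np.array([[3, 1, 2],
              [0, 5, 4]])

by_row = np.sort(x)             # default axis=-1: each row sorted independently
by_column = np.sort(x, axis=0)  # each column sorted independently

print(by_row)     # [[1 2 3]
                  #  [0 4 5]]
print(by_column)  # [[0 1 2]
                  #  [3 5 4]]
```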

## Exercises

In [None]:
# Exercise
# Compute each student's total score across the five subjects
score = np.random.randint(0,100,(10,5))
score


In [None]:
# Exercise
# Compute each student's average score across the five subjects
score = np.random.randint(0,100,(10,5))
score

# Self-Study Material
See section 2.4 onward of the Python Data Science Handbook.