<a href="https://colab.research.google.com/github/HHL43/Generative-AI-HW/blob/main/AI_hw02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Handwritten Digit Recognition
* Model: CNN + Dense layers
* loss='categorical_crossentropy'
* optimizer=AdamW(learning_rate=0.001)
* Test accuracy: about 0.99

In [142]:
!pip install gradio



In [143]:
%matplotlib inline

# Standard data analysis and plotting packages
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Neural network
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import AdamW
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Interactive widgets
from ipywidgets import interact_manual

# Gradio, for quickly building a web app
import gradio as gr

In [144]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 28, 28)/255
x_test = x_test.reshape(10000, 28, 28)/255
x_train = np.expand_dims(x_train, axis=-1)  # (60000, 28, 28) → (60000, 28, 28, 1)
x_test = np.expand_dims(x_test, axis=-1)

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

In [145]:
n = 87
y_train[n]

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 1.])
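
The output above is the one-hot encoding of the label 9: a length-10 vector with a 1 at index 9. As a rough illustration of what `to_categorical` produces, the sketch below builds the same kind of encoding with plain NumPy. The helper name `one_hot` is hypothetical and not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([9, 0, 3])
print(encoded[0])                    # one-hot vector for the digit 9
print(np.argmax(encoded, axis=-1))   # argmax recovers the original labels
```

Taking the argmax of each row undoes the encoding, which is the same trick used later to turn the network's softmax output into a digit.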

In [146]:
model = Sequential()
# First convolutional layer: 32 filters of size 3×3
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)))

# Pooling layer (reduces the spatial dimensions)
model.add(MaxPooling2D(pool_size=(2,2)))

# Second convolutional layer: 64 filters of size 3×3
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))

# Pool again
model.add(MaxPooling2D(pool_size=(2,2)))

# Flatten to a vector before the fully connected layers
model.add(Flatten())

# Fully connected layer
model.add(Dense(128, activation='relu'))

# Output layer (10 classes, one per digit)
model.add(Dense(10, activation='softmax'))

model.summary()
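
The shapes and parameter counts that `model.summary()` reports can be checked by hand. The sketch below assumes the Keras defaults used above (`padding='valid'` and stride 1 for the convolutions, and a pooling stride equal to the pool size). The helper functions are hypothetical names for illustration only.

```python
def conv_out(size, kernel=3):
    # A 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping max pooling floors the division
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    # One kernel per (input channel, output channel) pair, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit
    return n_in * n_out + n_out

s = 28
s = conv_out(s)    # after the first Conv2D
s = pool_out(s)    # after the first MaxPooling2D
s = conv_out(s)    # after the second Conv2D
s = pool_out(s)    # after the second MaxPooling2D
flat = s * s * 64  # length of the Flatten output

params = [
    conv_params(3, 1, 32),
    conv_params(3, 32, 64),
    dense_params(flat, 128),
    dense_params(128, 10),
]
total = sum(params)
print(s, flat, params, total)
```

Most of the parameters sit in the first Dense layer, which is typical for small CNNs that flatten directly into a fully connected layer.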

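loss='categorical_crossentropy': the labels are one-hot encoded, so categorical cross-entropy is the matching loss function.
optimizer=AdamW: the code below uses AdamW, a variant of Adam with decoupled weight decay. Adam-family optimizers are a common choice for digit recognition because they adapt the step size per parameter and are relatively insensitive to the choice of learning rate.
learning_rate=0.001 (for Adam-family optimizers):
Adam usually performs well with learning_rate=0.001, which is also the Keras default.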
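
To make the loss concrete, here is a minimal NumPy sketch of softmax followed by categorical cross-entropy for one-hot labels. It illustrates the formula, not the Keras implementation; the function names and the `eps` clipping value are assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average -sum(y * log(p)) over the batch
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

y_true = np.eye(10)[[9]]                             # one-hot label for the digit 9
confident = softmax(np.array([[0.0] * 9 + [5.0]]))   # strongly favours class 9
uniform = np.full((1, 10), 0.1)                      # no preference at all

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident, loss_uniform)
```

A uniform guess over 10 classes gives a loss of ln(10), about 2.30, while a confident correct prediction drives the loss toward 0.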

In [149]:
model.compile(loss='categorical_crossentropy', optimizer=AdamW(learning_rate=0.001), metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=100 )

Epoch 1/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9819 - loss: 0.0597
Epoch 2/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9877 - loss: 0.0375
Epoch 3/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9935 - loss: 0.0219
Epoch 4/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9942 - loss: 0.0177
Epoch 5/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9962 - loss: 0.0131
Epoch 6/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9968 - loss: 0.0099
Epoch 7/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9978 - loss: 0.0069
Epoch 8/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9975 - loss: 0.0077
Epoch 9/10
600/600 ━━━━━━━━

<keras.src.callbacks.history.History at 0x7e92eb605190>
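
Each epoch runs 600 optimizer steps (60000 training samples / batch size 100), which matches the `600/600` counter in the log. For intuition about what a single step of `AdamW(learning_rate=0.001)` does, here is a simplified sketch for one scalar weight. It follows the usual formulation (Adam moment estimates plus decoupled weight decay) and is not the Keras implementation; the function name `adamw_step` and the `weight_decay=0.004` value are assumptions for illustration.

```python
import math

def adamw_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
               eps=1e-7, weight_decay=0.004):
    """One AdamW update for a single scalar weight. Returns (w, m, v)."""
    # Decoupled weight decay: shrink the weight directly, independent of the gradient
    w = w - lr * weight_decay * w
    # Exponential moving averages of the gradient and of the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adam step: the gradient is rescaled by its running magnitude
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adamw_step(w, grad=0.5, m=m, v=v, t=1)
print(w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly `lr` regardless of how large the gradient is. This is one reason Adam-family optimizers tolerate a wide range of gradient scales.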

In [150]:
loss, acc = model.evaluate(x_test, y_test)

313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9890 - loss: 0.0380


In [151]:
print(f"Test accuracy: {acc*100:.2f}%")

Test accuracy: 99.15%


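`predict` holds the network's predictions. We run the model on the test set, then use argmax to pick the class with the highest output value for each sample.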

In [152]:
predict = np.argmax(model.predict(x_test), axis=-1)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step


In [153]:
predict

array([7, 2, 1, ..., 4, 5, 6])
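
The argmax step, and the accuracy figure it leads to, can be reproduced by hand. The sketch below uses a small made-up probability matrix (3 samples, 4 classes) rather than real model output, so the numbers are purely illustrative.

```python
import numpy as np

# Toy softmax outputs for 3 samples and 4 classes (made-up numbers, not model output)
probs = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
])
# One-hot ground truth for classes 1, 0, 3
y_true = np.eye(4)[[1, 0, 3]]

pred_labels = np.argmax(probs, axis=-1)    # most probable class per row
true_labels = np.argmax(y_true, axis=-1)   # undo the one-hot encoding
accuracy = float(np.mean(pred_labels == true_labels))
print(pred_labels, accuracy)
```

Two of the three toy predictions match their labels, so the accuracy here is 2/3. Applying the same comparison to `predict` and the one-hot `y_test` should give the accuracy that `model.evaluate` reports.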

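Remember that each sample in `x_test` has shape (28, 28, 1) because of the channel axis we added, so we reshape it back to a 28x28 matrix before displaying it as an image.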

In [154]:
def test(測試編號):
    plt.imshow(x_test[測試編號].reshape(28,28), cmap='Greys')
    print('Network prediction:', predict[測試編號])

In [155]:
interact_manual(test, 測試編號=(0, 9999));

interactive(children=(IntSlider(value=4999, description='測試編號', max=9999), Button(description='Run Interact', …

How does the model do on the test data overall? We can give the network an overall evaluation on the full test set.

In [156]:
score = model.evaluate(x_test, y_test)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9890 - loss: 0.0380


In [157]:
print('loss:', score[0])
print('accuracy:', score[1])

loss: 0.03118239715695381
accuracy: 0.9915000200271606


### 7. Demo with Gradio

In [None]:
def resize_image(inp):
    # The drawing is in inp["layers"][0] (an RGBA array)
    image = np.array(inp["layers"][0], dtype=np.float32)
    image = image.astype(np.uint8)

    # Convert to a PIL image
    image_pil = Image.fromarray(image)

    # Paste onto a white background, using the alpha channel as the mask,
    # which converts the image from RGBA to RGB
    background = Image.new("RGB", image_pil.size, (255, 255, 255))
    background.paste(image_pil, mask=image_pil.split()[3])
    image_pil = background

    # Convert to grayscale
    image_gray = image_pil.convert("L")

    # Resize to 28x28 and convert back to a numpy array
    img_array = np.array(image_gray.resize((28, 28), resample=Image.LANCZOS))

    # Invert to match MNIST (bright digit on a dark background)
    img_array = 255 - img_array

    # Reshape to the CNN input shape (batch, 28, 28, 1) and scale to [0, 1]
    img_array = img_array.reshape(1, 28, 28, 1) / 255.0

    return img_array
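
The preprocessing steps can be sanity-checked without Gradio or PIL. The sketch below redoes the compositing, grayscale conversion, inversion and scaling in plain NumPy on a synthetic canvas. It assumes the canvas is already 28x28 (so the resize step is skipped) and uses the standard luminance weights for grayscale. The function name `rgba_to_mnist_input` is hypothetical. The output has shape (1, 28, 28, 1), which is what a model built with `input_shape=(28,28,1)` expects for a single image.

```python
import numpy as np

def rgba_to_mnist_input(rgba):
    """rgba: uint8 array of shape (28, 28, 4). Returns a float array of shape (1, 28, 28, 1)."""
    rgb = rgba[..., :3].astype(np.float32)
    alpha = rgba[..., 3:4].astype(np.float32) / 255.0
    # Composite onto a white background: transparent pixels become white
    on_white = alpha * rgb + (1.0 - alpha) * 255.0
    # Luminance weights for RGB -> grayscale conversion
    gray = on_white @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Invert so strokes are bright on a dark background, as in MNIST
    inverted = 255.0 - gray
    # Scale to [0, 1] and add the batch and channel axes
    return np.clip(inverted / 255.0, 0.0, 1.0).reshape(1, 28, 28, 1)

canvas = np.zeros((28, 28, 4), dtype=np.uint8)   # fully transparent canvas
canvas[10:18, 13:15] = [0, 0, 0, 255]            # an opaque black vertical stroke
x = rgba_to_mnist_input(canvas)
print(x.shape, float(x.min()), float(x.max()))
```

Stroke pixels end up near 1.0 and the empty background near 0.0, matching the convention of the normalized MNIST training data.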

In [None]:
def recognize_digit(inp):
    img_array = resize_image(inp)
    prediction = model.predict(img_array).flatten()
    labels = list('0123456789')
    return {labels[i]: float(prediction[i]) for i in range(10)}
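
`recognize_digit` returns a dictionary mapping each digit label to its predicted probability, and `gr.Label(num_top_classes=3)` then displays the three highest-scoring entries. The sketch below reproduces that selection on a made-up probability vector standing in for the model output.

```python
import numpy as np

# Made-up probability vector standing in for model.predict(...).flatten()
prediction = np.array([0.01, 0.02, 0.05, 0.60, 0.02, 0.20, 0.03, 0.03, 0.02, 0.02])
labels = list('0123456789')

# Same dictionary shape that recognize_digit returns
scores = {labels[i]: float(prediction[i]) for i in range(10)}

# The three highest-scoring classes, highest first
top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top3)
```

Converting each value with `float(...)` matters because NumPy scalars are not plain Python floats, and the dictionary has to be serialized for the web interface.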

In [None]:
iface = gr.Interface(
    fn=recognize_digit,
    inputs=gr.Sketchpad(),
    outputs=gr.Label(num_top_classes=3),
    title="MNIST Handwritten Digit Recognition",
    description="Draw a digit on the sketchpad"
)

iface.launch(share=True, debug=True)