## TensorFlow Model Building and Training
Study notes based on [https://tf.wiki/zh_hans/basic/models.html](https://tf.wiki/zh_hans/basic/models.html)

```python
# List the devices TensorFlow can see (CPU and GPU).
# Note: tensorflow.python.client is an internal TensorFlow module.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
```

```text
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12350650930534084255
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4139778048
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7084047757661007647
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
```
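
`device_lib` lives in TensorFlow's internal `tensorflow.python` package. In TensorFlow 2.x the public way to do the same check is `tf.config.list_physical_devices`. Here is a minimal sketch, assuming TF 2.1 or later. The devices it lists depend on the machine.

```python
import tensorflow as tf

# Public API (TF 2.1+) for listing the physical devices TensorFlow can use
cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")

print("CPUs:", cpus)
print("GPUs:", gpus)  # an empty list means TensorFlow will run on the CPU only
```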


* Model construction: **tf.keras.Model** and **tf.keras.layers**
* Loss function: **tf.keras.losses**
* Optimizer: **tf.keras.optimizers**
* Evaluation metrics: **tf.keras.metrics**
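
The four modules can be wired together with Keras' built-in `compile()`/`fit()` instead of a hand-written training loop. This is a minimal sketch, assuming TensorFlow 2.x. The data matches the example used in these notes, while the `Sequential` model, the epoch count and the choice of `MeanAbsoluteError` as the metric are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Illustrative data: the same X and y used in the rest of these notes
X = np.array([[1., 2., 3.], [4., 5., 6.]], dtype=np.float32)
y = np.array([[10.], [20.]], dtype=np.float32)

# tf.keras.Model + tf.keras.layers: a single fully-connected layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(units=1),
])

# tf.keras.optimizers, tf.keras.losses and tf.keras.metrics plugged in via compile()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)

# fit() runs the gradient computation and optimizer updates internally
history = model.fit(X, y, epochs=200, batch_size=2, verbose=0)
print("first loss:", history.history["loss"][0])
print("last loss: ", history.history["loss"][-1])
```

`fit()` does roughly what the explicit `tf.GradientTape` loop further down does by hand. The explicit loop is more verbose but makes each step visible.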

```python
import tensorflow as tf
```

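* Goal: compute `y_pred = a * X + b`, a linear model whose kernel `a` (shape 3×1, applied as the matrix product `X @ a`) and bias `b` are learned from the data below
* X (input, 2 samples × 3 features):

|x1|x2|x3|
|---|---|---|
|1|2|3|
|4|5|6|

* y (output, one target per sample):

|y|
|--|
|10|
|20|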
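
Below is a worked form of the model with the shapes made explicit. It assumes the mean-squared-error loss that the training loop minimizes, with $N = 2$ samples, and $\hat{y}$ stands for `y_pred`.

$$
\hat{y} = Xa + b, \qquad X \in \mathbb{R}^{2\times 3},\; a \in \mathbb{R}^{3\times 1},\; b \in \mathbb{R},\; \hat{y} \in \mathbb{R}^{2\times 1}
$$

$$
L(a, b) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2
$$

$$
\frac{\partial L}{\partial a} = \frac{2}{N}\,X^{\top}(\hat{y} - y), \qquad
\frac{\partial L}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)
$$

For example, at the zero initialization $\hat{y} = 0$. Since $2/N = 1$, this gives $\partial L/\partial a = -X^{\top}y = -(90,\ 120,\ 150)^{\top}$ and $\partial L/\partial b = -(10 + 20) = -30$.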

```python
X = tf.constant([[1., 2., 3.], [4., 5., 6.]])  # input, shape (2, 3)
y = tf.constant([[10.], [20.]])                 # targets, shape (2, 1)
```

<img src="img/01.png">

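* The kernel and bias are the layer's trainable variables
* **tf.keras.layers.Dense** is a fully-connected layer
    * units: dimensionality of the output tensor
    * activation: the activation function
        * If no activation is specified, the layer performs a plain linear transformation (`XW + b`, with `X` the input and `W` the kernel)
        * Other common choices are tf.nn.relu, tf.nn.tanh and tf.nn.sigmoid
    * use_bias: whether to add a bias term
        * Defaults to True
    * kernel_initializer, bias_initializer
        * Initializers for the weight matrix (kernel) and the bias vector
        * By default the kernel uses Glorot uniform initialization (`"glorot_uniform"`, called tf.glorot_uniform_initializer in TF 1.x) and the bias is initialized to zeros
        * tf.zeros_initializer initializes a variable to 0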
        * tf.zeros_initializer將變數初始化為0

```python
class Linear(tf.keras.Model):
    """A linear model built by subclassing tf.keras.Model."""
    def __init__(self):
        super().__init__()
        # Dense is a fully-connected layer; kernel and bias both start at zero
        self.dense = tf.keras.layers.Dense(
            units=1,
            activation=None,
            kernel_initializer=tf.zeros_initializer(),
            bias_initializer=tf.zeros_initializer()
        )

    # A subclassed tf.keras.Model only needs to override call():
    # Model.__call__ does some bookkeeping and then invokes call()
    def call(self, inputs):
        output = self.dense(inputs)
        return output
```
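
Here is a usage sketch, assuming TF 2.x. Calling the model on the `X` defined above builds the `Dense` layer. Both initializers are zeros, so the first prediction is all zeros. The class is repeated so the snippet runs on its own.

```python
import tensorflow as tf

class Linear(tf.keras.Model):
    """Same model as above: one zero-initialized Dense layer."""
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(
            units=1,
            kernel_initializer=tf.zeros_initializer(),
            bias_initializer=tf.zeros_initializer(),
        )

    def call(self, inputs):
        return self.dense(inputs)

X = tf.constant([[1., 2., 3.], [4., 5., 6.]])

model = Linear()
y_pred = model(X)  # first call builds the Dense layer: kernel (3, 1), bias (1,)

print(y_pred.numpy())                              # all zeros before training
print([tuple(v.shape) for v in model.variables])  # [(3, 1), (1,)]
```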

```python
model = Linear()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
for i in range(10000):
    with tf.GradientTape() as tape:
        y_pred = model(X)
        # tf.square squares every element of the tensor;
        # tf.reduce_mean averages all elements (mean squared error)
        loss = tf.reduce_mean(tf.square(y_pred - y))
    # model.variables returns all of the model's variables (kernel and bias)
    grads = tape.gradient(loss, model.variables)
    # Apply one SGD step to every (gradient, variable) pair
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
print(model.variables)
```

```text
[<tf.Variable 'linear_5/dense_7/kernel:0' shape=(3, 1) dtype=float32, numpy=
array([[6.0670209e-06],
       [1.1111156e+00],
       [2.2222154e+00]], dtype=float32)>, <tf.Variable 'linear_5/dense_7/bias:0' shape=(1,) dtype=float32, numpy=array([1.111109], dtype=float32)>]
```
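
The printed values can be checked without TensorFlow. There are two samples and four unknowns (three kernel entries plus the bias), so `a * X + b = y` has infinitely many exact solutions. Plain gradient descent started from zero only moves within the row space of the inputs (with a column of ones appended for the bias). It therefore converges to the minimum-norm solution, here a = (0, 10/9, 20/9) and b = 10/9. The NumPy sketch below reruns the same loop by hand and compares it with the pseudo-inverse solution. It assumes float64 arithmetic and the same learning rate, step count and zero initialization as above.

```python
import numpy as np

# Same data as the TensorFlow example, as float64 NumPy arrays
X = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[10.], [20.]])
n = X.shape[0]

# 1) Gradient descent by hand: zero init, lr = 0.01, 10000 steps
a = np.zeros((3, 1))
b = 0.0
learning_rate = 0.01
for _ in range(10000):
    residual = X @ a + b - y               # y_pred - y, shape (2, 1)
    grad_a = (2.0 / n) * X.T @ residual    # dL/da for L = mean(residual ** 2)
    grad_b = (2.0 / n) * residual.sum()    # dL/db
    a -= learning_rate * grad_a            # plain SGD step (no momentum)
    b -= learning_rate * grad_b

# 2) Closed form: minimum-norm least-squares solution via the pseudo-inverse
A = np.hstack([X, np.ones((n, 1))])        # append a ones column for the bias
theta = np.linalg.pinv(A) @ y              # shape (4, 1): [a1, a2, a3, b]

print("gradient descent:", a.ravel(), b)
print("pseudo-inverse:  ", theta[:3].ravel(), theta[3, 0])
```

Both approaches land on the same parameters. The small deviations in the TensorFlow output, such as `6.07e-06` instead of 0 for the first kernel entry, are presumably float32 rounding.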
