
##### Copyright 2019 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Text generation with an RNN


This tutorial demonstrates how to generate text using a character-based RNN. You will work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data ("Shakespear"), train a model to predict the next character in the sequence ("e"). Longer sequences of text can be generated by calling the model repeatedly.

Note: Enable GPU acceleration to execute this notebook faster. In Colab: *Runtime > Change runtime type > Hardware accelerator > GPU*.

This tutorial includes runnable code implemented using [tf.keras](https://www.tensorflow.org/guide/keras/sequential_model) and [eager execution](https://www.tensorflow.org/guide/eager). The following is the sample output when the model in this tutorial was trained for 30 epochs and started with the prompt "Q":

<pre>
QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m
</pre>


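While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

* The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

* The structure of the output resembles a play: blocks of text generally begin with a speaker name, in all capital letters, similar to the dataset.

* As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.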

## Setup

### Import TensorFlow and other libraries

In [None]:
import tensorflow as tf

import numpy as np
import os
import time


### Download the Shakespeare dataset

Change the following line to run this code on your own data.

In [None]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


### Read the data

First, look in the text:

##text

In [None]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [None]:
# Take a look at the first 250 characters in the text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [None]:
# The unique characters in the file
vocab = sorted(set(text))  # set() removes duplicate characters from text; sorted() orders them
print(f'{len(vocab)} unique characters')  # vocab holds every distinct character in text

65 unique characters


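## Process the text

### Vectorize the text

Before training, you need to convert the strings to a numerical representation.

The `tf.keras.layers.StringLookup` layer can convert each character into a numeric ID. It just needs the text to be split into tokens first.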

In [None]:
example_texts = ['abcdefg', 'xyz']

chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

Now create the `tf.keras.layers.StringLookup` layer:

In [None]:
ids_from_chars = tf.keras.layers.StringLookup(  # maps each character to an ID using the vocabulary
    vocabulary=list(vocab), mask_token=None)


It converts from tokens to character IDs:





In [None]:
ids = ids_from_chars(chars)  # an extra [UNK] token for out-of-vocabulary characters sits at index 0, so every ID is shifted up by one

ids

<tf.RaggedTensor [[40, 41, 42, 43, 44, 45, 46], [63, 64, 65]]>

In [None]:
ids_from_chars.get_vocabulary()

['[UNK]',
 '\n',
 ' ',
 '!',
 '$',
 '&',
 "'",
 ',',
 '-',
 '.',
 '3',
 ':',
 ';',
 '?',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']
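
The ID layout above can be reproduced without TensorFlow. The following is a minimal plain-Python sketch, not the `StringLookup` implementation: it assumes index 0 is reserved for `[UNK]` and the remaining IDs follow the sorted vocabulary, which matches the list printed above. The names `tokens`, `id_of`, `encode`, and `decode` are hypothetical helpers introduced for illustration.

```python
import string

# The 65 characters listed in the vocabulary output above, in sorted order.
vocab = sorted(set("\n !$&',-.3:;?" + string.ascii_letters))

# Assumption: ID 0 is reserved for out-of-vocabulary characters.
tokens = ["[UNK]"] + vocab
id_of = {ch: i for i, ch in enumerate(tokens)}

def encode(text):
    """Map each character to its ID; unknown characters map to 0."""
    return [id_of.get(ch, 0) for ch in text]

def decode(ids):
    """Map IDs back to characters and join them into one string."""
    return "".join(tokens[i] for i in ids)

print(encode("abcdefg"))        # [40, 41, 42, 43, 44, 45, 46]
print(encode("xyz"))            # [63, 64, 65]
print(encode("#"))              # [0], because '#' is not in the vocabulary
print(decode(encode("First")))  # First
```

This is why `'a'` maps to 40 rather than 39: the 13 punctuation and whitespace characters and the 26 capital letters come first, and `[UNK]` shifts everything up by one.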

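Since the goal of this tutorial is to generate text, it will also be important to invert this representation and recover human-readable strings from it. For this you can use `tf.keras.layers.StringLookup(..., invert=True)`.

Note: Here, instead of passing the original vocabulary generated with `sorted(set(text))`, use the `get_vocabulary()` method of the `tf.keras.layers.StringLookup` layer so that the `[UNK]` tokens are set the same way.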

In [None]:
chars_from_ids = tf.keras.layers.StringLookup(    # converts IDs back to characters; invert=True reverses the lookup
  # Reuse ids_from_chars.get_vocabulary() so the [UNK] token is handled the same way in both layers.
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)


This layer recovers the characters from the vectors of IDs, and returns them as a `tf.RaggedTensor` of characters:

In [None]:
chars = chars_from_ids(ids)  # convert the IDs back to characters
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

You can use `tf.strings.reduce_join` to join the characters back into strings.

In [None]:
tf.strings.reduce_join(chars, axis=-1).numpy()  # join the characters of each row

array([b'abcdefg', b'xyz'], dtype=object)

In [None]:
def text_from_ids(ids):  # convert IDs to characters, then join them into strings
  return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)


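### The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task you're training the model to perform. The input to the model will be a sequence of characters, and you train the model to predict the output: the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?


### Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of `seq_length+1`. For example, say `seq_length` is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this, first use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices.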

In [None]:
tf.strings.unicode_split(text, 'UTF-8')  # split text into individual characters

<tf.Tensor: shape=(1115394,), dtype=string, numpy=array([b'F', b'i', b'r', ..., b'g', b'.', b'\n'], dtype=object)>

##all_ids  
all_ids = the text split into characters and converted to IDs

In [None]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))  # convert the split characters to IDs with ids_from_chars
all_ids

<tf.Tensor: shape=(1115394,), dtype=int64, numpy=array([19, 48, 57, ..., 46,  9,  1])>

##ids_dataset

Splits all_ids into individual elements, one ID each

In [None]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)  # one dataset element per ID

Testing `tf.data.Dataset.from_tensor_slices()`:

In [None]:
dataset = tf.data.Dataset.from_tensor_slices([8, 3, 0, 8, 2, 1])

In [None]:
for i in dataset:
  print(i.numpy())

8
3
0
8
2
1


In [None]:
for ids in ids_dataset.take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

F
i
r
s
t
 
C
i
t
i


In [None]:
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

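##sequences
Groups ids_dataset into batches of seq_length+1 (101 IDs per batch)

The `batch` method lets you easily convert these individual characters to sequences of the desired size.

In [None]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)  # batch ids_dataset into sequences of seq_length+1
# drop_remainder controls whether a final batch with fewer elements is dropped; by default it is kept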

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)


It's easier to see what this is doing if you join the tokens back into strings:

In [None]:
for seq in sequences.take(5):  # convert the IDs to characters, then join them
  print(text_from_ids(seq).numpy())

b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
b'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
b"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
b"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
b'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'



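For training you'll need a dataset of `(input, label)` pairs, where `input` and
`label` are sequences. At each time step the input is the current character and the label is the next character.

Here's a function that takes a sequence as input, duplicates it, and shifts it to align the input and label for each time step:

##split_input_target
Splits a sequence into two: one without its last element and one without its first

In [None]:
def split_input_target(sequence):  # input drops the last element; target drops the first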
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

In [None]:
split_input_target(list("Tensorflow"))

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

##dataset
sequences holds the batched sequences

dataset = every sequence in sequences (101 IDs each) split into an input and a target of 100 IDs each

In [None]:
dataset = sequences.map(split_input_target)  # map() applies split_input_target to every sequence in sequences

In [None]:
for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target: b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


In [None]:
for i in dataset.take(1):
  print(i)

(<tf.Tensor: shape=(100,), dtype=int64, numpy=
array([19, 48, 57, 58, 59,  2, 16, 48, 59, 48, 65, 44, 53, 11,  1, 15, 44,
       45, 54, 57, 44,  2, 62, 44,  2, 55, 57, 54, 42, 44, 44, 43,  2, 40,
       53, 64,  2, 45, 60, 57, 59, 47, 44, 57,  7,  2, 47, 44, 40, 57,  2,
       52, 44,  2, 58, 55, 44, 40, 50,  9,  1,  1, 14, 51, 51, 11,  1, 32,
       55, 44, 40, 50,  7,  2, 58, 55, 44, 40, 50,  9,  1,  1, 19, 48, 57,
       58, 59,  2, 16, 48, 59, 48, 65, 44, 53, 11,  1, 38, 54, 60])>, <tf.Tensor: shape=(100,), dtype=int64, numpy=
array([48, 57, 58, 59,  2, 16, 48, 59, 48, 65, 44, 53, 11,  1, 15, 44, 45,
       54, 57, 44,  2, 62, 44,  2, 55, 57, 54, 42, 44, 44, 43,  2, 40, 53,
       64,  2, 45, 60, 57, 59, 47, 44, 57,  7,  2, 47, 44, 40, 57,  2, 52,
       44,  2, 58, 55, 44, 40, 50,  9,  1,  1, 14, 51, 51, 11,  1, 32, 55,
       44, 40, 50,  7,  2, 58, 55, 44, 40, 50,  9,  1,  1, 19, 48, 57, 58,
       59,  2, 16, 48, 59, 48, 65, 44, 53, 11,  1, 38, 54, 60,  2])>)


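### Create training batches

You used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, you need to shuffle the data and pack it into batches.

##dataset_2
Shuffles the original dataset and batches it again (the code below reassigns the result to `dataset`).

1.   text : the raw text file
2.   all_ids : the characters of text converted to IDs
3.   ids_dataset : the IDs in all_ids, one element each
4.   sequences : ids_dataset grouped into batches of seq_length+1 (101 IDs per sequence)
5.   dataset : each sequence split by split_input_target into an input (without the last ID) and a target (without the first ID)
6.   dataset_2 : dataset with the order of its sequences shuffled, then grouped into batches of 64 (each sequence still holds its 100 IDs in their original order)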


In [None]:
# Batch size
BATCH_SIZE = 64

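# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).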
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))  # let tf.data tune the prefetch buffer size automatically

dataset

<PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>
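
The buffer-based shuffling described in the code comments can be sketched as a plain-Python generator. This is a simplified model of the idea, not the `tf.data` implementation; `buffered_shuffle` is a hypothetical name.

```python
import random

def buffered_shuffle(items, buffer_size, rng):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)       # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]             # emit a random buffered element
        buffer[idx] = item            # and replace it with the incoming one
    rng.shuffle(buffer)               # drain what is left at the end
    yield from buffer

rng = random.Random(0)
stream = list(range(20))
shuffled = list(buffered_shuffle(stream, buffer_size=5, rng=rng))
print(shuffled)
print(sorted(shuffled) == stream)  # True: every element appears exactly once
```

A small buffer only mixes nearby elements, and a buffer of size 1 leaves the order unchanged. The larger the buffer relative to the dataset, the closer the result is to a full shuffle.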

##dataset_2 after shuffling


In [None]:
for i in dataset.take(1):
  print(i)

(<tf.Tensor: shape=(64, 100), dtype=int64, numpy=
array([[ 2, 55, 40, ..., 53,  7,  2],
       [ 1, 28,  2, ..., 35, 54, 60],
       [11,  1, 14, ...,  2, 47, 44],
       ...,
       [ 2, 62, 48, ..., 48, 59, 47],
       [58, 44, 44, ..., 33, 54,  2],
       [62,  2, 59, ...,  2, 40,  2]])>, <tf.Tensor: shape=(64, 100), dtype=int64, numpy=
array([[55, 40, 64, ...,  7,  2, 14],
       [28,  2, 62, ..., 54, 60, 42],
       [ 1, 14, 46, ..., 47, 44,  2],
       ...,
       [62, 48, 59, ..., 59, 47,  2],
       [44, 44,  1, ..., 54,  2, 46],
       [ 2, 59, 47, ..., 40,  2, 52]])>)
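
The number of batches per epoch follows from the sizes used so far. A quick arithmetic check, using only the values reported above (the variable names are chosen for illustration):

```python
text_len = 1115394      # characters in the corpus
seq_length = 100
batch_size = 64

num_sequences = text_len // (seq_length + 1)   # drop_remainder=True on the first batch()
num_batches = num_sequences // batch_size      # drop_remainder=True on the second batch()

print(num_sequences)                              # 11043
print(num_batches)                                # 172
print(num_sequences - num_batches * batch_size)   # 35 sequences left out of each epoch
```

Because the dataset is shuffled before batching, the sequences that fall into the dropped remainder can differ from one epoch to the next.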


## Build The Model

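This section defines the model as a `keras.Model` subclass (for details see [Making new layers and models via subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models)).

This model has three layers:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that maps each character ID to a vector with `embedding_dim` dimensions;
* `tf.keras.layers.GRU`: A type of RNN with size `units=rnn_units` (you can also use an LSTM layer here);
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs. It outputs one logit for each character in the vocabulary. These are the log-likelihood of each character according to the model.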

In [None]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [None]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [None]:
model = MyModel(
    # Be sure the vocabulary size matches the `StringLookup` layers.
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

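For each character the model looks up the embedding, runs the GRU one time step with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character:

![A drawing of the data passing through the model](https://github.com/tensorflow/text/blob/master/docs/tutorials/images/text_generation_training.png?raw=1)


Note: For training you could use a `keras.Sequential` model here. To generate text later you'll need to manage the RNN's internal state. It's simpler to include the state input and output options upfront than to rearrange the model architecture later. For more details see the [Keras RNN guide](https://www.tensorflow.org/guide/keras/rnn#rnn_state_reuse).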

##Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:

In [None]:
for input_example_batch, target_example_batch in dataset.take(1):
  print(input_example_batch[0])
  print(target_example_batch[0])

tf.Tensor(
[48 53  2 64 54 60 57  2 51 48 55 58  7  1 25 48 50 44  2 52 40 53  2 53
 44 62  2 52 40 43 44  9  1  1 14 27 20 18 25 28 11  1 15 44  2 64 54 60
  2 42 54 53 59 44 53 59  7  2 45 40 48 57  2 52 40 48 43 12  1 22 59  2
 48 58  2 59 47 44  2 51 40 62  7  2 53 54 59  2 22  2 42 54 53 43 44 52
 53  2 64 54], shape=(100,), dtype=int64)
tf.Tensor(
[53  2 64 54 60 57  2 51 48 55 58  7  1 25 48 50 44  2 52 40 53  2 53 44
 62  2 52 40 43 44  9  1  1 14 27 20 18 25 28 11  1 15 44  2 64 54 60  2
 42 54 53 59 44 53 59  7  2 45 40 48 57  2 52 40 48 43 12  1 22 59  2 48
 58  2 59 47 44  2 51 40 62  7  2 53 54 59  2 22  2 42 54 53 43 44 52 53
  2 64 54 60], shape=(100,), dtype=int64)


In [None]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 66) # (batch_size, sequence_length, vocab_size)


In the above example the sequence length of the input is `100`, but the model can be run on inputs of any length:

In [None]:
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  16896     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  67650     
                                                                 
Total params: 4,022,850
Trainable params: 4,022,850
Non-trainable params: 0
_________________________________________________________________
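
The parameter counts in the summary can be checked by hand. The sketch below assumes the Keras GRU default `reset_after=True`, which keeps two bias vectors per gate; the helper names are hypothetical.

```python
vocab_size = 66       # 65 characters plus [UNK]
embedding_dim = 256
rnn_units = 1024

def embedding_params(vocab, dim):
    # One trainable vector of length `dim` per vocabulary entry.
    return vocab * dim

def gru_params(input_dim, units):
    # 3 gates, each with an input kernel, a recurrent kernel and two biases
    # (assumption: reset_after=True, the TF 2.x default).
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, outputs):
    # Weight matrix plus one bias per output.
    return input_dim * outputs + outputs

counts = [
    embedding_params(vocab_size, embedding_dim),
    gru_params(embedding_dim, rnn_units),
    dense_params(rnn_units, vocab_size),
]
print(counts)       # [16896, 3938304, 67650]
print(sum(counts))  # 4022850
```

Almost all of the parameters sit in the GRU, dominated by its `rnn_units x rnn_units` recurrent kernels.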


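To get actual predictions from the model you need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.

Note: It is important to _sample_ from this distribution, as taking the _argmax_ of the distribution can easily get the model stuck in a loop.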
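
The difference between sampling and taking the argmax can be sketched in plain Python. This is an illustration with made-up logits; `softmax`, `argmax_id`, and `sample_id` are hypothetical helpers, and `random.choices` stands in for `tf.random.categorical`.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax_id(logits):
    """Always returns the same ID for the same logits."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_id(logits, rng):
    """Draws an ID with probability proportional to softmax(logits)."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]

logits = [0.2, 1.0, 0.5, -0.3]   # made-up values for a 4-character vocabulary
rng = random.Random(0)

print([argmax_id(logits) for _ in range(8)])        # [1, 1, 1, 1, 1, 1, 1, 1]
print([sample_id(logits, rng) for _ in range(8)])   # a mix of IDs
```

With argmax, identical inputs always produce identical outputs, so a generation loop can fall into a repeating cycle. Sampling keeps less likely characters reachable.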

Try it for the first example in the batch:

In [None]:
len(example_batch_predictions)

64

In [None]:
example_batch_predictions[0][0]  # logits for a single time step of the first sequence

<tf.Tensor: shape=(66,), dtype=float32, numpy=
array([ 1.47373527e-02, -1.30255602e-03, -3.31095490e-03,  7.74528785e-03,
        1.31458358e-03, -3.73415393e-03,  1.16314795e-02,  1.66195945e-03,
       -1.79547016e-02, -4.96065523e-03,  5.79529721e-03,  7.91224558e-03,
       -3.64950253e-03, -2.98415823e-03, -1.06501579e-03, -1.18054403e-02,
       -1.52522810e-02,  1.86658674e-03, -5.11689403e-04, -1.65472403e-02,
        3.40337446e-03,  4.06091847e-03, -1.62174031e-02,  1.93740695e-03,
        9.90809500e-03, -3.62373376e-03,  1.08096581e-04,  1.04385102e-03,
       -4.79410822e-03,  3.93873407e-03, -3.30301351e-03,  1.19510842e-02,
        3.03774403e-04,  2.19397270e-03, -8.77941772e-03,  1.95144489e-03,
        6.07894408e-03, -1.19723601e-03,  6.87311869e-03, -6.06578030e-03,
        3.54992109e-03,  9.24142019e-04, -2.89068383e-04,  5.83804073e-03,
       -7.07369717e-03,  5.27616823e-03,  2.30620545e-03, -5.39336959e-03,
       -6.15410463e-05,  6.02330547e-03, -7.08821556e

In [None]:
example_batch_predictions[0]  # one sequence: 100 time steps, each with 66 logits (the vocabulary size)
# The model predicts the next ID after each of the 100 input IDs, so there are 100 predictions.

<tf.Tensor: shape=(100, 66), dtype=float32, numpy=
array([[ 0.01473735, -0.00130256, -0.00331095, ...,  0.00644533,
        -0.00086795, -0.00319799],
       [ 0.00506034, -0.00058064,  0.00523378, ...,  0.00789508,
        -0.0011092 , -0.002467  ],
       [ 0.00409553, -0.00785127,  0.01213993, ...,  0.01036033,
         0.00402052,  0.00525858],
       ...,
       [-0.00615816,  0.00576279,  0.01624059, ...,  0.00175738,
         0.00079571, -0.00029083],
       [-0.00543165,  0.00352938,  0.01561974, ...,  0.0037165 ,
        -0.00096293, -0.00092333],
       [-0.00626807,  0.01960578,  0.01079688, ..., -0.00616088,
        -0.00654238, -0.00778664]], dtype=float32)>

In [None]:
example_batch_predictions  # the whole batch of 64 sequences

<tf.Tensor: shape=(64, 100, 66), dtype=float32, numpy=
array([[[ 0.01473735, -0.00130256, -0.00331095, ...,  0.00644533,
         -0.00086795, -0.00319799],
        [ 0.00506034, -0.00058064,  0.00523378, ...,  0.00789508,
         -0.0011092 , -0.002467  ],
        [ 0.00409553, -0.00785127,  0.01213993, ...,  0.01036033,
          0.00402052,  0.00525858],
        ...,
        [-0.00615816,  0.00576279,  0.01624059, ...,  0.00175738,
          0.00079571, -0.00029083],
        [-0.00543165,  0.00352938,  0.01561974, ...,  0.0037165 ,
         -0.00096293, -0.00092333],
        [-0.00626807,  0.01960578,  0.01079688, ..., -0.00616088,
         -0.00654238, -0.00778664]],

       [[ 0.01473735, -0.00130256, -0.00331095, ...,  0.00644533,
         -0.00086795, -0.00319799],
        [ 0.00506034, -0.00058064,  0.00523378, ...,  0.00789508,
         -0.0011092 , -0.002467  ],
        [-0.00055853,  0.01730294,  0.00530882, ..., -0.00309048,
         -0.00597432, -0.00824041],
        ...,

In [None]:
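sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)  # draw samples from a categorical distribution
                          # num_samples: number of independent samples to draw for each row slice
                          # here: sample one of the 66 IDs at each time step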
sampled_indices

<tf.Tensor: shape=(100, 1), dtype=int64, numpy=
array([[58],
       [38],
       [10],
       [47],
       [44],
       [35],
       [16],
       [10],
       [39],
       [ 2],
       [34],
       [33],
       [28],
       [ 3],
       [48],
       [50],
       [ 6],
       [12],
       [53],
       [20],
       [32],
       [14],
       [57],
       [30],
       [59],
       [20],
       [ 9],
       [34],
       [33],
       [ 9],
       [21],
       [14],
       [42],
       [50],
       [ 8],
       [21],
       [ 6],
       [64],
       [58],
       [35],
       [23],
       [36],
       [13],
       [57],
       [58],
       [ 5],
       [15],
       [11],
       [34],
       [ 8],
       [30],
       [18],
       [26],
       [65],
       [ 9],
       [43],
       [ 7],
       [48],
       [49],
       [ 7],
       [25],
       [35],
       [48],
       [55],
       [ 1],
       [48],
       [ 8],
       [31],
       [ 2],
       [ 3],
       [60],
       [43],
       [ 9],
   

In [None]:
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()  # remove the size-1 dimension from the tensor's shape


This gives us, at each timestep, a prediction of the next character index:

In [None]:
sampled_indices

array([58, 38, 10, 47, 44, 35, 16, 10, 39,  2, 34, 33, 28,  3, 48, 50,  6,
       12, 53, 20, 32, 14, 57, 30, 59, 20,  9, 34, 33,  9, 21, 14, 42, 50,
        8, 21,  6, 64, 58, 35, 23, 36, 13, 57, 58,  5, 15, 11, 34,  8, 30,
       18, 26, 65,  9, 43,  7, 48, 49,  7, 25, 35, 48, 55,  1, 48,  8, 31,
        2,  3, 60, 43,  9, 53, 25, 43,  6, 23, 22, 26, 15,  0, 58, 19, 15,
       61, 34, 59, 41, 17, 50, 18, 62, 29, 62, 43, 39, 60, 17, 24])


Decode these to see the text predicted by this untrained model:

In [None]:
print("Input:\n", text_from_ids(input_example_batch[0]).numpy())
print()
print("Next Char Predictions:\n", text_from_ids(sampled_indices).numpy())

Input:
 b'd souls\nDo through the clouds behold this present hour,\nEven for revenge mock my destruction!\nThis i'

Next Char Predictions:
 b"sY3heVC3Z UTO!ik';nGSArQtG.UT.HAck-H'ysVJW?rs&B:U-QEMz.d,ij,LVip\ni-R !ud.nLd'JIMB[UNK]sFBvUtbDkEwPwdZuDK"


## Train the model


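At this point the problem can be treated as a standard classification problem. Given the previous RNN state, and the input at this time step, predict the class of the next character.


### Attach an optimizer and a loss function


The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because your model returns logits, you need to set the `from_logits` flag.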

In [None]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

In [None]:
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", example_batch_mean_loss)

Prediction shape:  (64, 100, 66)  # (batch_size, sequence_length, vocab_size)
Mean loss:         tf.Tensor(4.1880674, shape=(), dtype=float32)



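A newly initialized model shouldn't be too sure of itself; the output logits should all have similar magnitudes. To confirm this you can check that the exponential of the mean loss is approximately equal to the vocabulary size. A much higher loss means the model is sure of its wrong answers, and is badly initialized: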

In [None]:
tf.exp(example_batch_mean_loss).numpy()

65.89532
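
The check works because a model that spreads probability evenly over all `V` classes has cross-entropy `ln(V)`, so `exp(loss)` recovers `V`. A small sketch in plain Python, with `uniform_cross_entropy` as a hypothetical helper:

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy of a uniform prediction: -ln(1/num_classes)."""
    return -math.log(1.0 / num_classes)

vocab_size = 66  # 65 characters plus [UNK]
print(round(uniform_cross_entropy(vocab_size), 4))   # 4.1897
print(round(math.exp(4.1880674), 3))                 # 65.895, from the mean loss reported above
```

The measured loss of about 4.188 is just under `ln(66)`, which is what a near-uniform, freshly initialized model should produce.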


Configure the training procedure using the `tf.keras.Model.compile` method. Use `tf.keras.optimizers.Adam` with default arguments and the loss function.


In [None]:
model.compile(optimizer='adam', loss=loss)

### Configure checkpoints


Use a `tf.keras.callbacks.ModelCheckpoint` to ensure that checkpoints are saved during training:

In [None]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
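
The `{epoch}` placeholder in `checkpoint_prefix` is an ordinary Python format field that is filled in when a checkpoint is written. A short sketch of how the template expands, using `str.format` directly rather than the callback:

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Expand the template for a few epoch numbers.
for epoch in (1, 5, 10):
    print(checkpoint_prefix.format(epoch=epoch))
```

Each epoch therefore gets its own file name, so earlier checkpoints are not overwritten.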


### Execute the training

To keep training time reasonable, use 10 epochs to train the model. In Colab, set the runtime to GPU for faster training.

In [None]:
EPOCHS = 10

In [None]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Generate text

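The simplest way to generate text with this model is to run it in a loop, and keep track of the model's internal state as you execute it.

![To generate text the model's output is fed back to the input](https://github.com/tensorflow/text/blob/master/docs/tutorials/images/text_generation_sampling.png?raw=1)

Each time you call the model you pass in some text and an internal state. The model returns a prediction for the next character and its new state. Pass the prediction and state back in to continue generating text.


The following makes a single step prediction: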

In [None]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits, states = self.model(inputs=input_ids, states=states,
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return the characters and model state.
    return predicted_chars, states

In [None]:
one_step_model = OneStep(model, chars_from_ids, ids_from_chars)
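
The two adjustments `generate_one_step` makes to the logits, temperature scaling and the `[UNK]` mask, can be illustrated in plain Python. The logits are made up, and `next_char_probs` is a hypothetical helper that only mirrors the arithmetic, not the TensorFlow code.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_char_probs(logits, temperature=1.0, unk_id=0):
    """Scale logits by 1/temperature, then mask out the [UNK] ID with -inf."""
    scaled = [x / temperature for x in logits]
    scaled[unk_id] += -float('inf')      # exp(-inf) == 0, so [UNK] is never sampled
    return softmax(scaled)

logits = [2.0, 1.0, 0.5, 0.1]            # made-up values; ID 0 plays the role of [UNK]

for t in (0.5, 1.0, 2.0):
    probs = next_char_probs(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Temperatures below 1.0 concentrate probability on the most likely characters, while temperatures above 1.0 flatten the distribution. In every case the masked ID ends up with probability zero, even though it had the largest raw logit in this example.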

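Run it in a loop to generate some text. Looking at the generated text, you'll see the model knows when to capitalize, make paragraphs, and imitate a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.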


In [None]:
start = time.time()
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

ROMEO:
Were not so can prey their eyes
With vishom from himself, some island
Were took of all woes, and I'll beg.

Bost:
Come, sir, I know it so: sir, cut it a tale: speak lagging
your virtues, if I do bey dead that breathe it: at
Aht on his brother.

KING RICHARD II:
Nor thy mistress so a noble must was,
Thy thrish o' the time ensider than here still, diseace no more
of his health; for he is minded, as but
our butiness, bad children ere mine enemy.

CLIRDEBO:
Novergound your fatheritable, that I am done;
This is a while he wash'd upon,
And no more than a batty offices
Than what you wrong'dn me for your honour.
Detestom, it so must I see whom comfort you
To fear the gaven my state to Beng Surren crast?

YORK:
Madam, had I know, I woo:' 'tis body, a very strange as their
post brought your face gaves have wife for as the stage
Unon the right thought on the heart-store.
What she was born thine hate;
Than sheep you for it. Speak not whither?

VOLUMNIA:
No sorrow, good days he of mine, some

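The easiest thing you can do to improve the results is to train it for longer (try `EPOCHS = 30`).

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.


If you want the model to generate text faster, the easiest thing you can do is batch the text generation. In the example below the model generates 5 outputs in about the same time it took to generate 1 above.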

In [None]:
start = time.time()
states = None
next_char = tf.constant(['ROMEO:', 'ROMEO:', 'ROMEO:', 'ROMEO:', 'ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result, '\n\n' + '_'*80)
print('\nRun time:', end - start)

tf.Tensor(
[b"ROMEO:\nWe will not go back.\n\nLirYT:\nI leave? Come.\n\nKING RICHARD II:\nWith think that made Romeo England's business,\nSo woound alsomen, your outrage to vent,\nAnd for this country we show morrow might\nNow York's name awaked alone.\n\nHERMIONE:\nNay, that the trumpets; he would say 'prageno her commission;\nThe truck is dead assured for gaterful\nBeholding of all eam but arms.\n\nESCALUS:\nCore, mine, as Evermake him gentleman\nI' thee, tender but horesty's grave;\nAnd in a widow, lady! prove a tramber\nwear them with a service brows, that I slave than heart\nThat seem'dis--which if you hear?' but not\nAre damnable, when he shall be irle.\n\nLUCENTIO:\nI shall be committed, my liege!\n\nESCALUS:\nIf we mistank, steal uproy, you shall were wonder-complixing.\n\nGLOUCESTER:\nGentle heaven Saint George of sorrow,\nWhich, title is a whip day as to-day, the deep wlect your honour!\nOf more. Hast thou wert stay and speak\nLest nature on thing enemy, how have lived the th

## Export the generator

This single-step model can easily be [saved and restored](https://www.tensorflow.org/guide/saved_model),
allowing you to use it anywhere a `tf.saved_model` is accepted.

In [None]:
tf.saved_model.save(one_step_model, 'one_step')
one_step_reloaded = tf.saved_model.load('one_step')





INFO:tensorflow:Assets written to: one_step/assets



In [None]:
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(100):
  next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)
  result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode("utf-8"))

ROMEO:

Groolere:
To hid his life
Hast thou sure again.

BRUTUS:
Why heartest thou, sir? first thou repost


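## Advanced: Customized Training

The above training procedure is simple, but does not give you much control.
It uses teacher-forcing, which prevents bad predictions from being fed back to the model, so the model never learns to recover from mistakes.

Now that you've seen how to run the model manually, next you'll implement the training loop. This gives a starting point if, for example, you want to implement _curriculum learning_ to help stabilize the model's open-loop output.

The most important part of a custom training loop is the train step function.

Use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/guide/eager).

The basic procedure is:

1. Execute the model and calculate the loss under a `tf.GradientTape`.
2. Calculate the updates and apply them to the model using the optimizer.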
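
The two steps above can be sketched without TensorFlow on a one-parameter model. This is a toy under stated assumptions: the gradient is derived by hand instead of being recorded by `tf.GradientTape`, and plain gradient descent replaces the Adam optimizer used in this tutorial. All names are hypothetical.

```python
def loss_and_grad(w, xs, ys):
    """Step 1: run the model (y = w * x), compute the mean squared error and its gradient."""
    n = len(xs)
    errors = [w * x - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n
    grad = sum(2 * e * x for e, x in zip(errors, xs)) / n
    return loss, grad

def toy_train_step(w, xs, ys, learning_rate=0.05):
    """Step 2: turn the gradient into an update and apply it to the parameter."""
    loss, grad = loss_and_grad(w, xs, ys)
    return w - learning_rate * grad, loss

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]     # generated with w = 2
w = 0.0
losses = []
for step in range(20):
    w, loss = toy_train_step(w, xs, ys)
    losses.append(loss)

print(round(w, 3))             # close to 2.0
print(losses[0] > losses[-1])  # True: the loss went down
```

In the TensorFlow version, the tape computes the gradients for every trainable variable and the optimizer decides how each gradient becomes an update.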

In [None]:
class CustomTraining(MyModel):
  @tf.function
  def train_step(self, inputs):
      inputs, labels = inputs
      with tf.GradientTape() as tape:
          predictions = self(inputs, training=True)
          loss = self.loss(labels, predictions)
      grads = tape.gradient(loss, self.trainable_variables)
      self.optimizer.apply_gradients(zip(grads, self.trainable_variables))

      return {'loss': loss}

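The above implementation of the `train_step` method follows [Keras' `train_step` conventions](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit). This is optional, but it allows you to change the behavior of the train step and still use keras' `Model.compile` and `Model.fit` methods.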

In [None]:
model = CustomTraining(
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [None]:
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

In [None]:
model.fit(dataset, epochs=1)



<keras.callbacks.History at 0x7f8e4f1022d0>


Or if you need more control, you can write your own complete custom training loop:

In [None]:
EPOCHS = 10

mean = tf.metrics.Mean()

for epoch in range(EPOCHS):
    start = time.time()

    mean.reset_states()
    for (batch_n, (inp, target)) in enumerate(dataset):
        logs = model.train_step([inp, target])
        mean.update_state(logs['loss'])

        if batch_n % 50 == 0:
            template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"
            print(template)

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print()
    print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
    print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
    print("_"*80)

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 2.2124
Epoch 1 Batch 50 Loss 2.0632
Epoch 1 Batch 100 Loss 1.9641
Epoch 1 Batch 150 Loss 1.8704

Epoch 1 Loss: 1.9897
Time taken for 1 epoch 25.25 sec
________________________________________________________________________________
Epoch 2 Batch 0 Loss 1.8679
Epoch 2 Batch 50 Loss 1.7662
Epoch 2 Batch 100 Loss 1.7425
Epoch 2 Batch 150 Loss 1.6463

Epoch 2 Loss: 1.7147
Time taken for 1 epoch 24.37 sec
________________________________________________________________________________
Epoch 3 Batch 0 Loss 1.6155
Epoch 3 Batch 50 Loss 1.5662
Epoch 3 Batch 100 Loss 1.5025
Epoch 3 Batch 150 Loss 1.5276

Epoch 3 Loss: 1.5550
Time taken for 1 epoch 24.46 sec
________________________________________________________________________________
Epoch 4 Batch 0 Loss 1.4803
Epoch 4 Batch 50 Loss 1.5032
Epoch 4 Batch 100 Loss 1.4476
Epoch 4 Batch 150 Loss 1.4533

Epoch 4 Loss: 1.4563
Time taken for 1 epoch 24.46 sec
_____________________________________________________________________