# 生成电视剧剧本

在这个项目中，你将使用 RNN 创作你自己的[《辛普森一家》](https://zh.wikipedia.org/wiki/%E8%BE%9B%E6%99%AE%E6%A3%AE%E4%B8%80%E5%AE%B6)电视剧剧本。你将会用到《辛普森一家》第 27 季中部分剧本的[数据集](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data)。你创建的神经网络将为一个在 [Moe 酒馆](https://simpsonswiki.com/wiki/Moe's_Tavern)中的场景生成一集新的剧本。

## 获取数据
我们早已为你提供了数据`./data/Seinfeld_Scripts.txt`。我们建议你打开文档来看看这个文档内容。

>* 第一步，我们来读入文档，并看几段例子。
* 然后，你需要定义并训练一个 RNN 网络来生成新的剧本！

In [8]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [9]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# load in data
import helper
data_dir = './data/Seinfeld_Scripts.txt'
text = helper.load_data(data_dir)

## 探索数据
使用 `view_line_range` 来查阅数据的不同部分，这个部分会让你对整体数据有个基础的了解。你会发现，文档中全是小写字母，并且所有的对话都是使用 `\n` 来分割的。

In [10]:
view_line_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))

lines = text.split('\n')
print('Number of lines: {}'.format(len(lines)))
word_count_line = [len(line.split()) for line in lines]
print('Average number of words in each line: {}'.format(np.average(word_count_line)))

print()
print('The lines {} to {}:'.format(*view_line_range))
print('\n'.join(text.split('\n')[view_line_range[0]:view_line_range[1]]))

Dataset Stats
Roughly the number of unique words: 46367
Number of lines: 109233
Average number of words in each line: 5.544240293684143

The lines 0 to 10:
jerry: do you know what this is all about? do you know, why were here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about we should go out? this is what theyre talking about...this whole thing, were all out now, no one is home. not one person here is home, were all out! there are people trying to find us, they dont know where we are. (on an imaginary phone) did you ring?, i cant find him. where did he go? he didnt tell me where he was going. he must have gone out. you wanna go out you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the reservation...then youre standing around, what do you do? you go we gotta be getting back. once youre out, you wanna get back! y

---
## 实现预处理函数
对数据集进行的第一个操作是预处理。请实现下面两个预处理函数：

- 查询表
- 标记符号

### 查询表
要创建词嵌入，你首先要将词语转换为 id。请在这个函数中创建两个字典：

- 将词语转换为 id 的字典，我们称它为 `vocab_to_int`
- 将 id 转换为词语的字典，我们称它为 `int_to_vocab`

请在下面的元组中返回这些字典
 `(vocab_to_int, int_to_vocab)`

In [11]:
import problem_unittests as tests

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # TODO: Implement Function
    from collections import Counter

    ## Build a dictionary that maps words to integers
    counts = Counter(text)
    vocab = sorted(counts, key=counts.get, reverse=True)            # sorted method param"key"
    vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}
    int_to_vocab = {v:k for k,v in vocab_to_int.items()}
    # return tuple
    return (vocab_to_int, int_to_vocab)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed


### 标记符号的字符串
我们会使用空格当作分隔符，来将剧本分割为词语数组。然而，句号和感叹号等符号使得神经网络难以分辨“再见”和“再见！”之间的区别。

实现函数 `token_lookup` 来返回一个字典，这个字典用于将 “!” 等符号标记为 “||Exclamation_Mark||” 形式。为下列符号创建一个字典，其中符号为标志，值为标记。

- period ( . )
- comma ( , )
- quotation mark ( " )
- semicolon ( ; )
- exclamation mark ( ! )
- question mark ( ? )
- left parenthesis ( ( )
- right parenthesis ( ) )
- dash ( -- )
- return ( \n )

这个字典将用于标记符号并在其周围添加分隔符（空格）。这能将符号视作单独词汇分割开来，并使神经网络更轻松地预测下一个词汇。请确保你并没有使用容易与词汇混淆的标记。与其使用 “dash” 这样的标记，试试使用“||dash||”。

In [12]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    token_dict = {'.': '||period||',
                 ',': '||comma||',
                 '"': '||quotation_mark||',
                 ';': '||semicolon||',
                 '!': '||exclamation_mark||',
                 '?': '||question_mark||',
                 '(': '||left_parenthesis||',
                 ')': '||right_parenthesis||',
                 '-': '||dash||',
                 '\n': '||return||'}
    return token_dict

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed


## 预处理并保存所有数据
运行以下代码将预处理所有数据，并将它们保存至文件。建议你查看`helpers.py` 文件中的 `preprocess_and_save_data` 代码来看这一步在做什么，但是你不需要修改`helpers.py`中的函数。

In [13]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# pre-process training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# 检查点
这是你遇到的第一个检点。如果你想要回到这个 notebook，或需要重新打开 notebook，你都可以从这里开始。预处理的数据都已经保存完毕。

In [14]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## 创建神经网络
在本节中，你会构建 RNN 中的必要 Module，以及 前向、后向函数。

### 检查 GPU 访问权限

In [15]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')

No GPU found. Please use a GPU to train your neural network.


## 输入
让我们开始预处理输入数据。我们会使用 [TensorDataset](http://pytorch.org/docs/master/data.html#torch.utils.data.TensorDataset) 来为数据库提供一个数据格式；以及一个 [DataLoader](http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader), 该对象会实现 batching，shuffling 以及其他数据迭代功能。

你可以通过传入 特征 和目标 tensors 来创建 TensorDataset，随后创建一个 DataLoader 。
```
data = TensorDataset(feature_tensors, target_tensors)
data_loader = torch.utils.data.DataLoader(data, 
                                          batch_size=batch_size)
```

### Batching
 通过 `TensorDataset` 和 `DataLoader` 类来实现  `batch_data` 函数来将 `words` 数据分成 `batch_size` 批次。

>你可以使用 DataLoader 来分批 单词, 但是你可以自由设置 `feature_tensors` 和 `target_tensors` 的大小以及 `sequence_length`。

比如，我们有如下输入:
```
words = [1, 2, 3, 4, 5, 6, 7]
sequence_length = 4
```

你的第一个 `feature_tensor` 会包含:
```
[1, 2, 3, 4]
```
随后的 `target_tensor` 会是接下去的一个字符值:
```
5
```
那么，第二组的`feature_tensor`, `target_tensor` 则如下所示:
```
[2, 3, 4, 5]  # features
6             # target
```

In [16]:
words = [1, 2, 3, 4, 5, 6, 7]
sequ_len = 4

fea_list = [words[n:n+sequ_len] for n in range(0, len(words)-sequ_len)]
tar_list = [words[n] for n in range(sequ_len, len(words))]

print(fea_list)
print(tar_list)

[[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
[5, 6, 7]


In [17]:
from torch.utils.data import TensorDataset, DataLoader


def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # TODO: Implement function
    feature_tensor = torch.Tensor([words[n:n+sequence_length] for n in range(0, len(words)-sequence_length)])
    target_tensor = torch.Tensor([words[n] for n in range(sequence_length, len(words))])
    
    dataset = TensorDataset(feature_tensor, target_tensor)
    
    data_loader = DataLoader(dataset, shuffle=True, batch_size=batch_size)
    # return a dataloader
    return data_loader

# there is no test for this function, but you are encouraged to create
# print statements and tests of your own
ws_lst = [1, 2, 3, 4, 5, 6, 7]
seq_len = 4

for data in batch_data(ws_lst, seq_len, 1):
    print(data, '\t', data[0].shape, data[1].shape)

[tensor([[3., 4., 5., 6.]]), tensor([7.])] 	 torch.Size([1, 4]) torch.Size([1])
[tensor([[1., 2., 3., 4.]]), tensor([5.])] 	 torch.Size([1, 4]) torch.Size([1])
[tensor([[2., 3., 4., 5.]]), tensor([6.])] 	 torch.Size([1, 4]) torch.Size([1])


### 测试你的 dataloader 

你需要改写下述代码来测试 batching 函数，改写后的代码会现在的比较类似。

下面，我们生成了一些测试文本数据，并使用了一个你上面写 dataloader 。然后，我们会得到一些使用`sample_x`输入以及`sample_y`目标生成的文本。

你的代码会返回如下结果(通常是不同的顺序，如果你 shuffle 了你的数据):

```
torch.Size([10, 5])
tensor([[ 28,  29,  30,  31,  32],
        [ 21,  22,  23,  24,  25],
        [ 17,  18,  19,  20,  21],
        [ 34,  35,  36,  37,  38],
        [ 11,  12,  13,  14,  15],
        [ 23,  24,  25,  26,  27],
        [  6,   7,   8,   9,  10],
        [ 38,  39,  40,  41,  42],
        [ 25,  26,  27,  28,  29],
        [  7,   8,   9,  10,  11]])

torch.Size([10])
tensor([ 33,  26,  22,  39,  16,  28,  11,  43,  30,  12])
```

### 大小
你的 sample_x 应该是 `(batch_size, sequence_length)`的 大小 或者是(10, 5)， sample_y 应该是 一维的: batch_size (10)。

### 值

你应该也会发现 sample_y, 是 test_text 数据中的*下一个*值。因此，对于一个输入的序列 `[ 28,  29,  30,  31,  32]` ，它的结尾是 `32`, 那么其相应的输出应该是 `33`。

In [18]:
# test dataloader

test_text = range(50)
t_loader = batch_data(test_text, sequence_length=5, batch_size=10)

data_iter = iter(t_loader)
sample_x, sample_y = data_iter.next()

print(sample_x.shape)
print(sample_x)
print()
print(sample_y.shape)
print(sample_y)

torch.Size([10, 5])
tensor([[ 2.,  3.,  4.,  5.,  6.],
        [ 5.,  6.,  7.,  8.,  9.],
        [ 4.,  5.,  6.,  7.,  8.],
        [13., 14., 15., 16., 17.],
        [ 1.,  2.,  3.,  4.,  5.],
        [41., 42., 43., 44., 45.],
        [27., 28., 29., 30., 31.],
        [15., 16., 17., 18., 19.],
        [ 8.,  9., 10., 11., 12.],
        [11., 12., 13., 14., 15.]])

torch.Size([10])
tensor([ 7., 10.,  9., 18.,  6., 46., 32., 20., 13., 16.])


---
## 构建神经网络
使用 PyTorch [Module class](http://pytorch.org/docs/master/nn.html#torch.nn.Module) 来实现一个 循环神经网络 RNN。你需要选择一个 GRU 或者 一个 LSTM。为了完成循环神经网络。为了实现 RNN，你需要实现以下类:
 - `__init__` - 初始化函数
 - `init_hidden` - LSTM/GRU 隐藏组昂泰的初始化函数
 - `forward` - 前向传播函数
 
初始化函数需要创建神经网络的层数，并保存到类。前向传播函数会使用这些网络来进行前向传播，并生成输出和隐藏状态。

在该流程完成后，**该模型的输出是 *最后的* 文字分数结果** 对于每段输入的文字序列，我们只需要输出一个单词，也就是，下一个单词。 

### 提示

1. 确保 lstm 的输出会链接一个 全链接层，你可以参考如下代码 `lstm_output = lstm_output.contiguous().view(-1, self.hidden_dim)`
2. 你可以通过 reshape 模型最后输出的全链接层，来得到最终的文字分数:

```
# reshape into (batch_size, seq_length, output_size)
output = output.view(batch_size, -1, self.output_size)
# get last batch
out = output[:, -1]
```

In [19]:
import torch.nn as nn


class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        super(RNN, self).__init__()
        self.vocab_size = vocab_size
        self.output_size = output_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.dropout = dropout
        self.embedding = nn.Embedding(num_embeddings=self.vocab_size, 
                                      embedding_dim=self.embedding_dim)
        self.lstm = nn.LSTM(input_size=self.embedding_dim, 
                            hidden_size=self.hidden_dim, 
                            dropout=self.dropout,
                            num_layers=self.n_layers,
                            batch_first=True)
        self.fc = nn.Linear(self.hidden_dim, self.output_size)
    
    
    def forward(self, nn_input, hidden):
        nn_input = nn_input.long()
        batch_size, _ = nn_input.size() # batch first
        embedding_input = self.embedding(nn_input)
        nn_output, hidden = self.lstm(embedding_input, hidden)
        nn_output = nn_output.contiguous().view(-1, self.hidden_dim)
        
        output = self.fc(nn_output)
        output = output.view(batch_size, -1, self.output_size)
        output = output[:, -1]

        # return one batch of output word scores and the hidden state
        return output, hidden
    
    
    def init_hidden(self, batch_size):

        weight = next(self.parameters()).data
        
        # initialize hidden state with zero weights, and move to GPU if available
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        return hidden
    
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_rnn(RNN, train_on_gpu)

Tests Passed


### 定义前向及后向传播

通过你实现的 RNN 类来进行前向及后项传播。你可以在训练循环中，不断地调用如下代码来实现：
```
loss = forward_back_prop(decoder, decoder_optimizer, criterion, inp, target)
```

函数中需要返回一个批次以及其隐藏状态的loss均值，你可以调用一个函数`RNN(inp, hidden)`来实现。记得，你可以通过调用`loss.item()` 来计算得到该loss。

**如果使用 GPU，你需要将你的数据存到 GPU 的设备上。**

In [20]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    
    # move data to GPU, if available
    if train_on_gpu:
        inp, target = inp.cuda(), target.cuda()
    
    hidden = tuple([each.data for each in hidden])
    
    # perform backpropagation and optimization
    rnn.zero_grad()
    
    output, hidden = rnn(inp, hidden)
    loss = criterion(output.squeeze(), target.long())
    loss.backward()
    
    clip = 5.0 # gradient clipping
    nn.utils.clip_grad_norm_(rnn.parameters(), clip)
    optimizer.step()
    
    # return the loss over a batch and the hidden state produced by our model
    return loss.item(), hidden



# Note that these tests aren't completely extensive.
# they are here to act as general checks on the expected outputs of your functions
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_forward_back_prop(RNN, forward_back_prop, train_on_gpu)

Tests Passed


## 神经网络训练

神经网络结构完成以及数据准备完后，我们可以开始训练网络了。

### 训练循环

训练循环是通过 `train_decoder` 函数实现的。该函数将进行 epochs 次数的训练。模型的训练成果会在一定批次的训练后，被打印出来。这个“一定批次”可以通过`show_every_n_batches` 来设置。你会在下一节设置这个参数。

In [21]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from workspace_utils import keep_awake

def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
    batch_losses = []
    
    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
#     for epoch_i in range(1, n_epochs + 1):
    for epoch_i in keep_awake(range(1, n_epochs + 1)):
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)

            # printing loss stats
            if batch_i % show_every_n_batches == 0:
                print('Epoch: {:>4}/{:<4}  Batch:{:>4}/{:<4}  Loss: {}'.format(
                    epoch_i, n_epochs, batch_i, n_batches, np.average(batch_losses)))
                batch_losses = []

    # returns a trained rnn
    return rnn

### 超参数

设置并训练以下超参数:
-  `sequence_length`，序列长度 
-  `batch_size`，分批大小
-  `num_epochs`，循环次数
-  `learning_rate`，Adam优化器的学习率
-  `vocab_size`，唯一标示词汇的数量
-  `output_size`，模型输出的大小 
-  `embedding_dim`，词嵌入的维度，小于 vocab_size
-  `hidden_dim`， 隐藏层维度
-  `n_layers`， RNN的层数
-  `show_every_n_batches`，打印结果的频次

如果模型没有获得你预期的结果，调整 `RNN`类中的上述参数。

In [22]:
%%time
# Data params
# Sequence Length
# set the hyperparamaters
sequence_length = 131        # number of words in a sequence; total words: 892,110: factors are 30, 131, 227
batch_size = 128
train_loader = batch_data(int_text, sequence_length, batch_size)

Wall time: 7.85 s


In [23]:
# Training parameters
# set the training parameters
num_epochs = 3
learning_rate = 0.001       # 0.01 is worse

# set the model parameters
vocab_size = len(vocab_to_int)
output_size = vocab_size
embedding_dim = 300        # 128 is worse
hidden_dim = 512
n_layers = 2

# show stats for every n number of batches
show_every_n_batches = 5

### 训练
下一节，通过预处理数据来训练神经网络。如果你的loss结果不好，可以通过调整超参数来修正。通常情况下，大的隐藏层及层数会带来比较好的效果，但同时也会消耗较长的时间来训练。
> **你应该努力得到一个低于3.5的loss** 

你也可以试试不同的序列长度，该参数表明模型学习的范围大小。

In [24]:
%%time
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""

# create model and move to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
if train_on_gpu:
    rnn.cuda()

# defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
# criterion = nn.BCELoss()

# training the model
from workspace_utils import active_session
with active_session():
    trained_rnn = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

# saving the trained model
helper.save_model('./save/trained_rnn', trained_rnn)
print('Model Trained and Saved')

Training for 3 epoch(s)...
Epoch:    1/3     Batch:   5/6968  Loss: 9.750128173828125

Epoch:    1/3     Batch:  10/6968  Loss: 7.587319278717041

Epoch:    1/3     Batch:  15/6968  Loss: 6.27403039932251

Epoch:    1/3     Batch:  20/6968  Loss: 6.382200813293457

Epoch:    1/3     Batch:  25/6968  Loss: 6.394595146179199

Epoch:    1/3     Batch:  30/6968  Loss: 6.236073398590088

Epoch:    1/3     Batch:  35/6968  Loss: 6.104263877868652

Epoch:    1/3     Batch:  40/6968  Loss: 6.022180652618408

Epoch:    1/3     Batch:  45/6968  Loss: 6.098670864105225

Epoch:    1/3     Batch:  50/6968  Loss: 6.010707092285156

Epoch:    1/3     Batch:  55/6968  Loss: 5.996320343017578

Epoch:    1/3     Batch:  60/6968  Loss: 5.838743114471436

Epoch:    1/3     Batch:  65/6968  Loss: 5.976468849182129

Epoch:    1/3     Batch:  70/6968  Loss: 5.696450710296631

Epoch:    1/3     Batch:  75/6968  Loss: 5.736187076568603

Epoch:    1/3     Batch:  80/6968  Loss: 5.275537395477295

Epoch:    1/3 

Epoch:    1/3     Batch: 690/6968  Loss: 4.848678016662598

Epoch:    1/3     Batch: 695/6968  Loss: 4.9819238662719725

Epoch:    1/3     Batch: 700/6968  Loss: 4.903989315032959

Epoch:    1/3     Batch: 705/6968  Loss: 4.534520816802979

Epoch:    1/3     Batch: 710/6968  Loss: 4.759167289733886

Epoch:    1/3     Batch: 715/6968  Loss: 4.97975492477417

Epoch:    1/3     Batch: 720/6968  Loss: 4.754351711273193

Epoch:    1/3     Batch: 725/6968  Loss: 4.706702136993409

Epoch:    1/3     Batch: 730/6968  Loss: 4.698509693145752

Epoch:    1/3     Batch: 735/6968  Loss: 4.676018619537354

Epoch:    1/3     Batch: 740/6968  Loss: 4.763796997070313

Epoch:    1/3     Batch: 745/6968  Loss: 4.647189903259277

Epoch:    1/3     Batch: 750/6968  Loss: 4.9056438446044925

Epoch:    1/3     Batch: 755/6968  Loss: 4.8195648193359375

Epoch:    1/3     Batch: 760/6968  Loss: 4.612755298614502

Epoch:    1/3     Batch: 765/6968  Loss: 4.685379600524902

Epoch:    1/3     Batch: 770/6968  Los

Epoch:    1/3     Batch:1375/6968  Loss: 4.656357669830323

Epoch:    1/3     Batch:1380/6968  Loss: 4.468048572540283

Epoch:    1/3     Batch:1385/6968  Loss: 4.568103218078614

Epoch:    1/3     Batch:1390/6968  Loss: 4.502295732498169

Epoch:    1/3     Batch:1395/6968  Loss: 4.517316055297852

Epoch:    1/3     Batch:1400/6968  Loss: 4.522204971313476

Epoch:    1/3     Batch:1405/6968  Loss: 4.371023273468017

Epoch:    1/3     Batch:1410/6968  Loss: 4.545676898956299

Epoch:    1/3     Batch:1415/6968  Loss: 4.3565483570098875

Epoch:    1/3     Batch:1420/6968  Loss: 4.362685012817383

Epoch:    1/3     Batch:1425/6968  Loss: 4.382701587677002

Epoch:    1/3     Batch:1430/6968  Loss: 4.424052143096924

Epoch:    1/3     Batch:1435/6968  Loss: 4.461735534667969

Epoch:    1/3     Batch:1440/6968  Loss: 4.547649955749511

Epoch:    1/3     Batch:1445/6968  Loss: 4.605693054199219

Epoch:    1/3     Batch:1450/6968  Loss: 4.445745944976807

Epoch:    1/3     Batch:1455/6968  Loss

Epoch:    1/3     Batch:2060/6968  Loss: 4.518200826644898

Epoch:    1/3     Batch:2065/6968  Loss: 4.3675000190734865

Epoch:    1/3     Batch:2070/6968  Loss: 4.322602844238281

Epoch:    1/3     Batch:2075/6968  Loss: 4.642829036712646

Epoch:    1/3     Batch:2080/6968  Loss: 4.1229664325714115

Epoch:    1/3     Batch:2085/6968  Loss: 4.629550170898438

Epoch:    1/3     Batch:2090/6968  Loss: 4.189075994491577

Epoch:    1/3     Batch:2095/6968  Loss: 4.273756980895996

Epoch:    1/3     Batch:2100/6968  Loss: 4.443008804321289

Epoch:    1/3     Batch:2105/6968  Loss: 4.259553337097168

Epoch:    1/3     Batch:2110/6968  Loss: 4.57703971862793

Epoch:    1/3     Batch:2115/6968  Loss: 4.32003002166748

Epoch:    1/3     Batch:2120/6968  Loss: 4.128205251693726

Epoch:    1/3     Batch:2125/6968  Loss: 4.290370178222656

Epoch:    1/3     Batch:2130/6968  Loss: 4.253934049606324

Epoch:    1/3     Batch:2135/6968  Loss: 4.3208146572113035

Epoch:    1/3     Batch:2140/6968  Loss

Epoch:    1/3     Batch:2745/6968  Loss: 4.207505798339843

Epoch:    1/3     Batch:2750/6968  Loss: 4.254744529724121

Epoch:    1/3     Batch:2755/6968  Loss: 4.256886005401611

Epoch:    1/3     Batch:2760/6968  Loss: 4.4325484275817875

Epoch:    1/3     Batch:2765/6968  Loss: 4.177724075317383

Epoch:    1/3     Batch:2770/6968  Loss: 4.165682554244995

Epoch:    1/3     Batch:2775/6968  Loss: 4.292422533035278

Epoch:    1/3     Batch:2780/6968  Loss: 4.369127368927002

Epoch:    1/3     Batch:2785/6968  Loss: 4.333321571350098

Epoch:    1/3     Batch:2790/6968  Loss: 4.223998069763184

Epoch:    1/3     Batch:2795/6968  Loss: 4.498069953918457

Epoch:    1/3     Batch:2800/6968  Loss: 4.294446325302124

Epoch:    1/3     Batch:2805/6968  Loss: 4.407936668395996

Epoch:    1/3     Batch:2810/6968  Loss: 4.378002643585205

Epoch:    1/3     Batch:2815/6968  Loss: 4.4598517417907715

Epoch:    1/3     Batch:2820/6968  Loss: 4.437887382507324

Epoch:    1/3     Batch:2825/6968  Los

Epoch:    1/3     Batch:3430/6968  Loss: 4.187979030609131

Epoch:    1/3     Batch:3435/6968  Loss: 4.203513669967651

Epoch:    1/3     Batch:3440/6968  Loss: 4.205085706710816

Epoch:    1/3     Batch:3445/6968  Loss: 4.2782470226287845

Epoch:    1/3     Batch:3450/6968  Loss: 4.019217443466187

Epoch:    1/3     Batch:3455/6968  Loss: 4.20587854385376

Epoch:    1/3     Batch:3460/6968  Loss: 4.210698223114013

Epoch:    1/3     Batch:3465/6968  Loss: 4.2645604610443115

Epoch:    1/3     Batch:3470/6968  Loss: 4.126839971542358

Epoch:    1/3     Batch:3475/6968  Loss: 4.352223968505859

Epoch:    1/3     Batch:3480/6968  Loss: 4.360379695892334

Epoch:    1/3     Batch:3485/6968  Loss: 4.197786664962768

Epoch:    1/3     Batch:3490/6968  Loss: 4.182921075820923

Epoch:    1/3     Batch:3495/6968  Loss: 3.9223687648773193

Epoch:    1/3     Batch:3500/6968  Loss: 4.217592430114746

Epoch:    1/3     Batch:3505/6968  Loss: 4.239665746688843

Epoch:    1/3     Batch:3510/6968  Los

Epoch:    1/3     Batch:4115/6968  Loss: 4.070484161376953

Epoch:    1/3     Batch:4120/6968  Loss: 4.088151502609253

Epoch:    1/3     Batch:4125/6968  Loss: 4.015921211242675

Epoch:    1/3     Batch:4130/6968  Loss: 4.0457014560699465

Epoch:    1/3     Batch:4135/6968  Loss: 4.1323949813842775

Epoch:    1/3     Batch:4140/6968  Loss: 4.153937578201294

Epoch:    1/3     Batch:4145/6968  Loss: 4.1213866710662845

Epoch:    1/3     Batch:4150/6968  Loss: 4.004870891571045

Epoch:    1/3     Batch:4155/6968  Loss: 4.028061056137085

Epoch:    1/3     Batch:4160/6968  Loss: 4.36729998588562

Epoch:    1/3     Batch:4165/6968  Loss: 4.008653783798218

Epoch:    1/3     Batch:4170/6968  Loss: 4.302184200286865

Epoch:    1/3     Batch:4175/6968  Loss: 4.08613429069519

Epoch:    1/3     Batch:4180/6968  Loss: 4.163252830505371

Epoch:    1/3     Batch:4185/6968  Loss: 4.025714540481568

Epoch:    1/3     Batch:4190/6968  Loss: 4.413759708404541

Epoch:    1/3     Batch:4195/6968  Loss

Epoch:    1/3     Batch:4800/6968  Loss: 3.911201810836792

Epoch:    1/3     Batch:4805/6968  Loss: 3.9903016090393066

Epoch:    1/3     Batch:4810/6968  Loss: 4.167161226272583

Epoch:    1/3     Batch:4815/6968  Loss: 4.1560595512390135

Epoch:    1/3     Batch:4820/6968  Loss: 4.303090190887451

Epoch:    1/3     Batch:4825/6968  Loss: 3.889985466003418

Epoch:    1/3     Batch:4830/6968  Loss: 3.831309604644775

Epoch:    1/3     Batch:4835/6968  Loss: 4.134624195098877

Epoch:    1/3     Batch:4840/6968  Loss: 3.9535356044769285

Epoch:    1/3     Batch:4845/6968  Loss: 4.162876176834106

Epoch:    1/3     Batch:4850/6968  Loss: 4.373470163345337

Epoch:    1/3     Batch:4855/6968  Loss: 4.261623954772949

Epoch:    1/3     Batch:4860/6968  Loss: 4.271462154388428

Epoch:    1/3     Batch:4865/6968  Loss: 4.088104677200318

Epoch:    1/3     Batch:4870/6968  Loss: 3.9106927871704102

Epoch:    1/3     Batch:4875/6968  Loss: 4.260928440093994

Epoch:    1/3     Batch:4880/6968  L

Epoch:    1/3     Batch:5485/6968  Loss: 4.192758655548095

Epoch:    1/3     Batch:5490/6968  Loss: 4.1372185230255125

Epoch:    1/3     Batch:5495/6968  Loss: 4.169901180267334

Epoch:    1/3     Batch:5500/6968  Loss: 4.040326261520386

Epoch:    1/3     Batch:5505/6968  Loss: 4.114344739913941

Epoch:    1/3     Batch:5510/6968  Loss: 4.08459792137146

Epoch:    1/3     Batch:5515/6968  Loss: 3.975052833557129

Epoch:    1/3     Batch:5520/6968  Loss: 3.832414245605469

Epoch:    1/3     Batch:5525/6968  Loss: 4.078027677536011

Epoch:    1/3     Batch:5530/6968  Loss: 4.007351350784302

Epoch:    1/3     Batch:5535/6968  Loss: 3.9215002059936523

Epoch:    1/3     Batch:5540/6968  Loss: 4.392729473114014

Epoch:    1/3     Batch:5545/6968  Loss: 4.059539747238159

Epoch:    1/3     Batch:5550/6968  Loss: 3.8567872524261473

Epoch:    1/3     Batch:5555/6968  Loss: 4.105297613143921

Epoch:    1/3     Batch:5560/6968  Loss: 4.0794881820678714

Epoch:    1/3     Batch:5565/6968  Lo

Epoch:    1/3     Batch:6170/6968  Loss: 4.257771635055542

Epoch:    1/3     Batch:6175/6968  Loss: 4.157854461669922

Epoch:    1/3     Batch:6180/6968  Loss: 3.933509683609009

Epoch:    1/3     Batch:6185/6968  Loss: 4.160577011108399

Epoch:    1/3     Batch:6190/6968  Loss: 4.10240683555603

Epoch:    1/3     Batch:6195/6968  Loss: 4.289689016342163

Epoch:    1/3     Batch:6200/6968  Loss: 3.959814500808716

Epoch:    1/3     Batch:6205/6968  Loss: 4.054357957839966

Epoch:    1/3     Batch:6210/6968  Loss: 4.003375244140625

Epoch:    1/3     Batch:6215/6968  Loss: 3.9349711894989015

Epoch:    1/3     Batch:6220/6968  Loss: 3.915303945541382

Epoch:    1/3     Batch:6225/6968  Loss: 3.9115952491760253

Epoch:    1/3     Batch:6230/6968  Loss: 4.2053595066070555

Epoch:    1/3     Batch:6235/6968  Loss: 4.1535115242004395

Epoch:    1/3     Batch:6240/6968  Loss: 4.24842472076416

Epoch:    1/3     Batch:6245/6968  Loss: 4.125619888305664

Epoch:    1/3     Batch:6250/6968  Los

Epoch:    1/3     Batch:6855/6968  Loss: 4.091058921813965

Epoch:    1/3     Batch:6860/6968  Loss: 4.222443342208862

Epoch:    1/3     Batch:6865/6968  Loss: 4.026665925979614

Epoch:    1/3     Batch:6870/6968  Loss: 4.072339248657227

Epoch:    1/3     Batch:6875/6968  Loss: 4.0000901222229

Epoch:    1/3     Batch:6880/6968  Loss: 4.033051300048828

Epoch:    1/3     Batch:6885/6968  Loss: 3.8992119312286375

Epoch:    1/3     Batch:6890/6968  Loss: 4.136834716796875

Epoch:    1/3     Batch:6895/6968  Loss: 4.22398567199707

Epoch:    1/3     Batch:6900/6968  Loss: 3.8585317134857178

Epoch:    1/3     Batch:6905/6968  Loss: 4.190039110183716

Epoch:    1/3     Batch:6910/6968  Loss: 3.8949381828308107

Epoch:    1/3     Batch:6915/6968  Loss: 4.262881231307984

Epoch:    1/3     Batch:6920/6968  Loss: 3.9776817321777345

Epoch:    1/3     Batch:6925/6968  Loss: 3.873207855224609

Epoch:    1/3     Batch:6930/6968  Loss: 4.271067333221436

Epoch:    1/3     Batch:6935/6968  Loss

Epoch:    2/3     Batch: 570/6968  Loss: 3.964973878860474

Epoch:    2/3     Batch: 575/6968  Loss: 3.7770575046539308

Epoch:    2/3     Batch: 580/6968  Loss: 3.836196041107178

Epoch:    2/3     Batch: 585/6968  Loss: 3.8924279689788817

Epoch:    2/3     Batch: 590/6968  Loss: 3.7536315441131594

Epoch:    2/3     Batch: 595/6968  Loss: 4.139185953140259

Epoch:    2/3     Batch: 600/6968  Loss: 3.745117998123169

Epoch:    2/3     Batch: 605/6968  Loss: 3.598445701599121

Epoch:    2/3     Batch: 610/6968  Loss: 3.7197007656097414

Epoch:    2/3     Batch: 615/6968  Loss: 3.770541524887085

Epoch:    2/3     Batch: 620/6968  Loss: 3.79781231880188

Epoch:    2/3     Batch: 625/6968  Loss: 3.8513878345489503

Epoch:    2/3     Batch: 630/6968  Loss: 3.7037079334259033

Epoch:    2/3     Batch: 635/6968  Loss: 3.8080775260925295

Epoch:    2/3     Batch: 640/6968  Loss: 3.7520079135894777

Epoch:    2/3     Batch: 645/6968  Loss: 3.861815929412842

Epoch:    2/3     Batch: 650/6968

Epoch:    2/3     Batch:1250/6968  Loss: 4.037220191955567

Epoch:    2/3     Batch:1255/6968  Loss: 3.900098705291748

Epoch:    2/3     Batch:1260/6968  Loss: 3.957407093048096

Epoch:    2/3     Batch:1265/6968  Loss: 3.612749767303467

Epoch:    2/3     Batch:1270/6968  Loss: 4.038437128067017

Epoch:    2/3     Batch:1275/6968  Loss: 3.7147902488708495

Epoch:    2/3     Batch:1280/6968  Loss: 3.761898946762085

Epoch:    2/3     Batch:1285/6968  Loss: 3.9742703437805176

Epoch:    2/3     Batch:1290/6968  Loss: 3.861082649230957

Epoch:    2/3     Batch:1295/6968  Loss: 3.8209362506866453

Epoch:    2/3     Batch:1300/6968  Loss: 3.938722610473633

Epoch:    2/3     Batch:1305/6968  Loss: 3.936494064331055

Epoch:    2/3     Batch:1310/6968  Loss: 3.8018337726593017

Epoch:    2/3     Batch:1315/6968  Loss: 3.7577184200286866

Epoch:    2/3     Batch:1320/6968  Loss: 3.7305209159851076

Epoch:    2/3     Batch:1325/6968  Loss: 3.8598602294921873

Epoch:    2/3     Batch:1330/6968

Epoch:    2/3     Batch:1930/6968  Loss: 3.9867280960083007

Epoch:    2/3     Batch:1935/6968  Loss: 3.81814923286438

Epoch:    2/3     Batch:1940/6968  Loss: 3.737736368179321

Epoch:    2/3     Batch:1945/6968  Loss: 4.023219013214112

Epoch:    2/3     Batch:1950/6968  Loss: 4.012900066375733

Epoch:    2/3     Batch:1955/6968  Loss: 3.8611116886138914

Epoch:    2/3     Batch:1960/6968  Loss: 3.6398823738098143

Epoch:    2/3     Batch:1965/6968  Loss: 3.7452325344085695

Epoch:    2/3     Batch:1970/6968  Loss: 3.874494791030884

Epoch:    2/3     Batch:1975/6968  Loss: 4.0256494045257565

Epoch:    2/3     Batch:1980/6968  Loss: 3.95802526473999

Epoch:    2/3     Batch:1985/6968  Loss: 3.741858434677124

Epoch:    2/3     Batch:1990/6968  Loss: 3.785379409790039

Epoch:    2/3     Batch:1995/6968  Loss: 3.717906618118286

Epoch:    2/3     Batch:2000/6968  Loss: 3.7878899574279785

Epoch:    2/3     Batch:2005/6968  Loss: 3.747963285446167

Epoch:    2/3     Batch:2010/6968  L

Epoch:    2/3     Batch:2610/6968  Loss: 3.631416749954224

Epoch:    2/3     Batch:2615/6968  Loss: 3.7482368469238283

Epoch:    2/3     Batch:2620/6968  Loss: 3.9918766975402833

Epoch:    2/3     Batch:2625/6968  Loss: 3.8989760875701904

Epoch:    2/3     Batch:2630/6968  Loss: 3.922643852233887

Epoch:    2/3     Batch:2635/6968  Loss: 4.180158996582032

Epoch:    2/3     Batch:2640/6968  Loss: 3.882936191558838

Epoch:    2/3     Batch:2645/6968  Loss: 3.6973315715789794

Epoch:    2/3     Batch:2650/6968  Loss: 4.104988861083984

Epoch:    2/3     Batch:2655/6968  Loss: 3.8025761604309083

Epoch:    2/3     Batch:2660/6968  Loss: 3.694377374649048

Epoch:    2/3     Batch:2665/6968  Loss: 3.6665382385253906

Epoch:    2/3     Batch:2670/6968  Loss: 3.9422879219055176

Epoch:    2/3     Batch:2675/6968  Loss: 3.671692228317261

Epoch:    2/3     Batch:2680/6968  Loss: 3.776192331314087

Epoch:    2/3     Batch:2685/6968  Loss: 3.845983076095581

Epoch:    2/3     Batch:2690/6968

Epoch:    2/3     Batch:3290/6968  Loss: 3.7507766246795655

Epoch:    2/3     Batch:3295/6968  Loss: 3.9559173583984375

Epoch:    2/3     Batch:3300/6968  Loss: 3.9949387550354003

Epoch:    2/3     Batch:3305/6968  Loss: 3.5590884685516357

Epoch:    2/3     Batch:3310/6968  Loss: 3.736787939071655

Epoch:    2/3     Batch:3315/6968  Loss: 4.108086442947387

Epoch:    2/3     Batch:3320/6968  Loss: 3.870104694366455

Epoch:    2/3     Batch:3325/6968  Loss: 3.9322906017303465

Epoch:    2/3     Batch:3330/6968  Loss: 3.631488037109375

Epoch:    2/3     Batch:3335/6968  Loss: 4.24185733795166

Epoch:    2/3     Batch:3340/6968  Loss: 3.92414927482605

Epoch:    2/3     Batch:3345/6968  Loss: 3.7884681701660154

Epoch:    2/3     Batch:3350/6968  Loss: 3.7977845668792725

Epoch:    2/3     Batch:3355/6968  Loss: 3.9120495319366455

Epoch:    2/3     Batch:3360/6968  Loss: 3.6455427169799806

Epoch:    2/3     Batch:3365/6968  Loss: 3.755256414413452

Epoch:    2/3     Batch:3370/6968

Epoch:    2/3     Batch:3970/6968  Loss: 3.967330503463745

Epoch:    2/3     Batch:3975/6968  Loss: 3.7241886615753175

Epoch:    2/3     Batch:3980/6968  Loss: 3.9308534145355223

Epoch:    2/3     Batch:3985/6968  Loss: 3.595307159423828

Epoch:    2/3     Batch:3990/6968  Loss: 3.6305511474609373

Epoch:    2/3     Batch:3995/6968  Loss: 4.061184740066528

Epoch:    2/3     Batch:4000/6968  Loss: 3.6942914485931397

Epoch:    2/3     Batch:4005/6968  Loss: 3.7456281185150146

Epoch:    2/3     Batch:4010/6968  Loss: 3.801266050338745

Epoch:    2/3     Batch:4015/6968  Loss: 3.89742226600647

Epoch:    2/3     Batch:4020/6968  Loss: 3.8126247406005858

Epoch:    2/3     Batch:4025/6968  Loss: 3.8139456272125245

Epoch:    2/3     Batch:4030/6968  Loss: 3.7419179916381835

Epoch:    2/3     Batch:4035/6968  Loss: 3.875190019607544

Epoch:    2/3     Batch:4040/6968  Loss: 3.7667833805084228

Epoch:    2/3     Batch:4045/6968  Loss: 3.6570215225219727

Epoch:    2/3     Batch:4050/69

Epoch:    2/3     Batch:4650/6968  Loss: 3.6047362804412844

Epoch:    2/3     Batch:4655/6968  Loss: 3.660362720489502

Epoch:    2/3     Batch:4660/6968  Loss: 3.752304935455322

Epoch:    2/3     Batch:4665/6968  Loss: 3.664357900619507

Epoch:    2/3     Batch:4670/6968  Loss: 3.8813587188720704

Epoch:    2/3     Batch:4675/6968  Loss: 3.8313999652862547

Epoch:    2/3     Batch:4680/6968  Loss: 3.7130356788635255

Epoch:    2/3     Batch:4685/6968  Loss: 3.8269375801086425

Epoch:    2/3     Batch:4690/6968  Loss: 3.881352758407593

Epoch:    2/3     Batch:4695/6968  Loss: 3.701217222213745

Epoch:    2/3     Batch:4700/6968  Loss: 3.822355318069458

Epoch:    2/3     Batch:4705/6968  Loss: 3.793835926055908

Epoch:    2/3     Batch:4710/6968  Loss: 3.606548309326172

Epoch:    2/3     Batch:4715/6968  Loss: 3.971988153457642

Epoch:    2/3     Batch:4720/6968  Loss: 3.7730435848236086

Epoch:    2/3     Batch:4725/6968  Loss: 3.9121283531188964

Epoch:    2/3     Batch:4730/6968

Epoch:    2/3     Batch:5330/6968  Loss: 3.68320255279541

Epoch:    2/3     Batch:5335/6968  Loss: 3.8930083751678466

Epoch:    2/3     Batch:5340/6968  Loss: 3.7782255172729493

Epoch:    2/3     Batch:5345/6968  Loss: 3.900032472610474

Epoch:    2/3     Batch:5350/6968  Loss: 3.8756505489349364

Epoch:    2/3     Batch:5355/6968  Loss: 3.7825984001159667

Epoch:    2/3     Batch:5360/6968  Loss: 3.9927688121795653

Epoch:    2/3     Batch:5365/6968  Loss: 3.511931324005127

Epoch:    2/3     Batch:5370/6968  Loss: 3.9027687072753907

Epoch:    2/3     Batch:5375/6968  Loss: 3.802293300628662

Epoch:    2/3     Batch:5380/6968  Loss: 4.010817909240723

Epoch:    2/3     Batch:5385/6968  Loss: 3.80305438041687

Epoch:    2/3     Batch:5390/6968  Loss: 3.5982384204864504

Epoch:    2/3     Batch:5395/6968  Loss: 3.8440289974212645

Epoch:    2/3     Batch:5400/6968  Loss: 3.7002503871917725

Epoch:    2/3     Batch:5405/6968  Loss: 3.5877443313598634

Epoch:    2/3     Batch:5410/696

Epoch:    2/3     Batch:6010/6968  Loss: 3.6278922080993654

Epoch:    2/3     Batch:6015/6968  Loss: 4.014967441558838

Epoch:    2/3     Batch:6020/6968  Loss: 3.733966112136841

Epoch:    2/3     Batch:6025/6968  Loss: 3.9373778820037844

Epoch:    2/3     Batch:6030/6968  Loss: 3.761404514312744

Epoch:    2/3     Batch:6035/6968  Loss: 3.9901735305786135

Epoch:    2/3     Batch:6040/6968  Loss: 3.6298941135406495

Epoch:    2/3     Batch:6045/6968  Loss: 3.5523863792419434

Epoch:    2/3     Batch:6050/6968  Loss: 3.70329909324646

Epoch:    2/3     Batch:6055/6968  Loss: 3.6428714275360106

Epoch:    2/3     Batch:6060/6968  Loss: 3.8667835712432863

Epoch:    2/3     Batch:6065/6968  Loss: 3.6544984340667725

Epoch:    2/3     Batch:6070/6968  Loss: 3.7299103260040285

Epoch:    2/3     Batch:6075/6968  Loss: 3.5699492931365966

Epoch:    2/3     Batch:6080/6968  Loss: 3.7758718013763426

Epoch:    2/3     Batch:6085/6968  Loss: 3.976449489593506

Epoch:    2/3     Batch:6090/6

Epoch:    2/3     Batch:6690/6968  Loss: 3.879372262954712

Epoch:    2/3     Batch:6695/6968  Loss: 3.6817486763000487

Epoch:    2/3     Batch:6700/6968  Loss: 4.009530019760132

Epoch:    2/3     Batch:6705/6968  Loss: 3.708557891845703

Epoch:    2/3     Batch:6710/6968  Loss: 3.997045660018921

Epoch:    2/3     Batch:6715/6968  Loss: 3.9972421646118166

Epoch:    2/3     Batch:6720/6968  Loss: 3.8048273086547852

Epoch:    2/3     Batch:6725/6968  Loss: 3.677508735656738

Epoch:    2/3     Batch:6730/6968  Loss: 3.6277017116546633

Epoch:    2/3     Batch:6735/6968  Loss: 3.896971035003662

Epoch:    2/3     Batch:6740/6968  Loss: 3.6659164905548094

Epoch:    2/3     Batch:6745/6968  Loss: 3.8291818618774416

Epoch:    2/3     Batch:6750/6968  Loss: 3.8095068454742433

Epoch:    2/3     Batch:6755/6968  Loss: 3.876169443130493

Epoch:    2/3     Batch:6760/6968  Loss: 3.591035270690918

Epoch:    2/3     Batch:6765/6968  Loss: 3.743227815628052

Epoch:    2/3     Batch:6770/6968

Epoch:    3/3     Batch: 405/6968  Loss: 3.4857606410980226

Epoch:    3/3     Batch: 410/6968  Loss: 3.483559799194336

Epoch:    3/3     Batch: 415/6968  Loss: 3.4021045207977294

Epoch:    3/3     Batch: 420/6968  Loss: 3.574001693725586

Epoch:    3/3     Batch: 425/6968  Loss: 3.499673271179199

Epoch:    3/3     Batch: 430/6968  Loss: 3.60718994140625

Epoch:    3/3     Batch: 435/6968  Loss: 3.623507595062256

Epoch:    3/3     Batch: 440/6968  Loss: 3.306043529510498

Epoch:    3/3     Batch: 445/6968  Loss: 3.6270862102508543

Epoch:    3/3     Batch: 450/6968  Loss: 3.4511202335357667

Epoch:    3/3     Batch: 455/6968  Loss: 3.529038667678833

Epoch:    3/3     Batch: 460/6968  Loss: 3.6599810123443604

Epoch:    3/3     Batch: 465/6968  Loss: 3.7152843952178953

Epoch:    3/3     Batch: 470/6968  Loss: 3.3022836685180663

Epoch:    3/3     Batch: 475/6968  Loss: 3.4970457553863525

Epoch:    3/3     Batch: 480/6968  Loss: 3.2659651756286623

Epoch:    3/3     Batch: 485/696

Epoch:    3/3     Batch:1085/6968  Loss: 3.8552395820617678

Epoch:    3/3     Batch:1090/6968  Loss: 3.683876323699951

Epoch:    3/3     Batch:1095/6968  Loss: 3.578583526611328

Epoch:    3/3     Batch:1100/6968  Loss: 3.4680970668792725

Epoch:    3/3     Batch:1105/6968  Loss: 3.6101277351379393

Epoch:    3/3     Batch:1110/6968  Loss: 3.6279091835021973

Epoch:    3/3     Batch:1115/6968  Loss: 3.6733822345733644

Epoch:    3/3     Batch:1120/6968  Loss: 3.7929718494415283

Epoch:    3/3     Batch:1125/6968  Loss: 3.6539440155029297

Epoch:    3/3     Batch:1130/6968  Loss: 3.4479439735412596

Epoch:    3/3     Batch:1135/6968  Loss: 3.6210196018218994

Epoch:    3/3     Batch:1140/6968  Loss: 3.5091880798339843

Epoch:    3/3     Batch:1145/6968  Loss: 3.499842977523804

Epoch:    3/3     Batch:1150/6968  Loss: 3.5072057247161865

Epoch:    3/3     Batch:1155/6968  Loss: 3.5642138004302977

Epoch:    3/3     Batch:1160/6968  Loss: 3.475923442840576

Epoch:    3/3     Batch:1165

Epoch:    3/3     Batch:1765/6968  Loss: 3.42406644821167

Epoch:    3/3     Batch:1770/6968  Loss: 3.5736614227294923

Epoch:    3/3     Batch:1775/6968  Loss: 3.6834042072296143

Epoch:    3/3     Batch:1780/6968  Loss: 3.5977049350738524

Epoch:    3/3     Batch:1785/6968  Loss: 3.7201319217681883

Epoch:    3/3     Batch:1790/6968  Loss: 3.644954967498779

Epoch:    3/3     Batch:1795/6968  Loss: 3.6896531105041506

Epoch:    3/3     Batch:1800/6968  Loss: 3.4496826171875

Epoch:    3/3     Batch:1805/6968  Loss: 3.5507387638092043

Epoch:    3/3     Batch:1810/6968  Loss: 3.5638923168182375

Epoch:    3/3     Batch:1815/6968  Loss: 3.6865937232971193

Epoch:    3/3     Batch:1820/6968  Loss: 3.7124523162841796

Epoch:    3/3     Batch:1825/6968  Loss: 3.502395820617676

Epoch:    3/3     Batch:1830/6968  Loss: 3.5047096729278566

Epoch:    3/3     Batch:1835/6968  Loss: 3.6303123950958254

Epoch:    3/3     Batch:1840/6968  Loss: 3.424153184890747

Epoch:    3/3     Batch:1845/696

Epoch:    3/3     Batch:2445/6968  Loss: 3.561333990097046

Epoch:    3/3     Batch:2450/6968  Loss: 3.708781623840332

Epoch:    3/3     Batch:2455/6968  Loss: 3.572990131378174

Epoch:    3/3     Batch:2460/6968  Loss: 3.583543872833252

Epoch:    3/3     Batch:2465/6968  Loss: 3.46852126121521

Epoch:    3/3     Batch:2470/6968  Loss: 3.619842290878296

Epoch:    3/3     Batch:2475/6968  Loss: 3.5817803859710695

Epoch:    3/3     Batch:2480/6968  Loss: 3.464384126663208

Epoch:    3/3     Batch:2485/6968  Loss: 3.58955340385437

Epoch:    3/3     Batch:2490/6968  Loss: 3.4387584209442137

Epoch:    3/3     Batch:2495/6968  Loss: 3.5356839656829835

Epoch:    3/3     Batch:2500/6968  Loss: 3.467518186569214

Epoch:    3/3     Batch:2505/6968  Loss: 3.7810497760772703

Epoch:    3/3     Batch:2510/6968  Loss: 3.7282032012939452

Epoch:    3/3     Batch:2515/6968  Loss: 3.728763246536255

Epoch:    3/3     Batch:2520/6968  Loss: 3.5388291835784913

Epoch:    3/3     Batch:2525/6968  L

Epoch:    3/3     Batch:3125/6968  Loss: 3.695259857177734

Epoch:    3/3     Batch:3130/6968  Loss: 3.6842463493347166

Epoch:    3/3     Batch:3135/6968  Loss: 3.7317223072052004

Epoch:    3/3     Batch:3140/6968  Loss: 3.4611533164978026

Epoch:    3/3     Batch:3145/6968  Loss: 3.4972107887268065

Epoch:    3/3     Batch:3150/6968  Loss: 3.2903751850128176

Epoch:    3/3     Batch:3155/6968  Loss: 3.439926338195801

Epoch:    3/3     Batch:3160/6968  Loss: 3.8264657974243166

Epoch:    3/3     Batch:3165/6968  Loss: 3.7286198139190674

Epoch:    3/3     Batch:3170/6968  Loss: 3.623403549194336

Epoch:    3/3     Batch:3175/6968  Loss: 3.536606693267822

Epoch:    3/3     Batch:3180/6968  Loss: 3.5809744358062745

Epoch:    3/3     Batch:3185/6968  Loss: 3.6067673206329345

Epoch:    3/3     Batch:3190/6968  Loss: 3.6150390625

Epoch:    3/3     Batch:3195/6968  Loss: 3.703561544418335

Epoch:    3/3     Batch:3200/6968  Loss: 3.7529109001159666

Epoch:    3/3     Batch:3205/6968  

Epoch:    3/3     Batch:3805/6968  Loss: 3.516105794906616

Epoch:    3/3     Batch:3810/6968  Loss: 3.5615859508514403

Epoch:    3/3     Batch:3815/6968  Loss: 3.622150993347168

Epoch:    3/3     Batch:3820/6968  Loss: 3.4164969444274904

Epoch:    3/3     Batch:3825/6968  Loss: 3.7825687408447264

Epoch:    3/3     Batch:3830/6968  Loss: 3.4014232635498045

Epoch:    3/3     Batch:3835/6968  Loss: 3.5481127738952636

Epoch:    3/3     Batch:3840/6968  Loss: 3.8250300884246826

Epoch:    3/3     Batch:3845/6968  Loss: 3.5524948120117186

Epoch:    3/3     Batch:3850/6968  Loss: 3.718459367752075

Epoch:    3/3     Batch:3855/6968  Loss: 3.4620334625244142

Epoch:    3/3     Batch:3860/6968  Loss: 3.725214433670044

Epoch:    3/3     Batch:3865/6968  Loss: 3.649008560180664

Epoch:    3/3     Batch:3870/6968  Loss: 3.6269750595092773

Epoch:    3/3     Batch:3875/6968  Loss: 3.4563097953796387

Epoch:    3/3     Batch:3880/6968  Loss: 3.754635953903198

Epoch:    3/3     Batch:3885/6

Epoch:    3/3     Batch:4485/6968  Loss: 3.7091485023498536

Epoch:    3/3     Batch:4490/6968  Loss: 3.529189205169678

Epoch:    3/3     Batch:4495/6968  Loss: 3.7070430278778077

Epoch:    3/3     Batch:4500/6968  Loss: 3.6006863594055174

Epoch:    3/3     Batch:4505/6968  Loss: 3.5809555530548094

Epoch:    3/3     Batch:4510/6968  Loss: 3.5379905223846437

Epoch:    3/3     Batch:4515/6968  Loss: 3.5710799217224123

Epoch:    3/3     Batch:4520/6968  Loss: 3.5275697231292726

Epoch:    3/3     Batch:4525/6968  Loss: 3.7378358364105226

Epoch:    3/3     Batch:4530/6968  Loss: 3.5000651359558104

Epoch:    3/3     Batch:4535/6968  Loss: 3.4865933418273927

Epoch:    3/3     Batch:4540/6968  Loss: 3.5203099727630613

Epoch:    3/3     Batch:4545/6968  Loss: 3.5060737133026123

Epoch:    3/3     Batch:4550/6968  Loss: 3.568225383758545

Epoch:    3/3     Batch:4555/6968  Loss: 3.57031135559082

Epoch:    3/3     Batch:4560/6968  Loss: 3.377719593048096

Epoch:    3/3     Batch:4565/

Epoch:    3/3     Batch:5165/6968  Loss: 3.6839279174804687

Epoch:    3/3     Batch:5170/6968  Loss: 3.3830214500427247

Epoch:    3/3     Batch:5175/6968  Loss: 3.606970024108887

Epoch:    3/3     Batch:5180/6968  Loss: 3.6893571853637694

Epoch:    3/3     Batch:5185/6968  Loss: 3.4656761646270753

Epoch:    3/3     Batch:5190/6968  Loss: 3.4868828296661376

Epoch:    3/3     Batch:5195/6968  Loss: 3.404055452346802

Epoch:    3/3     Batch:5200/6968  Loss: 3.4418720245361327

Epoch:    3/3     Batch:5205/6968  Loss: 3.6551002979278566

Epoch:    3/3     Batch:5210/6968  Loss: 3.5998230934143067

Epoch:    3/3     Batch:5215/6968  Loss: 3.57011022567749

Epoch:    3/3     Batch:5220/6968  Loss: 3.616094636917114

Epoch:    3/3     Batch:5225/6968  Loss: 3.7826910495758055

Epoch:    3/3     Batch:5230/6968  Loss: 3.623262310028076

Epoch:    3/3     Batch:5235/6968  Loss: 3.6759397983551025

Epoch:    3/3     Batch:5240/6968  Loss: 3.655489778518677

Epoch:    3/3     Batch:5245/69

Epoch:    3/3     Batch:5845/6968  Loss: 3.7006557464599608

Epoch:    3/3     Batch:5850/6968  Loss: 3.5943041324615477

Epoch:    3/3     Batch:5855/6968  Loss: 3.503140115737915

Epoch:    3/3     Batch:5860/6968  Loss: 3.591386604309082

Epoch:    3/3     Batch:5865/6968  Loss: 3.5006507396698

Epoch:    3/3     Batch:5870/6968  Loss: 3.637872409820557

Epoch:    3/3     Batch:5875/6968  Loss: 3.707839012145996

Epoch:    3/3     Batch:5880/6968  Loss: 3.2679510593414305

Epoch:    3/3     Batch:5885/6968  Loss: 3.676378917694092

Epoch:    3/3     Batch:5890/6968  Loss: 3.6293718338012697

Epoch:    3/3     Batch:5895/6968  Loss: 3.4220908164978026

Epoch:    3/3     Batch:5900/6968  Loss: 3.510081100463867

Epoch:    3/3     Batch:5905/6968  Loss: 3.6900959491729735

Epoch:    3/3     Batch:5910/6968  Loss: 3.499337100982666

Epoch:    3/3     Batch:5915/6968  Loss: 3.4556767463684084

Epoch:    3/3     Batch:5920/6968  Loss: 3.727239418029785

Epoch:    3/3     Batch:5925/6968  

Epoch:    3/3     Batch:6525/6968  Loss: 3.5829555988311768

Epoch:    3/3     Batch:6530/6968  Loss: 3.530452919006348

Epoch:    3/3     Batch:6535/6968  Loss: 3.5610610485076903

Epoch:    3/3     Batch:6540/6968  Loss: 3.698462200164795

Epoch:    3/3     Batch:6545/6968  Loss: 3.5773062705993652

Epoch:    3/3     Batch:6550/6968  Loss: 3.756347179412842

Epoch:    3/3     Batch:6555/6968  Loss: 3.4952016353607176

Epoch:    3/3     Batch:6560/6968  Loss: 3.44041166305542

Epoch:    3/3     Batch:6565/6968  Loss: 3.753227710723877

Epoch:    3/3     Batch:6570/6968  Loss: 3.7005665779113768

Epoch:    3/3     Batch:6575/6968  Loss: 3.5332562923431396

Epoch:    3/3     Batch:6580/6968  Loss: 3.559890699386597

Epoch:    3/3     Batch:6585/6968  Loss: 3.3737367153167725

Epoch:    3/3     Batch:6590/6968  Loss: 3.5647215366363527

Epoch:    3/3     Batch:6595/6968  Loss: 3.700952100753784

Epoch:    3/3     Batch:6600/6968  Loss: 3.232957458496094

Epoch:    3/3     Batch:6605/6968

  "type " + obj.__name__ + ". It won't be checked "


### 问题: 你如何决定你的模型超参数？
比如，你是否试过不同的 different sequence_lengths 并发现哪个使得模型的收敛速度变化？那你的隐藏层数和层数呢？你是如何决定使用这个网络参数的？

**答案:** (在这里写下)    
<font color=red>1.sequence_length最后选择为131，因为句子长度为:110到892,公因数为30, 131, 227，取其中131;</font>[How to Generate Music using a LSTM Neural Network in Keras](https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5)   
<font color=red>2.隐藏层数设为300，层数设为512,根据推荐论文选择</font>[A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering](http://www.aclweb.org/anthology/P15-2116)

---
# 检查点

通过运行上面的训练单元，你的模型已经以`trained_rnn`名字存储，如果你存储了你的notebook， **你可以在之后的任何时间来访问你的代码和结果**. 下述代码可以帮助你重载你的结果!

In [None]:
import os

os.path.splitext(os.path.basename('./save/trained_rnn'))[0] + '.pt'

In [24]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch
import helper
import problem_unittests as tests

from workspace_utils import active_session

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
with active_session():
    trained_rnn = helper.load_model('./save/trained_rnn')
    
trained_rnn

ConnectionError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/attributes/keep_alive_token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000018055780>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

## 生成电视剧剧本
你现在可以生成你的“假”电视剧剧本啦！

### 生成文字
你的神经网络会不断重复生成一个单词，直到生成满足你要求长度的剧本。使用 `generate` 函数来完成上述操作。首先，使用 `prime_id` 来生成word id，之后确定生成文本长度 `predict_len`。同时， topk 采样来引入文字选择的随机性!

In [19]:
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generate text using the neural network
    :param decoder: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of puncuation tokens keys to puncuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    rnn.eval()
    
    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)
        output = output.cuda()
        
        # get the next word probabilities
        p = F.softmax(output, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # move to cpu
         
        # use top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()
        
        # select the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p/p.sum())
        
        # retrieve that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)     
        
        # the generated word becomes the next "current sequence" and the cycle can continue
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replace punctuation tokens
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences

### 生成一个新剧本
是时候生成一个剧本啦。设置`gen_length` 剧本长度，设置 `prime_word`为以下任意词来开始生成吧:
- "jerry"
- "elaine"
- "george"
- "kramer"

你可以把prime word 设置成 _任意 _ 单词, 但是使用名字开始会比较好(任何其他名字也是可以哒!)

In [20]:
# run the cell multiple times to get different results!
gen_length = 400 # modify the length to your preference
prime_word = 'jerry' # name for starting the script

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)



RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

#### 存下你最爱的片段

一旦你发现一段有趣或者好玩的片段，就把它存下啦！

In [None]:
# save script to a text file
with active_session():
    f =  open("generated_script_1.txt","w")
    f.write(generated_script)
    f.close()

# 这个电视剧剧本是无意义的
如果你的电视剧剧本不是很有逻辑也是ok的。下面是一个例子。

### 生成剧本案例

>jerry: what about me?
>
>jerry: i don't have to wait.
>
>kramer:(to the sales table)
>
>elaine:(to jerry) hey, look at this, i'm a good doctor.
>
>newman:(to elaine) you think i have no idea of this...
>
>elaine: oh, you better take the phone, and he was a little nervous.
>
>kramer:(to the phone) hey, hey, jerry, i don't want to be a little bit.(to kramer and jerry) you can't.
>
>jerry: oh, yeah. i don't even know, i know.
>
>jerry:(to the phone) oh, i know.
>
>kramer:(laughing) you know...(to jerry) you don't know.


如果这个电视剧剧本毫无意义，那也没有关系。我们的训练文本不到一兆字节。为了获得更好的结果，你需要使用更小的词汇范围或是更多数据。幸运的是，我们的确拥有更多数据！在本项目开始之初我们也曾提过，这是[另一个数据集](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data)的子集。我们并没有让你基于所有数据进行训练，因为这将耗费大量时间。然而，你可以随意使用这些数据训练你的神经网络。当然，是在完成本项目之后。
# 提交项目
在提交项目时，请确保你在保存 notebook 前运行了所有的单元格代码。请将 notebook 文件保存为 "dlnd_tv_script_generation.ipynb"，并将它作为 HTML 文件保存在 "File" -> "Download as" 中。请将 "helper.py" 和 "problem_unittests.py" 文件一并打包成 zip 文件提交。

$$\;$$