Finish generating chinese poetry #439

Merged: 14 commits merged into PaddlePaddle:develop from the chinese_poetry branch on Nov 20, 2017
Conversation

will-am (Contributor) commented on Nov 9, 2017:

Resolve #334

will-am requested a review from lcy-seso on November 9, 2017 03:42
@@ -1,12 +1,6 @@
<<<<<<< HEAD
<s>
<e>
<unk>
lcy-seso (Collaborator): Don't put the dictionary on GitHub; it can be built automatically by a script.

# Chinese Classical Poetry Generation

## Introduction
Based on an encoder-decoder neural network model, verse-to-verse (sequence-to-sequence) training is performed on the Complete Tang Poems (全唐诗), so that given a verse, the model generates the next verse.
lcy-seso (Collaborator): Describe the default network structure here in one or two sentences, e.g., how many LSTM encoder/decoder layers are used by default and whether attention is included.

will-am (Contributor, Author): The README has been updated with a brief description.

```
python preprocess.py --datadir data/raw --outfile data/poems.txt --dictfile data/dict.txt
```

After the above script finishes, it produces the processed training data poems.txt and the dictionary dict.txt. Each line of poems.txt holds one Tang poem in three columns: title, author, and the poem text.
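For orientation, here is a minimal sketch of reading the generated poems.txt back in, assuming only the tab-separated three-column layout described above; the snippet is illustrative and is not taken from preprocess.py itself:

```python
import io

poems = []
# poems.txt: one Tang poem per line, three tab-separated columns:
# title, author, poem text.
with io.open("data/poems.txt", encoding="utf8") as f:
    for line in f:
        title, author, content = line.rstrip("\n").split("\t")
        poems.append((title, author, content))
```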
lcy-seso (Collaborator):
  1. "数据字典" (data dictionary) --> "字典" (dictionary).
  2. By default, how is the dictionary built? Word-level or character-level segmentation? For the character-frequency statistics, what is the default truncation frequency? Please provide some basic information.

will-am (Contributor, Author): The README has been updated with a description of how the dictionary is built.
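The updated README text itself is not quoted in this thread, so the following is only an illustrative sketch of one way such a dictionary could be built: character-level counting with a hypothetical `min_freq` cutoff (both are assumptions, not the repository's confirmed defaults), writing the `<s>`/`<e>`/`<unk>` entries from the diff above first:

```python
import collections
import io

def build_dict(poems_file, dict_file, min_freq=1):
    # Count every character in the poem-text column of poems.txt.
    counter = collections.Counter()
    with io.open(poems_file, encoding="utf8") as f:
        for line in f:
            _, _, content = line.rstrip("\n").split("\t")
            counter.update(content)

    with io.open(dict_file, "w", encoding="utf8") as f:
        # Special tokens first; characters below the cutoff fall back to <unk>.
        for token in (u"<s>", u"<e>", u"<unk>"):
            f.write(token + u"\n")
        for ch, freq in counter.most_common():
            if freq >= min_freq:
                f.write(ch + u"\n")
```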

In the poem text, verses are separated by `.`.
lcy-seso (Collaborator): After splitting on `.`, what is the strategy for constructing the training data? Which verse is the source and which is the target? Please explain the data strategy.

will-am (Contributor, Author): The README has been updated with a brief description of how the training data is constructed.
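The updated README wording is not shown here, but consistent with the reviewer's later example (source "白日依山尽" --> target "黄河入海流"), a plausible sketch of the verse-to-verse pair construction is to pair each verse with the verse that follows it in the same poem (the function name below is invented for this illustration):

```python
# -*- coding: utf-8 -*-

def make_pairs(verses):
    # Pair each verse with the next verse of the same poem:
    # (verses[0], verses[1]), (verses[1], verses[2]), ...
    return [(verses[i], verses[i + 1]) for i in range(len(verses) - 1)]

# e.g. one couplet yields one training pair:
pairs = make_pairs([u"白日依山尽", u"黄河入海流"])
```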

```
  [required]
  --use_gpu TEXT    Whether to use GPU in generation.
  --help            Show this message and exit.
```
lcy-seso (Collaborator):
  • Delete lines 104 ~ 115, for the same reason as above.
  • For the detailed command-line arguments of the script generate.py, please check by running `python generate.py --help`. Explain only the important arguments here. (If any explanations are needed later on, please write them in Chinese.)

```
  --init_model_path TEXT    The path of a trained model used to initialize all
                            the model parameters.
  --help                    Show this message and exit.
```
lcy-seso (Collaborator):
  • Delete lines 48 ~ 64. The other examples will be revised individually later.
    • This command-line listing is just a direct copy-paste of the output of `python train.py --help` and provides no information beyond it; if needed, users can run the script themselves to see it. The README only needs to remind users to check the help output.
    • Direct copy-pasting also means this section has to be kept in sync whenever the code changes, which adds extra work.

- `use_gpu`: whether to use the GPU

### Run generation
For example, save the verse `白日依山盡,黃河入海流` in a file `input.txt` as the input for predicting the next verse, then run the command:
lcy-seso (Collaborator): Don't construct the source and target this way. Source: "白日依山尽" --> Target: "黄河入海流".

will-am (Contributor, Author): The construction method has been changed, the model has been retrained, the default training parameters have been adjusted based on the training results, and the example has been updated.

paragraphs = filter(lambda x: len(x), paragraphs)
if len(paragraphs) > 1:
    dataset.append((title, author, paragraphs))
print("Finished...")
lcy-seso (Collaborator): Remove this print here. If you want to keep it, please print meaningful information, e.g., what exactly finished?

will-am (Contributor, Author): Deleted.

dataset.append((title, author, paragraphs))
print("Finished...")

print("Constructing vocabularies...")
lcy-seso (Collaborator): Constructing --> Construct.

will-am (Contributor, Author): Changed.

author = data[1]
paragraphs = ".".join(data[2])
f.write("\t".join((title, author, paragraphs)) + "\n")
print("Finished...")
lcy-seso (Collaborator): Remove this print here. If you want to keep it, please print meaningful information, e.g., what exactly finished?

will-am (Contributor, Author): Deleted.

with io.open(dictfile, "w", encoding="utf8") as f:
    for v in vocab:
        f.write(v + "\n")
print("Finished...")
lcy-seso (Collaborator): Remove this print here. If you want to keep it, please print meaningful information, e.g., what exactly finished?

will-am (Contributor, Author): Deleted.

f.write(v + "\n")
print("Finished...")

print("Writing processed data...")
lcy-seso (Collaborator): Writing --> Write

will-am (Contributor, Author): Changed.

lcy-seso (Collaborator) left a review: LGTM

lcy-seso merged commit ede5a04 into PaddlePaddle:develop on Nov 20, 2017
will-am deleted the chinese_poetry branch on November 20, 2017 02:44