Finish generating chinese poetry #439

Merged: 14 commits merged into PaddlePaddle:develop from the chinese_poetry branch on Nov 20, 2017
Conversation

will-am (Contributor) commented on Nov 9, 2017:

Resolve #334

will-am requested a review from lcy-seso on November 9, 2017 03:42
@@ -1,12 +1,6 @@
<<<<<<< HEAD
<s>
<e>
<unk>
lcy-seso (Collaborator): Don't put the dictionary on GitHub; it can be built automatically by a script.

# Chinese Classical Poetry Generation

## Introduction
Based on an encoder-decoder neural network model, verse-to-verse (sequence-to-sequence) training is performed on the Complete Tang Poems (全唐诗), so that given a verse, the model generates the next verse.
lcy-seso (Collaborator): Describe the default network structure here in one or two sentences, e.g., how many LSTM encoder/decoder layers are used by default and whether attention is included.

will-am (Contributor, Author): The README has been updated with a brief description.

```
python preprocess.py --datadir data/raw --outfile data/poems.txt --dictfile data/dict.txt
```

After the above script finishes, it produces the processed training data poems.txt and the dictionary dict.txt. Each line of poems.txt holds one Tang poem in three columns: title, author, and the poem text.
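For orientation, here is a minimal sketch of reading the generated poems.txt back in, assuming only the tab-separated three-column layout described above; the snippet is illustrative and is not taken from preprocess.py itself:

```python
import io

poems = []
# poems.txt: one Tang poem per line, three tab-separated columns:
# title, author, poem text.
with io.open("data/poems.txt", encoding="utf8") as f:
    for line in f:
        title, author, content = line.rstrip("\n").split("\t")
        poems.append((title, author, content))
```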
lcy-seso (Collaborator):
  1. "数据字典" (data dictionary) --> "字典" (dictionary).
  2. By default, how is the dictionary built? Word-level or character-level segmentation? For the character-frequency statistics, what is the default truncation frequency? Please provide some basic information.

will-am (Contributor, Author): The README has been updated with a description of how the dictionary is built.
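The updated README text itself is not quoted in this thread, so the following is only an illustrative sketch of one way such a dictionary could be built: character-level counting with a hypothetical `min_freq` cutoff (both are assumptions, not the repository's confirmed defaults), writing the `<s>`/`<e>`/`<unk>` entries from the diff above first:

```python
import collections
import io

def build_dict(poems_file, dict_file, min_freq=1):
    # Count every character in the poem-text column of poems.txt.
    counter = collections.Counter()
    with io.open(poems_file, encoding="utf8") as f:
        for line in f:
            _, _, content = line.rstrip("\n").split("\t")
            counter.update(content)

    with io.open(dict_file, "w", encoding="utf8") as f:
        # Special tokens first; characters below the cutoff fall back to <unk>.
        for token in (u"<s>", u"<e>", u"<unk>"):
            f.write(token + u"\n")
        for ch, freq in counter.most_common():
            if freq >= min_freq:
                f.write(ch + u"\n")
```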

In the poem text, verses are separated by `.`.
lcy-seso (Collaborator): After splitting on `.`, what is the strategy for constructing the training data? Which verse is the source and which is the target? Please explain the data strategy.

will-am (Contributor, Author): The README has been updated with a brief description of how the training data is constructed.
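The updated README wording is not shown here, but consistent with the reviewer's later example (source "白日依山尽" --> target "黄河入海流"), a plausible sketch of the verse-to-verse pair construction is to pair each verse with the verse that follows it in the same poem (the function name below is invented for this illustration):

```python
# -*- coding: utf-8 -*-

def make_pairs(verses):
    # Pair each verse with the next verse of the same poem:
    # (verses[0], verses[1]), (verses[1], verses[2]), ...
    return [(verses[i], verses[i + 1]) for i in range(len(verses) - 1)]

# e.g. one couplet yields one training pair:
pairs = make_pairs([u"白日依山尽", u"黄河入海流"])
```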

```
  [required]
  --use_gpu TEXT    Whether to use GPU in generation.
  --help            Show this message and exit.
```
lcy-seso (Collaborator):
  • Delete lines 104 ~ 115, for the same reason as above.
  • For the detailed command-line arguments of the script generate.py, please check by running `python generate.py --help`. Explain only the important arguments here. (If any explanations are needed later on, please write them in Chinese.)

```
  --init_model_path TEXT    The path of a trained model used to initialize all
                            the model parameters.
  --help                    Show this message and exit.
```
lcy-seso (Collaborator):
  • Delete lines 48 ~ 64. The other examples will be revised individually later.
    • This command-line listing is just a direct copy-paste of the output of `python train.py --help` and provides no information beyond it; if needed, users can run the script themselves to see it. The README only needs to remind users to check the help output.
    • Direct copy-pasting also means this section has to be kept in sync whenever the code changes, which adds extra work.

- `use_gpu`: whether to use the GPU

### Run generation
For example, save the verse `白日依山盡,黃河入海流` in a file `input.txt` as the input for predicting the next verse, then run the command:
lcy-seso (Collaborator): Don't construct the source and target this way. Source: "白日依山尽" --> Target: "黄河入海流".

will-am (Contributor, Author): The construction method has been changed, the model has been retrained, the default training parameters have been adjusted based on the training results, and the example has been updated.

paragraphs = filter(lambda x: len(x), paragraphs)
if len(paragraphs) > 1:
    dataset.append((title, author, paragraphs))
print("Finished...")
lcy-seso (Collaborator): Remove this print here. If you want to keep it, please print meaningful information, e.g., what exactly finished?

will-am (Contributor, Author): Deleted.

dataset.append((title, author, paragraphs))
print("Finished...")

print("Constructing vocabularies...")
lcy-seso (Collaborator): Constructing --> Construct.

will-am (Contributor, Author): Changed.

author = data[1]
paragraphs = ".".join(data[2])
f.write("\t".join((title, author, paragraphs)) + "\n")
print("Finished...")
lcy-seso (Collaborator): Remove this print here. If you want to keep it, please print meaningful information, e.g., what exactly finished?

will-am (Contributor, Author): Deleted.

with io.open(dictfile, "w", encoding="utf8") as f:
    for v in vocab:
        f.write(v + "\n")
print("Finished...")
lcy-seso (Collaborator): Remove this print here. If you want to keep it, please print meaningful information, e.g., what exactly finished?

will-am (Contributor, Author): Deleted.

f.write(v + "\n")
print("Finished...")

print("Writing processed data...")
lcy-seso (Collaborator): Writing --> Write

will-am (Contributor, Author): Changed.

lcy-seso (Collaborator) left a review: LGTM

lcy-seso merged commit ede5a04 into PaddlePaddle:develop on Nov 20, 2017
will-am deleted the chinese_poetry branch on November 20, 2017 02:44