Finish generating chinese poetry #439
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,12 +34,12 @@ python preprocess.py --datadir data/raw --outfile data/poems.txt --dictfile data | |
``` | ||
|
||
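After the script above finishes, it produces the processed training data poems.txt and the data dictionary dict.txt. Each line of poems.txt holds one Tang poem in three columns: title, author, and poem content. | ||
In the poem content, verses are separated by '.'. | ||
In the poem content, verses are separated by `.`. | ||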
Review comment: After splitting on ".", what is the strategy for constructing the training data? Which part is the source and which is the target? Please explain the data strategy. Reply: The README has been updated with a brief description of how the data is constructed. |
||
|
||
Training data example: | ||
```text | ||
登鸛雀樓 王之渙 白日依山盡,黃河入海流.欲窮千里目,更上一層樓 | ||
觀獵 李白 太守耀清威,乘閑弄晚暉.江沙橫獵騎,山火遶行圍.箭逐雲鴻落,鷹隨月兔飛.不知白日暮,歡賞夜方歸 | ||
觀獵 李白 太守耀清威,乘閑弄晚暉.江沙橫獵騎,山火遶行圍.箭逐雲鴻落,鷹隨月兔飛.不知白日暮,歡賞夜方歸 | ||
晦日重宴 陳嘉言 高門引冠蓋,下客抱支離.綺席珍羞滿,文場翰藻摛.蓂華彫上月,柳色藹春池.日斜歸戚里,連騎勒金羈 | ||
``` | ||
|
||
|
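The three-column layout described in the README, and the reviewer's question about source and target, can be illustrated with a minimal sketch. It assumes the columns are whitespace-separated and that titles and authors contain no internal whitespace; the PR does not confirm either. `parse_poem_line` and `adjacent_pairs` are hypothetical helpers, and pairing each verse with the one that follows it is only one possible data strategy, not necessarily the one this PR adopts.

```python
def parse_poem_line(line):
    """Split one poems.txt line into (title, author, verses)."""
    # Assumption: columns are separated by whitespace, and the title and
    # author contain no whitespace themselves.
    title, author, content = line.strip().split(None, 2)
    # Verses inside the poem content are separated by '.'.
    verses = [v for v in content.split(".") if v]
    return title, author, verses


def adjacent_pairs(verses):
    """Pair each verse (source) with the next verse (target).

    Hypothetical strategy for illustration only.
    """
    return list(zip(verses, verses[1:]))


sample = "登鸛雀樓 王之渙 白日依山盡,黃河入海流.欲窮千里目,更上一層樓"
title, author, verses = parse_poem_line(sample)
pairs = adjacent_pairs(verses)
```

With this pairing, a poem of n verses yields n - 1 training examples, so a four-verse poem contributes three source/target pairs.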
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,6 @@ | ||
<<<<<<< HEAD | ||
<s> | ||
<e> | ||
<unk> | ||
Review comment: Do not put the dictionary on GitHub; it can be built automatically by a script. |
||
======= | ||
<unk> | ||
<s> | ||
<e> | ||
>>>>>>> 7943732ab34254df801d72b0b5e04f6f320e4127 | ||
, | ||
不 | ||
人 | ||
|
Reply: The README has been updated with a description of how the dictionary is built.
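As a rough illustration of building the dictionary by script rather than committing it, the sketch below counts characters in the poem content and lists them by descending frequency after the special tokens. `build_dict` is a hypothetical helper, not the PR's preprocess.py. The token order `<unk>`, `<s>`, `<e>`, the exclusion of the `.` separator, and the whitespace-separated columns are all assumptions; the diff itself shows a conflict over the token order.

```python
from collections import Counter

# Assumption: special tokens come first, in this order.
SPECIAL_TOKENS = ["<unk>", "<s>", "<e>"]


def build_dict(poem_lines):
    """Build a character vocabulary from poems.txt-style lines."""
    counts = Counter()
    for line in poem_lines:
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        # Count every character of the poem content except the '.'
        # verse separator (an assumption of this sketch).
        counts.update(ch for ch in parts[2] if ch != ".")
    # most_common() orders by descending frequency.
    return SPECIAL_TOKENS + [ch for ch, _ in counts.most_common()]


vocab = build_dict(["題 作 天地,天.人"])
```

Writing `vocab` out one entry per line would give a file shaped like the dict.txt shown in the diff, regenerated from the data on demand.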