implement Dataset Reader in Paddle Book #1419

Closed
reyoung opened this issue Feb 22, 2017 · 3 comments

@reyoung
Collaborator

reyoung commented Feb 22, 2017

This issue is a part of #1392.

The current Paddle Book contains eight chapters, so we need to write eight reader creators, one for each chapter's dataset. The chapters are:

  • Getting Started (新手入门)
  • Recognizing Digits (识别数字)
  • Image Classification (图像分类)
  • Word Vectors (词向量)
    • By @wangkuiyi
    • It seems we could use nltk.corpus.treebank to fetch the exact dataset used in the Book.
  • Sentiment Analysis (情感分析)
    • By @wen-bo-yang
    • The IMDB dataset is not included in nltk.corpus, but the Amazon reviews dataset is. Could we change the dataset used in the Book?
  • Semantic Role Labeling (语义角色标注)
    • By @reyoung
    • The Book uses the CONLL 2005 dataset. nltk.corpus contains the CONLL 2000, CONLL 2002, and CONLL 2007 datasets. Could we change the dataset to CONLL 2007?
  • Machine Translation (机器翻译)
    • nltk.corpus contains the WMT dataset.
  • Personalized Recommendation (个性化推荐)
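
For readers unfamiliar with the term, here is a minimal sketch of what a reader creator could look like. It assumes the convention that a reader is a zero-argument function yielding one sample at a time, and a reader creator is a function that returns such a reader. The name `text_file_reader_creator` and the line-per-sample format are hypothetical, not taken from the Paddle code base.

```python
def text_file_reader_creator(path):
    """Hypothetical reader creator: returns a reader over the lines of `path`."""

    def reader():
        # Re-open the file on every call so that each pass over the
        # dataset starts again from the first line.
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")

    return reader
```

Because the file is opened inside `reader` rather than inside the creator, the same reader can be called once per training pass without being exhausted.
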
@wangkuiyi
Collaborator

Data cache directory: /usr/local/paddle/dataset

@reyoung
Collaborator Author

reyoung commented Feb 23, 2017

/usr/local/paddle/dataset

Users without root permission cannot write to this directory.
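
One possible way around the permission problem is to try the system-wide directory first and fall back to a per-user directory when it cannot be created or written. The sketch below is only an illustration: `resolve_cache_dir` is a hypothetical helper, and the fallback path `~/.cache/paddle/dataset` is an assumption, not a location anyone in this thread has agreed on.

```python
import os


def resolve_cache_dir(preferred, fallback):
    """Return the first of (preferred, fallback) that can be created and written."""
    for candidate in (preferred, fallback):
        try:
            os.makedirs(candidate, exist_ok=True)
        except OSError:
            # Typically a permission error for non-root users; try the next one.
            continue
        if os.access(candidate, os.W_OK):
            return candidate
    raise OSError("no writable cache directory found")


if __name__ == "__main__":
    # The fallback location is an assumption made for this example.
    print(resolve_cache_dir(
        "/usr/local/paddle/dataset",
        os.path.expanduser("~/.cache/paddle/dataset"),
    ))
```
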

@wen-bo-yang
Contributor

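I found a movie review corpus in nltk.corpus whose content is organized in much the same way as IMDB. Could we use this text data as the dataset for sentiment classification?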

/nltk_data/corpora/movie_reviews/neg# ls
cv000_29416.txt  cv143_21158.txt .....
/nltk_data/corpora/movie_reviews/pos# ls
cv000_29590.txt  cv143_19666.txt .....
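
To make the proposal concrete, a reader over this directory layout could be sketched as follows. It uses only the standard library and walks the `neg` and `pos` folders directly; the name `movie_review_reader_creator`, the whitespace tokenization, and the label encoding (0 for negative, 1 for positive) are assumptions made for illustration. NLTK's own `movie_reviews` corpus reader would be an alternative way to access the same files.

```python
import os


def movie_review_reader_creator(root):
    """Hypothetical reader creator for a folder with `neg` and `pos` subfolders."""
    labels = {"neg": 0, "pos": 1}  # assumed label encoding

    def reader():
        for name, label in sorted(labels.items()):
            folder = os.path.join(root, name)
            for filename in sorted(os.listdir(folder)):
                if not filename.endswith(".txt"):
                    continue
                path = os.path.join(folder, filename)
                with open(path, encoding="utf-8", errors="replace") as f:
                    # Yield (list of whitespace-separated tokens, label).
                    yield f.read().split(), label

    return reader
```

Here `root` would point at the `movie_reviews` directory shown in the listing above.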

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024