read data from hdfs #1

Open
formath opened this issue Aug 31, 2016 · 1 comment

formath commented Aug 31, 2016

"Different node should owns different parts of all Train data. This simple script did not do this job, so you should prepare it at last. " I saw this in the cluster training wiki. Could Paddle read data from HDFS and distribute it to each node automatically?

reyoung self-assigned this Aug 31, 2016

reyoung (Collaborator) commented Aug 31, 2016

PaddlePaddle does not yet distribute data across a cluster. You can read data directly from an HDFS file path with PyDataProvider2.

PaddlePaddle does not handle fetching data files remotely; it just passes the file path into a Python function. It is the user's job to open the file (or SQL connection string, or HDFS path) and yield each sample from it, one by one.
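
As a rough illustration of that division of labour, here is a minimal sketch of a provider-style function that receives only a path and opens it itself. It assumes the `hdfs` command-line client is on the `PATH`, and the line format (an integer label, a tab, then space-separated floats) is hypothetical. The names `iter_hdfs_lines` and `line_source` are made up for this example; in a real config the `process` function would additionally be registered with the `provider` decorator from PyDataProvider2, which is left out here so the sketch runs on its own.

```python
import subprocess


def iter_hdfs_lines(path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Yield the text lines of `path`, streamed through the Hadoop CLI.

    Assumption: the `hdfs` client is installed and configured on this node.
    """
    proc = subprocess.Popen([*cat_cmd, path], stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Always release the pipe and reap the child, even on early exit.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise OSError(f"{' '.join(cat_cmd)} {path} exited with {returncode}")


def process(settings, path, line_source=iter_hdfs_lines):
    """Yield one (features, label) sample per non-empty line.

    Hypothetical line format: "<int label><TAB><space-separated floats>".
    `settings` is unused here; it only mirrors the (settings, filename)
    shape of a data provider function.
    """
    for line in line_source(path):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, features = line.split("\t", 1)
        yield [float(x) for x in features.split()], int(label)
```

The `line_source` parameter exists only so the parsing can be tried without a Hadoop installation, for example by passing a function that returns an iterator over a few in-memory lines.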

Contributions of a script that distributes data to the cluster are welcome. Otherwise, we may add it soon if the feature proves necessary.
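
Until such a script exists, one simple way to give each node a different part of the training data is to have every node pick its own subset of the file list. The sketch below is only an assumption about how that could look, not something PaddlePaddle provides: `shard_for_node`, `node_id`, and `num_nodes` are hypothetical names, and it presumes the data is already split into several files that every node can list in the same way.

```python
def shard_for_node(paths, node_id, num_nodes):
    """Return the subset of `paths` that node `node_id` should read.

    Files are sorted first so every node sees the same order, then dealt
    out round-robin: node 0 gets files 0, n, 2n, ...; node 1 gets 1, n+1, ...
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if not 0 <= node_id < num_nodes:
        raise ValueError("node_id must be in [0, num_nodes)")
    return sorted(paths)[node_id::num_nodes]


# Example with made-up file names: five part files split across two nodes.
files = ["part-%05d" % i for i in range(5)]
node0_files = shard_for_node(files, 0, 2)
node1_files = shard_for_node(files, 1, 2)
```

Each node would then pass only its own list of paths to its data provider, so no sample is read twice and no file is left unread.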

paddle-bot (bot) added the status/developing 开发中 label Sep 22, 2023
paddle-bot (bot) reopened this Sep 22, 2023