-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recordio cloud and local interface #2665
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR! Some question about if we need to return error for next_record
.
python/paddle/v2/master/client.py
Outdated
@@ -26,14 +26,27 @@ def set_dataset(self, paths): | |||
holder[idx] = c_ptr | |||
lib.paddle_set_dataset(self.c, holder, len(paths)) | |||
|
|||
# return format: (record, errno) | |||
# errno = 0: ok | |||
# = -1: EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't need to return EOF. The function name is next_record
, from the user's perspective, it's an infinite stream of data. EOF does not make sense here.
Actually, do we really need to return error here? Current implementation is next_record
will log error and try to recover from the error.
If we return error here, what can user do with it? (the retry logic is already implemented inside next_record
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如线下讨论:错误是需要暴露的,cloud端EOF不需要暴露。
我在这里建立了一个ISSUE:#2678
这个PR的py接口先不暴露错误,以便保持本地和cloud接口的统一。
go/master/c/CMakeLists.txt
Outdated
@@ -6,7 +6,7 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${PARENT_DIR}/cmake") | |||
|
|||
project(cxx_go C Go) | |||
|
|||
include(golang) | |||
#include(golang) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个文件在最新的develop branch已经大改了,需要rebase或者pull一下。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
go/master/c/client.go
Outdated
@@ -84,11 +84,22 @@ func paddle_set_dataset(client C.paddle_master_client, path **C.char, size C.int | |||
return C.PADDLE_MASTER_OK | |||
} | |||
|
|||
// return value: | |||
// 0:ok | |||
// -2:error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not -1 error? -2 is a little strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
python/paddle/v2/master/client.py
Outdated
def next_record(self): | ||
p = ctypes.c_char_p() | ||
ret = ctypes.pointer(p) | ||
size = lib.paddle_next_record(self.c, ret) | ||
if size < -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps size < 0
is better. Errors are typically < 0, not < -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
python/paddle/v2/master/client.py
Outdated
@@ -26,14 +26,22 @@ def set_dataset(self, paths): | |||
holder[idx] = c_ptr | |||
lib.paddle_set_dataset(self.c, holder, len(paths)) | |||
|
|||
# return format: (record, errno) | |||
# errno = 0: ok | |||
# < -1: error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
< 0: error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
python/paddle/v2/reader/creator.py
Outdated
:path: path of recordio file | ||
:returns: data reader of recordio file | ||
Creates a data reader that outputs record one one by one | ||
from given local recordio fils path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change Creates a data reader that outputs record one one by one from given local recordio fils path.
to Creates a data reader from given RecordIO file paths separated by ",", glob pattern is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Done.
python/paddle/v2/reader/creator.py
Outdated
""" | ||
|
||
import recordio as rec | ||
|
||
def reader(): | ||
f = rec.reader(path) | ||
for i, path in enumerate(paths): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need enum here, rec.reader
support "," separated paths: https://github.com/PaddlePaddle/recordio/blob/master/c/crecordio.go#L79
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个地方我觉得有些疑惑,我们让用户输入的是一个以','
分割的字符串?还是一个字符串的array?还是两者的混合?
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/master/client.py#L24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不好意思,不是很一致。这里跟set_dataset不同哈~set_dataset接受的是一个list,这里是","分割的字符串。具体接受什么参数要看一下Go那边是怎么实现的。
python/paddle/v2/reader/creator.py
Outdated
|
||
return reader | ||
|
||
def recordio(paths, addr="", buf_size=100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this function name into __all__
variable inside this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已经放了。
python/paddle/v2/reader/creator.py
Outdated
|
||
return reader | ||
|
||
def recordio(paths, addr="", buf_size=100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think addr
need to be an environment variable, since user would never know the address of the etcd cluster. (so no way to provide this argument).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!Done.
python/paddle/v2/reader/creator.py
Outdated
yield r | ||
f.close() | ||
|
||
return reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the recordio function takes the buf_size
argument, we need to honor that argument in local reader as well. So perhaps return buffered(reader, buf_size)
, please see: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/reader/decorator.py#L162
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, otherwise LGTM.
python/paddle/v2/reader/creator.py
Outdated
while True: | ||
r, err = client.next_record() | ||
if r is None: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to break only if error happens. None could be an empty record (which we should not break).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.Done.
… cloudandlocal
… cloudandlocal
… cloudandlocal
Fix #2637
同时,在返回值上增加errno以便区分是
EOF
还是error
。