Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recordio cloud and local interface #2665

Merged
merged 19 commits into from
Jul 7, 2017

Conversation

gongweibao
Copy link
Contributor

@gongweibao gongweibao commented Jun 29, 2017

Fix #2637
同时,在返回值上增加errno以便区分是EOF还是error

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR! Some question about if we need to return error for next_record.

@@ -26,14 +26,27 @@ def set_dataset(self, paths):
holder[idx] = c_ptr
lib.paddle_set_dataset(self.c, holder, len(paths))

# return format: (record, errno)
# errno = 0: ok
# = -1: EOF
Copy link
Contributor

@helinwang helinwang Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need to return EOF. The function name is next_record , from the user's perspective, it's an infinite stream of data. EOF does not make sense here.

Actually, do we really need to return error here? Current implementation is next_record will log error and try to recover from the error.
If we return error here, what can user do with it? (the retry logic is already implemented inside next_record).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如线下讨论:错误是需要暴露的,cloud端EOF不需要暴露。
我在这里建立了一个ISSUE:#2678
这个PR的py接口先不暴露错误,以便保持本地和cloud接口的统一。

@@ -6,7 +6,7 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${PARENT_DIR}/cmake")

project(cxx_go C Go)

include(golang)
#include(golang)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个文件在最新的develop branch已经大改了,需要rebase或者pull一下。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -84,11 +84,22 @@ func paddle_set_dataset(client C.paddle_master_client, path **C.char, size C.int
return C.PADDLE_MASTER_OK
}

// return value:
// 0:ok
// -2:error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not -1 error? -2 is a little strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

def next_record(self):
p = ctypes.c_char_p()
ret = ctypes.pointer(p)
size = lib.paddle_next_record(self.c, ret)
if size < -1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps size < 0 is better. Errors are typically < 0, not < -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -26,14 +26,22 @@ def set_dataset(self, paths):
holder[idx] = c_ptr
lib.paddle_set_dataset(self.c, holder, len(paths))

# return format: (record, errno)
# errno = 0: ok
# < -1: error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< 0: error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

:path: path of recordio file
:returns: data reader of recordio file
Creates a data reader that outputs record one one by one
from given local recordio fils path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change Creates a data reader that outputs record one one by one from given local recordio fils path. to Creates a data reader from given RecordIO file paths separated by ",", glob pattern is supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Done.

"""

import recordio as rec

def reader():
f = rec.reader(path)
for i, path in enumerate(paths):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need enum here, rec.reader support "," separated paths: https://github.com/PaddlePaddle/recordio/blob/master/c/crecordio.go#L79

Copy link
Contributor Author

@gongweibao gongweibao Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地方我觉得有些疑惑,我们让用户输入的是一个以','分割的字符串?还是一个字符串的array?还是两者的混合?
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/master/client.py#L24

Copy link
Contributor

@helinwang helinwang Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不好意思,不是很一致。这里跟set_dataset不同哈~set_dataset接受的是一个list,这里是","分割的字符串。具体接受什么参数要看一下Go那边是怎么实现的。


return reader

def recordio(paths, addr="", buf_size=100):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put this function name into __all__ variable inside this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已经放了。


return reader

def recordio(paths, addr="", buf_size=100):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think addr need to be an environment variable, since user would never know the address of the etcd cluster. (so no way to provide this argument).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!Done.

yield r
f.close()

return reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the recordio function takes the buf_size argument, we need to honor that argument in local reader as well. So perhaps return buffered(reader, buf_size), please see: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/reader/decorator.py#L162

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise LGTM.

while True:
r, err = client.next_record()
if r is None:
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to break only if error happens. None could be an empty record (which we should not break).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.Done.

@gongweibao gongweibao merged commit 8e8f360 into PaddlePaddle:develop Jul 7, 2017
@gongweibao gongweibao deleted the cloudandlocal branch January 17, 2021 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants