Recordio cloud and local interface #2665
Changes from 6 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ typedef int paddle_master_client; | |
import "C" | ||
|
||
import ( | ||
"io" | ||
"sync" | ||
"unsafe" | ||
|
||
|
@@ -84,11 +85,29 @@ func paddle_set_dataset(client C.paddle_master_client, path **C.char, size C.int | |
return C.PADDLE_MASTER_OK | ||
} | ||
|
||
// return value: | ||
// 0:ok | ||
// -1:EOF | ||
// -2:error | ||
Review comment: Why not -1 for error? -2 is a little strange.
Reply: Done. |
||
//export paddle_next_record | ||
func paddle_next_record(client C.paddle_master_client, record **C.uchar) C.int { | ||
c := get(client) | ||
r := c.NextRecord() | ||
r, err := c.NextRecord() | ||
if err == io.EOF { | ||
// EOF | ||
*record = (*C.uchar)(nullPtr) | ||
return -1 | ||
} | ||
|
||
if err != nil { | ||
// Error | ||
// TODO: return the type of error? | ||
*record = (*C.uchar)(nullPtr) | ||
return -2 | ||
} | ||
|
||
if len(r) == 0 { | ||
// Empty record | ||
*record = (*C.uchar)(nullPtr) | ||
return 0 | ||
} | ||
|
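The return-value convention in the hunk above can be sketched from the caller's side. This is a minimal illustration, assuming the codes exactly as they appear in this revision of the diff (positive: record size, 0: empty record, -1: EOF, -2: error); the review thread suggests the numbering may have changed afterwards. The names `Status` and `decode_return` are hypothetical and not part of the PR.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical labels for the return codes shown in the diff."""
    OK = 0
    EOF = -1
    ERROR = -2


def decode_return(ret):
    """Map a paddle_next_record-style return value to (status, size).

    Assumes: ret > 0 is the record size, 0 is an empty record,
    -1 is EOF, and any other negative value is an error.
    """
    if ret >= 0:
        return Status.OK, ret
    if ret == -1:
        return Status.EOF, 0
    return Status.ERROR, 0
```

Treating every negative value other than -1 as an error keeps the caller correct even if more specific error codes are added later.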
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,14 +26,27 @@ def set_dataset(self, paths): | |
holder[idx] = c_ptr | ||
lib.paddle_set_dataset(self.c, holder, len(paths)) | ||
|
||
# return format: (record, errno) | ||
# errno = 0: ok | ||
# = -1: EOF | ||
Review comment: Maybe we don't need to return EOF. The function name is Actually, do we really need to return error here? Current implementation is
Reply: As discussed offline: errors need to be exposed, but EOF on the cloud side does not need to be exposed. |
||
# < -1: error | ||
Review comment: < 0: error.
Reply: Done. |
||
def next_record(self): | ||
p = ctypes.c_char_p() | ||
ret = ctypes.pointer(p) | ||
size = lib.paddle_next_record(self.c, ret) | ||
if size == -1: | ||
# EOF | ||
return None, -1 | ||
|
||
if size < -1: | ||
Review comment: Perhaps
Reply: Done. |
||
# Error | ||
return None, size | ||
|
||
if size == 0: | ||
# Empty record | ||
return "" | ||
return "", 0 | ||
|
||
record = ret.contents.value[:size] | ||
# Memory created from C should be freed. | ||
lib.mem_free(ret.contents) | ||
return record | ||
return record, 0 |
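To illustrate how a caller might consume the `(record, errno)` tuple returned by `next_record` in the hunk above, here is a minimal sketch. It assumes the convention as shown in this revision (0: ok, -1: EOF, below -1: error). `FakeClient` and `drain` are hypothetical helpers used only for illustration; they are not part of the PR and do not touch the C library.

```python
EOF = -1  # errno value for end of data, as in the hunk above


class FakeClient(object):
    """Hypothetical stand-in for the master client; replays canned results."""

    def __init__(self, results):
        self._results = iter(results)

    def next_record(self):
        # Returns the next canned (record, errno) tuple.
        return next(self._results)


def drain(client):
    """Collect records until EOF; raise on any other negative errno."""
    records = []
    while True:
        record, errno = client.next_record()
        if errno == EOF:
            break
        if errno < 0:
            raise IOError("next_record failed with errno %d" % errno)
        # An empty record ("") is valid data and must not end the loop.
        records.append(record)
    return records
```

Checking `errno` rather than the record value is what lets an empty record pass through without being mistaken for the end of the stream.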
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,22 +57,51 @@ def reader(): | |
return reader | ||
|
||
|
||
def recordio(path): | ||
def recordio_local(paths): | ||
""" | ||
Creates a data reader that outputs record one one by one from given recordio file | ||
:path: path of recordio file | ||
:returns: data reader of recordio file | ||
Creates a data reader that outputs records one by one | ||
from the given local recordio file paths. | ||
Review comment: Maybe change
Reply: Great! Done. |
||
:paths: paths of recordio files. | ||
:returns: data reader of recordio files. | ||
""" | ||
|
||
import recordio as rec | ||
|
||
def reader(): | ||
f = rec.reader(path) | ||
for i, path in enumerate(paths): | ||
Review comment: Don't need enum here,
Reply: I find this part a bit confusing. What we ask the user to pass in is a
Reply: Sorry, this is not very consistent. It differs from set_dataset: set_dataset accepts a list, whereas here it is a ","-separated string. Exactly which argument is accepted depends on how the Go side implements it. |
||
f = rec.reader(path) | ||
while True: | ||
r = f.read() | ||
if r is None: | ||
break | ||
yield r | ||
f.close() | ||
|
||
return reader | ||
Review comment: Since the recordio function takes the
Reply: Thanks! Done. |
||
|
||
def recordio(paths, addr="", buf_size=100): | ||
Review comment: Maybe put this function name into
Reply: Already added.
Review comment: I think
Reply: Great! Done. |
||
""" | ||
Creates a data reader that outputs records one by one | ||
from the given local or cloud recordio paths. | ||
:paths: paths of recordio files. | ||
:returns: data reader of recordio files. | ||
""" | ||
import os | ||
import paddle.v2.master.client as cloud | ||
|
||
if "KUBERNETES_SERVICE_HOST" not in os.environ.keys(): | ||
return recordio_local(paths) | ||
|
||
def reader(): | ||
c = cloud(addr, buf_size) | ||
c.set_dataset(paths) | ||
|
||
while True: | ||
r = f.read() | ||
r, err = c.next_record() | ||
if r is None: | ||
break | ||
Review comment: I think we need to break only if an error happens. None could be an empty record (on which we should not break).
Reply: Thanks. Done. |
||
yield r | ||
f.close() | ||
|
||
c.close() | ||
|
||
return reader |
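The local reader in this diff walks several recordio files in sequence. A minimal sketch of that pattern follows, iterating the paths directly (without `enumerate`, as a reviewer suggests) and closing each file even if the consumer stops early. `ListReader` and `multi_file_reader` are hypothetical names; `open_reader` stands in for whatever opens one file (`rec.reader` in the diff), and the only behaviour assumed from it is that `read()` returns `None` at the end of a file.

```python
class ListReader(object):
    """Hypothetical stand-in for a recordio reader: read() returns None at end."""

    def __init__(self, records):
        self._records = list(records)
        self.closed = False

    def read(self):
        return self._records.pop(0) if self._records else None

    def close(self):
        self.closed = True


def multi_file_reader(paths, open_reader):
    """Return a reader creator that yields records from each path in order."""

    def reader():
        for path in paths:
            f = open_reader(path)
            try:
                while True:
                    r = f.read()
                    if r is None:  # end of this file; move on to the next
                        break
                    yield r
            finally:
                f.close()

    return reader
```

Passing the opener as a parameter keeps the sketch independent of the recordio package, so the loop can be exercised with in-memory data.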
Review comment: This file has changed substantially on the latest develop branch; you need to rebase or pull.
Reply: Done.