[WIP] New faster version of the RecordIO iterator#7152
piiswrong merged 5 commits into apache:master
Test again?
6fcc3a6 to eed243d
@zhreshold Please have a look.
    DMLC_DECLARE_FIELD(path_imgrec).set_default("")
        .describe("Path to the image RecordIO (.rec) file or a directory path. "\
                  "Created with tools/im2rec.py.");
    DMLC_DECLARE_FIELD(path_imgidx).set_default("")
Is path_imgidx required or optional?
zhreshold left a comment
@piiswrong LGTM
As discussed offline, I think there is room for improvement, especially in the chunk shuffling part. However, this solution works nicely and has proved to give good results, so I guess we should take this in.
One more concern is the TurboJPEG dependency, which will need to be addressed in the prebuilt packages. @szha
I can't seem to find the source code for TurboJPEG. Is it open source? I was able to find libjpeg-turbo, but it's not the same thing as TurboJPEG.
@ptrendx
Yes, it is libjpeg-turbo (https://github.com/libjpeg-turbo/libjpeg-turbo). It has two APIs, though: the libjpeg API and the TurboJPEG API. I am using the TurboJPEG API since it is more straightforward.
Thanks. Would you update the name of the flag to reflect this, such as
Sure, I will do that.
Added option for using libjpeg-turbo directly to decode images
ImageRecordIter can now use .idx files generated by im2rec.py
Added rec2idx.py utility to generate .idx files from .rec files
When using IndexedRecordIO (.rec and .idx together) shuffle option performs global shuffling
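For readers following along, here is a minimal sketch of why an index file makes global shuffling possible: once every record's byte offset is known, an epoch can visit records in any order by seeking. This is not the PR's C++ implementation. The length-prefixed record layout, the tab-separated key/offset index lines, and the helper names (`pack_records`, `parse_index`, `read_at`, `shuffled_epoch`) are all assumptions made for illustration, not the real RecordIO format or MXNet API.

```python
import io
import random
import struct


def pack_records(payloads):
    """Build a toy record blob plus tab-separated key/offset index lines.

    Assumption: each record is a 4-byte little-endian length followed by
    the payload. The real RecordIO layout is different.
    """
    blob = io.BytesIO()
    idx_lines = []
    for key, payload in enumerate(payloads):
        # Note the offset of this record before writing it.
        idx_lines.append(f"{key}\t{blob.tell()}")
        blob.write(struct.pack("<I", len(payload)))
        blob.write(payload)
    blob.seek(0)
    return blob, idx_lines


def parse_index(idx_lines):
    """Turn the index lines into a {key: byte_offset} dictionary."""
    index = {}
    for line in idx_lines:
        key, offset = line.split("\t")
        index[int(key)] = int(offset)
    return index


def read_at(blob, offset):
    """Seek to a record's offset and return its payload."""
    blob.seek(offset)
    (length,) = struct.unpack("<I", blob.read(4))
    return blob.read(length)


def shuffled_epoch(blob, index, seed):
    """Yield every record once, in an order shuffled over the whole file."""
    keys = list(index)
    random.Random(seed).shuffle(keys)
    for key in keys:
        yield read_at(blob, index[key])


payloads = [b"img0", b"img1", b"img2", b"img3", b"img4"]
blob, idx_lines = pack_records(payloads)
index = parse_index(idx_lines)
epoch = list(shuffled_epoch(blob, index, seed=0))
```

Without the index, a reader can only shuffle within whatever chunk it has already loaded, which is presumably the chunk shuffling mentioned in the review above.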
Rebasing on current master.
@piiswrong It passed CI.
    """Returns the current position of read head.
    """
    pos = ctypes.c_size_t()
    check_call(_LIB.MXRecordIOReaderTell(self.handle, ctypes.byref(pos)))
@ptrendx Please add this to recordio python API with a separate PR after release
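As a rough illustration of what a reader-side tell is useful for, the sketch below scans a record stream once and notes the read-head position before each record, which is one way an index of offsets could be built from an existing record file. It uses Python's own `tell()` on an in-memory stream rather than `MXRecordIOReaderTell`. The length-prefixed record layout and the helper names (`write_record`, `read_record`, `build_index`) are assumptions for illustration, not the actual RecordIO format or the way rec2idx.py is written.

```python
import io
import struct


def write_record(stream, payload):
    """Append one toy record: 4-byte little-endian length, then payload."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_record(stream):
    """Read one record at the current position; return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("<I", header)
    return stream.read(length)


def build_index(stream):
    """Scan the stream once, noting tell() before each record is read."""
    stream.seek(0)
    offsets = []
    while True:
        pos = stream.tell()
        if read_record(stream) is None:
            break
        offsets.append(pos)
    return offsets


rec = io.BytesIO()
for payload in (b"cat", b"horse", b"ox"):
    write_record(rec, payload)

index = build_index(rec)

# Jump straight to the second record using its recorded offset.
rec.seek(index[1])
second = read_record(rec)
```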
Does it print warnings when shuffle is set to True?
Are you talking about the comment that shuffle does not work yet? This was when I was still working on it and is no longer true. |
* Improved ImageRecordIter performance
  Added option for using libjpeg-turbo directly to decode images
  ImageRecordIter can now use .idx files generated by im2rec.py
  Added rec2idx.py utility to generate .idx files from .rec files
  When using IndexedRecordIO (.rec and .idx together) shuffle option performs global shuffling
* Add ASF license header
* Update dmlc-core to fix a bug on Windows
* USE_TURBO_JPEG -> USE_LIBJPEG_TURBO
* trigger test
Does not yet support shuffle (ignores that setting)