Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[WIP] New faster version of the RecordIO iterator#7152

Merged
piiswrong merged 5 commits intoapache:masterfrom
ptrendx:new_io
Oct 14, 2017
Merged

[WIP] New faster version of the RecordIO iterator#7152
piiswrong merged 5 commits intoapache:masterfrom
ptrendx:new_io

Conversation

@ptrendx
Copy link
Member

@ptrendx ptrendx commented Jul 21, 2017

Does not yet support shuffle (ignores that setting)

@ptrendx
Copy link
Member Author

ptrendx commented Aug 29, 2017

Test again?

@piiswrong
Copy link
Contributor

piiswrong commented Oct 8, 2017

@zhreshold Please have a look.

DMLC_DECLARE_FIELD(path_imgrec).set_default("")
.describe("Path to the image RecordIO (.rec) file or a directory path. "\
"Created with tools/im2rec.py.");
DMLC_DECLARE_FIELD(path_imgidx).set_default("")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is path_imgidx required or optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional.

Copy link
Member

@zhreshold zhreshold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piiswrong LGTM
As discussed offline, I think there's potential improvement, especially the chunk shuffling part. However, this solution is working nicely and have proved to provide good results. So I guess we should take this in.
One more concern is the Turbo JPEG that will need to be addressed in the prebuilt packages. @szha

@szha
Copy link
Member

szha commented Oct 11, 2017

I can't seem to find the source code for TurboJPEG. Is it open source? I was able to find libjpeg-turbo, but it's not the same thing as TurboJPEG

@zhreshold
Copy link
Member

zhreshold commented Oct 11, 2017

@ptrendx
Seems like I am confused too.

@ptrendx
Copy link
Member Author

ptrendx commented Oct 11, 2017

Yes, it is libjpeg-turbo (https://github.com/libjpeg-turbo/libjpeg-turbo). It has 2 APIs though - libjpeg API and TurboJPEG API. I am using TurboJPEG API since it is more straightforward.

@szha
Copy link
Member

szha commented Oct 11, 2017

Thanks. Would you update the name of the flag to reflect this, such as USE_LIBJPEG_TURBO?

@ptrendx
Copy link
Member Author

ptrendx commented Oct 11, 2017

Sure, I will do that.

Added option for using libjpeg-turbo directly to decode images

ImageRecordIter can now use .idx files generated by im2rec.py

Added rec2idx.py utility to generate .idx files from .rec files

When using IndexedRecordIO (.rec and .idx together) shuffle option
performs global shuffling
@ptrendx
Copy link
Member Author

ptrendx commented Oct 13, 2017

Rebasing on current master.

@ptrendx
Copy link
Member Author

ptrendx commented Oct 14, 2017

@piiswrong It passed CI.

"""Returns the current position of read head.
"""
pos = ctypes.c_size_t()
check_call(_LIB.MXRecordIOReaderTell(self.handle, ctypes.byref(pos)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptrendx Please add this to recordio python API with a separate PR after release

@piiswrong piiswrong merged commit 65b2587 into apache:master Oct 14, 2017
@eric-haibin-lin
Copy link
Member

Does it print warnings when shuffle is set to True?

@ptrendx
Copy link
Member Author

ptrendx commented Oct 17, 2017

Are you talking about the comment that shuffle does not work yet? This was when I was still working on it and is no longer true.

crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017
* Improved ImageRecordIter performance

Added option for using libjpeg-turbo directly to decode images

ImageRecordIter can now use .idx files generated by im2rec.py

Added rec2idx.py utility to generate .idx files from .rec files

When using IndexedRecordIO (.rec and .idx together) shuffle option
performs global shuffling

* Add ASF license header

* Update dmlc-core to fix a bug on Windows

* USE_TURBO_JPEG -> USE_LIBJPEG_TURBO

* trigger test
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants