Skip to content

[WIP] Indexed recordIO support in InputSplit#289

Merged
piiswrong merged 7 commits intodmlc:masterfrom
ptrendx:indexed_recordio
Aug 15, 2017
Merged

[WIP] Indexed recordIO support in InputSplit#289
piiswrong merged 7 commits intodmlc:masterfrom
ptrendx:indexed_recordio

Conversation

@ptrendx
Copy link
Contributor

@ptrendx ptrendx commented Jul 26, 2017

Not complete yet, don't merge

@piiswrong
Copy link
Member

@tqchen

unsigned num_parts,
const char *type,
const bool shuffle = false,
const int seed = 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure defaulting seed to 0 is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is added to magic number inside indexedrecordioinputsplit. It is actually the second as in image iterator in MxNet (seed defaults to 0 + magic).

const char *type,
const bool shuffle = false,
const int seed = 0,
const size_t batch_size = 256);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch_size shouldn't have a default value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a hint (like the HintChunkSize) and only IndexedRecordioInputSplit actually uses that. Readers can give back any number of samples anyway so consumers of the read data need to be prepared to handle that.

}
virtual void HintChunkSize(size_t chunk_size) {
buffer_size_ = std::max(chunk_size / sizeof(size_t), buffer_size_);
buffer_size_ = std::max(chunk_size / sizeof(uint32_t), buffer_size_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though recordio is aligned to 4 bytes, reader was reading to chunk's data vector which was size_t (so 8 bytes aligned). This is problematic, because there is no guarantee that recordio entry is aligned to 8 bytes. That's why I changed chunk's internal data vector to uint32_t

@tqchen
Copy link
Member

tqchen commented Aug 3, 2017

can you comment a bit about index file format and mechanism, e.g. does every shard need to load the entire index file, etc,

@ptrendx
Copy link
Contributor Author

ptrendx commented Aug 3, 2017

It is very simple file format (line per entry with 2 integers - entry id and offset) introduced in apache/mxnet#2887 a year ago, but not actually used anywhere (although im2rec.py script actually generates the index file alongside recordIO file).
Each worker needs the whole index file, then each worker is responsible for only its own portion of the indexes.
Shuffle introduces global shuffling, but only inside a worker part (so that there is no need to share seeds between workers etc.). I thought about making it so that groups of few entries are shuffled in case the performance is bad, but at least on systems I tested it on it was fine as-is. This should really improve the shuffle option in MXNet recordio iterator, because right now it shuffles only inside a single chunk, which often is less than or comparable to single batch (so there is effectively no shuffling).
Motivation for this PR is twofold:

  • I introduced new much faster IO pipeline in MXNet ([WIP] New faster version of the RecordIO iterator apache/mxnet#7152) which unfortunately makes it hard to implement shuffling inside iterator, so shuffling needs to be done on lower level.
  • having InputSplit return precisely the batch size of images makes the IO pipeline even faster, because the overflow storage is not used, so there are no unnecessary copies being made, and also the cost of OpenMP synchronization between threads is paid only once per batch (In my tests on DGX-1 I went from 6.2k imgs/s to 7k imgs/s when testing just IO on RN50 pipeline).


line_split.o: src/io/line_split.cc
recordio_split.o: src/io/recordio_split.cc
indexed_recordio_split.o: src/io/indexed_recordio_split.cc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may need to add this to amalgamation

@piiswrong piiswrong merged commit 6fd2d23 into dmlc:master Aug 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants