Plain Datumfile input format for minimum memory usage #2193

Closed
wants to merge 6 commits into from

Conversation


immars commented Mar 25, 2015

When training with LevelDB/LMDB, memory usage grows linearly with the number of iterations, and it grows even during the Next() calls that DataLayer makes for rand_skip.

Here's a simple plain datum file format, read and written via std::fstream, to address this issue.

Pros

  • RAM usage stays basically constant (<1G, on GoogLeNet with small batches) during training, as expected
  • datum file size falls between that of the LevelDB and LMDB formats
  • no noticeable impact on training speed, because of prefetching
  • concurrent reads from multiple processes

Cons

  • no random reads, but Caffe does not need random key-value access anyway.
immars added 3 commits Mar 24, 2015
@@ -181,6 +181,87 @@ class LMDB : public DB {
MDB_dbi mdb_dbi_;
};


#define MAX_BUF 10485760 // max entry size


sguada Mar 26, 2015

Contributor

Why this value?


immars Mar 27, 2015

Author

It guards against reading an overly large key_size or data_size from the file, for example because of file corruption.
I thought 10M per datum was large enough, or should it be larger? 100M?

in = NULL;
SeekToFirst();
}
virtual ~DatumFileCursor() {


sguada Mar 26, 2015

Contributor

Should it close the input file?
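
One way to sidestep this question, sketched below under the assumption that the cursor is the only owner of the stream: hold the `std::ifstream` by value instead of through the `std::ifstream*` member seen in the diff. The stream's own destructor then closes the file, so no explicit close or delete is needed. `FileCursorSketch` is a hypothetical, stripped-down class that shows only the ownership part.

```cpp
#include <fstream>
#include <string>

// Hypothetical, simplified cursor; not the DatumFileCursor from this PR.
class FileCursorSketch {
 public:
  explicit FileCursorSketch(const std::string& path)
      : in_(path.c_str(), std::ios::in | std::ios::binary) {}

  // No user-written destructor: when this object is destroyed, the
  // std::ifstream member is destroyed too, and that closes the file.
  bool is_open() const { return in_.is_open(); }

 private:
  std::ifstream in_;  // owned by value, so its lifetime matches the cursor's
};
```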


virtual void Next();

virtual string key() {


sguada Mar 26, 2015

Contributor

It should fail instead of giving a warning. For example:
CHECK(valid()) << "not valid state at key()";

return _key;
}
virtual string value() {
if (!valid()) {


sguada Mar 26, 2015

Contributor

Same as above

std::ifstream* in;
bool valid_;

string _key, _value;


sguada Mar 26, 2015

Contributor

Use the convention key_ value_ for private variables


void DatumFileCursor::Next() {
valid_ = false;
if (!in->is_open()) {


sguada Mar 26, 2015

Contributor

Should fail
CHECK(in->is_open()) << "file is not open!" << path;

uint32_t record_size, key_size, value_size;
in->read(reinterpret_cast<char*>(&record_size), sizeof record_size);
if (in->gcount() != (sizeof record_size) || record_size > MAX_BUF) {
if (!in->eof()) {


sguada Mar 26, 2015

Contributor

These are clear errors from which you cannot recover, so fail in these cases.

}
in->read(reinterpret_cast<char*>(&key_size), sizeof key_size);
if (in->gcount() != sizeof key_size || key_size > MAX_BUF) {
LOG(WARNING) << "key_size read error: gcount\t"


sguada Mar 26, 2015

Contributor

Same as above

_key.resize(key_size);
in->read(&_key[0], key_size);
if (in->gcount() != key_size) {
LOG(WARNING) << "key read error: gcount\t"


sguada Mar 26, 2015

Contributor

Same as above

}
in->read(reinterpret_cast<char*>(&value_size), sizeof value_size);
if (in->gcount() != sizeof value_size || value_size > MAX_BUF) {
LOG(WARNING) << "value_size read error: gcount\t"


sguada Mar 26, 2015

Contributor

Same as above

_value.resize(value_size);
in->read(&_value[0], value_size);
if (in->gcount() != value_size) {
LOG(WARNING) << "value read error: gcount\t"


sguada Mar 26, 2015

Contributor

Same as above

out->write(reinterpret_cast<char*>(&value_size), sizeof value_size);
out->write(value.data(), value_size);
} catch(std::ios_base::failure& e) {
LOG(WARNING) << "exception: "


sguada Mar 26, 2015

Contributor

Use LOG(FATAL) in this case

Author

immars commented Mar 27, 2015

Thanks for the review @sguada !

immars added 2 commits Mar 27, 2015
weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 14, 2015
Plain Datumfile input format for minimum memory usage
weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 16, 2015

weiliu89 commented Apr 16, 2015

@immars Thanks for the pull! I have been using it, and found that when I start N training jobs accessing the same datumfile, each one uses only 100/N % of CPU. Is that normal? I am not sure whether it will make training slower.

Author

immars commented Apr 18, 2015

@weiliu89 this should not be happening, at least not according to my tests. No locking is used, and the training process should not be I/O bound either. Are you running N processes? What does iostat -kx 1 show? Or nvidia-smi?

Member

shelhamer commented Apr 14, 2017

Closing as better addressed by the Python layer. There are many types of data, and as long as it can be handled in Python it can be handled as a Python layer.

shelhamer closed this Apr 14, 2017