Plain Datumfile input format for minimum memory usage #2193

Closed
wants to merge 6 commits into
base: master
from

4 participants
immars commented Mar 25, 2015

When training with leveldb/lmdb, memory usage grows linearly with the number of iterations, and even with the Next() calls that DataLayer makes for rand_skip.

Here's a simple plain datum file format, read via std::fstream, to address this issue.

Pros

  • RAM usage stays basically constant (<1G, on GoogLeNet with small batches) during training, as expected
  • datum file size is between that of the leveldb and lmdb formats
  • no noticeable impact on training speed because of prefetch
  • concurrent reads from multiple processes

Cons

  • no random read. But caffe does not need random key-value access anyway.
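
As a rough illustration of the format described above, here is a minimal sketch of a sequential, length-prefixed record file. The layout ([record_size][key_size][key][value_size][value], all sizes as host-endian uint32_t) is inferred from the order of reads in the diff excerpts; the meaning of record_size (assumed here to count every byte after itself) and the helper names WriteRecord and ReadRecord are assumptions, not the PR's actual code. Size bounds and error reporting are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: append one record to a binary stream.
// Assumed layout: [record_size][key_size][key][value_size][value].
// record_size is assumed to count all bytes that follow it.
inline void WriteRecord(std::ostream& out, const std::string& key,
                        const std::string& value) {
  const uint32_t key_size = static_cast<uint32_t>(key.size());
  const uint32_t value_size = static_cast<uint32_t>(value.size());
  const uint32_t record_size = static_cast<uint32_t>(
      sizeof key_size + key_size + sizeof value_size + value_size);
  out.write(reinterpret_cast<const char*>(&record_size), sizeof record_size);
  out.write(reinterpret_cast<const char*>(&key_size), sizeof key_size);
  out.write(key.data(), key_size);
  out.write(reinterpret_cast<const char*>(&value_size), sizeof value_size);
  out.write(value.data(), value_size);
}

// Hypothetical helper: read the next record in file order.
// Returns false at end of stream or on any short read.
inline bool ReadRecord(std::istream& in, std::string* key,
                       std::string* value) {
  uint32_t record_size = 0, key_size = 0, value_size = 0;
  if (!in.read(reinterpret_cast<char*>(&record_size), sizeof record_size))
    return false;
  if (!in.read(reinterpret_cast<char*>(&key_size), sizeof key_size))
    return false;
  key->resize(key_size);
  if (key_size > 0 && !in.read(&(*key)[0], key_size)) return false;
  if (!in.read(reinterpret_cast<char*>(&value_size), sizeof value_size))
    return false;
  value->resize(value_size);
  if (value_size > 0 && !in.read(&(*value)[0], value_size)) return false;
  return true;
}
```

Because each record is read front to back and only one key/value pair is held at a time, memory use does not depend on how many records have already been consumed, which is the property the description relies on.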
include/caffe/util/db.hpp
@@ -181,6 +181,87 @@ class LMDB : public DB {
MDB_dbi mdb_dbi_;
};
#define MAX_BUF 10485760 // max entry size

sguada Mar 26, 2015

Contributor

Why this value?

immars Mar 27, 2015

It guards against an oversized key_size or data_size being read from the file, for example because of file corruption.
I thought 10M for a datum is large enough, or maybe should be larger? 100M?
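
To make the purpose of the bound concrete: without it, a corrupted 4-byte length could request a resize of up to 4 GiB. The sketch below shows one way to validate a length prefix before using it. The helper name ReadBoundedSize and the constant kMaxEntryBytes are hypothetical; the value 10485760 (10 MiB) simply mirrors MAX_BUF from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>

// Illustrative bound, mirroring the 10 MiB MAX_BUF from the diff.
const uint32_t kMaxEntryBytes = 10485760;

// Hypothetical helper: read a uint32_t length prefix and accept it only
// if all four bytes were read and the value does not exceed max_bytes.
inline bool ReadBoundedSize(std::istream& in, uint32_t max_bytes,
                            uint32_t* size) {
  in.read(reinterpret_cast<char*>(size), sizeof *size);
  return in.gcount() == static_cast<std::streamsize>(sizeof *size) &&
         *size <= max_bytes;
}
```

Whether 10 MiB or 100 MiB is the right limit depends on the largest datum the dataset is expected to contain; the check itself is the same either way.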

include/caffe/util/db.hpp
in = NULL;
SeekToFirst();
}
virtual ~DatumFileCursor() {

sguada Mar 26, 2015

Contributor

Should it close the input file?
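
One way to sidestep the question is to own the stream by value instead of through a raw std::ifstream* pointer, so that the file is closed automatically when the cursor is destroyed. The class below is a hypothetical skeleton (FileCursorSketch is not a name from the PR) and assumes a C++11 or later standard library.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical cursor skeleton: the stream is a member held by value,
// so there is nothing to delete or close by hand.
class FileCursorSketch {
 public:
  explicit FileCursorSketch(const std::string& path)
      : in_(path.c_str(), std::ios::in | std::ios::binary) {}

  // No user-defined destructor is needed: the std::ifstream destructor
  // closes the underlying file when this object goes out of scope.

  bool is_open() const { return in_.is_open(); }

 private:
  std::ifstream in_;
};
```

If the cursor must keep a pointer for other reasons, an explicit close and delete in the destructor would achieve the same effect.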

virtual void Next();
virtual string key() {

sguada Mar 26, 2015

Contributor

It should fail instead of giving a warning. Ex:
CHECK(valid()) << "not valid state at key()";
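
A dependency-free sketch of the fail-fast accessor pattern suggested here. The Require function is a stand-in for glog's CHECK: glog logs the message and aborts the process, whereas this stand-in throws so that the behavior can be exercised in a test. CursorSketch and its Set method are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for glog's CHECK (which would abort rather than throw).
inline void Require(bool condition, const char* message) {
  if (!condition) throw std::logic_error(message);
}

// Hypothetical cursor that refuses to hand out data in an invalid state.
class CursorSketch {
 public:
  CursorSketch() : valid_(false) {}

  // Illustration only: marks the cursor valid with the given pair.
  void Set(const std::string& key, const std::string& value) {
    key_ = key;
    value_ = value;
    valid_ = true;
  }

  bool valid() const { return valid_; }

  std::string key() const {
    Require(valid(), "not valid state at key()");
    return key_;
  }

  std::string value() const {
    Require(valid(), "not valid state at value()");
    return value_;
  }

 private:
  bool valid_;
  std::string key_, value_;
};
```

The point of failing rather than warning is that a caller can never silently receive a stale or empty key.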

include/caffe/util/db.hpp
return _key;
}
virtual string value() {
if (!valid()) {

sguada Mar 26, 2015

Contributor

Same as above

include/caffe/util/db.hpp
std::ifstream* in;
bool valid_;
string _key, _value;

sguada Mar 26, 2015

Contributor

Use the trailing-underscore convention (key_, value_) for private member variables.

src/caffe/util/db.cpp
void DatumFileCursor::Next() {
valid_ = false;
if (!in->is_open()) {

sguada Mar 26, 2015

Contributor

Should fail
CHECK(in->is_open()) << "file is not open!" << path;

src/caffe/util/db.cpp
uint32_t record_size, key_size, value_size;
in->read(reinterpret_cast<char*>(&record_size), sizeof record_size);
if (in->gcount() != (sizeof record_size) || record_size > MAX_BUF) {
if (!in->eof()) {

sguada Mar 26, 2015

Contributor

These are clear errors from which you cannot recover, so fail in these cases.
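
A small sketch of the distinction at stake: reaching the end of the file with zero bytes read is a normal end of data, while a partially read header means the file is truncated or corrupt and should be treated as fatal. ReadHeader and ReadStatus are hypothetical names, and the thrown std::runtime_error stands in for a CHECK or LOG(FATAL) in Caffe.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>

enum ReadStatus { kOk, kEndOfFile };

// Hypothetical helper: read one uint32_t header.
// Zero bytes at EOF is a clean end of data; a partial header is
// unrecoverable, so it throws (stand-in for CHECK / LOG(FATAL)).
inline ReadStatus ReadHeader(std::istream& in, uint32_t* size) {
  in.read(reinterpret_cast<char*>(size), sizeof *size);
  const std::streamsize got = in.gcount();
  if (got == 0 && in.eof()) return kEndOfFile;
  if (got != static_cast<std::streamsize>(sizeof *size)) {
    throw std::runtime_error("truncated record header");
  }
  return kOk;
}
```

With this split, the caller can rewind or stop on kEndOfFile and never has to guess whether a warning in the log meant lost data.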

src/caffe/util/db.cpp
}
in->read(reinterpret_cast<char*>(&key_size), sizeof key_size);
if (in->gcount() != sizeof key_size || key_size > MAX_BUF) {
LOG(WARNING) << "key_size read error: gcount\t"

sguada Mar 26, 2015

Contributor

Same as above

src/caffe/util/db.cpp
_key.resize(key_size);
in->read(&_key[0], key_size);
if (in->gcount() != key_size) {
LOG(WARNING) << "key read error: gcount\t"

sguada Mar 26, 2015

Contributor

Same as above

src/caffe/util/db.cpp
}
in->read(reinterpret_cast<char*>(&value_size), sizeof value_size);
if (in->gcount() != sizeof value_size || value_size > MAX_BUF) {
LOG(WARNING) << "value_size read error: gcount\t"

sguada Mar 26, 2015

Contributor

Same as above

src/caffe/util/db.cpp
_value.resize(value_size);
in->read(&_value[0], value_size);
if (in->gcount() != value_size) {
LOG(WARNING) << "value read error: gcount\t"

sguada Mar 26, 2015

Contributor

Same as above

src/caffe/util/db.cpp
out->write(reinterpret_cast<char*>(&value_size), sizeof value_size);
out->write(value.data(), value_size);
} catch(std::ios_base::failure& e) {
LOG(WARNING) << "exception: "

sguada Mar 26, 2015

Contributor

Use LOG(FATAL) in this case
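
For context, a std::ofstream only throws std::ios_base::failure if exceptions have been enabled on it with exceptions(). The sketch below shows that mechanism with a hypothetical helper, WriteOrReport, which returns false and fills in an error string; in Caffe the catch block is where the suggested LOG(FATAL) would go instead of a warning. It assumes a toolchain where std::ios_base::failure can be caught reliably (some older GCC releases had ABI issues here).

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: write payload to path with stream exceptions
// enabled. Returns false and sets *error if opening or writing fails.
inline bool WriteOrReport(const std::string& path, const std::string& payload,
                          std::string* error) {
  std::ofstream out;
  // Make open/write/flush failures throw instead of failing silently.
  out.exceptions(std::ios::failbit | std::ios::badbit);
  try {
    out.open(path.c_str(), std::ios::out | std::ios::binary);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    out.flush();
  } catch (const std::ios_base::failure& e) {
    *error = e.what();
    return false;  // A fatal log would replace this in the reviewed code.
  }
  return true;
}
```

A failed write leaves the datum file in an unknown state, which is why treating it as fatal is the safer default.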

immars commented Mar 27, 2015

Thanks for the review @sguada !

weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 14, 2015

Merge pull request #2193 from immars/datumfile
Plain Datumfile input format for minimum memory usage

weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 16, 2015

weiliu89 commented Apr 16, 2015

@immars Thanks for the pull! I have been using it and found that when I start N training jobs accessing the same datumfile, each one uses only 100/N % of CPU. Is that normal? I am not sure whether it will slow down training.

immars commented Apr 18, 2015

@weiliu89 This should not be happening, at least not according to my tests. No locking is used, and the training process should not be IO-bound either. Are you running N processes? What does iostat -kx 1 show, or nvidia-smi?

Member

shelhamer commented Apr 14, 2017

Closing as better addressed by the Python layer. There are many types of data, and as long as it can be handled in Python it can be handled as a Python layer.

@shelhamer shelhamer closed this Apr 14, 2017
