
Mv sst fadvise #88

Merged: 9 commits into master, 2 participants

@matthewvon
Collaborator

This branch makes the posix_fadvise() calls more effective and fixes a race condition found in the file write operations. Branch notes are here: https://github.com/basho/leveldb/wiki/mv-sst-fadvise

@metadave commented on the diff
db/table_cache.cc
@@ -88,6 +88,7 @@ Status TableCache::FindTable(uint64_t file_number, uint64_t file_size, int level
}
else
{
+ // (later, call SetForCompaction here too for files already in cache)

do you mean that this is a future TODO?

@matthewvon Collaborator

yes. not in this branch.

@matthewvon Collaborator

There is also a commented-out posix_fadvise in env_posix.cc (line 119) that is related to this comment. The two interact heavily, and I do not have time to research that interaction fully right now. Hence, both lines are bypassed.

@metadave commented on the diff
include/leveldb/perf_count.h
@@ -189,6 +189,7 @@ enum PerformanceCountersEnum
ePerfThrottleBacklog1=64,//!< backlog at time of posting (level1+)
ePerfThrottleCompacts1=65,//!< number of level 1+ compactions
+ ePerfBGWriteError=66, //!< error in write/close, see syslog

just curious, what's the meaning of !< in the comment?

@matthewvon Collaborator

mostly habit. it is a parse tag for doxygen. maybe some day I will have time to really comment the code for the next person ... on-line manual too.

util/env_posix.cc
@@ -108,7 +113,12 @@ class PosixRandomAccessFile: public RandomAccessFile {
public:
PosixRandomAccessFile(const std::string& fname, int fd)
- : filename_(fname), fd_(fd), is_compaction_(false), file_size_(0) { }
+ : filename_(fname), fd_(fd), is_compaction_(false), file_size_(0)
+ {
+#if defined(HAVE_FADVISE)
+ // posix_fadvise(fd_, 0, file_size_, POSIX_FADV_RANDOM);

is this intentionally commented out?

include/leveldb/env.h
@@ -81,6 +82,16 @@ class Env {
virtual Status NewAppendableFile(const std::string& fname,
WritableFile** result) = 0;
+ // Riak specific:
+ // Derived from NewWritableFile. Sets flag that changes

from what I can tell, the flag that you are referring to is the last parameter of a call to:
PosixMmapFile(fname, fd, page_size_, 0, true);

Can you be more specific in your comment?

@matthewvon Collaborator

Hmm, I wonder if that comment should be usage-based instead of technology-based. Will change and check in.

@metadave commented on the diff
include/leveldb/env.h
@@ -81,6 +82,16 @@ class Env {
virtual Status NewAppendableFile(const std::string& fname,
WritableFile** result) = 0;
+ // Riak specific:
+ // Derived from NewWritableFile. Special version of
+ // NewWritableFile that enables write and close operations
+ // to execute on background threads (where supported).
+ //
+ // The returned file will only be accessed by one thread at a time.
+ virtual Status NewWriteOnlyFile(const std::string& fname,
+ WritableFile** result)
+ {return(NewWritableFile(fname, result));};

just double checking that you intend to call NewWritableFile from inside NewWriteOnlyFile (or at least explain why in a comment, other than "derived from NewWritableFile")

@matthewvon Collaborator

Comment was updated but is not reflected on this screen. See file's line-by-line diff to see new comment.
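The default under discussion is a non-pure virtual that simply forwards to `NewWritableFile()`, so any `Env` that does not override `NewWriteOnlyFile()` keeps the old behavior, while `PosixEnv` overrides it to return a file whose write and close work may run on a background thread. A stripped-down sketch of that pattern; `MiniEnv`, `PlainEnv`, `BackgroundEnv` and `MiniFile` are hypothetical stand-ins, not leveldb classes:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for leveldb::WritableFile.
struct MiniFile {
  explicit MiniFile(bool background) : background_io(background) {}
  bool background_io;  // true if write/close may run on a background thread
};

// Hypothetical stand-in for leveldb::Env.
class MiniEnv {
 public:
  virtual ~MiniEnv() {}
  virtual MiniFile* NewWritableFile(const std::string& fname) = 0;

  // Non-pure default: an environment that knows nothing about
  // background I/O falls back to the ordinary writable file.
  virtual MiniFile* NewWriteOnlyFile(const std::string& fname) {
    return NewWritableFile(fname);
  }
};

// An environment that does not override NewWriteOnlyFile().
class PlainEnv : public MiniEnv {
 public:
  MiniFile* NewWritableFile(const std::string&) override {
    return new MiniFile(false);
  }
};

// An environment that opts in, as PosixEnv does in this branch.
class BackgroundEnv : public PlainEnv {
 public:
  MiniFile* NewWriteOnlyFile(const std::string&) override {
    return new MiniFile(true);
  }
};
```

With this shape, a caller such as `DBImpl::MakeRoomForWrite()` can always ask for a write-only file and gets background I/O only where the environment supports it.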

@metadave commented on the diff
util/env_posix.cc
((25 lines not shown))
else
{
- Env::Default()->Schedule(&BGFileUnmapper2, ptr, 4);
+ if (and_close)
+ Env::Default()->Schedule(&BGFileCloser2, ptr, 4);

looks like that 4 refers to a specific state; can that be a named constant?

@matthewvon Collaborator
util/perf_count.cc
@@ -514,7 +514,8 @@
"ThrottleMicros1",
"ThrottleKeys1",
"ThrottleBacklog1",
- "ThrottleCompacts1"
+ "ThrottleCompacts1",
+ "BGWriteError"

whitespace

@metadave commented on the diff
util/env_posix.cc
@@ -509,6 +545,18 @@ class PosixEnv : public Env {
return s;
}
+ virtual Status NewWriteOnlyFile(const std::string& fname,
+ WritableFile** result) {
+ Status s;
+ const int fd = open(fname.c_str(), O_CREAT | O_RDWR | O_TRUNC, 0644);

wouldn't you want O_WRONLY instead of O_RDWR?

@matthewvon Collaborator
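A likely reason `O_RDWR` is needed here: `NewWriteOnlyFile()` returns a `PosixMmapFile`, which writes through a `MAP_SHARED` mapping, and `mmap()` requires the descriptor to be open for reading even when only `PROT_WRITE` is requested (it fails with `EACCES` otherwise). "Write-only" describes how leveldb uses the file, not the `open()` mode. A small Linux check of that behavior, using a hypothetical helper `MapForWrite()` and a scratch file under `/tmp`:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: create/truncate `path` with the given access mode
// (O_RDWR or O_WRONLY), size it to `len` bytes, then try to map it
// MAP_SHARED for writing, as an mmap-based writable file does.
// Returns 0 if the mapping succeeded, otherwise the errno value.
int MapForWrite(const char* path, int access_mode, size_t len) {
  int fd = open(path, O_CREAT | O_TRUNC | access_mode, 0644);
  if (fd < 0) return errno;
  if (ftruncate(fd, static_cast<off_t>(len)) != 0) {
    int err = errno;
    close(fd);
    return err;
  }
  void* base = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
  int result = 0;
  if (base == MAP_FAILED) {
    result = errno;  // EACCES when fd was not opened for reading
  } else {
    static_cast<char*>(base)[0] = 'x';  // write through the mapping
    munmap(base, len);
  }
  close(fd);
  return result;
}
```

Calling it with `O_RDWR` succeeds, while the same call with `O_WRONLY` returns `EACCES`.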
util/env_posix.cc
((27 lines not shown))
#endif
if (0 != file_ptr->unused_)
- ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ {
+ ret_val=ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ if (0!=ret_val)

whitespace messed up for this block

util/env_posix.cc
((18 lines not shown))
#if defined(HAVE_FADVISE)
- posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
+ // release newly written data from Linux page cache if possible
+ if (0==file_ptr->metadata_
+ || (file_ptr->offset_ + file_ptr->length_ < file_ptr->metadata_))
+ {
+ // must fdatasync for DONTNEED to work
+ ret_val=fdatasync(file_ptr->fd_);
+ if (0!=ret_val)

whitespace screwed up on all these if statements

@matthewvon Collaborator
@metadave

@matthewvon looks good, misc (non-critical) formatting issues.
+1 if you address the comment regarding O_WRONLY vs. O_RDWR

@matthewvon merged commit 6422b67 into master
Commits on Jul 11, 2013
  1. @matthewvon: clean up fadvise use on WritableFile. clean up foreground/background selection of operations in PosixMmapFile.
  2. @matthewvon
Commits on Jul 12, 2013
  1. @matthewvon
  2. @matthewvon: Reverse the pattern. Default is same thread like .sst. Setup recovery log and others to use background.
  3. @matthewvon
Commits on Jul 16, 2013
  1. @matthewvon: Clear gcc variable not used warning, but keep call to h->key() incase the call creates side effects.
Commits on Jul 17, 2013
  1. @matthewvon: Add fallback error reporting to syslog and to new perf_counter. Enable SetForCompaction() in this code base (though not for an already open file).
Commits on Jul 18, 2013
  1. @matthewvon
Commits on Jul 22, 2013
  1. @matthewvon
db/db_impl.cc (4 lines changed)
@@ -1673,7 +1673,7 @@ Status DBImpl::MakeRoomForWrite(bool force) {
uint64_t new_log_number = versions_->NewFileNumber();
WritableFile* lfile = NULL;
gPerfCounters->Inc(ePerfWriteNewMem);
- s = env_->NewWritableFile(LogFileName(dbname_, new_log_number), &lfile);
+ s = env_->NewWriteOnlyFile(LogFileName(dbname_, new_log_number), &lfile);
if (!s.ok()) {
// Avoid chewing through file number space in a tight loop.
versions_->ReuseFileNumber(new_log_number);
@@ -1820,7 +1820,7 @@ Status DB::Open(const Options& options, const std::string& dbname,
if (s.ok()) {
uint64_t new_log_number = impl->versions_->NewFileNumber();
WritableFile* lfile;
- s = options.env->NewWritableFile(LogFileName(dbname, new_log_number),
+ s = options.env->NewWriteOnlyFile(LogFileName(dbname, new_log_number),
&lfile);
if (s.ok()) {
edit.SetLogNumber(new_log_number);
db/table_cache.cc (3 lines changed)
@@ -88,6 +88,7 @@ Status TableCache::FindTable(uint64_t file_number, uint64_t file_size, int level
}
else
{
+ // (later, call SetForCompaction here too for files already in cache)
gPerfCounters->Inc(ePerfTableCached);
} // else
return s;
@@ -103,7 +104,7 @@ Iterator* TableCache::NewIterator(const ReadOptions& options,
}
Cache::Handle* handle = NULL;
- Status s = FindTable(file_number, file_size, level, &handle);
+ Status s = FindTable(file_number, file_size, level, &handle, options.IsCompaction());
if (!s.ok()) {
return NewErrorIterator(s);
}
include/leveldb/env.h (19 lines changed)
@@ -71,6 +71,7 @@ class Env {
virtual Status NewWritableFile(const std::string& fname,
WritableFile** result) = 0;
+ // Riak specific:
// Derived from NewWritableFile. One change: if the file exists,
// move to the end of the file and continue writing.
// new file. On success, stores a pointer to the open file in
@@ -81,6 +82,16 @@ class Env {
virtual Status NewAppendableFile(const std::string& fname,
WritableFile** result) = 0;
+ // Riak specific:
+ // Allows for virtualized version of NewWritableFile that enables write
+ // and close operations to execute on background threads
+ // (where platform supported).
+ //
+ // The returned file will only be accessed by one thread at a time.
+ virtual Status NewWriteOnlyFile(const std::string& fname,
+ WritableFile** result)
+ {return(NewWritableFile(fname, result));};
+
// Returns true iff the named file exists.
virtual bool FileExists(const std::string& fname) = 0;
@@ -233,6 +244,11 @@ class WritableFile {
virtual Status Flush() = 0;
virtual Status Sync() = 0;
+ // Riak specific:
+ // Provide hint where key/value data ends and metadata starts
+ // in an .sst table file.
+ virtual void SetMetadataOffset(uint64_t) {};
+
private:
// No copying allowed
WritableFile(const WritableFile&);
@@ -318,6 +334,9 @@ class EnvWrapper : public Env {
Status NewAppendableFile(const std::string& f, WritableFile** r) {
return target_->NewAppendableFile(f, r);
}
+ Status NewWriteOnlyFile(const std::string& f, WritableFile** r) {
+ return target_->NewWriteOnlyFile(f, r);
+ }
bool FileExists(const std::string& f) { return target_->FileExists(f); }
Status GetChildren(const std::string& dir, std::vector<std::string>* r) {
return target_->GetChildren(dir, r);
include/leveldb/perf_count.h (1 line changed)
@@ -189,6 +189,7 @@ enum PerformanceCountersEnum
ePerfThrottleBacklog1=64,//!< backlog at time of posting (level1+)
ePerfThrottleCompacts1=65,//!< number of level 1+ compactions
+ ePerfBGWriteError=66, //!< error in write/close, see syslog
// must follow last index name to represent size of array
// (ASSUMES previous enum is highest value)
table/table_builder.cc (3 lines changed)
@@ -228,6 +228,9 @@ Status TableBuilder::Finish() {
BlockHandle filter_block_handle, metaindex_block_handle, index_block_handle,
sst_stats_block_handle;
+ // pass hint to Linux fadvise management
+ r->file->SetMetadataOffset(r->offset);
+
// Write filter block
if (ok() && r->filter_block != NULL) {
WriteRawBlock(r->filter_block->Finish(), kNoCompression,
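`TableBuilder::Finish()` passes `r->offset`, the file position at the point where the metadata blocks (filter, metaindex, index and so on) are about to be written, so the file knows where key/value data ends. Because `WritableFile::SetMetadataOffset()` has an empty default body, only files that use the hint (here `PosixMmapFile`) need to override it. A hypothetical miniature of that contract; the class and function names are illustrative, and the real builder appends data blocks during `Add()` rather than in one call:

```cpp
#include <cassert>
#include <stdint.h>
#include <string>

// Hypothetical stand-in for leveldb::WritableFile.
class MiniWritableFile {
 public:
  virtual ~MiniWritableFile() {}
  void Append(const std::string& data) { size_ += data.size(); }
  virtual void SetMetadataOffset(uint64_t) {}  // default: ignore the hint
  uint64_t size() const { return size_; }

 private:
  uint64_t size_ = 0;
};

// A file that, like PosixMmapFile in this branch, remembers the hint.
class HintedFile : public MiniWritableFile {
 public:
  void SetMetadataOffset(uint64_t offset) override { metadata_offset_ = offset; }
  uint64_t metadata_offset() const { return metadata_offset_; }

 private:
  uint64_t metadata_offset_ = 0;  // zero means "no hint recorded"
};

// Hypothetical miniature of TableBuilder::Finish(): data first, then the
// hint, then the metadata blocks.
void FinishTable(MiniWritableFile* file, const std::string& data_blocks,
                 const std::string& metadata_blocks) {
  file->Append(data_blocks);
  file->SetMetadataOffset(file->size());  // where metadata will start
  file->Append(metadata_blocks);
}
```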
util/cache.cc (2 lines changed)
@@ -116,7 +116,7 @@ class HandleTable {
LRUHandle* h = list_[i];
while (h != NULL) {
LRUHandle* next = h->next_hash;
- Slice key = h->key();
+ /*Slice key =*/ h->key(); // eliminate unused var warning, but allow for side-effects
uint32_t hash = h->hash;
LRUHandle** ptr = &new_list[hash & (new_length - 1)];
h->next_hash = *ptr;
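The `/*Slice key =*/` edit keeps the call to `h->key()` in case it has side effects while removing the unused local that triggered the gcc warning. An explicit cast to `void` is a common alternative idiom that states the intent in code; a sketch with a hypothetical `Handle` whose `key()` counts its calls:

```cpp
#include <cassert>

// Hypothetical counter standing in for whatever side effect key() might have.
static int g_key_calls = 0;

struct Handle {
  int key() const {
    ++g_key_calls;
    return 42;
  }
};

void Rehash(const Handle* h) {
  // The call is still evaluated; casting the result to void marks it as
  // deliberately discarded and keeps compilers quiet about it.
  (void)h->key();
}
```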
util/env_posix.cc (208 lines changed)
@@ -11,6 +11,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include <syslog.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
@@ -60,9 +61,13 @@ struct BGCloseInfo
size_t offset_;
size_t length_;
size_t unused_;
+ uint64_t metadata_;
- BGCloseInfo(int fd, void * base, size_t offset, size_t length, size_t unused)
- : fd_(fd), base_(base), offset_(offset), length_(length), unused_(unused) {};
+ BGCloseInfo(int fd, void * base, size_t offset, size_t length,
+ size_t unused, uint64_t metadata)
+ : fd_(fd), base_(base), offset_(offset), length_(length),
+ unused_(unused), metadata_(metadata)
+ {};
};
class PosixSequentialFile: public SequentialFile {
@@ -108,7 +113,15 @@ class PosixRandomAccessFile: public RandomAccessFile {
public:
PosixRandomAccessFile(const std::string& fname, int fd)
- : filename_(fname), fd_(fd), is_compaction_(false), file_size_(0) { }
+ : filename_(fname), fd_(fd), is_compaction_(false), file_size_(0)
+ {
+#if defined(HAVE_FADVISE)
+ // Currently hurts performance instead of helps. Likely
+ // requires better interaction with tables already in cache
+ // that start compaction. See comment in table_cache.cc.
+ // posix_fadvise(fd_, 0, file_size_, POSIX_FADV_RANDOM);
+#endif
+ }
virtual ~PosixRandomAccessFile()
{
if (is_compaction_)
@@ -137,6 +150,9 @@ class PosixRandomAccessFile: public RandomAccessFile {
{
is_compaction_=true;
file_size_=file_size;
+#if defined(HAVE_FADVISE)
+ posix_fadvise(fd_, 0, file_size_, POSIX_FADV_SEQUENTIAL);
+#endif
};
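`SetForCompaction()` now advises the kernel that a compaction input file will be read front to back; on Linux, `POSIX_FADV_SEQUENTIAL` enlarges the readahead window for that open file. A minimal sketch of the same call; `AdviseSequential()` is a hypothetical name, and it returns the function's result because `posix_fadvise()` reports failure through its return value rather than `errno`:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper mirroring the SetForCompaction() change: tell the
// kernel that bytes [0, file_size) of `fd` will be read sequentially.
// Returns 0 on success or the error number posix_fadvise() returned.
int AdviseSequential(int fd, off_t file_size) {
  assert(file_size >= 0);
  return posix_fadvise(fd, 0, file_size, POSIX_FADV_SEQUENTIAL);
}
```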
@@ -163,7 +179,7 @@ class PosixMmapReadableFile: public RandomAccessFile {
}
virtual ~PosixMmapReadableFile()
{
- BGCloseInfo * ptr=new BGCloseInfo(fd_, mmapped_region_, 0, length_, 0);
+ BGCloseInfo * ptr=new BGCloseInfo(fd_, mmapped_region_, 0, length_, 0, 0);
Env::Default()->Schedule(&BGFileCloser, ptr, 4);
};
@@ -195,9 +211,9 @@ class PosixMmapFile : public WritableFile {
char* dst_; // Where to write next (in range [base_,limit_])
char* last_sync_; // Where have we synced up to
uint64_t file_offset_; // Offset of base_ in file
-
- // Have we done an munmap of unsynced data?
- bool pending_sync_;
+ uint64_t metadata_offset_; // Offset where sst metadata starts, or zero
+ bool pending_sync_; // Have we done an munmap of unsynced data?
+ bool is_write_only_; // can this file process in background
// Roundup x to a multiple of y
static size_t Roundup(size_t x, size_t y) {
@@ -218,19 +234,32 @@ class PosixMmapFile : public WritableFile {
pending_sync_ = true;
}
- BGCloseInfo * ptr=new BGCloseInfo(fd_, base_, file_offset_, limit_-base_, limit_-dst_);
- if (and_close)
+ BGCloseInfo * ptr=new BGCloseInfo(fd_, base_, file_offset_, limit_-base_,
+ limit_-dst_, metadata_offset_);
+
+ // write only files can perform operations async, but not
+ // files that might re-open and read again soon
+ if (!is_write_only_)
{
- // do this in foreground unfortunately, bug where file not
- // closed fast enough for reopen
- BGFileCloser2(ptr);
- fd_=-1;
+ if (and_close)
+ BGFileCloser2(ptr);
+ else
+ BGFileUnmapper2(ptr);
} // if
+
+ // called from user thread, move these operations to background
+ // queue
else
{
- Env::Default()->Schedule(&BGFileUnmapper2, ptr, 4);
+ if (and_close)
+ Env::Default()->Schedule(&BGFileCloser2, ptr, 4);
+ else
+ Env::Default()->Schedule(&BGFileUnmapper2, ptr, 4);
} // else
+ if (and_close)
+ fd_=-1;
+
file_offset_ += limit_ - base_;
base_ = NULL;
limit_ = NULL;
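The reworked unmap/close path picks a thread per file: files that may be reopened and read soon (such as `.sst` tables opened through `NewWritableFile()`) finish the work inline, while files from `NewWriteOnlyFile()` (the logs in `db_impl.cc`) queue it. A reduced sketch of that decision; `CloseWork`, `Scheduler` and `FinishRegion()` are hypothetical stand-ins for `BGCloseInfo`, `Env::Default()->Schedule(..., 4)` and the surrounding `PosixMmapFile` code:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for BGCloseInfo (the real struct also carries the
// mapping base, offsets, unused tail and metadata offset).
struct CloseWork {
  int fd;
};

typedef void (*WorkFn)(CloseWork*);

// Hypothetical stand-in for Env::Default()->Schedule(fn, arg, 4).
typedef std::function<void(WorkFn, CloseWork*)> Scheduler;

static std::vector<std::string> g_ran;  // records which routine ran, in order

static void Closer(CloseWork* w) { g_ran.push_back("close"); delete w; }
static void Unmapper(CloseWork* w) { g_ran.push_back("unmap"); delete w; }

// Write-only files queue the work; files that may be reopened and read
// soon (e.g. .sst tables) finish it on the calling thread.
void FinishRegion(CloseWork* work, bool and_close, bool is_write_only,
                  const Scheduler& schedule) {
  WorkFn fn = and_close ? &Closer : &Unmapper;
  if (!is_write_only)
    fn(work);            // foreground: complete before returning
  else
    schedule(fn, work);  // background: the caller does not wait
}
```

Keeping `.sst` closes inline matches the comment removed in this hunk about a file not being closed fast enough for reopen, and the commit note "Default is same thread like .sst."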
@@ -277,7 +306,8 @@ class PosixMmapFile : public WritableFile {
public:
PosixMmapFile(const std::string& fname, int fd,
- size_t page_size, size_t file_offset=0L)
+ size_t page_size, size_t file_offset=0L,
+ bool is_write_only=false)
: filename_(fname),
fd_(fd),
page_size_(page_size),
@@ -287,13 +317,14 @@ class PosixMmapFile : public WritableFile {
dst_(NULL),
last_sync_(NULL),
file_offset_(file_offset),
- pending_sync_(false) {
+ metadata_offset_(0),
+ pending_sync_(false),
+ is_write_only_(is_write_only) {
assert((page_size & (page_size - 1)) == 0);
gPerfCounters->Inc(ePerfRWFileOpen);
}
-
~PosixMmapFile() {
if (fd_ >= 0) {
PosixMmapFile::Close();
@@ -364,6 +395,11 @@ class PosixMmapFile : public WritableFile {
return s;
}
+
+ virtual void SetMetadataOffset(uint64_t Metadata)
+ {
+ metadata_offset_=Metadata;
+ } // SetMetadataOffset
};
@@ -480,7 +516,7 @@ class PosixEnv : public Env {
*result = NULL;
s = IOError(fname, errno);
} else {
- *result = new PosixMmapFile(fname, fd, page_size_);
+ *result = new PosixMmapFile(fname, fd, page_size_, 0, false);
}
return s;
}
@@ -509,6 +545,18 @@ class PosixEnv : public Env {
return s;
}
+ virtual Status NewWriteOnlyFile(const std::string& fname,
+ WritableFile** result) {
+ Status s;
+ const int fd = open(fname.c_str(), O_CREAT | O_RDWR | O_TRUNC, 0644);
+ if (fd < 0) {
+ *result = NULL;
+ s = IOError(fname, errno);
+ } else {
+ *result = new PosixMmapFile(fname, fd, page_size_, 0, true);
+ }
+ return s;
+ }
virtual bool FileExists(const std::string& fname) {
@@ -1005,22 +1053,46 @@ void PosixEnv::StartThread(void (*function)(void* arg), void* arg) {
void BGFileCloser(void * arg)
{
BGCloseInfo * file_ptr;
+ bool err_flag;
+ int ret_val;
+ err_flag=false;
file_ptr=(BGCloseInfo *)arg;
- munmap(file_ptr->base_, file_ptr->length_);
+ ret_val=munmap(file_ptr->base_, file_ptr->length_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser munmap failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
#if defined(HAVE_FADVISE)
- posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_DONTNEED);
+ ret_val=posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_DONTNEED);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser posix_fadvise DONTNEED failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+
#endif
if (0 != file_ptr->unused_)
- ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ {
+ ret_val=ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser ftruncate failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+ }
close(file_ptr->fd_);
delete file_ptr;
gPerfCounters->Inc(ePerfROFileClose);
+ if (err_flag)
+ gPerfCounters->Inc(ePerfBGWriteError);
+
return;
} // BGFileCloser
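One point worth checking in the new error reporting: `munmap()`, `ftruncate()` and `fdatasync()` return -1 and set `errno`, so logging `errno` or `%m` after them is correct, but POSIX specifies that `posix_fadvise()` returns the error number itself, so `errno` after a failed advise call may describe an unrelated earlier failure. A sketch that reports the returned code instead; `AdviseOrLog()` is a hypothetical helper and writes to `stderr` where the branch uses `syslog()`:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/types.h>

// Hypothetical helper: issue an advise call and, on failure, report the
// error number that posix_fadvise() returned (it does not rely on errno).
// Returns true on success.
bool AdviseOrLog(int fd, off_t offset, off_t length, int advice,
                 const char* tag) {
  int ret = posix_fadvise(fd, offset, length, advice);
  if (ret != 0) {
    std::fprintf(stderr, "%s posix_fadvise failed [%d, %s]\n",
                 tag, ret, std::strerror(ret));
    return false;
  }
  return true;
}
```

The boolean result can be folded into the existing `err_flag` that drives the `ePerfBGWriteError` counter.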
@@ -1029,23 +1101,68 @@ void BGFileCloser(void * arg)
void BGFileCloser2(void * arg)
{
BGCloseInfo * file_ptr;
+ bool err_flag;
+ int ret_val;
+ err_flag=false;
file_ptr=(BGCloseInfo *)arg;
- munmap(file_ptr->base_, file_ptr->length_);
+ ret_val=munmap(file_ptr->base_, file_ptr->length_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser2 munmap failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+
#if defined(HAVE_FADVISE)
- posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
+ // release newly written data from Linux page cache if possible
+ if (0==file_ptr->metadata_
+ || (file_ptr->offset_ + file_ptr->length_ < file_ptr->metadata_))
+ {
+ // must fdatasync for DONTNEED to work
+ ret_val=fdatasync(file_ptr->fd_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser2 fdatasync failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+
+ ret_val=posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_DONTNEED);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser2 posix_fadvise DONTNEED failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+ } // if
+ else
+ {
+ ret_val=posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser2 posix_fadvise WILLNEED failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+ } // else
#endif
if (0 != file_ptr->unused_)
- ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
-
+ {
+ ret_val=ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileCloser2 ftruncate failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+ }
close(file_ptr->fd_);
delete file_ptr;
gPerfCounters->Inc(ePerfRWFileClose);
+ if (err_flag)
+ gPerfCounters->Inc(ePerfBGWriteError);
+
return;
} // BGFileCloser2
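The new branch in `BGFileCloser2()` (repeated in `BGFileUnmapper2()`) drops freshly written pages from the page cache only when the region ends before the offset recorded by `SetMetadataOffset()`, or when no offset was recorded (zero, as for log files). A region that reaches the metadata gets `WILLNEED` instead, presumably because the index and filter blocks are read back when the table is opened. Per the comment in the diff, `fdatasync()` runs first because `DONTNEED` does not drop dirty pages. A sketch of that rule; `ChooseAdvice()` and `ReleaseOrKeep()` are hypothetical names:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

// Which advice to give for a just-written region [offset, offset+length).
// metadata == 0 means "no hint recorded" (e.g. a log file).
int ChooseAdvice(uint64_t offset, uint64_t length, uint64_t metadata) {
  if (metadata == 0 || offset + length < metadata)
    return POSIX_FADV_DONTNEED;  // key/value data only: release from cache
  return POSIX_FADV_WILLNEED;    // reaches metadata: keep it cached
}

// Apply the advice; dirty pages are flushed first so DONTNEED can drop them.
// Returns 0 on success, otherwise an error number.
int ReleaseOrKeep(int fd, uint64_t offset, uint64_t length, uint64_t metadata) {
  int advice = ChooseAdvice(offset, length, metadata);
  if (advice == POSIX_FADV_DONTNEED && fdatasync(fd) != 0)
    return errno;
  return posix_fadvise(fd, static_cast<off_t>(offset),
                       static_cast<off_t>(length), advice);
}
```

Note the strict `<`: a region that ends exactly at the metadata offset is treated as touching metadata and gets `WILLNEED`.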
@@ -1076,18 +1193,55 @@ void BGFileUnmapper(void * arg)
void BGFileUnmapper2(void * arg)
{
BGCloseInfo * file_ptr;
+ bool err_flag;
+ int ret_val;
+ err_flag=false;
file_ptr=(BGCloseInfo *)arg;
- munmap(file_ptr->base_, file_ptr->length_);
+ ret_val=munmap(file_ptr->base_, file_ptr->length_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileUnmapper2 munmap failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
#if defined(HAVE_FADVISE)
- posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
+ if (0==file_ptr->metadata_
+ || (file_ptr->offset_ + file_ptr->length_ < file_ptr->metadata_))
+ {
+ // must fdatasync for DONTNEED to work
+ ret_val=fdatasync(file_ptr->fd_);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileUnmapper2 fdatasync failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+
+ ret_val=posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_DONTNEED);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileUnmapper2 posix_fadvise DONTNEED failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+ } // if
+ else
+ {
+ ret_val=posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
+ if (0!=ret_val)
+ {
+ syslog(LOG_ERR,"BGFileUnmapper2 posix_fadvise WILLNEED failed [%d, %m]", errno);
+ err_flag=true;
+ } // if
+ } // else
#endif
delete file_ptr;
gPerfCounters->Inc(ePerfRWFileUnmap);
+ if (err_flag)
+ gPerfCounters->Inc(ePerfBGWriteError);
+
return;
} // BGFileUnmapper2
util/perf_count.cc (3 lines changed)
@@ -514,7 +514,8 @@ PerformanceCounters * gPerfCounters(&LocalStartupCounters);
"ThrottleMicros1",
"ThrottleKeys1",
"ThrottleBacklog1",
- "ThrottleCompacts1"
+ "ThrottleCompacts1",
+ "BGWriteError"
};