Mv sst fadvise #88

Merged
merged 9 commits into from Jul 22, 2013

Projects

None yet

2 participants

@matthewvon
Collaborator

This branch makes the posix_fadvise() calls more effective and fixes a race condition found in the file write operations. Branch notes are here: https://github.com/basho/leveldb/wiki/mv-sst-fadvise

@metadave metadave commented on the diff Jul 18, 2013
db/table_cache.cc
@@ -88,6 +88,7 @@ Status TableCache::FindTable(uint64_t file_number, uint64_t file_size, int level
}
else
{
+ // (later, call SetForCompaction here too for files already in cache)
@metadave
metadave Jul 18, 2013

do you mean that this is a future TODO?

@matthewvon
matthewvon Jul 18, 2013 collaborator

yes. not in this branch.

@matthewvon
matthewvon Jul 18, 2013 collaborator

There is also a commented out posix_fadvise in env_posix.cc (line 119) that is related to this comment. There is a heavy interaction between the two that I do not have time to completely research at this time. Hence, bypassing both lines.

@metadave metadave commented on the diff Jul 18, 2013
include/leveldb/perf_count.h
@@ -189,6 +189,7 @@ enum PerformanceCountersEnum
ePerfThrottleBacklog1=64,//!< backlog at time of posting (level1+)
ePerfThrottleCompacts1=65,//!< number of level 1+ compactions
+ ePerfBGWriteError=66, //!< error in write/close, see syslog
@metadave
metadave Jul 18, 2013

just curious, what's the meaning of !< in the comment?

@matthewvon
matthewvon Jul 18, 2013 collaborator

mostly habit. it is a parse tag for doxygen. maybe some day I will have time to really comment the code for the next person ... on-line manual too.

@metadave metadave commented on an outdated diff Jul 18, 2013
util/env_posix.cc
@@ -108,7 +113,12 @@ class PosixRandomAccessFile: public RandomAccessFile {
public:
PosixRandomAccessFile(const std::string& fname, int fd)
- : filename_(fname), fd_(fd), is_compaction_(false), file_size_(0) { }
+ : filename_(fname), fd_(fd), is_compaction_(false), file_size_(0)
+ {
+#if defined(HAVE_FADVISE)
+ // posix_fadvise(fd_, 0, file_size_, POSIX_FADV_RANDOM);
@metadave
metadave Jul 18, 2013

is this intentionally commented out?

@metadave metadave and 1 other commented on an outdated diff Jul 18, 2013
include/leveldb/env.h
@@ -81,6 +82,16 @@ class Env {
virtual Status NewAppendableFile(const std::string& fname,
WritableFile** result) = 0;
+ // Riak specific:
+ // Derived from NewWritableFile. Sets flag that changes
@metadave
metadave Jul 18, 2013

from what I can tell, the flag that you are referring to is the last parameter of a call to:
PosixMmapFile(fname, fd, page_size_, 0, true);

Can you be more specific in your comment?

@matthewvon
matthewvon Jul 18, 2013 collaborator

Hmm, wonder if that comment should be more usage based instead of technology based. Will change and check-in.

@metadave metadave commented on the diff Jul 19, 2013
include/leveldb/env.h
@@ -81,6 +82,16 @@ class Env {
virtual Status NewAppendableFile(const std::string& fname,
WritableFile** result) = 0;
+ // Riak specific:
+ // Derived from NewWritableFile. Special version of
+ // NewWritableFile that enables write and close operations
+ // to execute on background threads (where supported).
+ //
+ // The returned file will only be accessed by one thread at a time.
+ virtual Status NewWriteOnlyFile(const std::string& fname,
+ WritableFile** result)
+ {return(NewWritableFile(fname, result));};
@metadave
metadave Jul 19, 2013

just double checking that you intend to call NewWritableFile from inside NewWriteOnlyFile (or at least explain why in a comment, other than "derived from NewWritableFile")

@matthewvon
matthewvon Jul 22, 2013 collaborator

Comment was updated but is not reflected on this screen. See file's line-by-line diff to see new comment.

@metadave metadave commented on the diff Jul 19, 2013
util/env_posix.cc
else
{
- Env::Default()->Schedule(&BGFileUnmapper2, ptr, 4);
+ if (and_close)
+ Env::Default()->Schedule(&BGFileCloser2, ptr, 4);
@metadave
metadave Jul 19, 2013

looks like that 4 refers to a specific state, can that be constant?

@matthewvon
matthewvon Jul 19, 2013 collaborator

Not arguing that it is wrong., cuz it is. But not fixing in this pass.

Matthew

On Jul 19, 2013, at 4:06 PM, Dave Parfitt notifications@github.com wrote:

In util/env_posix.cc:

   else
   {
  •      Env::Default()->Schedule(&BGFileUnmapper2, ptr, 4);
    
  •      if (and_close)
    
  •          Env::Default()->Schedule(&BGFileCloser2, ptr, 4);
    
    looks like that 4 refers to a specific state, can that be constant?


Reply to this email directly or view it on GitHub.

@metadave metadave commented on an outdated diff Jul 22, 2013
util/perf_count.cc
@@ -514,7 +514,8 @@
"ThrottleMicros1",
"ThrottleKeys1",
"ThrottleBacklog1",
- "ThrottleCompacts1"
+ "ThrottleCompacts1",
+ "BGWriteError"
@metadave metadave commented on the diff Jul 22, 2013
util/env_posix.cc
@@ -509,6 +545,18 @@ class PosixEnv : public Env {
return s;
}
+ virtual Status NewWriteOnlyFile(const std::string& fname,
+ WritableFile** result) {
+ Status s;
+ const int fd = open(fname.c_str(), O_CREAT | O_RDWR | O_TRUNC, 0644);
@metadave
metadave Jul 22, 2013

wouldn't you want O_WRONLY instead of O_RDWR?

@matthewvon
matthewvon Jul 22, 2013 collaborator

Ah, this file handle is passed into PosixMmapFile. PosixMmapFile performs Read/Write mappings … that does not match WRONLY. Maybe everything "could" be read only. But that is a ton more work than this bug fix release should being changing.

On Jul 22, 2013, at 9:03 AM, Dave Parfitt notifications@github.com wrote:

In util/env_posix.cc:

@@ -509,6 +545,18 @@ class PosixEnv : public Env {
return s;
}

  • virtual Status NewWriteOnlyFile(const std::string& fname,
  •                             WritableFile*\* result) {
    
  • Status s;
  • const int fd = open(fname.c_str(), O_CREAT | O_RDWR | O_TRUNC, 0644);
    wouldn't you want O_WRONLY instead of O_RDWR?


Reply to this email directly or view it on GitHub.

@metadave metadave commented on an outdated diff Jul 22, 2013
util/env_posix.cc
#endif
if (0 != file_ptr->unused_)
- ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ {
+ ret_val=ftruncate(file_ptr->fd_, file_ptr->offset_ + file_ptr->length_ - file_ptr->unused_);
+ if (0!=ret_val)
@metadave
metadave Jul 22, 2013

whitespace messed up for this block

@metadave metadave and 1 other commented on an outdated diff Jul 22, 2013
util/env_posix.cc
#if defined(HAVE_FADVISE)
- posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
+ // release newly written data from Linux page cache if possible
+ if (0==file_ptr->metadata_
+ || (file_ptr->offset_ + file_ptr->length_ < file_ptr->metadata_))
+ {
+ // must fdatasync for DONTNEED to work
+ ret_val=fdatasync(file_ptr->fd_);
+ if (0!=ret_val)
@metadave
metadave Jul 22, 2013

whitespace screwed up on all these if statements

@matthewvon
matthewvon Jul 22, 2013 collaborator

Meta-x untabify

On Jul 22, 2013, at 9:15 AM, Dave Parfitt notifications@github.com wrote:

In util/env_posix.cc:

#if defined(HAVE_FADVISE)

  • posix_fadvise(file_ptr->fd_, file_ptr->offset_, file_ptr->length_, POSIX_FADV_WILLNEED);
  • // release newly written data from Linux page cache if possible
  • if (0==file_ptr->metadata_
  •    || (file_ptr->offset_ + file_ptr->length_ < file_ptr->metadata_))
    
  • {
  •    // must fdatasync for DONTNEED to work
    
  •    ret_val=fdatasync(file_ptr->fd_);
    
  • if (0!=ret_val)
    whitespace screwed up on all these if statements


Reply to this email directly or view it on GitHub.

@metadave

@matthewvon looks good, misc (non-critical) formatting issues.
+1 if you address the commend regarding O_WRONLY vs O_RDWR

@matthewvon matthewvon merged commit 6422b67 into master Jul 22, 2013
@matthewvon matthewvon deleted the mv-sst-fadvise branch May 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment