Commits on Mar 30, 2012
  1. Checking in changes prior to tagging of version 2.60.

    Changelog diff is:
    diff --git a/CHANGES b/CHANGES
    index 770e518..5b59d7f 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,12 @@
    +2012-03-30: Release version 2.60
    +   * Fix fsck status when running for the first time (dormando <>)
    +   * Checksum support (Major update!) (Eric Wong <>)
    +     See doc/checksums.txt for an overview of how the new checksum system
    +     works. Also keep an eye on the wiki ( for more
    +     complete documentation in the coming weeks.
     2012-02-29: Release version 2.59
        * don't make SQLite error out on lock calls (dormando <>)
    committed Mar 30, 2012
  2. make fsck_checksum == off honored in more places

    If fsck_checksum was set to off, fsck would ignore the checksums deep in the
    code, but would still attempt to "fix" the fids each time, which runs far more
    code and UPDATEs each fid's devcount even if you tell it not to.
    Now it does what it should. However, fsck with checksums enabled will still
    UPDATE devcount on each check.
    committed Mar 30, 2012
  3. Fix fsck status when running for the first time

    Fsck would print "Status: N / 0 " if it had never been started before. It
    now finds the max(fid) internally on its own.
    committed Mar 30, 2012
  4. doc/checksums: use $HASHTYPE for referring to hash names

    $NAME is potentially ambiguous and $HASHTYPE matches the
    database column name.
    Eric Wong committed with Mar 22, 2012
  5. re-enable SHA-1 for checksums

    Optimized SHA-1 implementations aren't significantly slower than
    MD5 and some folks (e.g. Tomas Doran) may already have SHA-1 in
    place for their data.
    A liberally licensed, GPL-compatible collection of SHA-1
    primitives is available from one of the OpenSSL developers:
    It would be nice to allow the Perl Digest module to
    transparently take advantage of architecture-specific
    Note there is no standardized equivalent to the HTTP Content-MD5
    header/trailer for any of the SHA variants, so verification for
    replication/uploads may take significantly longer.
    Requested-by: Tomas Doran
    Eric Wong committed with Mar 13, 2012
  6. DevFID size caching for fsck with checksumming

    The digest path relies on having a known file size to calculate
    the MD5 timeout, so save an HTTP HEAD request since we always
    check file sizes in fsck before we checksum the file.
    Eric Wong committed with Mar 13, 2012
  7. checksums: use a low-priority task queue for fsck digests

    MD5 is I/O-intensive, and having fsck request MD5s concurrently
    ends up causing I/O contention on rotational drives with high
    seek latency.  So limit fsck MD5 requests to a single job per
    Eric Wong committed with Mar 12, 2012
  8. fsck_checksum setting replaces fsck_auto_checksum

    Unlike the setting it replaces, this new setting can be used to disable
    checksumming entirely, regardless of per-class options.
    Valid values:
      class - the default; fsck uses the per-class hashtype
      off   - skip all checksumming regardless of per-class setting
      MD5   - same as the previous fsck_auto_checksum=MD5
    Eric Wong committed with Mar 11, 2012
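    The three values described above can be read as a small decision rule.
    The following is a minimal sketch in Python, not the MogileFS (Perl)
    implementation; the function name and the use of None for "no hash" are
    assumptions made for illustration.

```python
from typing import Optional


def effective_fsck_hashtype(fsck_checksum: str,
                            class_hashtype: Optional[str]) -> Optional[str]:
    """Return the hash fsck should verify with, or None to skip checksumming.

    Hypothetical helper: `fsck_checksum` is the server setting and
    `class_hashtype` is the per-class hashtype (None when the class has none).
    """
    if fsck_checksum == "off":
        # Skip all checksumming, regardless of the per-class setting.
        return None
    if fsck_checksum == "class":
        # Default: defer to whatever the class is configured with.
        return class_hashtype
    # Any other value (e.g. "MD5") forces that hash for every class.
    return fsck_checksum
```

    For example, effective_fsck_hashtype("off", "MD5") yields None, while
    effective_fsck_hashtype("MD5", None) yields "MD5".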
  9. checksums: disable all hash algorithms except MD5

    MD5 is faster than SHA1, and much faster than any of the SHA2
    variants.  Given the time penalty of fsck is already high with
    MD5, prevent folks from shooting themselves in the foot with
    extremely expensive hash algorithms.
    Eric Wong committed with Mar 10, 2012
  10. fsck: add fsck_auto_checksum server setting

    Enabling this setting allows fsck to checksum all replicas on
    all devices and report any corrupted copies regardless of
    per-class settings.
    This feature is useful for determining if enabling checksums on
    certain classes is necessary and will also benefit users who
    cannot afford to store checksums in the database.
    Eric Wong committed with Mar 10, 2012
  11. httpfile: fix timeout comparison when digesting via mogstored

    The timeout comparison is wrong and causing ping_cb to never
    fire.  This went unnoticed since I have reasonably fast disks
    on my storage nodes and the <$sock> operation was able to
    complete before being hit by a watchdog timeout.
    Eric Wong committed with Jan 18, 2012
  12. doc/checksums: clarify binary column type for various DBs

    We don't actually use the BLOB type anywhere, as checksums
    are definitely not "L"(arge) objects.
    Eric Wong committed with Dec 2, 2011
  13. get_domains returns hashtype as a string

    This special-cases "NONE" for no hash for our users.
    Eric Wong committed with Nov 29, 2011
  14. always use HTTPFile->digest with the ping callback

    We need to ensure the worker stays alive during MD5
    generation, especially on large files that can take
    many seconds to verify.
    Eric Wong committed with Nov 29, 2011
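    The idea of pinging while a long digest runs can be sketched as follows.
    This is an illustrative Python stand-in that hashes a local stream; the
    real HTTPFile->digest talks to a storage node, and the names
    digest_with_ping, ping_cb, and interval are assumptions, not MogileFS
    identifiers.

```python
import hashlib
import time


def digest_with_ping(stream, ping_cb, interval=1.0, chunk_size=1 << 20,
                     clock=time.monotonic):
    """Compute an MD5 hex digest, calling ping_cb() at most once per interval.

    The callback lets a supervising process know the worker is still alive
    while a large file is being read.
    """
    md5 = hashlib.md5()
    last_ping = clock()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        now = clock()
        if now - last_ping >= interval:
            ping_cb()
            last_ping = now
    return md5.hexdigest()
```

    Passing interval=0 makes the callback fire after every chunk, which is
    convenient for testing the keep-alive path.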
  15. Newer SQLite _can_ ALTER TABLE .. ADD COLUMN in some cases

    I'll be testing checksum functionality on my home installation
    before testing it on other installations, and I run SQLite at
    Eric Wong committed with Nov 29, 2011
  16. rename checksum{type,name} => hash{type,name}

    It reads more easily this way, at least to me.
    Eric Wong committed with Nov 29, 2011
  17. flesh out fsck functionality for checksums

    Fsck behavior is based on existing behavior for size mismatches.
    Size failures take precedence, since it's much cheaper to verify
    size match/mismatches than checksum mismatches.
    Checksum calculations are expensive and fsck is already
    parallel, so we do not parallelize checksum calculations on
    a per-FID basis.
    Eric Wong committed with Nov 29, 2011
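    The ordering described above (cheap size check first, expensive digest
    only if sizes agree) can be sketched like this. It is a simplified Python
    illustration; the function name and the returned status strings are
    hypothetical and do not mirror fsck's actual log codes.

```python
def check_replica(expected_size, actual_size, expected_digest, compute_digest):
    """Classify one replica, checking size before checksum.

    `compute_digest` is a zero-argument callable so the expensive read of the
    file is only triggered when the size check has already passed.
    """
    if actual_size != expected_size:
        # Size failures take precedence; the digest is never computed.
        return "size_mismatch"
    if expected_digest is None:
        # No checksum stored for this file: nothing more to verify.
        return "ok"
    if compute_digest() != expected_digest:
        return "checksum_mismatch"
    return "ok"
```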
  18. replicate generates proper CRLF for Content-MD5 header

    TODO: see if we can use LWP to avoid mistakes like this :x
    Eric Wong committed with Nov 29, 2011
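    For reference, an HTTP header line is terminated by CRLF, and a
    Content-MD5 value is the base64 encoding of the binary MD5 digest. The
    following Python sketch builds such a line by hand; the helper name is
    hypothetical and this is not the replication worker's code.

```python
import base64
import hashlib


def content_md5_header(body: bytes) -> bytes:
    """Return a complete Content-MD5 header line, terminated by CRLF."""
    b64 = base64.b64encode(hashlib.md5(body).digest())
    # Header lines end in "\r\n"; a bare "\n" is the kind of mistake the
    # commit above fixes.
    return b"Content-MD5: " + b64 + b"\r\n"
```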
  19. ensure checksum row is deleted when FID is deleted

    Stale rows are bad.
    Eric Wong committed with Nov 29, 2011
  20. doc: update checksums document

    Only the fsck part remains to be implemented... And I've never
    studied/used fsck much :x
    Eric Wong committed with Nov 27, 2011
  21. replication skips HTTPFile->digest if device can reject bad MD5s

    Rereading a large file is expensive.  If we can monitor
    and observe our storage nodes for MD5 rejectionability, we
    can rely on that instead of having to have anybody reread
    the entire file to calculate its MD5.
    Eric Wong committed with Nov 27, 2011
  22. monitor observes Content-MD5-rejectability

    This functionality (and a server capable of rejecting bad MD5s)
    will allow us to skip an expensive MogileFS::HTTPFile->digest
    request at replication time.
    Also testing with the following patch to Perlbal:
      --- a/lib/mogdeps/Perlbal/
      +++ b/lib/mogdeps/Perlbal/
    @@ -22,6 +22,7 @@ use fields ('put_in_progress', # 1 when we're currently waiting for an async job
                 'content_length',  # length of document being transferred
                 'content_length_remain', # bytes remaining to be read
                 'chunked_upload_state', # bool/obj:  if processing a chunked upload, Perlbal::ChunkedUploadState object, else undef
    +            'md5_ctx',
     use HTTP::Date ();
    @@ -29,6 +30,7 @@ use File::Path;
     use Errno qw( EPIPE );
    +use Digest::MD5;
     # class list of directories we know exist
     our (%VerifiedDirs);
    @@ -61,6 +63,7 @@ sub init {
         $self->{put_fh} = undef;
         $self->{put_pos} = 0;
         $self->{chunked_upload_state} = undef;
    +    $self->{md5_ctx} = undef;
     sub close {
    @@ -134,6 +137,8 @@ sub handle_put {
         return $self->send_response(403) unless $self->{service}->{enable_put};
    +    $self->{md5_ctx} = $hd->header('Content-MD5') ? Digest::MD5->new : undef;
         return if $self->handle_put_chunked;
         # they want to put something, so let's setup and wait for more reads
    @@ -421,6 +426,8 @@ sub put_writeout {
         my $data = join("", map { $$_ } @{$self->{read_buf}});
         my $count = length $data;
    +    my $md5_ctx = $self->{md5_ctx};
    +    $md5_ctx->add($data) if $md5_ctx;
         # reset our input buffer
         $self->{read_buf}   = [];
    @@ -460,6 +467,17 @@ sub put_close {
         if (CORE::close($self->{put_fh})) {
             $self->{put_fh} = undef;
    +        my $md5_ctx = $self->{md5_ctx};
    +        if ($md5_ctx) {
    +            my $actual = $md5_ctx->b64digest;
    +            my $expect = $self->{req_headers}->header("Content-MD5");
    +            $expect =~ s/=+\s*\z//;
    +            if ($actual ne $expect) {
    +                return $self->send_response(400,
    +                    "Content-MD5 mismatch, expected: $expect actual: $actual");
    +            }
    +        }
             return $self->send_response(200);
         } else {
             return $self->system_error("Error saving file", "error in close: $!");
    Eric Wong committed with Nov 27, 2011
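    The check performed by the patch above can be restated compactly: hash
    the body as it is written, then compare base64 digests with trailing "="
    padding removed (Perl's b64digest returns an unpadded value). The sketch
    below is a Python illustration of that flow; the class name and the
    (status, message) return shape are assumptions, not Perlbal's API.

```python
import base64
import hashlib
from typing import Optional, Tuple


class Md5CheckingUpload:
    """Hypothetical stand-in for a PUT handler that verifies Content-MD5."""

    def __init__(self, content_md5: Optional[str]) -> None:
        # Only pay for hashing when the client actually sent the header.
        self.expect = content_md5
        self.md5_ctx = hashlib.md5() if content_md5 else None

    def write(self, data: bytes) -> None:
        if self.md5_ctx is not None:
            self.md5_ctx.update(data)

    def close(self) -> Tuple[int, str]:
        if self.md5_ctx is not None:
            # Compare without base64 padding, as the patch does.
            actual = base64.b64encode(self.md5_ctx.digest()).decode("ascii")
            actual = actual.rstrip("=")
            expect = self.expect.rstrip().rstrip("=")
            if actual != expect:
                return 400, (f"Content-MD5 mismatch, "
                             f"expected: {expect} actual: {actual}")
        return 200, "OK"
```

    A monitor could exercise this path by sending a deliberately wrong
    Content-MD5 and checking that the upload is rejected.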
  23. add checksum generation/verification to replication worker

    Replication now lazily generates checksums if they're not
    provided by the client (but required by the storage class).
    Replication may also verify checksums if they're available
    in the database.
    Replication now sets the Content-MD5 header on PUT requests,
    in case the remote server is capable of rejecting corrupt
    transfers based on it.
    Replication attempts to verify the checksum of the freshly
    PUT-ed file.
    TODO: monitor will attempt "test-write" with mangled Content-MD5
          to determine if storage backends are Content-MD5-capable
          so replication can avoid reading checksum on destination
    Eric Wong committed with Nov 26, 2011
  24. add MogileFS::FID->checksum function

    This returns undef if a checksum is missing for a class,
    and a MogileFS::Checksum object if it exists.
    Eric Wong committed with Nov 25, 2011
  25. test for update_class with checksumtype=NONE

    We need to be able to both enable and disable checksumming for a class.
    Eric Wong committed with Nov 25, 2011
  26. wire up checksum to create_close/file_info/create_class commands

    We can now:
    * enable checksums for classes
    * save client-provided checksums to the database
    * verify them on create_close
    * read them in file_info
    Eric Wong committed with Nov 25, 2011
  27. checksums: genericize to be algorithm-independent, add SHA*

    We'll use the "Digest" class in Perl as a guide for this.
    Only MD5 is officially supported.
    However, this *should* support SHA-(1|256|384|512) and it's easy
    to add more algorithms.
    Eric Wong committed with Nov 25, 2011
  28. checksum: add "from_string" and "save" function

    This can come in handy.
    Eric Wong committed with Nov 25, 2011
  29. doc: add checksums.txt for basic design/implementation notes

    Helps me keep my head straight.
    Eric Wong committed with Nov 25, 2011
  30. t/40-httpfile.t: speedup test with working clear_cache

    This branch is now rebased against my latest clear_cache
    which allows much faster metadata updates for testing.
    Eric Wong committed with Nov 24, 2011
  31. class: wire up checksum support to this

    Checksum usage will be decided on a per-class basis.
    Eric Wong committed with Nov 23, 2011
  32. replicate: optional digest support

    Digest::MD5 and Digest::SHA1 both support the same API for
    streaming data for the calculation, so we can validate our
    content as we stream it.
    Eric Wong committed with Nov 23, 2011
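    Validating content while streaming it relies on digest objects that can
    be fed incrementally. The sketch below shows the same pattern in Python
    with hashlib, as an analogy to the shared Digest::MD5 / Digest::SHA1
    interface mentioned above; copy_with_digest and its parameters are
    hypothetical names.

```python
import hashlib


def copy_with_digest(src, dst, algorithm="md5", chunk_size=65536):
    """Copy src to dst in chunks and return the hex digest of what was copied.

    Any algorithm name accepted by hashlib.new() can be used, since every
    hashlib object exposes the same update()/hexdigest() interface.
    """
    digest = hashlib.new(algorithm)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash the chunk as it passes through
        dst.write(chunk)
    return digest.hexdigest()
```

    Because the data is hashed in flight, the source does not need to be
    read a second time to obtain its checksum.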
  33. store: update class table with checksumtype column

    This is needed to wire up checksums to classes.
    Eric Wong committed with Nov 23, 2011
  34. add MogileFS::Checksum class

    We need a place to store mappings for various checksum
    types we'll support.
    Eric Wong committed with Nov 23, 2011