osd: EC read handling: don't grab an objectstore error to use as the … #16663
An ENOENT from the objectstore for a missing shard is currently passed back to the client. Since ErasureCode::minimum_to_decode() returns EIO when there aren't enough shards to read an object, I'm going to always use that error code, even though we've collected the errors from every shard's read attempt.
If you use cmpext to read an object that doesn't exist, you should still get ENOENT, I assume, based on the primary having no shard. Also, if OSDs are down such that there aren't enough readable shards available, that is handled earlier in the process (maybe that case hangs). Finally, if we get past those two cases and shard reads get ENOENT from the objectstore, we need to decide what error code to return. I hadn't thought about the fact that a client can't tell whether ENOENT means that an object doesn't exist, or that there are not enough shards because one or more shards got ENOENT when attempting to read from the objectstore.
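As a rough illustration of the rule described above (not the actual Ceph code), the sketch below collapses per-shard read results into a single return code. The helper name `ec_read_result`, the map of shard id to result, and the parameter `k` (shards needed to decode) are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <map>

// Hypothetical helper, NOT the Ceph implementation.
// shard_results maps a shard id to 0 on success or to a negative errno
// (for example -ENOENT) reported by the objectstore for that shard.
// k is the number of shards assumed to be required to decode the object.
int ec_read_result(const std::map<int, int>& shard_results, int k) {
  assert(k > 0);
  int readable = 0;
  for (const auto& entry : shard_results) {
    if (entry.second == 0) {
      ++readable;
    }
  }
  // Too few readable shards: report EIO regardless of which per-shard
  // errors were collected, so a shard-level ENOENT never reaches the client.
  return readable >= k ? 0 : -EIO;
}
```

With `k = 2`, one good shard plus two shards that returned `-ENOENT` yields `-EIO` rather than `-ENOENT`. The case of an object that truly does not exist is assumed to be detected before this point and is not modeled here.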
referenced this pull request
Jul 28, 2017
I don't think this has anything to do with CMPEXT (other than that having surfaced the issue again), does it?
In general yes, we should only return ENOENT if we believe that the object does not exist, not just because there happened to have been a missing shard somewhere.
@gregsfortytwo This fixes CMPEXT getting confused by an ENOENT due to filestore error(s) for an object that should exist and doesn't contain all zeros.
non-existent object -> ENOENT
A scenario we aren't handling yet is a combination of down OSDs and filestore errors. If the non-error filestore reads plus the down OSDs yield sufficient shards, we should hang until those OSDs rejoin; otherwise it is a permanent error.
With this change and the removal of all non-primary shards for object named "foo":
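The decision for that mixed scenario could be sketched roughly as follows. This is only an illustration under stated assumptions: `ReadOutcome`, `classify_read`, and the counts passed in are hypothetical names, and `k` again stands for the number of shards needed to decode.

```cpp
#include <cassert>

// Hypothetical classification, NOT taken from the Ceph source.
enum class ReadOutcome { Decode, WaitForOsds, PermanentError };

// good_reads:  shards that were read without an objectstore error
// down_shards: shards on OSDs that are currently down (state unknown)
// k:           shards assumed to be required to decode the object
ReadOutcome classify_read(int good_reads, int down_shards, int k) {
  assert(k > 0 && good_reads >= 0 && down_shards >= 0);
  if (good_reads >= k) {
    return ReadOutcome::Decode;          // enough shards already in hand
  }
  if (good_reads + down_shards >= k) {
    return ReadOutcome::WaitForOsds;     // may succeed once OSDs rejoin
  }
  return ReadOutcome::PermanentError;    // waiting cannot help
}
```

For example, with `k = 3`, two good reads and one down OSD would mean waiting, while two good reads and no down OSDs would be a permanent error.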
$ bin/rados -p ecpool get foo foo.out
With these clogs of objectstore errors.
Of course, we already do return an error in that case, so this isn't worse. But we need to fix this, I think?