
rgw: add check for index entry's existing when adding bucket stats during bucket reshard. #29062

Merged
1 commit merged on May 4, 2020

Conversation

zhangsw
Contributor

@zhangsw zhangsw commented Jul 16, 2019

If we reshard a versioned bucket, the bucket stats will contain an extra rgw.none category.
Before reshard:

"mtime": "2019-07-16T09:39:07.175539Z",
    "max_marker": "0#,1#,2#",
    "usage": {
        "rgw.main": {
            "size": 2,
            "size_actual": 4096,
            "size_utilized": 2,
            "size_kb": 1,
            "size_kb_actual": 4,
            "size_kb_utilized": 1,
            "num_objects": 1
        }
    },

After reshard:

"mtime": "2019-07-17T02:36:12.490795Z",
    "max_marker": "0#,1#,2#,3#,4#",
    "usage": {
        "rgw.none": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 1
        },
        "rgw.main": {
            "size": 2,
            "size_actual": 4096,
            "size_utilized": 2,
            "size_kb": 1,
            "size_kb_actual": 4,
            "size_kb_utilized": 1,
            "num_objects": 1
        }
    },

A versioned bucket has index entries like the one below, and they are counted in the stats while the bucket is resharded.

"type": "plain",
        "idx": "obj10",
        "entry": {
            "name": "obj10",
            "instance": "",
            "ver": {
                "pool": -1,
                "epoch": 0
            },
            "locator": "",
            "exists": "false",
            "meta": {
                "category": 0,
                "size": 0,
                "mtime": "0.000000",
                "etag": "",
                "storage_class": "",
                "owner": "",
                "owner_display_name": "",
                "content_type": "",
                "accounted_size": 0,
                "user_data": "",
                "appendable": "false"
            },
            "tag": "",
            "flags": 8,
            "pending_map": [],
            "versioned_epoch": 0
        }
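
For context, the flags value of 8 in the entry above decodes against the dirent flag bits defined in src/cls/rgw/cls_rgw_types.h. A minimal sketch decoding it (flag values copied from that header; double-check them against your release):

```cpp
#include <cstdint>
#include <iostream>

// Dirent flag bits from src/cls/rgw/cls_rgw_types.h (verify per release).
constexpr uint16_t FLAG_VER           = 0x1; // entry is versioned
constexpr uint16_t FLAG_CURRENT       = 0x2; // entry is the current version
constexpr uint16_t FLAG_DELETE_MARKER = 0x4; // entry is a delete marker
constexpr uint16_t FLAG_VER_MARKER    = 0x8; // plain placeholder entry that
                                             // redirects to versioned entries

int main() {
  uint16_t flags = 8; // the value from the index entry above
  std::cout << "ver_marker: " << ((flags & FLAG_VER_MARKER) != 0) << "\n"; // 1
}
```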

Fixes: https://tracker.ceph.com/issues/45970

@mattbenjamin
Contributor

@zhangsw I think it would be helpful for the commit msg/description to provide a bit more of an explanation, if possible?

@zhangsw
Contributor Author

zhangsw commented Jul 17, 2019

@mattbenjamin I've updated the description

@stale

stale bot commented Sep 15, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Sep 15, 2019
@stale stale bot removed the stale label Nov 21, 2019
@ivancich
Member

rgw.none statistics have been a common source of confusion amongst users, it seems. So I appreciate this PR to try to address the issue.

Recently I looked into what causes an rgw.none stat to appear. Here's what I came to understand:

Updates to the bucket index appear to be done transactionally. Normally the update operation is prepared, performed, and then marked as completed. If the operation cannot complete, the sequence is prepare, attempt with error, and finally cancel.

There are two ways an entry can be listed as Category::None. First, if it is a delete operation (not a delete marker, just a plain delete). Second, if the transaction is cancelled: the entry is apparently left in the index and simply marked as cancelled.

It looks like deletes on versioned buckets are handled separately altogether and do not make use of Category::None.

So this seems to be compatible with Vikhyat's observations. I would expect the size of deleted items and of cancelled updates to be zero. But their count could be non-zero.
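
To make that concrete, here is a minimal self-contained model of the flow described above (simplified stand-ins for the prepare/complete/cancel machinery in src/cls/rgw/cls_rgw.cc, not the actual Ceph code):

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Toy model of bucket-index stats accounting; Category and DirEntry are
// simplified stand-ins for the real cls_rgw types.
enum class Category { None, Main };

struct DirEntry {
  bool exists = false;               // only set true when a write completes
  Category category = Category::None;
  uint64_t size = 0;
};

int main() {
  std::map<std::string, DirEntry> index;

  // prepare: a pending entry is written with exists=false
  index["obj10"] = DirEntry{};

  // cancel: the transaction never completes, so the entry is left in the
  // index with exists=false and category None
  struct Stats { uint64_t num_objects = 0; uint64_t size = 0; };
  std::map<Category, Stats> usage;
  for (const auto& [name, e] : index) {
    (void)name;
    auto& s = usage[e.category];     // counting without checking e.exists ...
    s.num_objects++;                 // ... yields rgw.none num_objects=1
    s.size += e.size;                // with size 0, as in the stats above
  }

  std::cout << usage[Category::None].num_objects << "\n";  // prints 1
}
```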

I'm curious as to whether there's a mechanism to garbage collect these entries. Additionally I'm curious as to whether these entries are copied to the new bucket index during resharding. I haven't looked at either of those issues yet.

So we have two options. One is not to even collect these stats, which your current code change does. The other is to keep collecting them but not reporting them.

So I thought I'd throw this question out to @zhangsw, @mattbenjamin, @cbodley, and @theanalyst to see what you all think.

Thanks everyone!

@ivancich
Member

@vumrao I would like to invite you to comment on this issue as well. See my previous comment.

@ivancich
Member

FWIW after thinking about it, I lean towards the solution in this PR -- not collecting the stats.

@vumrao
Contributor

vumrao commented Nov 22, 2019

@vumrao I would like to invite you to comment on this issue as well. See my previous comment.

Thanks, @ivancich. I really like the idea of not reporting it; as you mentioned, and as we have seen downstream, it causes a lot of confusion. If we stop reporting it, will that keep the object-count reporting consistent? Downstream we saw that when rgw.none appears, the object-count reporting is not consistent.

  • In the example below, 25 million objects were added to a bucket, the bucket was resharded, and it started reporting rgw.none. The object count in rgw.main was then inconsistent with the number of objects created in the bucket, while rgw.none carried some of the object count. Hopefully with this fix, once rgw.none is no longer reported, creating 25 million objects in the bucket will show all 25 million reported in rgw.main.
 {
        "bucket": "mycontainers1",
        "zonegroup": "86f0b214-aad6-4ae9-8d5b-c552f58ab508",
        "placement_rule": "default-placement",
        "explicit_placement": {
            "data_pool": "",
            "data_extra_pool": "",
            "index_pool": ""
        },
        "id": "b7db2537-67a5-4a17-97c9-06dcef12b4df.5746040.1",
        "marker": "b7db2537-67a5-4a17-97c9-06dcef12b4df.5755938.1",
        "index_type": "Normal",
        "owner": "johndoe",
        "ver": "0#82343,1#82749,2#82673,3#82462,4#82599,5#82268,6#82670,7#82807,8#82759,9#83009,10#82555,11#83044,12#82716,13#82649,14#82810,15#82351,16#82788,17#82377,18#82640,19#82459,20#82569,21#82965,22#82545,23#83001,24#82850,25#82638,26#83037,27#82499,28#82945,29#82497,30#82232,31#82659,32#82475,33#82737,34#82382,35#82589,36#82820,37#82595,38#83353,39#82493,40#83019,41#82823,42#82767,43#83132,44#82171,45#82677,46#82447,47#82561,48#82588,49#82231,50#82927,51#82580,52#82748,53#83020,54#82821,55#83024,56#82403,57#82910,58#82570,59#82350,60#82721,61#82235,62#82463,63#82576,64#82644,65#82667,66#82517,67#83058,68#82762,69#82830,70#82670,71#82543,72#83046,73#82440,74#82632,75#82704,76#82419,77#82749,78#82702,79#82739,80#82713,81#82679,82#83035,83#82690,84#82645,85#82735,86#82716,87#82668,88#82390,89#82952,90#82472,91#82275,92#82594,93#82627,94#82846,95#82668,96#82699,97#83052,98#82567,99#82904,100#82679,101#82632,102#82624,103#82613,104#82683,105#82198,106#82463,107#82787,108#82667,109#82697,110#82643,111#82947,112#82651,113#82709,114#82965,115#82462,116#82420,117#82423,118#82784,119#82520,120#82355,121#79669,122#79497,123#79921,124#80028,125#79928,126#80234,127#79735,128#79997,129#80172,130#79748,131#79849,132#79872,133#79797,134#79481,135#79689,136#79825,137#79595,138#79780,139#79832,140#79963,141#80039,142#79941,143#80220,144#79736,145#79606,146#79718,147#79770,148#79801,149#79496,150#79707,151#79445,152#79632,153#80339,154#80134,155#79743,156#79809,157#80013,158#80090,159#79651,160#80015,161#79820,162#79537,163#79759,164#79717,165#79844,166#79827,167#79756,168#79695,169#79980,170#79979,171#80059,172#80068,173#79764,174#79780,175#79932,176#79791,177#79687,178#79972,179#79783,180#79693,181#79633,182#79998,183#80294,184#79823,185#79896,186#80006,187#80053,188#80047,189#79999,190#79576,191#79562,192#79778,193#79756,194#79766,195#79741,196#79644,197#79845,198#79940,199#79990,200#80197,201#79838,202#79585,203#79980,204#79916,205#79783,206#79527,207#79705,208#79661,209#79727,210#79858,211#79996,212#80075,213#79874,214#80134,215#80003,216#79936,217#80036,218#79938,219#79541,220#79805,221#79700,222#79812,223#79766,224#79542,225#79910,226#79792,227#79957,228#80174,229#80169,230#80013,231#79694,232#80051,233#79919,234#79882,235#79523,236#79357,237#79655,238#79751,239#80021,240#79912,241#79712,242#80119,243#80026,244#80327,245#79787,246#79983,247#79942,248#79527,249#79873,250#79645,251#79717,252#79711,253#79494,254#79994,255#79687,256#79947,257#80084,258#79915,259#80143,260#79774,261#80018,262#79667,263#79850,264#80011,265#79567,266#79938,267#79550,268#79908,269#80272,270#79742,271#80021,272#79868,273#80133,274#80001,275#79832,276#80052",
        "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0,64#0,65#0,66#0,67#0,68#0,69#0,70#0,71#0,72#0,73#0,74#0,75#0,76#0,77#0,78#0,79#0,80#0,81#0,82#0,83#0,84#0,85#0,86#0,87#0,88#0,89#0,90#0,91#0,92#0,93#0,94#0,95#0,96#0,97#0,98#0,99#0,100#0,101#0,102#0,103#0,104#0,105#0,106#0,107#0,108#0,109#0,110#0,111#0,112#0,113#0,114#0,115#0,116#0,117#0,118#0,119#0,120#0,121#0,122#0,123#0,124#0,125#0,126#0,127#0,128#0,129#0,130#0,131#0,132#0,133#0,134#0,135#0,136#0,137#0,138#0,139#0,140#0,141#0,142#0,143#0,144#0,145#0,146#0,147#0,148#0,149#0,150#0,151#0,152#0,153#0,154#0,155#0,156#0,157#0,158#0,159#0,160#0,161#0,162#0,163#0,164#0,165#0,166#0,167#0,168#0,169#0,170#0,171#0,172#0,173#0,174#0,175#0,176#0,177#0,178#0,179#0,180#0,181#0,182#0,183#0,184#0,185#0,186#0,187#0,188#0,189#0,190#0,191#0,192#0,193#0,194#0,195#0,196#0,197#0,198#0,199#0,200#0,201#0,202#0,203#0,204#0,205#0,206#0,207#0,208#0,209#0,210#0,211#0,212#0,213#0,214#0,215#0,216#0,217#0,218#0,219#0,220#0,221#0,222#0,223#0,224#0,225#0,226#0,227#0,228#0,229#0,230#0,231#0,232#0,233#0,234#0,235#0,236#0,237#0,238#0,239#0,240#0,241#0,242#0,243#0,244#0,245#0,246#0,247#0,248#0,249#0,250#0,251#0,252#0,253#0,254#0,255#0,256#0,257#0,258#0,259#0,260#0,261#0,262#0,263#0,264#0,265#0,266#0,267#0,268#0,269#0,270#0,271#0,272#0,273#0,274#0,275#0,276#0",
        "mtime": "2019-10-24 19:57:34.188184",
        "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#,32#,33#,34#,35#,36#,37#,38#,39#,40#,41#,42#,43#,44#,45#,46#,47#,48#,49#,50#,51#,52#,53#,54#,55#,56#,57#,58#,59#,60#,61#,62#,63#,64#,65#,66#,67#,68#,69#,70#,71#,72#,73#,74#,75#,76#,77#,78#,79#,80#,81#,82#,83#,84#,85#,86#,87#,88#,89#,90#,91#,92#,93#,94#,95#,96#,97#,98#,99#,100#,101#,102#,103#,104#,105#,106#,107#,108#,109#,110#,111#,112#,113#,114#,115#,116#,117#,118#,119#,120#,121#,122#,123#,124#,125#,126#,127#,128#,129#,130#,131#,132#,133#,134#,135#,136#,137#,138#,139#,140#,141#,142#,143#,144#,145#,146#,147#,148#,149#,150#,151#,152#,153#,154#,155#,156#,157#,158#,159#,160#,161#,162#,163#,164#,165#,166#,167#,168#,169#,170#,171#,172#,173#,174#,175#,176#,177#,178#,179#,180#,181#,182#,183#,184#,185#,186#,187#,188#,189#,190#,191#,192#,193#,194#,195#,196#,197#,198#,199#,200#,201#,202#,203#,204#,205#,206#,207#,208#,209#,210#,211#,212#,213#,214#,215#,216#,217#,218#,219#,220#,221#,222#,223#,224#,225#,226#,227#,228#,229#,230#,231#,232#,233#,234#,235#,236#,237#,238#,239#,240#,241#,242#,243#,244#,245#,246#,247#,248#,249#,250#,251#,252#,253#,254#,255#,256#,257#,258#,259#,260#,261#,262#,263#,264#,265#,266#,267#,268#,269#,270#,271#,272#,273#,274#,275#,276#",
        "usage": {
            "rgw.none": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 148
            },
            "rgw.main": {
                "size": 374089750000,
                "size_actual": 415160393728,
                "size_utilized": 374089750000,
                "size_kb": 365322022,
                "size_kb_actual": 405430072,
                "size_kb_utilized": 365322022,
                "num_objects": 24999855
            }
        },
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": true,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },

@ivancich
Member

@vumrao So you're speculating that rgw.none is hiding some real objects. And it is interesting that 24,999,855 + 148 = 25,000,003, just over the 25,000,000 expected.

But given my earlier analysis, a bucket-index entry that counts as rgw.none appears when an operation is cancelled. So I think the more likely explanation is that 145 objects (the 25,000,000 expected minus the 24,999,855 in rgw.main) did not actually get fully uploaded. This PR cannot address that issue. Can you verify that the bucket actually has 25,000,000 GETable objects?

@ivancich
Member

ivancich commented Nov 22, 2019

We just had someone report on ceph-users that they have an rgw.none num_objects stat that appears to be 2^64 - 13, and they believe it triggered resharding to 65521 shards.

See: ceph-users thread head

I'm not certain, though, how they got -13 via a reshard operation. I'm guessing this happened due to bucket index manipulations that don't account for all ops.
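
For reference, 2^64 - 13 is exactly what you get when an unsigned 64-bit counter is decremented below zero, which suggests an underflow in the stat. A one-liner demonstrating the wraparound:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  uint64_t num_objects = 0;
  num_objects -= 13;                  // unsigned wraparound: 2^64 - 13
  std::cout << num_objects << "\n";   // prints 18446744073709551603
}
```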

So this is a buggy area and I'm thinking we should remove all rgw.none calculations, both during resharding and during bucket-index updates.

@vumrao
Contributor

vumrao commented Dec 3, 2019

Can you verify that the bucket actually has 25,000,000 GETable objects?

Thanks, @ivancich. Sorry, the setup was removed, but it looks like you are right: these objects never made it to the cluster. I will discuss with the team; if they see this issue again with a small count, I will ask them to check the count from the application side (e.g., s3cmd or swift) and see whether it matches. If the application reports the same count that is listed in rgw.main, that will prove the rgw.none objects never made it to the cluster.

@theanalyst
Member

@ivancich thanks for the detailed analysis. I also agree that not accounting the stats in the none category (as proposed here) makes sense.

@cbodley
Contributor

cbodley commented Jan 2, 2020

        "exists": "false",

I think this flag is what reshard is failing to consider. rgw_bucket_complete_op() in cls_rgw.cc only accounts for entries with exists=true, so reshard should ignore entries with exists=false to remain consistent and generate the same stats.
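
To illustrate the direction (an illustrative sketch with invented names, not the actual patch), the reshard-side accumulation could skip non-existent entries like this:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Illustrative stand-ins for the cls_rgw types involved; the real entry type
// is rgw_bucket_dir_entry and the real stats live in the bucket index header.
struct DirEntry {
  std::string name;
  uint8_t category = 0;        // 0 corresponds to the "none" category
  bool exists = false;
  uint64_t accounted_size = 0;
};

struct CategoryStats { uint64_t num_entries = 0; uint64_t total_size = 0; };

// Accumulate per-category stats for the target shard during reshard,
// skipping entries whose exists flag is false, mirroring what
// rgw_bucket_complete_op() does on the normal write path.
void account_entry(std::map<uint8_t, CategoryStats>& stats, const DirEntry& e) {
  if (!e.exists) {
    return;  // cancelled/placeholder entry: copy it, but don't count it
  }
  auto& s = stats[e.category];
  s.num_entries++;
  s.total_size += e.accounted_size;
}
```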

@IlsooByun
Contributor

IlsooByun commented Jan 3, 2020

@ivancich There are some things to consider before removing this stat. In my experience this occurs frequently in our environment and can lead to serious performance issues when listing objects, so this information may be necessary to handle the situation proactively.
For example, I had a performance issue when listing objects in some buckets. After investigating, I found that the cause was too many rgw.none index entries in those buckets. The problem was solved with the radosgw-admin fix command.
If there are too many rgw.none index entries, that means the bucket needs to be fixed. This can be useful information.

@ivancich
Member

ivancich commented Jan 3, 2020

@zhangsw Given @cbodley's insight and @IlsooByun's good argument for keeping the functionality, would you like to update this PR or submit a new one that keeps rgw.none but checks the exists flag?

Member

@ivancich ivancich left a comment

Please see my comment, but I think it'd be better to go in the direction @cbodley is suggesting. Please let us know your intentions -- whether to modify this PR or create a new one. Thanks!

@stale

stale bot commented Mar 3, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Mar 3, 2020
@ivancich ivancich removed the stale label Mar 5, 2020
…ring resharding.

Signed-off-by: zhang Shaowen <zhangshaowen@cmss.chinamobile.com>
@zhangsw
Contributor Author

zhangsw commented Apr 21, 2020

@ivancich @cbodley I've added a check for "exists", it works~

@zhangsw zhangsw changed the title rgw: remove rgw.none category from bucket stats result which is generated by resharding versioning bucket rgw: add check for index entry's existing when adding bucket stats during bucket reshard. Apr 21, 2020
Contributor

@cbodley cbodley left a comment

thanks!

@ivancich ivancich self-requested a review April 21, 2020 14:24
@mattbenjamin
Contributor

@ivancich what's the way forward for this change?

@ivancich
Member

@ivancich what's the way forward for this change?

I think QA and then very likely merge, @mattbenjamin.
