osd: add a pg _fastinfo attribute to reduce per-io metadata updates #11213
Conversation
@liewegas Hmm. If you are right that the check is sufficiently cheap, then this looks pretty reasonable. However, I'm not entirely convinced that explicitly adding a dirty_small_info (or better name!) would really be that bad. It seems to me that all client writes update the same set of fields (see PG::append_log). Those'll be the ones we want to include in the smaller structure. So, it seems like PG::append_log would just set dirty_small_info, and for all other users, dirty_info would imply dirty_small_info (we don't care about efficiency in those cases).
@liewegas Conveniently, neither ECMsgTypes.h:ECSubWrite nor MOSDRepOp includes the whole pg_info structure, so you can just read out of those what we are updating!
That'll need some funky conditional in append_log to decide if the delta_stats can be covered by fast info or not. And we'd need to be really careful that the append_log callers aren't doing other info updates and relying on the dirty_info = true in append_log to cover them. Hmm.
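The explicit-flag alternative being weighed here can be sketched as follows. The names pg_dirty_flags, dirty_fast_info, and the need_*_write helpers are hypothetical illustrations, not the actual PG member names: append_log would set only the fast flag for client writes, and every other update path sets dirty_info, which implies the fast subset must be rewritten as well.

```cpp
// Hypothetical sketch of the two-flag scheme discussed above.
#include <cassert>

struct pg_dirty_flags {
  bool dirty_info = false;       // some field outside the fast set changed
  bool dirty_fast_info = false;  // only the per-IO fields changed
};

// dirty_info implies dirty_fast_info, so the write path needs no
// bookkeeping beyond these two booleans.
inline bool need_full_info_write(const pg_dirty_flags &f) {
  return f.dirty_info;
}
inline bool need_fast_info_write(const pg_dirty_flags &f) {
  return f.dirty_info || f.dirty_fast_info;
}
```

The fragile part, as noted above, is that every call site would have to pick the right flag; the flags themselves are trivial.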
_prepare_write_info goes from .02% to .10% in my perf profile. I think that's okay. |
No real additional comments, other than that either solution seems like a step up from just blindly varint encoding everything. Given sage's perf results, this seems pretty reasonable. |
@liewegas |
I did sudo perf record --call-graph dwarf -F99 -p and then looked at the result with sudo perf report.

On Wed, 28 Sep 2016, shun-s wrote:
jf.flush(*_dout);
}
*_dout << dendl;
}
@liewegas My log is flooding with "fastinfo failed" errors; here is one snippet:
2016-10-04 17:35:50.221450 7fb52ab9c700 -1 _prepare_write_info fastinfo failed, info:
{
"pgid": "1.2b0",
"last_update": "18'41",
"last_complete": "18'41",
"log_tail": "0'0",
"last_user_version": 41,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 15,
"last_epoch_started": 16,
"last_epoch_clean": 16,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 15,
"same_interval_since": 15,
"same_primary_since": 15,
"last_scrub": "0'0",
"last_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_clean_scrub_stamp": "2016-10-04 17:34:54.968066"
},
"stats": {
"version": "18'40",
"reported_seq": "15",
"reported_epoch": "18",
"state": "active+clean",
"last_fresh": "2016-10-04 17:35:42.281977",
"last_change": "2016-10-04 17:34:56.004604",
"last_active": "2016-10-04 17:35:42.281977",
"last_peered": "2016-10-04 17:35:42.281977",
"last_clean": "2016-10-04 17:35:42.281977",
"last_became_active": "2016-10-04 17:34:56.002579",
"last_became_peered": "2016-10-04 17:34:56.002579",
"last_unstale": "2016-10-04 17:35:42.281977",
"last_undegraded": "2016-10-04 17:35:42.281977",
"last_fullsized": "2016-10-04 17:35:42.281977",
"mapping_epoch": 15,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 15,
"last_epoch_clean": 16,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_clean_scrub_stamp": "2016-10-04 17:34:54.968066",
"log_size": 40,
"ondisk_log_size": 40,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 42991616,
"num_objects": 11,
"num_object_clones": 0,
"num_object_copies": 20,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 11,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 82,
"num_write_kb": 41984,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0
},
"up": [
11,
0
],
"acting": [
11,
0
],
"blocked_by": [],
"up_primary": 11,
"acting_primary": 11
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 16,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
last_written_info:
{
"pgid": "1.2b0",
"last_update": "18'41",
"last_complete": "18'41",
"log_tail": "0'0",
"last_user_version": 41,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 15,
"last_epoch_started": 16,
"last_epoch_clean": 16,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 15,
"same_interval_since": 15,
"same_primary_since": 15,
"last_scrub": "0'0",
"last_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_clean_scrub_stamp": "2016-10-04 17:34:54.968066"
},
"stats": {
"version": "18'40",
"reported_seq": "15",
"reported_epoch": "18",
"state": "active+clean",
"last_fresh": "2016-10-04 17:35:42.281977",
"last_change": "2016-10-04 17:34:56.004604",
"last_active": "2016-10-04 17:35:42.281977",
"last_peered": "2016-10-04 17:35:42.281977",
"last_clean": "2016-10-04 17:35:42.281977",
"last_became_active": "2016-10-04 17:34:56.002579",
"last_became_peered": "2016-10-04 17:34:56.002579",
"last_unstale": "2016-10-04 17:35:42.281977",
"last_undegraded": "2016-10-04 17:35:42.281977",
"last_fullsized": "2016-10-04 17:35:42.281977",
"mapping_epoch": 15,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 15,
"last_epoch_clean": 16,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2016-10-04 17:34:54.968066",
"last_clean_scrub_stamp": "2016-10-04 17:34:54.968066",
"log_size": 40,
"ondisk_log_size": 40,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 42991616,
"num_objects": 11,
"num_object_clones": 0,
"num_object_copies": 20,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 10,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 82,
"num_write_kb": 41984,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0
},
"up": [
11,
0
],
"acting": [
11,
0
],
"blocked_by": [],
"up_primary": 11,
"acting_primary": 11
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 16,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
But I am seeing pg_info is 178 bytes now; what's wrong?
Oops, I missed the num_dirty_objects field. Updated, can you retest please?
will do..
@liewegas working fine, no more log flooding, fastinfo size is now 186 bytes.
@liewegas Also in some cases I am seeing _info of 855 bytes and _fastinfo as 0 bytes (2016-10-04 20:29:58.656150 7f49eeda2700 30 submit_transaction Rocksdb transaction:). Seems to be a bug.
force-pushed from 3717e7d to 352ee95
The 0-byte _fastinfo happens when fields outside of the fast info struct update; it's normal that it happens sometimes, we just want to make it rare-ish so that most IO operations take the fastinfo path.
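The read-path behavior this implies can be sketched like so (illustrative, not the actual OSD code; the toy decode stands in for the real pg_fast_info_t decode): a zero-length _fastinfo value means "no overlay", so the full info attr stands on its own, and writing an empty value is equivalent to deleting the key without a separate delete operation in the kv transaction.

```cpp
// Toy sketch: empty _fastinfo value acts as a "no overlay" marker.
#include <cassert>
#include <string>

struct info_sketch { int last_update = 0; };

inline void apply_fastinfo(const std::string &fastinfo_val,
                           info_sketch *info) {
  if (fastinfo_val.empty())
    return;  // no overlay: the full info attr is authoritative
  // decode the fast fields and overlay them (toy decode here)
  info->last_update = std::stoi(fastinfo_val);
}
```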
force-pushed from 208968a to 93c1ca6
@liewegas can't we just not write the key if the value is 0 bytes, as it will cause an extra write in the kv?
@liewegas can we merge it?
needs to go through qa first.
we would need to add a version field to ensure that we don't merge in the fast info if it is stale. that should be possible, though... i'll update
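The staleness guard proposed here could look something like the following sketch (the names fast_overlay, base_version, and maybe_apply are hypothetical): the fast info records which version of the full info it was written against, and on load it is ignored if the full info has since been rewritten past it.

```cpp
// Hypothetical sketch of a version guard against stale fast info.
#include <cassert>
#include <cstdint>

struct fast_overlay {
  uint64_t base_version;  // full-info version this overlay extends
  uint64_t last_update;   // one of the fast fields
};

// Apply only if the overlay is not stale relative to the full info.
inline bool maybe_apply(const fast_overlay &fast, uint64_t info_version,
                        uint64_t *last_update) {
  if (fast.base_version < info_version)
    return false;  // stale leftover: full info already supersedes it
  *last_update = fast.last_update;
  return true;
}
```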
Not QA, but I've successfully been running this through performance tests without issue. I think Somnath is checking to see if this is related to his performance issues with longer running tests though.
force-pushed from 93c1ca6 to f6694db
force-pushed from b549907 to 6821ee7
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We'll use this shortly. Signed-off-by: Sage Weil <sage@redhat.com>
For most IO operations we only update a handful of fields in the pg_info_t structure. However, the full struct, when encoded, is on the order of 800 bytes. This adds a new attribute, _fastinfo, which contains only the most commonly updated fields. When present, the fastinfo fields should be overlaid on top of the full info struct contained in the existing info attr. If a field outside of the "fast" set is updated, we clear the fastinfo attribute and update the full info attr. Signed-off-by: Sage Weil <sage@redhat.com>
This way we don't have to bother deleting the fastinfo attr, which will reduce the amount of work the underlying store has to do. Signed-off-by: Sage Weil <sage@redhat.com>
All compat pg formats can handle past interval removal and mark-complete. Signed-off-by: Sage Weil <sage@redhat.com>
force-pushed from e5e7f4a to 57fbb83
Note info updates (of any kind), fastinfo updates, and biginfo updates. Signed-off-by: Sage Weil <sage@redhat.com>
force-pushed from 57fbb83 to bd407ce
For most IO operations we only update a handful of fields in the pg_info_t structure. However, the full struct, when encoded, is on the order of 800 bytes.
This adds a new attribute, _fastinfo, which contains only the most commonly updated fields. When present, the fastinfo fields should be overlaid on top of the full info struct contained in the existing info attr. If a field outside of the "fast" set is updated, we clear the fastinfo attribute and update the full info attr.
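The overlay mechanism can be sketched as below. The field names mirror a few pg_info_t fields, but the structs are illustrative stand-ins, not the actual Ceph pg_fast_info_t definition or its encoding.

```cpp
// Illustrative sketch of the fastinfo overlay on load.
#include <cassert>
#include <cstdint>
#include <string>

struct eversion { uint64_t epoch = 0, version = 0; };

struct full_info {                    // stand-in for pg_info_t
  eversion last_update;
  uint64_t last_user_version = 0;
  int64_t num_bytes = 0;              // one of the stat_sum counters
  std::string state = "active+clean"; // a "slow" field, outside the fast set
};

struct fast_info {                    // stand-in for pg_fast_info_t
  eversion last_update;
  uint64_t last_user_version = 0;
  int64_t num_bytes = 0;

  // Overlay the fast fields on top of the full struct read from the
  // existing info attr.
  void apply_to(full_info *info) const {
    info->last_update = last_update;
    info->last_user_version = last_user_version;
    info->num_bytes = num_bytes;
  }
};
```

Slow fields such as state are never touched by the overlay; updating one of them clears _fastinfo and rewrites the full attr instead.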
This takes the approach of speculatively trying to do a fast diff at the lowest level helper function and then comparing whether the result is correct. That comparison is probably the slow part. A much harder and more fragile approach would be to make all the call sites that update pg info (and set dirty_info = true) indicate whether they are updating a fast or non-fast field. I'm not sure that's really practical, so I think this is the way to go (at least for now).
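The speculative check described above can be sketched like this (illustrative; the real code operates on pg_info_t in _prepare_write_info): derive the fast info from the current full info, overlay it on the last written full info, and take the fast path only if that reproduces the current info exactly.

```cpp
// Sketch of the try-fast-then-verify pattern with toy stand-in structs.
#include <cassert>
#include <cstdint>
#include <optional>

struct info_s {                      // stand-in for pg_info_t
  uint64_t last_update = 0;          // fast field
  int64_t num_bytes = 0;             // fast field
  uint32_t last_epoch_started = 0;   // slow field, outside the fast set
  bool operator==(const info_s &o) const {
    return last_update == o.last_update && num_bytes == o.num_bytes &&
           last_epoch_started == o.last_epoch_started;
  }
};

struct fast_s {                      // stand-in for pg_fast_info_t
  uint64_t last_update = 0;
  int64_t num_bytes = 0;
  static fast_s from(const info_s &i) { return {i.last_update, i.num_bytes}; }
  void apply_to(info_s *i) const {
    i->last_update = last_update;
    i->num_bytes = num_bytes;
  }
};

// Returns the fast info to write, or nullopt when a slow field changed
// and the full info attr must be rewritten instead.
inline std::optional<fast_s> try_fast_path(const info_s &current,
                                           const info_s &last_written) {
  fast_s f = fast_s::from(current);
  info_s check = last_written;
  f.apply_to(&check);                // speculative overlay
  if (check == current)              // verify the diff is fully covered
    return f;
  return std::nullopt;
}
```

The equality comparison in try_fast_path is the cost the description calls "probably the slow part"; it is what lets the call sites stay oblivious to which fields they dirtied.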
pg_fast_info_t encodes to 178 bytes, vs 847 for pg_info_t. That could probably be improved given the high correlation between the last_* fields.