
os/bluestore: implement object content recompression/defragmentation when scrubbing #57631

Open · wants to merge 7 commits into base: main
17 changes: 14 additions & 3 deletions src/common/options/global.yaml.in
@@ -4493,25 +4493,36 @@ options:
desc: Default policy for using compression when pool does not specify
long_desc: '''none'' means never use compression. ''passive'' means use compression
when clients hint that data is compressible. ''aggressive'' means use compression
unless clients hint that data is not compressible. This option is used when the
per-pool property for the compression mode is not present.'
unless clients hint that data is not compressible. The ''*_lazy'' counterparts
apply the same policy during deep scrub only; this additionally has to be
enabled at the per-pool level via the ''deep_scrub_recompress'' pool setting.
This option is used when the per-pool property for the compression mode is not present.'
fmt_desc: The default policy for using compression if the per-pool property
``compression_mode`` is not set. ``none`` means never use
compression. ``passive`` means use compression when
:c:func:`clients hint <rados_set_alloc_hint>` that data is
compressible. ``aggressive`` means use compression unless
clients hint that data is not compressible. ``force`` means use
compression under all circumstances even if the clients hint that
the data is not compressible.
the data is not compressible. The ``*_lazy`` modes are similar to
their immediate counterparts, but compression is performed during
deep scrubbing only and additionally has to be enabled on a
per-pool basis via the ``deep_scrub_recompress`` pool setting.
default: none
enum_values:
- none
- passive
- aggressive
- force
- passive_lazy
- aggressive_lazy
- force_lazy
flags:
- runtime
with_legacy: true
see_also:
- bluestore_prefer_deferred_size
- name: bluestore_compression_algorithm
type: str
level: advanced
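
To make the documented semantics concrete, here is a minimal sketch (not part of this PR) of how a ''*_lazy'' mode could gate compression between the write path and deep scrub. It assumes the COMP_* values added to src/compressor/Compressor.h below and a hypothetical may_compress_now() helper; client hints and the defragmentation option are ignored for brevity.

// Illustrative sketch only, not from this PR: immediate modes compress on
// the write path, while *_lazy modes defer the same policy to deep scrub
// and require the per-pool opt-in.
#include "compressor/Compressor.h"

bool may_compress_now(int mode, bool deep_scrubbing, bool pool_opted_in)
{
  switch (mode) {
  case Compressor::COMP_PASSIVE:
  case Compressor::COMP_AGGRESSIVE:
  case Compressor::COMP_FORCE:
    return !deep_scrubbing;  // regular write-path compression
  case Compressor::COMP_PASSIVE_LAZY:
  case Compressor::COMP_AGGRESSIVE_LAZY:
  case Compressor::COMP_FORCE_LAZY:
    // deferred: only while deep scrubbing, and only if the pool has
    // enabled the deep_scrub_recompress option
    return deep_scrubbing && pool_opted_in;
  default:
    return false;            // COMP_NONE
  }
}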
10 changes: 10 additions & 0 deletions src/compressor/Compressor.cc
@@ -54,6 +54,10 @@ const char *Compressor::get_comp_mode_name(int m) {
case COMP_PASSIVE: return "passive";
case COMP_AGGRESSIVE: return "aggressive";
case COMP_FORCE: return "force";
case COMP_PASSIVE_LAZY: return "passive_lazy";
case COMP_AGGRESSIVE_LAZY: return "aggressive_lazy";
case COMP_FORCE_LAZY: return "force_lazy";

default: return "???";
@aclamk (Contributor) commented on Jun 12, 2024:
I think we need the following combinations of compression modes for regular
operation and for scrub (the resulting rule is sketched in code after this
file's diff):

regular compression    scrub compression
none                   none
none                   passive
none                   aggressive
none                   force
passive                passive
passive                aggressive
passive                force
aggressive             aggressive
aggressive             force
force                  force

The author replied:
oh... So you think we might need a separate lazy_compression_mode parameter (none/passive/aggressive/force) for background compression, right? Wouldn't that be overkill for users?

}
}
The contributor added:
A separate discussion is whether we should allow a different conf.bluestore_compression_required_ratio for scrub recompression.

The author replied:
Yes, from a purist's perspective we might need a completely different set of compression settings for the lazy/background mode. Whether we really need that, I'm not sure...
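
As a minimal sketch of that idea, a scrub-specific ratio could simply fall back to the regular one when unset. Both the option and the helper below are hypothetical; neither exists in this PR.

// Hypothetical: a separate required compression ratio for scrub-time
// recompression, falling back to bluestore_compression_required_ratio
// when it is not set. Not part of this PR.
#include <optional>

double required_ratio_for(bool scrub_recompression,
                          double regular_ratio,
                          std::optional<double> scrub_ratio)
{
  if (scrub_recompression && scrub_ratio)
    return *scrub_ratio;  // e.g. a stricter ratio for background work
  return regular_ratio;
}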

@@ -65,6 +69,12 @@ Compressor::get_comp_mode_type(std::string_view s) {
return COMP_AGGRESSIVE;
if (s == "passive")
return COMP_PASSIVE;
if (s == "force_lazy")
return COMP_FORCE_LAZY;
if (s == "aggressive_lazy")
return COMP_AGGRESSIVE_LAZY;
if (s == "passive_lazy")
return COMP_PASSIVE_LAZY;
if (s == "none")
return COMP_NONE;
return {};
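
The mode pairs in the earlier review table reduce to a single rule: scrub-time compression must be at least as eager as the write-path mode. A minimal sketch of that rule (illustrative only, not part of this PR), relying on the COMP_NONE..COMP_FORCE declaration order in src/compressor/Compressor.h:

// Illustrative only: valid (regular, scrub) pairings per the review table.
// *_LAZY variants would first be mapped back to their immediate
// counterparts (see the next sketch).
#include "compressor/Compressor.h"

bool valid_mode_pair(int regular, int scrub)
{
  // relies on COMP_NONE < COMP_PASSIVE < COMP_AGGRESSIVE < COMP_FORCE
  bool in_range = regular >= Compressor::COMP_NONE &&
                  regular <= Compressor::COMP_FORCE &&
                  scrub   >= Compressor::COMP_NONE &&
                  scrub   <= Compressor::COMP_FORCE;
  return in_range && scrub >= regular;
}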
5 changes: 4 additions & 1 deletion src/compressor/Compressor.h
@@ -64,7 +64,10 @@ class Compressor {
COMP_NONE, ///< compress never
COMP_PASSIVE, ///< compress if hinted COMPRESSIBLE
COMP_AGGRESSIVE, ///< compress unless hinted INCOMPRESSIBLE
COMP_FORCE ///< compress always
COMP_FORCE, ///< compress always
COMP_PASSIVE_LAZY, ///< delayed compression during reformatting if hinted COMPRESSIBLE
COMP_AGGRESSIVE_LAZY, ///< delayed compression during reformatting unless hinted INCOMPRESSIBLE
COMP_FORCE_LAZY ///< unconditional delayed compression during reformatting
};

static const char* get_comp_alg_name(int a);
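
A small illustrative helper (not in this PR) showing one way a *_LAZY value could be reduced to the immediate policy it defers, e.g. when the scrub path decides whether data should be compressed:

// Sketch only: map a *_LAZY mode to the immediate mode whose policy it
// applies during deep scrub. Not part of this PR.
#include "compressor/Compressor.h"

int effective_policy(int m)
{
  switch (m) {
  case Compressor::COMP_PASSIVE_LAZY:    return Compressor::COMP_PASSIVE;
  case Compressor::COMP_AGGRESSIVE_LAZY: return Compressor::COMP_AGGRESSIVE;
  case Compressor::COMP_FORCE_LAZY:      return Compressor::COMP_FORCE;
  default:                               return m;  // already immediate
  }
}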
1 change: 1 addition & 0 deletions src/include/rados.h
@@ -492,6 +492,7 @@ enum {
CEPH_OSD_OP_FLAG_FADVISE_NOCACHE = 0x40, /* data will be accessed only once by this client */
CEPH_OSD_OP_FLAG_WITH_REFERENCE = 0x80, /* need reference counting */
CEPH_OSD_OP_FLAG_BYPASS_CLEAN_CACHE = 0x100, /* bypass ObjectStore cache, mainly for deep-scrub */
CEPH_OSD_OP_FLAG_ALLOW_DATA_REFORMATTING = 0x200, /* ObjectStore can optimize data layout afterwards */
};

#define EOLDSNAPC 85 /* ORDERSNAP flag set; writer has old snapc*/
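
For illustration (not from this PR), consuming the new flag is an ordinary bit test on an op's flags word:

// Sketch: a backend could test the flag like this before deciding to
// rewrite (recompress/defragment) data it has just read for deep scrub.
#include <cstdint>
#include "include/rados.h"

bool allows_reformatting(uint32_t op_flags)
{
  return (op_flags & CEPH_OSD_OP_FLAG_ALLOW_DATA_REFORMATTING) != 0;
}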
4 changes: 2 additions & 2 deletions src/mon/MonCommands.h
@@ -1141,11 +1141,11 @@ COMMAND("osd pool rename "
"rename <srcpool> to <destpool>", "osd", "rw")
COMMAND("osd pool get "
"name=pool,type=CephPoolname "
"name=var,type=CephChoices,strings=size|min_size|pg_num|pgp_num|crush_rule|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|target_max_objects|target_max_bytes|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|erasure_code_profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority|compression_mode|compression_algorithm|compression_required_ratio|compression_max_blob_size|compression_min_blob_size|csum_type|csum_min_block|csum_max_block|allow_ec_overwrites|fingerprint_algorithm|pg_autoscale_mode|pg_autoscale_bias|pg_num_min|pg_num_max|target_size_bytes|target_size_ratio|dedup_tier|dedup_chunk_algorithm|dedup_cdc_chunk_size|eio|bulk|read_ratio",
"name=var,type=CephChoices,strings=size|min_size|pg_num|pgp_num|crush_rule|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|target_max_objects|target_max_bytes|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|erasure_code_profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority|compression_mode|compression_algorithm|compression_required_ratio|compression_max_blob_size|compression_min_blob_size|csum_type|csum_min_block|csum_max_block|allow_ec_overwrites|fingerprint_algorithm|pg_autoscale_mode|pg_autoscale_bias|pg_num_min|pg_num_max|target_size_bytes|target_size_ratio|dedup_tier|dedup_chunk_algorithm|dedup_cdc_chunk_size|eio|bulk|read_ratio|deep_scrub_defragment|deep_scrub_recompress",
"get pool parameter <var>", "osd", "r")
COMMAND("osd pool set "
"name=pool,type=CephPoolname "
"name=var,type=CephChoices,strings=size|min_size|pg_num|pgp_num|pgp_num_actual|crush_rule|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|min_read_recency_for_promote|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority|compression_mode|compression_algorithm|compression_required_ratio|compression_max_blob_size|compression_min_blob_size|csum_type|csum_min_block|csum_max_block|allow_ec_overwrites|fingerprint_algorithm|pg_autoscale_mode|pg_autoscale_bias|pg_num_min|pg_num_max|target_size_bytes|target_size_ratio|dedup_tier|dedup_chunk_algorithm|dedup_cdc_chunk_size|eio|bulk|read_ratio "
"name=var,type=CephChoices,strings=size|min_size|pg_num|pgp_num|pgp_num_actual|crush_rule|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|min_read_recency_for_promote|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority|compression_mode|compression_algorithm|compression_required_ratio|compression_max_blob_size|compression_min_blob_size|csum_type|csum_min_block|csum_max_block|allow_ec_overwrites|fingerprint_algorithm|pg_autoscale_mode|pg_autoscale_bias|pg_num_min|pg_num_max|target_size_bytes|target_size_ratio|dedup_tier|dedup_chunk_algorithm|dedup_cdc_chunk_size|eio|bulk|read_ratio|deep_scrub_defragment|deep_scrub_recompress "
"name=val,type=CephString "
"name=yes_i_really_mean_it,type=CephBool,req=false",
"set pool parameter <var> to <val>", "osd", "rw")
11 changes: 9 additions & 2 deletions src/mon/OSDMonitor.cc
@@ -5405,7 +5405,8 @@ namespace {
CSUM_TYPE, CSUM_MAX_BLOCK, CSUM_MIN_BLOCK, FINGERPRINT_ALGORITHM,
PG_AUTOSCALE_MODE, PG_NUM_MIN, TARGET_SIZE_BYTES, TARGET_SIZE_RATIO,
PG_AUTOSCALE_BIAS, DEDUP_TIER, DEDUP_CHUNK_ALGORITHM,
DEDUP_CDC_CHUNK_SIZE, POOL_EIO, BULK, PG_NUM_MAX, READ_RATIO };
DEDUP_CDC_CHUNK_SIZE, POOL_EIO, BULK, PG_NUM_MAX, READ_RATIO,
DEEP_SCRUB_DEFRAGMENT, DEEP_SCRUB_RECOMPRESS };

std::set<osd_pool_get_choices>
subtract_second_from_first(const std::set<osd_pool_get_choices>& first,
@@ -6156,7 +6157,9 @@ bool OSDMonitor::preprocess_command(MonOpRequestRef op)
{"dedup_chunk_algorithm", DEDUP_CHUNK_ALGORITHM},
{"dedup_cdc_chunk_size", DEDUP_CDC_CHUNK_SIZE},
{"bulk", BULK},
{"read_ratio", READ_RATIO}
{"read_ratio", READ_RATIO},
{"deep_scrub_defragment", DEEP_SCRUB_DEFRAGMENT},
{"deep_scrub_recompress", DEEP_SCRUB_RECOMPRESS},
};

typedef std::set<osd_pool_get_choices> choices_set_t;
@@ -6403,6 +6406,8 @@ bool OSDMonitor::preprocess_command(MonOpRequestRef op)
case DEDUP_CHUNK_ALGORITHM:
case DEDUP_CDC_CHUNK_SIZE:
case READ_RATIO:
case DEEP_SCRUB_DEFRAGMENT:
case DEEP_SCRUB_RECOMPRESS:
pool_opts_t::key_t key = pool_opts_t::get_opt_desc(i->first).key;
if (p->opts.is_set(key)) {
if(*it == CSUM_TYPE) {
@@ -6567,6 +6572,8 @@ bool OSDMonitor::preprocess_command(MonOpRequestRef op)
case DEDUP_CHUNK_ALGORITHM:
case DEDUP_CDC_CHUNK_SIZE:
case READ_RATIO:
case DEEP_SCRUB_DEFRAGMENT:
case DEEP_SCRUB_RECOMPRESS:
for (i = ALL_CHOICES.begin(); i != ALL_CHOICES.end(); ++i) {
if (i->second == *it)
break;