Release 2.0.0 #536

Closed
neverchanje opened this issue May 14, 2020 · 1 comment
Labels
release-note Notes on the version release

neverchanje commented May 14, 2020

2.0.0

rDSN

PR (34 TOTAL) TITLE
XiaoMi/rdsn#468 fix: fix bug in hpc_task_queue
XiaoMi/rdsn#462 improvement(third-party): Update the concurrentqueue to a stable release version
XiaoMi/rdsn#472 fix(dup): reject add_dup if remote address incorrect
XiaoMi/rdsn#474 refactor: use rpc_holder to reimplement on_query_configuration_by_index
XiaoMi/rdsn#470 fix: fix memory leak in dsn_message_parser
XiaoMi/rdsn#461 feat: add rate limit for learning and support remote-command
XiaoMi/rdsn#458 refactor: change large write size request forbiden log
XiaoMi/rdsn#459 fix: fix the bug in restore
XiaoMi/rdsn#457 feat(bulk-load): meta server send bulk load request
XiaoMi/rdsn#454 feat(bulk-load): meta server start bulk load
XiaoMi/rdsn#443 feat(cold-backup): add rate limit for fds
XiaoMi/rdsn#456 feat: update rpc_holder
XiaoMi/rdsn#455 fix: fix bug in local_service
XiaoMi/rdsn#452 fix: fix memory leak in meta_load_balance_test
XiaoMi/rdsn#445 feat(bulk-load): add meta_bulk_load_service and some structures
XiaoMi/rdsn#450 feat(dup): optimize time-lag by reducing repeat delay
XiaoMi/rdsn#451 fix: fix memory leak in meta_state_service_simple
XiaoMi/rdsn#275 refactor(util): use flags to define configurations
XiaoMi/rdsn#448 fix(dup): don't GC by valid_start_offset during duplication & add app_name to replica_base
XiaoMi/rdsn#433 feat: update the way to get heap profile
XiaoMi/rdsn#449 fix: fix memory leak in meta test
XiaoMi/rdsn#441 feat(hotkey): add replication.codes about hotkey detect
XiaoMi/rdsn#447 fix(asan): memory leak in local_service.cpp
XiaoMi/rdsn#442 feat(util): add restore_read function in rpc_message
XiaoMi/rdsn#391 feat(split): register child partition
XiaoMi/rdsn#446 fix(asan): heap-use-after-free caused by using string_view in fail_point
XiaoMi/rdsn#418 feat: append mlog in fixed-size blocks using log_appender
XiaoMi/rdsn#436 refactor: simplify mutation_log write_pending_mutations
XiaoMi/rdsn#434 refactor(backup): move collect_backup_info to replica_backup_manager
XiaoMi/rdsn#432 refactor(backup): make backup clear decoupled from on_cold_backup
XiaoMi/rdsn#419 feat: add perf-counter for backup request
XiaoMi/rdsn#408 feat: refine mlog_dump output
XiaoMi/rdsn#255 refactor(rpc): refactor request meta & add support for backup request

Pegasus

PR (28 TOTAL) TITLE
#539 docs: update CentOS build dependencies
#538 fix(shell/sds): NULL check before memset
#533 feat: overload dump_write_request of replication_app_base.h
#537 fix: fix bug in table_hotspot_policy
#534 fix: fix bug in local_service
#532 feat(rocksdb): Support to config meta data read source
#530 improvement(cu): set ttl on capacity unit data
#528 improvement(lb): add add_node_list.sh to add nodes with copy_pri after all copy_sec done
#529 improvement: add logging on rocksdb write stalls
#526 feat(dup): add metric for time lag between master&slave
#525 fix: bind prometheus exposer to 0.0.0.0
#524 refactor(server_impl): Separate constructor to an independent file
#522 feat(rocksdb): Support more configurable items for bloom filter
#520 feat(dup): support shell set fail_mode and collector duplication ops
#521 feat(metrics): Add bloom filter related metrics
#519 feat(shell): count_data return estimate count by default
#514 fix(collector): no validate the app_name after parse_app_perf_counter_name
#504 fix: include key size while calculating capacity units
#494 feat(rocksdb): write meta info both in manifest and meta CF
#501 feat: statistics backup request qps in info collector
#499 feat: support of getting backup request perf-counter in command_helper
#473 feat(rocksdb): Bump rocksdb to v6.6.4
#459 feat(dup): write pegasus value in new data version for duplication
#436 chore: upgrade dev version to 1.13.SNAPSHOT

Incompatible Modifications

#459 feat(dup): write pegasus value in new data version for duplication
#473 feat(rocksdb): Bump rocksdb to v6.6.4
XiaoMi/rdsn#255 refactor(rpc): refactor request meta & add support for backup request

New perf-counter

New configuration

[pegasus.server]
+rocksdb_bloom_filter_bits_per_key = 10
+rocksdb_format_version = 2
+dup_lagging_write_threshold_ms = 10000
+get_meta_store_type = manifest

[replication]
+fds_write_limit_rate = 100
+fds_read_limit_rate = 100

[nfs]
+max_copy_rate_megabytes = 500
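
When rolling these options into an existing config, it can help to load the fragment above and check which sections and keys it adds. The following is a minimal sketch, assuming the fragment is ini-style and that Python's `configparser` is an acceptable stand-in for the server's own config reader; the helper name `parse_diff_fragment` is hypothetical and not part of Pegasus.

```python
import configparser

# The new options listed above, kept in diff form with leading '+' markers.
FRAGMENT = """\
[pegasus.server]
+rocksdb_bloom_filter_bits_per_key = 10
+rocksdb_format_version = 2
+dup_lagging_write_threshold_ms = 10000
+get_meta_store_type = manifest

[replication]
+fds_write_limit_rate = 100
+fds_read_limit_rate = 100

[nfs]
+max_copy_rate_megabytes = 500
"""


def parse_diff_fragment(text):
    """Strip the '+' diff markers and parse the result as ini (hypothetical helper)."""
    cleaned_lines = []
    for line in text.splitlines():
        if line.startswith("+"):
            # Drop the marker and any spaces between it and the key.
            line = line.lstrip("+ ")
        cleaned_lines.append(line.rstrip())
    parser = configparser.ConfigParser()
    parser.read_string("\n".join(cleaned_lines))
    # Values stay as strings; convert them where the option is numeric.
    return {section: dict(parser[section]) for section in parser.sections()}


new_options = parse_diff_fragment(FRAGMENT)
for section, options in new_options.items():
    print(section, sorted(options))
```

Values are returned as strings, so a caller would convert numeric options such as `dup_lagging_write_threshold_ms` explicitly.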

neverchanje commented May 14, 2020

Release Note

NOTE: 2.0.0 is backward-compatible only, which means servers upgraded to this version cannot be rolled back to previous versions.

The following are the highlights in this release:

Duplication

Duplication is Pegasus's solution for copying data between clusters in real time. Master-master duplication is currently limited to 'PUT' and 'MULTI_PUT' writes. See this document for more details:
https://pegasus-kv.github.io/administration/duplication.

Backup Request

Backup Request reduces tail latency at the cost of slightly weaker data consistency: when a read on the primary fails to finish within the expected time, the read falls back to a random secondary.
See the discussion here: #251.
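
The fallback described above can be illustrated with a small, self-contained sketch. This is not the Pegasus client API: `read_with_backup`, the `backup_delay_s` parameter, and the fake replica functions are all hypothetical names used for illustration, and replicas are modelled as plain Python callables.

```python
import concurrent.futures as cf
import random
import time


def read_with_backup(primary, secondaries, key, backup_delay_s, rng=random):
    """Read from the primary; if it has not answered within backup_delay_s,
    also ask a random secondary and return whichever reply arrives first.

    Returns a (value, source) pair, where source is "primary" or "secondary".
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        futures = {pool.submit(primary, key): "primary"}
        done, _ = cf.wait(futures, timeout=backup_delay_s)
        if not done:
            # The primary is slow: send the backup request to a random secondary.
            backup = rng.choice(secondaries)
            futures[pool.submit(backup, key)] = "secondary"
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        first = next(iter(done))
        return first.result(), futures[first]
    finally:
        # Do not block on the slower request; its result is simply discarded.
        pool.shutdown(wait=False)


# Fake replicas for demonstration only (hypothetical).
def slow_primary(key):
    time.sleep(0.5)
    return "fresh-" + key


def fast_primary(key):
    return "fresh-" + key


def secondary(key):
    # A secondary may lag behind the primary, hence the weaker consistency.
    return "stale-" + key


print(read_with_backup(fast_primary, [secondary], "k", backup_delay_s=0.2))
print(read_with_backup(slow_primary, [secondary], "k", backup_delay_s=0.05))
```

The trade-off is visible in the second call: the reply comes back quickly, but from a replica that may not have the latest write.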

RocksDB Meta CF

Pegasus currently uses a modified version of RocksDB that stores some metadata in the manifest file, which makes our RocksDB incompatible with the official version. In this version, we use an additional column family (called 'Meta CF') to store that metadata.

To finally get rid of the legacy RocksDB, you must first upgrade the ReplicaServer to 2.0.0.

Bloom Filter Optimization

This release adds metrics for bloom filter utilization in Pegasus. For critical scenarios, we also provide configuration options for tuning bloom filter performance.
See #522, #521.

Cold-Backup FDS Limit

This feature adds throttling on download and upload during cold-backup.
See XiaoMi/rdsn#443.
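
To make the idea of throttling concrete, here is a minimal token-bucket sketch. The source does not say which algorithm rDSN uses or what unit `fds_write_limit_rate` is measured in, so the algorithm choice, the `TokenBucket` class, and the units in the example are assumptions for illustration only.

```python
import time


class TokenBucket:
    """A simple token bucket: tokens refill at a fixed rate up to a burst cap.

    Hypothetical illustration, not the rDSN implementation.
    """

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = float(rate_per_s)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, n):
        """Take n tokens if available and return True; otherwise return False."""
        self._refill()
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n):
        """Seconds until n tokens would be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)


# Example: an upload limited to 100 units per second (unit assumed).
bucket = TokenBucket(rate_per_s=100, burst=100)
chunk = 60
if not bucket.try_consume(chunk):
    time.sleep(bucket.wait_time(chunk))
    bucket.try_consume(chunk)
```

A caller would consume tokens before each upload or download chunk and sleep for `wait_time` when the bucket is empty, which caps the sustained transfer rate at `rate_per_s`.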

Adding Node Optimization

Previously, the data migration triggered by adding one or more nodes to a cluster degraded the service. In latency-critical scenarios (mostly those sensitive to read latency), the resulting 3~10 times increase in latency usually meant the service was briefly unavailable.

In 2.0.0 we support a strategy in which new nodes do not serve read requests until most migrations are done. Although the new nodes still participate in write-2PC and the overall migration workload doesn't decrease, read latency improves significantly.

Note that this feature only requires pegasus-tools to be 2.0.0; you don't have to upgrade the server to 2.0.0. See #528.
