Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: remove unused parameters. #18635

Merged
merged 1 commit into from Nov 6, 2017

Conversation

majianpeng
Copy link
Member

No description provided.

@majianpeng
Copy link
Member Author

@xiexingguo . Can you review this PR? Thanks!

@xiexingguo xiexingguo changed the title Bluestore omap flush os/bluestore: drop o->flush on omap update paths Oct 31, 2017
xiexingguo
xiexingguo previously approved these changes Oct 31, 2017
Copy link
Member

@xiexingguo xiexingguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, sounds reasonable to me

@majianpeng
Copy link
Member Author

retest

Copy link

@amitkumar50 amitkumar50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update PR description why o->flush() is removed.

Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the flushes are necessary!

@@ -10733,7 +10733,6 @@ int BlueStore::_do_remove(
bool is_gen = !o->oid.is_no_gen();
_do_truncate(txc, c, o, 0, is_gen ? &maybe_unshared_blobs : nullptr);
if (o->onode.has_omap()) {
o->flush();
_do_omap_clear(txc,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_do_omap_clear has to iterator over the keys in order to delete them, so the flush() is needed here. We could move it into _do_omap_clear for better clarity, perhaps

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it use rm_range_keys . so we only consider rm_range_keys

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry, I am unfamiliar with RocksDB)

@liewegas The root cause here is because there are still chances we are manipulating the omap of the same object in concurrent transaction groups, such as pglogs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, if following osd_op rm the key which previous op add,but the previous don't submit .so db don't know key exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, jianpeng.

if following osd_op rm the key which previous op add,but the previous don't submit .so db don't know key exist.

Then add_omap_keys and rm_omap_keys should be two separate ops, right? Generally PG should order write ops manipulating the same object, which means rm_omap_keys should not be handled by BlueStore until add_omap_keys is stable, but I guess there might be exceptions(e.g., pglogs?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly.. when i fixed this the test failure i saw was that the pg log was removed but left a single omap key behind, confusing things down the line.

the problem is that in order to implement the rm_range_keys we have to enumerate them which means we have to flush prior writes.

someday when rocksdb let's us enable the RangeDelete we can take these out.. but not until then

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine to have an add followed by a delete.. just not a range delete that requires us to enumerate in order to name the keys to delete.

Copy link
Member

@xiexingguo xiexingguo Nov 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas Thanks for the excellent clarifications:-)

@@ -11078,7 +11076,6 @@ int BlueStore::_omap_rmkey_range(TransContext *txc,
{
const string& prefix =
o->onode.is_pgmeta_omap() ? PREFIX_PGMETA_OMAP : PREFIX_OMAP;
o->flush();
get_omap_key(o->onode.nid, first, &key_first);
get_omap_key(o->onode.nid, last, &key_last);
txc->t->rm_range_keys(prefix, key_first, key_last);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm_range_keys impl enumerates keys, so we need the flush.

we could perhaps have a KeyValueDB method to tell us whether a proper range delete is implemented and if so skip the flush, but we have that off by default still (rocksdb recommends not using yet iirc?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@liewegas
Copy link
Member

liewegas commented Nov 1, 2017

the second cleanup patch is good, though!

@xiexingguo
Copy link
Member

xiexingguo commented Nov 1, 2017

@majianpeng Mind dropping the flush commit and repushing?

@majianpeng majianpeng changed the title os/bluestore: drop o->flush on omap update paths os/bluestore: remove unused parameters. Nov 6, 2017
@majianpeng
Copy link
Member Author

@xiexingguo . sorry for later response. updated by your suggestions. BTW, i has a quesiton: You said "Then add_omap_keys and rm_omap_keys should be two separate ops, right? Generally PG should order write ops manipulating the same object, which means rm_omap_keys should not be handled by BlueStore until add_omap_keys is stable, but I guess there might be exceptions(e.g., pglogs?)".

How pg do this? by on_disk_lock or others? Thanks!

@xiexingguo xiexingguo merged commit 7e5a8a7 into ceph:master Nov 6, 2017
@xiexingguo
Copy link
Member

xiexingguo commented Nov 6, 2017

@majianpeng I mean flush only works if we want to make sure that the omap changes made by a previous transaction(-group) are now visible in DB and hence the relevant keys can be safely enumerated.

It is okay (and possible) to manipulate omap multiple times in a single transaction but it's then caller's responsibility to make sure there is no conflicits.

@majianpeng
Copy link
Member Author

@xiexingguo . FileStore don't care this (in a osop: set_omap_keys, rm_omap_keys).

@majianpeng majianpeng deleted the bluestore-omap-flush branch November 6, 2017 08:02
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
joscollin pushed a commit to joscollin/ceph that referenced this pull request Jan 5, 2018
In some situations the IP address the Monitor wants to bind to
might not be available yet.

This might for example be a IPv6 Address which is still performing
DAD or waiting for a Router Advertisement to be send by the Router(s).

Have systemd wait for 10s before starting the Mon and increase the amount
of times it does so to 5.

This allows the system to bring up IP Addresses in the mean time while
systemd waits with restarting the Mon.

Fixes: ceph#18635

Signed-off-by: Wido den Hollander <wido@42on.com>
(cherry picked from commit e73eb8c)
joscollin pushed a commit to joscollin/ceph that referenced this pull request Jan 5, 2018
In some situations the IP address the Monitor wants to bind to
might not be available yet.

This might for example be a IPv6 Address which is still performing
DAD or waiting for a Router Advertisement to be send by the Router(s).

Have systemd wait for 10s before starting the Mon and increase the amount
of times it does so to 5.

This allows the system to bring up IP Addresses in the mean time while
systemd waits with restarting the Mon.

Fixes: ceph#18635

Signed-off-by: Wido den Hollander <wido@42on.com>
(cherry picked from commit e73eb8c)
dalgaaf pushed a commit to dalgaaf/ceph that referenced this pull request Apr 26, 2018
In some situations the IP address the Monitor wants to bind to
might not be available yet.

This might for example be a IPv6 Address which is still performing
DAD or waiting for a Router Advertisement to be send by the Router(s).

Have systemd wait for 10s before starting the Mon and increase the amount
of times it does so to 5.

This allows the system to bring up IP Addresses in the mean time while
systemd waits with restarting the Mon.

Fixes: ceph#18635

Signed-off-by: Wido den Hollander <wido@42on.com>
(cherry picked from commit e73eb8c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants