
ceph-volume: major batch refactor #34740

Merged
merged 31 commits into ceph:master on Sep 30, 2020

Conversation

jan--f
Contributor

@jan--f jan--f commented Apr 24, 2020

This is a major rewrite of batch in order to use the create/prepare code path and significantly simplify this subcommand's code. Simultaneously, this adds support for vg re-use, idempotency, use of LVs with batch, and additional implicit sizing arguments.

This also closes a number of bugs and unblocks other work in cephadm and rook.

Fixes: https://tracker.ceph.com/issues/24969
Fixes: https://tracker.ceph.com/issues/36242
Fixes: https://tracker.ceph.com/issues/36283
Fixes: https://tracker.ceph.com/issues/37502
Fixes: https://tracker.ceph.com/issues/37590
Fixes: https://tracker.ceph.com/issues/38168
Fixes: https://tracker.ceph.com/issues/42412
Fixes: https://tracker.ceph.com/issues/43899
Fixes: https://tracker.ceph.com/issues/44749
Fixes: https://tracker.ceph.com/issues/44783
Fixes: https://tracker.ceph.com/issues/44951
Fixes: https://tracker.ceph.com/issues/46031
Fixes: https://tracker.ceph.com/issues/46033

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

Some things are still TODO:

  • Add auto functionality by sorting passed drives
  • Tidy code
  • Tidy commits
  • Tidy tests
  • Add new unit tests
  • Documentation

@tchaikov
Contributor

jenkins test docs

@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from cbbd9f8 to 7f0f8c0 Compare April 27, 2020 09:53
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from 7f0f8c0 to e523d64 Compare April 27, 2020 10:29
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 3 times, most recently from f79fabc to 63b0e8a Compare May 21, 2020 11:41
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from 63b0e8a to a1b8cac Compare June 9, 2020 16:03
jschmid1 pushed a commit to SUSE/DeepSea that referenced this pull request Jun 15, 2020
Signed-off-by: Joshua Schmid <jschmid@suse.de>
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 2 times, most recently from b7855f0 to de8f034 Compare June 18, 2020 09:09
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from e86004a to 6d746d6 Compare June 23, 2020 15:06
@jan--f jan--f changed the title from "[WIP] ceph-volume: major batch refactor" to "ceph-volume: major batch refactor" Jun 23, 2020
@jan--f jan--f marked this pull request as ready for review June 23, 2020 15:06
@jan--f jan--f requested a review from a team as a code owner June 23, 2020 15:06
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 3 times, most recently from f6ce9f9 to 6ba8195 Compare June 23, 2020 15:13
@jan--f
Contributor Author

jan--f commented Jun 23, 2020

jenkins test ceph-volume tox

@jan--f
Contributor Author

jan--f commented Jun 23, 2020

jenkins test ceph-volume all

@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 5 times, most recently from 38d109c to 9920e6f Compare June 24, 2020 11:31
@jan--f jan--f removed the DNM label Jun 24, 2020
@guits
Copy link
Contributor

guits commented Sep 29, 2020

This is not only a backporting concern.
IMO, when you debug or troubleshoot and have to revisit the code months and months later, it's very nice when you can link a commit to the corresponding tracker issue so you get more context.
I would say that, generally, having neither the corresponding tracker (if there is one...) nor any commit message at all can be a nightmare in that kind of situation.
Just saying...

The majority of the Fixes would have to be added to b0b7973. I'm not entirely sure how helpful that would be.

Yes, in that case maybe it's not really relevant; I was speaking more about how I think it should be from a general point of view.

Jan Fajerski added 5 commits September 29, 2020 22:37
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Some devices we never want to pass to the batch subcommand. For now this
includes devices that have a partition or are mounted on the machine.
One goal is to filter out the root device, so it is not included in a batch
command and does not contribute to its implicit sizing calculation.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
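The filter described in this commit message can be sketched roughly as follows. Note that `Device`, `filter_batch_candidates`, and their fields are illustrative stand-ins for this sketch, not the real ceph_volume device API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Device:
    """Hypothetical stand-in for a ceph_volume device record."""
    path: str
    mounted: bool = False
    partitions: List[str] = field(default_factory=list)

def filter_batch_candidates(devices):
    """Keep only devices that are safe to hand to `batch`:
    nothing mounted, nothing carrying partitions (e.g. the root disk)."""
    return [d for d in devices if not d.mounted and not d.partitions]

devices = [
    Device("/dev/sda", mounted=True, partitions=["/dev/sda1"]),  # root disk
    Device("/dev/sdb"),                                          # clean disk
    Device("/dev/sdc", partitions=["/dev/sdc1"]),                # partitioned
]
print([d.path for d in filter_batch_candidates(devices)])  # → ['/dev/sdb']
```

Excluding such devices up front means they never enter the implicit sizing math, which is exactly the root-device concern the commit message raises.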
This enables users to pass sizes like "10G", which batch now understands.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
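A minimal sketch of such a human-readable size parser (illustrative only; the actual implementation lives in ceph_volume's disk utilities and may differ in accepted units and edge cases):

```python
import re

# Binary unit multipliers, as disk sizing tools conventionally use.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Turn a string like '10G' or '512MB' into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([BKMGT])B?\s*", text.upper())
    if not m:
        raise ValueError("unparseable size: %r" % text)
    return int(float(m.group(1)) * _UNITS[m.group(2)])

print(parse_size("10G"))   # → 10737418240
print(parse_size("512M"))  # → 536870912
```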
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from 5b347b3 to 8178d5c Compare September 29, 2020 20:39
@jan--f
Contributor Author

jan--f commented Sep 30, 2020

Alright, I added the missing output in the doc. One last round of tests and unless someone has objections I'll merge this.

@jan--f
Contributor Author

jan--f commented Sep 30, 2020

jenkins test ceph-volume all

@jan--f
Contributor Author

jan--f commented Sep 30, 2020

jenkins test ceph-volume tox

@jan--f jan--f merged commit 239c4bc into ceph:master Sep 30, 2020
jschmid1 pushed a commit to jschmid1/ceph that referenced this pull request Oct 2, 2020
With ceph#34740 ceph-volume is now idempotent.
We now assemble the device set based on the drivegroups
in mgr/cephadm (regardless of their availability) and pass it to c-v.

This is the missing piece to actually do this.

related changes: ceph@7d168ad

Signed-off-by: Joshua Schmid <jschmid@suse.de>
jschmid1 pushed a commit to jschmid1/ceph that referenced this pull request Oct 2, 2020 (same commit message as above)
dsavineau added a commit to dsavineau/ceph that referenced this pull request Nov 3, 2020
The ceph-volume lvm batch --auto introduced by [1] breaks backward
compatibility when using only non-rotational devices (SSD and/or NVMe).
Those devices are reassigned as bluestore db or filestore journal
devices while we want them as data devices.

Fixes: https://tracker.ceph.com/issues/48106

[1] ceph#34740

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
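The backward-compatible behaviour this fix restores can be sketched as follows; this is a simplified illustration of the intent described in the commit message, not the actual ceph-volume code:

```python
def split_by_rotational(devices):
    """devices: list of (path, rotational) tuples.

    Sketch of the pre-#34740 expectation: solid-state devices become
    db/journal ("fast") devices only when rotational (HDD) data devices
    are also present; an all-flash device set is used entirely for data.
    """
    hdds = [path for path, rotational in devices if rotational]
    ssds = [path for path, rotational in devices if not rotational]
    if hdds:
        return hdds, ssds  # data on HDDs, flash devices for db/wal
    return ssds, []        # all-flash: every device is a data device

# Mixed set: NVMe is used as the fast device.
print(split_by_rotational([("/dev/sdb", True), ("/dev/nvme0n1", False)]))
# All-flash set: both NVMe devices stay data devices.
print(split_by_rotational([("/dev/nvme0n1", False), ("/dev/nvme1n1", False)]))
```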
dsavineau added further commits to dsavineau/ceph referencing this pull request with the same commit message on Nov 4, Nov 9 (twice), Nov 12 (twice, cherry picked from commit 2a854ca), and Nov 19, 2020 (cherry picked from commit 2a854ca)
ifed01 pushed a commit to ifed01/ceph that referenced this pull request Nov 30, 2020 (same message, cherry picked from commit 2a854ca)
votdev pushed a commit to votdev/ceph that referenced this pull request Dec 1, 2020 (same message, cherry picked from commit 2a854ca)
jmolmo pushed a commit to jmolmo/ceph that referenced this pull request Dec 14, 2020 (same message)
@haslersn

haslersn commented Mar 10, 2021

I just experienced the following error with docker.io/ceph/ceph:v15.2.9. It seems to be related to this PR.

2021-03-10 00:17:53.440791 D | exec: Traceback (most recent call last):
2021-03-10 00:17:53.440803 D | exec:   File "/usr/sbin/ceph-volume", line 11, in <module>
2021-03-10 00:17:53.440809 D | exec:     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2021-03-10 00:17:53.440817 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
2021-03-10 00:17:53.440824 D | exec:     self.main(self.argv)
2021-03-10 00:17:53.440828 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2021-03-10 00:17:53.440832 D | exec:     return f(*a, **kw)
2021-03-10 00:17:53.440836 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 152, in main
2021-03-10 00:17:53.440840 D | exec:     terminal.dispatch(self.mapper, subcommand_args)
2021-03-10 00:17:53.440844 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2021-03-10 00:17:53.440849 D | exec:     instance.main()
2021-03-10 00:17:53.440853 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 42, in main
2021-03-10 00:17:53.440857 D | exec:     terminal.dispatch(self.mapper, self.argv)
2021-03-10 00:17:53.440861 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2021-03-10 00:17:53.440865 D | exec:     instance.main()
2021-03-10 00:17:53.440869 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2021-03-10 00:17:53.440872 D | exec:     return func(*a, **kw)
2021-03-10 00:17:53.440876 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 402, in main
2021-03-10 00:17:53.440879 D | exec:     plan = self.get_plan(self.args)
2021-03-10 00:17:53.440883 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 440, in get_plan
2021-03-10 00:17:53.440888 D | exec:     args.wal_devices)
2021-03-10 00:17:53.440895 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 472, in get_deployment_layout
2021-03-10 00:17:53.440901 D | exec:     fast_type)
2021-03-10 00:17:53.440910 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 509, in fast_allocations
2021-03-10 00:17:53.440914 D | exec:     ret.extend(get_lvm_fast_allocs(lvm_devs))
2021-03-10 00:17:53.440918 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 146, in get_lvm_fast_allocs
2021-03-10 00:17:53.440922 D | exec:     disk.Size(b=int(d.lvs[0].lv_size)), 1) for d in lvs if not
2021-03-10 00:17:53.440926 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 147, in <listcomp>
2021-03-10 00:17:53.440930 D | exec:     d.used_by_ceph]
2021-03-10 00:17:53.440933 D | exec: IndexError: list index out of range
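The traceback bottoms out at `d.lvs[0]` on a device whose `lvs` list is empty. A minimal sketch of the failing pattern, with a defensive variant that skips LV-less devices (illustrative only; names are simplified stand-ins, and this is not the actual fix applied upstream):

```python
class Dev:
    """Hypothetical stand-in for a ceph_volume device with attached LVs."""
    def __init__(self, path, lvs):
        self.path, self.lvs = path, lvs

devs = [
    Dev("/dev/sdb", []),                            # device with no LVs
    Dev("/dev/sdc", [{"lv_size": "1073741824"}]),   # device with one LV
]

# Failing pattern from the traceback: indexing lvs[0] unconditionally
# raises IndexError as soon as a device without LVs is in the list.
# Guarding on a non-empty lvs list skips such devices instead:
sizes = [int(d.lvs[0]["lv_size"]) for d in devs if d.lvs]
print(sizes)  # → [1073741824]
```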

@guits
Contributor

guits commented Mar 10, 2021

> I just experienced the following error with docker.io/ceph/ceph:v15.2.9. It seems to be related to this PR.
> (full traceback quoted above)

@haslersn could you paste the exact command you tried to run?
