Add per OSD crush_device_class definition #49555
Conversation
@adk3798 can you check this when you have the chance? It's something we already discussed at the orch meeting, and it basically solves the tracker in $subject.
@adk3798 I suspect the failing tests [1] are unrelated to this patch, right?

[1] https://jenkins.ceph.com/job/ceph-pull-requests/108756/console
Force-pushed from 1be85ee to ce80830 (Compare)
doc/cephadm/services/osd.rst (outdated):

```yaml
- data: /dev/sdb
- crush_device_class: ssd
- data: /dev/sdc
- crush_device_class: nvme
```
Is this the right YAML structure? I expected something more like:

```yaml
data_devices:
  paths:
  - data: /dev/sdb
    crush_device_class: ssd
  - data: /dev/sdc
    crush_device_class: nvme
```
You're right, I had an issue in the previous change. The right structure is the following:

```yaml
...
spec:
  data_devices:
    paths:
    - data: /dev/sdb
      crush_device_class: ssd
    - data: /dev/sdc
      crush_device_class: nvme
  db_devices:
    ...
```

where `paths` is translated into a JSON structure like `[ { "data": "/dev/sdb", "crush_device_class": "ssd" }, { "data": "/dev/sdc", "crush_device_class": "nvme" } ]`, which is easy to process (and validate) from cephadm.
Thanks, nice catch on the issue I had with the YAML definition; I've just updated the doc change!
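As a rough sketch of why that flattened form is convenient (a hypothetical snippet, not the actual cephadm code): the translated `paths` JSON is just a list of dicts, so each entry can be checked field by field.

```python
import json

# The JSON form of `paths` quoted in the comment above.
paths_json = '''
[ { "data": "/dev/sdb", "crush_device_class": "ssd" },
  { "data": "/dev/sdc", "crush_device_class": "nvme" } ]
'''

paths = json.loads(paths_json)
for entry in paths:
    # Every entry must name a device; the class is optional per device.
    assert "data" in entry
    dev_class = entry.get("crush_device_class")
    print(entry["data"], dev_class)
```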
YAML! I hate YAML! Even with strawberries!
;)
Force-pushed from ce80830 to b929930 (Compare)
Docs LGTM; no code or overall PR approval should be inferred.
they look unrelated, yeah

jenkins retest this please
Generally LGTM, some minor things/cleanup
```python
dg = DriveGroupSpec.from_json(yaml.safe_load(test_input))
assert dg.service_id == 'testing_drivegroup'
assert all([isinstance(x, Device) for x in dg.data_devices.paths])
assert dg.data_devices.paths[0].path == '/dev/sda'
if isinstance(dg.data_devices.paths[0].path, str):
```
was mypy complaining without this extra isinstance? It seems like the path attribute for the Device class is still just type str so it feels like this shouldn't be necessary.
Yes, this change solved that problem. I'll add a stack trace to show you the issue, but having this check (which doesn't change the nature of the test) definitely helped make mypy happy.
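For reference, this is the standard mypy narrowing pattern that check relies on (a generic sketch with a hypothetical class, not the actual `Device` code): when an attribute's type is a union, an `isinstance` guard narrows it so str-only operations type-check.

```python
from typing import Optional

class Dev:
    # Hypothetical: suppose path may be None until the spec is parsed.
    def __init__(self, path: Optional[str]) -> None:
        self.path = path

def path_of(dev: Dev) -> str:
    p = dev.path
    # Without this isinstance guard, mypy rejects returning
    # Optional[str] where the declared return type is str.
    if isinstance(p, str):
        return p
    raise ValueError("device has no path")

print(path_of(Dev("/dev/sda")))
```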
Force-pushed from 07f59ca to be8f4bc (Compare)
```python
# For this use case we don't apply any custom crush_device_classes
# Note that filestore is not supported anymore by the DriveGroupSpec
if self.spec.objectstore == 'filestore':
```
@adk3798 I'm not sure we want to keep L121-L138 (the filestore use case). This code is actually never reached, and if filestore is added to the spec:

```python
test_input11 = """service_type: osd
service_id: testing_drivegroup
placement:
  host_pattern: hostname
objectstore: filestore
data_devices:
  paths:
  - path: /dev/ceph_vg/ceph_lv_data
    crush_device_class: ssd
  - path: /dev/ceph_vg1/ceph_lv_data1
  - path: /dev/ceph_vg1/ceph_lv_data2
  - path: /dev/ceph_vg1/ceph_lv_data3
  - path: /dev/ceph_vg1/ceph_lv_data4
"""
```

the spec validation will fail with the following error:

```
raise DriveGroupValidationError(self.service_id,
ceph.deployment.drive_group.DriveGroupValidationError: Failed to validate OSD spec "testing_drivegroup": filestore is not supported. Must be one of ('bluestore')
```

which is actually expected from [1].

We can leave this block here (it's outside the for loop), but it's something we want to clean up (maybe in a follow-up change?).

[1] https://github.com/ceph/ceph/blob/main/src/python-common/ceph/deployment/drive_group.py#L328-L331
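Schematically, the validation that short-circuits the filestore branch looks like this (a simplified sketch with hypothetical names; the real check lives in `drive_group.py`, linked above):

```python
SUPPORTED_OBJECTSTORES = ('bluestore',)

class DriveGroupValidationError(Exception):
    def __init__(self, service_id, msg):
        super().__init__(
            f'Failed to validate OSD spec "{service_id}": {msg}')

def validate_objectstore(service_id, objectstore):
    # Any 'filestore' spec fails here, before the drive-preparation
    # code (and its filestore branch) is ever reached.
    if objectstore not in SUPPORTED_OBJECTSTORES:
        raise DriveGroupValidationError(
            service_id,
            f"{objectstore} is not supported. "
            f"Must be one of {SUPPORTED_OBJECTSTORES}")

validate_objectstore('testing_drivegroup', 'bluestore')  # passes
```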
I'll have to check whether anything else is somehow using this and could make use of it. Maybe something for a follow-up PR, yeah.
fwiw, filestore is going to be deprecated and removed soon
Force-pushed from be8f4bc to f4e9630 (Compare)
```yaml
- path: /dev/sdb
  crush_device_class: hdd
- path: /dev/sdc
  crush_device_class: ssd
```
Note that `test_input6` != `test_input5`, as here we have multiple device classes.
Force-pushed from f4e9630 to d774803 (Compare)
This patch introduces a per-OSD `crush_device_class` definition in the DriveGroup spec. The `Device` object is extended to support a `crush_device_class` parameter, which is processed by `ceph-volume` when drives are prepared in batch mode. According to the per-OSD defined crush device classes, drives are collected and grouped in a dict that is used to produce a set of `ceph-volume` commands that eventually apply (if defined) the right device class.

The `test_drive_group` unit tests are also extended to make sure we're not breaking compatibility with the default definition, and the new syntax is validated, raising an exception if it's violated.

Fixes: https://tracker.ceph.com/issues/58184
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
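The grouping step described in the commit message can be sketched roughly like this (hypothetical helpers, not the actual cephadm implementation): devices are bucketed by their `crush_device_class`, and each bucket maps to one `ceph-volume lvm batch` invocation, with the `--crush-device-class` flag added only when a class was defined.

```python
from collections import defaultdict

def group_by_crush_class(paths):
    """Bucket device paths by their optional crush_device_class."""
    groups = defaultdict(list)
    for entry in paths:
        groups[entry.get("crush_device_class")].append(entry["path"])
    return dict(groups)

def batch_commands(groups):
    # One ceph-volume batch invocation per class; None means the
    # device keeps the default class, so the flag is omitted.
    cmds = []
    for dev_class, devices in groups.items():
        cmd = ["ceph-volume", "lvm", "batch"] + devices
        if dev_class is not None:
            cmd += ["--crush-device-class", dev_class]
        cmds.append(cmd)
    return cmds

paths = [
    {"path": "/dev/sdb", "crush_device_class": "hdd"},
    {"path": "/dev/sdc", "crush_device_class": "ssd"},
    {"path": "/dev/sdd", "crush_device_class": "ssd"},
]
groups = group_by_crush_class(paths)
```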
Force-pushed from d774803 to 6c6cb2f (Compare)
LGTM
Lots of failures (13), but all accounted for. Overall, PRs in the run should be okay to merge other than the mon crush location one causing failures. Will start to try and clean up the test suite now that we're able to make builds and run tests again.

Lots of failures from infra stuff (re-imaging machines, installing things pre-test), so did a rerun of all failed and dead jobs, leaving us with 6 failures. Overall, nothing that would block merging for anything other than the mon crush location PR.