
New workload classes #333

Merged
harriscr merged 10 commits into ceph:main from harriscr:ch_wip_workload_refactor
Jul 8, 2025

Conversation

@harriscr
Contributor

@harriscr harriscr commented May 19, 2025

This is the first part of the work to allow the workloads feature added in PR 306 to be used with any benchmark type.

To achieve this, a number of new classes have been added to CBT. UML class diagrams for these have been generated using pyreverse and are given in each section below.

To integrate Workloads into any particular benchmark type, a Workloads object would need to be added to the class variables and instantiated by passing in the configuration object used to create the benchmark.
A new command class for the benchmark type would also have to be created so that the workload knows how to convert the yaml options into a CLI invocation that can be used to run the I/O exerciser in question.

The easiest way would be to add a self._workloads to the Benchmark base class and instantiate it there with the config object and archive directory.

self._workloads: Workloads = Workloads(config, self.run_dir)

Then the specific benchmark can run the workload using:

if self._workloads.exist():
    self._workloads.set_benchmark_type(<string_representation_of_benchmark_type>)
    self._workloads.set_executable(<full_path_to_executable_for_benchmark>)
    self._workloads.run()

Workload Classes

Class diagram

workload_classes

Workloads

This class is a container for the actual workload classes themselves. It is instantiated with the workloads section of the benchmark yaml file, and is used to run the individual workloads.

Workload

The Workload class contains the details for each individual workload. It is designed to be created and called via the Workloads class, so should never be called directly.
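A minimal sketch of how the two classes could divide responsibility (method names follow the usage snippet above; the internals here are my own illustration, not the actual CBT code):

```python
# Illustrative sketch of the Workloads/Workload split, not the real CBT
# implementation. Method names mirror the usage snippet in the PR description.
from typing import Any


class Workload:
    """Details for one individual workload; created only via Workloads."""

    def __init__(self, name: str, options: dict[str, Any]) -> None:
        self._name = name
        self._options = options

    def run(self, benchmark_type: str, executable: str) -> str:
        # The real class would build and execute a CLI command here.
        return f"{executable} --workload {self._name} ({benchmark_type})"


class Workloads:
    """Container built from the 'workloads' section of the benchmark yaml."""

    def __init__(self, config: dict[str, Any], archive_dir: str) -> None:
        self._archive_dir = archive_dir
        self._benchmark_type = ""
        self._executable = ""
        self._workloads = [
            Workload(name, options)
            for name, options in config.get("workloads", {}).items()
        ]

    def exist(self) -> bool:
        return bool(self._workloads)

    def set_benchmark_type(self, benchmark_type: str) -> None:
        self._benchmark_type = benchmark_type

    def set_executable(self, executable: str) -> None:
        self._executable = executable

    def run(self) -> list[str]:
        # Delegate to each contained Workload in turn.
        return [w.run(self._benchmark_type, self._executable) for w in self._workloads]
```

With this shape, a benchmark only ever touches the Workloads container, matching the intent that Workload is never called directly.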

Command classes

These classes encapsulate the CLI command that will eventually be run on the system to run the I/O exerciser. The eventual aim is to add a command class for each individual I/O exerciser.

Class diagram

command_classes

Command

An abstract base class for any command type.

FioCommand

The concrete class for an fio command. It parses all the options passed via a CBT yaml file for a single run of the fio I/O exerciser. This may need to be split further into rbd and non-rbd versions in the future, but for this initial code a single class is sufficient.

CliOptions

A class based on the standard Python dictionary that holds key/value pairs equating to an option passed on the CLI and the corresponding value for that option. Unlike a regular dictionary it does not allow values to be updated, and it returns None instead of raising a KeyError when the value for an unknown key is requested.
Note to reviewers: I'm not sure if this class is needed or not, but it seems to be a neat way to cope with sets of options and corresponding values for a CLI invocation.
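As a sketch of how such a container could behave (illustrative only; the real class may differ), a dict subclass gives both properties in a few lines:

```python
# Hypothetical sketch of the CliOptions behaviour described above:
# write-once values, and None (not KeyError) for unknown keys.
from typing import Optional


class CliOptions(dict):
    """Key/value pairs for a CLI invocation."""

    def __setitem__(self, key: str, value: str) -> None:
        # Silently ignore attempts to overwrite an existing option.
        if key not in self:
            super().__setitem__(key, value)

    def __missing__(self, key: str) -> Optional[str]:
        # dict calls __missing__ on lookup failure; return None
        # rather than letting a KeyError propagate.
        return None


options = CliOptions()
options["iodepth"] = "16"
options["iodepth"] = "32"  # ignored: values cannot be updated
print(options["iodepth"])  # 16
print(options["unknown"])  # None
```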

Testing

============================= slowest 5 durations ==============================
0.02s setup    tests/test_bm_cephtestrados.py::TestBenchmarkcephtestrados::test_valid_archive_dir
0.01s setup    tests/test_bm_rbdfio.py::TestBenchmarkrbdfio::test_valid_archive_dir
0.01s setup    tests/test_bm_kvmrbdfio.py::TestBenchmarkkvmrbdfio::test_valid_archive_dir
0.01s setup    tests/test_bm_librbdfio.py::TestBenchmarklibrbdfio::test_valid_archive_dir
0.01s setup    tests/test_bm_rawfio.py::TestBenchmarkrawfio::test_valid_archive_dir
======================== 326 passed, 3 skipped in 0.44s ========================
Finished running tests!

Black and ruff show no errors.

I'll update teuthology logs once they have run

Chris Harris and others added 7 commits April 24, 2025 11:59
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
@harriscr harriscr self-assigned this May 19, 2025
@harriscr harriscr marked this pull request as draft May 19, 2025 14:42

def __init__(self, options: dict[str, str], workload_output_directory: str) -> None:
self._volume_number: int = int(options["volume_number"])
self._total_iodepth: Optional[str] = options.get("total_iodepth", None)
Contributor

It would probably be more generic to encapsulate any fio options as a class. Unit tests could then be generated for a set of valid options, which definitely protects the code for the future.

from command.command import Command
from command.fio_command import FioCommand

WORKLOAD_TYPE = dict[str, Union[str, list[str]]]  # pylint: disable=invalid-name
Contributor

We probably need WORKLOAD_TYPE to be a class itself, because it encapsulates useful info, like the number of OSDs in the cluster, whether it's Classic or Crimson, whether we need a number of reactors, alien threads, etc.
Notice also that this can be part of the info you need for post-processing.

Contributor Author

The WORKLOAD_TYPE here is a type hint to MyPy for checking the code. As we collect more of these I think we should look into creating a cbt_types.py file to contain them all.

I agree that storing things like the number of OSDs etc. that you mentioned would be useful. To me that sort of information is a property of the cluster under test, not a property of an I/O workload, though; it would belong to the Cluster object.

I am wondering if eventually we want to have something like a results object that stores the configuration and options/results for a single benchmark run. I know we output the cluster configuration at the start of a run (in the benchmark .yaml file),
but that leaves a step of matching the config to the benchmark run. For a 1:1 mapping this isn't too difficult, but if CBT ran with multiple cluster definitions then it makes things harder.
There is also already a Result object in the code (see benchmark.py), but that currently doesn't store any information about the configuration.

Contributor

The Result object class already exists, albeit it looks like an initial try and needs TLC; have a look at benchmark.py:

class Result:
    def __init__(self, run, alias, result, baseline, stmt, accepted):
        self.run = run
        self.alias = alias
        self.result = result
        self.baseline = baseline
        self.stmt = stmt
        self.accepted = accepted

    def __str__(self):
        fmt = '{run}: {alias}: {stmt}:: {result}/{baseline}  => {status}'
        return fmt.format(run=self.run, alias=self.alias, stmt=self.stmt,
                          result=self.result, baseline=self.baseline,
                          status="accepted" if self.accepted else "rejected")

I think it would be very useful to have a relation between a 'run' and its corresponding (Cluster) configuration.

(Sorry, I jumped the gun; I've just read the rest of the post and it says the same thing.)
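That run-to-configuration relation could be sketched by extending the existing Result fields with a config attribute (the extra field is an assumption for illustration, not something in benchmark.py today):

```python
# Hedged sketch: the Result class from benchmark.py with an assumed
# "config" field linking a run to its cluster configuration.
class Result:
    def __init__(self, run, alias, result, baseline, stmt, accepted, config=None):
        self.run = run
        self.alias = alias
        self.result = result
        self.baseline = baseline
        self.stmt = stmt
        self.accepted = accepted
        # Hypothetical addition: the cluster configuration for this run.
        self.config = config or {}

    def __str__(self):
        status = "accepted" if self.accepted else "rejected"
        return (f"{self.run}: {self.alias}: {self.stmt}:: "
                f"{self.result}/{self.baseline}  => {status}")
```

A post-processing step could then read the configuration straight off each Result instead of matching runs to yaml files by hand.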

Contributor

@perezjosibm perezjosibm May 22, 2025

For a 1:1 mapping this isn't too difficult, but if CBT ran with multiple cluster definitions then it makes things harder.

I see your point; I noticed a different convention for EC test runs.

For example, for Crimson I normally need a range over the number of OSDs (implicitly backend drives), number of reactors, number of alien threads, etc., so I diverge completely from the current test plan schema that CBT expects. That's why I capture succinct details of the configuration parameters in the test run name itself, so each test run ends up with its own .json linking the configuration details.
I might not lobby for such a convention to be supported in CBT (since recreating clusters etc. might conflict with the expected behaviour wrt teuthology, which I think runs CBT with a flag indicating to use the existing cluster, iirc). I might keep a prototype in my own checkout and test it in anger.

self._all_options: WORKLOAD_TYPE = options.copy()
self._executable_path: str
self._script: str = f"{options.get('pre_workload_script', '')}"

Contributor

As a minimum, a workload can be specified (regardless of benchmark) by the following parameters:

  • IO type,
  • Block size,
  • IO depth,
  • IO size (normally the full target),
  • target (device or volume).

Of course the free dict _all_options supports that, but having this in place already would be very useful for consistency across the code base, including post-processing. This can be serialised into a .json object and loaded when post-processing.
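That minimal parameter set could be captured as a serialisable dataclass along these lines (an illustrative sketch; the class and field names are my assumptions, not part of this PR):

```python
# Hypothetical serialisable description of one workload, covering the
# minimal parameters listed above. Field names are illustrative.
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WorkloadSpec:
    io_type: str      # e.g. "randwrite"
    block_size: str   # e.g. "4k"
    io_depth: int
    target: str       # device or volume
    io_size: Optional[str] = None  # None means the full target

    def to_json(self) -> str:
        # Serialise for later post-processing alongside the results.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, data: str) -> "WorkloadSpec":
        return cls(**json.loads(data))
```

The round trip through JSON makes it straightforward to dump the spec next to each run's output and reload it at post-processing time.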

Contributor

Yes, I mean as instance object attributes in the Workload class (self._x variables) 👍

Contributor

@perezjosibm perezjosibm left a comment

Leaving review comments as per the requested review, even though the state of the PR is still Draft.

elif isinstance(value, list):
global_options[option_name] = value
else:
global_options[option_name] = f"{value}"
Contributor

If you convert single/scalar values to strings, why is it not required for lists of items?

Contributor Author

@harriscr harriscr May 20, 2025

We do not want to convert the entire list to a string as that doesn't quite do what's expected,
e.g. (using Python 3.9):

>>> l: list[int] = [1, 2, 3, 6, 89]
>>> print(f"{l}")
[1, 2, 3, 6, 89]
>>> b = f"{l}"
>>> print(b[:-3])
[1, 2, 3, 6,

So the whole list, including the brackets, would be converted to a single string, which would then have to be parsed at a later date and the brackets stripped off.
If we leave the list as a list, then we can iterate over it in later code without having to first strip the brackets and then split the string into its component parts.
There is an argument to say we should do the same with dictionaries, but I haven't come across one yet - maybe when I'm doing more testing something will show up.

We could iterate over the contents of the list and also convert those to a string, but with the random nested structure present in the test plan yaml files we will never be able to convert everything correctly.
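A hedged sketch of the scalar-vs-list handling under discussion (my own illustration, not code from this PR; converting list elements to strings is one possible choice, not necessarily what the PR does):

```python
# Illustrative helper: scalar yaml values become strings, lists keep
# their list structure (here with each element stringified) so later
# code can iterate them instead of re-parsing a stringified "[1, 2, 3]".
from typing import Union


def normalise_option(value: Union[str, int, float, list]) -> Union[str, list[str]]:
    if isinstance(value, list):
        # Preserve the list; stringify elements individually.
        return [f"{item}" for item in value]
    return f"{value}"
```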

Contributor

I think the point you want to make is this:

  • options is a list: keep it as a list -- even if it's a singleton, e.g. a list with a single element,
  • options is a scalar: convert it to a string.

For consistency, to avoid the ambiguous situation of a single-element list, I'd prefer these to always be lists, and to convert them all as appropriate at the same time (rather than at different times), which makes the code clearer to maintain.

but with the random nested structure present in the test plan yaml files we will never be able to convert everything correctly.

Not sure I understand; iirc there were lots of discussions last year (some of which might still be around in the slack channel ceph-uk-cbt) wrt changing the format of the cbt .yaml, etc. I guess that did not progress in the end 😞

Chris Harris added 2 commits May 29, 2025 15:47
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
@harriscr
Contributor Author

harriscr commented Jul 1, 2025

Teuthology runs:

perf-basic
https://pulpito.ceph.com/harriscr-2025-06-24_10:54:23-perf-basic-main-distro-default-smithi/

rados/perf
https://pulpito.ceph.com/harriscr-2025-06-24_12:31:18-rados:perf-main-distro-default-smithi/
2 failed with "Command failed on smithi053 with status 100: 'sudo apt-get -y --force-yes install python3-pip librbd-dev collectl linux-tools-generic'"

@harriscr harriscr marked this pull request as ready for review July 1, 2025 15:39
Member

@lee-j-sanders lee-j-sanders left a comment

LGTM

@harriscr harriscr requested a review from perezjosibm July 8, 2025 08:08
@harriscr harriscr merged commit 89f7cea into ceph:main Jul 8, 2025
@harriscr harriscr deleted the ch_wip_workload_refactor branch August 28, 2025 09:14