Conversation
Signed-off-by: Chris Harris <harriscr@uk.ibm.com>
def __init__(self, options: dict[str, str], workload_output_directory: str) -> None:
    self._volume_number: int = int(options["volume_number"])
    self._total_iodepth: Optional[str] = options.get("total_iodepth", None)
It would probably be more generic to encapsulate any FIO options in a class. Unit tests could then be generated against a set of valid options. That definitely protects the code for the future.
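A minimal sketch of what such an encapsulation could look like; the `FioOptions` name, its fields, and the validation rule are illustrative assumptions, not existing CBT code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FioOptions:
    """Hypothetical container for a validated subset of FIO options."""

    volume_number: int
    total_iodepth: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject obviously invalid values at construction time, so unit
        # tests can assert on the full set of accepted inputs.
        if self.volume_number < 1:
            raise ValueError(f"volume_number must be >= 1, got {self.volume_number}")


# A unit test can then enumerate valid and invalid option sets:
assert FioOptions(volume_number=2).total_iodepth is None
try:
    FioOptions(volume_number=0)
except ValueError:
    pass  # invalid options are rejected up front
```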
from command.command import Command
from command.fio_command import FioCommand
WORKLOAD_TYPE = dict[str, Union[str, list[str]]]  # pylint: disable=["invalid-name"]
We probably need WORKLOAD_TYPE to be a class itself, because it encapsulates useful info, like the number of OSDs in the cluster, whether it's Classic or Crimson, whether we need a number of Reactors, Alien threads, etc.
Notice also that this can be part of the info you need for post-processing.
The WORKLOAD_TYPE here is a type hint to MyPy for checking the code. As we collect more of these I think we should look into creating a cbt_types.py file to contain them all.
I agree that storing things like the number of OSDs etc. that you mentioned would be useful. To me, though, that sort of information is a property of the cluster under test, not a property of an I/O workload. It would belong to the Cluster object.
I am wondering if eventually we want to have something like a results object that stores the configuration and options/results for a single benchmark run. I know we output the cluster configuration at the start of a run (in the benchmark .yaml file),
but that leaves a step of matching the config to the benchmark run. For a 1:1 mapping this isn't too difficult, but if CBT ran with multiple cluster definitions then it makes things harder.
There is also already a Result object in the code (see benchmark.py), but that currently doesn't store any information about the configuration.
The Result object class already exists, albeit it looks like an initial attempt and needs TLC; have a look at benchmark.py:
class Result:
    def __init__(self, run, alias, result, baseline, stmt, accepted):
        self.run = run
        self.alias = alias
        self.result = result
        self.baseline = baseline
        self.stmt = stmt
        self.accepted = accepted

    def __str__(self):
        fmt = '{run}: {alias}: {stmt}:: {result}/{baseline} => {status}'
        return fmt.format(run=self.run, alias=self.alias, stmt=self.stmt,
                          result=self.result, baseline=self.baseline,
                          status="accepted" if self.accepted else "rejected")

I think it would be very useful to have a relation between a 'run' and its corresponding (Cluster) configuration.
(Sorry, I jumped the gun; I have just read the rest of the post and you are saying the same thing.)
For a 1:1 mapping this isn't too difficult, but if CBT ran with multiple cluster definitions then it makes things harder.
I see your point; I noticed a different convention for EC test runs.
For example, for Crimson I normally need to range over the number of OSDs (implicitly backend drives), the number of reactors, the number of Alien threads, etc., so I diverge completely from the current test plan schema that CBT expects. That's why I capture succinct details of the configuration parameters in the test run name itself, so each test run ends up with its own .json linking to the configuration details.
I might not lobby for such a convention to be supported in CBT (since recreating clusters etc. might conflict with the expected behaviour wrt teuthology, which I think runs CBT with a flag indicating to use the existing cluster, iirc). I might keep a prototype in my own checkout and test it in anger.
self._all_options: WORKLOAD_TYPE = options.copy()
self._executable_path: str
self._script: str = f"{options.get('pre_workload_script', '')}"
As a minimum, a workload can be specified (regardless of benchmark) by the following parameters:
- IO type,
- Block size,
- IO depth,
- (IO size normally the full target),
- target (device or volume)
Of course the free-form dict _all_options supports that, but having this in place already would be very useful for consistency across the code base, including post-processing. This can be serialised into a .json object and loaded when post-processing.
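A small sketch of what that serialisation round trip could look like; the field names and values are illustrative, not CBT's actual schema:

```python
import json

# Hypothetical minimal description of a workload, using the parameters
# listed above; field names are illustrative only.
workload = {
    "io_type": "randwrite",
    "block_size": "4k",
    "io_depth": 16,
    "io_size": "100%",   # normally the full target
    "target": "/dev/rbd0",
}

# Serialise alongside the results so post-processing can reload it later.
serialised = json.dumps(workload, indent=2)
reloaded = json.loads(serialised)
assert reloaded == workload
```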
Yes, I mean as instance object attributes in the Workload class (self._x variables) 👍
perezjosibm left a comment
Leaving review comments as per the requested review, even though the state of the PR is still Draft.
elif isinstance(value, list):
    global_options[option_name] = value
else:
    global_options[option_name] = f"{value}"
If you convert single/scalar values to strings, why is that not required for lists of items?
We do not want to convert the entire list to a string as that doesn't quite do what's expected.
e.g. (using Python 3.9):
>>> l: list[int] = [1, 2, 3, 6, 89]
>>> print(f"{l}")
[1, 2, 3, 6, 89]
>>> b = f"{l}"
>>> print(b[:-3])
[1, 2, 3, 6,
So the whole list, including the brackets would be converted to a single string, which would then have to be parsed at a later date and the brackets stripped off.
If we leave this list as a list, then we can iterate over it in later code without having to first strip the brackets and then split the string into its component parts.
There is an argument to say we should do the same with dictionaries, but I haven't come across one yet - maybe when I'm doing more testing something will show up.
We could iterate over the contents of the list and also convert those to a string, but with the random nested structure present in the test plan yaml files we will never be able to convert everything correctly.
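The conversion rule being discussed can be sketched as a small helper; the function name is an assumption for illustration, not the actual CBT code:

```python
from typing import Union


def normalise_option(value: Union[str, int, float, list]) -> Union[str, list]:
    """Keep lists as lists so later code can iterate them directly;
    stringify scalars so they can be emitted onto a CLI as-is."""
    if isinstance(value, list):
        return value
    return f"{value}"


assert normalise_option(16) == "16"
assert normalise_option([1, 2, 3]) == [1, 2, 3]
# Stringifying the list instead would embed the brackets in the value:
assert f"{[1, 2, 3]}" == "[1, 2, 3]"
```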
I think the point you want to make is this:
- if the option is a list, keep it as a list -- even if it is a singleton, e.g. a list with a single element;
- if the option is a scalar, convert it to a string.
For consistency, and to avoid the ambiguity of a single-element list, I'd prefer everything always to be a list, converting them all as appropriate at the same time (rather than at different times), which makes the code clearer to maintain.
but with the random nested structure present in the test plan yaml files we will never be able to convert everything correctly.
Not sure I understand; iirc there were lots of discussions last year (some of which might still be around in the slack channel ceph-uk-cbt) wrt changing the format of the cbt .yaml, etc. I guess that did not progress in the end 😞
Teuthology runs: perf-basic rados/perf
This is the first part of the work to allow the workloads feature added in PR 306 to be used with any benchmark type.
To achieve this a number of new classes have been added to CBT. UML class diagrams for these have been generated using pyreverse, and are given in each section below.
To integrate Workloads into any particular benchmark type a Workloads object would need to be added to the class variables, and instantiated by passing the configuration object used to create the benchmark.
A new command class for the benchmark type would also have to be created so that the workload can understand how to convert the yaml options to a CLI invocation that can be used to run the I/O exerciser in question.
The easiest way would be to add a self._workloads to the Benchmark base class and instantiate it there with the config object and archive directory.
Then the specific benchmark can run the workload using:
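A minimal sketch of how a benchmark could delegate to that container; the class bodies, the `run_workloads` name, and the stub behaviour are all assumptions for illustration, not CBT's actual API:

```python
class Workloads:
    """Stub standing in for CBT's Workloads container (names assumed)."""

    def __init__(self, config: dict, archive_dir: str) -> None:
        self._config = config
        self._archive_dir = archive_dir

    def run(self) -> list:
        # Real code would execute each configured workload; this stub
        # just reports which workloads would run.
        return sorted(self._config)


class Benchmark:
    """Sketch of a base class holding a self._workloads attribute."""

    def __init__(self, config: dict, archive_dir: str) -> None:
        # Instantiated from the config's "workloads" section, as
        # described above (key name assumed).
        self._workloads = Workloads(config.get("workloads", {}), archive_dir)

    def run_workloads(self) -> list:
        return self._workloads.run()


bench = Benchmark({"workloads": {"precondition": {}, "randwrite": {}}}, "/tmp/archive")
assert bench.run_workloads() == ["precondition", "randwrite"]
```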
Workload Classes
Class diagram
Workloads
This class is a container for the actual workload classes themselves. It is instantiated with the workloads section of the benchmark yaml file, and is used to run the individual workloads.
Workload
The Workload class contains the details for each individual workload. It is designed to be created and called via the Workloads class, so should never be called directly.
Command classes
These classes try to encapsulate the CLI command that will eventually be run on the system to run the I/O exerciser. Eventually the aim is to add a command class for each individual I/O exerciser.
Class diagram
Command
An abstract base class for any command type.
FioCommand
The concrete class for an fio command. Parses all the options passed via a CBT yaml file for a single run of the fio I/O exerciser. This may need to be split further into rbd and non-rbd versions in the future, but for this initial code a single class is sufficient.
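A hedged sketch of the yaml-options-to-CLI conversion such a class performs; the function name and flag formatting are illustrative only (fio does use `--key=value` style flags), not the actual FioCommand implementation:

```python
def options_to_cli(executable: str, options: dict) -> list:
    """Turn a parsed options mapping into an argv-style CLI invocation."""
    args = [executable]
    for name, value in options.items():
        if value is None or value == "":
            args.append(f"--{name}")        # boolean-style flag
        else:
            args.append(f"--{name}={value}")  # fio's --key=value style
    return args


cmd = options_to_cli("fio", {"rw": "randwrite", "bs": "4k", "iodepth": "16"})
assert cmd == ["fio", "--rw=randwrite", "--bs=4k", "--iodepth=16"]
```

Keeping the result as an argv list (rather than one joined string) avoids shell-quoting problems when the command is eventually executed.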
CliOptions
A class based on the standard python dictionary that holds key/value pairs equating to an option used on the CLI and the corresponding value for that option. Unlike a regular dictionary it does not allow values to be updated, and returns None instead of raising a KeyError when a value for an unknown key is requested.
Note to reviewers: I'm not sure if this class is needed or not, but it seems to be a neat way to cope with sets of options and corresponding values for a CLI invocation
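A minimal sketch of the described behaviour; the class internals are assumptions, implementing only the two properties stated above (write-once keys, None for unknown keys):

```python
class CliOptions(dict):
    """Sketch: a dict whose keys are write-once and whose lookups
    never raise KeyError (internals assumed, not the actual CBT code)."""

    def __setitem__(self, key, value) -> None:
        if key in self:
            raise TypeError(f"option {key!r} is already set and cannot be updated")
        super().__setitem__(key, value)

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ for absent keys in subclasses,
        # so unknown options read as None instead of raising KeyError.
        return None


opts = CliOptions()
opts["iodepth"] = "16"
assert opts["iodepth"] == "16"
assert opts["unknown_option"] is None
try:
    opts["iodepth"] = "32"
except TypeError:
    pass  # updates are rejected
```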
Testing
Black and ruff show no errors.
I'll update with the teuthology logs once they have run.