Extend Common Benchmark Fields #296

Closed
whitleykeith opened this issue Jul 14, 2021 · 4 comments
whitleykeith commented Jul 14, 2021

Common Benchmark Fields

With @learnitall's PR merged, we have a launch pad for creating a well-defined schema for our benchmarks. The biggest gap I've seen is around common fields within benchmark results.

This proposal is to create strictly defined fields shared across all benchmarks, regardless of benchmark/environment/config. This data should consist of:

  • benchmark-agnostic, user-defined fields (e.g. uuid/run_id)
  • benchmark-agnostic, generated fields (e.g. start_time, end_time, duration, status)
  • fields that are benchmark-specific in use but benchmark-agnostic in definition (e.g. iteration/sample)

This would look something like this:

import datetime
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class BenchmarkResult:
    name: str
    uuid: str
    start_time: datetime.datetime
    end_time: datetime.datetime
    duration: str
    iteration: Optional[int]
    metadata: Dict[str, Any]
    config: Dict[str, Any]
    data: Dict[str, Any]
    labels: Dict[str, Any]
    tag: str

These fields should take precedence over any field defined with the same key in metadata/config/data/labels. For instance, if a benchmark creates a field called start_time in its data dictionary, it should be overwritten by the top-level start_time field.
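
For illustration, a minimal sketch of that precedence rule. to_document is a hypothetical helper (not part of the proposal) that flattens a BenchmarkResult into a single document for indexing:

from typing import Any, Dict

def to_document(result: BenchmarkResult) -> Dict[str, Any]:
    # Hypothetical flattening helper, for illustration only
    doc: Dict[str, Any] = {}
    doc.update(result.data)      # benchmark-supplied fields first...
    doc.update(result.labels)
    doc.update(result.metadata)
    doc.update(result.config)
    doc.update(                  # ...strictly defined fields last, so they win on key conflicts
        name=result.name,
        uuid=result.uuid,
        start_time=result.start_time,
        end_time=result.end_time,
        duration=result.duration,
        tag=result.tag,
    )
    return doc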

Common logic for setting the fields

Subsequently, the Benchmark abstract class should enforce that these fields get set, while giving users the ability to override the logic if required.

Pseudo-code example:

import datetime
from abc import ABC

class Benchmark(ABC):
    def get_start_time(self) -> datetime.datetime:
        # common logic to create start_time; subclasses may override if required
        return datetime.datetime.utcnow()

    def get_end_time(self) -> datetime.datetime:
        # common logic to create end_time; subclasses may override if required
        return datetime.datetime.utcnow()

    # This may require some changes in how the Benchmark class encapsulates a run
    def run(self):
        start_time = self.get_start_time()
        # run logic
        end_time = self.get_end_time()

Ideally we can create a common run method that ensures generated fields like start_time and end_time get added, but that will require tweaking how the Benchmark class works at the moment.
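
A sketch of what that common run method could look like, folding in the getters from the pseudo-code above. collect is a hypothetical hook name for the benchmark-specific work, and the empty dicts are placeholders for real metadata/config/label plumbing:

import datetime
import uuid
from abc import ABC, abstractmethod
from typing import Any, Dict

class Benchmark(ABC):
    name = "base"

    def get_start_time(self) -> datetime.datetime:
        return datetime.datetime.utcnow()

    def get_end_time(self) -> datetime.datetime:
        return datetime.datetime.utcnow()

    @abstractmethod
    def collect(self) -> Dict[str, Any]:
        # benchmark-specific measurement logic lives in the subclass
        ...

    def run(self) -> BenchmarkResult:
        # generated fields are guaranteed to be populated for every benchmark
        start_time = self.get_start_time()
        data = self.collect()
        end_time = self.get_end_time()
        return BenchmarkResult(
            name=self.name,
            uuid=str(uuid.uuid4()),
            start_time=start_time,
            end_time=end_time,
            duration=str(end_time - start_time),
            iteration=None,   # placeholder; per-iteration plumbing omitted
            metadata={},      # placeholder for common fields
            config={},        # placeholder for benchmark configuration
            data=data,
            labels={},        # placeholder for user-passed labels
            tag="",
        )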

@learnitall (Collaborator) commented:

This would be fantastic! It would also allow us to create really nice, specific templates for Elasticsearch. One question I have, which @mfleader raised with me, is about the definitions of config, metadata, and labels. Right now we have the following:

  • config represents the configuration of the benchmark itself.
  • metadata holds fields common to all benchmarks, such as cluster_name and user.
  • labels are any key-value pairs that a user has passed in during snafu invocation.
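
For concreteness, a hypothetical example of those three dictionaries (names and values are made up):

config = {"workload": "sequential-read", "block_size": "4k"}  # configuration of the benchmark itself
metadata = {"cluster_name": "test-cluster", "user": "alice"}  # fields common to all benchmarks
labels = {"env": "ci", "team": "perf"}                        # key-value pairs passed at snafu invocation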

I think the metadata field is a bit confusing. We might want to look at the current fields we are placing there and decide whether or not they should be added to BenchmarkResult as top-level fields.

@whitleykeith (Author) commented:

Yeah, metadata is a bit weird. I'm fine if some of the fields I mentioned (start_time, end_time, run_id, etc.) live in that metadata class in the actual code, as long as they get flattened in the doc. cluster_name shouldn't be there, though, since it's environment-specific.
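
A hypothetical illustration of that flattening (values made up):

# In code, generated fields could live on a nested metadata object...
nested = {
    "metadata": {"start_time": "2021-07-14T00:00:00Z", "run_id": "abc"},
    "data": {"latency_ms": 42},
}
# ...but the indexed doc would promote the metadata keys to the top level:
flattened = {**nested["metadata"], "data": nested["data"]}
# {'start_time': '2021-07-14T00:00:00Z', 'run_id': 'abc', 'data': {'latency_ms': 42}}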

@mfleader commented:

For newly proposed benchmark wrappers, let's say I have a field in my BenchmarkResult.data with a name, like duration, that conflicts with a field name in BenchmarkResult. If it has a different definition than BenchmarkResult.duration (specifically, it is a sum of round-trip times), should I just give that data field a new name?

@mfleader commented:

How is start_time defined? Does it come from the beginning of the benchmarking tool or the beginning of the parent Python process?

@whitleykeith closed this as not planned on May 16, 2023.