Report usage using a richer configuration traversing protocol #3229
This PR introduces code to traverse the configuration object (in a similar manner to the RepresentationMixin style of logging the supplied configuration object) with the intention of giving each object a chance to report its own usage information.
This PR modifies the HighThroughputExecutor to use this API to report richer usage information: a specific usage query is to ask about use of the enable_mpi_mode parameter and this modification supports that.
Beware that this reports on configuration of components, and does not report any further usage unless those components are so augmented using the new API: for example, configurations by default will include three staging providers, even though I believe it is extremely rare that either the FTP or HTTP staging providers are actually used to stage data. (It is hopefully a straightforward change to add a UsageInformation implementation to report if those classes are actually used to stage data in any run).
To support UsageInformation instances which report on usage during a run, the component tree is traversed both for the start message and the end message, and may result in different information in each message.
An example start message looks like this: (pretty-formatted)
{'correlator': 'f7595d08-7b94-49bc-b3d7-1ea7532b2f51',
 'parsl_v': '1.3.0-dev',
 'python_v': '3.12.2',
 'platform.system': 'Linux',
 'start': 1710150467,
 'components': ['parsl.config.Config',
                {'c': 'parsl.executors.high_throughput.executor.HighThroughputExecutor', 'mpi': False},
                'parsl.providers.local.local.LocalProvider',
                'parsl.channels.local.local.LocalChannel',
                'parsl.launchers.launchers.SingleNodeLauncher',
                'parsl.data_provider.ftp.FTPInTaskStaging',
                'parsl.data_provider.http.HTTPInTaskStaging',
                'parsl.data_provider.file_noop.NoOpFileStaging',
                'parsl.monitoring.monitoring.MonitoringHub']}
…info of Config object
Looks good; a couple of minor comments/suggestions but no blockers.
for arg in argspec.args[1:]:  # skip first arg, self
    arg_value = getattr(obj, arg)
    d = get_parsl_usage(arg_value)
    me += d
Stylistic alternative (but by no means a blocker):
me.extend(get_parsl_usage(getattr(obj, arg)) for arg in argspec.args[1:])
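For readers without the full diff, here is a minimal, self-contained sketch of the kind of traversal the loop above belongs to. It is not the code in this PR: the method name get_usage_information, the handling of lists, the skipping of builtin values, and the toy Config, Executor and Provider classes are all assumptions made for illustration. Only get_parsl_usage, UsageInformation, the 'c' and 'mpi' keys, and the argspec loop come from the PR text.

```python
import inspect
from abc import ABC, abstractmethod
from itertools import chain


class UsageInformation(ABC):
    """Stand-in for the abstract class named in the PR (method name assumed)."""

    @abstractmethod
    def get_usage_information(self) -> dict:
        ...


def qualified_name(obj) -> str:
    cls = type(obj)
    return f"{cls.__module__}.{cls.__name__}"


def get_parsl_usage(obj) -> list:
    """Return a flat list describing obj and the components reachable from it."""
    # Assumption: containers are traversed element by element.
    if isinstance(obj, (list, tuple)):
        return list(chain.from_iterable(get_parsl_usage(item) for item in obj))
    # Assumption: builtin values (bool, str, None, ...) are not components.
    if type(obj).__module__ == "builtins":
        return []

    if isinstance(obj, UsageInformation):
        entry = {"c": qualified_name(obj)}
        entry.update(obj.get_usage_information())
        me = [entry]
    else:
        me = [qualified_name(obj)]

    argspec = inspect.getfullargspec(type(obj).__init__)
    for arg in argspec.args[1:]:  # skip first arg, self
        me += get_parsl_usage(getattr(obj, arg))
    return me


# Hypothetical component classes, loosely modelled on a Parsl configuration.
class Provider:
    def __init__(self):
        pass


class Executor(UsageInformation):
    def __init__(self, provider, enable_mpi_mode=False):
        self.provider = provider
        self.enable_mpi_mode = enable_mpi_mode

    def get_usage_information(self) -> dict:
        return {"mpi": self.enable_mpi_mode}


class Config:
    def __init__(self, executors):
        self.executors = executors


usage = get_parsl_usage(Config(executors=[Executor(Provider())]))

# list.extend with an iterable of lists appends each list as one element;
# chain.from_iterable keeps the result flat, matching what `me += d` does.
nested, flat = [], []
nested.extend(part for part in [["a"], ["b", "c"]])
flat.extend(chain.from_iterable([["a"], ["b", "c"]]))
```

Here usage holds three entries: the Config class name, a dictionary for the Executor carrying 'c' and 'mpi', and the Provider class name. If get_parsl_usage returns a list, as `me += d` suggests, a one-line form that preserves the flat shape would need chain.from_iterable (or an equivalent) around the generator.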
I converted this to draft status because others are working on usage tracking now (which @kylechard and @yadudoc are driving, I think?), so this can be merged if they want it; otherwise we can close it.
Merge changes from Ben related to usage tracking update Parsl#3229
This PR introduces code to traverse the configuration object (in a similar manner to the RepresentationMixin style of logging the supplied configuration object) with the intention of giving each object a chance to report its own usage information.
The protocol now reports configured objects either as a JSON string class name, or as a JSON object containing the class name and any additional information that class wishes to report for usage (via the UsageInformation abstract class).
This PR modifies the HighThroughputExecutor to use this API to report richer usage information: a specific usage tracking query is to ask about use of the enable_mpi_mode parameter, and so the HighThroughputExecutor will now report the boolean value of that parameter.
Beware that this reports on configuration, not use, of components: for example, configurations by default will include three staging providers, even though I believe it is extremely rare that either the FTP or HTTP staging providers are actually used to stage data. The UsageInformation API is intended to support reporting whether these staging providers actually stage anything, but this PR does not implement that in those staging provider components.
To support UsageInformation instances which report on usage during a run, the component tree is traversed both for the start message and the end message, and may result in different information in each message - for example, the DFK report occurs only in the end message.
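As a rough illustration of why the two traversals can differ, the sketch below uses a hypothetical staging component that counts the files it stages. The class CountingStaging, the 'staged' key, the helper usage_entry and the method name get_usage_information are invented for this example; the PR explicitly does not add such reporting to the real staging providers.

```python
class CountingStaging:
    """Hypothetical staging provider that counts the files it stages."""

    def __init__(self):
        self.staged = 0

    def stage_in(self, url: str) -> None:
        # A real provider would transfer the file; here we only count.
        self.staged += 1

    def get_usage_information(self) -> dict:
        return {"staged": self.staged}


def usage_entry(component) -> dict:
    """Build one report entry: class name under 'c' plus reported fields."""
    cls = type(component)
    entry = {"c": f"{cls.__module__}.{cls.__name__}"}
    entry.update(component.get_usage_information())
    return entry


staging = CountingStaging()
start_entry = usage_entry(staging)   # what a start message would carry
staging.stage_in("ftp://example.invalid/a.dat")
staging.stage_in("ftp://example.invalid/b.dat")
end_entry = usage_entry(staging)     # what an end message would carry
```

The start entry reports a count of 0 and the end entry a count of 2, because the component is queried again when the end message is built.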
An example start message looks like this: (pretty-formatted)
Changed Behaviour
Different information will be reported via usage tracking - anyone processing that usage data will need to adapt their code.
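One way a consumer might adapt is to normalize each entry of the components list to a dictionary before processing, since an entry can now be either a class-name string or an object keyed by 'c'. This is a sketch under that assumption only; the function name normalize_components is hypothetical, and the sample data is abbreviated from the example start message in this PR.

```python
def normalize_components(components):
    """Turn every entry into a dict with at least a 'c' (class name) key."""
    normalized = []
    for item in components:
        if isinstance(item, str):
            # Plain class name: wrap it so downstream code sees one shape.
            normalized.append({"c": item})
        elif isinstance(item, dict) and "c" in item:
            normalized.append(dict(item))
        else:
            raise ValueError(f"unrecognised component entry: {item!r}")
    return normalized


components = [
    "parsl.config.Config",
    {"c": "parsl.executors.high_throughput.executor.HighThroughputExecutor",
     "mpi": False},
    "parsl.providers.local.local.LocalProvider",
]

normalized = normalize_components(components)
class_names = [entry["c"] for entry in normalized]
# Entries that reported an 'mpi' field, whatever its value.
mpi_reports = [entry for entry in normalized if "mpi" in entry]
```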
Type of change