Report usage using a richer configuration traversing protocol #3229

benclifford · 2024-03-11T10:04:16Z

This PR introduces code to traverse the configuration object (in a similar manner to the RepresentationMixin style of logging the supplied configuration object) with the intention of giving each object a chance to report its own usage information.
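The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Component` is a hypothetical marker class standing in for RepresentationMixin, `collect_class_names` is a made-up helper name, and the sketch assumes every constructor argument is stored on the instance under the same attribute name.

```python
import inspect
from typing import Any, List


class Component:
    """Hypothetical marker base class (stand-in for RepresentationMixin)."""


def collect_class_names(obj: Any) -> List[str]:
    """Walk obj and the values of its __init__ parameters, depth first."""
    if isinstance(obj, (list, tuple)):
        names: List[str] = []
        for item in obj:
            names += collect_class_names(item)
        return names
    if not isinstance(obj, Component):
        return []  # plain values such as strings report nothing
    t = type(obj)
    names = [t.__module__ + "." + t.__name__]
    argspec = inspect.getfullargspec(t.__init__)
    for arg in argspec.args[1:]:  # skip first arg, self
        names += collect_class_names(getattr(obj, arg))
    return names


# Hypothetical components, each storing its constructor arguments as attributes.
class Launcher(Component):
    def __init__(self):
        pass


class Provider(Component):
    def __init__(self, launcher):
        self.launcher = launcher


class Executor(Component):
    def __init__(self, label, provider):
        self.label = label
        self.provider = provider


class Config(Component):
    def __init__(self, executors):
        self.executors = executors


names = collect_class_names(Config([Executor("htex", Provider(Launcher()))]))
short_names = [n.rsplit(".", 1)[-1] for n in names]
```

Here `short_names` lists the classes in the order they were reached, parent before child.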

The protocol now reports each configured object either as a JSON string containing the class name, or as a JSON object containing the class name and any additional information that class wishes to report for usage (via the UsageInformation abstract class).

This PR modifies the HighThroughputExecutor to use this API to report richer usage information: one specific usage tracking question is whether the enable_mpi_mode parameter is being used, so the HighThroughputExecutor will now report the boolean value of that parameter.
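As a rough sketch of the two reporting shapes, assuming the abstract class exposes a single method returning a dict: the method name `get_usage_information`, the `describe` helper, and both example classes are illustrative assumptions, while the `'c'` and `'mpi'` keys are taken from the example message in this description.

```python
import json
from abc import ABC, abstractmethod


class UsageInformation(ABC):
    """Stand-in for the abstract class named in this PR (method name assumed)."""

    @abstractmethod
    def get_usage_information(self) -> dict:
        ...


def describe(obj):
    """Return a bare class name, or a dict with 'c' plus any extra fields."""
    t = type(obj)
    name = t.__module__ + "." + t.__name__
    if isinstance(obj, UsageInformation):
        return {"c": name, **obj.get_usage_information()}
    return name


class PlainProvider:
    """Hypothetical component with nothing extra to report."""


class MpiExecutor(UsageInformation):
    """Hypothetical executor that reports its enable_mpi_mode setting."""

    def __init__(self, enable_mpi_mode: bool = False):
        self.enable_mpi_mode = enable_mpi_mode

    def get_usage_information(self) -> dict:
        return {"mpi": self.enable_mpi_mode}


components = [describe(PlainProvider()), describe(MpiExecutor())]
encoded = json.dumps(components)  # a string entry followed by an object entry
```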

Beware that this reports on configuration, not use, of components: for example, configurations by default will include three staging providers, even though I believe it is extremely rare that either the FTP or HTTP staging providers are actually used to stage data. The UsageInformation API is intended to support reporting whether these staging providers actually stage anything, but this PR does not implement that in those staging provider components.

To support UsageInformation instances which report on usage during a run, the component tree is traversed both for the start message and the end message, and may result in different information in each message - for example, the DFK report occurs only in the end message.
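A small sketch of why the two traversals can differ, under stated assumptions: `StagingCounter`, its `staged` field, the `build_message` helper and the `'end'` timestamp key are all hypothetical, chosen only to mirror the shape of the start message shown in this description.

```python
import json
import time
import uuid


class StagingCounter:
    """Hypothetical component that counts the files it actually staged."""

    def __init__(self):
        self.files_staged = 0

    def stage(self, path: str) -> None:
        self.files_staged += 1

    def get_usage_information(self) -> dict:
        return {"staged": self.files_staged}


def build_message(kind: str, correlator: str, components: list) -> str:
    """Re-read every component's usage information at the moment of sending."""
    report = [{"c": type(c).__name__, **c.get_usage_information()} for c in components]
    return json.dumps({"correlator": correlator, kind: int(time.time()), "components": report})


correlator = str(uuid.uuid4())
stager = StagingCounter()

start_message = build_message("start", correlator, [stager])
stager.stage("input.dat")  # usage that happens during the run
end_message = build_message("end", correlator, [stager])
```

Because the components are queried again for the end message, the count recorded there reflects what happened during the run, while the start message still shows zero.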

An example start message (pretty-formatted) looks like this:

{'correlator': 'f7595d08-7b94-49bc-b3d7-1ea7532b2f51',
 'parsl_v': '1.3.0-dev',
 'python_v': '3.12.2',
 'platform.system': 'Linux',
 'start': 1710150467,
 'components': ['parsl.config.Config',
                {'c': 'parsl.executors.high_throughput.executor.HighThroughputExecutor', 'mpi': False},
                'parsl.providers.local.local.LocalProvider',
                'parsl.channels.local.local.LocalChannel',
                'parsl.launchers.launchers.SingleNodeLauncher',
                'parsl.data_provider.ftp.FTPInTaskStaging',
                'parsl.data_provider.http.HTTPInTaskStaging',
                'parsl.data_provider.file_noop.NoOpFileStaging',
                'parsl.monitoring.monitoring.MonitoringHub']}

Changed Behaviour

Different information will be reported via usage tracking - anyone processing that usage data will need to adapt their code.
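For anyone adapting a consumer, the main change is that an entry in `components` may be either a string or an object. A minimal sketch of handling both, using a trimmed copy of the example message above; the `normalise` helper is a hypothetical name, not part of Parsl.

```python
from typing import Any, Dict, List, Union

# Trimmed copy of the example start message from this description.
message: Dict[str, Any] = {
    "correlator": "f7595d08-7b94-49bc-b3d7-1ea7532b2f51",
    "parsl_v": "1.3.0-dev",
    "components": [
        "parsl.config.Config",
        {"c": "parsl.executors.high_throughput.executor.HighThroughputExecutor", "mpi": False},
        "parsl.providers.local.local.LocalProvider",
    ],
}


def normalise(entry: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Turn a bare class-name string into the same dict shape as a rich entry."""
    if isinstance(entry, str):
        return {"c": entry}
    return dict(entry)


records: List[Dict[str, Any]] = [normalise(e) for e in message["components"]]
names = [r["c"] for r in records]
mpi_flags = {r["c"]: r["mpi"] for r in records if "mpi" in r}
```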

Type of change

New feature


khk-globus

Looks good; a couple of minor comments/suggestions but no blockers.

khk-globus · 2024-03-18T11:38:28Z

parsl/usage_tracking/api.py

+    for arg in argspec.args[1:]:  # skip first arg, self
+        arg_value = getattr(obj, arg)
+        d = get_parsl_usage(arg_value)
+        me += d


Stylistic alternative (but by no means a blocker):

me.extend(get_parsl_usage(getattr(obj, arg)) for arg in argspec.args[1:])
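One thing worth checking before adopting the one-liner, assuming `get_parsl_usage` returns a list (as the `me += d` in the diff suggests): `+=` splices each returned list into `me`, whereas `extend` over a generator of lists appends each list as a single nested element. The sketch below uses a hypothetical stand-in function, `usage_of`, to show the difference and one flattening variant.

```python
from itertools import chain


def usage_of(value):
    """Hypothetical stand-in for get_parsl_usage: always returns a list."""
    return [type(value).__name__]


values = [1, "a"]

# Loop form from the diff: each returned list is spliced into the result.
flat = ["Root"]
for v in values:
    flat += usage_of(v)

# One-liner form: each returned list is added as a single nested element.
nested = ["Root"]
nested.extend(usage_of(v) for v in values)

# Flattening variant that matches the loop form.
chained = ["Root"]
chained.extend(chain.from_iterable(usage_of(v) for v in values))
```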

benclifford · 2024-03-18T15:19:13Z

I converted this to draft status, because others are working on usage tracking now (which @kylechard and @yadudoc are driving I think?) - so this can be merged if they want it, otherwise we can close it.

benclifford added 3 commits March 11, 2024 09:11

move site count away from being a hard-coded parameter to being usage…

8beeba1

…info of Config object

abstract away from requiring a DFK to support non-DFK users of parsl

f21d88e

benclifford changed the title ~~Traverse configuration heirarchy to report more usage information~~ Report usage using a richer configuration traversing protocol Mar 11, 2024

benclifford marked this pull request as ready for review March 11, 2024 11:47

Merge branch 'master' into benc-usage-protocol

7253fc7

khk-globus approved these changes Mar 18, 2024

benclifford marked this pull request as draft March 18, 2024 15:18

NishchayKarle added a commit to NishchayKarle/parsl that referenced this pull request Apr 24, 2024

Merge branch 'benc-usage-protocol' into update-usage-tracking

39ac632

Merge changes from Ben related to usage tracking update Parsl#3229

NishchayKarle mentioned this pull request Apr 25, 2024

Allow users to select their preferred level of usage tracking #3400

Open

benclifford added 5 commits April 25, 2024 12:25

Merge branch 'master' into benc-usage-protocol

3f9559e

fix typo

a2fd7cd

Remove explicit utf-8 encoding selection in favour of default

831ffc7

import inspect module in normal place

b0f5b79

add return type annotations onto get_parsl_usage dispatched methods

9dbbca8

benclifford marked this pull request as ready for review April 25, 2024 10:53

benclifford merged commit 06f25fc into master Apr 25, 2024
6 checks passed

benclifford deleted the benc-usage-protocol branch April 25, 2024 11:18
