Route string/bool values to Pluto config, handle unmapped types by asaiacai · Pull Request #120 · Trainy-ai/pluto

asaiacai · 2026-06-05T23:52:52Z

Description

This PR improves how the wandb compatibility shim forwards logged values to Pluto by implementing proper value routing and handling for unmapped types.

Key Changes

Value Routing for Pluto:
- Numeric metrics: int/float and any scalar exposing .item() (numpy/torch/etc.) → Pluto metrics (time-series)
- Strings and bools: → Pluto config (latest-wins, queryable via get_run().config). This enables the /resume-crashed-run use case where checkpoint paths and other string metadata need to be readable from Pluto
- wandb media and lists: → converted Pluto media
- JSON-serializable values (dicts, None, nested structures): → Pluto config as fallback
- Unmapped types: Reported to Sentry telemetry once per key (maintainer-coverage signal, not user-facing)
Improved Scalar Detection:
- Replaced _is_torch_tensor_scalar() with _as_scalar_number() that mirrors Pluto core's own op._process_log_item_sync
- Now accepts any scalar exposing a callable .item() method (numpy scalars, 0-d arrays, custom tensor-like objects)
- Properly excludes bool and str from scalar handling
Unforwardable Value Handling:
- Added _handle_unforwardable() method to gracefully handle values with no metric/media/config mapping
- Attempts JSON serialization as last resort (preserves dicts, None, nested primitives)
- Reports non-serializable types to Sentry once per key for maintainer awareness
- No user-facing warnings (not a user-actionable problem)
Config Synchronization:
- Added pluto_config dict to collect string/bool/JSON-serializable values
- Calls pluto_run.update_config() after logging metrics
- Includes error handling for config sync failures

Testing

Added comprehensive test coverage in test_wandb_compat.py:

test_log_routes_strings_to_config_not_metrics: Verifies strings land in config, not metrics
test_log_forwards_numpy_scalars_as_metrics: Ensures numpy scalars reach Pluto
test_log_forwards_any_item_scalar_like_pluto_core: Guards against being stricter than Pluto core
test_log_does_not_treat_failing_item_as_scalar: Handles non-scalars with .item() gracefully
test_unforwardable_value_alerts_sentry_once_not_user: Verifies Sentry alerting and deduplication
test_json_serializable_unmapped_value_falls_back_to_config: Confirms dict/None fallback to config

Tested (run the relevant ones):

Code formatting: bash format.sh
New unit tests for value routing and type handling

https://claude.ai/code/session_01JzBDybRDXt1TLnsQAxhKqz

Note

Medium Risk
Changes dual-logging routing in a hot path (wandb.log); mis-routing could affect resume/metadata in Pluto while wandb stays correct, but failures are contained in try/except and covered by new unit tests.

Overview
The wandb→Pluto shim WandbRunWrapper.log now splits each wandb.log payload before forwarding: numbers (including any scalar with a working .item(), not only torch) go to pluto_run.log, while str, bool, and JSON-serializable leftovers go to pluto_run.update_config (latest-wins), so checkpoint paths and similar metadata are readable from Pluto config (e.g. resume flows).

Values that are not metrics, media, or JSON config are handled by new _handle_unforwardable: serializable dicts/None/primitive lists are kept as config; otherwise the Pluto copy is skipped, debug is logged once per key, and Sentry gets a one-time maintainer alert—wandb behavior is unchanged.

^{Reviewed by Cursor Bugbot for commit 3d52d9c. Configure here.}

WandbRunWrapper.log() pre-filtered logged values and only forwarded Python int/float, torch scalar tensors, and wandb media to Pluto. String values had no branch (_convert_wandb_to_pluto returns None for str), so they were silently dropped. Concretely, a checkpoint-metadata logger that does: wandb.log({'checkpoint/step': step, # int -> reached Pluto 'checkpoint/epoch': epoch, # int -> reached Pluto 'checkpoint/r2_path': r2_path, # str -> DROPPED 'checkpoint/local_path': path}, # str -> DROPPED step=step) lost the two string paths while the numeric step/epoch came through fine. Those string paths are exactly what a resume flow needs to know which object to stage. Now str/bool values route to update_config() (latest-wins, queryable via get_run().config), mirroring where wandb places loose strings (run summary/overview). A resume skill can read the most recent checkpoint/r2_path off the run config instead of scanning storage. Also forward numpy scalars (np.generic, excluding np.bool_) as metrics via .item(). This is defensive hardening, not part of the above bug: the shim was stricter than Pluto's own log(), which already accepts anything exposing .item(). Frameworks that log np.int64/np.float32 were dropped here even though Pluto core would have kept them.

Generalize the shim's numeric forwarding to accept any value exposing a callable .item() returning a number, replacing the narrower torch-tensor + numpy-generic checks. This mirrors Pluto's own log() (op._process_log_item_sync), which already forwards anything with .item(). Motivation: a checkpoint logger annotated as log_checkpoint(step: int, epoch: int, ...) can still pass a non-plain-int at runtime (np.int64, a 0-d tensor, a framework scalar). The old shim dropped those even though Pluto core would have kept them, so a logged 'epoch' could silently never reach Pluto. A failing .item() (multi-element array) is treated as not-a-scalar and ignored, same as Pluto would fail it. bool/str remain routed to config.

The shim only forwarded recognized types (numbers, wandb media, str/bool) and silently dropped everything else — dicts, None, raw/multi-element tensors, numpy arrays, unconvertible wandb media (Html/Object3D/...), custom objects. Silent drops are what made missing data (e.g. a whole checkpoint metadata call) so hard to diagnose: no error, no warning, just absent. Add a 'preserve-what-we-can, otherwise fail loud' fallback in _handle_unforwardable(): - JSON-serializable leftovers (nested dicts/lists of primitives, None, etc.) are preserved as Pluto config, mirroring how wandb keeps loose values in the run summary. - Anything else warns ONCE per key (WARNING, naming key + type) instead of vanishing. The value still reaches W&B; only the Pluto copy is dropped, and now visibly. Never raises — wandb behavior is unaffected.

The previous change warned users (WARNING log) when a value had no Pluto mapping. But that's a gap in OUR type coverage, not a user error — people migrating away from wandb shouldn't be nagged about types only the package maintainers can fix. Reroute the signal: genuinely unforwardable values (non-JSON-serializable, no metric/media mapping) now emit a Sentry telemetry alert via the SDK's isolated client (pluto/sentry.py), once per key, grouped by type name so we can see which unhandled types show up in the wild and add coverage. The local log drops to debug. JSON-serializable leftovers still fall back to config silently. Sentry honors PLUTO_DISABLE_TELEMETRY and swallows all errors, so this never affects the user or wandb.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 3d52d9c. Configure here.}

cursor · 2026-06-05T23:54:02Z

+                        logger.debug(
+                            f'pluto.compat.wandb: Failed to sync string/bool '
+                            f'values to Pluto config: {e}'
+                        )


Pluto log failure skips config

Medium Severity

update_config runs in the same outer try as pluto_run.log. If metric/media logging raises, execution jumps to the broad handler and string or bool config updates from the same wandb.log call are never sent to Pluto.

^{Reviewed by Cursor Bugbot for commit 3d52d9c. Configure here.}

cursor · 2026-06-05T23:54:02Z

+    try:
+        json.dumps(value)
+        return True
+    except (TypeError, ValueError):


Config gate rejects OmegaConf values

Medium Severity

_is_json_serializable uses plain json.dumps, while update_config normalizes via to_native_config (including OmegaConf). Logged DictConfig or nested OmegaConf nodes fail the JSON check and are treated as unforwardable instead of being stored as config.

Additional Locations (1)

pluto/compat/wandb.py#L289-L292

^{Reviewed by Cursor Bugbot for commit 3d52d9c. Configure here.}

gemini-code-assist

Code Review

This pull request enhances the W&B compatibility shim by implementing robust value routing to Pluto. Specifically, it routes string and boolean values to the run configuration, supports numpy scalars as metrics, and provides fallback handling for unforwardable values (either saving them as config if JSON-serializable or reporting them to Sentry). The reviewer's feedback identifies a critical performance bottleneck where logging string or boolean values at every step triggers redundant, synchronous configuration updates. The reviewer suggests caching the last logged configuration values to only sync actual changes, and provides a unit test to verify this optimization.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-05T23:54:28Z

+        # Keys we've already warned about being unforwardable to Pluto, so a
+        # value logged every step warns once rather than spamming the logs.
+        self._unforwardable_warned: set = set()


Performance Bottleneck: Redundant Config Updates

Logging string or boolean values (such as training phase, status, or checkpoint paths) at every step is a very common pattern in machine learning training loops. Currently, every call to log() containing a string or boolean will trigger a call to self._pluto_run.update_config(pluto_config).

If the sync process is disabled or not yet initialized, update_config performs a synchronous, blocking HTTP POST request to the server. Even when the sync process is enabled, it triggers a synchronous write to the local SQLite database. Doing this at every single step will severely degrade training performance due to network or disk I/O bottlenecks.

To prevent this, we should cache the last logged config values and only send updates when a value actually changes.

Suggested change

# Keys we've already warned about being unforwardable to Pluto, so a

# value logged every step warns once rather than spamming the logs.

self._unforwardable_warned: set = set()

# Keys we've already warned about being unforwardable to Pluto, so a

# value logged every step warns once rather than spamming the logs.

self._unforwardable_warned: set = set()

# Cache of the last logged config values to avoid redundant updates.

self._last_logged_config: Dict[str, Any] = {}

gemini-code-assist · 2026-06-05T23:54:28Z

                for key, value in data.items():
-                    if isinstance(value, (int, float)):
+                    if isinstance(value, bool):
+                        # bool is a subclass of int, but Pluto drops bool
+                        # metrics — surface it as config so it isn't lost.
+                        pluto_config[key] = value
+                    elif isinstance(value, (int, float)):
                        pluto_data[key] = value
-                    elif _is_torch_tensor_scalar(value):
-                        pluto_data[key] = value.item()
+                    elif (num := _as_scalar_number(value)) is not None:
+                        pluto_data[key] = num
+                    elif isinstance(value, str):
+                        pluto_config[key] = value


Only add string and boolean values to pluto_config if they have actually changed from their previously logged values.

Suggested change

for key, value in data.items():

if isinstance(value, (int, float)):

if isinstance(value, bool):

# bool is a subclass of int, but Pluto drops bool

# metrics — surface it as config so it isn't lost.

pluto_config[key] = value

elif isinstance(value, (int, float)):

pluto_data[key] = value

elif _is_torch_tensor_scalar(value):

pluto_data[key] = value.item()

elif (num := _as_scalar_number(value)) is not None:

pluto_data[key] = num

elif isinstance(value, str):

pluto_config[key] = value

for key, value in data.items():

if isinstance(value, bool):

# bool is a subclass of int, but Pluto drops bool

# metrics — surface it as config so it isn't lost.

if self._last_logged_config.get(key) != value:

pluto_config[key] = value

elif isinstance(value, (int, float)):

pluto_data[key] = value

elif (num := _as_scalar_number(value)) is not None:

pluto_data[key] = num

elif isinstance(value, str):

if self._last_logged_config.get(key) != value:

pluto_config[key] = value

gemini-code-assist · 2026-06-05T23:54:28Z

+                if pluto_config:
+                    try:
+                        self._pluto_run.update_config(pluto_config)
+                    except Exception as e:
+                        logger.debug(
+                            f'pluto.compat.wandb: Failed to sync string/bool '
+                            f'values to Pluto config: {e}'
+                        )


Update the local config cache self._last_logged_config once the config has been successfully synced.

Suggested change

if pluto_config:

try:

self._pluto_run.update_config(pluto_config)

except Exception as e:

logger.debug(

f'pluto.compat.wandb: Failed to sync string/bool '

f'values to Pluto config: {e}'

)

if pluto_config:

try:

self._pluto_run.update_config(pluto_config)

self._last_logged_config.update(pluto_config)

except Exception as e:

logger.debug(

f'pluto.compat.wandb: Failed to sync string/bool '

f'values to Pluto config: {e}'

)

gemini-code-assist · 2026-06-05T23:54:29Z

+        if _is_json_serializable(value):
+            pluto_config[key] = value
+            return


Only add fallback JSON-serializable values to pluto_config if they have changed from their previously logged values.

Suggested change

if _is_json_serializable(value):

pluto_config[key] = value

return

if _is_json_serializable(value):

if self._last_logged_config.get(key) != value:

pluto_config[key] = value

return

gemini-code-assist · 2026-06-05T23:54:29Z

+def test_json_serializable_unmapped_value_falls_back_to_config():
+    """A dict/None with no metric mapping is preserved as config, not dropped."""
+    wrapper, pluto_run = _make_wrapper()
+
+    wrapper.log({'meta/info': {'kind': 'resume', 'attempt': 3}, 'note': None})
+
+    cfg = pluto_run.update_config.call_args.args[0]
+    assert cfg['meta/info'] == {'kind': 'resume', 'attempt': 3}
+    assert cfg['note'] is None
+    assert not pluto_run.log.called  # no numeric metrics in this call


Add a unit test to verify that redundant config updates are skipped to avoid performance bottlenecks.

def test_json_serializable_unmapped_value_falls_back_to_config(): """A dict/None with no metric mapping is preserved as config, not dropped.""" wrapper, pluto_run = _make_wrapper() wrapper.log({'meta/info': {'kind': 'resume', 'attempt': 3}, 'note': None}) cfg = pluto_run.update_config.call_args.args[0] assert cfg['meta/info'] == {'kind': 'resume', 'attempt': 3} assert cfg['note'] is None assert not pluto_run.log.called # no numeric metrics in this call def test_log_skips_redundant_config_updates(): """Verify that redundant config updates are skipped to avoid performance bottlenecks.""" wrapper, pluto_run = _make_wrapper() # First log: config is updated wrapper.log({'phase': 'train', 'loss': 0.5}) assert pluto_run.update_config.call_count == 1 # Second log with same config value: update_config should NOT be called again pluto_run.update_config.reset_mock() wrapper.log({'phase': 'train', 'loss': 0.4}) assert pluto_run.update_config.call_count == 0 # Third log with changed config value: update_config should be called wrapper.log({'phase': 'val', 'loss': 0.3}) assert pluto_run.update_config.call_count == 1 assert pluto_run.update_config.call_args.args[0] == {'phase': 'val'}

Minimal end-to-end script mirroring linum's WandbLogger.log_checkpoint: numeric step/epoch (as np.int64) plus string r2_path/local_path logged in one wandb.log call, then read back from Pluto. Asserts step/epoch land as metrics and the paths land in config. Fails on pre-fix builds, passes on this branch. Uses wandb mode=disabled so no W&B account is needed.

Three fixes from PR #120 review (Cursor Bugbot + Gemini): 1. Config no longer skipped on metric-log failure. update_config() shared the outer try with pluto_run.log(); a metric/media logging exception jumped to the handler and dropped str/bool config from the same call. Metrics and config now send in independent try blocks. 2. OmegaConf values are storable as config. The fallback gate used plain json.dumps, but update_config normalizes via to_native_config (which deep-converts DictConfig/ListConfig). A logged OmegaConf node was wrongly treated as unforwardable (dropped + Sentry-alerted). _config_storable_value now mirrors update_config's normalization, so OmegaConf is kept while tensors/ndarrays/custom objects still fall through to Sentry. 3. Skip redundant config writes. Logging an unchanged str/bool/config value every step re-triggered update_config (a SQLite write) each step. Dedup against the last synced value via self._last_logged_config, updated only on a successful update_config. Uses a _MISSING sentinel so a first-time None is still sent (None != missing). Tests: dedup skip/change behavior, OmegaConf-node fallback.

claude added 4 commits June 5, 2026 00:30

asaiacai temporarily deployed to integration June 5, 2026 23:52 — with GitHub Actions Inactive

asaiacai had a problem deploying to integration June 5, 2026 23:52 — with GitHub Actions Failure

cursor Bot reviewed Jun 5, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 5, 2026

View reviewed changes

asaiacai had a problem deploying to integration June 6, 2026 00:01 — with GitHub Actions Failure

asaiacai temporarily deployed to integration June 6, 2026 00:01 — with GitHub Actions Inactive

asaiacai force-pushed the claude/confident-lamport-2yeUY branch from 8864c4c to 3d52d9c Compare June 6, 2026 00:09

asaiacai temporarily deployed to integration June 6, 2026 00:09 — with GitHub Actions Inactive

asaiacai had a problem deploying to integration June 6, 2026 00:09 — with GitHub Actions Failure

asaiacai temporarily deployed to integration June 6, 2026 00:14 — with GitHub Actions Inactive

asaiacai deployed to integration June 6, 2026 00:22 — with GitHub Actions Active

asaiacai temporarily deployed to integration June 6, 2026 00:22 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Route string/bool values to Pluto config, handle unmapped types#120

Route string/bool values to Pluto config, handle unmapped types#120
asaiacai wants to merge 5 commits into
mainfrom
claude/confident-lamport-2yeUY

asaiacai commented Jun 5, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 5, 2026

Uh oh!

cursor Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

asaiacai commented Jun 5, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Key Changes

Testing

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 5, 2026

Choose a reason for hiding this comment

Pluto log failure skips config

Uh oh!

cursor Bot Jun 5, 2026

Choose a reason for hiding this comment

Config gate rejects OmegaConf values

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Performance Bottleneck: Redundant Config Updates

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

asaiacai commented Jun 5, 2026 •

edited by cursor Bot

Loading