Skip to content

Fix processor options sharing stale column values across calls#582

Merged
dale-wahl merged 2 commits intomasterfrom
copilot/fix-processor-options-columns
Mar 10, 2026
Merged

Fix processor options sharing stale column values across calls#582
dale-wahl merged 2 commits intomasterfrom
copilot/fix-processor-options-columns

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 10, 2026

Processor get_options() could return stale or incorrect column options from a previous call when get_columns() returned [], due to shared mutable class state and incorrect dataset references.

Root Causes

  • Mutable shared state: BasicProcessor.get_options() returned a direct reference to cls.options, allowing callers to mutate the class-level dict, which persisted across calls.
  • Wrong dataset in work(): self.get_options(self.dataset, ...) passed the output dataset (no columns yet), so column-dependent options were never populated with actual column names. Should use self.source_dataset (the input).
  • Missing parent_dataset in get_available_processors(): get_options(config=config) omitted parent_dataset, preventing processors from correctly resolving column-dependent options when determining re-runnability.

Changes

  • backend/lib/processor.py

    • Base get_options() now returns copy.deepcopy(cls.options) instead of a direct reference, preventing mutation of shared class state.
    • work() now calls self.get_options(self.source_dataset, config=self.config) so the correct input dataset (with actual columns) is passed.
  • common/lib/dataset.py

    • get_available_processors() now calls get_options(parent_dataset=self, config=config), passing the parent dataset so processors can evaluate their options in context.
Original prompt

This section details on the original issue you should resolve

<issue_title>Processor options may have incorrect column options</issue_title>
<issue_description>Describe the bug
Discovered in a particular dataset whose map_item returned a {} on the first item. This causes a number of issues in itself, but seemed to first strike get_item_keys which would return an empty list and cause various failures. Some very odd behavoir was noted with the processor options on this bad dataset: they returned different column options than expected. It seems probable that something fails when calling get_columns (which in this case returns a []) and instead of raising an error, the options for another dataset or perhaps call of get_options is used instead. Likely something is not being updated properly due to the [].

Interestingly, I was able to modify get_item_keys to correctly return a list of keys (by iterating map_item until a {} was not returned) and updated the view_datasets preview function to use this method. When doing that, the processor options were properly loaded from that dataset. It is my opinion that they should not since get_columns was not updated and it directly collects keys from the first item in the dataset!</issue_description>

<agent_instructions>verify these comment were met: #363 (comment)

i.e. get_options is always used as opposed to accessing the BasicWorker classes option attribute. Also ensure that wherever possible all attributes are passed (they may not always be available). Note any other considerations.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@dale-wahl It's some dictionary sharing across classes.

When we call get_options, we usually set options = cls.options and then check if parent_dataset and parent_dataset.get_columns(): before updating options. That is going to be False if get_columns returns an empty []. And then get_options just returns the cls.options which is whatever it was set to last.

options = {} is actually in BasicProcessor, so it's possible that all of our subclasses can access the same options. We just happen to be setting them and updating them before we use them. Not ideal and possible difficult to fix without breaking anything else. Unsure.</comment_new>
<comment_new>@dale-wahl
Removing options = {} from BasicProcessor not enough. Somehow options are still sharing. Interestingly, I ran a rank values processor and the column options afterwards were "date", "item", and "value". As if that dataset was being used (is get_options being called on that dataset somewhere? Perhaps to find available sub processors...).</comment_new>
<comment_new>@dale-wahl
This was related: a674d0b

We should:

    1. remove all direct setting of options in processors and ensure get_options is always used (it ought to be at this point unless direct calls creeped back in)
    1. ensure all uses of get_options pass config and parent_dataset as those can create variations in the result</comment_new>

🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Co-authored-by: dale-wahl <32108944+dale-wahl@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix incorrect column options in processor Fix processor options sharing stale column values across calls Mar 10, 2026
@dale-wahl dale-wahl marked this pull request as ready for review March 10, 2026 13:49
@dale-wahl dale-wahl merged commit eb544e6 into master Mar 10, 2026
@dale-wahl dale-wahl deleted the copilot/fix-processor-options-columns branch March 10, 2026 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Processor options may have incorrect column options

2 participants