Skip to content

Bump flame-sdk version and update project dependencies#13

Merged
antidodo merged 62 commits intomainfrom
canary
Mar 17, 2026
Merged

Bump flame-sdk version and update project dependencies#13
antidodo merged 62 commits intomainfrom
canary

Conversation

@antidodo
Copy link
Member

@antidodo antidodo commented Mar 17, 2026

This pull request introduces improvements to error handling and logging in the StarModelTester and MockFlameCoreSDK, as well as updates dependencies and versioning in pyproject.toml. The most significant changes are the enhanced thread error reporting and iteration tracking, which improve debugging and reliability during test execution.

Error Handling and Reporting:

  • Added detailed exception handling and stack trace reporting for node threads in StarModelTester, including storing errors and printing them if all threads fail. [1] [2]
  • Integrated MockFlameCoreSDK.stop_event to coordinate thread stopping and error propagation between threads. [1] [2]

Logging and Iteration Tracking:

  • Introduced IterationTracker class for accurate iteration counting in MockFlameCoreSDK, replacing the previous integer-based approach.
  • Enhanced log output in MockFlameCoreSDK.__pop_logs__ to display iteration numbers and error messages when exceptions occur.

Dependency and Version Updates:

  • Updated project version to 0.6.1 and bumped flamesdk dependency to tag 0.4.2 i

Summary by CodeRabbit

  • Bug Fixes

    • Improved multi-threaded test runs: per-thread error capture, coordinated stop-on-failure, and aggregated result collection so partial failures no longer produce incomplete outputs.
    • Enhanced logging: clearer iteration-aware logs and explicit failure messages with stack-trace detail and per-thread diagnostics.
  • Chores

    • Bumped project version to 0.6.1 and updated SDK dependency.

antidodo and others added 30 commits August 7, 2025 11:26
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
antidodo and others added 25 commits January 21, 2026 13:45
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
…hutdown

Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
…ysis

Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
# Conflicts:
#	poetry.lock
#	pyproject.toml
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>

Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>

Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
# Conflicts:
#	poetry.lock
#	pyproject.toml
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: antidodo <albin2993@gmail.com>
# Conflicts:
#	poetry.lock
#	pyproject.toml
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
…testing

Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: antidodo <albin2993@gmail.com>
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
# Conflicts:
#	flame/star/star_model_tester.py
#	flame/utils/mock_flame_core.py
#	poetry.lock
#	pyproject.toml
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
@coderabbitai
Copy link

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

Adds per-thread error capture and coordinated stop signaling for multi-threaded test runs, introduces an IterationTracker and stop_event in the mock core, and bumps project version and flamesdk dependency.

Changes

Cohort / File(s) Summary
Thread error handling & result collection
flame/star/star_model_tester.py
Adds per-thread thread_errors and results_queue, wraps flame construction in try/except to record failures, sets a shared MockFlameCoreSDK.stop_event on error, and conditions final result write on results_queue presence.
Iteration tracking & stop signaling
flame/utils/mock_flame_core.py
Adds IterationTracker class, replaces integer num_iterations with tracker, adds stop_event, changes __pop_logs__ signature to accept failure_message, and adds __get_iteration__ method.
Version & dependency
pyproject.toml
Bumps project version 0.6.0 → 0.6.1 and flamesdk dependency 0.4.1 → 0.4.2.

Sequence Diagram(s)

sequenceDiagram
    participant Tester as Tester (main)
    participant Thread as Worker Thread
    participant Flame as Flame Instance
    participant MockCore as MockFlameCoreSDK

    Tester->>Thread: spawn run_node(role, node_id)
    Thread->>MockCore: create MockFlameCoreSDK (shares stop_event)
    Thread->>Flame: attempt construct/run
    alt Flame construction succeeds
        Flame-->>Thread: final_results_storage
        Thread->>Tester: push results_queue
    else Flame construction fails
        Flame-->>Thread: raises Exception
        Thread->>MockCore: set stop_event (notify others)
        Thread->>Tester: record thread_errors (stack trace)
    end
    Tester->>Thread: join all threads
    Tester->>Tester: if results_queue non-empty -> write final results else -> print error summary
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hopped through threads both near and far,

caught crashes, set a shared red star.
Iteration counts I tally true,
stop-event bells ring when errs ensue.
A carrot-coded cheer for the test crew!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title focuses narrowly on dependency bumping, but the actual changes include significant error handling improvements in StarModelTester and new IterationTracker functionality in MockFlameCoreSDK. Consider a more comprehensive title such as 'Improve error handling and iteration tracking with dependency updates' that reflects the substantial changes beyond just version bumps.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch canary
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

Migrating from UI to YAML configuration.

Use the @coderabbitai configuration command in a PR comment to get a dump of all your UI settings in YAML format. You can then edit this YAML file and upload it to the root of your repository to configure CodeRabbit programmatically.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
flame/utils/mock_flame_core.py (3)

216-220: Improve exception handling: provide message and chain exceptions.

The bare raise Exception (Line 218) loses context and makes debugging difficult. Consider providing a meaningful message and chaining from the original KeyError.

♻️ Proposed fix
         except KeyError:
             if self.stop_event:
-                raise Exception
+                raise Exception(f"Thread stopped due to failure in node {self.stop_event[0]}") from None
             time.sleep(.01)
             pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/utils/mock_flame_core.py` around lines 216 - 220, Replace the bare
re-raise in the KeyError handler with an exception that preserves context: catch
KeyError as e, and when self.stop_event is truthy raise a new exception (e.g.,
RuntimeError or Exception) with a clear message describing what failed (mention
the operation or key lookup) and use exception chaining (raise ... from e) so
the original KeyError is preserved; update the except block around the code that
references self.stop_event in mock_flame_core.py accordingly.

344-344: Rename unused loop variable v to _v.

The variable v is not used within the loop body. By convention, prefix unused variables with underscore.

♻️ Proposed fix
-        for k, v in self.logger.items():
+        for k, _v in self.logger.items():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/utils/mock_flame_core.py` at line 344, Rename the unused loop variable
v to _v in the for k, v in self.logger.items() loop to follow the
unused-variable convention; locate the loop in the mock_flame_core class where
self.logger.items() is iterated and change the second tuple variable from v to
_v (e.g., for k, _v in self.logger.items()), and adjust any subsequent
references (none expected) or tests/lint rules if they assert variable names.

47-55: IterationTracker class is simple but not thread-safe.

The increment() method performs a non-atomic read-modify-write on self.iter. Since multiple threads may call __pop_logs__ which calls increment(), this could lead to lost increments (though the practical impact is minimal for logging purposes).

🔧 Optional: Thread-safe increment
+import threading
+
 class IterationTracker:
     def __init__(self):
         self.iter = 0
+        self._lock = threading.Lock()

     def increment(self):
-        self.iter += 1
+        with self._lock:
+            self.iter += 1

     def get_iterations(self):
-        return self.iter
+        with self._lock:
+            return self.iter
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/utils/mock_flame_core.py` around lines 47 - 55, The
IterationTracker.increment() does a non-atomic read-modify-write and must be
made thread-safe because __pop_logs__ may call it from multiple threads; add a
threading.Lock (e.g., self._lock) in IterationTracker.__init__ and acquire it
around increment() (and optionally around get_iterations()) to protect self.iter
from concurrent updates, leaving the class API (increment, get_iterations)
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flame/star/star_model_tester.py`:
- Around line 69-74: Reset the shared MockFlameCoreSDK class-level mutable state
at the start of StarModelTester.__init__ (before threads are created) to avoid
cross-test contamination: set MockFlameCoreSDK.stop_event to an empty list,
MockFlameCoreSDK.message_broker to an empty dict,
MockFlameCoreSDK.final_results_storage to None, and
MockFlameCoreSDK.num_iterations to a fresh IterationTracker() instance; update
StarModelTester.__init__ to perform these resets so await_messages and other
methods see a clean slate for each test run.
- Around line 33-34: thread_errors and results_queue are accessed by multiple
threads without synchronization; replace results_queue list with a thread-safe
queue.Queue instance and protect modifications to thread_errors with a
threading.Lock (or alternatively push error records into the same Queue).
Concretely, import queue and threading, change results_queue = [] to
results_queue = queue.Queue(), replace all results_queue.append(...) calls with
results_queue.put(...), introduce thread_errors_lock = threading.Lock() and wrap
any reads/writes to thread_errors (e.g., thread_errors[...] = err or del
thread_errors[...]) inside with thread_errors_lock: blocks (or convert
thread_errors to a queue of (thread_id, error) tuples and avoid the dict
entirely).
- Around line 68-81: The code has a TOCTOU race on MockFlameCoreSDK.stop_event
and stores Exception objects in thread_errors; fix by making the
check-and-append atomic using a lock (e.g., use or add
MockFlameCoreSDK.stop_event_lock and wrap the check and append in a with
MockFlameCoreSDK.stop_event_lock: block) so only one thread can claim first
failure, and ensure you store plain string messages in thread_errors (replace
Exception("Another thread...") with the message string) and include the
formatted stack_trace string for the winning thread; update the block around
stop_event, stop_event.append(...), and thread_errors assignments accordingly.

In `@flame/utils/mock_flame_core.py`:
- Around line 59-63: The class-level mutable state (num_iterations, logger,
message_broker, final_results_storage, stop_event) is shared across instances
and must be reset between tests; update StarModelTester to either (a)
reinitialize those attributes at the start of StarModelTester.__init__ by
setting num_iterations = IterationTracker(), logger = {}, message_broker = {},
final_results_storage = None, stop_event = [] or (b) add a classmethod
reset_state() that performs those exact assignments (reset_state should set
cls.num_iterations = IterationTracker(), cls.logger = {}, cls.message_broker =
{}, cls.final_results_storage = None, cls.stop_event = []) and ensure tests call
StarModelTester.reset_state() before each run. Ensure you reference the
IterationTracker type when reinitializing.

---

Nitpick comments:
In `@flame/utils/mock_flame_core.py`:
- Around line 216-220: Replace the bare re-raise in the KeyError handler with an
exception that preserves context: catch KeyError as e, and when self.stop_event
is truthy raise a new exception (e.g., RuntimeError or Exception) with a clear
message describing what failed (mention the operation or key lookup) and use
exception chaining (raise ... from e) so the original KeyError is preserved;
update the except block around the code that references self.stop_event in
mock_flame_core.py accordingly.
- Line 344: Rename the unused loop variable v to _v in the for k, v in
self.logger.items() loop to follow the unused-variable convention; locate the
loop in the mock_flame_core class where self.logger.items() is iterated and
change the second tuple variable from v to _v (e.g., for k, _v in
self.logger.items()), and adjust any subsequent references (none expected) or
tests/lint rules if they assert variable names.
- Around line 47-55: The IterationTracker.increment() does a non-atomic
read-modify-write and must be made thread-safe because __pop_logs__ may call it
from multiple threads; add a threading.Lock (e.g., self._lock) in
IterationTracker.__init__ and acquire it around increment() (and optionally
around get_iterations()) to protect self.iter from concurrent updates, leaving
the class API (increment, get_iterations) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab424b44-c620-45d7-a084-14f1ba4012ae

📥 Commits

Reviewing files that changed from the base of the PR and between 41d31f0 and a972033.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • flame/star/star_model_tester.py
  • flame/utils/mock_flame_core.py
  • pyproject.toml

Comment on lines +33 to +34
thread_errors = {}
results_queue = []
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Thread-safety issue: thread_errors and results_queue are shared across threads without synchronization.

Both thread_errors (dict) and results_queue (list) are accessed concurrently by multiple threads. While CPython's GIL makes individual operations like list.append() and dict.__setitem__() atomic, relying on this is fragile and non-portable. Consider using threading.Lock or thread-safe collections like queue.Queue for results_queue.

🔒 Suggested fix using Queue
 import pickle
 import threading
 import uuid
+from queue import Queue
 from typing import Any, Type, Literal, Optional, Union
 import traceback
 ...
         threads = []
         thread_errors = {}
-        results_queue = []
+        results_queue = Queue()
 ...
-                    results_queue.append(flame.final_results_storage)
+                    results_queue.put(flame.final_results_storage)
 ...
-        if results_queue:
-            self.write_result(results_queue[0], output_type, result_filepath, multiple_results)
+        if not results_queue.empty():
+            self.write_result(results_queue.get(), output_type, result_filepath, multiple_results)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
thread_errors = {}
results_queue = []
thread_errors = {}
results_queue = Queue()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/star/star_model_tester.py` around lines 33 - 34, thread_errors and
results_queue are accessed by multiple threads without synchronization; replace
results_queue list with a thread-safe queue.Queue instance and protect
modifications to thread_errors with a threading.Lock (or alternatively push
error records into the same Queue). Concretely, import queue and threading,
change results_queue = [] to results_queue = queue.Queue(), replace all
results_queue.append(...) calls with results_queue.put(...), introduce
thread_errors_lock = threading.Lock() and wrap any reads/writes to thread_errors
(e.g., thread_errors[...] = err or del thread_errors[...]) inside with
thread_errors_lock: blocks (or convert thread_errors to a queue of (thread_id,
error) tuples and avoid the dict entirely).

Comment on lines +68 to +81
except Exception:
stop_event = MockFlameCoreSDK.stop_event
if not stop_event:
stack_trace = traceback.format_exc()#.replace('\n', '\\n').replace('\t', '\\t')
thread_errors[(kwargs['test_kwargs']['role'],
kwargs['test_kwargs']['node_id'])] = f"\033[31m{stack_trace}\033[0m"
stop_event.append(kwargs['test_kwargs']['node_id'])
mock = MockFlameCoreSDK(test_kwargs=kwargs['test_kwargs'])
mock.__pop_logs__(failure_message=True)
else:
thread_errors[(kwargs['test_kwargs']['role'],
kwargs['test_kwargs']['node_id'])] = (Exception("Another thread already failed, "
"stopping this thread as well."))
return
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

TOCTOU race condition on stop_event check-then-append.

The check if not stop_event (Line 70) and subsequent stop_event.append() (Line 74) is not atomic. Multiple threads could pass the check simultaneously before any append occurs, causing multiple threads to believe they are the "first" to fail and each recording their stack traces.

Additionally, Line 79-80 stores an Exception object directly rather than a string, which will print as Exception("Another thread...") rather than the message itself.

🔒 Suggested fix for atomicity and consistent error format
+        error_lock = threading.Lock()
+        first_failure = [False]  # Use list for mutable closure
 ...
             except Exception:
                 stop_event = MockFlameCoreSDK.stop_event
-                if not stop_event:
-                    stack_trace = traceback.format_exc()
-                    thread_errors[(kwargs['test_kwargs']['role'],
-                                   kwargs['test_kwargs']['node_id'])] = f"\033[31m{stack_trace}\033[0m"
-                    stop_event.append(kwargs['test_kwargs']['node_id'])
-                    mock = MockFlameCoreSDK(test_kwargs=kwargs['test_kwargs'])
-                    mock.__pop_logs__(failure_message=True)
-                else:
-                    thread_errors[(kwargs['test_kwargs']['role'],
-                                   kwargs['test_kwargs']['node_id'])] = (Exception("Another thread already failed, "
-                                                                                   "stopping this thread as well."))
+                with error_lock:
+                    is_first = not first_failure[0]
+                    if is_first:
+                        first_failure[0] = True
+                        stop_event.append(kwargs['test_kwargs']['node_id'])
+                if is_first:
+                    stack_trace = traceback.format_exc()
+                    thread_errors[(kwargs['test_kwargs']['role'],
+                                   kwargs['test_kwargs']['node_id'])] = f"\033[31m{stack_trace}\033[0m"
+                    mock = MockFlameCoreSDK(test_kwargs=kwargs['test_kwargs'])
+                    mock.__pop_logs__(failure_message=True)
+                else:
+                    thread_errors[(kwargs['test_kwargs']['role'],
+                                   kwargs['test_kwargs']['node_id'])] = "Another thread already failed, stopping this thread as well."
                 return
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 68-68: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/star/star_model_tester.py` around lines 68 - 81, The code has a TOCTOU
race on MockFlameCoreSDK.stop_event and stores Exception objects in
thread_errors; fix by making the check-and-append atomic using a lock (e.g., use
or add MockFlameCoreSDK.stop_event_lock and wrap the check and append in a with
MockFlameCoreSDK.stop_event_lock: block) so only one thread can claim first
failure, and ensure you store plain string messages in thread_errors (replace
Exception("Another thread...") with the message string) and include the
formatted stack_trace string for the winning thread; update the block around
stop_event, stop_event.append(...), and thread_errors assignments accordingly.

Comment on lines +69 to +74
stop_event = MockFlameCoreSDK.stop_event
if not stop_event:
stack_trace = traceback.format_exc()#.replace('\n', '\\n').replace('\t', '\\t')
thread_errors[(kwargs['test_kwargs']['role'],
kwargs['test_kwargs']['node_id'])] = f"\033[31m{stack_trace}\033[0m"
stop_event.append(kwargs['test_kwargs']['node_id'])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Class-level stop_event is not reset between test runs.

MockFlameCoreSDK.stop_event is a class-level mutable list that persists across all instances. If a test fails and appends to stop_event, subsequent StarModelTester instantiations will see the non-empty list, causing all threads to immediately fail in await_messages.

Consider resetting MockFlameCoreSDK.stop_event = [] at the start of StarModelTester.__init__ (along with other class-level state like message_broker and final_results_storage).

🐛 Proposed fix to reset shared state

Add at the beginning of __init__, before creating threads:

# Reset shared mock state for fresh test run
MockFlameCoreSDK.stop_event = []
MockFlameCoreSDK.message_broker = {}
MockFlameCoreSDK.final_results_storage = None
MockFlameCoreSDK.num_iterations = IterationTracker()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/star/star_model_tester.py` around lines 69 - 74, Reset the shared
MockFlameCoreSDK class-level mutable state at the start of
StarModelTester.__init__ (before threads are created) to avoid cross-test
contamination: set MockFlameCoreSDK.stop_event to an empty list,
MockFlameCoreSDK.message_broker to an empty dict,
MockFlameCoreSDK.final_results_storage to None, and
MockFlameCoreSDK.num_iterations to a fresh IterationTracker() instance; update
StarModelTester.__init__ to perform these resets so await_messages and other
methods see a clean slate for each test run.

Comment on lines +59 to +63
num_iterations: IterationTracker = IterationTracker()
logger: dict[str, list[str]] = {}
message_broker: dict[str, list[dict[str, Any]]] = {}
final_results_storage: Optional[Any] = None
stop_event: list[tuple[str]] = []
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Class-level mutable attributes persist across test runs and may cause test pollution.

The class-level num_iterations, logger, message_broker, final_results_storage, and stop_event are shared across all instances and persist between test runs. This is intentional for intra-test thread communication, but problematic for test isolation:

  • num_iterations will accumulate across multiple StarModelTester instantiations
  • stop_event will remain populated after failures (flagged in star_model_tester.py)

Consider either:

  1. Resetting these in StarModelTester.__init__ before each test run, or
  2. Adding a class method reset_state() to be called before tests
🔧 Proposed class reset method
`@classmethod`
def reset_state(cls):
    """Reset all shared state for a fresh test run."""
    cls.num_iterations = IterationTracker()
    cls.logger = {}
    cls.message_broker = {}
    cls.final_results_storage = None
    cls.stop_event = []
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 60-60: Mutable default value for class attribute

(RUF012)


[warning] 61-61: Mutable default value for class attribute

(RUF012)


[warning] 63-63: Mutable default value for class attribute

(RUF012)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/utils/mock_flame_core.py` around lines 59 - 63, The class-level mutable
state (num_iterations, logger, message_broker, final_results_storage,
stop_event) is shared across instances and must be reset between tests; update
StarModelTester to either (a) reinitialize those attributes at the start of
StarModelTester.__init__ by setting num_iterations = IterationTracker(), logger
= {}, message_broker = {}, final_results_storage = None, stop_event = [] or (b)
add a classmethod reset_state() that performs those exact assignments
(reset_state should set cls.num_iterations = IterationTracker(), cls.logger =
{}, cls.message_broker = {}, cls.final_results_storage = None, cls.stop_event =
[]) and ensure tests call StarModelTester.reset_state() before each run. Ensure
you reference the IterationTracker type when reinitializing.

Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
flame/star/star_model_tester.py (1)

79-81: ⚠️ Potential issue | 🟡 Minor

Store a string message instead of an Exception object.

Lines 79-81 store an Exception object in thread_errors. When printed at line 100, this will display as Exception('Another thread already failed, ...') rather than just the message text.

🐛 Proposed fix
                 else:
-                    thread_errors[(kwargs['test_kwargs']['role'],
-                                   kwargs['test_kwargs']['node_id'])] = (Exception("Another thread already failed, "
-                                                                                   "stopping this thread as well."))
+                    thread_errors[(kwargs['test_kwargs']['role'],
+                                   kwargs['test_kwargs']['node_id'])] = "Another thread already failed, stopping this thread as well."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/star/star_model_tester.py` around lines 79 - 81, thread_errors
currently stores an Exception object for the key (kwargs['test_kwargs']['role'],
kwargs['test_kwargs']['node_id']) which later gets printed and shows the
Exception wrapper; change the stored value to just the error message string
instead of Exception(...) (e.g., "Another thread already failed, stopping this
thread as well.") so printing shows the plain message; update the assignment in
the block that sets thread_errors and ensure any downstream consumers of
thread_errors (the code that prints the error) expect a string rather than an
Exception object, e.g., by leaving their print/format logic unchanged.
🧹 Nitpick comments (1)
flame/star/star_model_tester.py (1)

76-77: Avoid calling dunder methods directly.

__pop_logs__ is a dunder (double-underscore) method, which by convention indicates internal/private behavior. Calling it directly (mock.__pop_logs__(...)) is unconventional. Consider renaming to a single-underscore method (_pop_logs) or a public method (pop_logs) if it's intended to be called externally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flame/star/star_model_tester.py` around lines 76 - 77, The review flags a
direct call to the dunder method __pop_logs__ on MockFlameCoreSDK; change
external usage to call a non-dunder method instead and update the class
accordingly: rename or add a public method (e.g., pop_logs or _pop_logs) on
MockFlameCoreSDK to expose the same behavior, update the call site from
mock.__pop_logs__(failure_message=True) to mock.pop_logs(failure_message=True)
(or mock._pop_logs(...) if you want semi-private), and ensure any internal
implementation still delegates to the original logic so behavior is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@flame/star/star_model_tester.py`:
- Around line 79-81: thread_errors currently stores an Exception object for the
key (kwargs['test_kwargs']['role'], kwargs['test_kwargs']['node_id']) which
later gets printed and shows the Exception wrapper; change the stored value to
just the error message string instead of Exception(...) (e.g., "Another thread
already failed, stopping this thread as well.") so printing shows the plain
message; update the assignment in the block that sets thread_errors and ensure
any downstream consumers of thread_errors (the code that prints the error)
expect a string rather than an Exception object, e.g., by leaving their
print/format logic unchanged.

---

Nitpick comments:
In `@flame/star/star_model_tester.py`:
- Around line 76-77: The review flags a direct call to the dunder method
__pop_logs__ on MockFlameCoreSDK; change external usage to call a non-dunder
method instead and update the class accordingly: rename or add a public method
(e.g., pop_logs or _pop_logs) on MockFlameCoreSDK to expose the same behavior,
update the call site from mock.__pop_logs__(failure_message=True) to
mock.pop_logs(failure_message=True) (or mock._pop_logs(...) if you want
semi-private), and ensure any internal implementation still delegates to the
original logic so behavior is preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93fba091-e019-4d64-834b-561a15f5df9d

📥 Commits

Reviewing files that changed from the base of the PR and between a972033 and b65c69e.

📒 Files selected for processing (1)
  • flame/star/star_model_tester.py

@antidodo antidodo merged commit 74e9c71 into main Mar 17, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants