-
Notifications
You must be signed in to change notification settings - Fork 6
Fix lock inversion deadlock in SSE reconnection #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix lock inversion deadlock in SSE reconnection #108
Conversation
…outside lock Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request addresses a lock inversion issue in the SSE (Server-Sent Events) error handling logic that could cause deadlocks. The fix refactors _recreate_sse_connection to use a two-phase locking approach: acquiring the lock to save references, performing blocking operations (close/join) outside the lock, then re-acquiring the lock to create the new connection.
Changes:
- Moved potentially blocking
close()andjoin()operations outside the lock to prevent deadlocks when the SSE read thread calls error callbacks that need the same lock - Split the single lock acquisition into two phases: one to save state, one to update state
- Added explanatory comments documenting the locking strategy and rationale
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self._sse_manager = SSEManager( | ||
| self.sse_state, | ||
| self.sse_error, | ||
| self.sse_message, |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in log message: "Devcyle" should be "DevCycle" to match the spelling used in most other log messages in this file (e.g., lines 131, 150, 165, 168, 193, 202, 255). While this appears to be a pre-existing issue, since this line was moved as part of the refactoring, it would be a good opportunity to fix the spelling for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in eb7844f (already corrected before these changes).
| def _recreate_sse_connection(self): | ||
| """Recreate the SSE connection with the current config.""" | ||
| # Acquire lock to check state and save references to old connection | ||
| with self._sse_reconnect_lock: | ||
| if self._config is None or self._options.disable_realtime_updates: | ||
| logger.debug( | ||
| "Devcycle: Skipping SSE recreation - no config or updates disabled" | ||
| ) | ||
| return | ||
|
|
||
| try: | ||
| # Close existing connection if present | ||
| if ( | ||
| self._sse_manager is not None | ||
| and self._sse_manager.client is not None | ||
| ): | ||
| self._sse_manager.client.close() | ||
| if self._sse_manager.read_thread.is_alive(): | ||
| self._sse_manager.read_thread.join(timeout=1.0) | ||
| # Save references to old SSE manager and config while holding the lock | ||
| # Note: current_config may become stale if another thread updates _config | ||
| # between releasing and reacquiring the lock, but this is acceptable as | ||
| # the SSE stream will receive updates to sync to the latest config | ||
| old_sse_manager = self._sse_manager | ||
| current_config = self._config | ||
|
|
||
| # Perform potentially blocking operations outside the lock to avoid deadlock | ||
| # The SSE read thread may call sse_error/sse_state which need the lock | ||
| try: | ||
| if old_sse_manager is not None and old_sse_manager.client is not None: | ||
| old_sse_manager.client.close() | ||
| if old_sse_manager.read_thread.is_alive(): | ||
| old_sse_manager.read_thread.join(timeout=1.0) | ||
| except Exception as e: | ||
| logger.debug(f"Devcycle: Error closing old SSE connection: {e}") | ||
|
|
||
| # Re-acquire lock to create new connection and update state | ||
| try: | ||
| with self._sse_reconnect_lock: | ||
| # Create new SSE manager | ||
| self._sse_manager = SSEManager( | ||
| self.sse_state, | ||
| self.sse_error, | ||
| self.sse_message, | ||
| ) | ||
| self._sse_manager.update(self._config) | ||
| logger.info("Devcyle: SSE connection created successfully") | ||
| except Exception as e: | ||
| logger.debug(f"Devcycle: Failed to recreate SSE connection: {e}") | ||
| self._sse_manager.update(current_config) |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to _recreate_sse_connection that address the lock inversion issue lack test coverage. The existing test suite (test/managers/test_config_manager.py) only tests with disable_realtime_updates=True, which completely bypasses this code path.
This is particularly important because:
- The fix involves complex concurrency logic with two-phase locking
- The issue being fixed was a deadlock scenario that would only manifest under specific timing conditions
- The behavior during the window between releasing and re-acquiring the lock is critical
Consider adding tests that verify:
- Concurrent calls to
_recreate_sse_connectionare handled correctly - The SSE manager state remains consistent when errors occur during reconnection
- The interaction between
sse_errorcallbacks and the reconnection logic works without deadlocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comprehensive test coverage in 89a8467. Tests now verify: (1) _sse_manager is cleared before blocking operations, (2) latest config is used after re-acquiring lock, (3) concurrent calls are handled safely, and (4) proper behavior when config is cleared during reconnection.
| # Save references to old SSE manager and config while holding the lock | ||
| # Note: current_config may become stale if another thread updates _config | ||
| # between releasing and reacquiring the lock, but this is acceptable as | ||
| # the SSE stream will receive updates to sync to the latest config | ||
| old_sse_manager = self._sse_manager | ||
| current_config = self._config | ||
|
|
||
| # Perform potentially blocking operations outside the lock to avoid deadlock | ||
| # The SSE read thread may call sse_error/sse_state which need the lock | ||
| try: | ||
| if old_sse_manager is not None and old_sse_manager.client is not None: |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a potential race condition in this implementation. Between releasing the lock at line 72 and re-acquiring it at line 85, other threads could access or modify self._sse_manager. Specifically:
- The
_get_configmethod (lines 144-152) readsself._sse_managerwithout holding the lock - The
closemethod (lines 266-267) accessesself._sse_managerwithout holding the lock - Another thread could call
_recreate_sse_connectionconcurrently
This means self._sse_manager could be replaced or set to None while the saved old_sse_manager is being closed outside the lock. While this may not cause immediate crashes, it creates undefined behavior where the old manager might be replaced mid-cleanup.
Consider setting self._sse_manager = None before releasing the lock at line 72, so that concurrent accesses see a consistent state indicating "no active connection" during the transition. This would also prevent the SSE read thread's error callbacks from trying to use a manager that's being torn down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 741b526. Now setting self._sse_manager = None before releasing the lock, ensuring concurrent accesses see a consistent "no active connection" state during the transition.
| # between releasing and reacquiring the lock, but this is acceptable as | ||
| # the SSE stream will receive updates to sync to the latest config | ||
| old_sse_manager = self._sse_manager | ||
| current_config = self._config | ||
|
|
||
| # Perform potentially blocking operations outside the lock to avoid deadlock | ||
| # The SSE read thread may call sse_error/sse_state which need the lock | ||
| try: | ||
| if old_sse_manager is not None and old_sse_manager.client is not None: | ||
| old_sse_manager.client.close() | ||
| if old_sse_manager.read_thread.is_alive(): | ||
| old_sse_manager.read_thread.join(timeout=1.0) | ||
| except Exception as e: | ||
| logger.debug(f"Devcycle: Error closing old SSE connection: {e}") | ||
|
|
||
| # Re-acquire lock to create new connection and update state | ||
| try: | ||
| with self._sse_reconnect_lock: | ||
| # Create new SSE manager | ||
| self._sse_manager = SSEManager( | ||
| self.sse_state, | ||
| self.sse_error, |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current_config reference is saved while holding the lock and then used outside the lock at line 92. While Python dictionaries are reference types and won't be garbage collected, there's a subtle race condition: self._config could be reassigned by another thread (e.g., in _get_config at line 137) between line 71 and line 92.
This means the SSE manager could be initialized with a stale config that's different from the current self._config. While this may eventually self-correct, it could lead to inconsistent state where the SSE manager is connected to a URL that doesn't match the current configuration.
Consider re-reading self._config inside the second lock acquisition to ensure the SSE manager is always initialized with the most current configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 741b526. Now re-reading self._config when re-acquiring the lock to ensure the SSE manager is always initialized with the most current configuration.
|
@copilot re-work the PR to fix the identified issues. |
…thon-server-sdk into copilot/sub-pr-107
Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
|
@copilot get rid of the lock setup |
Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Could you clarify what you mean by "get rid of the lock setup"? The current implementation uses a two-phase locking approach to prevent deadlocks. Are you asking to:
The lock is needed because the SSE read thread can call |
_sse_manager = Nonebefore releasing lock (741b526)self._configwhen re-acquiring lock (741b526)💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.