Contributor

Copilot AI commented Jan 27, 2026

  • Fix typo: "Devcyle" → "DevCycle" in log message (eb7844f)
  • Fix race condition: Set _sse_manager = None before releasing lock (741b526)
  • Fix stale config: Re-read self._config when re-acquiring lock (741b526)
  • Add test coverage for concurrent reconnection scenarios (89a8467)
  • Validate changes with tests and linter
  • Run security scan (no issues found)
  • Reply to all review comments

…outside lock

Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
@luxscious luxscious marked this pull request as ready for review January 27, 2026 20:48
@luxscious luxscious requested a review from a team as a code owner January 27, 2026 20:48
Copilot AI review requested due to automatic review settings January 27, 2026 20:48
@luxscious luxscious requested a review from JamieSinn January 27, 2026 20:50
Copilot AI and others added 2 commits January 27, 2026 20:51
Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

This pull request addresses a lock inversion issue in the SSE (Server-Sent Events) error handling logic that could cause deadlocks. The fix refactors _recreate_sse_connection to use a two-phase locking approach: acquiring the lock to save references, performing blocking operations (close/join) outside the lock, then re-acquiring the lock to create the new connection.

Changes:

  • Moved potentially blocking close() and join() operations outside the lock to prevent deadlocks when the SSE read thread calls error callbacks that need the same lock
  • Split the single lock acquisition into two phases: one to save state, one to update state
  • Added explanatory comments documenting the locking strategy and rationale
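A minimal, self-contained sketch of the two-phase pattern described above. It assumes hypothetical stand-ins (`FakeSSEManager`, `ConnectionOwner`, and a dict config) rather than the SDK's real `SSEManager`: the old manager is detached under the lock, closed and joined without it, and the replacement is built from the config re-read after re-acquiring the lock. The extra check that skips creation when another caller has already reconnected is an assumption of this sketch, not something the PR confirms.

```python
import threading


class FakeSSEManager:
    """Hypothetical stand-in for SSEManager: owns a background read thread."""

    def __init__(self, on_error):
        self._closed = threading.Event()
        self._on_error = on_error
        self.config = None
        self.read_thread = threading.Thread(target=self._read, daemon=True)
        self.read_thread.start()

    def _read(self):
        # Block until the client is closed, then report it through the callback.
        self._closed.wait()
        self._on_error("stream closed")

    def update(self, config):
        self.config = config

    def close(self):
        self._closed.set()


class ConnectionOwner:
    """Hypothetical owner mirroring the two-phase locking in _recreate_sse_connection."""

    def __init__(self, config):
        self._lock = threading.Lock()
        self._config = config
        self.errors = []
        self._sse_manager = FakeSSEManager(self.sse_error)

    def sse_error(self, err):
        # Runs on the read thread and needs the same lock as the reconnect path.
        with self._lock:
            self.errors.append(err)

    def recreate(self):
        # Phase 1: detach the old manager while holding the lock.
        with self._lock:
            if self._config is None:
                return
            old = self._sse_manager
            self._sse_manager = None  # others now see "no active connection"

        # Blocking work without the lock, so sse_error() on the read thread can run.
        if old is not None:
            old.close()
            old.read_thread.join(timeout=1.0)

        # Phase 2: re-acquire the lock and build from the *current* config.
        with self._lock:
            if self._config is None or self._sse_manager is not None:
                return  # config cleared, or another caller already reconnected
            self._sse_manager = FakeSSEManager(self.sse_error)
            self._sse_manager.update(self._config)


owner = ConnectionOwner({"sse_url": "https://example.invalid/stream"})
first = owner._sse_manager
owner.recreate()
print(first.read_thread.is_alive())  # False
print(owner.errors)                  # ['stream closed']
```

Because the old read thread's error callback can take the lock while `join()` is waiting, the join completes instead of timing out.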

self._sse_manager = SSEManager(
self.sse_state,
self.sse_error,
self.sse_message,
Copilot AI Jan 27, 2026

Typo in log message: "Devcyle" should be "DevCycle" to match the spelling used in most other log messages in this file (e.g., lines 131, 150, 165, 168, 193, 202, 255). Although this appears to be a pre-existing issue, the line was moved as part of the refactoring, so this is a good opportunity to fix the spelling for consistency.

Copilot uses AI. Check for mistakes.
Contributor Author

Fixed in eb7844f (already corrected before these changes).

Comment on lines 59 to 95
def _recreate_sse_connection(self):
"""Recreate the SSE connection with the current config."""
# Acquire lock to check state and save references to old connection
with self._sse_reconnect_lock:
if self._config is None or self._options.disable_realtime_updates:
logger.debug(
"Devcycle: Skipping SSE recreation - no config or updates disabled"
)
return

try:
# Close existing connection if present
if (
self._sse_manager is not None
and self._sse_manager.client is not None
):
self._sse_manager.client.close()
if self._sse_manager.read_thread.is_alive():
self._sse_manager.read_thread.join(timeout=1.0)
# Save references to old SSE manager and config while holding the lock
# Note: current_config may become stale if another thread updates _config
# between releasing and reacquiring the lock, but this is acceptable as
# the SSE stream will receive updates to sync to the latest config
old_sse_manager = self._sse_manager
current_config = self._config

# Perform potentially blocking operations outside the lock to avoid deadlock
# The SSE read thread may call sse_error/sse_state which need the lock
try:
if old_sse_manager is not None and old_sse_manager.client is not None:
old_sse_manager.client.close()
if old_sse_manager.read_thread.is_alive():
old_sse_manager.read_thread.join(timeout=1.0)
except Exception as e:
logger.debug(f"Devcycle: Error closing old SSE connection: {e}")

# Re-acquire lock to create new connection and update state
try:
with self._sse_reconnect_lock:
# Create new SSE manager
self._sse_manager = SSEManager(
self.sse_state,
self.sse_error,
self.sse_message,
)
self._sse_manager.update(self._config)
logger.info("Devcyle: SSE connection created successfully")
except Exception as e:
logger.debug(f"Devcycle: Failed to recreate SSE connection: {e}")
self._sse_manager.update(current_config)
Copilot AI Jan 27, 2026

The changes to _recreate_sse_connection that address the lock inversion issue lack test coverage. The existing test suite (test/managers/test_config_manager.py) only tests with disable_realtime_updates=True, which completely bypasses this code path.

This is particularly important because:

  1. The fix involves complex concurrency logic with two-phase locking
  2. The issue being fixed was a deadlock scenario that would only manifest under specific timing conditions
  3. The behavior during the window between releasing and re-acquiring the lock is critical

Consider adding tests that verify:

  • Concurrent calls to _recreate_sse_connection are handled correctly
  • The SSE manager state remains consistent when errors occur during reconnection
  • The interaction between sse_error callbacks and the reconnection logic works without deadlocks
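One way to exercise the concurrent-call scenarios suggested above is a small harness that starts the same callable on several threads at once and treats any thread still alive after a timeout as hung. This is a sketch using only the standard library; `run_concurrently` and `fake_reconnect` are hypothetical names, and in a real test the callable would be the config manager's `_recreate_sse_connection` with `SSEManager` replaced by a fake so that no network connection is made.

```python
import threading


def run_concurrently(fn, n_threads=8, timeout=2.0):
    """Start fn on n_threads threads at once; return (finished_count, errors).

    A thread still alive after `timeout` seconds is treated as hung
    (for example, deadlocked), so the test fails instead of blocking forever.
    """
    barrier = threading.Barrier(n_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker():
        barrier.wait()  # release all threads together to maximise contention
        try:
            fn()
        except Exception as exc:  # keep exceptions raised on background threads
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=timeout)
    finished = sum(not t.is_alive() for t in threads)
    return finished, errors


# Usage with a trivial stand-in for the reconnect call.
calls = []
calls_lock = threading.Lock()


def fake_reconnect():
    with calls_lock:
        calls.append("reconnect")


finished, errors = run_concurrently(fake_reconnect, n_threads=8)
print(finished, errors, len(calls))  # 8 [] 8
```

Daemon threads keep a hung worker from blocking interpreter exit, so a deadlock shows up as `finished < n_threads` rather than a frozen test run.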

Contributor Author

Added comprehensive test coverage in 89a8467. Tests now verify: (1) _sse_manager is cleared before blocking operations, (2) latest config is used after re-acquiring lock, (3) concurrent calls are handled safely, and (4) proper behavior when config is cleared during reconnection.

Comment on lines 69 to 79
# Save references to old SSE manager and config while holding the lock
# Note: current_config may become stale if another thread updates _config
# between releasing and reacquiring the lock, but this is acceptable as
# the SSE stream will receive updates to sync to the latest config
old_sse_manager = self._sse_manager
current_config = self._config

# Perform potentially blocking operations outside the lock to avoid deadlock
# The SSE read thread may call sse_error/sse_state which need the lock
try:
if old_sse_manager is not None and old_sse_manager.client is not None:
Copilot AI Jan 27, 2026

There is a potential race condition in this implementation. Between releasing the lock at line 72 and re-acquiring it at line 85, other threads could access or modify self._sse_manager. Specifically:

  1. The _get_config method (lines 144-152) reads self._sse_manager without holding the lock
  2. The close method (lines 266-267) accesses self._sse_manager without holding the lock
  3. Another thread could call _recreate_sse_connection concurrently

This means self._sse_manager could be replaced or set to None while the saved old_sse_manager is being closed outside the lock. While this may not cause immediate crashes, it creates undefined behavior where the old manager might be replaced mid-cleanup.

Consider setting self._sse_manager = None before releasing the lock at line 72, so that concurrent accesses see a consistent state indicating "no active connection" during the transition. This would also prevent the SSE read thread's error callbacks from trying to use a manager that's being torn down.
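Clearing the shared reference only helps if lock-free readers such as `_get_config` and `close` tolerate seeing `None`. A minimal sketch of that reader discipline, with hypothetical `ConnectionSlot` and `FakeClient` classes: the writer swaps in `None` under the lock, and the reader copies the attribute into a local once before testing and using it.

```python
import threading


class FakeClient:
    """Hypothetical connection object with a closed flag."""

    def __init__(self):
        self.closed = False


class ConnectionSlot:
    """Hypothetical holder whose connection may be cleared by another thread."""

    def __init__(self, client):
        self._lock = threading.Lock()
        self._client = client

    def detach(self):
        # Writer: swap in None under the lock and return the old object for cleanup.
        with self._lock:
            old, self._client = self._client, None
        return old

    def is_connected(self):
        # Lock-free reader: read the attribute once into a local, then use the local.
        # Checking self._client and then dereferencing self._client again could see
        # a live object first and None second if detach() runs in between.
        client = self._client
        return client is not None and not client.closed


slot = ConnectionSlot(FakeClient())
print(slot.is_connected())  # True
old = slot.detach()
old.closed = True           # cleanup happens on the detached object, outside the lock
print(slot.is_connected())  # False
print(slot.detach())        # None: a second detach finds nothing to close
```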

Contributor Author

Fixed in 741b526. Now setting self._sse_manager = None before releasing the lock, ensuring concurrent accesses see a consistent "no active connection" state during the transition.

Comment on lines 71 to 92
# between releasing and reacquiring the lock, but this is acceptable as
# the SSE stream will receive updates to sync to the latest config
old_sse_manager = self._sse_manager
current_config = self._config

# Perform potentially blocking operations outside the lock to avoid deadlock
# The SSE read thread may call sse_error/sse_state which need the lock
try:
if old_sse_manager is not None and old_sse_manager.client is not None:
old_sse_manager.client.close()
if old_sse_manager.read_thread.is_alive():
old_sse_manager.read_thread.join(timeout=1.0)
except Exception as e:
logger.debug(f"Devcycle: Error closing old SSE connection: {e}")

# Re-acquire lock to create new connection and update state
try:
with self._sse_reconnect_lock:
# Create new SSE manager
self._sse_manager = SSEManager(
self.sse_state,
self.sse_error,
Copilot AI Jan 27, 2026

The current_config reference is saved while holding the lock and then used outside the lock at line 92. Holding that reference keeps the config object alive, but there is a subtle race condition: another thread could reassign self._config (e.g., in _get_config at line 137) between line 71 and line 92.

This means the SSE manager could be initialized with a stale config that's different from the current self._config. While this may eventually self-correct, it could lead to inconsistent state where the SSE manager is connected to a URL that doesn't match the current configuration.

Consider re-reading self._config inside the second lock acquisition to ensure the SSE manager is always initialized with the most current configuration.
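The difference between the saved snapshot and a re-read can be shown deterministically by running a config update during the unlocked window. In this sketch `ConfigSlot`, `config_for_new_connection`, and the `etag` key are hypothetical; the `unlocked_work` callback stands in for the close/join step, and a `None` result means the config was cleared and reconnection should be skipped.

```python
import threading


class ConfigSlot:
    """Hypothetical holder for the latest config, replaced by other threads."""

    def __init__(self, config):
        self.lock = threading.Lock()
        self.config = config

    def set(self, config):
        with self.lock:
            self.config = config


def config_for_new_connection(slot, unlocked_work, reread=True):
    """Return the config a new connection would be built from (None means skip)."""
    with slot.lock:
        snapshot = slot.config  # phase 1: saved while holding the lock
    unlocked_work()             # stands in for close()/join() outside the lock
    with slot.lock:
        # Phase 2: re-reading picks up any update made while the lock was released.
        return slot.config if reread else snapshot


stale = ConfigSlot({"etag": "v1"})
print(config_for_new_connection(stale, lambda: stale.set({"etag": "v2"}), reread=False))
# {'etag': 'v1'}

fresh = ConfigSlot({"etag": "v1"})
print(config_for_new_connection(fresh, lambda: fresh.set({"etag": "v2"})))
# {'etag': 'v2'}
```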

Contributor Author

Fixed in 741b526. Now re-reading self._config when re-acquiring the lock to ensure the SSE manager is always initialized with the most current configuration.

@luxscious luxscious marked this pull request as draft January 27, 2026 20:53
Copilot AI changed the title from "[WIP] Address feedback on SSE error handling fix" to "Fix lock inversion deadlock in SSE reconnection" Jan 27, 2026
@JamieSinn
Member

@copilot re-work the PR to fix the identified issues.

luxscious and others added 2 commits January 27, 2026 16:54
@luxscious
Contributor

@copilot get rid of the lock setup

Co-authored-by: JamieSinn <1538232+JamieSinn@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 27, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • config-cdn.devcycle.com
    • Triggering command: /usr/bin/python python -m pytest test/managers/test_config_manager.py -v (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Contributor Author

Copilot AI commented Jan 27, 2026

@copilot get rid of the lock setup

Could you clarify what you mean by "get rid of the lock setup"? The current implementation uses a two-phase locking approach to prevent deadlocks. Are you asking to:

  1. Revert to the original single-lock implementation (which had the deadlock issue)?
  2. Remove all locking entirely (which would create race conditions)?
  3. Use a different synchronization approach?

The lock is needed because the SSE read thread can call sse_error/sse_state callbacks that need to acquire the same lock, so holding it during blocking operations (close/join) can cause deadlock. Please let me know which approach you'd prefer.
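The failure mode described above can be reproduced with the standard library alone. In this hypothetical sketch (`close_and_join` is an illustrative name, not SDK code), a reader thread needs the shared lock after its stream is closed; if the caller closes and joins while still holding that lock, the join cannot succeed. With a timeout, as in the original `join(timeout=1.0)`, the call returns once the timeout elapses and leaves the reader thread alive; without a timeout it would block indefinitely.

```python
import threading


def close_and_join(hold_lock, timeout=0.5):
    """Return True if the read thread finished within `timeout` seconds."""
    lock = threading.Lock()
    closed = threading.Event()

    def read_loop():
        closed.wait()  # the stream ends once the client is closed
        with lock:     # then an error/state callback needs the shared lock
            pass

    reader = threading.Thread(target=read_loop, daemon=True)
    reader.start()
    if hold_lock:
        with lock:     # original pattern: close() and join() under the lock
            closed.set()
            reader.join(timeout=timeout)
            finished = not reader.is_alive()
    else:              # fixed pattern: blocking calls happen without the lock
        closed.set()
        reader.join(timeout=timeout)
        finished = not reader.is_alive()
    reader.join()      # the lock is free now, so the reader can always exit
    return finished


print(close_and_join(hold_lock=True))   # False: join timed out, reader stuck on the lock
print(close_and_join(hold_lock=False))  # True
```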

Labels

None yet

3 participants