fix: resolve CI test failures (CORS, HSTS, presence)#882
Conversation
Tests that override the environment to Production via WithWebHostBuilder need a non-placeholder JWT secret because Production mode validates that the secret is not a known placeholder (FirstRunBootstrapper). Without this, the test host fails to start with a security validation error.
Same root cause as the CORS tests: the HSTS test overrides the environment to Production, which triggers JWT secret validation that rejects the placeholder key. Provide a real test secret.
Increase timeouts from 10s to 20s and check all collected snapshots (not just the last one) so that out-of-order SignalR delivery on slow CI runners does not cause spurious timeout failures.
Adversarial self-reviewWhat could go wrong
VerdictLow-risk test-infrastructure fix. No production code changes. All 4,477 backend tests pass. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7048790988
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| var match = all.LastOrDefault(s => s.Members.Count == expectedMemberCount) | ||
| ?? all.LastOrDefault(s => s.Members.Count >= expectedMemberCount); |
There was a problem hiding this comment.
Wait for exact presence count before returning
WaitForPresenceCountAsync now returns a snapshot when Members.Count >= expectedMemberCount, which breaks the leave-path semantics where the expected count decreases. In RapidJoinLeave_EventuallyConsistent, if only an intermediate snapshot (e.g. pre-leave count) has arrived, this branch returns immediately and the subsequent exact-count assertion fails even though the correct snapshot could arrive moments later; this reintroduces CI flakiness instead of reducing it.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request improves test stability on slow CI runners by increasing timeouts and refining the logic for waiting on board presence snapshots. It also introduces a non-placeholder JWT secret for tests running in Production mode to satisfy security validation requirements. A logic error was identified in the snapshot matching logic where a greater-than-or-equal check could cause premature failures during leave operations, and it is recommended to centralize the duplicated JWT secret constant to improve maintainability.
| var match = all.LastOrDefault(s => s.Members.Count == expectedMemberCount) | ||
| ?? all.LastOrDefault(s => s.Members.Count >= expectedMemberCount); |
There was a problem hiding this comment.
The addition of the fallback ?? all.LastOrDefault(s => s.Members.Count >= expectedMemberCount) introduces a logic error when waiting for a decrease in member count (e.g., after users leave a board). If the current count is already greater than the expected count, the method will return immediately with the current snapshot instead of waiting for the count to drop. Since the subsequent assertions (like on line 174) expect an exact match, this will cause tests to fail prematurely. It is safer to only check for the exact match across the collected snapshots to handle out-of-order delivery without breaking the decreasing case.
var match = all.LastOrDefault(s => s.Members.Count == expectedMemberCount);| private const string ProductionTestJwtSecret = | ||
| "VGVzdE9ubHlKd3RTZWNyZXRGb3JQcm9kdWN0aW9uTW9kZVRlc3Rz"; |
There was a problem hiding this comment.
The ProductionTestJwtSecret constant is duplicated in SecurityHeadersApiTests.cs. To improve maintainability and ensure consistency across the test suite, consider moving this constant to a shared location, such as TestWebApplicationFactory or a dedicated test constants class. Alternatively, you could set a default non-placeholder JWT secret in TestWebApplicationFactory.ConfigureAppConfiguration so that individual tests don't need to override it manually when switching to the Production environment.
| private const string ProductionTestJwtSecret = | ||
| "VGVzdE9ubHlKd3RTZWNyZXRGb3JQcm9kdWN0aW9uTW9kZVRlc3Rz"; |
There was a problem hiding this comment.
Move the duplicated JWT secret constant from CorsApiTests and SecurityHeadersApiTests into ApiTestHarness.ProductionTestJwtSecret so future Production-mode tests can reference a single location.
There was a problem hiding this comment.
Pull request overview
Fixes CI-only backend test failures caused by Production-mode JWT secret validation and flaky SignalR presence timing under load.
Changes:
- Injects a non-placeholder
Jwt:SecretKeyinto tests that forceProductionenvironment so the in-memory host can start. - Makes the board presence concurrency test more tolerant to slow/out-of-order SignalR delivery by increasing timeouts and scanning collected snapshots.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| backend/tests/Taskdeck.Api.Tests/SecurityHeadersApiTests.cs | Supplies a non-placeholder JWT secret when forcing Production to keep HSTS test host startup passing. |
| backend/tests/Taskdeck.Api.Tests/CorsApiTests.cs | Supplies a non-placeholder JWT secret when forcing Production so CORS tests can run. |
| backend/tests/Taskdeck.Api.Tests/Concurrency/BoardPresenceConcurrencyTests.cs | Adjusts presence polling logic and timeouts to reduce CI flakiness. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var match = all.LastOrDefault(s => s.Members.Count == expectedMemberCount) | ||
| ?? all.LastOrDefault(s => s.Members.Count >= expectedMemberCount); |
There was a problem hiding this comment.
WaitForPresenceCountAsync is documented/named as waiting for an exact expected member count, but the implementation can return a snapshot where Members.Count is greater than expected (the ">= expectedMemberCount" fallback). That breaks the method contract and can cause this test to fail early in cases where presence temporarily overshoots before settling. Consider either removing the ">= expected" fallback (keep waiting for an exact match), or updating the callers/assertions and the method name/docs to explicitly accept "at least" semantics.
| var match = all.LastOrDefault(s => s.Members.Count == expectedMemberCount) | |
| ?? all.LastOrDefault(s => s.Members.Count >= expectedMemberCount); | |
| var match = all.LastOrDefault(s => s.Members.Count == expectedMemberCount); |
|
|
||
| private readonly TestWebApplicationFactory _baseFactory; | ||
|
|
||
| public CorsApiTests(TestWebApplicationFactory baseFactory) | ||
| { | ||
| _baseFactory = baseFactory; | ||
| } | ||
|
|
There was a problem hiding this comment.
ProductionTestJwtSecret is duplicated across multiple test classes. To avoid drift and make future changes to FirstRunBootstrapper placeholder validation easier, consider moving this constant into a shared test helper (e.g., TestWebApplicationFactory or a Tests.Support constants class) and referencing it from here.
|
|
||
| private readonly TestWebApplicationFactory _factory; | ||
|
|
||
| public SecurityHeadersApiTests(TestWebApplicationFactory factory) | ||
| { | ||
| _factory = factory; | ||
| } | ||
|
|
There was a problem hiding this comment.
ProductionTestJwtSecret is duplicated across multiple test classes. To avoid drift and keep test configuration consistent, consider extracting this into a shared constant/helper used by all tests that set the environment to Production.
Add entries for 8 merged PRs: MCP API key management UI (#877), MCP structured logging (#879), cloud deployment preparation (#878), connector framework foundation (#880), INT-06 merge resolution (#841), E2E scenario fixes (#822), wave index sweep (#844), CI test fixes (#882). Update backend/frontend implementation snapshots with new components.
Remove the >= fallback in BoardPresenceConcurrencyTests.WaitForPresenceCountAsync that returned stale snapshots with more members than expected during leave scenarios. The fallback caused RapidJoinLeave_EventuallyConsistent to fail because it returned a pre-leave snapshot instead of waiting for the count to decrease. All three review bots (Gemini, Copilot, Codex) flagged this same issue.
- Increase BoardPresenceConcurrencyTests default timeout from 10s to 15s for CI stability (mirrors fix from #882) - Sanitize all user-controlled values in McpTelemetryMiddleware (correlationId, userId, method, path) to resolve CodeQL log-injection alerts (CWE-117) - Sanitize operation names and arguments in McpOperationLogger.BeginOperation to prevent log injection from MCP tool names - Sanitize errorMessage in McpOperationScope.Fail(string) to avoid leaking user content into log sinks - Register OpenTelemetry (AddTaskdeckObservability) in standalone MCP HTTP host so traces and metrics are not silently dropped
Add entries for 8 merged PRs: MCP API key management UI (#877), MCP structured logging (#879), cloud deployment preparation (#878), connector framework foundation (#880), INT-06 merge resolution (#841), E2E scenario fixes (#822), wave index sweep (#844), CI test fixes (#882). Update backend/frontend implementation snapshots with new components.
PRs #822, #841, #877-#880, #882 are still open and not merged to main. Remove documentation that claims these artifacts exist or are shipped: - STATUS.md: remove connector-providers controller, api-keys settings route, MCP structured logging, connector framework service, cloud deployment artifacts, and inflated test totals - TESTING_GUIDE.md: remove post-validation wave section and test count references to unmerged PRs - IMPLEMENTATION_MASTERPLAN.md: narrow item 25 to only the merged #844 - ISSUE_EXECUTION_GUIDE.md: revert #98 delivery annotation (#880 open) - MCP_TOOLING_GUIDE.md: mark structured logging and API key UI as in-progress rather than delivered Addresses ChatGPT P1 and Copilot review findings.
* feat: add MCP structured logging and telemetry middleware Add McpOperationLogger and McpTelemetryMiddleware for structured observability of MCP tool calls and resource access. Includes ActivitySource spans, request/error counters, duration histograms, and ILogger structured properties. Sensitive data (API keys, request bodies) is never logged. Tests verify logging behavior and data protection. * fix: add LogSanitizer utility to prevent log injection (CWE-117) Provides SanitizeForLog (strips control chars/newlines, truncates) and SafeExceptionDescription (type name only) for safe telemetry output. Addresses CodeQL cs/log-forging findings on PR #879. * fix: sanitize user input in MCP telemetry middleware - Sanitize request path before logging to prevent log injection (CodeQL) - Use exception type name instead of message in Activity.SetStatus - Record McpRequestDurationMs in exception path to avoid biased metrics * fix: sanitize exception message in MCP operation scope trace exports Use exception type name for Activity.SetStatus and fixed string for Fail(string) to prevent user content leaking into tracing backends. * fix: remove unused McpApiKeyId tag to prevent PII trap Tag was declared but never populated; removing it avoids future contributors accidentally setting it to raw API key values. * fix: consolidate MCP telemetry registration and add CorrelationIdMiddleware - Extract AddMcpTelemetry() extension to replace three separate AddSingleton<McpOperationLogger> calls across hosting modes - Add CorrelationIdMiddleware to standalone MCP HTTP mode so client X-Request-Id headers are honoured for cross-service tracing * test: add Fail-then-Dispose idempotency tests for McpOperationScope Verify that calling Dispose after Fail(Exception) or Fail(string) does not overwrite the failure with a spurious completion entry. * Potential fix for pull request finding 'CodeQL / Log entries created from user input' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * fix: resolve CI failures for PR #879 - Increase BoardPresenceConcurrencyTests default timeout from 10s to 15s for CI stability (mirrors fix from #882) - Sanitize all user-controlled values in McpTelemetryMiddleware (correlationId, userId, method, path) to resolve CodeQL log-injection alerts (CWE-117) - Sanitize operation names and arguments in McpOperationLogger.BeginOperation to prevent log injection from MCP tool names - Sanitize errorMessage in McpOperationScope.Fail(string) to avoid leaking user content into log sinks - Register OpenTelemetry (AddTaskdeckObservability) in standalone MCP HTTP host so traces and metrics are not silently dropped * fix: sanitize control chars before truncation to preserve [truncated] marker LogSanitizer.SanitizeForLog has a 200-char MaxSanitizedLength that was clipping the "...[truncated]" suffix appended by McpOperationLogger's own 200-char truncation. Fix by adding LogSanitizer.StripControlChars that strips CWE-117 control characters without length truncation, then letting McpOperationLogger handle truncation itself so the marker survives. * fix: use LoggerMessage source generators in McpTelemetryMiddleware (CWE-117) Replace direct _logger.Log* calls with [LoggerMessage]-attributed source- generated methods. This resolves 3 CodeQL CWE-117 findings at lines 67, 92, and 123 where user-controlled values were passed as log parameters. Source-generated log methods produce compile-time structured templates that avoid string interpolation of user-provided values, satisfying CodeQL's taint analysis. Also adds OperationCanceledException handling so client disconnects are recorded as cancellations rather than errors, and includes the transport tag on the MCP error counter for consistency with the success path. * test: add cancellation handling test for McpTelemetryMiddleware Verify that OperationCanceledException from client disconnects is logged as a cancellation (Information level) rather than an error, preventing noise in error metrics from normal client behavior. * fix: move MCP telemetry middleware before API key auth Reorder middleware registration so McpTelemetryMiddleware runs before ApiKeyMiddleware in both co-hosted and standalone MCP HTTP modes. This ensures telemetry captures all /mcp requests including those rejected with 401 for missing/invalid/revoked API keys. * fix: include transport tag on MCP error counters Add McpTransport tag to McpErrors.Add calls in both Fail(Exception) and Fail(string) overloads of McpOperationScope. The success path already included transport on request/duration metrics but the error counter omitted it, preventing transport-level error rate breakdowns. * fix: capture userId after downstream auth in McpTelemetryMiddleware Move the HttpUserContextProvider.UserIdItemKey read to after await _next(context) returns, so API key auth has already populated the user ID in HttpContext.Items before telemetry captures it. --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Summary
Fixes test failures that surface when running the full API test suite in CI, particularly after the cloud-deployment-preparation branch introduces
ValidateProductionSecrets:WithWebHostBuilderfailed because Production mode validates that the JWT secret is not a known placeholder. The test host could not start, throwingInvalidOperationException. Fixed by supplying a non-placeholder JWT secret in each affected test.RapidJoinLeave_EventuallyConsistenttimed out waiting for a presence snapshot with the expected member count. Fixed by increasing timeouts from 10s to 20s and checking all collected snapshots (not just the last one) for the expected count, tolerating out-of-order SignalR delivery on slow CI runners.Root causes
CorsApiTests.Cors_ShouldRejectDevelopmentFallbackOriginsOutsideDevelopmentJwt:SecretKeyviaUseSettingCorsApiTests.Cors_ShouldIgnoreDevelopmentConfiguredAlternateOriginOutsideDevelopmentSecurityHeadersApiTests.SecurityHeaders_ShouldEmitHsts_ForHttpsRequestsOutsideDevelopmentBoardPresenceConcurrencyTests.RapidJoinLeave_EventuallyConsistentTest plan