Conversation

@addisonbeck
Contributor

@addisonbeck addisonbeck commented Oct 28, 2025

🎟️ Tracking

https://bitwarden.atlassian.net/browse/PM-22218
bitwarden/clients#17075

📔 Objective

Implement automated SDK breaking change detection that provides immediate feedback when SDK PRs introduce TypeScript compilation issues for client applications. This system catches breaking changes at SDK development time rather than during client integration.

Previously, breaking changes in the SDK weren't discovered until someone tried to update the SDK version in client repositories, making fixes difficult and disruptive.

This PR implements a cross-repository workflow system that:

  1. Triggers when SDK PRs have successful WASM artifacts built
  2. Downloads and tests the new SDK artifacts against the typechecker in clients
  3. Provides immediate feedback via PR comments and labels

This was designed to be modular: hopefully we can just "slot in" mobile with any appropriate available job in those repos.

Screenshots

An example failure comment. It links to this run. I created a breaking change on purpose and then reverted it.
Screenshot 2025-11-04 at 3 03 36 PM

The comment when no breaking changes are detected.
Screenshot 2025-11-04 at 4 22 17 PM

⏰ Reminders before review

  • Contributor guidelines followed
  • All formatters and local linters executed and passed
  • Written new unit and / or integration tests where applicable
  • Protected functional changes with optionality (feature flags)
  • Used internationalization (i18n) for all UI strings
  • CI builds passed
  • Communicated to DevOps any deployment requirements
  • Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

  • 👍 (:+1:) or similar for great changes
  • 📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
  • ❓ (:question:) for questions
  • 🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed
    issue and could potentially benefit from discussion
  • 🎨 (:art:) for suggestions / improvements
  • ❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
  • 🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
  • ⛏ (:pick:) for minor or nitpick changes

@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch from 204dc59 to 011bf58 Compare October 28, 2025 15:56
@github-actions
Contributor

github-actions bot commented Oct 28, 2025

Checkmarx One – Scan Summary & Details: 2ce66423-1ff6-4d9c-8f49-94f3de8b8bf5

Great job! No new security vulnerabilities introduced in this pull request

@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch 10 times, most recently from 307ed34 to 235c6d1 Compare October 28, 2025 19:42
@bitwarden bitwarden deleted a comment from bw-ghapp bot Oct 29, 2025
@bitwarden bitwarden deleted a comment from bw-ghapp bot Oct 29, 2025
@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch from dd85de4 to 9bc7923 Compare November 3, 2025 15:30
@bitwarden bitwarden deleted a comment from bw-ghapp bot Nov 3, 2025
@codecov

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.95%. Comparing base (045ced5) to head (a49b9d1).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
- Coverage   78.95%   78.95%   -0.01%     
==========================================
  Files         296      296              
  Lines       30884    30904      +20     
==========================================
+ Hits        24385    24400      +15     
- Misses       6499     6504       +5     

@bitwarden bitwarden deleted a comment from bw-ghapp bot Nov 3, 2025
@bw-ghapp
Contributor

bw-ghapp bot commented Nov 3, 2025

🔍 SDK Breaking Change Detection Results

SDK Version: ci-warn-on-breaking-changes (d98d4d8)
Completed: 2025-11-05 20:51:19 UTC
Total Time: 248s

Client | Status | Details
typescript | ✅ No breaking changes detected | TypeScript compilation passed with new SDK version - View Details

Breaking change detection completed. View SDK workflow

@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch 5 times, most recently from d69fa9b to ad95525 Compare November 4, 2025 15:25
@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch 2 times, most recently from 9bcca58 to 4fb46c0 Compare November 4, 2025 20:05
@bw-ghapp bw-ghapp bot removed the breaking-change label Nov 4, 2025
@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch 2 times, most recently from c752bf4 to bda758c Compare November 4, 2025 21:21
@addisonbeck addisonbeck changed the title from "feat: Add cross-repo breaking change detection system" to "build: implement SDK breaking change detection" Nov 4, 2025
@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch 3 times, most recently from 0afb087 to 089aee8 Compare November 4, 2025 22:45
@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch from 58444f0 to df8bb50 Compare November 4, 2025 23:38
@bw-ghapp bw-ghapp bot removed the breaking-change label Nov 4, 2025
- name: Checkout repo
  uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  with:
    persist-credentials: false
Contributor Author

This was recommended as a warning by the workflow linter so I did it.

- name: Retrieve github PAT secrets
  id: retrieve-secret-pat
- name: Get Azure Key Vault secrets
  id: get-kv-secrets
Contributor Author

BRE let me know that use of the devops bot is deprecated, so I went ahead and refactored away from it here.

Member

For what it's worth, there's no sensible alternative for commit signing yet (i.e., for workflows that retrieve the devops bot GPG key to set up signing); signing with GHApps is a nightmare. I went through the workflow and that's not the case here, so great call on refactoring it! 🏅

@addisonbeck addisonbeck marked this pull request as ready for review November 5, 2025 00:46
@addisonbeck addisonbeck requested a review from a team as a code owner November 5, 2025 00:46
@claude

claude bot commented Nov 5, 2025

Claude finished @addisonbeck's task —— View job


PR Review: SDK Breaking Change Detection

Summary of Changes Since Last Review

Two commits since the initial implementation addressed review feedback from @vvolkgang:

Commit 7551f37 - Added explicit GitHub App token permissions (.github/workflows/build-wasm-internal.yml:137)
Commit d98d4d8 - Converted errors to warnings for non-critical failures (.github/workflows/detect-breaking-changes.yml)

Critical Issues

1. Race Condition in Workflow Discovery (Medium Severity)

Location: .github/workflows/detect-breaking-changes.yml:250-259

The workflow uses displayTitle matching with the SDK version string to identify the triggered workflow run. This creates a race condition vulnerability:

JQ_FILTER='.[] | select(.status as $s | (["requested", "queued", "in_progress", "waiting"] | contains([$s])) and (.displayTitle | contains("'"$SDK_VERSION"'")) and (.name | startswith("SDK breaking change check"))) | .databaseId'

Problem: If multiple PRs with similar branch names trigger simultaneously (e.g., ci-warn-on-breaking-changes-v1 and ci-warn-on-breaking-changes-v2), the contains() check could match the wrong workflow run.

Impact: Breaking change detection results could be reported to the wrong PR, causing confusion and potentially allowing breaking changes to slip through.

Recommendation: Use a more robust identification mechanism:

  • Pass a unique correlation ID through the payload
  • Have the client workflow echo this ID in an early step
  • Search for this ID in workflow logs or job summaries
  • Alternatively, use timestamps with tighter bounds (dispatch time + 60s window)
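
The correlation ID idea above could be sketched roughly as follows. This is a minimal illustration, not code from the PR: it assumes the client workflow echoes the ID in its run title between square brackets (for example through `run-name`), and the helper names and the `sdk-bc-` prefix are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build an ID that is unique to one dispatch attempt.
# In a workflow the arguments could come from the PR number,
# github.run_id, and github.run_attempt.
make_correlation_id() {
  local pr_number="$1" run_id="$2" run_attempt="$3"
  printf 'sdk-bc-%s-%s-%s' "$pr_number" "$run_id" "$run_attempt"
}

# Succeeds only when the run title carries exactly this ID. The brackets act
# as delimiters, so an ID that is a prefix of another ID does not match.
title_has_correlation_id() {
  local title="$1" id="$2"
  case "$title" in
    *"[$id]"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Unlike the `contains()` check on the branch name, two PRs with similar branch names would produce different IDs, so a lookup based on this match could not pick up the other PR's run.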

2. Potential Token Expiration During Long Waits (Low-Medium Severity)

Location: .github/workflows/detect-breaking-changes.yml:196, 279

The workflow has a 15-minute timeout but uses the GitHub App token for operations that span this entire duration:

timeout-minutes: 15

GitHub App installation access tokens expire after 1 hour. The 15-minute timeout fits within that window, but there's no token refresh mechanism if the workflow is queued or if multiple sequential operations extend the run.

Recommendation: Consider adding token refresh logic for operations after the initial dispatch, or document the maximum expected duration.

3. Silent Failure Mode for Comment Updates (Low Severity)

Location: .github/workflows/detect-breaking-changes.yml:384-387

Comment update failures are silently caught and logged as warnings:

gh api --method PATCH repos/${{ github.repository }}/issues/comments/$COMMENT_ID \
  --field body="$FINAL_COMMENT" || {
  echo "::warning::Failed to update comment, but continuing"
}

Problem: If the final comment update fails, the PR will have stale status information showing "in progress" indefinitely, which is misleading.

Recommendation: At minimum, add a fallback that attempts to create a new comment if updating fails, or set the job status to failure if this critical update fails.
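
A minimal sketch of the fallback described above, under the assumption that the update and create calls are wrapped in two small shell functions. The function name and the wrapper arguments are hypothetical; in the workflow the wrappers would hold the existing `gh api --method PATCH` call and something like `gh pr comment`.

```shell
#!/usr/bin/env bash

# Try to update the existing status comment; if that fails, post a new one.
# Both arguments are names of commands or shell functions (hypothetical
# wrappers around the real API calls), which keeps the fallback logic testable.
update_or_create_comment() {
  local update_cmd="$1" create_cmd="$2"
  if "$update_cmd"; then
    echo "updated"
  elif "$create_cmd"; then
    echo "::warning::Failed to update comment, created a new one instead" >&2
    echo "created"
  else
    echo "::error::Could not update or create the status comment" >&2
    return 1
  fi
}
```

Returning non-zero only when both attempts fail keeps the step green for the recoverable case while still surfacing a PR that would otherwise stay "in progress".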

Suggested Improvements

4. Hardcoded Values Reduce Reusability

Location: .github/workflows/detect-breaking-changes.yml:103, 100

Several values are hardcoded that should be parameterized for true modularity:

--arg artifact_name "sdk-internal" \
--arg pr_base_ref "main" \

Issue: The PR description states "This was designed to be modular: hopefully we can just 'slot in' mobile," but these hardcoded values would need manual updates for each new client type.

Recommendation: Make these workflow inputs:

  • artifact_name (currently hardcoded to "sdk-internal")
  • pr_base_ref (currently hardcoded to "main")

5. No Validation of Client Workflow Success Criteria

Location: .github/workflows/detect-breaking-changes.yml:279-288

The workflow uses gh run watch --exit-status, which exits with a non-zero status when the run fails, but there's no verification that the workflow actually performed the breaking change check:

if ! gh run watch $WORKFLOW_RUN_ID --repo $_CLIENT_REPO --compact --exit-status --interval 30; then

Issue: If the client workflow fails for unrelated reasons (infrastructure issues, timeout, etc.), it's reported as "breaking changes detected."

Recommendation: Consider having the client workflow set a specific output or create a specific artifact that indicates whether breaking changes were actually checked vs. other failure modes.
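
One way to picture the distinction, as a sketch only: it assumes the client workflow exposes a marker saying whether the typecheck step actually ran (the `typecheck_ran` argument is hypothetical, as are all status strings except `no-breaking-changes`).

```shell
#!/usr/bin/env bash

# Map the client run's conclusion plus a "did the typecheck run" marker to a
# status string. Without the marker, a failed run says nothing about the SDK.
classify_client_result() {
  local conclusion="$1" typecheck_ran="$2"
  if [ "$typecheck_ran" != "true" ]; then
    # The check never ran, so do not blame the SDK change.
    echo "infrastructure-failure"
    return 0
  fi
  case "$conclusion" in
    success) echo "no-breaking-changes" ;;
    failure) echo "breaking-changes" ;;
    *) echo "inconclusive" ;;
  esac
}
```

With this split, only the `breaking-changes` result would need to add the breaking-change label; the other outcomes could be reported as warnings.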

6. Inefficient Retry Logic

Location: .github/workflows/detect-breaking-changes.yml:208-233

The dispatch retry logic retries even for errors that won't be resolved by retrying (e.g., authentication failures, malformed payload):

while [ $RETRY_COUNT -lt $_MAX_RETRIES ]; do
  # Retries all failures indiscriminately
done

Recommendation: Check the HTTP status code and only retry on 5xx errors or network failures. Exit early on 4xx errors (client errors).
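
The retry rule above could be expressed as a small predicate. This is a sketch under the assumption that the retry loop has the HTTP status of the failed call available as a string, with an empty string meaning no response arrived; how the status is captured is left out, and `should_retry_status` is a hypothetical name.

```shell
#!/usr/bin/env bash

# Decide whether a failed request is worth retrying, based on its HTTP status.
# Returns 0 (retry) for 5xx and for an empty status (no response received);
# returns 1 (give up) for everything else, including all 4xx client errors.
should_retry_status() {
  local status="$1"
  case "$status" in
    5[0-9][0-9] | "") return 0 ;;
    *) return 1 ;;
  esac
}
```

Inside the existing `while` loop, a check such as `should_retry_status "$status" || break` would stop retrying as soon as the failure is one that another attempt cannot fix.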

7. Missing Timeout Handling in Initial Comment Creation

Location: .github/workflows/detect-breaking-changes.yml:149-192

The initial comment creation step has no timeout or retry logic, but if this fails, the entire workflow continues with an empty COMMENT_ID.

Recommendation: Add retry logic similar to the dispatch step, or fail the workflow if comment creation fails (since it's critical for user visibility).
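
The "fail if it cannot be created" option could be as small as a guard run right after the comment is created. A minimal sketch, assuming the comment ID is numeric; `require_comment_id` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Fail fast when the initial comment was not created. An empty or
# non-numeric ID means later update calls would target a bogus URL.
require_comment_id() {
  local id="$1"
  case "$id" in
    '' | *[!0-9]*)
      echo "::error::Initial comment was not created; COMMENT_ID is empty or invalid" >&2
      return 1
      ;;
  esac
  return 0
}
```

Calling `require_comment_id "$COMMENT_ID" || exit 1` before the dispatch step would stop the job early instead of letting it continue with an empty ID.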

8. Incomplete Error Context in Workflow Summary

Location: .github/workflows/detect-breaking-changes.yml:236-241

When dispatch fails, the error message lacks actionable information:

echo "❌ **$_CLIENT_REPO**: Failed to trigger - [Manual Check Required](https://github.com/$_CLIENT_REPO)" >> $GITHUB_STEP_SUMMARY

Recommendation: Include the actual error message from the failed gh api call to help with debugging.

9. Label Management Race Condition

Location: .github/workflows/detect-breaking-changes.yml:299-326

The label management step runs with if: always() but doesn't verify the comment was updated or that results are valid:

- name: Manage breaking change labels
  if: always()

Issue: If the workflow is cancelled mid-execution, labels could be in an inconsistent state.

Recommendation: Change condition to if: always() && steps.trigger-dispatch-and-watch.conclusion != 'cancelled'

Security Considerations

10. GitHub App Token Scope Appropriate ✅

Location: .github/workflows/build-wasm-internal.yml:132-137

Good: The token scope was properly restricted per @vvolkgang's feedback:

permission-actions: write

11. Secrets Handled Properly ✅

No secrets are logged or exposed in error messages. Good practice maintained throughout.

12. Cross-Repository Trust Model

Location: .github/workflows/detect-breaking-changes.yml:194-298

The workflow trusts the exit code of the client repository's workflow. This is appropriate for Bitwarden's internal repos but worth noting:

Consideration: If client repositories are ever less trusted, additional verification would be needed.

Performance Considerations

13. Polling Interval Could Be Optimized

Location: .github/workflows/detect-breaking-changes.yml:279

The gh run watch uses a 30-second interval:

gh run watch $WORKFLOW_RUN_ID --repo $_CLIENT_REPO --compact --exit-status --interval 30

Suggestion: This is reasonable, but for faster feedback, 15-20 seconds might be better balanced.

14. Unnecessary Azure Login/Logout for Token Retrieval

Location: .github/workflows/detect-breaking-changes.yml:123-128, 391-392

The workflow logs into Azure just to retrieve GitHub App credentials, then logs out at the end.

Consideration: If these secrets were GitHub environment secrets instead of Azure Key Vault, it would eliminate 2 steps and reduce dependencies. However, this may be an organizational security requirement.

Good Practices Observed

  • Comprehensive error handling with graceful degradation
  • Clear user-facing status comments with progress updates
  • Proper use of workflow outputs for data passing between steps
  • GitHub workflow commands (::warning::, ::error::) for enhanced visibility
  • Reusable workflow design with workflow_call
  • Status comment includes direct links to relevant workflow runs
  • Breaking change label automation

Action Items for Author

Must Fix

  1. Address race condition in workflow discovery - Implement unique correlation ID or tighter timestamp-based matching
  2. Handle comment update failures more robustly - Don't leave PRs in perpetual "in progress" state

Should Fix

  1. Make artifact_name and pr_base_ref parameterizable for true modularity
  2. Distinguish between breaking changes vs. infrastructure failures in status reporting
  3. Add retry logic to initial comment creation or fail if it cannot be created

Nice to Have

  1. Implement smart retry logic that exits early on client errors
  2. Include actual error messages in failure summaries for debugging
  3. Refine label management cancellation handling
  4. Consider optimizing polling interval to 15-20s for faster feedback

Documentation

  1. Document expected token lifetime vs. workflow duration
  2. Document the cross-repository trust model and assumptions

Changes Since Last Review

Fixed Issues:

  • ✅ GitHub App token now has explicit permissions (addressed vvolkgang's feedback)
  • ✅ Non-critical failures converted to warnings (addressed vvolkgang's feedback)

New Issues: None introduced by the recent commits.

Status of Previous Review Comments:

  • The author has successfully addressed the feedback from @vvolkgang's review
  • The modular design question around repository_dispatch vs. workflow_dispatch is still valid but acknowledged as an architectural choice

@addisonbeck
Contributor Author

addisonbeck commented Nov 5, 2025

I'm not particularly interested in any of Claude's feedback but @dereknance let me know if you are

EDIT: I wasn't trying to get Claude to not give any feedback 😅 I was just saying that what was there at the time wasn't particularly valid.

@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch from df8bb50 to 1d675df Compare November 5, 2025 01:06
- Add dedicated detect-breaking-changes.yml workflow with matrix strategy
- Implement cross-repository coordination using repository_dispatch
- Add GitHub App authentication with Azure Key Vault integration
- Create synchronous monitoring with gh run watch --exit-status
- Add comprehensive PR comment system with status tracking
- Include automatic breaking-change label management
- Support workflow_call integration with build-wasm-internal.yml

Provides immediate feedback on TypeScript breaking changes when SDK PRs
are created, catching issues before client integration attempts.

Resolves: PM-22218
@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch from 1d675df to a49b9d1 Compare November 5, 2025 01:11

RETRY_COUNT=0
DISPATCH_SUCCESS=false
while [ $RETRY_COUNT -lt $_MAX_RETRIES ]; do
Member

💭 Unless I'm missing something you can skip retries for gh cli calls in general, when they fail we have bigger problems :feelsgood:

We only need it in the "fetch workflow ID" part because (1) the dispatch call doesn't return the ID and (2) the lead time to run start varies a lot: between triggering a workflow and it actually running, I've seen anywhere from 1 to 30s.

🤔 🎨 Saw the npm i usecase, we could turn this into a reusable action.

Contributor Author

I left this off for now, but if you think now is the time to make that action let me know.

Member

No need!

"no-breaking-changes")
echo "Removing breaking-change label from PR #$PR_NUMBER"
gh issue edit $PR_NUMBER --remove-label "breaking-change" --repo ${{ github.repository }} || {
echo "⚠️ Label may not exist or failed to remove, but continuing"
Member

🎨 You can increase visibility of these by showing them in the Run Summary with workflow commands echo "::warning::Label may not exist or failed to remove, but continuing".

I tend to only use them for warnings and errors to keep the summary light and focused on what needs to be actioned.

Contributor Author

Swapped to this for this one and a few other warnings in d98d4d8.

client_payload: $client_payload
}')
if echo "$DISPATCH_PAYLOAD" | gh api repos/$_CLIENT_REPO/dispatches \
Member

🤔 The only relevant difference I noticed from my PoC (and from all of my workflows in general): I only use workflow_dispatch.

https://github.com/bitwarden/sdk-internal/pull/407/files#diff-990269d44b14b14a290b6d577fde890babb7b7f98e88f9e33753db2d5fca4570R196

Contributor Author

repository_dispatch is more event driven and so I suppose that is where my intuition took me. Both are valid and could be abstracted well enough to serve n hooks for repos we want to detect breaking changes in.

Member

@vvolkgang vvolkgang Nov 6, 2025

Fair. I had to brush up on the shortcomings of repository_dispatch. Supporting both triggers works for the mobile side, but I would still suggest using workflow_dispatch for yours too:

  1. repository_dispatch allowing multiple workflows handling the same event and the same workflow handling multiple events adds to the list of things someone else has to validate when stepping in to troubleshoot issues
  2. repository_dispatch only triggers workflows in the default branch, it's harder to test new changes. workflow_dispatch solved it.
  3. workflow_dispatch is the same trigger used to manually trigger workflows, helps with testing and implementing new changes - you'll find a "Run Workflow" button in Github (e.g.), the vscode GitHub Actions extension leverages this too allowing you to quickly trigger runs in the current branch, pretty handy.

It's easy to transition: add all of these as workflow_dispatch inputs (e.g.), then update this workflow to trigger a gh workflow run and send your JSON payload for the inputs (example). The GHApp step will need permission-actions: write instead of contents.
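
For illustration, the dispatch side of that transition might look roughly like this. It is a sketch, not the PR's code: the function name and the input names `sdk_version` and `artifacts_run_id` are hypothetical, and it assumes both values are plain tokens without quotes or backslashes.

```shell
#!/usr/bin/env bash

# Trigger a client workflow through workflow_dispatch. `gh workflow run --json`
# reads the inputs as a JSON object from stdin.
# Assumption: both input values are plain tokens (for example a commit SHA and
# a numeric run ID) with no quotes or backslashes; build the JSON with jq
# otherwise, as the existing dispatch payload does.
dispatch_client_check() {
  local repo="$1" workflow="$2" ref="$3" sdk_version="$4" artifacts_run_id="$5"
  printf '{"sdk_version":"%s","artifacts_run_id":"%s"}' \
    "$sdk_version" "$artifacts_run_id" |
    gh workflow run "$workflow" --repo "$repo" --ref "$ref" --json
}
```

Because `--ref` selects the branch the workflow file is read from, the same call can exercise a client-side change on a feature branch, which is the testing advantage described in point 2.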

@addisonbeck addisonbeck force-pushed the ci-warn-on-breaking-changes branch from 0aa626c to d98d4d8 Compare November 5, 2025 20:40
@addisonbeck addisonbeck requested a review from vvolkgang November 5, 2025 20:53
Comment on lines +40 to +42
contents: read
actions: write
pull-requests: write
Member

Something that didn't click in the first review: these permissions are not being used. You can either remove them or (my suggestion) transition the steps that update the PR in this repo to use ${{ secrets.GITHUB_TOKEN }} and remove the sdk-internal repository from the GHApp step.

echo "✅ Final comment updated"
- name: Log out from Azure
Member

Missed this previously: you can log out right after the get-kv-secrets step.
