Skip to content

[PM-35062] llm: Add converting-xctest-to-swift-testing skill#2539

Merged
KatherineInCode merged 13 commits into
mainfrom
pm-35062/swift-testing-playbook
Apr 23, 2026
Merged

[PM-35062] llm: Add converting-xctest-to-swift-testing skill#2539
KatherineInCode merged 13 commits into
mainfrom
pm-35062/swift-testing-playbook

Conversation

@KatherineInCode
Copy link
Copy Markdown
Contributor

@KatherineInCode KatherineInCode commented Apr 13, 2026

🎟️ Tracking

PM-35062

📔 Objective

Adds a project-level Claude skill for converting XCTest files to Swift Testing. The skill covers Bitwarden-specific patterns (processor, coordinator, and service tests), includes a do-not-convert checklist (ViewInspector, snapshot, UI automation tests), and bundles a copy of steipete's Swift Testing playbook and full Apple API documentation, with modifications, as references.

@KatherineInCode KatherineInCode added the ai-review-vnext Request a Claude code review using the vNext workflow label Apr 13, 2026
@github-actions github-actions Bot added the t:llm Change Type - LLM related change (e.g. CLAUDE.md files) label Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 13, 2026

🤖 Bitwarden Claude Code Review

Overall Assessment: APPROVE

Reviewed the addition of a project-level Claude Code skill for converting XCTest files to Swift Testing. The change consists of a new 508-line SKILL.md and a single-line registration entry in .claude/CLAUDE.md; there is no production, test, or dependency code involved. Technical guidance (BitwardenTestCase UI.animated coupling, assertAsyncThrows handling, struct-per-test isolation) was spot-checked against the codebase and is accurate. Prior review feedback across the PR's 12+ iterative commits has been incorporated, including the assertion-table fixes, the Equatable requirement for typed error values, the test_ prefix cleanup, and the exclusion of alert/loading overlay tests from conversion.

Code Review Details

No actionable findings. The skill documentation is internally consistent, matches the codebase's real test-helper patterns, and appropriately scopes which test files should remain in XCTest (ViewInspector, snapshot, UI automation, performance, Objective-C, alert-dependent, and loading-overlay-dependent).

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 13, 2026

Logo
Checkmarx One – Scan Summary & Detailsae8483f3-b5fe-44f4-b163-6499b2a46443

Great job! No new security vulnerabilities introduced in this pull request

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.08%. Comparing base (b479386) to head (f104d61).
⚠️ Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2539      +/-   ##
==========================================
- Coverage   87.08%   86.08%   -1.01%     
==========================================
  Files        1872     2116     +244     
  Lines      165835   181513   +15678     
==========================================
+ Hits       144417   156253   +11836     
- Misses      21418    25260    +3842     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@KatherineInCode KatherineInCode marked this pull request as ready for review April 13, 2026 18:38
@KatherineInCode KatherineInCode requested review from a team as code owners April 13, 2026 18:38

- [`Copyable`](https://developer.apple.com/documentation/Swift/Copyable?changes=_8)
- [`CustomDebugStringConvertible`](https://developer.apple.com/documentation/Swift/CustomDebugStringConvertible?changes=_8)
- [`CustomStringConvertible`](https://d
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. LLM wise we need to figure something else that isn't 14k lines of unprocessed crawled docs.
  2. We need to be careful with these, happened to be a referral link but could as easily be a prompt injection attempt.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm not sure what a good solution would be when it's information the model hasn't been trained on already, but having the information contained in the docs is useful (and per the blog post, it can't just query Apple's online docs).
  2. Agreed, and part of the reason to bring this into the repo rather than referring to the original gist was to avoid potential future prompt injection issues.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oi vey... We're between a rock and a hard place here, huh?
Apple won't let their docs be crawled but we need Claude to have the most reliable and up-to-date patterns.

Couple questions:

  1. Was this the raw file we tried to add to our repo? file-swift-testing_api-md
  2. Would you be open to connecting with me this sprint to see how/if I could break this down into byte-sized skills/references for Claude Code to consume? I don't have Swift talents (yet), but I'm willing to help y'all out wherever I can for you to succeed with Claude. Lemme know

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes. As noted, my original idea was to actually just have this skill point at steipete's playbook in general, which would also get that raw file, but that would potentially introduce a supply chain attack, so it was suggested I bring that into our repo.
  2. Sure

That said, I have to wonder how much of this is getting updated into each successive Claude model. Steinberger wrote that guide back when Swift testing was very new, so you could assume the model hadn't been trained on the API and such yet. It wouldn't surprise me if Anthropic has found a way to crawl Apple's docs (after all, Steinberger wrote a tool for it) in terms of sourcing training data. Still runs into the problem of "we can't point at new docs easily", though.

| Pattern | Why |
|---------|-----|
| **ViewInspector tests** | `imports ViewInspector`; requires XCTest infrastructure |
| **Snapshot tests** | Functions named `disabletest_*`, uses `assertSnapshot` or `SnapshotTesting`; depends on `BitwardenTestCase` |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n🧊


### Migration Action Items
- [ ] Ensure all developer machines and CI runners are on macOS 15+ and Xcode 16+.
- [ ] For projects supporting Linux/Windows, add the `swift-testing` SPM package to your `Package.swift`. It's bundled in Xcode and not needed for Apple platforms.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Your skill is doing a much better job addressing our needs, feels like the value add of this playbook could be consolidated and merged into it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Part of the reason I like having the playbook more or less wholesale is that when he updates it with new information from e.g. WWDC, we can more easily update our copy, rather than having to look over the diff

Copy link
Copy Markdown
Contributor

@theMickster theMickster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reference file of over 14K lines of code is impossible to peer review. I also doubt that Claude will properly consume that massive file either.

Please break the work into smaller PRs and smaller reference files. Thank you much!

theMickster
theMickster previously approved these changes Apr 22, 2026
Copy link
Copy Markdown
Contributor

@theMickster theMickster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving.

Please note that a local Claude Code review produced the following suggestions. I'm adding it because I am field testing a potential addition to our plugin marketplace and am interested in feedback as to whether or not the output is correct. I'm not super concerned about whether or not Claude chose the correct level/emoji at the moment, more focused on content correctness.

Please let me know what you think of the feedback.

Thanks!

Local multi-agent review [PM-35062] llm: Add converting-xctest-to-swift-testing skill (#2539)

Date: 2026-04-22 | Reviewed by: Claude Code (Opus 4.7)

Summary

Severity Count
🛑 Blocker 0
⚠️ Important 0
♻️ Refactor 0
💡 Suggestion 4

Documentation-only PR adding a project-level Claude Code skill (506-line markdown) plus a 1-line registration in .claude/CLAUDE.md. No production code changes, no security impact. The skill will be executed mechanically by an LLM, so the findings target rules or examples that could mislead a mechanical pass. Finding #1 is the one worth prioritizing.

Findings

💡 Suggestions

BitwardenTestCasestruct rule drops base-class setUp defaults without flagging them

.claude/skills/converting-xctest-to-swift-testing/SKILL.md:14-20

Details

BitwardenTestCase.setUp() sets UI.animated = false and UI.sizeCategory = .large; class-level setUp() runs UI.applyDefaultAppearances(). The exclusion table covers snapshot/ViewInspector/UI/perf/ObjC tests, and the Class → Struct rule (file lines 51-53) then unconditionally drops BitwardenTestCase inheritance.

Current processor/coordinator/service tests rarely exercise the code paths that read UI.animated / UI.sizeCategory, so the practical blast radius today is small. The concern is that the skill is executed mechanically by an LLM — any future test that does reference those globals would silently lose the override with no guard against it. Add a one-line precondition to Step 2 or the exclusion table: "If the file reads UI.animated / UI.sizeCategory directly, verify the defaults are still satisfied before dropping BitwardenTestCase inheritance." Highest-priority of the suggestions because it's the only one that can cause a silent behavioral change, not just doc drift.

"Specific Error value" match requires Equatable conformance

.claude/skills/converting-xctest-to-swift-testing/SKILL.md:247

Details

#expect(throws: BitwardenTestError.example) (error-value form) requires the error type to conform to Equatable. A new error enum with associated values won't auto-conform and will fail to compile when a converter follows the row literally. Add a one-line note naming the precondition so readers don't chase a confusing compile error.

Parameterized examples retain test prefix, contradicting Step 5's own rename rule

.claude/skills/converting-xctest-to-swift-testing/SKILL.md:187,292

Details

Step 5 tells readers to remove the test_ prefix when adding @Test. The examples func testUserCount_minimum() (line 187) and func testFlavorContainsNuts(...) (line 292) keep the prefix. An LLM consuming this skill will pattern-match on the examples as much as on the rule; the inconsistency produces drift. Rename to userCount_minimum / flavorContainsNuts, or annotate them as illustrative side-by-side comparisons.

assertAsyncThrows row is missing the third "Notes" column cell

.claude/skills/converting-xctest-to-swift-testing/SKILL.md:250

Details

The assertion-conversion table is three columns (XCTest | Swift Testing | Notes), but the assertAsyncThrows row only supplies two cells, so the row renders malformed. Add an empty | | cell or a brief note.

Comment thread .claude/CLAUDE.md
| `planning-ios-implementation` | "plan implementation", "design approach", "architecture plan" |
| `implementing-ios-code` | "implement", "write code", "add screen", "create feature" |
| `testing-ios-code` | "write tests", "add test coverage", "unit test" |
| `converting-xctest-to-swift-testing` | "convert to Swift Testing", "migrate XCTest", "xctest to swift testing" |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ In the skill frontmatter, we have different casing in the wording. I doubt it matters, but something I saw.
In the long run, I don't think that this table will be necessary in CLAUDE.md because this feels like human documentation that's a burden to maintain. I also have my doubts as to whether or not this helps Claude whatsoever in its progressive disclosure.

@SaintPatrck Are we doing this type of documenting in other places mapping skills to triggers?

Skill frontmatter

description: Convert an XCTest file to Swift Testing framework. Use when asked to "convert to Swift Testing", "migrate XCTest", "convert test file", "xctest to swift testing", "migrate tests to Swift Testing", or when explicitly asked to convert existing XCTest-based tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Swift Testing" and "XCTest" are proper nouns, at least, so the capitalization nominally makes sense 🤔

Are you thinking just the front matter is sufficient for progressive disclosure?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda thought that the frontmatter is sufficient but wanted Patrick to chime in too.

Copy link
Copy Markdown
Contributor

@SaintPatrck SaintPatrck Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theMickster the table is an additional "nudge" to encourage usage of the repo-local skills in appropriate situations. It is present in the Android repo also. After doing evals I saw a significant increase in hit-rates when the repo-local skills are present in CLAUDE.md. The improvement was enough to warrant the small maintenance burden, imo. The most we would realistically be doing for maintenance is adding/removing skills.

The table's value manifests even more when agents are executed in a parent directory and navigate down into this project directory (e.g., run /plan-implement-review from a "workspace" type of environment). Not only do they immediately get all of the project specific guidance from CLAUDE.md, they also get a nice "cheatsheet" of skills that can aid in their task(s).

🌱 That being said, my tests were done several weeks ago and the frontier models have gotten more reliable in regards to skill discovery, so it would be worth re-running those evals with the latest models. The new /skill-creator should make that relatively easy. Out of scope for this PR, obviously, but something we can easily follow-up on.

💭 Also, when_to_use was recently introduced, which may be a better place to improve hit-rates in place of this table.

TLDR; Keep. Re-assess value after adding when_to_use to skill frontmatter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaintPatrck
Makes sense. Sure, but until then, will you also please add these block to the clients and server repos to match? That way we are consistent across the board?

@KatherineInCode
Copy link
Copy Markdown
Contributor Author

@theMickster The things that review skill generated up were reasonable to fix, so I've done that to clean it up a bit. Something like that skill would be really useful to run locally; is that more generally available, or are you still working on it?

@KatherineInCode KatherineInCode merged commit e790288 into main Apr 23, 2026
36 of 37 checks passed
@KatherineInCode KatherineInCode deleted the pm-35062/swift-testing-playbook branch April 23, 2026 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-review-vnext Request a Claude code review using the vNext workflow t:llm Change Type - LLM related change (e.g. CLAUDE.md files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants