Questions on Mobile ATs Screen Reader Capture approach #1367


Open
howard-e opened this issue Apr 9, 2025 · 2 comments


howard-e commented Apr 9, 2025

Background

We need to collect accurate screen reader utterances from mobile devices to help testers manually collect test results. This applies to:

  1. TalkBack on Android
  2. VoiceOver on iOS

There are two primary approaches under consideration:

1. On-Device Collection

Collect utterances directly from the device through available system APIs, integrations, hooks, etc.

Pros

  • Direct and real-time collection.
  • Output should be easier to parse and more reliable.
  • Better suits the needs of future efforts to automate this collection; having direct access to on-device accessibility-related APIs and logs would help keep results consistent.

Cons

  • Feasibility in iOS is currently unknown (due to Apple's sandboxing of VoiceOver on iOS).
  • Susceptible to breakage from software updates.
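
For TalkBack, one way on-device collection could work from a connected computer is to filter the device's log stream over `adb`. The following is a minimal sketch under loud assumptions: it assumes TalkBack's developer log output is enabled and invents a `TalkBackDev ... speak:` line format for illustration (real tag names vary by TalkBack version, and this is not necessarily how bocoup/aria-at-talkback-capture works):

```python
import re
import subprocess

# Hypothetical log tag and message shape; actual TalkBack builds log
# under different tags, and verbose speech logging must be enabled in
# TalkBack's developer settings.
UTTERANCE_RE = re.compile(r"TalkBackDev.*?speak:\s*(?P<text>.+)$")

def extract_utterances(log_lines):
    """Pull spoken text out of raw logcat lines (sketch)."""
    utterances = []
    for line in log_lines:
        match = UTTERANCE_RE.search(line)
        if match:
            utterances.append(match.group("text").strip())
    return utterances

def capture_from_device():
    """Stream logcat from a connected Android device via adb and print
    each utterance as it is spoken (requires adb on PATH)."""
    proc = subprocess.Popen(
        ["adb", "logcat", "-v", "brief"],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        for utterance in extract_utterances([line]):
            print(utterance)
```

Keeping the parsing separate from the `adb` plumbing also means the same filter could later run inside an automation harness that already has its own log transport.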

2. Video Processing of Screen Recording

Better described in #1315 (comment)

Processing videos where the Assistive Technology's utterances are displayed on screen to be extracted via OCR and collected. These recordings should ideally be at the "system" level (device mic not included) to avoid any outside noise.

Pros

  • Platform-agnostic
  • No dependency on internal APIs

Cons

  • Potentially resource intensive (background worker(s) processing multiple videos at once).
  • In the long term, we'd like to re-use our solution in the automation system, and this approach may not be well suited: there are unknowns around screen recording in automated environments, and no access to on-device debugging logs in the case of unexpected failures, among other concerns.
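
The video-processing route could be sketched as: sample frames from the recording, OCR the region where the screen reader's on-screen caption appears, and collapse duplicate reads across consecutive frames. A rough illustration, assuming the third-party opencv-python and pytesseract packages; the crop region and frame step are guesses, not measured values:

```python
import re

def dedupe_captions(raw_captions):
    """Collapse repeated OCR reads of the same on-screen caption.

    Consecutive frames usually show the same caption, so raw OCR output
    contains long runs of duplicates plus empty/noise-only frames.
    """
    utterances = []
    for text in raw_captions:
        cleaned = re.sub(r"\s+", " ", text).strip()
        if cleaned and (not utterances or utterances[-1] != cleaned):
            utterances.append(cleaned)
    return utterances

def ocr_recording(video_path, frame_step=15):
    """Sample frames from a screen recording and OCR the caption area.

    Requires opencv-python and pytesseract; imported lazily so the pure
    helper above stays usable without them.
    """
    import cv2
    import pytesseract

    capture = cv2.VideoCapture(video_path)
    captions = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_step == 0:
            # Crop the bottom third of the frame, where a speech-output
            # overlay is assumed to appear (placement varies by AT).
            height = frame.shape[0]
            region = frame[int(height * 2 / 3):, :]
            captions.append(pytesseract.image_to_string(region))
        index += 1
    capture.release()
    return dedupe_captions(captions)
```

This is where the "resource intensive" con shows up: each worker decodes video and runs OCR per sampled frame, so throughput depends heavily on `frame_step` and recording length.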

Current Status

TalkBack on Android, On-Device

Work started here, with success, through scripts hosted at bocoup/aria-at-talkback-capture. The scripts are able to capture utterances from Android when a test's "Run Test Setup" button is activated.

VoiceOver on iOS, On-Device

A similar effort has not yet been started but is planned. It will most likely need to circumvent the OS's "protections" and is likely to require discussions with, and approval directly from, Apple.

Video Processing of Screen Recording

Planned. Not yet started.

Questions

  1. Should we make an effort to build a user interface so others can more easily evaluate the aforementioned TalkBack on Android solution?
  2. Should work on the on-device approach be halted until the VoiceOver on iOS prototype is in a place to better facilitate discussions with Apple?
  3. With short-term collection for manual testing being the focus, the video processing route seems to be the fastest. Should any work be started on this as a "just-in-case"?
  4. If 3 is done, would a hybrid approach of the TalkBack collection + video processing for iOS be appropriate in the short term, or should we fully commit to the video processing route until discussions with Apple are resolved?
@ccanash ccanash added the agenda To be added to community group agenda label Apr 9, 2025

mcking65 commented Apr 17, 2025

@howard-e

Could you add a comment with a high-level outline of what the user flow would be like for the on-device android prototype?

Would users be able to run individual tests on an android device and then go to the test runner on a laptop and fill in assertion verdicts? Could they have the runner open on both at the same time and see AT responses get populated after running each command?

Would the responses be fed to the test runner via browser? Or, would we connect the android device to a laptop and collect the responses via a laptop that is running the test runner?


howard-e commented Apr 17, 2025

Sharing follow-up thoughts on a workflow here:

  • User opens test page on computer
  • User ensures Android device is connected to computer
  • User can click "Open Test Page" or a new button saying "Open Test Page on connected Android" (since this would be an Android-only test, the language could be more direct).
    • Some appropriate error displayed if device not found.
    • If TalkBack isn't already enabled, it can be automatically done at this point.
  • User activates the "Run Test Setup" button on their Android device.
  • User is notified that the TalkBack utterances are being captured up until they get to the end of the example or close the window on the device.
  • Relevant utterances are captured in the background as the user moves through the test on their device. The user completes their testing (reaching the end of the example or closing the test window on the device).
  • Utterances are available on the connected computer's clipboard, which they can then paste into the relevant test page's output textbox.
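
A hypothetical sketch of that last step, joining the captured utterances into the single string a tester would paste into the output textbox and placing it on the connected computer's clipboard (the clipboard commands tried below are platform guesses, not a decided design):

```python
import shutil
import subprocess

def format_for_output_box(utterances):
    """Join captured utterances into one paste-ready string,
    dropping empty captures and surrounding whitespace."""
    return "\n".join(u.strip() for u in utterances if u.strip())

def copy_to_clipboard(text):
    """Place text on the tester's clipboard, trying common
    platform clipboard commands in turn (macOS, Linux, Windows)."""
    for cmd in (["pbcopy"], ["xclip", "-selection", "clipboard"], ["clip"]):
        if shutil.which(cmd[0]):
            subprocess.run(cmd, input=text.encode(), check=True)
            return True
    return False
```

Pushing the result through the clipboard keeps the prototype decoupled from the test runner; feeding utterances to the runner directly (per the questions above) would replace `copy_to_clipboard` with a browser or USB transport.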

@mcking65 @jscholes does this align with our earlier discussion?

cc @ccanash
