The initial implementation works well for text that is all on one page (Chrome, or apps with webviews like Gospel Library), broken down by paragraph into small AccessibilityNodeInfos that can be accessed from rootInActiveWindow. (#6 is about communicating this to users)
There are many use cases I would like to support. Here are some thoughts:
Use cases
(See sample code for an example of how to print a11y tree)
Scrolling apps where text is in one giant node or not exposed in a11y tree (Google Docs, Drive, Gmail)
If we can't access the a11y tree, we might use OCR and swipe gestures to scroll. One difficulty is knowing whether to continue scrolling when images fill the whole screen, and how to recover if we miss a scroll and the audio goes beyond the current screen.
Create a generic "ctrl + f" macro functionality to find text in the current app?
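One possible heuristic for the "keep scrolling past images?" question above, sketched as pure logic (the function name and OCR-lines representation are hypothetical; a real implementation would compare OCR output from consecutive screens):

```kotlin
// Hypothetical sketch: decide whether to keep auto-scrolling based on OCR
// results from two consecutive screens. If the new screen has text, keep
// going only if the screen actually changed (identical OCR output suggests
// we reached the end of the content). If there is no text at all, tolerate
// a few image-only screens before giving up.
fun shouldContinueScrolling(
    previousLines: List<String>,
    currentLines: List<String>,
    blankScreensSoFar: Int,
    maxBlankScreens: Int = 3,
): Boolean {
    if (currentLines.isNotEmpty()) return currentLines != previousLines
    return blankScreensSoFar < maxBlankScreens
}
```

The cap on blank screens is one answer to the full-screen-image problem: we scroll cautiously a bounded number of times rather than forever.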
PDFs
Could be very useful. OCR and swipe gestures. In addition to images, another complication is knowing which way to swipe (e.g. how to handle multiple columns on the same page).
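The multi-column problem could start from something like the following sketch (OcrWord is a stand-in for whatever boxes the OCR engine returns; the clustering-by-left-edge heuristic and the column-gap threshold are assumptions):

```kotlin
// Hypothetical reading-order heuristic for multi-column PDF pages:
// cluster words into columns by their left edge, then read each column
// top-to-bottom, taking columns left-to-right.
data class OcrWord(val text: String, val left: Int, val top: Int)

fun readingOrder(words: List<OcrWord>, columnGap: Int = 100): List<String> {
    val columns = mutableListOf<MutableList<OcrWord>>()
    // Assign each word to an existing column with a similar left edge,
    // or start a new column. Sorting by left keeps columns in order.
    for (word in words.sortedBy { it.left }) {
        val column = columns.lastOrNull { kotlin.math.abs(it.first().left - word.left) < columnGap }
        if (column != null) column.add(word) else columns.add(mutableListOf(word))
    }
    return columns.flatMap { col -> col.sortedBy { it.top }.map { it.text } }
}
```

Real pages (indented paragraphs, figures spanning columns) would need something more robust, but this is enough to decide whether the next swipe should stay within a column or jump to the next one.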
Page-turning apps (Hoopla, Libby, Google Play Books)
Some may provide good a11y info, but only for the current page. We can probably turn the page quite reliably with just a tap or swipe gesture. Some may also support a NEXT_PAGE action.
How to know which apps support page turning (can an AccessibilityService query support for NEXT_PAGE?)
How to decide when to turn the page? (Once we've matched text on the page, turn page immediately to stay ahead? Wait until audio goes to next page and try to keep up? What if there are images?)
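For the "when to turn" question, one candidate policy is "turn as soon as the audio gets close to the end of the matched text on the current page", which keeps us ahead of the audio. A minimal sketch (function and parameter names are hypothetical; indices are positions in the full transcript):

```kotlin
// Hypothetical page-turn policy: turn the page once the audio's current
// word is within a small margin of the last word we matched on this page,
// so the next page's text is available before the audio reaches it.
fun shouldTurnPage(
    audioWordIndex: Int,
    lastMatchedIndexOnPage: Int,
    marginWords: Int = 5,
): Boolean = audioWordIndex >= lastMatchedIndexOnPage - marginWords
```

The margin trades off the two options listed above: a large margin behaves like "turn immediately to stay ahead", a margin of zero like "wait until the audio reaches the next page". Pages ending in images would lower the last matched index, making this policy turn early, which is probably the safer failure mode.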
Kindle (page-turn or continuous)
This may be lower priority because some Kindle books have WhisperSync with Audible to automatically sync text and audio. Live Scroll would extend support to different versions of the media and would work for virtually any title.
Page-turn mode works the same as other page-turning apps and provides a11y info for the current page.
Continuous scroll provides no text a11y info (confirm if implementing), but we could scroll with OCR and gestures.
We can tell whether we're in page-turn or continuous mode from the package name (Kindle) and the content description of KRFView (it has text in page mode and is empty in continuous mode). Confirm if implementing.
Same concerns about images and how to recover if we fall behind.
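The mode check described above (still unconfirmed) would be a small piece of logic along these lines; the package name and the KRFView behavior are the assumptions to verify before implementing:

```kotlin
// Hypothetical Kindle mode detection, based on the unconfirmed observation
// that KRFView's content description carries the page text in page-turn
// mode and is empty in continuous-scroll mode.
enum class KindleMode { PAGE_TURN, CONTINUOUS, NOT_KINDLE }

fun detectKindleMode(packageName: String, krfViewContentDescription: CharSequence?): KindleMode =
    when {
        packageName != "com.amazon.kindle" -> KindleMode.NOT_KINDLE
        krfViewContentDescription.isNullOrEmpty() -> KindleMode.CONTINUOUS
        else -> KindleMode.PAGE_TURN
    }
```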
Photos (panning instead of scrolling)
Very low priority, but it could be neat.
Requires OCR and gestures.
Solutions
OCR
How to decide where to search for text.
Everywhere except in the Live Caption box? Exclude system headers?
It may be helpful to use window-changed a11y events to know when the current screen has changed (maybe the user switched apps, so we should start or stop trying to use OCR to scroll).
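The region question above could be answered with simple rectangle arithmetic. A sketch, assuming we know the status bar height and the Live Caption box bounds (Region stands in for android.graphics.Rect; the bottom-anchored check is an assumption about where the caption box sits):

```kotlin
// Hypothetical OCR search region: start from the full screen, drop the
// system status bar at the top, and stop above the Live Caption box if it
// is anchored near the bottom of the screen.
data class Region(val left: Int, val top: Int, val right: Int, val bottom: Int)

fun ocrSearchRegion(screen: Region, statusBarHeight: Int, liveCaptionBox: Region?): Region {
    val top = screen.top + statusBarHeight
    var bottom = screen.bottom
    if (liveCaptionBox != null && liveCaptionBox.bottom >= screen.bottom - 10) {
        bottom = minOf(bottom, liveCaptionBox.top)
    }
    return Region(screen.left, top, screen.right, bottom)
}
```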
Gestures
We may prefer sending more direct commands where possible (show_on_screen, next paragraph, scroll, etc.) because they're more targeted and probably more efficient.
Cons: gestures could interfere with other apps, and if we fall behind the audio, it's hard to recover.
How to determine how to swipe/tap?
Curated package/view list? (Specific apps to allow or block)
ML model to determine given a screenshot and/or a11y tree how to interact with a given app?
Fall back to gestures after attempting to use the a11y tree.
We need to notify the user if this is happening so we're not needlessly trying to scroll when the user doesn't want scrolling.
Can a11y service determine if current screen is scrollable or supports page-turn?
How to determine where to swipe?
Start from where the word is matched and swipe to the top of the screen?
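That last idea reduces to geometry we can work out independently of the gesture APIs. A sketch (Point and swipeEndpoints are hypothetical names; coordinates are screen pixels):

```kotlin
// Hypothetical swipe-placement helper: swipe from the matched word's
// position straight up to the top of the gesture region, so the word (and
// the text after it) moves toward the top of the screen. Returns null when
// the word is already near the top and no swipe is needed.
data class Point(val x: Int, val y: Int)

fun swipeEndpoints(wordX: Int, wordY: Int, regionTop: Int, minSwipe: Int = 50): Pair<Point, Point>? {
    val distance = wordY - regionTop
    if (distance < minSwipe) return null
    return Point(wordX, wordY) to Point(wordX, regionTop)
}
```

Starting the swipe at the matched word also bounds how far we scroll per gesture, which should make falling behind the audio less likely than a fixed full-screen swipe.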
Sample code
// Logging current a11y tree (very similar to getNodesContainingWord)
private fun printAccessibilityTree(root: AccessibilityNodeInfo?, level: Int) {
    if (root == null) return
    Log.d(tag, "Node at level %s with childCount %s: %s".format(level, root.childCount, root))
    for (i in 0 until root.childCount) {
        root.getChild(i)?.let { printAccessibilityTree(it, level + 1) }
    }
}

// From AccessibilityService:
printAccessibilityTree(this.rootInActiveWindow, 0)
private GestureDescription advanceTextGestureDescription() {
  if (currGestureRegion.equals(paginatedAppGestureRegion)) {
    return tapRightSideOfScreen(); // swipeLeftGestureDescription();
  } else if (currGestureRegion.equals(scrollableAppGestureRegion)) {
    return swipeUpGestureDescription(currGestureRegion.bottom);
  }
  return null;
}
/** Swipe up (e.g. to scroll down). */
private GestureDescription swipeUpGestureDescription(int initialY) {
  Path path = new Path();
  path.moveTo(currGestureRegion.left, initialY);
  path.lineTo(currGestureRegion.left, currGestureRegion.top);
  StrokeDescription strokeDescription =
      new StrokeDescription(path, /* startTime= */ 0L, /* duration (ms)= */ 500L);
  return new GestureDescription.Builder().addStroke(strokeDescription).build();
}
/** Swipe left (e.g. to turn to next page). */
private GestureDescription swipeLeftGestureDescription() {
  Path longSlowPath = new Path();
  longSlowPath.moveTo(900, 1000);
  longSlowPath.lineTo(200, 1000);
  Path flickPath = new Path();
  flickPath.moveTo(200, 1000);
  flickPath.lineTo(100, 1000);
  StrokeDescription strokeDescription =
      new StrokeDescription(
          longSlowPath, /* startTime= */ 0L, /* duration (ms)= */ 400L, /* willContinue= */ true);
  // continueStroke returns a new StrokeDescription; to actually perform the
  // flick, it must be added to a second GestureDescription and dispatched
  // after this gesture completes.
  StrokeDescription flickStroke =
      strokeDescription.continueStroke(
          flickPath, /* startTime= */ 0L, /* duration (ms)= */ 100L, /* willContinue= */ false);
  return new GestureDescription.Builder().addStroke(strokeDescription).build();
}
/** Tap right side of screen (e.g. to turn to next page). */
private GestureDescription tapRightSideOfScreen() {
  Path path = new Path();
  // x is 5/6 of the way across the screen; y is the vertical center.
  path.moveTo(5 * (screenWidth / 6), screenHeight / 2);
  StrokeDescription strokeDescription =
      new StrokeDescription(path, /* startTime= */ 0L, /* duration (ms)= */ 10L);
  return new GestureDescription.Builder().addStroke(strokeDescription).build();
}