Support apps without reliable a11y trees #7

Open
KyleFin opened this issue Oct 29, 2021 · 0 comments

The initial implementation works well when all of the text is on one page (Chrome, or apps with webviews like Gospel Library) and is broken down by paragraph into small AccessibilityNodeInfos that can be accessed from rootInActiveWindow. (#6 is about communicating this to users.)

There are many use-cases I would like to support. Here are some thoughts:

Use cases

(See sample code for an example of how to print a11y tree)

  • Scrolling apps where text is in one giant node or not exposed in a11y tree (Google Docs, Drive, Gmail)
    • If we can't access the a11y tree, we might use OCR and swipe gestures to scroll. One difficulty is knowing whether to keep scrolling when images fill the whole screen, or when we miss a scroll and the audio moves beyond the current screen.
    • Create a generic "ctrl + f" macro functionality to find text in the current app?
  • PDFs
    • Could be very useful. Would need OCR and swipe gestures. In addition to images, another complication is knowing which way to swipe (e.g. how to handle multiple columns on the same page).
  • Page-turning apps (Hoopla, Libby, Google Play Books)
    • Some may provide good a11y info, but only for the current page. We can probably turn the page quite reliably with just a tap or swipe gesture. Some apps may also support a NEXT_PAGE action.
    • How to know which apps support page turning (can AccessibilityService query support for NEXT_PAGE?)
    • How to decide when to turn the page? (Once we've matched text on the page, turn page immediately to stay ahead? Wait until audio goes to next page and try to keep up? What if there are images?)
  • Kindle (page-turn or continuous)
    • This may be lower priority because some Kindle books have WhisperSync with Audible to automatically sync text/audio. Live Scroll would extend support for using different versions of the media and work for virtually any title.
    • Page-turn mode works the same as other page-turning apps and provides a11y info for the current page.
    • Continuous scroll provides no text a11y info (confirm if implementing), but we could scroll with OCR and gestures.
      • We can know we're in page-turn or continuous mode from package name (Kindle) and content description in KRFView (has text for page, empty for continuous). Confirm if implementing.
    • Same concerns about images and how to recover if we fall behind.
  • Photos (panning instead of scrolling)
    • Very low priority, but it could be neat.
    • Requires OCR and gestures.
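
One possible answer to the "when to turn the page" question for page-turning apps is to turn once the audio match reaches the last few words of the current page. The sketch below is plain Java with no Android dependencies. `PageTurnPolicy`, `shouldTurnPage`, and the `lookaheadWords` threshold are hypothetical names chosen for illustration, and the policy itself is an assumption rather than a tested design. Pages with no text (for example, a full-page image) are deliberately left undecided here and would need a separate rule such as a timeout.

```java
import java.util.List;

/** Hypothetical policy: turn the page when the matched word is near the end of the page. */
final class PageTurnPolicy {
    private final int lookaheadWords;

    PageTurnPolicy(int lookaheadWords) {
        if (lookaheadWords < 0) {
            throw new IllegalArgumentException("lookaheadWords must be >= 0");
        }
        this.lookaheadWords = lookaheadWords;
    }

    /**
     * @param pageWords words on the current page, in reading order
     * @param matchedIndex index of the last word matched against the audio, or -1 if none
     * @return true if the match is within lookaheadWords of the end of the page
     */
    boolean shouldTurnPage(List<String> pageWords, int matchedIndex) {
        // No text or no match yet: nothing to base a decision on.
        if (pageWords.isEmpty() || matchedIndex < 0) {
            return false;
        }
        int wordsRemaining = pageWords.size() - 1 - matchedIndex;
        return wordsRemaining <= lookaheadWords;
    }
}
```

A larger `lookaheadWords` turns the page earlier to stay ahead of the audio; a value of 0 waits until the final word on the page has been matched.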

Solutions

  • OCR

    • How to decide where to search for text.
      • Everywhere except in Live Caption box? Exclude system headers?
    • May be helpful to use window changed a11y events to know when current screen has changed (maybe user switched apps so we should start or stop trying to use OCR to scroll).
  • Gestures

    • We may prefer sending more direct commands if possible (show_on_screen, next paragraph, scroll, etc.) because they're more targeted and probably more efficient.
    • See dispatchGesture documentation and sample GestureDescription code below.
    • Pros: work with any app
    • Cons: could interfere with other apps. If we fall behind audio, it's hard to recover.
  • How to determine how to swipe/tap?

    • Curated package/view list? (Specific apps to allow or block)
      • ML model to determine how to interact with a given app from a screenshot and/or a11y tree?
    • Fallback after attempting to use a11y tree.
      • Need to notify user if this is happening so we're not needlessly trying to scroll when user doesn't want scrolling.
    • Can a11y service determine if current screen is scrollable or supports page-turn?
  • How to determine where to swipe?

    • Start from where word is matched and swipe to top of screen?
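
The "start from where the word is matched and swipe to the top of the screen" idea comes down to simple coordinate arithmetic. The sketch below is plain Java so it can run off-device. `Region`, `Swipe`, and `SwipePlanner.swipeFromMatch` are hypothetical names, and `Region` only stands in for `android.graphics.Rect`. Clamping the start point into the gesture region and skipping the swipe when the word is already at the top are assumptions, and how far the content actually moves depends on the app's own scroll and fling handling.

```java
/** Hypothetical stand-in for android.graphics.Rect (pixel coordinates, y grows downward). */
final class Region {
    final int left, top, right, bottom;

    Region(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

/** A straight vertical swipe from (x, startY) to (x, endY). */
final class Swipe {
    final int x, startY, endY;

    Swipe(int x, int startY, int endY) {
        this.x = x;
        this.startY = startY;
        this.endY = endY;
    }
}

final class SwipePlanner {
    private SwipePlanner() {}

    /**
     * Plans an upward swipe from the matched word to the top of the gesture region.
     *
     * @param gestureRegion area where gestures are allowed
     * @param matchedWordY vertical screen position of the matched word
     * @return the swipe to perform, or null if the word is already at (or above) the top
     */
    static Swipe swipeFromMatch(Region gestureRegion, int matchedWordY) {
        // Clamp the start point so the gesture never begins outside the allowed region.
        int startY = Math.max(gestureRegion.top, Math.min(matchedWordY, gestureRegion.bottom));
        if (startY <= gestureRegion.top) {
            return null; // Nothing to scroll.
        }
        // Swipe along the horizontal center of the region (an arbitrary choice).
        int x = (gestureRegion.left + gestureRegion.right) / 2;
        return new Swipe(x, startY, gestureRegion.top);
    }
}
```

On a device, the resulting coordinates would feed `path.moveTo(x, startY)` and `path.lineTo(x, endY)` when building a `GestureDescription`.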

Sample code

  // Logging current a11y tree (very similar to getNodesContainingWord)
  private fun printAccessibilityTree(root: AccessibilityNodeInfo?, level: Int) {
    // rootInActiveWindow is nullable, so the parameter must be nullable too.
    if (root == null) return
    Log.d(tag, "Node at level $level with childCount ${root.childCount}: $root")
    for (i in 0 until root.childCount) {
      // getChild() can return null, so skip missing children.
      root.getChild(i)?.let { printAccessibilityTree(it, level + 1) }
    }
  }

  // From AccessibilityService:
  printAccessibilityTree(this.rootInActiveWindow, 0)

  // Sample GestureDescription code (Java)
  private GestureDescription advanceTextGestureDescription() {
    if (currGestureRegion.equals(paginatedAppGestureRegion)) {
      return tapRightSideOfScreen(); // swipeLeftGestureDescription();
    } else if (currGestureRegion.equals(scrollableAppGestureRegion)) {
      return swipeUpGestureDescription(currGestureRegion.bottom);
    }
    return null;
  }

  private GestureDescription swipeUpGestureDescription(int initialY) {
    // Swipe up (e.g. to scroll down).
    Path path = new Path();
    path.moveTo(currGestureRegion.left, initialY);
    path.lineTo(currGestureRegion.left, currGestureRegion.top);
    StrokeDescription strokeDescription =
        new StrokeDescription(path, /*startTime=*/ 0L, /*duration (in ms)=*/ 500L);
    return new GestureDescription.Builder().addStroke(strokeDescription).build();
  }

  private GestureDescription swipeLeftGestureDescription() {
    // Swipe left (e.g. to turn to next page).
    // A single stroke covers the whole swipe. A slow drag followed by a fast flick
    // would need continueStroke(), which returns a new StrokeDescription that has to
    // be dispatched in a follow-up gesture; it cannot be added to this one.
    Path path = new Path();
    path.moveTo(900, 1000);
    path.lineTo(100, 1000);

    StrokeDescription strokeDescription =
        new StrokeDescription(path, /*startTime=*/ 0L, /*duration (in ms)=*/ 500L);

    return new GestureDescription.Builder().addStroke(strokeDescription).build();
  }

  private GestureDescription tapRightSideOfScreen() {
    // Tap right side of screen (e.g. to turn to next page).
    Path path = new Path();
    // Path.moveTo takes (x, y): x is 5/6 of the screen width, y is the vertical center.
    path.moveTo(5 * (screenWidth / 6), screenHeight / 2);
    StrokeDescription strokeDescription =
        new StrokeDescription(path, /*startTime=*/ 0L, /*duration (in ms)=*/ 10L);
    return new GestureDescription.Builder().addStroke(strokeDescription).build();
  }