Problem
Use My Instructions mode should eventually be the personalized, context-aware completion path, but we should not assume screenshots are the only or best source of context. Tabby already walks the Accessibility tree to find focused fields, so nearby AX text may provide useful context without requiring Screen Recording.
Goal
Design and implement a local context pipeline for Use My Instructions mode that can gather relevant surrounding context safely, starting with the least invasive source and falling back to screenshots only if needed.
Proposed Scope
- Explore AX tree context first: nearby labels, editor text, window title, document title, selected text, and relevant sibling/ancestor text nodes.
- Compare AX-derived context against screenshot/OCR context for apps where AX does not expose enough useful text.
- Decide whether screenshot capture is necessary, optional, or only a fallback for specific apps.
- Produce a compact context summary that can be injected into the Use My Instructions prompt.
- Keep all context processing local and opt-in.
- Add clear privacy copy explaining what context is read and when.
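As a rough illustration of the "compact context summary" item, the sketch below orders gathered text by source priority, collapses whitespace, drops duplicates, and caps the total length. `ContextSource`, `ContextItem`, and `compactSummary` are hypothetical names, not existing Tabby types, and the priority order is an assumption. In a real pipeline the items would be filled from the Accessibility tree rather than constructed by hand.

```swift
import Foundation

// Hypothetical stand-in for text gathered from the AX tree.
// Lower raw values are treated as higher priority (an assumed ordering).
enum ContextSource: Int, Comparable {
    case selectedText = 0, nearbyLabel, windowTitle, siblingText

    static func < (a: ContextSource, b: ContextSource) -> Bool {
        a.rawValue < b.rawValue
    }
}

struct ContextItem {
    let source: ContextSource
    let text: String
}

/// Builds a compact summary: highest-priority sources first, whitespace
/// collapsed, duplicate text dropped, total length capped at `maxCharacters`.
func compactSummary(from items: [ContextItem], maxCharacters: Int) -> String {
    var seen = Set<String>()
    var lines: [String] = []
    var used = 0

    for item in items.sorted(by: { $0.source < $1.source }) {
        // Collapse runs of whitespace and newlines into single spaces.
        let cleaned = item.text
            .split(whereSeparator: { $0.isWhitespace })
            .joined(separator: " ")
        guard !cleaned.isEmpty, seen.insert(cleaned).inserted else { continue }

        let line = "\(item.source): \(cleaned)"
        let cost = line.count + (lines.isEmpty ? 0 : 1) // +1 for the newline separator
        if used + cost > maxCharacters { break }        // stop once the budget is spent

        lines.append(line)
        used += cost
    }
    return lines.joined(separator: "\n")
}
```

Capping by characters keeps the payload size predictable. A token-based budget might fit better, depending on the engine.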
Acceptance Criteria
- There is a documented recommendation for AX tree context vs screenshot/OCR context.
- Use My Instructions mode can receive a short context payload when the feature is enabled.
- Context collection does not run for secure fields or for apps the user has disabled individually.
- Context collection has latency bounds and does not block normal typing responsiveness.
- The implementation can be disabled globally from Settings.
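The gating and latency criteria above could be approached roughly as follows. This is a minimal sketch under stated assumptions: `ContextSettings`, `FocusedField`, `shouldCollectContext`, and `collect` are hypothetical names, and the 50 ms budget in the usage note is illustrative, not a measured target. How `isSecure` gets set is left open; on macOS it would plausibly come from the focused element's `AXSecureTextField` subrole.

```swift
import Foundation

// Hypothetical settings model: a global opt-in plus per-app opt-outs.
struct ContextSettings {
    var contextEnabled: Bool
    var disabledBundleIDs: Set<String>
}

// Hypothetical description of the currently focused field.
struct FocusedField {
    let bundleID: String
    let isSecure: Bool
}

/// Collection runs only when the feature is enabled, the field is not
/// secure, and the owning app has not been disabled by the user.
func shouldCollectContext(for field: FocusedField, settings: ContextSettings) -> Bool {
    settings.contextEnabled
        && !field.isSecure
        && !settings.disabledBundleIDs.contains(field.bundleID)
}

/// Runs providers in order and stops starting new ones once the time budget
/// is spent, returning whatever was gathered so far. It does not interrupt a
/// provider that is already running, so each provider should be cheap.
/// `now` is injectable so the budget logic can be tested with a fake clock.
func collect(
    providers: [() -> String?],
    budget: TimeInterval,
    now: () -> TimeInterval = { ProcessInfo.processInfo.systemUptime }
) -> [String] {
    let deadline = now() + budget
    var results: [String] = []
    for provider in providers {
        if now() >= deadline { break }
        if let text = provider() { results.append(text) }
    }
    return results
}
```

A caller would check `shouldCollectContext` first, then run something like `collect(providers: providers, budget: 0.05)` off the main thread so that typing is never blocked. A completion that arrives before context is ready can proceed without it.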
Open Questions
- What AX nodes provide the best signal without accidentally collecting too much unrelated text?
- Should screenshot/OCR be a separate explicit setting because it requires Screen Recording?
- Should context be injected as raw excerpts, summaries, or structured key-value facts?
- Should Apple Intelligence and Open Source engines receive context differently?
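To make the injection-format question more concrete, the sketch below models the three candidate shapes as one type and renders each into a prompt fragment. `ContextPayload`, `promptFragment`, and the output formatting are hypothetical, chosen only to compare the options side by side.

```swift
import Foundation

// Hypothetical model of the three candidate payload shapes.
enum ContextPayload {
    case rawExcerpts([String])                    // verbatim nearby text
    case summary(String)                          // one condensed description
    case facts([(key: String, value: String)])    // structured key-value pairs

    /// Renders the payload as text that could be placed in the prompt.
    func promptFragment() -> String {
        switch self {
        case .rawExcerpts(let excerpts):
            return excerpts.map { "> \($0)" }.joined(separator: "\n")
        case .summary(let text):
            return "Context: \(text)"
        case .facts(let pairs):
            return pairs.map { "\($0.key): \($0.value)" }.joined(separator: "\n")
        }
    }
}
```

Raw excerpts preserve the most detail but are the easiest way to over-collect. Key-value facts are the most compact but need a step that decides which keys matter.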