
feat(core,mcp,web-integration): enhance aiKeyboardPress to support key combinations #799


Open: wants to merge 3 commits into main


xnmeet (Contributor) commented Jun 3, 2025

Overview

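Usage

```ts
// Key combination passed as an array of keys
await agent.aiKeyboardPress(["Meta","A"]);
// The same combination passed as a "+"-joined string
await agent.aiKeyboardPress("Meta+A");
```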
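Both call forms describe the same combination. The sketch below shows one way the two inputs could be normalized to a single list of keys. It assumes `+` is the separator in the string form, and `normalizeKeys` is a hypothetical helper for illustration, not the function this PR adds.

```typescript
// Hypothetical helper (not from the PR): accept either call form and
// return the keys in press order, e.g. "Meta+A" -> ["Meta", "A"].
type KeyInput = string | string[];

function normalizeKeys(input: KeyInput): string[] {
  // Array form is used as-is; string form is split on "+".
  // Assumption: a literal "+" key must be passed in the array form.
  const parts = Array.isArray(input) ? input : input.split("+");
  // Tolerate spaces such as "Control + Shift + T" and drop empty entries.
  const keys = parts.map((k) => k.trim()).filter((k) => k.length > 0);
  if (keys.length === 0) {
    throw new Error("At least one key is required");
  }
  return keys;
}
```

With this approach, `normalizeKeys("Meta+A")` and `normalizeKeys(["Meta", "A"])` both yield `["Meta", "A"]`, so everything downstream only has to handle arrays.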

Features Implemented
✅ AI-powered element location: keyBoardPress uses the same intelligent element detection as other actions
✅ YAML workflow support: keyBoardPress actions can be included in automated workflows
✅ Caching support: Leverages existing task caching mechanisms

Testing
Tests have been added at two levels:

Unit tests: Core type validation and plan building logic
Integration tests: End-to-end keyBoardPress execution scenarios
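As an illustration of the kind of input validation such unit tests might exercise, here is a runtime guard for the key argument. The name `isKeyInput` and the rules it enforces (a non-empty string, or a non-empty array of non-empty strings) are assumptions, not the validation logic in this PR.

```typescript
// Hypothetical runtime guard (not from the PR) for the key argument of
// aiKeyboardPress: either a non-empty string such as "Meta+A" or a
// non-empty array of non-empty strings such as ["Meta", "A"].
type KeyInput = string | string[];

function isKeyInput(value: unknown): value is KeyInput {
  if (typeof value === "string") {
    // Reject "" and whitespace-only strings.
    return value.trim().length > 0;
  }
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every((k) => typeof k === "string" && k.trim().length > 0)
  );
}
```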

Validation Steps
To verify this implementation:

Manual testing:

```ts
// Assumes `page` is an existing browser page object
const agent = new PageAgent(page);
await agent.aiKeyboardPress(["Meta","A"]);
```

Backward Compatibility
✅ No breaking changes: All existing APIs remain unchanged
✅ Type safety: Full TypeScript support with proper type inference


netlify bot commented Jun 3, 2025

Deploy Preview for midscene ready!

🔨 Latest commit: e0b6b44
🔍 Latest deploy log: https://app.netlify.com/projects/midscene/deploys/683eff3400b72200083b112e
😎 Deploy Preview: https://deploy-preview-799--midscene.netlify.app

xnmeet (Contributor, Author) commented Jun 9, 2025

Could anyone help review this PR? @zhoushaw @yuyutaotao

quanru (Collaborator) commented Jun 11, 2025

As I understand it, key combinations can already be sent directly through the page object. Have you run into a case that this approach cannot handle?
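For reference, the page-level approach usually means holding the modifiers down, pressing the final key, and releasing everything in reverse order. The sketch below assumes a keyboard object with Puppeteer-style `down(key)` and `up(key)` methods (Puppeteer's `page.keyboard` has both); `KeyboardLike` and `pressCombination` are hypothetical names. Playwright's `keyboard.press` also accepts a combined string such as "Meta+A" directly.

```typescript
// Minimal structural type covering the two methods this sketch needs.
interface KeyboardLike {
  down(key: string): Promise<void>;
  up(key: string): Promise<void>;
}

// Hypothetical helper: hold every key down in order, then release them in
// reverse order, so ["Meta", "A"] sends Meta-down, A-down, A-up, Meta-up.
async function pressCombination(keyboard: KeyboardLike, keys: string[]): Promise<void> {
  const pressed: string[] = [];
  try {
    for (const key of keys) {
      await keyboard.down(key);
      pressed.push(key);
    }
  } finally {
    // Release whatever was pressed, even if a down() call failed midway.
    for (const key of pressed.reverse()) {
      await keyboard.up(key);
    }
  }
}

// A recording fake keyboard, handy for checking the event order without a browser.
const log: string[] = [];
const fakeKeyboard: KeyboardLike = {
  async down(key) { log.push(`down:${key}`); },
  async up(key) { log.push(`up:${key}`); },
};
```

Releasing in a `finally` block keeps a modifier from staying stuck down if one of the key events fails.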

xnmeet (Contributor, Author) commented Jun 12, 2025


This change targets two scenarios. The first is using AI to locate the target area and then triggering the key combination there. The second is eliminating the inaccuracy that AI reasoning introduces for some commands.
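The first scenario can be sketched as a two-step flow: resolve a natural-language description to a point, click it so the element has focus, then send the keys. Everything here is hypothetical. `Locator`, `MouseLike`, `KeyboardLike`, and `locateThenPress` are illustrative names, the locator stands in for the AI step, and clicking to focus is an assumption about how the target receives keystrokes, not a description of Midscene's implementation. Passing the keys literally presumably addresses the second scenario, since the model no longer has to infer which keys a command implies.

```typescript
// Hypothetical types: a locator that turns a description into page
// coordinates (the AI step), plus minimal mouse/keyboard interfaces.
type Point = { x: number; y: number };
type Locator = (description: string) => Promise<Point>;

interface MouseLike {
  click(x: number, y: number): Promise<void>;
}

interface KeyboardLike {
  down(key: string): Promise<void>;
  up(key: string): Promise<void>;
}

// Sketch: focus the located element first, then send the combination with
// keys held down in order and released in reverse order.
async function locateThenPress(
  locate: Locator,
  mouse: MouseLike,
  keyboard: KeyboardLike,
  description: string,
  keys: string[],
): Promise<void> {
  const { x, y } = await locate(description);
  await mouse.click(x, y); // assumption: clicking gives the target focus
  for (const key of keys) {
    await keyboard.down(key);
  }
  for (const key of [...keys].reverse()) {
    await keyboard.up(key);
  }
}
```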
