ML-685 Agent Commands: Move mouse to target without click #28
Co-authored-by: Samir Mlika <105347215+mlikasam-askui@users.noreply.github.com>
Co-authored-by: adi-wan-askui <105295410+adi-wan-askui@users.noreply.github.com>
…patible-with-25-2-1-remote-device-controller feat: #ML-718 add support for new RemoteDeviceController
adi-wan-askui left a comment
Nice :) It's coming together.
@@ -94,7 +94,7 @@ def click(self, instruction: Optional[str] = None, button: Literal['left', 'midd
def __mouse_move(self, instruction: str, model_name: Optional[str] = None) -> None:
What about also renaming instruction to locator in all of the public API methods that use ModelRouter.locate() under the hood?
I thought about it, too, but I wouldn't do it. I want to avoid users having to deal with the concept of a locator.
We (the PM and I) want to expose as few internals as possible (GroundingModel, the locator concept, ...) to users, so they don't have to learn new concepts.
I agree that we don't want users to learn new concepts if we can avoid it. But in this case, I think we actually have different concepts that we just gave the same name, which is even worse.
- While act() accepts an instruction (either a high-level goal, e.g., Book a flight from Frankfurt to San Francisco, or a step-by-step description of what to do, which in my experience works better than just a goal, e.g., Open skyscanner.com, search for flights from Frankfurt to San Francisco, choose the first one etc.), get() (or ask()) needs a query, e.g., Is there a flight cheaper than $1000?
- Last but not least, click(), mouse_move(), etc. are about describing the element that you want to interact with, e.g., click on the magnifying glass icon.

That is why I would go for different names for the parameters.
So I would propose the following:
- act() ⟶ instruction (I like it better than goal, as it makes no assumption about how abstract it needs to be)
- ask() ⟶ query
- click(), mouse_move(), etc. ⟶ locator (which for me is a description of the element that you would like to interact with, and which may also be an image)
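To make the naming proposal concrete, here is a minimal Python sketch of how the public signatures could read. It is hypothetical: ProposedAgentApi and ImageLocator are illustrative names, not part of vision-agent, and the methods only record their calls instead of driving a UI. It assumes a locator is either a text description or an image reference, as suggested above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class ImageLocator:
    """Hypothetical image reference; the real library may represent images differently."""

    path: str


# Assumption: a locator is either a text description or an image of the element.
Locator = Union[str, ImageLocator]


@dataclass
class ProposedAgentApi:
    """Hypothetical stub that only illustrates the proposed parameter names.

    Every method records its call instead of interacting with a real UI.
    """

    calls: List[Tuple[str, object]] = field(default_factory=list)

    def act(self, instruction: str) -> None:
        # A high-level goal or a step-by-step description of what to do.
        self.calls.append(("act", instruction))

    def ask(self, query: str) -> str:
        # A question about the screen; the stub returns a placeholder answer.
        self.calls.append(("ask", query))
        return "<answer>"

    def click(self, locator: Locator) -> None:
        # Describes the element to click.
        self.calls.append(("click", locator))

    def mouse_move(self, locator: Locator) -> None:
        # Describes the element to move the cursor to, without clicking.
        self.calls.append(("mouse_move", locator))


# Usage: each parameter name signals what kind of input the method expects.
agent = ProposedAgentApi()
agent.act("Book a flight from Frankfurt to San Francisco")
answer = agent.ask("Is there a flight cheaper than $1000?")
agent.click("magnifying glass icon")
agent.mouse_move(ImageLocator("search_icon.png"))
```

With a distinct parameter name per concept, the signature alone tells the caller whether a method expects a goal, a question, or an element description.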
Also, the term locator is not entirely new to users who are engineers working on automation (RPA, testing, etc.), see
…k' of github.com:askui/vision-agent into ML-685-Agent-Commands-Move-mouse-to-target-without-click
I'll merge this into the AI element branch and then test everything end to end, so we can move toward the release.
…kui-inference' into ML-685-Agent-Commands-Move-mouse-to-target-without-click
Merged commit dd6651e into ML-649-ask-ui-integration-add-ai-element-command-via-askui-inference
Hello,

I added a mouse_move command.
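A rough sketch of how the new mouse_move command differs from click: both locate the target first, but mouse_move stops after moving the cursor. SketchAgent, FakeController, move_to and the locate callable are hypothetical stand-ins (the real agent's locate step and OS controller may look different), and the controller only records events.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[int, int]


@dataclass
class FakeController:
    """Hypothetical OS controller that records mouse events instead of performing them."""

    events: List[tuple] = field(default_factory=list)

    def move_to(self, x: int, y: int) -> None:
        self.events.append(("move", x, y))

    def click(self, button: str = "left") -> None:
        self.events.append(("click", button))


@dataclass
class SketchAgent:
    # Hypothetical locate step: maps an element description to screen coordinates.
    locate: Callable[[str], Point]
    controller: FakeController

    def mouse_move(self, locator: str) -> None:
        # Locate the target and move the cursor there; no click is issued.
        x, y = self.locate(locator)
        self.controller.move_to(x, y)

    def click(self, locator: str, button: str = "left") -> None:
        # Move to the target exactly like mouse_move, then click.
        self.mouse_move(locator)
        self.controller.click(button)


# Usage with a fixed lookup table standing in for the vision model.
positions = {"magnifying glass icon": (640, 48)}
agent = SketchAgent(locate=positions.__getitem__, controller=FakeController())
agent.mouse_move("magnifying glass icon")  # hover only
agent.click("magnifying glass icon")       # move, then click
```

Building click on top of mouse_move keeps the locate-and-move logic in one place, so hovering is just click without the final step.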