Implement model cursor for visual feedback #760

Open
abrichr opened this issue Jun 16, 2024 · 6 comments
Labels
$ bounty $ Please suggest a price range 🙏 💎 Bounty enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

Contributor

abrichr commented Jun 16, 2024

Feature request

We want to give the model the ability to:

  1. paint a red dot on its suggested target location,
  2. look at the screenshot with the dot on it, and
  3. optionally self-correct.

Thank you @LunjunZhang for the suggestion 🙏

This involves creating a CursorReplayStrategy (based on the VanillaReplayStrategy) that implements the required prompting.
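
As a rough illustration of the propose / paint / review cycle described above, here is a minimal sketch. It deliberately does not subclass VanillaReplayStrategy, since that class's interface is not shown in this issue. `CursorFeedbackLoop`, `propose_target`, `paint_dot`, `review_target` and `max_corrections` are all hypothetical names chosen for this example, not OpenAdapt APIs; a real strategy would replace the callables with its own prompting and drawing code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class CursorFeedbackLoop:
    """Hypothetical sketch of the propose / paint / review cycle.

    None of these callables are OpenAdapt APIs; they stand in for
    whatever prompting and drawing code a real strategy would use.
    """

    # Asks the model for an initial (x, y) target given a screenshot.
    propose_target: Callable[[object], Point]
    # Returns a copy of the screenshot with a red dot at the point.
    paint_dot: Callable[[object, Point], object]
    # Shows the model the marked screenshot; returns a corrected
    # point, or None if the model accepts the current one.
    review_target: Callable[[object, Point], Optional[Point]]
    # Upper bound on correction rounds, so the loop always ends.
    max_corrections: int = 2

    def choose_target(self, screenshot: object) -> Point:
        target = self.propose_target(screenshot)
        for _ in range(self.max_corrections):
            marked = self.paint_dot(screenshot, target)
            correction = self.review_target(marked, target)
            if correction is None or correction == target:
                break  # model is satisfied with the dot's position
            target = correction
        return target


# Stub "model" that first misses, then corrects itself once.
loop = CursorFeedbackLoop(
    propose_target=lambda shot: (10, 10),
    paint_dot=lambda shot, point: (shot, point),
    review_target=lambda marked, point: None if point == (12, 8) else (12, 8),
)
print(loop.choose_target("screenshot"))  # (12, 8)
```

Capping the number of correction rounds keeps the cost bounded if the model keeps changing its mind; the right limit is a tuning choice, not something specified in this issue.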

Motivation

Correct errors, e.g. missed segmentations.

Possibly related: https://arxiv.org/abs/2406.09403:

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn.
...
Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All codes and data are in this https URL.

@abrichr abrichr added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed $ bounty $ Please suggest a price range 🙏 💎 Bounty and removed 💎 Bounty labels Jun 16, 2024
Contributor Author

abrichr commented Jun 17, 2024

/bounty $1000


algora-pbc bot commented Jun 17, 2024

💎 $1,000 bounty • OpenAdaptAI

Steps to solve:

  1. Start working: Comment /attempt #760 with your implementation plan
  2. Submit work: Create a pull request including /claim #760 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Additional opportunities:

  • 🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment /livestream once live

Thank you for contributing to OpenAdaptAI/OpenAdapt!


Attempt                   Started (GMT+0)              Solution
🟢 @Amanullah1002         Jun 17, 2024, 3:18:43 AM     WIP
🔴 @Subh231004            Jun 17, 2024, 6:29:42 AM     WIP
🔴 @Anshgrover23          Jun 17, 2024, 6:31:46 AM     WIP
🟢 @onyedikachi-david     Jun 25, 2024, 3:47:44 PM     WIP


Subh231004 commented Jun 17, 2024

/attempt #760


Anshgrover23 commented Jun 17, 2024

/attempt #760

Implementation Plan for Model Cursor Feedback (Issue #760)

  1. Create CursorReplayStrategy: I'll develop a new CursorReplayStrategy class extending VanillaReplayStrategy.
  2. Paint Red Dot: I'll implement a method to paint a red dot on the target location within a given image.
  3. Screenshot Capture: I'll implement a method to capture a screenshot and overlay the red dot on it.
  4. Self-Correction: I'll add an optional self-correction mechanism based on the screenshot with the dot.
  5. Testing: I'll write and execute unit tests to ensure the functionality works as intended.
  6. Documentation: I'll update the project documentation to include usage instructions for the new strategy.
  7. Pull Request: I'll submit a PR for review, incorporating any feedback provided.

This plan will systematically address the issue by creating a targeted strategy, ensuring it functions correctly, and updating the documentation for users.
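
To make the "Paint Red Dot" and "Testing" steps of this plan more concrete, here is a small dependency-free sketch. It assumes the image is a plain grid of RGB tuples so that it runs with the standard library alone; a real implementation would more likely draw on the screenshot object with an imaging library. `paint_dot`, `Pixels`, `RED` and `test_paint_dot` are hypothetical names for this example and are not taken from the OpenAdapt codebase.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Pixels = List[List[Color]]  # rows of RGB tuples, indexed as pixels[y][x]

RED: Color = (255, 0, 0)


def paint_dot(pixels: Pixels, center: Tuple[int, int], radius: int = 3,
              color: Color = RED) -> Pixels:
    """Return a copy of `pixels` with a filled circle at `center` (x, y)."""
    cx, cy = center
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    marked = [row[:] for row in pixels]  # leave the input untouched
    # Only visit the bounding box of the circle, clipped to the image.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                marked[y][x] = color
    return marked


def test_paint_dot() -> None:
    white: Color = (255, 255, 255)
    image = [[white] * 20 for _ in range(10)]
    marked = paint_dot(image, center=(5, 4), radius=2)
    assert marked[4][5] == RED    # center is painted
    assert marked[4][7] == RED    # edge of the radius is painted
    assert marked[4][8] == white  # just outside the radius is untouched
    assert image[4][5] == white   # original image is not modified
    # A dot near the border is clipped instead of raising IndexError.
    clipped = paint_dot(image, center=(0, 0), radius=2)
    assert clipped[0][0] == RED


test_paint_dot()
```

Returning a copy rather than drawing in place keeps the unmarked screenshot available, which matters if a self-correction round needs to repaint the dot at a new location.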

@abrichr abrichr changed the title from "Implement model cursor feedback" to "Implement model cursor for visual feedback" Jun 17, 2024
Contributor Author

abrichr commented Jun 20, 2024

@Subh231004 please keep the discussion related to your pull request on your pull request and not here. I have replied to your comment there.


onyedikachi-david commented Jun 25, 2024

/attempt #760

Algora profile: @onyedikachi-david
Completed bounties: 2 bounties from 1 project
Tech: JavaScript, Shell
Active attempts: ﹟764

4 participants