Implement model cursor for visual feedback #760

abrichr · 2024-06-16T12:58:58Z

Feature request

We want to give the model the ability to:

paint a red dot on its suggested target location,
look at the screenshot with the dot on it, and
optionally self-correct.
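The first step, painting the dot, can be sketched without any imaging library. This is a minimal sketch that assumes a screenshot is represented as a list of rows of (r, g, b) tuples. The names `paint_dot`, `RED`, and the `radius` parameter are hypothetical and are not part of OpenAdapt. A real implementation would more likely draw on a Pillow image, for example with `ImageDraw.Draw(img).ellipse(...)`.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # hypothetical representation: rows of RGB pixels

RED: RGB = (255, 0, 0)


def paint_dot(image: Image, x: int, y: int, radius: int = 3, color: RGB = RED) -> Image:
    """Return a copy of `image` with a filled circle centred on (x, y).

    Pixels that fall outside the image are skipped, so a target near the
    edge still gets a (clipped) dot instead of raising IndexError.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    marked = [row[:] for row in image]  # copy rows so the original screenshot is untouched
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue  # this offset lies outside the circle
            px, py = x + dx, y + dy
            if 0 <= px < width and 0 <= py < height:
                marked[py][px] = color
    return marked


# Usage: mark the proposed target (5, 5) on a blank 10x10 "screenshot".
WHITE: RGB = (255, 255, 255)
blank = [[WHITE] * 10 for _ in range(10)]
marked = paint_dot(blank, 5, 5, radius=2)
```

Returning a copy keeps the unmarked screenshot available, so the model can be shown both the original and the marked version.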

Thank you @LunjunZhang for the suggestion 🙏

This involves creating a CursorReplayStrategy (based on the VanillaReplayStrategy) that implements the required prompting.
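The propose, mark, review, and correct cycle such a strategy would drive might look roughly like the loop below. This is a sketch under stated assumptions, not OpenAdapt's API. `cursor_feedback_loop`, `Critique`, and the `propose`, `critique`, and `mark` callables are hypothetical stand-ins for the model prompts and the dot-painting step, and `max_rounds` is an assumed cap on correction attempts.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class Critique:
    ok: bool                           # model judges the dot to be on the intended target
    corrected: Optional[Point] = None  # model's revised target, if it offers one


def cursor_feedback_loop(
    screenshot: Any,
    propose: Callable[[Any], Point],              # hypothetical: model suggests a target
    critique: Callable[[Any, Point], Critique],   # hypothetical: model reviews the marked screenshot
    mark: Callable[[Any, Point], Any],            # e.g. a dot-painting helper
    max_rounds: int = 3,
) -> Tuple[Point, List[Point]]:
    """Return the final target and every target tried, in order."""
    target = propose(screenshot)
    history = [target]
    for _ in range(max_rounds):
        marked = mark(screenshot, target)
        review = critique(marked, target)
        if review.ok or review.corrected is None:
            break  # accepted, or the model declined to self-correct
        target = review.corrected
        history.append(target)
    return target, history


# Usage with stub "models": the critic nudges the dot right until it reaches x=12.
final, tried = cursor_feedback_loop(
    "screenshot.png",
    propose=lambda shot: (10, 10),
    critique=lambda marked, t: Critique(True) if t == (12, 10) else Critique(False, (t[0] + 1, t[1])),
    mark=lambda shot, t: (shot, t),
)
```

Capping the rounds keeps the optional self-correction from looping indefinitely when the model keeps revising its answer.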

Motivation

Correct errors, e.g. missed segmentations.

Possibly related: https://arxiv.org/abs/2406.09403:

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn.
...
Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All codes and data are in this https URL.

abrichr · 2024-06-17T00:31:01Z

/bounty $1000

algora-pbc · 2024-06-17T00:31:05Z

💎 $1,000 bounty • OpenAdaptAI

Steps to solve:

Start working: Comment /attempt #760 with your implementation plan
Submit work: Create a pull request including /claim #760 in the PR body to claim the bounty
Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Additional opportunities:

🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment /livestream once live

Thank you for contributing to OpenAdaptAI/OpenAdapt!


Attempt	Started (GMT+0)	Solution
🟢 @Amanullah1002	Jun 17, 2024, 3:18:43 AM	WIP
🔴 @Subh231004	Jun 17, 2024, 6:29:42 AM	WIP
🔴 @Anshgrover23	Jun 17, 2024, 6:31:46 AM	WIP
🟢 @onyedikachi-david	Jun 25, 2024, 3:47:44 PM	WIP

Subh231004 · 2024-06-17T06:29:40Z

/attempt #760


Anshgrover23 · 2024-06-17T06:31:44Z

/attempt #760

Implementation Plan for Model Cursor Feedback (Issue #760)
Create CursorReplayStrategy: I'll develop a new CursorReplayStrategy class extending VanillaReplayStrategy.
Paint Red Dot: I'll implement a method to paint a red dot on the target location within a given image.
Screenshot Capture: I'll implement a method to capture a screenshot and overlay the red dot on it.
Self-Correction: I'll add an optional self-correction mechanism based on the screenshot with the dot.
Testing: I'll write and execute unit tests to ensure the functionality works as intended.
Documentation: I'll update the project documentation to include usage instructions for the new strategy.
Pull Request: I'll submit a PR for review, incorporating any feedback provided.
This plan will systematically address the issue by creating a targeted strategy, ensuring it functions correctly, and updating the documentation for users.


abrichr · 2024-06-20T13:52:20Z

@Subh231004 please keep the discussion related to your pull request on your pull request and not here. I have replied to your comment there.

onyedikachi-david · 2024-06-25T15:47:42Z

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts
@onyedikachi-david	2 bounties from 1 project	JavaScript, Shell	#764

abrichr added the enhancement, good first issue, help wanted, $ bounty $, and 💎 Bounty labels and removed the 💎 Bounty label Jun 16, 2024

algora-pbc bot added the 💎 Bounty label Jun 17, 2024

abrichr changed the title from "Implement model cursor feedback" to "Implement model cursor for visual feedback" Jun 17, 2024

R-ohit-B-isht mentioned this issue Jun 18, 2024

Implement CursorReplayStrategy for Visual Feedback R-ohit-B-isht/OpenAdapt#2 (open)

Subh231004 mentioned this issue Jun 19, 2024

Implemented model cursor for visual feedback #760 #781 (closed, 7 tasks)
