
adding README comment on -accuracy and beginning of the -accuracy grid rewrite, and delete Poetry artifacts from README #70

Closed
wants to merge 10 commits into from

Conversation

klxu03
Contributor

@klxu03 klxu03 commented Dec 4, 2023

Closes #77

@klxu03
Contributor Author

klxu03 commented Dec 4, 2023

Some preliminary work on having the model pick which grid cell to zoom in on for the revamped -accuracy mode. The grid coordinates are now displayed properly.

I plan on modifying the idea: first cut out a 400px x 400px area around the originally guessed location, then have the model repeatedly pick which grid cell/quadrant to click on from there, cropping to the selected cell and upsampling the image 2x after each crop before passing it to GPT again.
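A minimal sketch of the first step (cutting the 400px x 400px area around the guess), in plain Python with no imaging library. The function name `crop_box_around` is hypothetical, and shifting the box to stay on screen near an edge is an assumption; the comment above does not say how edges should be handled.

```python
def crop_box_around(guess_x, guess_y, screen_w, screen_h, size=400):
    """Return (left, top, right, bottom) of a box centered on the guessed pixel.

    Assumption: near a screen edge the box is shifted (not shrunk) so that it
    stays fully on screen.
    """
    # Never ask for a box larger than the screen itself.
    box_w = min(size, screen_w)
    box_h = min(size, screen_h)
    # Center on the guess, then clamp so the box stays inside the screen.
    left = min(max(guess_x - box_w // 2, 0), screen_w - box_w)
    top = min(max(guess_y - box_h // 2, 0), screen_h - box_h)
    return left, top, left + box_w, top + box_h
```

The returned box could then be handed to whatever cropping and 2x upsampling routine the project uses; that part is not shown here.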

@klxu03
Contributor Author

klxu03 commented Dec 4, 2023

Adding an implementation note for my future self:

A clean way to implement picking which grid cell to zoom in on, when deciding which pixel to click, is to have the loop store the top-left and bottom-right percentages at each iteration. That way, at the end, you can just average the two percentages and return that as the pixel clicked.
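The bookkeeping described in this note could be sketched roughly as follows. The names `zoom_into_cell` and `resolve_click` are hypothetical, percentages are stored as fractions between 0 and 1 of the cropped area, and cells are assumed to be numbered in the column-major order mentioned later in this thread (0 at the top left, then down, then right).

```python
def zoom_into_cell(bounds, index, n):
    """Narrow (left, top, right, bottom) fractions to one cell of an n x n grid.

    Assumption: cells are numbered column-major, so index 0 is the top-left
    cell, index 1 is the cell below it, and so on.
    """
    left, top, right, bottom = bounds
    col, row = divmod(index, n)
    cell_w = (right - left) / n
    cell_h = (bottom - top) / n
    new_left = left + col * cell_w
    new_top = top + row * cell_h
    return (new_left, new_top, new_left + cell_w, new_top + cell_h)


def resolve_click(choices):
    """Apply each (cell index, grid size) choice, then return the box center."""
    bounds = (0.0, 0.0, 1.0, 1.0)  # start with the whole cropped area
    for index, n in choices:
        bounds = zoom_into_cell(bounds, index, n)
    left, top, right, bottom = bounds
    # Average the top-left and bottom-right corners to get the click point.
    return ((left + right) / 2, (top + bottom) / 2)
```

For example, `resolve_click([(3, 2), (0, 2)])` picks the bottom-right quadrant and then the top-left quadrant inside it, giving `(0.625, 0.625)` as fractions of the cropped area.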

Additionally, maybe start with 4 grid lines (dividing the area into 16 grid cells), but later, once the area is more narrowed down, only use 2 grid lines (dividing the area into fourths). Something like two rounds of 4 grid lines and two rounds of 2 grid lines will yield a final cell width of 400/(4^2 * 2^2) = 6.25 pixels, or a pixel mistake of up to about 3 pixels in any dimension. That is pretty darn accurate, assuming the model picks the correct grid cell every time.
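The arithmetic can be checked with a few lines of Python. This reads "4 grid lines" as 4 divisions per axis (16 cells) and "2 grid lines" as 2 divisions per axis (4 cells), which is an interpretation of the note rather than something it states; `final_cell_size` is a hypothetical helper name.

```python
from math import prod


def final_cell_size(region_px, splits_per_axis):
    """Side length in pixels of the last cell after every zoom round."""
    return region_px / prod(splits_per_axis)


# Two rounds splitting each axis in 4, then two rounds splitting it in 2.
cell = final_cell_size(400, [4, 4, 2, 2])  # 400 / 64 = 6.25
# Clicking the center of the final cell is off by at most half its width.
max_error = cell / 2  # 3.125
```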

Additionally, look into polling the model: ask the model to generate 9 responses and then choose the most popular grid selection, as a fail-safe against a wrong grid choice being picked.
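A rough sketch of the polling idea, assuming the 9 model responses have already been parsed into grid indices (the model call itself is not shown). `pick_by_vote` is a hypothetical name, and how ties should be broken is not specified in the note; here `Counter.most_common` keeps the selection that appeared first.

```python
from collections import Counter


def pick_by_vote(selections):
    """Return (winning grid index, vote count) from a list of model picks."""
    if not selections:
        raise ValueError("need at least one selection")
    counts = Counter(selections)
    # most_common(1) gives the most frequent pick; ties keep first-seen order.
    winner, votes = counts.most_common(1)[0]
    return winner, votes
```

The vote count is returned as well so a caller could, for instance, re-poll when the winner has only a narrow lead. That policy is a design choice left open here.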

@klxu03 klxu03 changed the title adding README comment on -accuracy adding README comment on -accuracy and beginning of the -accuracy grid rewrite Dec 4, 2023
@klxu03 klxu03 changed the title adding README comment on -accuracy and beginning of the -accuracy grid rewrite adding README comment on -accuracy and beginning of the -accuracy grid rewrite, and delete Poetry artifacts from README Dec 4, 2023
@joshbickett
Contributor

Hmm, I'm curious for a bit more context on this commit. Hoping to keep most of the non -accurate code the same when making -accurate improvements.

@klxu03
Contributor Author

klxu03 commented Dec 7, 2023

@joshbickett hey, sorry, just seeing this now. What do you mean by keeping most of the non -accurate code the same? I'm planning on basically having two different draw_labels. For normal mouse clicking, it shows the percentages in black on a white background. But when choosing a grid cell, I forgo the white rectangle and just display the text in green (at some point the view gets zoomed in a lot, to something like a 6px x 6px range, so having a white rectangle taking up pixels doesn't seem like the best idea). That should be enough, especially since the model should know that the top-left corner is grid 0 and the numbering then goes down, then right (column-major order).
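To make the numbering concrete, here is a small sketch of where each grid label would sit under that column-major scheme. `grid_label_positions` is a hypothetical helper and not the project's draw_labels code; it only computes positions and leaves the drawing (green text, no background rectangle) to the caller.

```python
def grid_label_positions(width, height, n):
    """Return [(index, x, y)]: the top-left pixel of each cell in an n x n grid.

    Indices run column-major: 0 is the top-left cell, then down the first
    column, then on to the next column to the right.
    """
    cell_w = width / n
    cell_h = height / n
    return [
        (col * n + row, round(col * cell_w), round(row * cell_h))
        for col in range(n)
        for row in range(n)
    ]
```

For a 400 x 400 area split 2 x 2, this yields cell 0 at (0, 0), cell 1 at (0, 200), cell 2 at (200, 0), and cell 3 at (200, 200).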

@joshbickett
Contributor

joshbickett commented Dec 9, 2023

@klxu03 you can ignore my last comment. I thought draw_label_with_background changed significantly but now I just see you added a condition for your -accurate method. All good, no concerns.

I am taking a closer look now. Got an error I haven't seen before when running normal operate without -accurate. Maybe a fluke, I'll look closer.
image

@joshbickett
Contributor

joshbickett commented Dec 9, 2023

@klxu03 Tried -accurate mode on a task and got this error. I'm very interested to see where this PR goes. Let me know when you think it is ready for more testing!
image

@slavakurilyak

+1 for accuracy mode

@joshbickett
Contributor

I am taking a closer look now. Got an error I haven't seen running normal operate without -accurate. Maybe a fluke, I'll look closer

@klxu03 let me know if you have any updates or thoughts on this. Thanks

@joshbickett
Contributor

@klxu03 curious if you have any updates. Looks like -accurate may still have issues. It may make sense to remove it for now until there are updates.

@klxu03
Contributor Author

klxu03 commented Jan 7, 2024

For sure remove it, it's likely outdated. My bad, I've been offline for a while on vacation. Returning later.

@joshbickett
Contributor

@klxu03 I did a rewrite of the project without accuracy mode. I think that multimodal models are going to solve this mouse click problem pretty soon. See CogAgent: https://arxiv.org/abs/2312.08914

I'll close this for now. If you have additional updates, let me know.
