adding README comment on -accuracy and beginning of the -accuracy grid rewrite, and delete Poetry artifacts from README #70
…EADME to include example of -accurate run
…nd and color in green, otherwise draw with white background and black color
Some premature work on having the model pick which grid cell to choose for the revamped -accuracy mode. The grid coordinates now display properly. I plan on modifying the idea: first cut out a 400px x 400px area around the originally guessed location, then have the model repeatedly pick which grid cell/quadrant to click on from there, cropping out the selected cell each time and upsampling the image 2x after every crop before passing it to GPT again.
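The crop-and-zoom loop described above could be sketched roughly as follows. This is a hypothetical illustration, not code from this PR: `initial_crop`, `cell_box`, `zoom`, and the `choose_cell` callback (a stand-in for the GPT call) are all assumed names. Boxes are tracked in screen pixels, so the 2x upsampling only changes the image shown to the model, not the coordinates being narrowed down.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels


def initial_crop(x: float, y: float, screen_w: int, screen_h: int, size: int = 400) -> Box:
    """Return a size x size box centered on the guessed point (x, y),
    shifted so that it stays inside the screen."""
    half = size / 2
    left = min(max(x - half, 0), max(screen_w - size, 0))
    top = min(max(y - half, 0), max(screen_h - size, 0))
    return (left, top, min(left + size, screen_w), min(top + size, screen_h))


def cell_box(box: Box, row: int, col: int, n: int) -> Box:
    """Return the box of cell (row, col) in an n x n grid laid over box."""
    left, top, right, bottom = box
    w = (right - left) / n
    h = (bottom - top) / n
    return (left + col * w, top + row * h, left + (col + 1) * w, top + (row + 1) * h)


def zoom(box: Box, choose_cell: Callable[[int, Box, int], Tuple[int, int]],
         grid_sizes: Sequence[int]) -> Tuple[float, float]:
    """Narrow box one grid step at a time and return the final center point.

    choose_cell(step, box, n) is a hypothetical stand-in for the GPT call; in
    the real loop the model would see this box cropped from the screenshot
    and upsampled 2x after each crop.
    """
    for step, n in enumerate(grid_sizes):
        row, col = choose_cell(step, box, n)
        box = cell_box(box, row, col, n)
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)
```

For example, on a 1920x1080 screen `initial_crop(960, 540, 1920, 1080)` gives the box `(760.0, 340.0, 1160.0, 740.0)`, and four zoom steps with grid sizes `[4, 4, 2, 2]` shrink it to a 6.25 px cell. If Pillow were used, the crop-and-upsample step itself could be an `Image.crop` followed by an `Image.resize` to twice the cropped size.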
Adding an implementation note for my future self: picking which grid cell to zoom into when deciding which pixel to click can be implemented cleanly by having the loop store the top-left and bottom-right percentages at each iteration. That way, at the end, you can just average the two percentages and return that as the pixel to click. Also, maybe start with 4 grid lines (dividing the area into 16 cells), and later, once it is more narrowed down, use only 2 grid lines (dividing the area into quarters). Something like two rounds of the 16-cell grid followed by two rounds of the 4-cell grid leaves a final region 400/(4^2 * 2^2) = 6.25 px on a side, so clicking its center is off by at most about 3 px in each dimension. That is pretty darn accurate, assuming the model picks the correct cell every time. Additionally, look into polling the model: ask it to generate 9 responses and choose the most popular grid selection, as a fail-safe against a single wrong grid choice.
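A sketch of the percentage-tracking idea from the note above, with the polling step added as a simple majority vote. All names here (`refine`, `majority_vote`, `polled_pick`, `final_cell_size`) are hypothetical, `pick_cell` and `sample_choice` stand in for model calls, and using `collections.Counter` for the vote is an assumption rather than something the note specifies.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x%, y%)


def refine(top_left: Point, bottom_right: Point,
           pick_cell: Callable[[int, Point, Point, int], Tuple[int, int]],
           grid_sizes: Sequence[int]) -> Point:
    """Keep only the top-left and bottom-right percentages at each step;
    the final click is the average of the two corners."""
    (x0, y0), (x1, y1) = top_left, bottom_right
    for step, n in enumerate(grid_sizes):
        # Hypothetical model call: which cell of the n x n grid to zoom into.
        row, col = pick_cell(step, (x0, y0), (x1, y1), n)
        w, h = (x1 - x0) / n, (y1 - y0) / n
        x0, y0 = x0 + col * w, y0 + row * h
        x1, y1 = x0 + w, y0 + h
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def majority_vote(choices: List[Hashable]) -> Hashable:
    """Return the most common choice (ties go to the first one seen)."""
    return Counter(choices).most_common(1)[0][0]


def polled_pick(sample_choice: Callable[[], Hashable], k: int = 9) -> Hashable:
    """Ask the model k times and keep the most popular grid selection."""
    return majority_vote([sample_choice() for _ in range(k)])


def final_cell_size(region_px: float, grid_sizes: Sequence[int]) -> float:
    """Side length left after splitting each axis by n at every step."""
    size = region_px
    for n in grid_sizes:
        size /= n
    return size
```

For the schedule in the note, `final_cell_size(400, [4, 4, 2, 2])` returns 6.25, so clicking the center of the last cell is off by at most 3.125 px along each axis. In a full loop, `pick_cell` could itself call `polled_pick` so that every zoom step is decided by the vote.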
Hmm, I'm curious for a bit more context on some of this commit. Hoping to keep most of the none
@joshbickett Hey, sorry, just seeing this now. What do you mean by "most of the none code the same"? I'm planning on basically having two different draw_labels behaviors. For normal mouse clicking, it shows the percentages in black on a white background. When choosing a grid cell, I skip the white rectangle and just draw the text in green, because at some point the view gets zoomed in to something like a 6px x 6px range, so a white rectangle taking up pixels doesn't seem like the best idea. That is especially true when the model should already know that the top-left cell is grid 0 and the numbering goes down, then right (column-major order).
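The column-major numbering mentioned above (grid 0 at the top left, counting down each column before moving right) can be converted to and from row/column positions as sketched below. The function names are hypothetical; the PR's own draw_labels code is not shown in this thread.

```python
from typing import List, Tuple


def cell_from_index(index: int, n_rows: int, n_cols: int) -> Tuple[int, int]:
    """Map a column-major grid label to (row, col); 0 is the top-left cell."""
    if not 0 <= index < n_rows * n_cols:
        raise ValueError(f"grid index {index} out of range")
    return index % n_rows, index // n_rows


def index_from_cell(row: int, col: int, n_rows: int) -> int:
    """Inverse of cell_from_index: labels run down a column, then move right."""
    return col * n_rows + row


def label_layout(n_rows: int, n_cols: int) -> List[List[int]]:
    """Return the labels as they would appear on screen, one list per row."""
    return [[index_from_cell(r, c, n_rows) for c in range(n_cols)] for r in range(n_rows)]
```

For a 2x2 grid this gives 0 top-left, 1 bottom-left, 2 top-right, and 3 bottom-right, i.e. `label_layout(2, 2) == [[0, 2], [1, 3]]`.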
…pture a mini screenshot based on top left and bottom right percentages
@klxu03 You can ignore my last comment; I'm taking a closer look now. Got an error I haven't seen running normal
@klxu03 Tried
+1 for accuracy mode
@klxu03 Let me know if you have any updates or thoughts on this. Thanks!
@klxu03 Curious if you have any updates. Looks like
For sure, remove it; it's likely outdated. My bad, I've been offline for a while on vacation. Returning later
@klxu03 I did a rewrite of the project without accuracy mode. I think multimodal models are going to solve this mouse-click problem pretty soon (see CogAgent: https://arxiv.org/abs/2312.08914). I'll close this for now. If you have additional updates, let me know.
Closes #77