
qrdet sometimes gives wrong quad_xy coords? #7

Open
vladimir-dudnik-1 opened this issue Feb 21, 2024 · 9 comments
@vladimir-dudnik-1 commented Feb 21, 2024

qrdet v2.4 installed from PyPI seems to give wrong quad_xy coords; see the resulting image below.

[image: out.jpg]

This result was obtained from this test image:

[image: test.jpg]

using the code below:
```python
#!/usr/bin/python3
from qrdet import QRDetector
import numpy as np
import cv2

detector = QRDetector(model_size='s')
image = cv2.imread(filename='test.jpg')
detections = detector.detect(image=image, is_bgr=True)

# Draw the detections
for detection in detections:
    x1, y1, x2, y2 = np.array(detection['bbox_xyxy'], np.int32)
    cv2.rectangle(image, (x1, y1), (x2, y2), color=(0, 255, 0), thickness=1)

    (qx1, qy1), (qx2, qy2), (qx3, qy3), (qx4, qy4) = np.array(detection['quad_xy'], np.int32)
    cv2.circle(image, (qx1, qy1), 4, color=(0, 0, 255), thickness=-1)
    cv2.circle(image, (qx2, qy2), 4, color=(0, 0, 255), thickness=-1)
    cv2.circle(image, (qx3, qy3), 4, color=(0, 0, 255), thickness=-1)
    cv2.circle(image, (qx4, qy4), 4, color=(0, 0, 255), thickness=-1)

    confidence = detection['confidence']
    cv2.putText(image, f'{confidence:.2f}', (x1, y1 - 10), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                fontScale=1, color=(0, 255, 0), thickness=2)

# Save the result
cv2.imwrite(filename='out.jpg', img=image)
```

@Eric-Canas (Owner)

Hi,

You are right. qrdet runs a segmentation model under the hood, so getting the four corners from the segmentation mask is a bit tricky and needs to be improved.

The method I'm using right now is defined in quadrilateral-fitter (a minimal usage sketch below).

That was my first approach, but it's definitely not perfect, and I should think about it again and find a better one.

Please, any ideas are welcome!

Thanks for sharing your test case; I will use it to validate the next quadrilateral-fitter approach.
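
For reference, a minimal sketch of that fitting step, following the quadrilateral-fitter README (the synthetic noisy_polygon is just a stand-in for a real segmentation-mask contour):

```python
import numpy as np
from quadrilateral_fitter import QuadrilateralFitter

# Synthetic stand-in for a noisy segmentation-mask contour: a square
# sampled with 25 jittered points per side.
rng = np.random.default_rng(0)
corners = np.array([[0., 0.], [100., 0.], [100., 100.], [0., 100.]])
noisy_polygon = np.concatenate(
    [np.linspace(corners[i], corners[(i + 1) % 4], 25, endpoint=False)
     for i in range(4)]
) + rng.normal(scale=2.0, size=(100, 2))

# Fit the best quadrilateral to the noisy contour.
fitter = QuadrilateralFitter(polygon=noisy_polygon)
fitted_quad = fitter.fit()  # four (x, y) corners
print(fitted_quad)
```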

@Trichy-man

Hi!

I was reading this paper, Identification of QR Code Perspective Distortion Based on Edge Directions and Edge Projections Analysis (https://www.mdpi.com/2313-433X/6/7/67); maybe you can find it useful. Right now I am trying to implement it and understand how it could fit into the repo.

Thanks for all the work done so far.

@Eric-Canas (Owner)

123948234 million thanks, @Trichy-man!

I'll read it. That's something I tried and failed at.

The yolov8_results_to_dict function is where I'm actually fitting the quadrilateral to each of the detected polygon masks and then building the dictionary that contains each polygon's features.

These dicts are the ones users get when calling detect().
My first plan was to use the accurate_polygon_xy mask to crop the subimage containing the QR, find the finder patterns in that subimage, and then include the coords of those three patterns in that dictionary. Something like... edge_coords_xy: {bl: (x, y), tl: (x, y), tr: (x, y)}

But I never found a reliable way of finding them.

One of the main reasons finding them would be so relevant is that a very useful value could be inferred from them: the QR rotation_degrees. That value would be extremely useful in a lot of applications, for example camera alignment, perspective correction, finding the rotation of objects...

Those are QR use cases that don't even need a readable QR.

I tried it and couldn't find a way. My second thought was to train a second CV model, based on some adaptation of human-pose detection models. But that approach is also quite time-consuming to implement.

If you are able to make edge detection work with that paper's approach, I'll be eternally grateful to you!

@Trichy-man

Hi @Eric-Canas

Probably we could take the subimage containing the QR and use Canny edge detection to 1) find the 1:1:3:1:1 ratio of the finder patterns, 2) compute the centroids, 3) find the 4th vertex, and 4) apply transforms (rotation, perspective, and cylindrical). A rough sketch of step 1 is below.
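
A hedged sketch of what step 1 could look like on a binarized crop (the Otsu threshold, the tolerance, and the row-wise run-length scan are all illustration-level assumptions, not the paper's exact method):

```python
import cv2
import numpy as np

def find_finder_pattern_rows(gray: np.ndarray, tol: float = 0.5) -> list:
    """Scan each row of a binarized QR crop for dark/light run sequences
    matching the 1:1:3:1:1 finder-pattern ratio. Returns (y, x_center) hits."""
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    hits = []
    for y, row in enumerate(binary):
        # Run-length encode the row: boundaries where the pixel value changes.
        changes = np.flatnonzero(np.diff(row)) + 1
        bounds = np.concatenate(([0], changes, [row.size]))
        runs = np.diff(bounds).astype(float)
        starts = bounds[:-1]
        # Test every window of five consecutive runs that starts on a dark run.
        for i in range(len(runs) - 4):
            if row[starts[i]] != 0:
                continue
            window = runs[i:i + 5]
            unit = window.sum() / 7.0  # 1+1+3+1+1 = 7 modules
            if np.all(np.abs(window - np.array([1., 1., 3., 1., 1.]) * unit)
                      <= tol * unit):
                hits.append((y, starts[i] + window.sum() / 2.0))
    return hits
```

Running the same scan over columns and clustering the intersections would give the centroids for step 2.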

@Eric-Canas (Owner)

It looks like a good approach.

Some notes about that:

I think the 4th point should be modified if placed here. Applying the transform is a step that should help with reading the QR, but qrdet's purpose is constrained to detection. QR detection+reading is done in QReader. This way, people using QRs only as key marks don't need the overhead and dependencies implied by the decoding part. QReader actually does that homography when trying to decode, but it is based on the quad_xy coords, which are not stable enough and could be improved this way.

However, something about that 4th point that I think could be super useful would be to directly calculate the transformation matrix and return it in that results_dict. That way, if the user needs to apply the transform to the QR, or even to the full image, we are already returning the matrix (a sketch below).
And it could be used directly in QReader to improve the stability of the decoding.
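
A minimal sketch of what returning that matrix could look like, assuming quad_xy arrives ordered tl, tr, br, bl (the helper name and the fixed output size are hypothetical):

```python
import cv2
import numpy as np

def quad_to_homography(quad_xy, side: int = 256) -> np.ndarray:
    """Hypothetical helper: 3x3 matrix that maps the detected quad onto an
    axis-aligned side x side square (corner order tl, tr, br, bl assumed)."""
    src = np.asarray(quad_xy, dtype=np.float32)
    dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])
    return cv2.getPerspectiveTransform(src, dst)

# Usage: rectified = cv2.warpPerspective(image, quad_to_homography(quad), (256, 256))
```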

In the future, I think the edge-detection Machine Learning model could work as a fallback for difficult use cases, where a tricky position, occlusions over the QR, crumpled papers... would make it hard to find the finder patterns with the Canny-based method. Having to run the model only in those tricky wild cases where the Canny approach didn't get a result should heavily reduce the overhead, speed up the detection, and improve precision.

@Trichy-man

A fast QR code detector based on a similar idea to the paper's is already present in the OpenCV library: modules -> objdetect -> src -> qrcode.cpp.
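
For reference, that detector is exposed in Python as cv2.QRCodeDetector; a minimal detection-only call looks like this:

```python
import cv2

# cv2.QRCodeDetector wraps modules/objdetect/src/qrcode.cpp.
image = cv2.imread('test.jpg')
detector = cv2.QRCodeDetector()
found, points = detector.detect(image)  # points: (1, 4, 2) corner array
if found:
    print(points.reshape(-1, 2))  # the four detected corners
```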

@Eric-Canas (Owner)

I used it heavily as a help while tagging the dataset. My experience was that it worked quite well for "test cases" where the QR is perfectly and clearly visible, and when it worked, the segmentation was almost pixel-perfect. But it didn't give very good results in the wild, with QRs that are not as clean, big, or flat (which is actually the most common usage of qrdet).

In fact, that's the one I used in the benchmark for QReader.

Do you think it would work well for detecting the corners once the subimage is already cropped? Maybe we can use a subpart of it? Maybe we could even adapt the paper's algorithm to improve the detection rate of the finder patterns by taking advantage of the segmentation mask we have. For example... enhancing the sensitivity of the algorithm, or being a bit more tolerant with that 1:1:3:1:1 ratio, and exploiting the fact that there are likely no other elements in the image besides the QR, since it is already segmented. That way we can discard False Positives by keeping only the three finder patterns that are closest to the edges (sketched after the next paragraph).

Or something along those lines. I think there is a clear advantage in the fact that the original algorithm is intended for detecting QRs in scenarios with other distracting elements that also produce edges when applying Canny, while our case is an oversimplified task: we already have a cropped image containing a QR with no background, and we only want to find where those finder-pattern corners are, in order to calculate rotations and transformation matrices and to give 3 (or 4, if we include the 4th vertex) anchor points that simplify the noisy segmentation mask.
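
A small sketch of that False Positive filter, assuming we already have candidate finder-pattern centers inside the cropped QR (the function and its inputs are illustrative):

```python
import numpy as np

def keep_three_nearest_to_corners(centers, crop_shape):
    """Illustrative filter: keep the three candidate finder-pattern centers
    closest to any corner of the crop, exploiting the fact that the crop
    contains only the already-segmented QR."""
    h, w = crop_shape[:2]
    corners = np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Distance from each candidate to its nearest crop corner.
    dists = np.linalg.norm(centers[:, None] - corners[None], axis=2).min(axis=1)
    return centers[np.argsort(dists)[:3]]
```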

Just to be on the same page: when thinking about the intended use case for qrdet, I think of something like a slightly crumpled paper containing a QR, lying in the middle of the grass. More or less, that kind of "in the wild".

@Trichy-man

There is quite a bit to discuss. Assuming that the quad fitted on the QR is correct, my first idea was to compute the angle of rotation (by fitting a straight line to a side of the quad, again assuming it is fitted correctly) and "guess" the correct rotation by rotating at most 4 times (a small sketch below). Otherwise we can find the FPs (finder patterns) in the cropped image to recover the rotation information.
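
A small sketch of that first idea (the modulo-90 result reflects the "rotate at most 4 times" guess; the corner order in quad_xy is an assumption):

```python
import numpy as np

def quad_rotation_degrees(quad_xy) -> float:
    """Illustrative: estimate the in-plane rotation from one side of the
    fitted quad. Only defined modulo 90 degrees until the finder patterns
    disambiguate which corner is which."""
    (x1, y1), (x2, y2) = quad_xy[0], quad_xy[1]  # assumed top side
    return np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 90.0
```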

As for the perspective and cylindrical transformations, we do not need the FPs, but only the fitted quad (in the cylindrical case we would need a curved quad, so my idea was to fill the whole QR to obtain a single black square and get the edges from that). The only implementation I was able to find for locating FPs was this one, https://github.com/omargamal253/Automatic-Segmentation-and-Alignment-of-QR-Codes/blob/main/IP%20Project/Find_And_Detect_Corners.m, based on black-and-white connected components.

Also, I was reading about Oriented Bounding Boxes (OBB) (https://docs.ultralytics.com/tasks/obb/).

@Eric-Canas (Owner) commented May 25, 2024

Hi!

About substituting the segmentator with Oriented Bounding Boxes: I don't think they would fit, as they can't represent perspective. And along the way we would lose the precise segmentation mask, which could be useful in some cases.

About collecting the finder patterns: if we assume the quad is correctly calculated, we could do that rotation until the straight line fits. In fact, something similar is done in QReader to correct the perspective when decoding the QR. The assumption that the quad coords are correct is not always true, but it is a first approach. They are currently calculated by assigning each point of the mask to one of the four sides of the quadrilateral and fitting a line to each. Sometimes it works, sometimes it doesn't.

Any effort spent now on calculating those transformation matrices from the quad_coords, even though those coords are unreliable right now, would automatically pay off once their calculation is improved by detecting the finder patterns. And at the end of the day, if finding them turns out to be difficult or erratic, I could finally tag them as a dataset and look for a way of adapting a pose-detection model to find those three keypoints. A simple and small yolo-pose-n would likely work for that purpose, as the problem is very constrained (a sketch below).
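
If it came to that, a hypothetical fine-tune with ultralytics could be as small as this ('qr_keypoints.yaml' is an assumed dataset config with kpt_shape: [3, 2], i.e. three x/y keypoints):

```python
from ultralytics import YOLO

# Hypothetical fine-tune of a small pose model to locate the three
# finder-pattern keypoints in cropped QR images.
model = YOLO('yolov8n-pose.pt')
model.train(data='qr_keypoints.yaml', epochs=100, imgsz=640)
```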

So if you find the task of detecting the finder patterns difficult, we can focus on the cylindrical transformation, and I'll just look for a way of training that model.

The part of finding the perspective + rotation transformation matrix is actually done here. The rotation is not completely real, as we don't yet have the finder-pattern information that tags precisely which of the quad_coords are tr, br, bl & tl. But it would be just a minor change once they are found.
