Part 2 of the CollectorVision series. Part 1 has the overview.
Before you can identify a card you have to find it. Cornelius is the model responsible for that — it takes a camera frame and predicts where the four corners of the card are.
This sounds easy. It's not.
What the detector has to deal with
Cards in a real scene come in a lot of shapes. They're held at angles, rotated, half-off-screen, in colored sleeves, on patterned table surfaces, or overlapping other cards. There's no clean white rectangle to look for. Classical approaches based on edge detection and contour finding work reasonably well in controlled conditions, but in practice they require a lot of tuning and tend to fail on anything unusual.
Using a learned model means the detector can be trained on examples of what "card in hand" actually looks like, rather than the idealized rectangle a textbook pipeline expects.
Architecture
Cornelius is a MobileViT-XXS backbone with SimCC coordinate heads. Input is a 384×384 RGB image, ImageNet-normalized. The output is four normalized (x, y) corner coordinates — one per corner of the card — plus a sharpness score.
MobileViT-XXS is a hybrid CNN/Transformer architecture designed for mobile inference. The full thing runs in around 10–15ms per frame on a laptop CPU, 30–50ms on a phone.
SimCC (Simple Coordinate Classification) is the interesting part. Rather than predicting corner positions directly as floating-point numbers, it turns each coordinate axis into a classification problem over a discretized grid. The model predicts a probability distribution over bins, and the expected value of that distribution is the corner position. This lets the model express uncertainty — a sharp peak in the distribution means high confidence; a flat distribution means the model doesn't know.
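As a concrete sketch of the decode step (the function name and the 384-bin grid here are illustrative, not Cornelius's actual internals): softmax each axis's logits over its bins, take the expected bin index as the coordinate, and keep the peak probability as the confidence signal.

```python
import numpy as np

def simcc_decode(logits, n_bins=384):
    """Decode one SimCC axis: softmax over bins, expected value as coordinate.

    `logits` is a 1-D array of per-bin scores for a single axis of a single
    corner. Bin count and names are illustrative, not the library's own.
    """
    z = logits - logits.max()                # numerical stability
    p = np.exp(z) / np.exp(z).sum()          # softmax over the bin grid
    coord = (p * np.arange(n_bins)).sum() / (n_bins - 1)  # normalized [0, 1]
    peak = p.max()                           # sharp peak = confident prediction
    return coord, peak
```

A sharply peaked distribution decodes to essentially its argmax; a flat one decodes to the grid center (0.5) with a tiny peak value, which is what the sharpness gate exploits.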
The sharpness gate
The sharpness score is the mean peak value across all eight softmax distributions (four corners times two axes). It turned out to be a useful proxy for "is there a well-framed card here?"
The model also outputs a card presence logit, but that one isn't reliable — it fires strongly on blank images and hands without cards. The sharpness signal is better. When the card is clearly in frame with clean corners, all eight distributions are sharply peaked. When there's nothing there, or the image is blurry, the distributions go flat.
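The score itself is simple to compute. A minimal sketch, assuming the raw head output is an (8, n_bins) array of logits (four corners times two axes; the exact tensor layout inside the library may differ):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sharpness_score(corner_logits):
    """Mean peak probability across all eight SimCC distributions."""
    probs = softmax(corner_logits)           # shape (8, n_bins)
    return float(probs.max(axis=-1).mean())
```

Peaked distributions push the score toward 1.0; flat ones pull it toward 1/n_bins, which is why a fixed threshold like 0.10 separates the two regimes cleanly.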
In practice, frames below a sharpness threshold of around 0.08–0.10 are just skipped. Nothing is forced.
detection = detector.detect(image)
if detection.sharpness < 0.10:
    continue  # try the next frame
Corner ordering
Four predicted corners still need to be assigned to the right positions — top-left, top-right, bottom-right, bottom-left. The ordering uses a standard geometric trick:
import numpy as np

s = pts.sum(axis=1)                  # x + y for each corner
d = np.diff(pts, axis=1).ravel()     # y - x for each corner (np.diff is col1 - col0)
tl = pts[np.argmin(s)]               # smallest x + y
tr = pts[np.argmin(d)]               # smallest y - x (large x, small y)
br = pts[np.argmax(s)]               # largest x + y
bl = pts[np.argmax(d)]               # largest y - x (small x, large y)
This works for any convex quadrilateral, however skewed by perspective. It does assume the card is roughly upright: rotate much past 45 degrees and the sum/diff assignments start swapping corners.
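Wrapped in a function and run on a deliberately shuffled, skewed quad (the coordinates are made up for illustration):

```python
import numpy as np

def order_corners(pts):
    """Return corners ordered (tl, tr, br, bl) via the sum/diff trick."""
    s = pts.sum(axis=1)                   # x + y per corner
    d = np.diff(pts, axis=1).ravel()      # y - x per corner
    return np.array([pts[np.argmin(s)],   # tl: smallest x + y
                     pts[np.argmin(d)],   # tr: large x, small y
                     pts[np.argmax(s)],   # br: largest x + y
                     pts[np.argmax(d)]])  # bl: small x, large y

pts = np.array([[310.0, 40.0],    # actually tr
                [20.0, 300.0],    # actually bl
                [50.0, 30.0],     # actually tl
                [290.0, 330.0]])  # actually br
ordered = order_corners(pts)      # tl, tr, br, bl
```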
Dewarp
Once the corners are ordered, a perspective transform maps the card to a fixed output rectangle. The canonical output size is 252 × 352 pixels — proportional to the physical card dimensions at 4 pixels per millimeter.
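The dewarp presumably wraps a standard 4-point perspective transform (OpenCV's getPerspectiveTransform/warpPerspective or equivalent). To make the math visible, here is the homography solved directly with numpy; the function names are mine, not the library's:

```python
import numpy as np

CARD_W, CARD_H = 252, 352  # canonical output, 4 px/mm

def perspective_matrix(src, dst):
    """Solve the 3x3 homography H with H @ [x, y, 1] ~ [u, v, 1] for 4 point pairs.

    Same job as cv2.getPerspectiveTransform, written out as the usual
    8x8 linear system with the bottom-right entry fixed to 1.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply the homography to one point, including the homogeneous divide."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])
```

Map the ordered corners to (0, 0), (251, 0), (251, 351), (0, 351) and every output pixel can be sampled from the source frame through the inverse transform.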
detection.dewarp(image) returns a PIL Image, always the same shape, regardless of how the card was held. That consistency matters a lot for the embedder — Milo always sees the same-shaped input.
Failure modes
Cornelius has trouble with:
- Cards lying flat on a complex surface, where the card border blends into the background
- Very high viewing angles (more than about 60 degrees off axis)
- Full-art cards with no clear light-colored border
For most practical use — card held roughly flat, camera more or less overhead — it works reliably. The sharpness gate handles bad frames by skipping them rather than producing wrong answers.
The pluggable interface
The library's CornerDetector interface is a protocol, not a base class. Anything with a detect(image) -> DetectionResult method works as a drop-in replacement. There's an example in the repo of a Canny-edge-based detector implemented in about 30 lines, mostly to show how the interface works.
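A minimal sketch of what a structural (Protocol-based) interface like this looks like. The DetectionResult fields are inferred from this post (corners, sharpness), so treat the exact shapes and names as assumptions:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable
import numpy as np

@dataclass
class DetectionResult:
    corners: np.ndarray   # (4, 2), ordered tl, tr, br, bl; fields are assumed
    sharpness: float

@runtime_checkable
class CornerDetector(Protocol):
    def detect(self, image) -> DetectionResult: ...

class FullFrameDetector:
    """Toy replacement: claims the whole frame is the card. No inheritance needed."""
    def detect(self, image) -> DetectionResult:
        h, w = image.shape[:2]
        pts = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], float)
        return DetectionResult(corners=pts, sharpness=1.0)
```

Because Protocol matching is structural, the swap-in never imports a base class from the library, which is what makes a 30-line Canny detector a valid drop-in.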
Next: Part 3 — Milo, the embedding model