CAP-94 Add embeddings module#16

Draft
tylerlu-hivemapper wants to merge 14 commits into main from cap-94-embeddings-module

@tylerlu-hivemapper tylerlu-hivemapper commented Apr 2, 2026

Summary

Two new beeutil modules for plugin developers to query scene embeddings and find matches.

beeutil.embeddings

  • fetch_and_match(since, query_embeddings) — fetch new embeddings, compare, return matches
  • list_embeddings(), cosine_similarity(), find_matches() — low-level primitives
  • load_query_embeddings() — stubbed, blocked on CAP-103 (endpoint TBD)

beeutil.recordings

  • get_videos_by_timerange(start_ms, end_ms) — find video files by time range
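A time-range lookup like this usually comes down to an interval-overlap test. The sketch below is only an illustration of that idea: the `VideoFile` record, the extra `videos` argument, and the file names are hypothetical, and the real module presumably discovers recordings itself rather than taking them as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoFile:
    """Hypothetical record; the real module may expose different fields."""
    path: str
    start_ms: int
    end_ms: int


def get_videos_by_timerange(start_ms, end_ms, videos):
    """Return videos whose [start_ms, end_ms] span overlaps the requested range.

    `videos` is an assumption made for this sketch so it can run without a device.
    """
    if start_ms > end_ms:
        raise ValueError("start_ms must not be greater than end_ms")
    # Two closed intervals overlap when each one starts before the other ends.
    hits = [v for v in videos if v.start_ms <= end_ms and v.end_ms >= start_ms]
    return sorted(hits, key=lambda v: v.start_ms)
```

A video that only partially covers the requested range is still returned, which is probably what a plugin wants when it needs the footage around a matched timestamp.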

Also: _constants.py for shared ODC base URL, 26 tests.

Key decisions

  • Plugin does the comparison, not the pipeline
  • Query embeddings format: {label, embedding, threshold} — consistent across backend, odc-api, SDK
  • Uses numpy for cosine similarity (np.dot)
  • list_embeddings() is a thin wrapper — returns whatever odc-api gives us, no SDK-side field filtering
  • Cursor always advances in fetch_and_match() even with no matches
  • load_query_embeddings() stubbed — endpoint TBD pending team alignment on where query embeddings live on device (dashcam user config vs plugin KV store vs separate collection)
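The decisions above can be sketched roughly as follows. This is not the code in the PR: it uses the standard library instead of numpy so it runs anywhere, the `timestamp` and `embedding` item fields and the returned `(matches, cursor)` tuple are assumptions, and `list_embeddings` is passed in as a callable only so the example has no network dependency.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_matches(items, query_embeddings):
    """Compare every scene item against every query and keep scores at or above the threshold."""
    matches = []
    for item in items:
        for query in query_embeddings:
            score = cosine_similarity(item["embedding"], query["embedding"])
            if score >= query["threshold"]:
                matches.append(
                    {"label": query["label"], "timestamp": item["timestamp"], "score": score}
                )
    return matches


def fetch_and_match(since, query_embeddings, list_embeddings):
    """Fetch items newer than `since`, match them, and return (matches, next_cursor).

    `list_embeddings` is injected here as an assumption; the real function
    takes only (since, query_embeddings).
    """
    items = list_embeddings(since)
    matches = find_matches(items, query_embeddings)
    # The cursor moves to the newest item seen even when nothing matched,
    # so the same items are not fetched and compared again on the next poll.
    next_cursor = max((item["timestamp"] for item in items), default=since)
    return matches, next_cursor
```

Advancing the cursor independently of the match result keeps each poll proportional to the number of new embeddings rather than to the whole backlog.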

@tylerlu-hivemapper tylerlu-hivemapper marked this pull request as ready for review April 3, 2026 21:15

@hiveclawgit hiveclawgit left a comment

I found 2 issue(s).

  • [NIT] ODC_API_BASE now defined in three places (src/beeutil/_constants.py:2)
  • [NIT] Exception chaining lost in except blocks (src/beeutil/embeddings.py:65, src/beeutil/recordings.py:53)
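For reference, the exception-chaining nit usually means re-raising with `raise ... from exc` so the original traceback is kept as `__cause__`. A minimal sketch follows; `EmbeddingsError` here is a local stand-in for the class in the PR, and `parse_items` is a hypothetical helper, not a function from the diff.

```python
import json


class EmbeddingsError(Exception):
    """Local stand-in for the SDK's error type."""


def parse_items(raw):
    """Parse a JSON payload and wrap failures without losing the original error."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        # `from exc` records the underlying error as __cause__, so the
        # traceback shows both the wrapper and the JSON failure.
        raise EmbeddingsError("Invalid JSON from odc-api") from exc
    if not isinstance(items, list):
        raise EmbeddingsError(f"Expected list from odc-api, got {type(items).__name__}")
    return items
```

Without `from exc`, Python still attaches the original error as `__context__`, but the traceback reads as if the wrapper failed while handling it, which is harder to debug.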

raise EmbeddingsError(f'Expected list from odc-api, got {type(items).__name__}')

valid = []
for item in items:

Contributor

If I understand this correctly, it's checking for sensor data. WDYT about moving this to a method in the records module, with the check parameterized as an argument in case a future use case isn't sensor data? That way embeddings.py stays concerned only with embeddings operations.

Contributor Author

Removed this logic. I think 1) it was checking for an edge case that's unlikely and 2) it shouldn't be the responsibility of this module/function.

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

I found 3 issue(s).

  • [P2] load_query_embeddings returns unvalidated data (src/beeutil/embeddings.py:253)
  • [P3] cosine_similarity silently produces wrong results for non-normalized vectors (src/beeutil/embeddings.py:155)
  • [INFO] Duplicate ODC_API_BASE definition in secrets.py (src/beeutil/secrets.py:33)
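The first two findings are related: a plain dot product equals cosine similarity only when both vectors have unit length, and unvalidated query data can carry zero vectors or out-of-range thresholds. One possible shape for a fix is sketched below. The function names are hypothetical, and the rules (non-empty label, finite non-zero embedding, threshold in [-1, 1]) are assumptions derived from the {label, embedding, threshold} format in the summary, not from the diff.

```python
import math
from numbers import Real


def _is_finite_number(value):
    """True for finite ints and floats; bools are rejected even though they subclass int."""
    return isinstance(value, Real) and not isinstance(value, bool) and math.isfinite(value)


def normalize(vec):
    """Scale a non-zero vector to unit length so a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def validate_query_embeddings(data):
    """Check each entry against the assumed {label, embedding, threshold} shape."""
    if not isinstance(data, list):
        raise ValueError(f"expected a list, got {type(data).__name__}")
    validated = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i}: expected an object, got {type(entry).__name__}")
        label = entry.get("label")
        embedding = entry.get("embedding")
        threshold = entry.get("threshold")
        if not isinstance(label, str) or not label:
            raise ValueError(f"entry {i}: label must be a non-empty string")
        if not isinstance(embedding, list) or not embedding:
            raise ValueError(f"entry {i}: embedding must be a non-empty list")
        if not all(_is_finite_number(x) for x in embedding):
            raise ValueError(f"entry {i}: embedding must contain only finite numbers")
        if all(x == 0 for x in embedding):
            raise ValueError(f"entry {i}: embedding must not be a zero vector")
        if not _is_finite_number(threshold) or not -1.0 <= threshold <= 1.0:
            raise ValueError(f"entry {i}: threshold must be a number in [-1, 1]")
        validated.append(
            {"label": label, "embedding": [float(x) for x in embedding], "threshold": float(threshold)}
        )
    return validated
```

Rejecting zero vectors at load time also means `normalize` never divides by zero on validated data.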

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

@jimmdd jimmdd left a comment

Re-review (changes since last review):

I found 1 issue(s).

  • [P3] Param forwarding and timeout no longer tested (tests/test_embeddings.py:161)

}
]

with patch('beeutil.embeddings.requests.get', return_value=mock_resp) as mock_get:

Contributor

[P3] Param forwarding and timeout no longer tested

The old test_list_embeddings_returns_valid_items verified that since/until are correctly forwarded as query params and that timeout=10 is set. The new version only asserts on the returned timestamps. Combined with the removal of test_list_embeddings_no_params, test_list_embeddings_since_only, test_list_embeddings_until_only, and test_poll_and_match_forwards_since_param, there is now no test verifying the HTTP call contract (correct params, timeout). A regression that silently drops the timeout would cause production hangs.

  • [NIT] Extra blank lines left behind (tests/test_embeddings.py:52-53, 140-141, 213-214, 328-330, 366): several spots have double/triple blank lines where tests were deleted. Minor cleanup.
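A contract test of the kind described could look roughly like this. It is a self-contained sketch: the `list_embeddings` below is a simplified stand-in that takes the HTTP getter as a hypothetical `http_get` argument, and the base URL and `/embeddings` path are placeholders, not the SDK's real values.

```python
from unittest.mock import MagicMock

ODC_API_BASE = "http://odc-api.invalid"  # placeholder, not the real base URL


def list_embeddings(since=None, until=None, *, http_get):
    """Simplified stand-in: forwards since/until as query params with a 10 s timeout."""
    params = {}
    if since is not None:
        params["since"] = since
    if until is not None:
        params["until"] = until
    resp = http_get(f"{ODC_API_BASE}/embeddings", params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()


def test_list_embeddings_forwards_params_and_timeout():
    mock_resp = MagicMock()
    mock_resp.json.return_value = [{"timestamp": 1500}]
    mock_get = MagicMock(return_value=mock_resp)

    result = list_embeddings(since=1000, until=2000, http_get=mock_get)

    # Assert on the HTTP call itself, not only on the parsed result.
    mock_get.assert_called_once()
    assert mock_get.call_args.kwargs["params"] == {"since": 1000, "until": 2000}
    assert mock_get.call_args.kwargs["timeout"] == 10
    assert result == [{"timestamp": 1500}]


def test_list_embeddings_omits_unset_params():
    mock_get = MagicMock()
    list_embeddings(http_get=mock_get)
    assert mock_get.call_args.kwargs["params"] == {}
    assert mock_get.call_args.kwargs["timeout"] == 10
```

In the actual suite the same assertions would presumably run against the `mock_get` obtained from `patch('beeutil.embeddings.requests.get', ...)`, as in the snippet quoted above, rather than an injected callable.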

@jimmdd jimmdd left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

@tylerlu-hivemapper tylerlu-hivemapper marked this pull request as draft April 6, 2026 20:46
@jimmdd jimmdd left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

# Conflicts:
#	requirements.txt

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

# Conflicts:
#	src/beeutil/__init__.py
@tylerlu-hivemapper tylerlu-hivemapper force-pushed the cap-94-embeddings-module branch from 214e606 to 2754d52 Compare April 7, 2026 23:49

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.

@hiveclawgit hiveclawgit left a comment

Re-review (changes since last review):

✅ Looks good. I did not identify a discrete regression in this diff.
