Fix candidate search ordering and test schema #150

Merged
michaelmwu merged 1 commit into main from michaelmwu/candidate-search-ordering
Mar 4, 2026

michaelmwu (Member) commented Mar 4, 2026

Description

  • Aggregate Discord member rows before joining, and order by coalesced member status.
  • Align candidate search E2E test schema with migrations (constraints, timestamps, triggers, indexes).

Related Issue

  • N/A

How Has This Been Tested?

Summary by CodeRabbit

  • New Features

    • Enhanced candidate search ranking to prioritize skill strength and preferred-match metrics alongside member status for improved result ordering.
  • Tests

    • Updated test infrastructure and database schema with new fields for tracking timestamps and member information.

coderabbitai Bot commented Mar 4, 2026

Walkthrough

The PR refactors the candidate search query to aggregate Discord user information into a new CTE (dm_agg) and expands result ordering criteria. It simultaneously updates test database schemas to include new metadata columns (created_at, updated_at, is_member, github_username, seniority, etc.) with auto-update triggers, removes the UNIQUE constraint on crm_contact_id, and extends fixture logic to populate the new fields.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Candidate Search Query Logic: packages/shared/src/five08/candidate_search.py | Introduces a new dm_agg CTE to pre-aggregate Discord user metadata (discord_user_id, discord_username, display_name, roles) from discord_members. Replaces direct join to discord_members with join to dm_agg. Extends ORDER BY from just p.is_member to include required_skill_score and preferred_matched for broader ranking criteria. |
| Test Database Schemas & Fixtures: tests/integration/test_candidate_search_e2e.py | Updates people table schema: removes UNIQUE constraint from crm_contact_id, adds created_at/updated_at (TIMESTAMPTZ with auto-update triggers), contact_type, address_city, github_username, is_member, seniority, linkedin, latest_resume_id, latest_resume_name, and sync_status CHECK constraint. Updates discord_members table with roles JSONB default and timestamp columns. Adds indexes on people (email, discord_user_id, skills GIN, is_member, seniority, address_country) and discord_members. Introduces CREATE_UPDATED_AT_FUNCTION and CREATE_UPDATED_AT_TRIGGER for auto-timestamp updates. Updates _insert helper to populate new columns with Jsonb wrappers. |
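
To illustrate why aggregating before the join matters, here is a minimal sketch. It is not the PR's query: it uses an in-memory SQLite database in place of PostgreSQL, a toy schema with one role per discord_members row, and COUNT(DISTINCT role) in place of jsonb_agg. All of those are assumptions made for the example. It shows that a direct join repeats a person once per matching member row, while joining a pre-aggregated CTE keeps one row per person.

```python
import sqlite3

# Toy schema (assumed for illustration; the real tables have more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, discord_user_id TEXT);
    CREATE TABLE discord_members (discord_user_id TEXT, role TEXT);
    INSERT INTO people VALUES (1, 'Ada', 'u1'), (2, 'Bo', 'u2');
    INSERT INTO discord_members VALUES
        ('u1', 'backend'), ('u1', 'mentor'), ('u1', 'admin');
""")

# Direct join: one output row per matching discord_members row.
direct = conn.execute("""
    SELECT p.name, dm.role
    FROM people p
    LEFT JOIN discord_members dm ON dm.discord_user_id = p.discord_user_id
""").fetchall()

# Pre-aggregated join: dm_agg holds at most one row per discord_user_id,
# so the join cannot multiply people rows.
aggregated = conn.execute("""
    WITH dm_agg AS (
        SELECT discord_user_id, COUNT(DISTINCT role) AS role_count
        FROM discord_members
        GROUP BY discord_user_id
    )
    SELECT p.name, COALESCE(dm_agg.role_count, 0)
    FROM people p
    LEFT JOIN dm_agg ON dm_agg.discord_user_id = p.discord_user_id
    ORDER BY p.id
""").fetchall()

print(len(direct))  # 4: Ada appears three times, Bo once
print(aggregated)   # [('Ada', 3), ('Bo', 0)]
```

With duplicated rows, any LIMIT or ranking applied after the join counts the same person several times, which is one plausible reason to aggregate first.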

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A CTE blooms where Discord dwells,
Aggregating tales and bells,
With timestamps auto-ticking true,
The schema dances, fresh and new—
Sorting by skill and favor's gleam! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix candidate search ordering and test schema' accurately reflects the main changes: ordering logic updates in candidate_search.py and schema alignment in the test integration file. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/shared/src/five08/candidate_search.py (1)

195-200: ⚠️ Potential issue | 🟠 Major

SQL ranking is still being overridden downstream.

This ORDER BY is not final because Line 264 re-sorts in Python with a different precedence (it omits timezone_matched and uses Discord role count instead of the SQL binary match), so candidate order can still diverge from the intended SQL ranking.

💡 Suggested alignment fix
-    # Secondary sort: preserve primary SQL ranking, break ties with seniority alignment.
-    results.sort(
-        key=lambda c: (
-            not c.is_member,
-            -len(c.matched_required_skills),
-            -len(c.matched_discord_roles),
-            -c.required_skill_score,
-            -len(c.matched_preferred_skills),
-            -c.seniority_score,
-        )
-    )
+    preferred_tz_set = set(preferred_timezones)
+    results.sort(
+        key=lambda c: (
+            not c.is_member,
+            -(1 if preferred_tz_set and c.timezone in preferred_tz_set else 0),
+            -len(c.matched_required_skills),
+            -(1 if c.matched_discord_roles else 0),
+            -c.required_skill_score,
+            -len(c.matched_preferred_skills),
+            -c.seniority_score,
+        )
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/src/five08/candidate_search.py` around lines 195 - 200, The
SQL ORDER BY you added (is_member, timezone_matched, required_matched,
discord_role_matched, required_skill_score, preferred_matched) is being
overridden by the Python re-sort later (around line 264); update the downstream
sort used in the function in this module so it preserves the same precedence and
semantics as the SQL ORDER BY (or remove the Python re-sort entirely).
Specifically, ensure the Python comparator/key uses the same tuple ordering:
is_member, timezone_matched (include this boolean), required_matched,
discord_role_matched as a binary match (not a count), required_skill_score,
preferred_matched — so the final ordering matches the SQL ranking.
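
A small sketch of how the two sort keys can disagree. The Candidate class below is a hypothetical stand-in with only the fields needed for the comparison, not the module's real result type. It compares a key that ranks by Discord role count against one that treats role match as binary.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical, trimmed-down stand-in for the real result object.
    name: str
    is_member: bool
    matched_discord_roles: list = field(default_factory=list)
    required_skill_score: float = 0.0


def count_key(c):
    # Ranks by how many roles matched (the behaviour flagged in the review).
    return (not c.is_member, -len(c.matched_discord_roles), -c.required_skill_score)


def binary_key(c):
    # Ranks by whether any role matched, mirroring a boolean SQL column.
    return (
        not c.is_member,
        -(1 if c.matched_discord_roles else 0),
        -c.required_skill_score,
    )


candidates = [
    Candidate("one_role_high_score", True, ["backend"], 0.9),
    Candidate("two_roles_low_score", True, ["backend", "mentor"], 0.4),
]

by_count = [c.name for c in sorted(candidates, key=count_key)]
by_binary = [c.name for c in sorted(candidates, key=binary_key)]

print(by_count)   # ['two_roles_low_score', 'one_role_high_score']
print(by_binary)  # ['one_role_high_score', 'two_roles_low_score']
```

Under the count-based key the extra role outranks the higher skill score; under the binary key both candidates tie on role match and the skill score decides.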
🧹 Nitpick comments (2)
tests/integration/test_candidate_search_e2e.py (1)

58-66: Remove the duplicate people(discord_user_id) index.

CONSTRAINT uq_people_discord_user_id UNIQUE (discord_user_id) (Line 54) already creates a btree index, so idx_people_discord_user_id duplicates it and adds unnecessary setup/write overhead.

🧹 Suggested cleanup
 _CREATE_PEOPLE_INDEXES = [
     "CREATE INDEX IF NOT EXISTS idx_people_email ON people (email)",
     "CREATE INDEX IF NOT EXISTS idx_people_email_508 ON people (email_508)",
-    "CREATE INDEX IF NOT EXISTS idx_people_discord_user_id ON people (discord_user_id)",
     "CREATE INDEX IF NOT EXISTS idx_people_skills ON people USING gin (skills)",
     "CREATE INDEX IF NOT EXISTS idx_people_is_member ON people (is_member)",
     "CREATE INDEX IF NOT EXISTS idx_people_seniority ON people (seniority)",
     "CREATE INDEX IF NOT EXISTS idx_people_address_country ON people (address_country)",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_candidate_search_e2e.py` around lines 58 - 66, The
_CREATE_PEOPLE_INDEXES list contains a redundant index entry
"idx_people_discord_user_id" on people(discord_user_id) that duplicates the
btree index already created by the unique constraint CONSTRAINT
uq_people_discord_user_id UNIQUE (discord_user_id); remove the "CREATE INDEX IF
NOT EXISTS idx_people_discord_user_id ON people (discord_user_id)" element from
_CREATE_PEOPLE_INDEXES to avoid duplicate index creation and extra write
overhead.
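
The redundancy can be observed with a quick sketch. It uses SQLite as a stand-in, which is an assumption: the test suite targets PostgreSQL, where the equivalent check would be a query against the pg_indexes view. SQLite also backs a UNIQUE constraint with an automatic index, so the effect is the same in spirit. The helper indexed_columns is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        discord_user_id TEXT,
        CONSTRAINT uq_people_discord_user_id UNIQUE (discord_user_id)
    )
""")


def indexed_columns(conn, table):
    """Map each index name on `table` to the list of columns it covers."""
    result = {}
    for row in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        name = row[1]  # second field of index_list is the index name
        info = conn.execute(f'PRAGMA index_info("{name}")').fetchall()
        result[name] = [r[2] for r in info]  # third field is the column name
    return result


before = indexed_columns(conn, "people")
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_people_discord_user_id "
    "ON people (discord_user_id)"
)
after = indexed_columns(conn, "people")

print(len(before), len(after))  # 1 2
print(sorted(after.values()))   # both indexes cover ['discord_user_id']
```

The constraint alone already yields an index on discord_user_id; the explicit CREATE INDEX adds a second one over the same column.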
packages/shared/src/five08/candidate_search.py (1)

109-112: Make dm_agg.roles ordering deterministic.

jsonb_agg(DISTINCT role) has unspecified element order; adding ORDER BY role avoids unstable role ordering in results.

♻️ Suggested query tweak
-                COALESCE(
-                    jsonb_agg(DISTINCT role) FILTER (WHERE role IS NOT NULL),
-                    '[]'::jsonb
-                ) AS roles
+                COALESCE(
+                    jsonb_agg(DISTINCT role ORDER BY role) FILTER (WHERE role IS NOT NULL),
+                    '[]'::jsonb
+                ) AS roles
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/src/five08/candidate_search.py` around lines 109 - 112, The
roles aggregation is nondeterministic because jsonb_agg(DISTINCT role) does not
guarantee element order; update the aggregate expression (the COALESCE wrapping
jsonb_agg(DISTINCT role) FILTER (WHERE role IS NOT NULL) AS roles) to include a
deterministic ordering (e.g., add ORDER BY role inside the jsonb_agg call) so
dm_agg.roles are returned in a stable, predictable order.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between f071694 and 11805f1.

📒 Files selected for processing (2)
  • packages/shared/src/five08/candidate_search.py
  • tests/integration/test_candidate_search_e2e.py

michaelmwu merged commit 82a5f9b into main on Mar 4, 2026
5 checks passed
michaelmwu deleted the michaelmwu/candidate-search-ordering branch on March 4, 2026 09:56