fix: candidate role search SQL#149
Conversation
📝 WalkthroughWalkthroughAdds Discord member snapshotting and role-based matching: new Changes
Sequence Diagram(s)sequenceDiagram
participant Bot as Discord Bot
participant Audit as five08.audit (upsert_discord_member)
participant DB as Postgres (discord_members)
participant Search as Candidate Search (SQL)
participant Mapper as CandidateMatch mapper
Bot->>Audit: upsert_discord_member(discord_user_id, guild_id, ..., roles)
Audit->>DB: INSERT ... ON CONFLICT UPDATE discord_members
DB-->>Audit: ack
Search->>DB: candidate search query JOIN people FULL OUTER JOIN discord_members CROSS JOIN rtypes
DB-->>Search: rows (discord_user_id, has_crm_link, matched_discord_roles, ...)
Search->>Mapper: map rows -> CandidateMatch(discord_user_id, has_crm_link, ...)
Mapper-->>Search: ranked/filtered results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/shared/src/five08/job_match.py (1)
57-70: Add defensive type checks in normalization.Line 62 assumes every element is a string. If
discord_role_typesis built from partially validated input, this can raise at runtime. A small guard would make this helper safer.🛡️ Suggested hardening
def _normalize_discord_role_types(values: list[str]) -> list[str]: """Normalize and validate discord role types against the canonical list.""" normalized: list[str] = [] seen: set[str] = set() for raw in values: + if not isinstance(raw, str): + continue key = raw.strip().casefold() if not key: continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/job_match.py` around lines 57 - 70, The helper _normalize_discord_role_types assumes every item is a str; add a defensive type check inside the loop (e.g., if not isinstance(raw, str): continue) before calling strip() and casefold() so non-string inputs are skipped (or coerce via str(raw) if you prefer conversion), keeping the rest of the canonical lookup and duplicate suppression logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/shared/src/five08/job_match.py`:
- Around line 120-125: extract_job_requirements is pre-filtering
discord_role_types using exact-name matching before the dataclass __post_init__
(which calls _normalize_discord_role_types) runs, so alias variants get dropped;
fix by normalizing the incoming discord_role_types in the extraction path (call
_normalize_discord_role_types or use the same normalization helper) inside
extract_job_requirements prior to any pre-filtering/filtering logic so the rest
of the function and downstream matching operate on the canonical names (refer to
__post_init__, _normalize_discord_role_types, extract_job_requirements, and the
discord_role_types parameter).
---
Nitpick comments:
In `@packages/shared/src/five08/job_match.py`:
- Around line 57-70: The helper _normalize_discord_role_types assumes every item
is a str; add a defensive type check inside the loop (e.g., if not
isinstance(raw, str): continue) before calling strip() and casefold() so
non-string inputs are skipped (or coerce via str(raw) if you prefer conversion),
keeping the rest of the canonical lookup and duplicate suppression logic
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a995a64b-b855-410c-b533-9dc203162ead
📒 Files selected for processing (1)
packages/shared/src/five08/job_match.py
| def __post_init__(self) -> None: | ||
| object.__setattr__( | ||
| self, | ||
| "discord_role_types", | ||
| _normalize_discord_role_types(self.discord_role_types), | ||
| ) |
There was a problem hiding this comment.
Alias normalization is still bypassed in the extraction path.
__post_init__ normalizes on Line 124, but extract_job_requirements pre-filters discord_role_types using exact-name matching on Line 273-Line 280. Variants like devops/fullstack can be dropped before this normalization hook runs, causing missed matches downstream.
💡 Suggested fix (use a single normalization path)
- # Accept values from the known Discord role names list, tolerant of case/whitespace drift.
- role_name_by_key = {name.casefold(): name for name in DISCORD_SKILL_ROLE_NAMES}
- discord_role_types: list[str] = []
- seen_roles: set[str] = set()
- for raw_role in _coerce_str_list(data.get("discord_role_types")):
- canonical = role_name_by_key.get(raw_role.strip().casefold())
- if canonical and canonical not in seen_roles:
- seen_roles.add(canonical)
- discord_role_types.append(canonical)
+ # Let JobRequirements.__post_init__ apply canonicalization + dedupe consistently.
+ discord_role_types = _coerce_str_list(data.get("discord_role_types"))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/src/five08/job_match.py` around lines 120 - 125,
extract_job_requirements is pre-filtering discord_role_types using exact-name
matching before the dataclass __post_init__ (which calls
_normalize_discord_role_types) runs, so alias variants get dropped; fix by
normalizing the incoming discord_role_types in the extraction path (call
_normalize_discord_role_types or use the same normalization helper) inside
extract_job_requirements prior to any pre-filtering/filtering logic so the rest
of the function and downstream matching operate on the canonical names (refer to
__post_init__, _normalize_discord_role_types, extract_job_requirements, and the
discord_role_types parameter).
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/test_candidate_search_e2e.py (1)
48-56: Align testdiscord_membersDDL with migration schema to avoid driftLine 48-56 defines a reduced table shape vs production migration (
apps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.py, Lines 15-41), which includescreated_at,updated_at, index, and update trigger. Keeping E2E setup closer to migration DDL will reduce false confidence when query/model usage evolves.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_candidate_search_e2e.py` around lines 48 - 56, Test DDL for _CREATE_DISCORD_MEMBERS_TABLE is missing production fields and metadata; update the SQL string to match the migration by adding created_at and updated_at columns (with TIMESTAMP WITH TIME ZONE and sensible DEFAULT now()), create the same index on guild_id used in the migration, and attach the same update trigger/function that maintains updated_at (ensure the trigger function referenced in the migration is created in the test setup or invoked the same way) so the test table shape and behavior mirror production.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@apps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.py`:
- Around line 19-20: The migration currently sets discord_user_id as the primary
key which causes rows for the same user in different guilds to overwrite each
other; update the table definition in the migration (the columns named
"discord_user_id" and "guild_id") to use a composite primary key of
("discord_user_id", "guild_id") instead of making discord_user_id the sole
primary key (or alternatively add a separate surrogate id primary key and index
discord_user_id and guild_id together); modify the sa.Column definitions and the
table primary_key constraint accordingly so each (discord_user_id, guild_id)
pair is unique.
In `@packages/shared/src/five08/candidate_search.py`:
- Around line 105-120: The query is incorrectly setting crm_contact_id to a
synthetic "discord:*" value which mixes real CRM IDs with discord identifiers
and breaks downstream URL generation (see crm_contact_id usage in discord_bot
cog); revert crm_contact_id to only real CRM IDs by removing the COALESCE
fallback that prefixes discord_user_id, keep COALESCE(p.crm_contact_id, ...) AS
crm_contact_id so it remains NULL when no CRM record exists, and instead ensure
downstream logic uses has_crm_link (p.crm_contact_id IS NOT NULL) and
COALESCE(p.discord_user_id, dm.discord_user_id) AS discord_user_id for
routing/display when there is no CRM link.
---
Nitpick comments:
In `@tests/integration/test_candidate_search_e2e.py`:
- Around line 48-56: Test DDL for _CREATE_DISCORD_MEMBERS_TABLE is missing
production fields and metadata; update the SQL string to match the migration by
adding created_at and updated_at columns (with TIMESTAMP WITH TIME ZONE and
sensible DEFAULT now()), create the same index on guild_id used in the
migration, and attach the same update trigger/function that maintains updated_at
(ensure the trigger function referenced in the migration is created in the test
setup or invoked the same way) so the test table shape and behavior mirror
production.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c162891c-607e-4f86-a6ac-5fa75636a19a
📒 Files selected for processing (5)
apps/discord_bot/src/five08/discord_bot/cogs/crm.pyapps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.pypackages/shared/src/five08/audit.pypackages/shared/src/five08/candidate_search.pytests/integration/test_candidate_search_e2e.py
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/test_candidate_search_e2e.py (1)
48-57: Keep E2E schema aligned with migrations to catch real join/constraint bugs.The test DDL adds
discord_members, but it omits migration-level constraints/index/metadata (e.g., peopleUNIQUE(discord_user_id)from20260221_0200_create_people_and_audit_events.pyLine 58 and discord_members timestamp/trigger/index from20260304_0200_create_discord_members_table.pyLine 48-Line 68). This weakens parity and can hide duplicate-row regressions.Also applies to: 77-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_candidate_search_e2e.py` around lines 48 - 57, Update the test DDL to match migration-level constraints and indexes: modify the CREATE_DISCORD_MEMBERS_TABLE test constant to include the same timestamp columns, trigger, and any indexes present in the 20260304_0200_create_discord_members_table migration, and ensure the people table DDL used in tests includes the UNIQUE(discord_user_id) constraint as in 20260221_0200_create_people_and_audit_events.py; specifically adjust the CREATE_DISCORD_MEMBERS_TABLE block (and the other DDL block referenced around lines 77-85) so test schema, constraints (UNIQUE(discord_user_id)), timestamps, triggers and indexes mirror the migrations exactly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/shared/src/five08/candidate_search.py`:
- Around line 162-164: The ORDER BY is using the raw p.is_member which can be
NULL for discord-only rows and therefore sorts ahead of false in DESC order;
update the ordering to use the same coalesced value used in the SELECT (e.g.,
COALESCE(p.is_member, false) or the selected alias like is_member) so
member=true rows consistently sort before non-members before the LIMIT is
applied (replace references to p.is_member in ORDER BY with the coalesced
expression/alias).
- Around line 162-163: The FULL OUTER JOIN against discord_members using only
discord_user_id can produce duplicate candidate rows when a user exists in
multiple guilds; modify the query in candidate_search.py so that discord_members
is pre-aggregated (e.g., a derived table or CTE) grouped by discord_user_id to
collapse multi-guild rows into one before performing the FULL OUTER JOIN (the
JOIN that references discord_members dm ON dm.discord_user_id =
p.discord_user_id), ensuring the rest of the query (including rtypes CROSS JOIN)
operates on unique discord_user_id rows and preserves correct LIMIT and ranking.
---
Nitpick comments:
In `@tests/integration/test_candidate_search_e2e.py`:
- Around line 48-57: Update the test DDL to match migration-level constraints
and indexes: modify the CREATE_DISCORD_MEMBERS_TABLE test constant to include
the same timestamp columns, trigger, and any indexes present in the
20260304_0200_create_discord_members_table migration, and ensure the people
table DDL used in tests includes the UNIQUE(discord_user_id) constraint as in
20260221_0200_create_people_and_audit_events.py; specifically adjust the
CREATE_DISCORD_MEMBERS_TABLE block (and the other DDL block referenced around
lines 77-85) so test schema, constraints (UNIQUE(discord_user_id)), timestamps,
triggers and indexes mirror the migrations exactly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c795bd7d-b258-4c37-b7f3-a2a3a81d1540
📒 Files selected for processing (6)
apps/discord_bot/src/five08/discord_bot/cogs/crm.pyapps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.pypackages/shared/src/five08/audit.pypackages/shared/src/five08/candidate_search.pypackages/shared/src/five08/job_match.pytests/integration/test_candidate_search_e2e.py
🚧 Files skipped from review as they are similar to previous changes (3)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/audit.py
- apps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.py
| FULL OUTER JOIN discord_members dm ON dm.discord_user_id = p.discord_user_id | ||
| CROSS JOIN rtypes | ||
| WHERE (p.sync_status = 'active' OR p.sync_status IS NULL) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd /repo && find . -name "candidate_search.py" -type fRepository: 508-dev/508-workflows
Length of output: 120
🏁 Script executed:
pwd && git ls-files | grep candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 166
🏁 Script executed:
wc -l /home/jailuser/git/packages/shared/src/five08/candidate_search.py && head -200 /home/jailuser/git/packages/shared/src/five08/candidate_search.py | tail -100Repository: 508-dev/508-workflows
Length of output: 4150
🏁 Script executed:
sed -n '140,170p' /home/jailuser/git/packages/shared/src/five08/candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 1454
🏁 Script executed:
sed -n '162,180p' /home/jailuser/git/packages/shared/src/five08/candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 819
🏁 Script executed:
sed -n '180,195p' /home/jailuser/git/packages/shared/src/five08/candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 512
Fix member-priority ordering for FULL OUTER JOIN rows before LIMIT.
After the FULL OUTER JOIN at line 162, p.is_member can be NULL for discord-only members. The SELECT clause correctly coalesces this to false, but the ORDER BY at line 182 uses the raw p.is_member column. In PostgreSQL, NULL sorts before false in DESC order, causing discord-only members to rank ahead and skew top-N results before LIMIT. Update the ORDER BY to use the coalesced expression for consistency:
Ordering fix
ORDER BY
- p.is_member DESC,
+ COALESCE(p.is_member, false) DESC,
timezone_matched DESC,
required_matched DESC,
discord_role_matched DESC,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/src/five08/candidate_search.py` around lines 162 - 164, The
ORDER BY is using the raw p.is_member which can be NULL for discord-only rows
and therefore sorts ahead of false in DESC order; update the ordering to use the
same coalesced value used in the SELECT (e.g., COALESCE(p.is_member, false) or
the selected alias like is_member) so member=true rows consistently sort before
non-members before the LIMIT is applied (replace references to p.is_member in
ORDER BY with the coalesced expression/alias).
| FULL OUTER JOIN discord_members dm ON dm.discord_user_id = p.discord_user_id | ||
| CROSS JOIN rtypes |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "candidate_search.py" -type fRepository: 508-dev/508-workflows
Length of output: 113
🏁 Script executed:
find . -path "*/migrations/*discord_members*" -o -path "*/migrations/*20260304*"Repository: 508-dev/508-workflows
Length of output: 262
🏁 Script executed:
find . -type f -name "*.py" | grep -i migration | head -20Repository: 508-dev/508-workflows
Length of output: 708
🏁 Script executed:
wc -l ./packages/shared/src/five08/candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 117
🏁 Script executed:
sed -n '155,175p' ./packages/shared/src/five08/candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 892
🏁 Script executed:
cat -n ./apps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.pyRepository: 508-dev/508-workflows
Length of output: 2870
🏁 Script executed:
rg "SELECT|FROM|JOIN|WHERE" ./packages/shared/src/five08/candidate_search.py -A 1 -B 1 | head -80Repository: 508-dev/508-workflows
Length of output: 2010
🏁 Script executed:
cat -n ./packages/shared/src/five08/candidate_search.pyRepository: 508-dev/508-workflows
Length of output: 11833
🏁 Script executed:
rg "password|secret|key|token|api_key|credential|api|auth" ./packages/shared/src/five08/candidate_search.py -iRepository: 508-dev/508-workflows
Length of output: 88
Guard against duplicate candidates from multi-guild Discord rows.
Line 162 joins discord_members only on discord_user_id. Because discord_members is keyed by (guild_id, discord_user_id) (see migration at apps/worker/src/five08/worker/migrations/versions/20260304_0200_create_discord_members_table.py lines 29–33), one person can appear in multiple rows for different guilds. This causes the FULL OUTER JOIN to duplicate candidate rows, distorting the LIMIT and ranking results.
Aggregate discord_members by discord_user_id before the join to consolidate multi-guild membership into a single row per user:
Suggested SQL shape
WITH
req AS (SELECT %s::text[] AS skills),
pref AS (SELECT %s::text[] AS skills),
- rtypes AS (SELECT %s::text[] AS types)
+ rtypes AS (SELECT %s::text[] AS types),
+ dm_agg AS (
+ SELECT
+ discord_user_id,
+ MAX(discord_username) AS discord_username,
+ MAX(display_name) AS display_name,
+ COALESCE(
+ jsonb_agg(DISTINCT role) FILTER (WHERE role IS NOT NULL),
+ '[]'::jsonb
+ ) AS roles
+ FROM discord_members dm
+ LEFT JOIN LATERAL jsonb_array_elements_text(dm.roles) AS role ON TRUE
+ GROUP BY discord_user_id
+ )
...
-FROM people p
-FULL OUTER JOIN discord_members dm ON dm.discord_user_id = p.discord_user_id
+FROM people p
+FULL OUTER JOIN dm_agg dm ON dm.discord_user_id = p.discord_user_id🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/src/five08/candidate_search.py` around lines 162 - 163, The
FULL OUTER JOIN against discord_members using only discord_user_id can produce
duplicate candidate rows when a user exists in multiple guilds; modify the query
in candidate_search.py so that discord_members is pre-aggregated (e.g., a
derived table or CTE) grouped by discord_user_id to collapse multi-guild rows
into one before performing the FULL OUTER JOIN (the JOIN that references
discord_members dm ON dm.discord_user_id = p.discord_user_id), ensuring the rest
of the query (including rtypes CROSS JOIN) operates on unique discord_user_id
rows and preserves correct LIMIT and ranking.
Description
Fix discord role matching query to avoid text/text[] operator errors by joining rtypes.
Expand candidate search e2e schema and add a role-type match test.
Related Issue
N/A
How Has This Been Tested?
Not run (pre-commit ran ruff, ruff format, mypy).
Summary by CodeRabbit
New Features
Bug Fixes
Database
Tests