Skip to content

feat: add job match extraction and ranked candidate search#142

Merged
michaelmwu merged 7 commits into
mainfrom
michaelmwu/job-match-cmd
Mar 3, 2026
Merged

feat: add job match extraction and ranked candidate search#142
michaelmwu merged 7 commits into
mainfrom
michaelmwu/job-match-cmd

Conversation

@michaelmwu
Copy link
Copy Markdown
Member

@michaelmwu michaelmwu commented Mar 3, 2026

Summary

Adds a complete job matching pipeline: OpenAI-powered job requirement extraction, PostgreSQL-based ranked candidate search with skill/seniority/location matching, and a Discord /match-candidates command to find top candidates from the people cache for a given job posting.

Key Changes

  • Database: Extended people table with profile fields and separate skills (TEXT[] with GIN index) + skill_attrs (JSONB) for efficient PostgreSQL skill matching.
  • Job Match Extractor (job_match.py): Parses job postings with OpenAI (structured JSON output), uses regex pre-scan as hints. Reports configuration errors to webhook.
  • Candidate Search (candidate_search.py): Ranked PostgreSQL query (members first, location match, required/preferred skills with strength scoring, seniority alignment).
  • Discord Integration: /match-candidates command replies to thread with top 10 candidates including member/prospect status, contact info, CRM links, LinkedIn, resume names, and matched skills.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a Discord slash command to analyze job postings and return ranked candidate matches; includes contact, skills, resume/profile links, seniority, timezone, and paginated results.
    • Added job-posting analysis and a PostgreSQL-backed candidate search/ranking engine.
  • Chores / Database

    • Extended people schema to store membership, location, seniority, LinkedIn, skills, skill attributes, and latest resume metadata with indexes.
  • Tests

    • Added unit tests for job extraction and candidate search logic.

Add complete job matching pipeline:

- **DB Migration**: Extend people table with profile fields (contact_type, is_member, address_country, address_city, timezone, seniority, linkedin) and separate skills (TEXT[] with GIN index) + skill_attrs (JSONB) columns.

- **Job Requirements Extraction**: OpenAI-powered job_match.py parser with regex hints. Extracts required/preferred skills, seniority, location, timezones. Logs to webhook if unconfigured.

- **Candidate Search**: PostgreSQL-based candidate_search.py with multi-level ranking by member status, required skills, skill strength, preferred skills. Seniority scoring in Python (1.0 exact, 0.7 one level up).

- **Discord Command**: /match-candidates slash command parses job postings, searches ranked candidates, replies with top 10 results (member/prospect label, contact, CRM link, LinkedIn, resume, matched skills).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 3, 2026

Warning

Rate limit exceeded

@michaelmwu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 0 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 80d9446 and 696ef7a.

📒 Files selected for processing (1)
  • apps/discord_bot/src/five08/discord_bot/cogs/crm.py
📝 Walkthrough

Walkthrough

Job-posting text is analyzed via an OpenAI pipeline to extract structured requirements, then PostgreSQL-backed candidate search ranks people by skills, seniority, membership, and location. Worker sync and DB schema are extended to surface skills, resumes, timezone, seniority, and LinkedIn. Discord bot exposes a /match_candidates command to run this flow.

Changes

Cohort / File(s) Summary
Discord Bot Cog
apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Adds CRMCog.match_candidates slash command: defers, calls extract_job_requirements, calls search_candidates, formats paginated candidate results and handles errors.
Job extraction (LLM)
packages/shared/src/five08/job_match.py
New module: JobRequirements dataclass and extract_job_requirements() that builds prompts, calls OpenAI, parses JSON response, normalizes skills/seniority/location.
Candidate search & ranking
packages/shared/src/five08/candidate_search.py
New module: CandidateMatch dataclass and search_candidates() which runs a single SQL query with CTEs/laterals to compute matched skills and scores, then post-processes and ranks results.
Shared audit / people cache
packages/shared/src/five08/audit.py
Extends PersonRecord with contact_type, is_member, address_country, address_city, timezone, seniority, linkedin, skills, skill_attrs, latest_resume_id/name; adds _normalize_text and updates upsert/select to include new fields.
Worker people sync
apps/worker/src/five08/worker/crm/people_sync.py
Expands CRM field selection and enriches person conversion: extracts resumes, linkedin, skills/skill_attrs, timezone, seniority, contact_type, is_member; adds helpers for coercion and field resolution.
DB migration
apps/worker/src/five08/worker/migrations/versions/20260304_0100_add_job_match_fields_to_people.py
Adds columns (contact_type, is_member, address_country, address_city, timezone, seniority, linkedin, skills TEXT[], skill_attrs JSONB, latest_resume_id/name) and concurrent indices (skills GIN, is_member, seniority, address_country); provides upgrade/downgrade.
Tests
tests/unit/test_candidate_search.py, tests/unit/test_job_match.py
Adds unit tests for seniority scoring, search ranking behavior, DB row mapping, LLM parsing, prompt building, and error cases for job requirement extraction.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Discord as "Discord Bot"
    participant Job as "Job Match\n(OpenAI)"
    participant Search as "Candidate Search"
    participant DB as "PostgreSQL DB"

    User->>Discord: /match_candidates (posting_text)
    Discord->>Discord: defer interaction
    Discord->>Job: extract_job_requirements(posting_text, api_key)
    Job->>Job: build prompt & hints
    Job->>Job: call OpenAI API
    Job-->>Discord: JobRequirements (skills, seniority, location, timezones)
    alt extraction failed
        Discord-->>User: Error response
    else no required skills
        Discord-->>User: Inform no required skills found
    else success
        Discord->>Search: search_candidates(requirements, limit=10)
        Search->>DB: SQL with CTEs/lateral (skill match, scores, filters)
        DB-->>Search: candidate rows
        Search->>Search: compute seniority_score & assemble CandidateMatch list
        Search-->>Discord: list[CandidateMatch]
        Discord->>Discord: format paginated messages
        Discord-->>User: candidate result pages
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐇 I nibbled through postings, smelled skills in the air,
I asked a wise model which talents were there,
I hopped to the database, matched resumes with care,
Sent results in pages — hooray, hires to prepare! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add job match extraction and ranked candidate search' clearly and specifically describes the two main features added: job match extraction and ranked candidate search functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch michaelmwu/job-match-cmd

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (1)
apps/worker/src/five08/worker/migrations/versions/20260304_0100_add_job_match_fields_to_people.py (1)

54-70: Use concurrent index operations to avoid write-blocking locks.

Index creation on a large people table can block writes during migration. Use postgresql_concurrently=True with autocommit_block() in both upgrade and downgrade.

Recommended migration pattern
     op.add_column("people", sa.Column("latest_resume_id", sa.Text(), nullable=True))
     op.add_column("people", sa.Column("latest_resume_name", sa.Text(), nullable=True))
 
+    with op.get_context().autocommit_block():
-    op.create_index(
+        op.create_index(
             "idx_people_skills",
             "people",
             ["skills"],
             postgresql_using="gin",
+            postgresql_concurrently=True,
         )
-    op.create_index("idx_people_is_member", "people", ["is_member"])
-    op.create_index("idx_people_seniority", "people", ["seniority"])
-    op.create_index("idx_people_address_country", "people", ["address_country"])
+        op.create_index(
+            "idx_people_is_member",
+            "people",
+            ["is_member"],
+            postgresql_concurrently=True,
+        )
+        op.create_index(
+            "idx_people_seniority",
+            "people",
+            ["seniority"],
+            postgresql_concurrently=True,
+        )
+        op.create_index(
+            "idx_people_address_country",
+            "people",
+            ["address_country"],
+            postgresql_concurrently=True,
+        )
 
 
 def downgrade() -> None:
     """Remove job-match fields from the people table."""
+    with op.get_context().autocommit_block():
-    op.drop_index("idx_people_address_country", table_name="people")
-    op.drop_index("idx_people_seniority", table_name="people")
-    op.drop_index("idx_people_is_member", table_name="people")
-    op.drop_index("idx_people_skills", table_name="people")
+        op.drop_index(
+            "idx_people_address_country",
+            table_name="people",
+            postgresql_concurrently=True,
+        )
+        op.drop_index(
+            "idx_people_seniority",
+            table_name="people",
+            postgresql_concurrently=True,
+        )
+        op.drop_index(
+            "idx_people_is_member",
+            table_name="people",
+            postgresql_concurrently=True,
+        )
+        op.drop_index(
+            "idx_people_skills",
+            table_name="people",
+            postgresql_concurrently=True,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/worker/src/five08/worker/migrations/versions/20260304_0100_add_job_match_fields_to_people.py`
around lines 54 - 70, The migration uses op.create_index and op.drop_index in
upgrade() and downgrade() and must perform them without write-blocking locks;
modify both upgrade() and downgrade() to run index operations inside the
migration's autocommit_block() context and call op.create_index(...,
postgresql_concurrently=True) for each index and op.drop_index(...,
postgresql_concurrently=True) for each drop so Postgres creates/drops indexes
concurrently on the people table; ensure you still keep the same index names
("idx_people_skills", "idx_people_is_member", "idx_people_seniority",
"idx_people_address_country") and the current column lists when applying these
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 6211-6328: The match_candidates command (function
match_candidates) currently returns on several early/error paths without calling
the CRM audit helper; call the audit helper _audit_command on every exit: on
parse failure (in the RuntimeError except), on no required_skills early return,
on candidate search exception, and on successful completion after sending
messages; include context such as interaction.user/id, the posting summary or
id, the outcome (error message or success) and metadata like len(candidates) and
requirements (e.g., required_skills, title) so the worker audit ingest receives
a consistent record for use by apps/worker/src/five08/backend/api.py.

In `@apps/worker/src/five08/worker/crm/people_sync.py`:
- Around line 344-353: The loop in people_sync.py appends raw skill keys to
skills but stores strengths under raw.casefold() in attrs, causing lookups like
p.skill_attrs ->> s to miss values; make the key format consistent by either
storing strengths under the exact raw string or normalizing both collections the
same way — e.g., change the attrs assignment in the block inside the for-loop
(the code that calls attrs[raw.casefold()] = ...) so it uses the same key as
appended to skills (attrs[raw] = ...) or instead append raw.casefold() to
skills; ensure the chosen normalization (raw or raw.casefold()) is used for both
skills.append(...) and attrs[...] so subsequent ranking reads find the strength
values.

In `@packages/shared/src/five08/audit.py`:
- Around line 152-158: The VALUES clause in the upsert_person SQL has one fewer
%s placeholder than the 21 columns and parameters supplied; update the VALUES
string in the upsert_person function (the INSERT ... VALUES definition) to
include the missing %s so there are 21 placeholders matching the parameter tuple
used later, ensuring the parameter count aligns with the INSERT column list.

In `@packages/shared/src/five08/candidate_search.py`:
- Around line 188-192: The secondary sort currently uses key=lambda c: (not
c.is_member, -c.required_skill_score, -c.seniority_score) which omits the
primary/SQL match metrics and therefore lets seniority override stronger skill
matches; update the sort key in the results.sort call (the lambda referenced
above) to include the primary skill-match metrics (e.g., c.sql_score or both
c.required_skill_score and c.preferred_skill_score) before seniority so primary
ranking is preserved — for example include (-c.sql_score, not c.is_member,
-c.required_skill_score, -c.preferred_skill_score, -c.seniority_score) or
otherwise ensure the primary score fields appear earlier in the tuple so
seniority only breaks ties.

In `@packages/shared/src/five08/job_match.py`:
- Around line 29-39: The seniority mapping keys in _SENIORITY_KEYWORDS must
match the normalization performed by _regex_hints (no spaces/hyphens), so
replace space/hyphen variants like "entry level" and "entry-level" with their
normalized form "entrylevel" (and any other affected keys) so matched hints
aren't lost; update the other similar seniority mapping block (lines noted
around the second definition) the same way, ensuring keys like "entrylevel",
"midlevel", "senior", "staff", "principal" are present and map to the intended
canonical values.
- Around line 201-203: Update the logic that applies the US-only regex signal so
it always overrides any LLM-derived location value: replace the conditional that
only sets location_type when None with an unconditional assignment when
hints.get("us_only_detected") is truthy (i.e., if hints.get("us_only_detected"):
location_type = "us_only"). Ensure you modify the code handling location_type
(the block referencing hints.get("us_only_detected") and the variable
location_type) so the "us_only" signal cannot be overridden by an LLM-returned
value like "remote_any".
- Around line 192-200: Normalize and trim enum strings before validation: for
raw_seniority, if non-null call .strip() and .lower(), compare against a set of
lowercased entries derived from SENIORITY_ORDER, and if found assign the
canonical value from SENIORITY_ORDER (preserve original canonical casing) to
seniority; for raw_location_type call .strip() and .lower() then compare to the
allowed lowercase values ("us_only","timezone_preferred","remote_any") and
assign the canonical lowercase string to location_type; ensure you handle None
safely and avoid dropping values that differ only by case or trailing/leading
whitespace.
- Line 177: The code currently sets raw_content via
response.choices[0].message.content without validating the response shape;
update the logic around raw_content in job_match.py (where raw_content is
assigned) to defensively verify response.choices exists and is non-empty, that
response.choices[0].message is present, and that message.content is not
None/empty before using it—if any of these are missing, raise or log a clear,
descriptive error (including the actual response payload) instead of silently
defaulting to "", so downstream JSON parsing failures surface the real
API-structure issue.

---

Nitpick comments:
In
`@apps/worker/src/five08/worker/migrations/versions/20260304_0100_add_job_match_fields_to_people.py`:
- Around line 54-70: The migration uses op.create_index and op.drop_index in
upgrade() and downgrade() and must perform them without write-blocking locks;
modify both upgrade() and downgrade() to run index operations inside the
migration's autocommit_block() context and call op.create_index(...,
postgresql_concurrently=True) for each index and op.drop_index(...,
postgresql_concurrently=True) for each drop so Postgres creates/drops indexes
concurrently on the people table; ensure you still keep the same index names
("idx_people_skills", "idx_people_is_member", "idx_people_seniority",
"idx_people_address_country") and the current column lists when applying these
changes.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1814db and ba0c981.

📒 Files selected for processing (6)
  • apps/discord_bot/src/five08/discord_bot/cogs/crm.py
  • apps/worker/src/five08/worker/crm/people_sync.py
  • apps/worker/src/five08/worker/migrations/versions/20260304_0100_add_job_match_fields_to_people.py
  • packages/shared/src/five08/audit.py
  • packages/shared/src/five08/candidate_search.py
  • packages/shared/src/five08/job_match.py

Comment thread apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Comment thread apps/worker/src/five08/worker/crm/people_sync.py Outdated
Comment thread packages/shared/src/five08/audit.py
Comment thread packages/shared/src/five08/candidate_search.py
Comment thread packages/shared/src/five08/candidate_search.py Outdated
Comment thread packages/shared/src/five08/job_match.py
Comment thread packages/shared/src/five08/job_match.py Outdated
Comment thread packages/shared/src/five08/job_match.py
Comment thread packages/shared/src/five08/job_match.py Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an end-to-end job matching pipeline: extract structured requirements from a pasted job posting, search/rank candidates from the people cache in Postgres, and expose the workflow via a Discord slash command.

Changes:

  • Introduces OpenAI-backed job requirement extraction (job_match.py) with regex hinting + coercion into a normalized JobRequirements.
  • Adds a ranked Postgres candidate search (candidate_search.py) and extends the people cache schema + CRM sync to store matchable profile/skill fields.
  • Adds a Discord /match-candidates command to run extraction + search and render top candidates.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
packages/shared/src/five08/job_match.py New job posting → normalized requirements extraction (LLM + regex hints).
packages/shared/src/five08/candidate_search.py New SQL-backed candidate scoring query + Python post-processing.
packages/shared/src/five08/audit.py Extends PersonRecord and people-cache upsert to persist new profile/skill fields.
apps/worker/src/five08/worker/migrations/versions/20260304_0100_add_job_match_fields_to_people.py Adds new people columns + indexes for skills/member/location filtering.
apps/worker/src/five08/worker/crm/people_sync.py Extends CRM → people-cache sync to populate new fields (skills, seniority, resume IDs, etc.).
apps/discord_bot/src/five08/discord_bot/cogs/crm.py Adds /match-candidates command to run extraction and render ranked results.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +6226 to +6230
requirements = await asyncio.get_event_loop().run_in_executor(
None,
lambda: extract_job_requirements(
posting,
api_key=settings.openai_api_key,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within an async command handler, asyncio.get_event_loop().run_in_executor(...) is deprecated-style and inconsistent with the rest of this cog (which uses asyncio.to_thread(...) for blocking work). Switching to asyncio.to_thread(...) avoids event-loop selection issues and matches existing patterns in this file.

Copilot uses AI. Check for mistakes.

seniority_match = _SENIORITY_RE.search(text)
if seniority_match:
raw = seniority_match.group(1).lower().replace("-", "").replace(" ", "")
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex hint normalization removes spaces/hyphens (e.g. "entry level" -> "entrylevel"), but _SENIORITY_KEYWORDS is keyed by "entry level"/"entry-level" and does not include "entrylevel". As a result, entry-level postings won't set seniority_hint. Consider normalizing the map keys the same way or normalizing after lookup.

Suggested change
raw = seniority_match.group(1).lower().replace("-", "").replace(" ", "")
raw = seniority_match.group(1).lower()

Copilot uses AI. Check for mistakes.
Comment on lines +6211 to +6215
@app_commands.command(
name="match-candidates",
description="Analyze a job posting and return ranked matching candidates from the people cache.",
)
@app_commands.describe(posting="Paste the full job posting text here")
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the command "replies to thread", but this implementation posts followups to whatever channel the command is invoked in and doesn’t enforce/ensure a thread context. If the thread requirement is important, consider checking interaction.channel type and creating a thread (or responding ephemerally with instructions) when invoked outside a thread.

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +134
"""Extract structured job requirements from a posting using OpenAI.

Raises RuntimeError if OpenAI is not configured (also logs to webhook).
Uses regex pre-scan as cheap hints injected into the LLM prompt.
"""
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new extraction pipeline has multiple edge-cases (LLM fenced JSON, invalid JSON, regex-vs-LLM disagreements, seniority/location coercion) but there are no unit tests covering them. Adding focused unit tests for extract_job_requirements will help keep /match-candidates stable as prompts/models change.

Copilot uses AI. Check for mistakes.
Comment thread packages/shared/src/five08/audit.py Outdated
%s, %s, %s, %s, %s,
%s, %s, %s, %s, %s,
%s, %s, %s, %s, %s,
%s, %s, %s, %s,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The INSERT column list includes 21 columns (ending with sync_status) but the VALUES placeholder list only has 20 %s entries, so this query will fail at runtime with a bind/parameter count error. Add the missing placeholder (or remove a column) so placeholders and parameters match exactly.

Suggested change
%s, %s, %s, %s,
%s, %s, %s, %s, %s,

Copilot uses AI. Check for mistakes.

Ranking priority:
1. Members before prospects.
2. Location match (hard filter when us_only, else soft signal).
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ranking description says location match is a soft signal when not us_only, but the SQL only applies a hard filter when us_only and otherwise doesn’t score location/timezone at all. Either implement the stated soft location scoring (e.g. ORDER BY terms for country/timezone match) or update the docstring to match the actual behavior.

Suggested change
2. Location match (hard filter when us_only, else soft signal).
2. US-only location restriction when enabled (hard filter); otherwise,
location/timezone are not used in ranking.

Copilot uses AI. Check for mistakes.
Comment thread packages/shared/src/five08/job_match.py Outdated
Comment on lines +17 to +18
# Regex hints — used to pre-scan the posting and inform the LLM prompt.
# These are NOT a standalone fallback; they're cheap signals the LLM can use.
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header comment says these regex hints are "NOT a standalone fallback", but later the code forces location_type = "us_only" when the regex matched and the LLM returned null. Either update the comment to reflect the actual fallback behavior or remove the fallback if it's not intended.

Suggested change
# Regex hints — used to pre-scan the posting and inform the LLM prompt.
# These are NOT a standalone fallback; they're cheap signals the LLM can use.
# Regex hints — used to pre-scan the posting and inform the LLM prompt, and
# in limited cases may be used as a lightweight fallback (e.g., defaulting
# location_type to "us_only" when the LLM output is null but regex matches).

Copilot uses AI. Check for mistakes.
Comment on lines +343 to +353
if not skills and isinstance(raw_skill_attrs, dict):
for key, payload in raw_skill_attrs.items():
raw = _text_or_none(key)
if raw and isinstance(payload, dict):
skills.append(raw)
value = payload.get("strength")
parsed = self._coerce_int(value)
if parsed is not None:
attrs[raw.casefold()] = max(
attrs.get(raw.casefold(), 0), parsed
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalize_skill_payload(...) already normalizes skills/attrs from both skills and dict-like cSkillAttrs. This fallback path re-adds skills from cSkillAttrs without normalization, which can introduce non-canonical skill strings into people.skills and reduce match accuracy. Consider removing this fallback or normalizing raw before appending/using it as a key.

Suggested change
if not skills and isinstance(raw_skill_attrs, dict):
for key, payload in raw_skill_attrs.items():
raw = _text_or_none(key)
if raw and isinstance(payload, dict):
skills.append(raw)
value = payload.get("strength")
parsed = self._coerce_int(value)
if parsed is not None:
attrs[raw.casefold()] = max(
attrs.get(raw.casefold(), 0), parsed
)

Copilot uses AI. Check for mistakes.
Comment on lines +188 to +191
# Secondary sort: among equal SQL scores, rank by seniority alignment.
# This is stable (preserves SQL order within same seniority bucket).
results.sort(
key=lambda c: (not c.is_member, -c.required_skill_score, -c.seniority_score)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final Python results.sort(...) can reorder candidates across different required_matched / preferred_matched buckets because the sort key does not include those fields (and can even override the SQL ORDER BY priority). If the intent is only to break ties by seniority, restrict the sort to seniority (stable) or include the matched-count metrics (e.g. len(matched_required_skills), len(matched_preferred_skills)) in the key so it preserves the documented ranking order.

Suggested change
# Secondary sort: among equal SQL scores, rank by seniority alignment.
# This is stable (preserves SQL order within same seniority bucket).
results.sort(
key=lambda c: (not c.is_member, -c.required_skill_score, -c.seniority_score)
# Final in-Python ranking:
# 1. Members before prospects.
# 2. Required skill count matched (descending).
# 3. Required skill strength score (descending).
# 4. Preferred skill count matched (descending).
# 5. Seniority score (descending).
results.sort(
key=lambda c: (
not c.is_member,
-len(c.matched_required_skills),
-c.required_skill_score,
-len(c.matched_preferred_skills),
-c.seniority_score,
)

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +64
def search_candidates(
settings: SharedSettings,
requirements: JobRequirements,
*,
limit: int = 10,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ranking/query logic is subtle (SQL scoring + Python post-sort + us_only filtering) but there are no unit tests ensuring the documented ranking priority is preserved and filtering behaves as expected. Adding a small test matrix for search_candidates would prevent regressions as the query evolves.

Copilot uses AI. Check for mistakes.
- audit.py: fix VALUES placeholder count (20 → 21 to match 21 columns)
- people_sync.py: store skill strength attrs under raw key (not casefold) to match skills array keys for DB lookup
- job_match.py: fix seniority hint lookup (use normalized keys without spaces/hyphens); add defensive check for empty/missing response.choices content; normalize enum strings before validation (strip/lower); regex US-only detection always overrides LLM location value
- candidate_search.py: add timezone_matched scoring in SQL ORDER BY when preferred_timezones provided; fix secondary Python sort to preserve primary ranking metrics before seniority
- crm.py: add _audit_command calls on all exit paths (extract error, no skills, search error, success)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…omments

- Replace asyncio.get_event_loop().run_in_executor() with asyncio.to_thread()
  in match_candidates command handler
- Fix candidate_search.py docstring: clarify US-only is a hard filter
  and timezone is a soft signal
- Fix job_match.py module comment: clarify regex hints as LLM context
  with US-only as override guard (not standalone fallback)
- Update match_candidates command description to mention public reply
- Add tests/unit/test_job_match.py: 23 tests covering _regex_hints,
  _parse_llm_response, _coerce_str_list, extract_job_requirements,
  and _build_prompt
- Add tests/unit/test_candidate_search.py: 14 tests covering
  _seniority_score edge cases, early-return on empty required_skills,
  row-to-CandidateMatch mapping, secondary sort order, and empty DB

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Use postgresql_concurrently=True inside autocommit_block() for all
create_index and drop_index calls in the job-match fields migration,
preventing table-level write locks on the people table during deploy.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Remove the posting parameter. The command now:
- Rejects invocations outside a thread with an ephemeral error.
- Reads the thread's starter_message as the job posting text.
- Falls back to fetching the parent message by thread ID if
  starter_message is not cached (TextChannel threads only).
- Returns an ephemeral error if the opener is missing or empty.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Forum post IDs equal their thread IDs, so thread.fetch_message(thread.id)
works for both text-channel threads and forum posts without needing to
go through the parent channel.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Thread tags (e.g. "Python", "Remote", "Senior") are prepended as
"Thread tags: ..." before the starter message content so the LLM
naturally incorporates them into skill/seniority/location extraction.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants