Fix resume contact inference and LinkedIn field handling #129

Closed
michaelmwu wants to merge 2 commits into main from michaelmwu/fix-resume-contact

Conversation

michaelmwu (Member) commented Mar 3, 2026

Description

  • Resume-based contact creation now shows the backend error text + status directly in the Discord failure message.
  • LinkedIn field usage is now centralized via _configured_linkedin_field and used for search/create/update flows.
  • Resume inference failure for no-match now includes parsed name/email so users can verify the candidate identity before creating.
  • LinkedIn update embeds now read from the same configured field used in update payloads.

Related Issue

  • N/A

How Has This Been Tested?

  • uv run pytest tests/unit/test_crm.py -k "search_contacts_by_field_uses_configured_linkedin_field or build_resume_parsed_identity_summary_includes_name_and_email or upload_resume_no_matching_inferred_contact_shows_name_and_email or update_contact_uses_configured_linkedin_field or test_resume_create_contact_view_logs_create_failure or test_upload_resume_link_user_shows_confirm_then_creates_contact"
  • uv run pytest tests/unit/test_resume_extractor.py -k "split_name"

Summary by CodeRabbit

Release Notes

  • New Features

    • Configurable LinkedIn field handling for contact management
    • Separate first and last name extraction from resumes with intelligent parsing capabilities
    • Enhanced resume parsing with improved identity information and summary display
  • Improvements

    • More robust error messaging during resume processing with additional contextual details
    • Consistent and deduplicated contact field handling across search and update workflows

coderabbitai Bot commented Mar 3, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR centralizes LinkedIn field configuration, implements structured name parsing with LLM and heuristic fallbacks, and refactors the CRM payload construction to consistently populate first and last names from extracted resume data. The resume extractor now exports split_name() for name decomposition, and the CRM cog applies these extracted names across multiple payload-building paths.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Resume Extraction Name Parsing | packages/shared/src/five08/resume_extractor.py | Added split_name() public method with LLM and heuristic name-splitting logic. Extended ResumeExtractedProfile with first_name and last_name fields. Updated extraction flows to populate both fields alongside name via new splitting helpers. |
| CRM LinkedIn Field Configurability | apps/discord_bot/src/five08/discord_bot/cogs/crm.py | Introduced _configured_linkedin_field() staticmethod to centralize LinkedIn field name (default cLinkedInUrl). Replaced hardcoded LinkedIn field references across multiple call sites including payload builders, contact search, and update paths. |
| CRM Name Field Population | apps/discord_bot/src/five08/discord_bot/cogs/crm.py | Added _populate_name_fields() helper to extract and set firstName and lastName from a source name. Integrated into resume-based payload builders (_build_resume_create_contact_payload, _build_contact_payload_for_link_user, _infer_contact_from_resume) and various resume workflows. |
| CRM Resume Processing Enhancements | apps/discord_bot/src/five08/discord_bot/cogs/crm.py | Added _build_resume_parsed_identity_summary() to generate user-friendly summaries from resume data. Enhanced error messaging with detailed error and status information. Refactored _search_contacts_by_field() with dynamic field selection and deduplication logic. |
| Test Coverage | tests/unit/test_crm.py, tests/unit/test_resume_extractor.py | Added tests validating configured LinkedIn field usage, first/last name population across payloads, resume identity summary formatting, and split_name behavior with LLM fallback and single-token name handling. |
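
The deduplication mentioned for _search_contacts_by_field() is not shown in this conversation. As a minimal sketch of one way such merging could work, assuming each contact is a dict carrying an `id` key (the helper name `dedupe_contacts` and the record shape are hypothetical, not taken from the PR):

```python
from typing import Any, Iterable


def dedupe_contacts(batches: Iterable[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge search result batches, keeping the first record seen per contact id."""
    seen: set[str] = set()
    merged: list[dict[str, Any]] = []
    for batch in batches:
        for contact in batch:
            contact_id = str(contact.get("id", ""))
            # Skip a record only when its id has already been collected.
            if contact_id and contact_id in seen:
                continue
            if contact_id:
                seen.add(contact_id)
            # Records without an id cannot be compared, so they are all kept.
            merged.append(contact)
    return merged
```

Keeping the first occurrence preserves the ordering of whichever search ran first, which may or may not match how the real helper ranks results.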

Sequence Diagram(s)

sequenceDiagram
    participant Resume as Resume File
    participant Extractor as Resume Extractor
    participant LLM as LLM Service
    participant Heuristic as Heuristic Parser
    participant CRM as CRM Cog

    Resume->>Extractor: extract(file_content)
    Extractor->>Extractor: _build_prompt() & call LLM
    Extractor->>LLM: Parse name, firstName, lastName
    LLM-->>Extractor: Parsed identity + names
    
    alt LLM Success
        Extractor->>Extractor: split_name(full_name, hints from LLM)
        Extractor->>LLM: _split_name_with_llm(full_name)
        LLM-->>Extractor: firstName, lastName
    else LLM Failure
        Extractor->>Heuristic: _split_name_heuristically(full_name)
        Heuristic-->>Extractor: firstName, lastName (with prefix/suffix handling)
    end
    
    Extractor-->>CRM: ResumeExtractedProfile(name, first_name, last_name, ...)
    CRM->>CRM: _populate_name_fields(payload, source_name)
    CRM->>CRM: _build_resume_parsed_identity_summary(file_content)
    CRM-->>CRM: Payload with firstName, lastName, LinkedIn field configured

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Names split swift as carrot sticks,
LinkedIn fields now dance like tricks,
First meets Last in PayloadLand—
Resume magic, paws at hand! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes in the PR: fixing resume contact inference logic and making LinkedIn field handling configurable across multiple code paths. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 82.61% which is sufficient. The required threshold is 80.00%. |

Copilot AI left a comment

Pull request overview

This PR improves the Discord CRM resume workflow by (1) enriching resume-derived identity details (first/last name + parsed identity summary) and (2) centralizing LinkedIn custom-field selection so search/create/update flows consistently use the same CRM field.

Changes:

  • Add name splitting (LLM + heuristic fallback) and propagate firstName/lastName into resume extraction results and CRM create payloads.
  • Centralize LinkedIn field selection via _configured_linkedin_field() and use it across search/create/update + update embed rendering.
  • Improve Discord UX on failures: include backend error + status on contact-create failures, and include parsed name/email on “no matching contact” inference.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| packages/shared/src/five08/resume_extractor.py | Adds first/last name fields and new split_name() logic (LLM + heuristic) and updates the LLM prompt/parse pipeline. |
| apps/discord_bot/src/five08/discord_bot/cogs/crm.py | Uses configured LinkedIn field for search/create/update, populates firstName/lastName in create payloads, and enhances user-facing error/identity messages. |
| tests/unit/test_crm.py | Adds/updates unit coverage for configured LinkedIn field usage, name-field propagation, improved failure messages, and parsed identity summary behavior. |
| tests/unit/test_resume_extractor.py | Adds unit coverage for the new name-splitting behavior (LLM preference + heuristic fallback + single-token handling). |

Comment on lines +1001 to +1008
    inferred = None
    if self.client is not None:
        try:
            inferred = self._split_name_with_llm(normalized_full_name)
        except Exception:
            inferred = None
    if inferred is None:
        inferred = self._split_name_heuristically(normalized_full_name)
Copilot AI Mar 3, 2026

split_name() will attempt an LLM call whenever self.client is configured. In extract(), you already performed an LLM completion immediately before calling split_name(), so this can result in a second OpenAI request if firstName/lastName aren’t returned. Consider defaulting to heuristic splitting in this situation (or adding a flag to disable the extra LLM call) to reduce latency/cost and avoid doubling failure modes.

Suggested change
-    inferred = None
-    if self.client is not None:
-        try:
-            inferred = self._split_name_with_llm(normalized_full_name)
-        except Exception:
-            inferred = None
-    if inferred is None:
-        inferred = self._split_name_heuristically(normalized_full_name)
+    inferred: tuple[str, str] | None = None
+    # Prefer heuristic splitting first to avoid unnecessary LLM calls.
+    inferred = self._split_name_heuristically(normalized_full_name)
+    # If the heuristic could not split the name, fall back to the LLM when available.
+    if inferred is None and self.client is not None:
+        try:
+            inferred = self._split_name_with_llm(normalized_full_name)
+        except Exception:
+            inferred = None

Comment on lines +1058 to +1059
    return split_first or full_name, split_last or SINGLE_NAME_FALLBACK_LAST_NAME

Copilot AI Mar 3, 2026

When the name-split LLM returns only one of firstName/lastName, _split_name_with_llm() falls back to full_name as the missing part. This can produce a firstName containing spaces (e.g., "Ada Lovelace") or duplicate data across fields. Consider falling back to the heuristic splitter to fill missing parts (or at least using the first/last token) when the model returns incomplete output.

Suggested change
-    return split_first or full_name, split_last or SINGLE_NAME_FALLBACK_LAST_NAME
+    # If the model returned only one of first/last, use the heuristic splitter
+    # to fill in the missing part based on the full name, avoiding duplicate
+    # or multi-token first names.
+    if not split_first or not split_last:
+        heuristic_first, heuristic_last = self._split_name_heuristically(full_name)
+        split_first = split_first or heuristic_first
+        split_last = split_last or heuristic_last
+    return split_first, split_last
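
The fill-in behaviour this comment asks for can be tried in isolation. The sketch below is a simplified illustration rather than the PR's code: `heuristic_split`, `fill_missing_parts`, and the placeholder value given to `SINGLE_NAME_FALLBACK_LAST_NAME` are assumptions, and the real heuristic reportedly also handles prefixes and suffixes.

```python
from typing import Optional

# Placeholder only; the real constant's value is not shown in this PR.
SINGLE_NAME_FALLBACK_LAST_NAME = "Unknown"


def heuristic_split(full_name: str) -> tuple[str, str]:
    """Split on whitespace: the first token is the first name, the rest the last name."""
    tokens = full_name.split()
    if not tokens:
        return "", SINGLE_NAME_FALLBACK_LAST_NAME
    if len(tokens) == 1:
        return tokens[0], SINGLE_NAME_FALLBACK_LAST_NAME
    return tokens[0], " ".join(tokens[1:])


def fill_missing_parts(
    full_name: str, split_first: Optional[str], split_last: Optional[str]
) -> tuple[str, str]:
    """Keep what the model returned and fill only the missing side heuristically."""
    if split_first and split_last:
        return split_first, split_last
    heuristic_first, heuristic_last = heuristic_split(full_name)
    return split_first or heuristic_first, split_last or heuristic_last
```

With this shape, a model response that contains only `lastName="Lovelace"` for "Ada Lovelace" yields `("Ada", "Lovelace")` instead of a first name that repeats the full name.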

    @staticmethod
    def _configured_linkedin_field() -> str:
        """Return the configured field for LinkedIn profile values."""
        return str(getattr(settings, "crm_linkedin_field", "cLinkedInUrl"))
Copilot AI Mar 3, 2026

_configured_linkedin_field() reads settings.crm_linkedin_field, but the bot Settings model doesn’t declare this field (so Pydantic Settings won’t load it from env/config and it will always fall back). Also, if the attribute were ever present but None, str(None) would yield the literal string "None". Consider adding an explicit crm_linkedin_field: str = "cLinkedInUrl" to the Settings schema (and validating/stripping it here, falling back when empty).

Suggested change
-    return str(getattr(settings, "crm_linkedin_field", "cLinkedInUrl"))
+    default_field = "cLinkedInUrl"
+    raw_value = getattr(settings, "crm_linkedin_field", None)
+    if not isinstance(raw_value, str):
+        return default_field
+    value = raw_value.strip()
+    return value or default_field
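
The `str(None)` pitfall described in this comment is easy to reproduce on its own. The sketch below uses a `SimpleNamespace` as a stand-in for the bot's settings object (an assumption, since the real Settings model is not shown in this conversation) and compares the original lookup with a hardened one; the function names are illustrative.

```python
from types import SimpleNamespace

DEFAULT_FIELD = "cLinkedInUrl"


def naive_field(settings: object) -> str:
    # Mirrors the original lookup: a present-but-None attribute becomes "None".
    return str(getattr(settings, "crm_linkedin_field", DEFAULT_FIELD))


def hardened_field(settings: object) -> str:
    # Anything that is not a non-empty string falls back to the default.
    raw_value = getattr(settings, "crm_linkedin_field", None)
    if not isinstance(raw_value, str):
        return DEFAULT_FIELD
    return raw_value.strip() or DEFAULT_FIELD


unset = SimpleNamespace(crm_linkedin_field=None)
print(naive_field(unset))     # None (the literal four-character string)
print(hardened_field(unset))  # cLinkedInUrl
```

The hardened lookup only guards the read side; the field still has to be declared on the settings schema for an environment value to reach it at all.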

Comment on lines +3309 to +3312

    return (
        f"\nParsed contact details: name=`{parsed_name}`, email=`{primary_email}`"
    )
Copilot AI Mar 3, 2026

The parsed name/email are interpolated directly into inline-code backticks. If the extracted name/email contains a backtick, it can break formatting and potentially hide/alter surrounding text. Consider sanitizing (e.g., replace/backslash-escape backticks and truncate to a safe length) before embedding user/LLM-derived values into Discord messages.
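
One possible shape for that sanitizing step, as a hedged sketch: the helper name `sanitize_inline_code`, the 100-character default, and the choice to replace backticks with apostrophes are all assumptions rather than anything taken from the PR.

```python
def sanitize_inline_code(value: str, max_length: int = 100) -> str:
    """Make a value safe to wrap in single backticks in a Discord message."""
    # Collapse newlines and other whitespace runs so the value stays on one line.
    cleaned = " ".join(value.split())
    # A backtick would close the inline-code span early, so swap it out.
    cleaned = cleaned.replace("`", "'")
    # Truncate long values, reserving one character for the ellipsis.
    if len(cleaned) > max_length:
        cleaned = cleaned[: max_length - 1] + "…"
    return cleaned
```

The summary line could then interpolate `sanitize_inline_code(parsed_name)` and `sanitize_inline_code(primary_email)` instead of the raw values.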

Comment on lines 1407 to 1410
     await interaction.followup.send(
-        "⚠️ Could not create a contact from this resume. "
+        f"⚠️ Could not create a contact from this resume: `{error_detail}`{status_note}. "
         "Please provide `search_term` or `link_user`.",
         ephemeral=True,
Copilot AI Mar 3, 2026

The user-facing failure message embeds error_detail = str(exc) verbatim. For EspoAPIError raised by the shared client, this may include boilerplate (e.g., "Wrong request, status code is …") and can duplicate the separately-appended status note, making the message noisy. Consider special-casing EspoAPIError to extract/display just the server reason (and escape/truncate any backticks/newlines) while keeping the full exception details only in logs/audit metadata.

Comment on lines +3326 to +3330
    first_name, last_name = self.resume_extractor.split_name(
        full_name=source_name,
        first_name_hint=str(payload.get("firstName", "")).strip() or None,
        last_name_hint=str(payload.get("lastName", "")).strip() or None,
    )
Copilot AI Mar 3, 2026

_populate_name_fields() calls resume_extractor.split_name(), which will invoke an additional LLM call whenever the OpenAI client is configured. In the resume-upload path you already may have done an LLM extraction, so this can introduce an extra API request (latency/cost/failure surface) just to split the name. Consider using heuristic-only splitting here, or plumbing through extracted first_name/last_name hints from the resume profile to avoid triggering another model call.
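
A rough sketch of both options in this comment: hints from the resume profile short-circuit the split, and a flag allows heuristic-only behaviour. The function below is a simplified stand-in for `split_name()`, not its actual implementation; the `allow_llm` flag and the injected `llm_split` callable are hypothetical.

```python
from typing import Callable, Optional

NameSplitter = Callable[[str], Optional[tuple[str, str]]]


def split_name(
    full_name: str,
    first_name_hint: Optional[str] = None,
    last_name_hint: Optional[str] = None,
    llm_split: Optional[NameSplitter] = None,
    allow_llm: bool = True,
) -> tuple[str, str]:
    # Both hints present: nothing is left to infer, so no splitter runs at all.
    if first_name_hint and last_name_hint:
        return first_name_hint, last_name_hint
    inferred: Optional[tuple[str, str]] = None
    if allow_llm and llm_split is not None:
        try:
            inferred = llm_split(full_name)
        except Exception:
            inferred = None
    if inferred is None:
        # Simplified heuristic: first token vs. everything after it.
        tokens = full_name.split()
        inferred = (tokens[0] if tokens else "", " ".join(tokens[1:]))
    return first_name_hint or inferred[0], last_name_hint or inferred[1]
```

Under these assumptions, the resume-upload path could pass the profile's `first_name` and `last_name` as hints, or call with `allow_llm=False`, and no second model request would be made.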

@michaelmwu michaelmwu deleted the michaelmwu/fix-resume-contact branch March 3, 2026 10:48
2 participants