Fix resume contact inference and LinkedIn field handling#129
Fix resume contact inference and LinkedIn field handling#129michaelmwu wants to merge 2 commits into
Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThis PR centralizes LinkedIn field configuration, implements structured name parsing with LLM and heuristic fallbacks, and refactors the CRM payload construction to consistently populate first and last names from extracted resume data. The resume extractor now exports Changes
Sequence Diagram(s)sequenceDiagram
participant Resume as Resume File
participant Extractor as Resume Extractor
participant LLM as LLM Service
participant Heuristic as Heuristic Parser
participant CRM as CRM Cog
Resume->>Extractor: extract(file_content)
Extractor->>Extractor: _build_prompt() & call LLM
Extractor->>LLM: Parse name, firstName, lastName
LLM-->>Extractor: Parsed identity + names
alt LLM Success
Extractor->>Extractor: split_name(full_name, hints from LLM)
Extractor->>LLM: _split_name_with_llm(full_name)
LLM-->>Extractor: firstName, lastName
else LLM Failure
Extractor->>Heuristic: _split_name_heuristically(full_name)
Heuristic-->>Extractor: firstName, lastName (with prefix/suffix handling)
end
Extractor-->>CRM: ResumeExtractedProfile(name, first_name, last_name, ...)
CRM->>CRM: _populate_name_fields(payload, source_name)
CRM->>CRM: _build_resume_parsed_identity_summary(file_content)
CRM-->>CRM: Payload with firstName, lastName, LinkedIn field configured
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR improves the Discord CRM resume workflow by (1) enriching resume-derived identity details (first/last name + parsed identity summary) and (2) centralizing LinkedIn custom-field selection so search/create/update flows consistently use the same CRM field.
Changes:
- Add name splitting (LLM + heuristic fallback) and propagate
firstName/lastNameinto resume extraction results and CRM create payloads. - Centralize LinkedIn field selection via
_configured_linkedin_field()and use it across search/create/update + update embed rendering. - Improve Discord UX on failures: include backend error + status on contact-create failures, and include parsed name/email on “no matching contact” inference.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
packages/shared/src/five08/resume_extractor.py |
Adds first/last name fields and new split_name() logic (LLM + heuristic) and updates the LLM prompt/parse pipeline. |
apps/discord_bot/src/five08/discord_bot/cogs/crm.py |
Uses configured LinkedIn field for search/create/update, populates firstName/lastName in create payloads, and enhances user-facing error/identity messages. |
tests/unit/test_crm.py |
Adds/updates unit coverage for configured LinkedIn field usage, name-field propagation, improved failure messages, and parsed identity summary behavior. |
tests/unit/test_resume_extractor.py |
Adds unit coverage for the new name-splitting behavior (LLM preference + heuristic fallback + single-token handling). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| inferred = None | ||
| if self.client is not None: | ||
| try: | ||
| inferred = self._split_name_with_llm(normalized_full_name) | ||
| except Exception: | ||
| inferred = None | ||
| if inferred is None: | ||
| inferred = self._split_name_heuristically(normalized_full_name) |
There was a problem hiding this comment.
split_name() will attempt an LLM call whenever self.client is configured. In extract(), you already performed an LLM completion immediately before calling split_name(), so this can result in a second OpenAI request if firstName/lastName aren’t returned. Consider defaulting to heuristic splitting in this situation (or adding a flag to disable the extra LLM call) to reduce latency/cost and avoid doubling failure modes.
| inferred = None | |
| if self.client is not None: | |
| try: | |
| inferred = self._split_name_with_llm(normalized_full_name) | |
| except Exception: | |
| inferred = None | |
| if inferred is None: | |
| inferred = self._split_name_heuristically(normalized_full_name) | |
| inferred: tuple[str, str] | None = None | |
| # Prefer heuristic splitting first to avoid unnecessary LLM calls. | |
| inferred = self._split_name_heuristically(normalized_full_name) | |
| # If the heuristic could not split the name, fall back to the LLM when available. | |
| if inferred is None and self.client is not None: | |
| try: | |
| inferred = self._split_name_with_llm(normalized_full_name) | |
| except Exception: | |
| inferred = None |
| return split_first or full_name, split_last or SINGLE_NAME_FALLBACK_LAST_NAME | ||
|
|
There was a problem hiding this comment.
When the name-split LLM returns only one of firstName/lastName, _split_name_with_llm() falls back to full_name as the missing part. This can produce a firstName containing spaces (e.g., "Ada Lovelace") or duplicate data across fields. Consider falling back to the heuristic splitter to fill missing parts (or at least using the first/last token) when the model returns incomplete output.
| return split_first or full_name, split_last or SINGLE_NAME_FALLBACK_LAST_NAME | |
| # If the model returned only one of first/last, use the heuristic splitter | |
| # to fill in the missing part based on the full name, avoiding duplicate | |
| # or multi-token first names. | |
| if not split_first or not split_last: | |
| heuristic_first, heuristic_last = self._split_name_heuristically(full_name) | |
| split_first = split_first or heuristic_first | |
| split_last = split_last or heuristic_last | |
| return split_first, split_last |
| @staticmethod | ||
| def _configured_linkedin_field() -> str: | ||
| """Return the configured field for LinkedIn profile values.""" | ||
| return str(getattr(settings, "crm_linkedin_field", "cLinkedInUrl")) |
There was a problem hiding this comment.
_configured_linkedin_field() reads settings.crm_linkedin_field, but the bot Settings model doesn’t declare this field (so Pydantic Settings won’t load it from env/config and it will always fall back). Also, if the attribute were ever present but None, str(None) would yield the literal string "None". Consider adding an explicit crm_linkedin_field: str = "cLinkedInUrl" to the Settings schema (and validating/stripping it here, falling back when empty).
| return str(getattr(settings, "crm_linkedin_field", "cLinkedInUrl")) | |
| default_field = "cLinkedInUrl" | |
| raw_value = getattr(settings, "crm_linkedin_field", None) | |
| if not isinstance(raw_value, str): | |
| return default_field | |
| value = raw_value.strip() | |
| return value or default_field |
|
|
||
| return ( | ||
| f"\nParsed contact details: name=`{parsed_name}`, email=`{primary_email}`" | ||
| ) |
There was a problem hiding this comment.
The parsed name/email are interpolated directly into inline-code backticks. If the extracted name/email contains a backtick, it can break formatting and potentially hide/alter surrounding text. Consider sanitizing (e.g., replace/backslash-escape backticks and truncate to a safe length) before embedding user/LLM-derived values into Discord messages.
| await interaction.followup.send( | ||
| "⚠️ Could not create a contact from this resume. " | ||
| f"⚠️ Could not create a contact from this resume: `{error_detail}`{status_note}. " | ||
| "Please provide `search_term` or `link_user`.", | ||
| ephemeral=True, |
There was a problem hiding this comment.
The user-facing failure message embeds error_detail = str(exc) verbatim. For EspoAPIError raised by the shared client, this may include boilerplate (e.g., "Wrong request, status code is …") and can duplicate the separately-appended status note, making the message noisy. Consider special-casing EspoAPIError to extract/display just the server reason (and escape/truncate any backticks/newlines) while keeping the full exception details only in logs/audit metadata.
| first_name, last_name = self.resume_extractor.split_name( | ||
| full_name=source_name, | ||
| first_name_hint=str(payload.get("firstName", "")).strip() or None, | ||
| last_name_hint=str(payload.get("lastName", "")).strip() or None, | ||
| ) |
There was a problem hiding this comment.
_populate_name_fields() calls resume_extractor.split_name(), which will invoke an additional LLM call whenever the OpenAI client is configured. In the resume-upload path you already may have done an LLM extraction, so this can introduce an extra API request (latency/cost/failure surface) just to split the name. Consider using heuristic-only splitting here, or plumbing through extracted first_name/last_name hints from the resume profile to avoid triggering another model call.
Description
_configured_linkedin_fieldand used for search/create/update flows.Related Issue
How Has This Been Tested?
uv run pytest tests/unit/test_crm.py -k "search_contacts_by_field_uses_configured_linkedin_field or build_resume_parsed_identity_summary_includes_name_and_email or upload_resume_no_matching_inferred_contact_shows_name_and_email or update_contact_uses_configured_linkedin_field or test_resume_create_contact_view_logs_create_failure or test_upload_resume_link_user_shows_confirm_then_creates_contact"uv run pytest tests/unit/test_resume_extractor.py -k "split_name"Summary by CodeRabbit
Release Notes
New Features
Improvements