Improve resume parsing with website reparsing#219
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough
Added a website reparse/confirmation path to the resume update flow: UI controls to confirm inferred sites, pass confirmed websites into extraction, surface external-source enrichments and existing websites in previews, persist enrichment metadata, and add extensive tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as Discord User
    participant UI as ResumeUpdateConfirmationView
    participant Bot as CRMCog
    participant Processor as ResumeProfileProcessor
    participant Fetcher as External Fetcher
    User->>UI: edit websites / click "Reparse With Websites"
    UI->>Bot: send confirmed_personal_websites + link payload
    Bot->>Processor: extract_profile_proposal(confirmed_personal_websites, link_discord_payload)
    Processor->>Fetcher: resolve & fetch candidate URLs (SSRF-safe)
    Fetcher-->>Processor: text + enrichment status
    Processor-->>Bot: ResumeExtractionResult (source_enrichments, existing_websites)
    Bot->>UI: update preview message/embed (External Sources + Website Reparse)
    UI->>User: show updated confirmation view
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Pull request overview
This PR enhances the resume parsing pipeline by enriching extractions with public GitHub/website content and wiring a Discord “Reparse With Websites” flow so operators can confirm/trigger website-based reparsing before applying CRM updates.
Changes:
- Add external-source enrichment (CRM websites + GitHub, plus inferred sources with confirmation gating) to `ResumeProfileProcessor`.
- Extend extraction results with `source_enrichments` and `existing_websites` and surface them in Discord preview embeds.
- Add Discord UI support to confirm/reparse with websites, including candidate detection after website edits.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `packages/shared/src/five08/resume_profile_processor.py` | Fetches/validates external website/GitHub text (curl-cffi) and conditionally re-runs extraction; records enrichment attempts. |
| `packages/shared/src/five08/resume_processing_models.py` | Adds `ResumeSourceEnrichment` and new fields on `ResumeExtractionResult` to carry enrichment + baseline website context. |
| `apps/discord_bot/src/five08/discord_bot/cogs/crm.py` | Adds "Reparse With Websites" button logic, website reparse candidate tracking, and embed fields for external sources/reparse prompts. |
| `tests/unit/test_resume_profile_processor.py` | Adds unit tests covering enrichment usage + rerun behavior for inferred GitHub and confirmed websites. |
| `tests/unit/test_crm.py` | Adds unit tests covering new Discord UI behaviors and embed fields for enrichments/reparse prompts. |
| `packages/shared/pyproject.toml` | Adds curl-cffi dependency for external fetching. |
| `uv.lock` | Locks curl-cffi and updates shared package dependency set. |
```python
if isinstance(preview_message, discord.Message):
    view.preview_message = preview_message
```
`interaction.followup.send(..., wait=True)` can return a webhook message type (still editable) rather than `discord.Message`. The `isinstance(preview_message, discord.Message)` guard can prevent `view.preview_message` from being set, so later `_sync_message_view()` calls won't update the UI after edits (e.g., updating website reparse candidates). Consider storing the returned message unconditionally (or widening the accepted message type to include webhook messages) since other views in this file already do so.
```diff
-if isinstance(preview_message, discord.Message):
-    view.preview_message = preview_message
+view.preview_message = preview_message
```
```python
if isinstance(preview_message, discord.Message):
    view.preview_message = preview_message
```
Same issue as above: the `isinstance(preview_message, discord.Message)` check can drop the returned `followup.send(wait=True)` message from being stored, preventing later view refreshes via `_sync_message_view()`. Store the message regardless (or broaden the type check) so the confirmation view can be edited after modal updates.
```diff
-if isinstance(preview_message, discord.Message):
-    view.preview_message = preview_message
+view.preview_message = preview_message
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7497b739a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
link_member=None,
action="crm.upload_resume",
status_message=("🔄 Re-running profile extraction with website content..."),
```
Keep reparse flow in the original command context
ResumeConfirmInferredWebsitesButton always reruns extraction with action="crm.upload_resume" and link_member=None, which drops the original context of the preview. When the button is used from /reprocess-resume (or from an upload that had a linked Discord member), the rerun no longer satisfies the role-suggestion conditions in _run_resume_extract_and_preview, so role suggestion/apply behavior is lost and audits are attributed to the wrong action.
```python
with curl_requests.get(
    current_url,
    headers=headers,
    timeout=PROFILE_SOURCE_FETCH_TIMEOUT_SECONDS,
    allow_redirects=False,
```
Prevent DNS rebinding between URL validation and fetch
The URL is validated as "public" before this request, but the actual fetch performs a fresh hostname lookup via curl_requests.get. That split allows DNS rebinding: an attacker-controlled domain can resolve to a public IP during validation and a private/internal IP at connect time, bypassing the SSRF guard for resume-supplied websites. The fetch path should pin/verify the connected IP, not only pre-validate the hostname.
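One way to close this gap is a resolve-then-pin helper, sketched below with only the standard library. The function name is hypothetical, and the real fix would still need to feed the pinned IP into the curl-cffi session (e.g., via its `CurlOpt.RESOLVE` option) so the fetch connects to the exact address that was validated:

```python
import ipaddress
import socket

def resolve_pinned_public_ip(host: str, port: int = 443) -> str:
    """Resolve `host` once and return a validated, globally routable IP.

    The caller should connect to this exact IP (keeping the original
    hostname only in the Host header / TLS SNI), so no second DNS lookup
    can swap in a private address between validation and fetch.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_global:
            return str(ip)
    raise ValueError(f"no public address for {host!r}")
```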
Actionable comments posted: 4
🧹 Nitpick comments (4)
tests/unit/test_crm.py (1)
6030-6082: Add a callback test for the edited/new-website branch too. This only exercises the button when its URLs come from `source_enrichments`. The PR also shows the same button when `website_reparse_candidates` is derived from `existing_websites` plus edited/new `cWebsiteLink` values, and that path is not verified end-to-end here. A second callback test for that mode would close the biggest remaining gap in the new reparse flow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_crm.py` around lines 6030 - 6082, The test only exercises the ResumeUpdateConfirmationView button when source_enrichments provide the URLs; add a second async test that constructs a ResumeUpdateConfirmationView with existing_websites and edited/new cWebsiteLink values so website_reparse_candidates is populated from those fields, locate the same Reparse button (as in test_confirm_inferred_websites_button_reruns_preview) and invoke its callback with a mocked confirm_interaction, then assert crm_cog._run_resume_extract_and_preview was awaited and that its kwargs include confirmed_personal_websites matching the cWebsiteLink/existing_websites-derived URLs, link_discord_payload matching input, and the same status_message ("🔄 Re-running profile extraction with website content...").

tests/unit/test_resume_profile_processor.py (1)
319-321: Assert the full enrichment payloads here. These checks only verify `status`/`origin`. The downstream CRM flow also depends on `label`, `url`, and, for confirmation/failure cases, `detail`, so a bad producer payload can still pass this suite while the button/rendering logic breaks. Comparing the full `ResumeSourceEnrichment` entries would make this contract much tighter.
Also applies to: 397-404, 464-467
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_resume_profile_processor.py` around lines 319 - 321, The test only asserts status and origin for result.source_enrichments; update it to assert the full ResumeSourceEnrichment payloads (including label, url and detail where applicable) so the contract is tighter. Replace the two partial checks with a single equality check against the expected ResumeSourceEnrichment objects (or their dict representations) for result.source_enrichments; e.g., build expected entries with the same fields and compare whole objects/dicts to result.source_enrichments to catch malformed producer payloads. Apply the same change to the other occurrences mentioned (the assertions around lines 397-404 and 464-467) so all enrichment assertions validate full payloads.

packages/shared/src/five08/resume_processing_models.py (1)
67-74: Constrain `origin` and `status` with `Literal` types to catch invalid values at the model boundary. These fields implement a cross-module protocol between the processor and CRM UI, but plain `str` permits typos that only fail downstream when the UI attempts exact-match branching on status (e.g., lines 3709–3713 in crm.py). Constraining them with `Literal` enforces valid values at validation time.
♻️ Suggested change
```diff
-from typing import Any
+from typing import Any, Literal
@@
 class ResumeSourceEnrichment(BaseModel):
@@
-    origin: str
-    status: str
+    origin: Literal["crm", "resume_inference", "resume_confirmation"]
+    status: Literal["used", "failed", "confirmation_needed"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/resume_processing_models.py` around lines 67 - 74, The ResumeSourceEnrichment model currently lets origin and status be arbitrary strings which allows typos to slip past validation; change the type annotations in class ResumeSourceEnrichment to use typing.Literal with the exact allowed values for origin and status (e.g., origin: Literal["..."] and status: Literal["..."]) so invalid values are rejected at model parse time, add the necessary import for Literal (or typing_extensions.Literal for older Python), and if there are central enums/constants for these protocol values prefer referencing them (or mirror their values) so ResumeSourceEnrichment.origin and .status enforce the same vocabulary used by the CRM UI/processor.

packages/shared/pyproject.toml (1)
7-7: Constrain `curl-cffi` to the API series this PR actually uses.
`>=0.10.0` leaves installs open to any later `curl-cffi` release, but the new fetcher imports concrete `curl_cffi.requests` symbols and depends on specific streaming/context-manager behavior. While the specific API surface (`BrowserTypeLiteral`, `RequestsError`, and `response.iter_content()` streaming) has remained stable since 0.10.x, curl_cffi provides no explicit backward compatibility guarantee. Pinning the compatible series avoids a surprise break when that surface changes.
♻️ Suggested change
```diff
-    "curl-cffi>=0.10.0",
+    "curl-cffi~=0.10.0",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/pyproject.toml` at line 7, Replace the open-ended dependency "curl-cffi>=0.10.0" with a constrained series specifier to lock to the 0.10 API series used by the new fetcher; change the requirement to a range such as ">=0.10.0,<0.11.0" (or "~=0.10.0") so imports like curl_cffi.requests and symbols BrowserTypeLiteral, RequestsError and response.iter_content() remain on the tested API surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 1705-1714: The callback currently disables the preview buttons
unconditionally after calling _run_resume_extract_and_preview(), which leaves
the user without any usable controls when that reparse fails; change the flow so
you only disable and update the existing view when the reparse succeeded—have
_run_resume_extract_and_preview() return a success boolean (or raise on fatal
failure) and move the loop that sets item.disabled = True and the
interaction.message.edit(view=view) calls into the success branch (or after a
truthy return), keeping the existing NotFound/HTTPException handling around
interaction.message.edit; reference the function name
_run_resume_extract_and_preview and the interaction.message.edit(view=view)/for
item in view.children loop to locate where to gate the disabling.
- Around line 1693-1704: The call to _run_resume_extract_and_preview hardcodes
action="crm.upload_resume" and link_member=None, which loses original flow
context and drops Discord-role suggestion/apply behavior on reparses; modify the
invocation to preserve the original action and member context from the view
(e.g., use the view's stored action/flow identifier and the view's linked-member
payload instead of the hardcoded values or None) so reparses keep the same path
(e.g., reprocess vs upload) and retain role-suggestion logic when a linked
member exists.
- Around line 1791-1794: The ResumeEditWebsitesButton is only being added when
proposed_updates.get("cWebsiteLink") is present, which hides the website editor
for previews that need a new website; instead, after calling
self._refresh_website_reparse_candidates() always add the
ResumeEditWebsitesButton (via self.add_item(ResumeEditWebsitesButton())) so the
editor is available even when cWebsiteLink is not in proposed_updates; locate
the block around _refresh_website_reparse_candidates, remove or change the
conditional that checks proposed_updates.get("cWebsiteLink") and ensure add_item
is invoked unconditionally (or guarded only to avoid duplicate adds if needed).
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 81-83: The current _HTML_META_DESC_RE regex only matches meta tags
where name/property precedes content and will miss tags with reversed attribute
order; replace this brittle pattern by adding a helper like
_extract_meta_description(html: str) -> str | None that iterates meta tags
(using re.finditer(r'(?is)<meta\s+([^>]+)>')), parses attributes per-tag
(searching for (?:name|property)\s*=\s*["\'](?:description|og:description)["\']
and content\s*=\s*["\']([^"\']*)["\'] with re.I) and returns the content value
when both are present, then update any code that used _HTML_META_DESC_RE to call
_extract_meta_description instead.
---
Nitpick comments:
In `@packages/shared/pyproject.toml`:
- Line 7: Replace the open-ended dependency "curl-cffi>=0.10.0" with a
constrained series specifier to lock to the 0.10 API series used by the new
fetcher; change the requirement to a range such as ">=0.10.0,<0.11.0" (or
"~=0.10.0") so imports like curl_cffi.requests and symbols BrowserTypeLiteral,
RequestsError and response.iter_content() remain on the tested API surface.
In `@packages/shared/src/five08/resume_processing_models.py`:
- Around line 67-74: The ResumeSourceEnrichment model currently lets origin and
status be arbitrary strings which allows typos to slip past validation; change
the type annotations in class ResumeSourceEnrichment to use typing.Literal with
the exact allowed values for origin and status (e.g., origin: Literal["..."] and
status: Literal["..."]) so invalid values are rejected at model parse time, add
the necessary import for Literal (or typing_extensions.Literal for older
Python), and if there are central enums/constants for these protocol values
prefer referencing them (or mirror their values) so
ResumeSourceEnrichment.origin and .status enforce the same vocabulary used by
the CRM UI/processor.
In `@tests/unit/test_crm.py`:
- Around line 6030-6082: The test only exercises the
ResumeUpdateConfirmationView button when source_enrichments provide the URLs;
add a second async test that constructs a ResumeUpdateConfirmationView with
existing_websites and edited/new cWebsiteLink values so
website_reparse_candidates is populated from those fields, locate the same
Reparse button (as in test_confirm_inferred_websites_button_reruns_preview) and
invoke its callback with a mocked confirm_interaction, then assert
crm_cog._run_resume_extract_and_preview was awaited and that its kwargs include
confirmed_personal_websites matching the cWebsiteLink/existing_websites-derived
URLs, link_discord_payload matching input, and the same status_message ("🔄
Re-running profile extraction with website content...").
In `@tests/unit/test_resume_profile_processor.py`:
- Around line 319-321: The test only asserts status and origin for
result.source_enrichments; update it to assert the full ResumeSourceEnrichment
payloads (including label, url and detail where applicable) so the contract is
tighter. Replace the two partial checks with a single equality check against the
expected ResumeSourceEnrichment objects (or their dict representations) for
result.source_enrichments; e.g., build expected entries with the same fields and
compare whole objects/ dicts to result.source_enrichments to catch malformed
producer payloads. Apply the same change to the other occurrences mentioned (the
assertions around lines 397-404 and 464-467) so all enrichment assertions
validate full payloads.
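The `Literal` constraint recommended in the comments above can be sketched standalone. The model below is a hypothetical mirror of `ResumeSourceEnrichment` (the value vocabularies come from the review's suggested change, not confirmed repository constants), and pydantic v2 is assumed:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Hypothetical mirror of ResumeSourceEnrichment; field vocabularies follow
# the review's suggested change, not verified repo constants.
class SourceEnrichment(BaseModel):
    label: str
    url: str
    origin: Literal["crm", "resume_inference", "resume_confirmation"]
    status: Literal["used", "failed", "confirmation_needed"]

ok = SourceEnrichment(
    label="GitHub",
    url="https://github.com/example",
    origin="resume_inference",
    status="used",
)

try:
    # A typo that plain `str` fields would silently accept:
    SourceEnrichment(
        label="GitHub",
        url="https://github.com/example",
        origin="resume_inferense",
        status="used",
    )
except ValidationError:
    print("typo rejected at parse time")
```

With plain `str` fields the typo would only surface later, when the UI's exact-match branching on `status` silently falls through.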
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3fd27fc3-11d6-40bc-80af-46c55d0f6361
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/pyproject.toml
- packages/shared/src/five08/resume_processing_models.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91804fe77b
```python
def _is_public_ip(value: IPAddress) -> bool:
    return not (
        value.is_private
        or value.is_loopback
        or value.is_link_local
```
Reject non-global addresses in public IP validation
The SSRF guard currently accepts any address that is not is_private/loopback/link-local/etc., which still permits non-global special-use ranges (for example 100.64.0.0/10, where ipaddress returns is_private=False but is_global=False). In that case, an attacker-controlled hostname can pass _resolve_public_profile_request_target and still be fetched by _fetch_external_profile_source_text, allowing requests to non-public network space. Please validate resolved IPs using is_global (or an equivalent explicit denylist) instead of only negating selected flags.
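The carrier-grade NAT range illustrates the gap concretely: in CPython's `ipaddress` module, 100.64.0.0/10 is documented as having both `is_private` and `is_global` be `False`, so only an `is_global` allow-list catches it:

```python
import ipaddress

# 100.64.0.0/10 (CGNAT, RFC 6598): neither "private" nor "global" per the
# ipaddress docs, so a deny-list of selected flags lets it through.
cgnat = ipaddress.ip_address("100.64.0.1")
print(cgnat.is_private, cgnat.is_global)  # False False

def is_public_ip(value: str) -> bool:
    # Allow-list: accept only globally routable addresses.
    return ipaddress.ip_address(value).is_global

for addr, expected in [
    ("100.64.0.1", False),  # CGNAT, slips past the deny-list version
    ("10.0.0.1", False),    # private
    ("127.0.0.1", False),   # loopback
    ("8.8.8.8", True),      # public
]:
    assert is_public_ip(addr) is expected
```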
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 1898-1902: _sync_message_view currently only updates the
interactive view and leaves the preview embed/stale content unchanged; update it
to re-render the preview embed from the current website state and include it in
the edit call. Specifically, obtain a fresh embed by calling the existing
preview-builder function (for example self._build_preview_embed or
self.build_preview_embed — use whichever function in this class constructs the
preview embed) and then call await self.preview_message.edit(view=self,
embed=new_embed); if no builder exists, reconstruct the embed from the current
state before editing, while keeping the existing None check and exception
handling.
- Around line 1716-1721: The current loop in the block that iterates
view.children only disables discord.ui.Button instances, leaving any select
menus (notably ResumeSeniorityOverrideSelect) still interactive; update the loop
where you check items in view.children (the code surrounding
interaction.message.edit and the for item in view.children:) to also disable
select menus by checking for discord.ui.Select and/or the custom
ResumeSeniorityOverrideSelect class and setting item.disabled = True so all
controls (buttons and selects) are rendered inert before calling
interaction.message.edit(view=view).
- Around line 1850-1879: _current_inferred_reparse_websites() inferred URLs are
always re-added in _refresh_website_reparse_candidates(), which reintroduces
URLs a user removed in the website editor; change the merge logic in
_refresh_website_reparse_candidates() so proposed_links (the user's edits) take
precedence over inferred candidates: compute the normalized key set for
proposed_links and filter out any inferred_candidates whose
normalized_website_identity_key appears in that set before combining with
proposed_additions, then normalize and assign to
self.website_reparse_candidates; reference functions/vars:
_refresh_website_reparse_candidates, _current_inferred_reparse_websites,
_normalize_website_links_for_reparse, normalized_website_identity_key,
proposed_links, proposed_additions, inferred_candidates, and
website_reparse_candidates.
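The precedence rule in that last prompt might look like the sketch below, where `identity_key` is a deliberately simplified stand-in for the repo's `normalized_website_identity_key` (the normalization details are assumptions):

```python
from urllib.parse import urlparse

def identity_key(url: str) -> str:
    # Simplified stand-in for normalized_website_identity_key: lowercase
    # host without "www.", plus the path with any trailing slash removed.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    return parsed.netloc.lower().removeprefix("www.") + parsed.path.rstrip("/")

def merge_reparse_candidates(
    proposed_links: list[str], inferred_candidates: list[str]
) -> list[str]:
    # User-edited links take precedence: drop inferred candidates whose
    # normalized identity already appears among the proposed links.
    seen = {identity_key(url) for url in proposed_links}
    kept = [url for url in inferred_candidates if identity_key(url) not in seen]
    return list(proposed_links) + kept
```

A removed URL never reappears, because inferred candidates are filtered against the user's current edits rather than unconditionally re-added.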
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eaad4cff-5a76-4832-8191-4f11ac879c12
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
✅ Files skipped from review due to trivial changes (1)
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
🧹 Nitpick comments (1)
tests/unit/test_crm.py (1)
6101-6106: Prefer selecting the reparse control by button type, not label text. Using label-based lookup here is brittle to copy changes and localization; type-based lookup is more stable.
♻️ Proposed refactor
```diff
 confirm_button = next(
     child
     for child in view.children
-    if isinstance(child, discord.ui.Button)
-    and child.label == "Reparse With Websites"
+    if isinstance(child, ResumeConfirmInferredWebsitesButton)
 )
```
Also applies to: 6166-6171
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_crm.py` around lines 6101 - 6106, The test currently finds the "Reparse With Websites" button by matching label text which is brittle; update the selection to find the discord.ui.Button by a stable identifier (e.g., custom_id) instead of label. Change the lookup that sets confirm_button to iterate view.children and pick the child where isinstance(child, discord.ui.Button) and child.custom_id == "reparse_with_websites" (or another stable custom_id you add where the button is constructed), and make the identical change at the other occurrence that currently uses the label match. Ensure the actual button creation code sets that custom_id so the tests can reliably select by it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/unit/test_crm.py`:
- Around line 6101-6106: The test currently finds the "Reparse With Websites"
button by matching label text which is brittle; update the selection to find the
discord.ui.Button by a stable identifier (e.g., custom_id) instead of label.
Change the lookup that sets confirm_button to iterate view.children and pick the
child where isinstance(child, discord.ui.Button) and child.custom_id ==
"reparse_with_websites" (or another stable custom_id you add where the button is
constructed), and make the identical change at the other occurrence that
currently uses the label match. Ensure the actual button creation code sets that
custom_id so the tests can reliably select by it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d994268-0a37-41a5-b8f2-dae3ba8f8a62
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/test_resume_profile_processor.py
- packages/shared/src/five08/resume_profile_processor.py
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
```diff
 async def on_submit(self, interaction: discord.Interaction) -> None:
     raw = self.websites_input.value or ""
     links = [line.strip() for line in raw.splitlines() if line.strip()]
     self.confirmation_view.website_edits_override_inferred_candidates = True
     if links:
         self.confirmation_view.proposed_updates["cWebsiteLink"] = links
     else:
         self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
     self.confirmation_view._refresh_website_reparse_candidates()
     await self.confirmation_view._sync_message_view()
     count = len(links)
     if self.confirmation_view.website_reparse_candidates:
         action_hint = " Click **Reparse With Websites** to refresh the extraction."
     else:
         action_hint = " Click **Confirm Updates** to apply."
     await interaction.response.send_message(
-        f"✅ Websites updated to {count} link{'s' if count != 1 else ''}. "
-        "Click **Confirm Updates** to apply.",
+        f"✅ Websites updated to {count} link{'s' if count != 1 else ''}."
+        + action_hint,
         ephemeral=True,
     )
```
ResumeEditWebsitesModal.on_submit awaits _sync_message_view() (which performs a Discord message edit API call) before sending the interaction response. Modal submits must be acknowledged quickly; if the message edit is slow or rate-limited this can cause an "interaction failed" client error. Consider responding/defering first (e.g., interaction.response.defer(...) + followup) or scheduling _sync_message_view() asynchronously so the initial response is sent within the interaction deadline.
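A toy model of the recommended defer-first ordering (the classes below are stand-ins, not discord.py objects) shows the intended sequence: acknowledge the interaction immediately, then do the slow work, then follow up:

```python
import asyncio

# Toy model of the defer-first pattern: FakeInteraction stands in for
# discord.Interaction; slow_sync stands in for _sync_message_view().
class FakeInteraction:
    def __init__(self) -> None:
        self.events: list[str] = []

    async def defer(self) -> None:
        self.events.append("defer")

    async def followup_send(self, msg: str) -> None:
        self.events.append(f"followup:{msg}")

async def on_submit(interaction: FakeInteraction, slow_sync) -> None:
    await interaction.defer()      # acknowledge within the interaction deadline
    await slow_sync()              # slow, rate-limit-prone work happens afterwards
    await interaction.followup_send("websites updated")

async def main() -> list[str]:
    ix = FakeInteraction()

    async def slow_sync() -> None:
        await asyncio.sleep(0)     # placeholder for a slow message edit
        ix.events.append("sync")

    await on_submit(ix, slow_sync)
    return ix.events

events = asyncio.run(main())
print(events)  # ['defer', 'sync', 'followup:websites updated']
```

Because the acknowledgement comes first, a slow or rate-limited message edit can no longer push the initial response past the client's deadline.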
```python
resolve_entry = f"{host}:{port}:" + ",".join(
    _format_curl_resolve_address(ip) for ip in resolved_ips
)
session: curl_requests.Session = curl_requests.Session(
    curl_options={CurlOpt.RESOLVE: [resolve_entry]}
```
CurlOpt.RESOLVE entries appear to be built as a single string with multiple resolved IPs joined by commas (f"{host}:{port}:" + ",".join(...)). libcurl’s RESOLVE/--resolve format is host:port:address per entry; multiple addresses typically require separate entries (or choosing one). As written, any hostname that resolves to multiple A/AAAA records may produce an invalid RESOLVE value and cause fetches to fail. Consider selecting a single validated IP (e.g., the first preferred IPv4/then IPv6) or constructing a list of one entry per IP in curl_options.
```diff
-resolve_entry = f"{host}:{port}:" + ",".join(
-    _format_curl_resolve_address(ip) for ip in resolved_ips
-)
-session: curl_requests.Session = curl_requests.Session(
-    curl_options={CurlOpt.RESOLVE: [resolve_entry]}
+resolve_entries = [
+    f"{host}:{port}:{_format_curl_resolve_address(ip)}"
+    for ip in resolved_ips
+]
+session: curl_requests.Session = curl_requests.Session(
+    curl_options={CurlOpt.RESOLVE: resolve_entries}
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7aee2a7367
```python
enrichments.append(
    ResumeSourceEnrichment(
        label="Personal Website",
        url=website_url,
        origin="resume_inference",
```
Recompute website confirmations after enriched re-extraction
confirmation_needed website enrichments are recorded from the first extraction pass, but this flow may immediately run a second extraction with inferred GitHub sources (_extract_profile_with_external_sources). If that second pass changes or removes website_links, the preview can still show stale inferred websites and offer “Reparse With Websites” for URLs that are no longer in the final extracted profile, which misleads operators and can trigger unnecessary external fetches.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4d4cf0705
```python
self._refresh_website_reparse_candidates()
self.add_item(ResumeEditWebsitesButton())
if proposed_updates.get("cSocialLinks"):
```
Preserve existing websites when enabling manual website edits
The view now always adds ResumeEditWebsitesButton, including cases where no website update was proposed. In that path, ResumeEditWebsitesModal still initializes from proposed_updates["cWebsiteLink"] only, so contacts that already have CRM website links open with an empty editor; if an operator adds a single URL and confirms, the apply payload overwrites cWebsiteLink with only the new value and drops previously stored links. This creates an easy data-loss path whenever existing websites are present but extraction proposed none.
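The guard the reviewer asks for can be sketched as a small seeding helper — the function name and field keys here are illustrative, not the cog's actual API:

```python
def seed_websites_editor(
    proposed_updates: dict,
    existing_websites: list[str],
) -> str:
    """Seed the edit-websites modal from the proposed update when present,
    falling back to the contact's stored links so confirming without edits
    cannot silently overwrite them with an empty list."""
    links = proposed_updates.get("cWebsiteLink") or existing_websites
    return "\n".join(links)
```

With this fallback, opening the editor for a contact whose extraction proposed no websites still shows the stored CRM links instead of a blank field.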
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/shared/src/five08/resume_profile_processor.py (1)
1149-1198: Apply the website cap to the combined candidate set. Confirmed websites and CRM websites are sliced independently here, so one run can still enqueue up to four website fetches before GitHub is added. If `PROFILE_SOURCE_MAX_WEBSITES` is meant to bound prompt size and latency, merge/dedupe both website sources first and then apply the limit once.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/resume_profile_processor.py` around lines 1149 - 1198, In _build_initial_external_source_candidates, both explicit_personal_websites and contact cWebsiteLink are sliced separately which can exceed PROFILE_SOURCE_MAX_WEBSITES; instead, call _coerce_website_links on both sources, map each URL to its normalized key using normalized_website_identity_key, dedupe/merge them (prefer explicit/resume_confirmation origin when a key appears in both), then take at most PROFILE_SOURCE_MAX_WEBSITES from that merged list and construct _ExternalProfileSourceCandidate entries (using origin "resume_confirmation" or "crm" as chosen), and keep the existing GitHub handling unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 1060-1063: The enrichment-aware calls to extractor.extract (the
one passing extra_sources and the later similar call) must fail open: wrap each
call in a try/except Exception as e so that any exception does not overwrite the
previously successful `extracted` result or force success=False; on exception
log/record the enrichment failure (e.g., set an `enrichment_error` or
`enrichment_failed` flag and capture the exception message) and continue using
the last good `extracted` value instead of blanking the preview or propagating
to the outer except; apply the same pattern to the second extractor.extract call
referenced around the later block.
- Around line 1098-1148: The current early return on existing_website_links in
_refresh_inferred_website_confirmation_enrichments prevents creating
confirmation_needed enrichments for new distinct sites; remove the return and
instead compute a set of normalized CRM website identity keys from
contact.get("cWebsiteLink") (using normalized_website_identity_key) and, when
iterating extracted.website_links[:PROFILE_SOURCE_MAX_WEBSITES], skip adding a
ResumeSourceEnrichment only if the normalized identity matches an existing CRM
identity or is in seen_source_keys or confirmed_personal_website_keys; otherwise
append the confirmation_needed enrichment and add the dedupe_source_key to
seen_source_keys so new distinct sites still get confirmation/reparse options.
- Around line 90-93: The regex _GITHUB_USERNAME_RE currently and use of search()
allow repo URLs like /org/repo to match and return the org as a username; change
the pattern to require the URL end (allow optional trailing slash) — e.g.
^(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]{1,39})/?$ — and replace uses
of search() with fullmatch()/match() (or use re.fullmatch with the new pattern)
so only profile URLs match; apply the same pattern+call-change to the other
occurrence referenced (the block around the second _GITHUB_USERNAME_RE usage).
---
Nitpick comments:
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 1149-1198: In _build_initial_external_source_candidates, both
explicit_personal_websites and contact cWebsiteLink are sliced separately which
can exceed PROFILE_SOURCE_MAX_WEBSITES; instead, call _coerce_website_links on
both sources, map each URL to its normalized key using
normalized_website_identity_key, dedupe/merge them (prefer
explicit/resume_confirmation origin when a key appears in both), then take at
most PROFILE_SOURCE_MAX_WEBSITES from that merged list and construct
_ExternalProfileSourceCandidate entries (using origin "resume_confirmation" or
"crm" as chosen), and keep the existing GitHub handling unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4040148a-7711-48f7-b71b-1390558169b5
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/test_resume_profile_processor.py
- tests/unit/test_crm.py
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
```python
scheme = parsed.scheme.lower()
if scheme not in {"http", "https"}:
    return "Profile URL must use http or https"
if parsed.username or parsed.password:
    return "Profile URL must not include credentials"
```
_resolve_public_profile_request_target allows arbitrary explicit ports from profile URLs. Even with IP validation, this enables outbound port-scanning of public hosts (e.g. https://example.com:22) from the worker. Consider restricting allowed ports to 80/443 (or a small allowlist) and returning a clear validation error when an unsupported port is provided.
```python
if links:
    self.confirmation_view.proposed_updates["cWebsiteLink"] = links
else:
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
```
The websites modal now seeds its default value from existing_websites when no cWebsiteLink update is proposed. If the user submits without making changes, on_submit will still set proposed_updates['cWebsiteLink'] to the existing list, creating a no-op CRM update (and can also affect downstream logic that treats presence of the key as an intentional change). Consider comparing the submitted normalized links against confirmation_view.existing_websites (by identity key) and, when they match, popping cWebsiteLink from proposed_updates instead of setting it.
Suggested change — replace:

```python
if links:
    self.confirmation_view.proposed_updates["cWebsiteLink"] = links
else:
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
```

with:

```python
# Determine whether the submitted links are logically identical to the
# existing websites (by normalized identity key). If they are, avoid
# setting cWebsiteLink in proposed_updates so we don't create a no-op
# CRM update or signal an intentional change where there is none.
existing_websites = self.confirmation_view.existing_websites or []

def _identity_keys(urls: list[str]) -> list[str]:
    return [
        normalized_website_identity_key(normalize_website_url(u))
        for u in urls
        if str(u).strip()
    ]

submitted_keys = _identity_keys(links)
existing_keys = _identity_keys(existing_websites)
if submitted_keys == existing_keys:
    # No logical change; ensure we don't propose an unnecessary update.
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
else:
    if links:
        self.confirmation_view.proposed_updates["cWebsiteLink"] = links
    else:
        self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
```
💡 Codex Review
Reviewed commit: 9a1e8636b7
```python
for website_url in extracted.website_links[:PROFILE_SOURCE_MAX_WEBSITES]:
    source_key = normalized_website_identity_key(website_url)
    if not source_key or source_key in existing_website_keys:
```
Filter existing websites before applying reparse cap
The reparse candidate loop limits extracted.website_links to PROFILE_SOURCE_MAX_WEBSITES before it excludes links already on the CRM contact. If the first two extracted links are already in cWebsiteLink and a new website appears later, that new site is never surfaced as confirmation_needed (and can’t be fetched/reparsed), so enrichment is silently skipped for valid resumes with more than two links. Apply the cap after filtering to new unique links.
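The fix — filter out already-known links first, then apply the cap — can be sketched like this; the helper name is illustrative and identity normalization is stubbed with `lower()`:

```python
def reparse_candidates(
    extracted_links: list[str],
    existing_keys: set[str],
    max_websites: int = 2,
) -> list[str]:
    """Exclude links already on the CRM contact before capping, so a new
    site beyond the first N extracted links is still surfaced for
    confirmation instead of being silently skipped."""
    fresh = [url for url in extracted_links if url.lower() not in existing_keys]
    return fresh[:max_websites]
```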
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
```python
for candidate in candidates:
    if candidate.source_key in seen_source_keys:
        continue
    seen_source_keys.add(candidate.source_key)

    try:
        fetched = self._fetch_external_profile_source_text(candidate.url)
    except Exception as exc:
        enrichments.append(
            ResumeSourceEnrichment(
                label=candidate.label,
                url=candidate.url,
                origin=candidate.origin,
                status="failed",
                detail=str(exc),
            )
        )
        continue
```
_fetch_external_profile_sources() marks candidate.source_key as seen before attempting the fetch. If two candidates share the same source_key (e.g., http vs https variants collapsing to the same normalized_website_identity_key) and the first fetch fails, the second candidate will be skipped, preventing a potentially successful fallback. Consider only adding to seen_source_keys after a successful fetch, or removing it again on exception so alternate candidates can be tried.
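The retry-friendly variant the comment suggests can be sketched as follows — `fetch_with_fallback` and the `(source_key, url)` pair shape are illustrative, not the processor's real signature:

```python
def fetch_with_fallback(candidates, fetch):
    """Mark a source_key as seen only after a successful fetch, so an
    http/https variant collapsing to the same normalized key can still be
    tried when the first attempt fails."""
    seen, results, failures = set(), {}, []
    for source_key, url in candidates:
        if source_key in seen:
            continue
        try:
            results[source_key] = fetch(url)
        except Exception as exc:
            failures.append((url, str(exc)))
            continue  # key stays unseen so a sibling variant can retry
        seen.add(source_key)
    return results, failures
```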
```python
explicit_website_links = self._coerce_website_links(explicit_personal_websites)
for website_url in explicit_website_links[:PROFILE_SOURCE_MAX_WEBSITES]:
    source_key = normalized_website_identity_key(website_url)
    if not source_key:
        continue
    candidates.append(
        _ExternalProfileSourceCandidate(
            label="Personal Website",
            url=website_url,
            origin="resume_confirmation",
            source_key=f"website:{source_key}",
        )
    )

website_links = self._coerce_website_links(contact.get("cWebsiteLink"))[
    :PROFILE_SOURCE_MAX_WEBSITES
]
```
PROFILE_SOURCE_MAX_WEBSITES is applied separately to explicit_personal_websites and to CRM cWebsiteLink, so the initial enrichment pass can fetch up to 2 + 2 websites (plus GitHub). If the intent is a global cap on website fetches per resume, this should be enforced across both sources (or the constant renamed to reflect per-origin caps) to avoid unexpected extra network calls.
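A single global cap across both origins could look like the sketch below — the helper name is illustrative and website identity is stubbed with a lowercase/strip normalization:

```python
def merged_website_candidates(
    confirmed: list[str],
    crm_links: list[str],
    max_websites: int = 2,
) -> list[str]:
    """Merge confirmed and CRM websites, dedupe by normalized key (first
    occurrence wins), then apply one cap — instead of slicing each source
    separately, which allows up to 2 + 2 fetches per resume."""
    merged, seen = [], set()
    for url in [*confirmed, *crm_links]:
        key = url.lower().rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        merged.append(url)
    return merged[:max_websites]
```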
```python
        for url in existing_links
        if normalized_website_identity_key(url)
    ]
    if submitted_keys == existing_keys:
```
The no-op detection compares submitted_keys == existing_keys as lists, which is order-sensitive. Submitting the same set of existing websites in a different order will be treated as an update and can trigger an unnecessary reparse/CRM update. Consider comparing a canonicalized representation (e.g., sorted keys or a set) instead of the raw list order.
Suggested change:

```python
if set(submitted_keys) == set(existing_keys):
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/discord_bot/src/five08/discord_bot/cogs/crm.py (1)
4177-4189: ⚠️ Potential issue | 🟠 Major — Make preview-path audit writes best-effort in this command flow.
This changed extraction/preview path still uses `_audit_command(...)` directly. If audit writes fail, the user flow can fail instead of returning the intended preview/error response.
🛡️ Suggested fix direction

```diff
-        self._audit_command(
+        self._audit_command_safe(
             interaction=interaction,
             action=action_name,
             result="error",
             metadata={
                 "filename": filename,
                 "attachment_id": attachment_id,
                 "stage": "extract_execute",
                 "error": str(exc),
             },
             resource_type="crm_contact",
             resource_id=str(contact_id),
         )
```

Apply this consistently to audit calls inside `_run_resume_extract_and_preview(...)`. As per coding guidelines: "Audit logging must be best effort: command flows in Discord cogs should never fail if audit writes fail".
Also applies to: 4196-4207, 4223-4235, 4340-4351, 4379-4392, 4427-4440
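A minimal best-effort wrapper along the lines the review suggests might look like this — `audit_command_safe` stands in for the cog's real `_audit_command` plumbing and is not the repository's actual helper:

```python
import logging

logger = logging.getLogger("crm")


def audit_command_safe(audit_fn, /, **kwargs) -> bool:
    """Best-effort audit write: swallow and log any failure so the command
    flow still returns its preview/error response to the user."""
    try:
        audit_fn(**kwargs)
        return True
    except Exception:
        logger.exception("audit write failed; continuing command flow")
        return False
```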
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py` around lines 4177 - 4189, The audit writes in _run_resume_extract_and_preview currently call self._audit_command(...) directly which can raise and break the preview/error response; make these audit calls best-effort by wrapping each self._audit_command(...) invocation in a try/except that catches all exceptions, logs the failure (e.g., via self.logger.exception or similar) and continues without re-raising so command flow still returns the preview/error response; apply this change to every audit call inside _run_resume_extract_and_preview (including the calls around extract_execute, the subsequent audit blocks at the other noted sites) so audit failures never propagate to the user flow.
🧹 Nitpick comments (1)
packages/shared/src/five08/resume_profile_processor.py (1)
1346-1451: Significant code duplication between IP-literal and DNS-resolved fetch branches. The response handling logic (redirect checks, status validation, content-type filtering, chunked body reading) is duplicated almost verbatim between the IP-literal branch (lines 1355-1392) and the DNS-resolved branch (lines 1412-1449). Consider extracting a helper method for the common response processing.
♻️ Suggested extraction of common response handling
```diff
+    def _handle_profile_response(
+        self,
+        response: Any,
+        current_url: str,
+    ) -> tuple[str | None, str | None]:
+        """Handle profile response, returning (extracted_text, redirect_url)."""
+        try:
+            if response.status_code in {301, 302, 303, 307, 308}:
+                redirect_to = response.headers.get("Location")
+                if not redirect_to:
+                    raise ValueError("Profile URL redirect missing Location header")
+                return None, urljoin(current_url, redirect_to)
+
+            response.raise_for_status()
+            content_type = str(response.headers.get("Content-Type", "")).lower()
+            content_length = response.headers.get("Content-Length")
+            if content_length:
+                try:
+                    content_length_value = int(content_length)
+                except (TypeError, ValueError):
+                    content_length_value = None
+                if (
+                    content_length_value is not None
+                    and content_length_value > PROFILE_SOURCE_MAX_BYTES
+                ):
+                    raise ValueError("Profile page exceeds size limit")
+
+            payload = bytearray()
+            for chunk in response.iter_content(chunk_size=8192):
+                if not chunk:
+                    continue
+                payload.extend(chunk)
+                if len(payload) > PROFILE_SOURCE_MAX_BYTES:
+                    raise ValueError("Profile page exceeds size limit")
+
+            return self._extract_profile_source_text(
+                body=bytes(payload),
+                content_type=content_type,
+            ), None
+        finally:
+            response.close()
```

Then use in both branches:

```python
extracted_text, redirect_url = self._handle_profile_response(response, current_url)
if redirect_url:
    current_url = redirect_url
    continue
return extracted_text
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/resume_profile_processor.py` around lines 1346 - 1451, The two branches for IP-literal and DNS-resolved fetch duplicate the entire response processing (redirect handling, raise_for_status, content-length checks, iterative chunk reading and calls to _extract_profile_source_text); extract that shared logic into a helper method (e.g. _handle_profile_response(response, current_url)) that returns (extracted_text, redirect_url) or raises on error, reuse it from both branches by calling it after getting response, updating current_url and continuing on a redirect, otherwise returning extracted_text; ensure the helper uses the same symbols/semantics: response.iter_content, response.headers.get("Content-Type"), response.headers.get("Content-Length"), PROFILE_SOURCE_MAX_BYTES and _extract_profile_source_text, and still closes the response (or let callers close) to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 896-907: The code compares submitted_keys and existing_keys as
ordered lists, causing false positives when the same website identities are
present in different orders; change the comparison to be order-insensitive by
comparing sets (or multisets if duplicates matter) of
normalized_website_identity_key results instead of lists. In practice, compute
submitted_keys and existing_keys as you already do, then check equality using
set(submitted_keys) == set(existing_keys) (or collections.Counter for
duplicate-aware comparison) and only call
self.confirmation_view.proposed_updates.pop("cWebsiteLink", None) when those
sets differ. Ensure you still use normalized_website_identity_key for both
submitted_links and existing_links so the identity logic remains consistent.
---
Outside diff comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 4177-4189: The audit writes in _run_resume_extract_and_preview
currently call self._audit_command(...) directly which can raise and break the
preview/error response; make these audit calls best-effort by wrapping each
self._audit_command(...) invocation in a try/except that catches all exceptions,
logs the failure (e.g., via self.logger.exception or similar) and continues
without re-raising so command flow still returns the preview/error response;
apply this change to every audit call inside _run_resume_extract_and_preview
(including the calls around extract_execute, the subsequent audit blocks at the
other noted sites) so audit failures never propagate to the user flow.
---
Nitpick comments:
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 1346-1451: The two branches for IP-literal and DNS-resolved fetch
duplicate the entire response processing (redirect handling, raise_for_status,
content-length checks, iterative chunk reading and calls to
_extract_profile_source_text); extract that shared logic into a helper method
(e.g. _handle_profile_response(response, current_url)) that returns
(extracted_text, redirect_url) or raises on error, reuse it from both branches
by calling it after getting response, updating current_url and continuing on a
redirect, otherwise returning extracted_text; ensure the helper uses the same
symbols/semantics: response.iter_content, response.headers.get("Content-Type"),
response.headers.get("Content-Length"), PROFILE_SOURCE_MAX_BYTES and
_extract_profile_source_text, and still closes the response (or let callers
close) to preserve behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d6c989ff-3493-4a26-92b8-b5eef1fee007
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_crm.py
💡 Codex Review
Reviewed commit: b47fac97a4
```python
raw = self.websites_input.value or ""
links = [line.strip() for line in raw.splitlines() if line.strip()]
if links:
    self.confirmation_view.website_edits_override_inferred_candidates = True
```
Avoid dropping inferred websites on no-op modal submit
website_edits_override_inferred_candidates is set before checking whether the submitted website list actually changed, so a no-op submit (open modal, submit unchanged) still forces _refresh_website_reparse_candidates() to filter inferred confirmation_needed websites against an empty proposed set. In that case the reparse candidates disappear, the Reparse With Websites button is removed, and inferred website sources are hidden from the refreshed preview even though the user made no effective change, which can silently prevent the intended enrichment flow.
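The ordering fix the comment implies — compare first, set the override flag only on a real change — can be sketched with a state dict standing in for the view (the keys and flag name are illustrative):

```python
def apply_website_edit(state: dict, submitted: list[str]) -> dict:
    """Set the override flag only when the submitted list actually differs
    from the current proposal, so a no-op submit does not wipe the inferred
    reparse candidates. Identity comparison is stubbed with lowercase keys."""
    current = state.get("cWebsiteLink") or []

    def keys(urls):
        return {u.lower().rstrip("/") for u in urls}

    if keys(submitted) == keys(current):
        return state  # no effective change; keep inferred candidates intact
    state = dict(state, website_edits_override_inferred_candidates=True)
    if submitted:
        state["cWebsiteLink"] = submitted
    else:
        state.pop("cWebsiteLink", None)
    return state
```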
💡 Codex Review
Reviewed commit: ebeeb883ea
```python
if fallback_extracted is not None:
    return fallback_extracted
return self.extractor.extract(resume_text, extra_sources=None)
```
Reset enrichment usage after fail-open fallback
When enrichment extraction fails, this fallback re-runs parsing with extra_sources=None, but the previously fetched sources are still kept as status="used" and treated as already confirmed. In that path, the final preview can claim website/GitHub content was used even though the successful parse ignored external sources, and users lose a way to re-confirm/retry those sources because they are no longer surfaced as confirmation candidates.
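Downgrading stale `status="used"` enrichments after a fail-open fallback could look like the sketch below; the `Enrichment` dataclass is a stand-in for the project's `ResumeSourceEnrichment`, and the detail string is an assumption:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Enrichment:
    url: str
    status: str  # "used" | "failed" | "confirmation_needed"
    detail: str = ""


def downgrade_after_fallback(enrichments: list[Enrichment]) -> list[Enrichment]:
    """When enriched extraction fails open to a plain parse, previously
    fetched sources must not stay marked "used"; downgrade them so the
    preview reflects reality and the sources can be re-confirmed."""
    return [
        replace(e, status="failed", detail="enriched parse failed; source unused")
        if e.status == "used"
        else e
        for e in enrichments
    ]
```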
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
```python
elif links:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates["cWebsiteLink"] = links
else:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
self.confirmation_view._refresh_website_reparse_candidates()
await self.confirmation_view._sync_message_view()
count = len(links)
```
In ResumeEditWebsitesModal.on_submit, you compute submitted_links using normalize_website_url, but when persisting the update you still write the raw links list into proposed_updates["cWebsiteLink"]. This can allow invalid/un-normalized entries to reach the apply step (where they may be dropped), and it can also make dedupe/no-op detection behave differently than what will actually be applied. Consider storing submitted_links (or rejecting invalid lines up-front) so the proposed update matches what will be applied/reparsed.
Suggested change — use the normalized `submitted_links` instead of the raw `links`:

```python
elif submitted_links:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates["cWebsiteLink"] = submitted_links
else:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
self.confirmation_view._refresh_website_reparse_candidates()
await self.confirmation_view._sync_message_view()
count = len(submitted_links)
```
```diff
         count = len(links)
-        await interaction.response.send_message(
+        if self.confirmation_view.has_reparse_candidates:
+            action_hint = (
+                " Click **Reparse With New Sources** to refresh the extraction."
+            )
+        else:
+            action_hint = " Click **Confirm Updates** to apply."
+        await interaction.followup.send(
             f"✅ Websites updated to {count} link{'s' if count != 1 else ''}. "
-            "Click **Confirm Updates** to apply.",
+            + action_hint,
             ephemeral=True,
```
The success message uses count = len(links) (raw user input) even when the submission is treated as a no-op (submitted_keys == existing_keys) or when some lines are invalid and get filtered out by normalize_website_url. This can misreport what will actually be applied and confuse users (e.g., “updated to 1 link” while cWebsiteLink was popped). Consider computing the count from the normalized/stored list and emitting a distinct message for the no-op case.
```python
await interaction.followup.send(
    f"❌ Resume extraction failed: {result.get('error') or 'Unknown error'}",
    file=debug_file,
```
This failure path still says “Resume extraction failed” even though the command/UI has been renamed to “profile” throughout the rest of the flow. Updating this message to “Profile extraction failed” (and similar wording) will keep user-facing output consistent and reduce confusion.
💡 Codex Review
Reviewed commit: 1507e97320
```python
if not has_external_profile_sources:
    raise ValueError("No resume or external profile sources available")
text = ""
normalized_filename = normalized_filename or "crm-profile-sources"
```
Allow LLM extraction when no resume attachment is present
When attachment_id is missing, this path forces text = "" and still proceeds with extraction from external sources. In ResumeProfileExtractor.extract, an empty resume_text short-circuits to heuristic mode, so the new source-only reprocess flow never uses LLM parsing even when website/GitHub content was fetched successfully. That materially reduces extraction quality for /reprocess-profile contacts without resumes and can miss fields operators expect from enriched sources.
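The mode decision the comment argues for — look at the combined corpus rather than the resume text alone — can be sketched like this; the function and mode names are illustrative, not the extractor's real API:

```python
def extraction_mode(resume_text: str, source_texts: list[str]) -> str:
    """Decide parser mode from resume text plus fetched external sources, so
    a source-only reprocess run still reaches the LLM path instead of
    short-circuiting to heuristics on an empty resume."""
    corpus = "\n\n".join(t for t in [resume_text, *source_texts] if t.strip())
    return "llm" if corpus.strip() else "heuristic"
```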
```python
github_username = str(contact.get("cGitHubUsername") or "").strip()
return bool(github_username)
```
Validate external-source fields before enabling reprocess flow
This gate treats any non-empty cGitHubUsername as a usable profile source, but the processor later applies stricter normalization and can reject malformed values. For contacts with no resume and an invalid GitHub value, the UI now routes into reprocess instead of upload handoff, then extraction fails with “No resume or external profile sources available,” creating a dead-end flow. Reuse the same normalization criteria as the processor when deciding has_profile_sources.
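A stricter gate that mirrors normalization up front might look like the sketch below; the username grammar used here (GitHub's public rules) is an assumption about the processor's actual criteria:

```python
import re

# Assumption: the processor accepts usernames matching GitHub's grammar
# (alphanumeric, hyphens not leading/trailing/doubled, max 39 chars).
_USERNAME_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}$")


def has_usable_github_source(contact: dict) -> bool:
    """Gate the reprocess flow with the same validation applied later, so a
    malformed cGitHubUsername routes to upload handoff instead of a
    dead-end extraction failure."""
    raw = str(contact.get("cGitHubUsername") or "").strip()
    return bool(raw) and _USERNAME_RE.fullmatch(raw) is not None
```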
💡 Codex Review
Reviewed commit: 5273d00368
```python
text = ""
normalized_filename = normalized_filename or "crm-profile-sources"
```
Feed external source text into fallback skill extraction
When no attachment is present this path sets text to an empty string, but later _coerce_profile_skill_result uses that same value for fallback skill extraction (skills_extractor.extract_skills(resume_text)) when the profile extractor returns no skills. In /reprocess-profile source-only runs, that means fallback skill parsing operates on "" even after website/GitHub content was successfully fetched, so skill updates can be silently missed. Use the fetched external-source corpus (or a combined source blob) instead of empty resume text for this fallback path.
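Building the fallback-skills corpus from resume text plus fetched sources, as the comment recommends, can be sketched with a small joiner — the function name is illustrative:

```python
def fallback_skill_text(resume_text: str, source_texts: list[str]) -> str:
    """Combine resume text with fetched external-source text so fallback
    skill extraction never operates on an empty string in source-only
    /reprocess-profile runs."""
    parts = [resume_text, *source_texts]
    return "\n\n".join(p.strip() for p in parts if p and p.strip())
```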
Description
This updates resume profile parsing to enrich from public GitHub and website content, adds curl_cffi mobile impersonation for fetches, and records enrichment attempts in extraction results. Inferred personal websites now require confirmation, while newly added or confirmed website links in the resume preview offer a Reparse With Websites path before applying CRM updates.
Related Issue
None.
How Has This Been Tested?
- `uv run pytest tests/unit/test_resume_profile_processor.py -q`
- `uv run pytest tests/unit/test_crm.py -q -k "resume_update_view_adds_confirm_websites_button_when_needed or resume_update_view_adds_reparse_button_for_new_websites or edit_websites_modal_submit_updates_proposed or edit_websites_modal_submit_removes_field_when_blank or confirm_inferred_websites_button_reruns_preview or resume_preview_embed_marks_confirmation_needed_sources or resume_preview_embed_prompts_for_new_website_reparse or resume_preview_embed_includes_external_sources or build_resume_extract_debug_file_serializes_raw_payload or resume_preview_embed_includes_debug_field or reprocess_confirmation_view_calls_reprocess_preview or run_resume_extract_and_preview_calls_direct_extract_for_reprocess"`
- `uv run ruff check packages/shared/src/five08/resume_profile_processor.py packages/shared/src/five08/resume_processing_models.py apps/discord_bot/src/five08/discord_bot/cogs/crm.py tests/unit/test_resume_profile_processor.py tests/unit/test_crm.py`
New Features
Tests
Chores