Improve resume parsing with website reparsing#219
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough
Added a website reparse/confirmation path to the resume update flow: UI controls to confirm inferred sites, pass confirmed websites into extraction, surface external-source enrichments and existing websites in previews, persist enrichment metadata, and add extensive tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as Discord User
    participant UI as ResumeUpdateConfirmationView
    participant Bot as CRMCog
    participant Processor as ResumeProfileProcessor
    participant Fetcher as External Fetcher
    User->>UI: edit websites / click "Reparse With Websites"
    UI->>Bot: send confirmed_personal_websites + link payload
    Bot->>Processor: extract_profile_proposal(confirmed_personal_websites, link_discord_payload)
    Processor->>Fetcher: resolve & fetch candidate URLs (SSRF-safe)
    Fetcher-->>Processor: text + enrichment status
    Processor-->>Bot: ResumeExtractionResult (source_enrichments, existing_websites)
    Bot->>UI: update preview message/embed (External Sources + Website Reparse)
    UI->>User: show updated confirmation view
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Pull request overview
This PR enhances the resume parsing pipeline by enriching extractions with public GitHub/website content and wiring a Discord “Reparse With Websites” flow so operators can confirm/trigger website-based reparsing before applying CRM updates.
Changes:
- Add external-source enrichment (CRM websites + GitHub, plus inferred sources with confirmation gating) to `ResumeProfileProcessor`.
- Extend extraction results with `source_enrichments` and `existing_websites` and surface them in Discord preview embeds.
- Add Discord UI support to confirm/reparse with websites, including candidate detection after website edits.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `packages/shared/src/five08/resume_profile_processor.py` | Fetches/validates external website/GitHub text (curl-cffi) and conditionally re-runs extraction; records enrichment attempts. |
| `packages/shared/src/five08/resume_processing_models.py` | Adds `ResumeSourceEnrichment` and new fields on `ResumeExtractionResult` to carry enrichment + baseline website context. |
| `apps/discord_bot/src/five08/discord_bot/cogs/crm.py` | Adds "Reparse With Websites" button logic, website reparse candidate tracking, and embed fields for external sources/reparse prompts. |
| `tests/unit/test_resume_profile_processor.py` | Adds unit tests covering enrichment usage + rerun behavior for inferred GitHub and confirmed websites. |
| `tests/unit/test_crm.py` | Adds unit tests covering new Discord UI behaviors and embed fields for enrichments/reparse prompts. |
| `packages/shared/pyproject.toml` | Adds curl-cffi dependency for external fetching. |
| `uv.lock` | Locks curl-cffi and updates shared package dependency set. |
```python
if isinstance(preview_message, discord.Message):
    view.preview_message = preview_message
```
`interaction.followup.send(..., wait=True)` can return a webhook message type (still editable) rather than `discord.Message`. The `isinstance(preview_message, discord.Message)` guard can prevent `view.preview_message` from being set, so later `_sync_message_view()` calls won't update the UI after edits (e.g., updating website reparse candidates). Consider storing the returned message unconditionally (or widening the accepted message type to include webhook messages) since other views in this file already do so.
```diff
-if isinstance(preview_message, discord.Message):
-    view.preview_message = preview_message
+view.preview_message = preview_message
```
```python
if isinstance(preview_message, discord.Message):
    view.preview_message = preview_message
```
Same issue as above: the `isinstance(preview_message, discord.Message)` check can drop the returned `followup.send(wait=True)` message from being stored, preventing later view refreshes via `_sync_message_view()`. Store the message regardless (or broaden the type check) so the confirmation view can be edited after modal updates.
```diff
-if isinstance(preview_message, discord.Message):
-    view.preview_message = preview_message
+view.preview_message = preview_message
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7497b739a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
link_member=None,
action="crm.upload_resume",
status_message=("🔄 Re-running profile extraction with website content..."),
```
Keep reparse flow in the original command context
ResumeConfirmInferredWebsitesButton always reruns extraction with action="crm.upload_resume" and link_member=None, which drops the original context of the preview. When the button is used from /reprocess-resume (or from an upload that had a linked Discord member), the rerun no longer satisfies the role-suggestion conditions in _run_resume_extract_and_preview, so role suggestion/apply behavior is lost and audits are attributed to the wrong action.
```python
with curl_requests.get(
    current_url,
    headers=headers,
    timeout=PROFILE_SOURCE_FETCH_TIMEOUT_SECONDS,
    allow_redirects=False,
```
Prevent DNS rebinding between URL validation and fetch
The URL is validated as "public" before this request, but the actual fetch performs a fresh hostname lookup via curl_requests.get. That split allows DNS rebinding: an attacker-controlled domain can resolve to a public IP during validation and a private/internal IP at connect time, bypassing the SSRF guard for resume-supplied websites. The fetch path should pin/verify the connected IP, not only pre-validate the hostname.
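One way to close this gap is a resolve-then-pin helper, sketched below with only the standard library. The function name is hypothetical, and the real fix would still need to feed the pinned IP into the curl-cffi session (e.g., via its `CurlOpt.RESOLVE` option) so the fetch connects to the exact address that was validated:

```python
import ipaddress
import socket

def resolve_pinned_public_ip(host: str, port: int = 443) -> str:
    """Resolve `host` once and return a validated, globally routable IP.

    The caller should connect to this exact IP (keeping the original
    hostname only in the Host header / TLS SNI), so no second DNS lookup
    can swap in a private address between validation and fetch.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_global:
            return str(ip)
    raise ValueError(f"no public address for {host!r}")
```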
Actionable comments posted: 4
🧹 Nitpick comments (4)
tests/unit/test_crm.py (1)
6030-6082: Add a callback test for the edited/new-website branch too. This only exercises the button when its URLs come from `source_enrichments`. The PR also shows the same button when `website_reparse_candidates` is derived from `existing_websites` plus edited/new `cWebsiteLink` values, and that path is not verified end-to-end here. A second callback test for that mode would close the biggest remaining gap in the new reparse flow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_crm.py` around lines 6030 - 6082, The test only exercises the ResumeUpdateConfirmationView button when source_enrichments provide the URLs; add a second async test that constructs a ResumeUpdateConfirmationView with existing_websites and edited/new cWebsiteLink values so website_reparse_candidates is populated from those fields, locate the same Reparse button (as in test_confirm_inferred_websites_button_reruns_preview) and invoke its callback with a mocked confirm_interaction, then assert crm_cog._run_resume_extract_and_preview was awaited and that its kwargs include confirmed_personal_websites matching the cWebsiteLink/existing_websites-derived URLs, link_discord_payload matching input, and the same status_message ("🔄 Re-running profile extraction with website content...").

tests/unit/test_resume_profile_processor.py (1)
319-321: Assert the full enrichment payloads here. These checks only verify `status`/`origin`. The downstream CRM flow also depends on `label`, `url`, and, for confirmation/failure cases, `detail`, so a bad producer payload can still pass this suite while the button/rendering logic breaks. Comparing the full `ResumeSourceEnrichment` entries would make this contract much tighter.
Also applies to: 397-404, 464-467
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_resume_profile_processor.py` around lines 319 - 321, The test only asserts status and origin for result.source_enrichments; update it to assert the full ResumeSourceEnrichment payloads (including label, url and detail where applicable) so the contract is tighter. Replace the two partial checks with a single equality check against the expected ResumeSourceEnrichment objects (or their dict representations) for result.source_enrichments; e.g., build expected entries with the same fields and compare whole objects/dicts to result.source_enrichments to catch malformed producer payloads. Apply the same change to the other occurrences mentioned (the assertions around lines 397-404 and 464-467) so all enrichment assertions validate full payloads.

packages/shared/src/five08/resume_processing_models.py (1)
67-74: Constrain `origin` and `status` with `Literal` types to catch invalid values at the model boundary. These fields implement a cross-module protocol between the processor and CRM UI, but plain `str` permits typos that only fail downstream when the UI attempts exact-match branching on status (e.g., lines 3709–3713 in crm.py). Constraining them with `Literal` enforces valid values at validation time.
♻️ Suggested change
```diff
-from typing import Any
+from typing import Any, Literal
@@
 class ResumeSourceEnrichment(BaseModel):
@@
-    origin: str
-    status: str
+    origin: Literal["crm", "resume_inference", "resume_confirmation"]
+    status: Literal["used", "failed", "confirmation_needed"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/resume_processing_models.py` around lines 67 - 74, The ResumeSourceEnrichment model currently lets origin and status be arbitrary strings which allows typos to slip past validation; change the type annotations in class ResumeSourceEnrichment to use typing.Literal with the exact allowed values for origin and status (e.g., origin: Literal["..."] and status: Literal["..."]) so invalid values are rejected at model parse time, add the necessary import for Literal (or typing_extensions.Literal for older Python), and if there are central enums/constants for these protocol values prefer referencing them (or mirror their values) so ResumeSourceEnrichment.origin and .status enforce the same vocabulary used by the CRM UI/processor.

packages/shared/pyproject.toml (1)
7-7: Constrain `curl-cffi` to the API series this PR actually uses.
`>=0.10.0` leaves installs open to any later `curl-cffi` release, but the new fetcher imports concrete `curl_cffi.requests` symbols and depends on specific streaming/context-manager behavior. While the specific API surface (`BrowserTypeLiteral`, `RequestsError`, and `response.iter_content()` streaming) has remained stable since 0.10.x, curl_cffi provides no explicit backward compatibility guarantee. Pinning the compatible series avoids a surprise break when that surface changes.
♻️ Suggested change
```diff
-    "curl-cffi>=0.10.0",
+    "curl-cffi~=0.10.0",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/pyproject.toml` at line 7, Replace the open-ended dependency "curl-cffi>=0.10.0" with a constrained series specifier to lock to the 0.10 API series used by the new fetcher; change the requirement to a range such as ">=0.10.0,<0.11.0" (or "~=0.10.0") so imports like curl_cffi.requests and symbols BrowserTypeLiteral, RequestsError and response.iter_content() remain on the tested API surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 1705-1714: The callback currently disables the preview buttons
unconditionally after calling _run_resume_extract_and_preview(), which leaves
the user without any usable controls when that reparse fails; change the flow so
you only disable and update the existing view when the reparse succeeded—have
_run_resume_extract_and_preview() return a success boolean (or raise on fatal
failure) and move the loop that sets item.disabled = True and the
interaction.message.edit(view=view) calls into the success branch (or after a
truthy return), keeping the existing NotFound/HTTPException handling around
interaction.message.edit; reference the function name
_run_resume_extract_and_preview and the interaction.message.edit(view=view)/for
item in view.children loop to locate where to gate the disabling.
- Around line 1693-1704: The call to _run_resume_extract_and_preview hardcodes
action="crm.upload_resume" and link_member=None, which loses original flow
context and drops Discord-role suggestion/apply behavior on reparses; modify the
invocation to preserve the original action and member context from the view
(e.g., use the view's stored action/flow identifier and the view's linked-member
payload instead of the hardcoded values or None) so reparses keep the same path
(e.g., reprocess vs upload) and retain role-suggestion logic when a linked
member exists.
- Around line 1791-1794: The ResumeEditWebsitesButton is only being added when
proposed_updates.get("cWebsiteLink") is present, which hides the website editor
for previews that need a new website; instead, after calling
self._refresh_website_reparse_candidates() always add the
ResumeEditWebsitesButton (via self.add_item(ResumeEditWebsitesButton())) so the
editor is available even when cWebsiteLink is not in proposed_updates; locate
the block around _refresh_website_reparse_candidates, remove or change the
conditional that checks proposed_updates.get("cWebsiteLink") and ensure add_item
is invoked unconditionally (or guarded only to avoid duplicate adds if needed).
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 81-83: The current _HTML_META_DESC_RE regex only matches meta tags
where name/property precedes content and will miss tags with reversed attribute
order; replace this brittle pattern by adding a helper like
_extract_meta_description(html: str) -> str | None that iterates meta tags
(using re.finditer(r'(?is)<meta\s+([^>]+)>')), parses attributes per-tag
(searching for (?:name|property)\s*=\s*["\'](?:description|og:description)["\']
and content\s*=\s*["\']([^"\']*)["\'] with re.I) and returns the content value
when both are present, then update any code that used _HTML_META_DESC_RE to call
_extract_meta_description instead.
---
Nitpick comments:
In `@packages/shared/pyproject.toml`:
- Line 7: Replace the open-ended dependency "curl-cffi>=0.10.0" with a
constrained series specifier to lock to the 0.10 API series used by the new
fetcher; change the requirement to a range such as ">=0.10.0,<0.11.0" (or
"~=0.10.0") so imports like curl_cffi.requests and symbols BrowserTypeLiteral,
RequestsError and response.iter_content() remain on the tested API surface.
In `@packages/shared/src/five08/resume_processing_models.py`:
- Around line 67-74: The ResumeSourceEnrichment model currently lets origin and
status be arbitrary strings which allows typos to slip past validation; change
the type annotations in class ResumeSourceEnrichment to use typing.Literal with
the exact allowed values for origin and status (e.g., origin: Literal["..."] and
status: Literal["..."]) so invalid values are rejected at model parse time, add
the necessary import for Literal (or typing_extensions.Literal for older
Python), and if there are central enums/constants for these protocol values
prefer referencing them (or mirror their values) so
ResumeSourceEnrichment.origin and .status enforce the same vocabulary used by
the CRM UI/processor.
In `@tests/unit/test_crm.py`:
- Around line 6030-6082: The test only exercises the
ResumeUpdateConfirmationView button when source_enrichments provide the URLs;
add a second async test that constructs a ResumeUpdateConfirmationView with
existing_websites and edited/new cWebsiteLink values so
website_reparse_candidates is populated from those fields, locate the same
Reparse button (as in test_confirm_inferred_websites_button_reruns_preview) and
invoke its callback with a mocked confirm_interaction, then assert
crm_cog._run_resume_extract_and_preview was awaited and that its kwargs include
confirmed_personal_websites matching the cWebsiteLink/existing_websites-derived
URLs, link_discord_payload matching input, and the same status_message ("🔄
Re-running profile extraction with website content...").
In `@tests/unit/test_resume_profile_processor.py`:
- Around line 319-321: The test only asserts status and origin for
result.source_enrichments; update it to assert the full ResumeSourceEnrichment
payloads (including label, url and detail where applicable) so the contract is
tighter. Replace the two partial checks with a single equality check against the
expected ResumeSourceEnrichment objects (or their dict representations) for
result.source_enrichments; e.g., build expected entries with the same fields and
compare whole objects/ dicts to result.source_enrichments to catch malformed
producer payloads. Apply the same change to the other occurrences mentioned (the
assertions around lines 397-404 and 464-467) so all enrichment assertions
validate full payloads.
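The `Literal` constraint recommended in the comments above can be sketched standalone. The model below is a hypothetical mirror of `ResumeSourceEnrichment` (the value vocabularies come from the review's suggested change, not confirmed repository constants), and pydantic v2 is assumed:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Hypothetical mirror of ResumeSourceEnrichment; field vocabularies follow
# the review's suggested change, not verified repo constants.
class SourceEnrichment(BaseModel):
    label: str
    url: str
    origin: Literal["crm", "resume_inference", "resume_confirmation"]
    status: Literal["used", "failed", "confirmation_needed"]

ok = SourceEnrichment(
    label="GitHub",
    url="https://github.com/example",
    origin="resume_inference",
    status="used",
)

try:
    # A typo that plain `str` fields would silently accept:
    SourceEnrichment(
        label="GitHub",
        url="https://github.com/example",
        origin="resume_inferense",
        status="used",
    )
except ValidationError:
    print("typo rejected at parse time")
```

With plain `str` fields the typo would only surface later, when the UI's exact-match branching on `status` silently falls through.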
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3fd27fc3-11d6-40bc-80af-46c55d0f6361
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/pyproject.toml
- packages/shared/src/five08/resume_processing_models.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91804fe77b
```python
def _is_public_ip(value: IPAddress) -> bool:
    return not (
        value.is_private
        or value.is_loopback
        or value.is_link_local
```
Reject non-global addresses in public IP validation
The SSRF guard currently accepts any address that is not is_private/loopback/link-local/etc., which still permits non-global special-use ranges (for example 100.64.0.0/10, where ipaddress returns is_private=False but is_global=False). In that case, an attacker-controlled hostname can pass _resolve_public_profile_request_target and still be fetched by _fetch_external_profile_source_text, allowing requests to non-public network space. Please validate resolved IPs using is_global (or an equivalent explicit denylist) instead of only negating selected flags.
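The carrier-grade NAT range illustrates the gap concretely: in CPython's `ipaddress` module, 100.64.0.0/10 is documented as having both `is_private` and `is_global` be `False`, so only an `is_global` allow-list catches it:

```python
import ipaddress

# 100.64.0.0/10 (CGNAT, RFC 6598): neither "private" nor "global" per the
# ipaddress docs, so a deny-list of selected flags lets it through.
cgnat = ipaddress.ip_address("100.64.0.1")
print(cgnat.is_private, cgnat.is_global)  # False False

def is_public_ip(value: str) -> bool:
    # Allow-list: accept only globally routable addresses.
    return ipaddress.ip_address(value).is_global

for addr, expected in [
    ("100.64.0.1", False),  # CGNAT, slips past the deny-list version
    ("10.0.0.1", False),    # private
    ("127.0.0.1", False),   # loopback
    ("8.8.8.8", True),      # public
]:
    assert is_public_ip(addr) is expected
```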
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 1898-1902: _sync_message_view currently only updates the
interactive view and leaves the preview embed/stale content unchanged; update it
to re-render the preview embed from the current website state and include it in
the edit call. Specifically, obtain a fresh embed by calling the existing
preview-builder function (for example self._build_preview_embed or
self.build_preview_embed — use whichever function in this class constructs the
preview embed) and then call await self.preview_message.edit(view=self,
embed=new_embed); if no builder exists, reconstruct the embed from the current
state before editing, while keeping the existing None check and exception
handling.
- Around line 1716-1721: The current loop in the block that iterates
view.children only disables discord.ui.Button instances, leaving any select
menus (notably ResumeSeniorityOverrideSelect) still interactive; update the loop
where you check items in view.children (the code surrounding
interaction.message.edit and the for item in view.children:) to also disable
select menus by checking for discord.ui.Select and/or the custom
ResumeSeniorityOverrideSelect class and setting item.disabled = True so all
controls (buttons and selects) are rendered inert before calling
interaction.message.edit(view=view).
- Around line 1850-1879: _current_inferred_reparse_websites() inferred URLs are
always re-added in _refresh_website_reparse_candidates(), which reintroduces
URLs a user removed in the website editor; change the merge logic in
_refresh_website_reparse_candidates() so proposed_links (the user's edits) take
precedence over inferred candidates: compute the normalized key set for
proposed_links and filter out any inferred_candidates whose
normalized_website_identity_key appears in that set before combining with
proposed_additions, then normalize and assign to
self.website_reparse_candidates; reference functions/vars:
_refresh_website_reparse_candidates, _current_inferred_reparse_websites,
_normalize_website_links_for_reparse, normalized_website_identity_key,
proposed_links, proposed_additions, inferred_candidates, and
website_reparse_candidates.
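The precedence rule in that last prompt might look like the sketch below, where `identity_key` is a deliberately simplified stand-in for the repo's `normalized_website_identity_key` (the normalization details are assumptions):

```python
from urllib.parse import urlparse

def identity_key(url: str) -> str:
    # Simplified stand-in for normalized_website_identity_key: lowercase
    # host without "www.", plus the path with any trailing slash removed.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    return parsed.netloc.lower().removeprefix("www.") + parsed.path.rstrip("/")

def merge_reparse_candidates(
    proposed_links: list[str], inferred_candidates: list[str]
) -> list[str]:
    # User-edited links take precedence: drop inferred candidates whose
    # normalized identity already appears among the proposed links.
    seen = {identity_key(url) for url in proposed_links}
    kept = [url for url in inferred_candidates if identity_key(url) not in seen]
    return list(proposed_links) + kept
```

A removed URL never reappears, because inferred candidates are filtered against the user's current edits rather than unconditionally re-added.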
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eaad4cff-5a76-4832-8191-4f11ac879c12
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
✅ Files skipped from review due to trivial changes (1)
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
🧹 Nitpick comments (1)
tests/unit/test_crm.py (1)
6101-6106: Prefer selecting the reparse control by button type, not label text. Using label-based lookup here is brittle to copy changes and localization; type-based lookup is more stable.
♻️ Proposed refactor
```diff
 confirm_button = next(
     child
     for child in view.children
-    if isinstance(child, discord.ui.Button)
-    and child.label == "Reparse With Websites"
+    if isinstance(child, ResumeConfirmInferredWebsitesButton)
 )
```
Also applies to: 6166-6171
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_crm.py` around lines 6101 - 6106, The test currently finds the "Reparse With Websites" button by matching label text which is brittle; update the selection to find the discord.ui.Button by a stable identifier (e.g., custom_id) instead of label. Change the lookup that sets confirm_button to iterate view.children and pick the child where isinstance(child, discord.ui.Button) and child.custom_id == "reparse_with_websites" (or another stable custom_id you add where the button is constructed), and make the identical change at the other occurrence that currently uses the label match. Ensure the actual button creation code sets that custom_id so the tests can reliably select by it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/unit/test_crm.py`:
- Around line 6101-6106: The test currently finds the "Reparse With Websites"
button by matching label text which is brittle; update the selection to find the
discord.ui.Button by a stable identifier (e.g., custom_id) instead of label.
Change the lookup that sets confirm_button to iterate view.children and pick the
child where isinstance(child, discord.ui.Button) and child.custom_id ==
"reparse_with_websites" (or another stable custom_id you add where the button is
constructed), and make the identical change at the other occurrence that
currently uses the label match. Ensure the actual button creation code sets that
custom_id so the tests can reliably select by it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d994268-0a37-41a5-b8f2-dae3ba8f8a62
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/test_resume_profile_processor.py
- packages/shared/src/five08/resume_profile_processor.py
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
```diff
 async def on_submit(self, interaction: discord.Interaction) -> None:
     raw = self.websites_input.value or ""
     links = [line.strip() for line in raw.splitlines() if line.strip()]
     self.confirmation_view.website_edits_override_inferred_candidates = True
     if links:
         self.confirmation_view.proposed_updates["cWebsiteLink"] = links
     else:
         self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
     self.confirmation_view._refresh_website_reparse_candidates()
     await self.confirmation_view._sync_message_view()
     count = len(links)
     if self.confirmation_view.website_reparse_candidates:
         action_hint = " Click **Reparse With Websites** to refresh the extraction."
     else:
         action_hint = " Click **Confirm Updates** to apply."
     await interaction.response.send_message(
-        f"✅ Websites updated to {count} link{'s' if count != 1 else ''}. "
-        "Click **Confirm Updates** to apply.",
+        f"✅ Websites updated to {count} link{'s' if count != 1 else ''}."
+        + action_hint,
         ephemeral=True,
     )
```
ResumeEditWebsitesModal.on_submit awaits _sync_message_view() (which performs a Discord message edit API call) before sending the interaction response. Modal submits must be acknowledged quickly; if the message edit is slow or rate-limited this can cause an "interaction failed" client error. Consider responding/defering first (e.g., interaction.response.defer(...) + followup) or scheduling _sync_message_view() asynchronously so the initial response is sent within the interaction deadline.
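A toy model of the recommended defer-first ordering (the classes below are stand-ins, not discord.py objects) shows the intended sequence: acknowledge the interaction immediately, then do the slow work, then follow up:

```python
import asyncio

# Toy model of the defer-first pattern: FakeInteraction stands in for
# discord.Interaction; slow_sync stands in for _sync_message_view().
class FakeInteraction:
    def __init__(self) -> None:
        self.events: list[str] = []

    async def defer(self) -> None:
        self.events.append("defer")

    async def followup_send(self, msg: str) -> None:
        self.events.append(f"followup:{msg}")

async def on_submit(interaction: FakeInteraction, slow_sync) -> None:
    await interaction.defer()      # acknowledge within the interaction deadline
    await slow_sync()              # slow, rate-limit-prone work happens afterwards
    await interaction.followup_send("websites updated")

async def main() -> list[str]:
    ix = FakeInteraction()

    async def slow_sync() -> None:
        await asyncio.sleep(0)     # placeholder for a slow message edit
        ix.events.append("sync")

    await on_submit(ix, slow_sync)
    return ix.events

events = asyncio.run(main())
print(events)  # ['defer', 'sync', 'followup:websites updated']
```

Because the acknowledgement comes first, a slow or rate-limited message edit can no longer push the initial response past the client's deadline.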
```python
resolve_entry = f"{host}:{port}:" + ",".join(
    _format_curl_resolve_address(ip) for ip in resolved_ips
)
session: curl_requests.Session = curl_requests.Session(
    curl_options={CurlOpt.RESOLVE: [resolve_entry]}
```
CurlOpt.RESOLVE entries appear to be built as a single string with multiple resolved IPs joined by commas (f"{host}:{port}:" + ",".join(...)). libcurl’s RESOLVE/--resolve format is host:port:address per entry; multiple addresses typically require separate entries (or choosing one). As written, any hostname that resolves to multiple A/AAAA records may produce an invalid RESOLVE value and cause fetches to fail. Consider selecting a single validated IP (e.g., the first preferred IPv4/then IPv6) or constructing a list of one entry per IP in curl_options.
```diff
-resolve_entry = f"{host}:{port}:" + ",".join(
-    _format_curl_resolve_address(ip) for ip in resolved_ips
-)
-session: curl_requests.Session = curl_requests.Session(
-    curl_options={CurlOpt.RESOLVE: [resolve_entry]}
+resolve_entries = [
+    f"{host}:{port}:{_format_curl_resolve_address(ip)}"
+    for ip in resolved_ips
+]
+session: curl_requests.Session = curl_requests.Session(
+    curl_options={CurlOpt.RESOLVE: resolve_entries}
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7aee2a7367
```python
enrichments.append(
    ResumeSourceEnrichment(
        label="Personal Website",
        url=website_url,
        origin="resume_inference",
```
Recompute website confirmations after enriched re-extraction
confirmation_needed website enrichments are recorded from the first extraction pass, but this flow may immediately run a second extraction with inferred GitHub sources (_extract_profile_with_external_sources). If that second pass changes or removes website_links, the preview can still show stale inferred websites and offer “Reparse With Websites” for URLs that are no longer in the final extracted profile, which misleads operators and can trigger unnecessary external fetches.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4d4cf0705
```python
self._refresh_website_reparse_candidates()
self.add_item(ResumeEditWebsitesButton())
if proposed_updates.get("cSocialLinks"):
```
Preserve existing websites when enabling manual website edits
The view now always adds ResumeEditWebsitesButton, including cases where no website update was proposed. In that path, ResumeEditWebsitesModal still initializes from proposed_updates["cWebsiteLink"] only, so contacts that already have CRM website links open with an empty editor; if an operator adds a single URL and confirms, the apply payload overwrites cWebsiteLink with only the new value and drops previously stored links. This creates an easy data-loss path whenever existing websites are present but extraction proposed none.
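The guard the reviewer asks for can be sketched as a small seeding helper — the function name and field keys here are illustrative, not the cog's actual API:

```python
def seed_websites_editor(
    proposed_updates: dict,
    existing_websites: list[str],
) -> str:
    """Seed the edit-websites modal from the proposed update when present,
    falling back to the contact's stored links so confirming without edits
    cannot silently overwrite them with an empty list."""
    links = proposed_updates.get("cWebsiteLink") or existing_websites
    return "\n".join(links)
```

With this fallback, opening the editor for a contact whose extraction proposed no websites still shows the stored CRM links instead of a blank field.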
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/shared/src/five08/resume_profile_processor.py (1)
1149-1198: Apply the website cap to the combined candidate set. Confirmed websites and CRM websites are sliced independently here, so one run can still enqueue up to four website fetches before GitHub is added. If `PROFILE_SOURCE_MAX_WEBSITES` is meant to bound prompt size and latency, merge/dedupe both website sources first and then apply the limit once.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/resume_profile_processor.py` around lines 1149 - 1198, In _build_initial_external_source_candidates, both explicit_personal_websites and contact cWebsiteLink are sliced separately which can exceed PROFILE_SOURCE_MAX_WEBSITES; instead, call _coerce_website_links on both sources, map each URL to its normalized key using normalized_website_identity_key, dedupe/merge them (prefer explicit/resume_confirmation origin when a key appears in both), then take at most PROFILE_SOURCE_MAX_WEBSITES from that merged list and construct _ExternalProfileSourceCandidate entries (using origin "resume_confirmation" or "crm" as chosen), and keep the existing GitHub handling unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 1060-1063: The enrichment-aware calls to extractor.extract (the
one passing extra_sources and the later similar call) must fail open: wrap each
call in a try/except Exception as e so that any exception does not overwrite the
previously successful `extracted` result or force success=False; on exception
log/record the enrichment failure (e.g., set an `enrichment_error` or
`enrichment_failed` flag and capture the exception message) and continue using
the last good `extracted` value instead of blanking the preview or propagating
to the outer except; apply the same pattern to the second extractor.extract call
referenced around the later block.
- Around line 1098-1148: The current early return on existing_website_links in
_refresh_inferred_website_confirmation_enrichments prevents creating
confirmation_needed enrichments for new distinct sites; remove the return and
instead compute a set of normalized CRM website identity keys from
contact.get("cWebsiteLink") (using normalized_website_identity_key) and, when
iterating extracted.website_links[:PROFILE_SOURCE_MAX_WEBSITES], skip adding a
ResumeSourceEnrichment only if the normalized identity matches an existing CRM
identity or is in seen_source_keys or confirmed_personal_website_keys; otherwise
append the confirmation_needed enrichment and add the dedupe_source_key to
seen_source_keys so new distinct sites still get confirmation/reparse options.
- Around line 90-93: The regex _GITHUB_USERNAME_RE currently and use of search()
allow repo URLs like /org/repo to match and return the org as a username; change
the pattern to require the URL end (allow optional trailing slash) — e.g.
^(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]{1,39})/?$ — and replace uses
of search() with fullmatch()/match() (or use re.fullmatch with the new pattern)
so only profile URLs match; apply the same pattern+call-change to the other
occurrence referenced (the block around the second _GITHUB_USERNAME_RE usage).
---
Nitpick comments:
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 1149-1198: In _build_initial_external_source_candidates, both
explicit_personal_websites and contact cWebsiteLink are sliced separately which
can exceed PROFILE_SOURCE_MAX_WEBSITES; instead, call _coerce_website_links on
both sources, map each URL to its normalized key using
normalized_website_identity_key, dedupe/merge them (prefer
explicit/resume_confirmation origin when a key appears in both), then take at
most PROFILE_SOURCE_MAX_WEBSITES from that merged list and construct
_ExternalProfileSourceCandidate entries (using origin "resume_confirmation" or
"crm" as chosen), and keep the existing GitHub handling unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4040148a-7711-48f7-b71b-1390558169b5
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/test_resume_profile_processor.py
- tests/unit/test_crm.py
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
```python
scheme = parsed.scheme.lower()
if scheme not in {"http", "https"}:
    return "Profile URL must use http or https"
if parsed.username or parsed.password:
    return "Profile URL must not include credentials"
```
_resolve_public_profile_request_target allows arbitrary explicit ports from profile URLs. Even with IP validation, this enables outbound port-scanning of public hosts (e.g. https://example.com:22) from the worker. Consider restricting allowed ports to 80/443 (or a small allowlist) and returning a clear validation error when an unsupported port is provided.
```python
if links:
    self.confirmation_view.proposed_updates["cWebsiteLink"] = links
else:
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
```
The websites modal now seeds its default value from existing_websites when no cWebsiteLink update is proposed. If the user submits without making changes, on_submit will still set proposed_updates['cWebsiteLink'] to the existing list, creating a no-op CRM update (and can also affect downstream logic that treats presence of the key as an intentional change). Consider comparing the submitted normalized links against confirmation_view.existing_websites (by identity key) and, when they match, popping cWebsiteLink from proposed_updates instead of setting it.
Suggested change — replace:

```python
if links:
    self.confirmation_view.proposed_updates["cWebsiteLink"] = links
else:
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
```

with:

```python
# Determine whether the submitted links are logically identical to the
# existing websites (by normalized identity key). If they are, avoid
# setting cWebsiteLink in proposed_updates so we don't create a no-op
# CRM update or signal an intentional change where there is none.
existing_websites = self.confirmation_view.existing_websites or []

def _identity_keys(urls: list[str]) -> list[str]:
    return [
        normalized_website_identity_key(normalize_website_url(u))
        for u in urls
        if str(u).strip()
    ]

submitted_keys = _identity_keys(links)
existing_keys = _identity_keys(existing_websites)
if submitted_keys == existing_keys:
    # No logical change; ensure we don't propose an unnecessary update.
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
else:
    if links:
        self.confirmation_view.proposed_updates["cWebsiteLink"] = links
    else:
        self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
```
💡 Codex Review
Reviewed commit: 9a1e8636b7
```python
for website_url in extracted.website_links[:PROFILE_SOURCE_MAX_WEBSITES]:
    source_key = normalized_website_identity_key(website_url)
    if not source_key or source_key in existing_website_keys:
```
Filter existing websites before applying reparse cap
The reparse candidate loop limits extracted.website_links to PROFILE_SOURCE_MAX_WEBSITES before it excludes links already on the CRM contact. If the first two extracted links are already in cWebsiteLink and a new website appears later, that new site is never surfaced as confirmation_needed (and can’t be fetched/reparsed), so enrichment is silently skipped for valid resumes with more than two links. Apply the cap after filtering to new unique links.
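The fix — filter out already-known links first, then apply the cap — can be sketched like this; the helper name is illustrative and identity normalization is stubbed with `lower()`:

```python
def reparse_candidates(
    extracted_links: list[str],
    existing_keys: set[str],
    max_websites: int = 2,
) -> list[str]:
    """Exclude links already on the CRM contact before capping, so a new
    site beyond the first N extracted links is still surfaced for
    confirmation instead of being silently skipped."""
    fresh = [url for url in extracted_links if url.lower() not in existing_keys]
    return fresh[:max_websites]
```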
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
```python
for candidate in candidates:
    if candidate.source_key in seen_source_keys:
        continue
    seen_source_keys.add(candidate.source_key)

    try:
        fetched = self._fetch_external_profile_source_text(candidate.url)
    except Exception as exc:
        enrichments.append(
            ResumeSourceEnrichment(
                label=candidate.label,
                url=candidate.url,
                origin=candidate.origin,
                status="failed",
                detail=str(exc),
            )
        )
        continue
```
_fetch_external_profile_sources() marks candidate.source_key as seen before attempting the fetch. If two candidates share the same source_key (e.g., http vs https variants collapsing to the same normalized_website_identity_key) and the first fetch fails, the second candidate will be skipped, preventing a potentially successful fallback. Consider only adding to seen_source_keys after a successful fetch, or removing it again on exception so alternate candidates can be tried.
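The retry-friendly variant the comment suggests can be sketched as follows — `fetch_with_fallback` and the `(source_key, url)` pair shape are illustrative, not the processor's real signature:

```python
def fetch_with_fallback(candidates, fetch):
    """Mark a source_key as seen only after a successful fetch, so an
    http/https variant collapsing to the same normalized key can still be
    tried when the first attempt fails."""
    seen, results, failures = set(), {}, []
    for source_key, url in candidates:
        if source_key in seen:
            continue
        try:
            results[source_key] = fetch(url)
        except Exception as exc:
            failures.append((url, str(exc)))
            continue  # key stays unseen so a sibling variant can retry
        seen.add(source_key)
    return results, failures
```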
```python
explicit_website_links = self._coerce_website_links(explicit_personal_websites)
for website_url in explicit_website_links[:PROFILE_SOURCE_MAX_WEBSITES]:
    source_key = normalized_website_identity_key(website_url)
    if not source_key:
        continue
    candidates.append(
        _ExternalProfileSourceCandidate(
            label="Personal Website",
            url=website_url,
            origin="resume_confirmation",
            source_key=f"website:{source_key}",
        )
    )

website_links = self._coerce_website_links(contact.get("cWebsiteLink"))[
    :PROFILE_SOURCE_MAX_WEBSITES
]
```
PROFILE_SOURCE_MAX_WEBSITES is applied separately to explicit_personal_websites and to CRM cWebsiteLink, so the initial enrichment pass can fetch up to 2 + 2 websites (plus GitHub). If the intent is a global cap on website fetches per resume, this should be enforced across both sources (or the constant renamed to reflect per-origin caps) to avoid unexpected extra network calls.
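A single global cap across both origins could look like the sketch below — the helper name is illustrative and website identity is stubbed with a lowercase/strip normalization:

```python
def merged_website_candidates(
    confirmed: list[str],
    crm_links: list[str],
    max_websites: int = 2,
) -> list[str]:
    """Merge confirmed and CRM websites, dedupe by normalized key (first
    occurrence wins), then apply one cap — instead of slicing each source
    separately, which allows up to 2 + 2 fetches per resume."""
    merged, seen = [], set()
    for url in [*confirmed, *crm_links]:
        key = url.lower().rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        merged.append(url)
    return merged[:max_websites]
```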
```python
        for url in existing_links
        if normalized_website_identity_key(url)
    ]
    if submitted_keys == existing_keys:
```
The no-op detection compares submitted_keys == existing_keys as lists, which is order-sensitive. Submitting the same set of existing websites in a different order will be treated as an update and can trigger an unnecessary reparse/CRM update. Consider comparing a canonicalized representation (e.g., sorted keys or a set) instead of the raw list order.
Suggested change:

```python
if set(submitted_keys) == set(existing_keys):
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/discord_bot/src/five08/discord_bot/cogs/crm.py (1)
4177-4189: ⚠️ Potential issue | 🟠 Major — Make preview-path audit writes best-effort in this command flow.
This changed extraction/preview path still uses `_audit_command(...)` directly. If audit writes fail, the user flow can fail instead of returning the intended preview/error response.
🛡️ Suggested fix direction

```diff
-        self._audit_command(
+        self._audit_command_safe(
             interaction=interaction,
             action=action_name,
             result="error",
             metadata={
                 "filename": filename,
                 "attachment_id": attachment_id,
                 "stage": "extract_execute",
                 "error": str(exc),
             },
             resource_type="crm_contact",
             resource_id=str(contact_id),
         )
```

Apply this consistently to audit calls inside `_run_resume_extract_and_preview(...)`. As per coding guidelines: "Audit logging must be best effort: command flows in Discord cogs should never fail if audit writes fail".
Also applies to: 4196-4207, 4223-4235, 4340-4351, 4379-4392, 4427-4440
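A minimal best-effort wrapper along the lines the review suggests might look like this — `audit_command_safe` stands in for the cog's real `_audit_command` plumbing and is not the repository's actual helper:

```python
import logging

logger = logging.getLogger("crm")


def audit_command_safe(audit_fn, /, **kwargs) -> bool:
    """Best-effort audit write: swallow and log any failure so the command
    flow still returns its preview/error response to the user."""
    try:
        audit_fn(**kwargs)
        return True
    except Exception:
        logger.exception("audit write failed; continuing command flow")
        return False
```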
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py` around lines 4177 - 4189, The audit writes in _run_resume_extract_and_preview currently call self._audit_command(...) directly which can raise and break the preview/error response; make these audit calls best-effort by wrapping each self._audit_command(...) invocation in a try/except that catches all exceptions, logs the failure (e.g., via self.logger.exception or similar) and continues without re-raising so command flow still returns the preview/error response; apply this change to every audit call inside _run_resume_extract_and_preview (including the calls around extract_execute, the subsequent audit blocks at the other noted sites) so audit failures never propagate to the user flow.
🧹 Nitpick comments (1)
packages/shared/src/five08/resume_profile_processor.py (1)
1346-1451: Significant code duplication between IP-literal and DNS-resolved fetch branches. The response handling logic (redirect checks, status validation, content-type filtering, chunked body reading) is duplicated almost verbatim between the IP-literal branch (lines 1355-1392) and the DNS-resolved branch (lines 1412-1449). Consider extracting a helper method for the common response processing.
♻️ Suggested extraction of common response handling
```diff
+    def _handle_profile_response(
+        self,
+        response: Any,
+        current_url: str,
+    ) -> tuple[str | None, str | None]:
+        """Handle profile response, returning (extracted_text, redirect_url)."""
+        try:
+            if response.status_code in {301, 302, 303, 307, 308}:
+                redirect_to = response.headers.get("Location")
+                if not redirect_to:
+                    raise ValueError("Profile URL redirect missing Location header")
+                return None, urljoin(current_url, redirect_to)
+
+            response.raise_for_status()
+            content_type = str(response.headers.get("Content-Type", "")).lower()
+            content_length = response.headers.get("Content-Length")
+            if content_length:
+                try:
+                    content_length_value = int(content_length)
+                except (TypeError, ValueError):
+                    content_length_value = None
+                if (
+                    content_length_value is not None
+                    and content_length_value > PROFILE_SOURCE_MAX_BYTES
+                ):
+                    raise ValueError("Profile page exceeds size limit")
+
+            payload = bytearray()
+            for chunk in response.iter_content(chunk_size=8192):
+                if not chunk:
+                    continue
+                payload.extend(chunk)
+                if len(payload) > PROFILE_SOURCE_MAX_BYTES:
+                    raise ValueError("Profile page exceeds size limit")
+
+            return self._extract_profile_source_text(
+                body=bytes(payload),
+                content_type=content_type,
+            ), None
+        finally:
+            response.close()
```

Then use in both branches:

```python
extracted_text, redirect_url = self._handle_profile_response(response, current_url)
if redirect_url:
    current_url = redirect_url
    continue
return extracted_text
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/src/five08/resume_profile_processor.py` around lines 1346 - 1451, The two branches for IP-literal and DNS-resolved fetch duplicate the entire response processing (redirect handling, raise_for_status, content-length checks, iterative chunk reading and calls to _extract_profile_source_text); extract that shared logic into a helper method (e.g. _handle_profile_response(response, current_url)) that returns (extracted_text, redirect_url) or raises on error, reuse it from both branches by calling it after getting response, updating current_url and continuing on a redirect, otherwise returning extracted_text; ensure the helper uses the same symbols/semantics: response.iter_content, response.headers.get("Content-Type"), response.headers.get("Content-Length"), PROFILE_SOURCE_MAX_BYTES and _extract_profile_source_text, and still closes the response (or let callers close) to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 896-907: The code compares submitted_keys and existing_keys as
ordered lists, causing false positives when the same website identities are
present in different orders; change the comparison to be order-insensitive by
comparing sets (or multisets if duplicates matter) of
normalized_website_identity_key results instead of lists. In practice, compute
submitted_keys and existing_keys as you already do, then check equality using
set(submitted_keys) == set(existing_keys) (or collections.Counter for
duplicate-aware comparison) and only call
self.confirmation_view.proposed_updates.pop("cWebsiteLink", None) when those
sets differ. Ensure you still use normalized_website_identity_key for both
submitted_links and existing_links so the identity logic remains consistent.
---
Outside diff comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 4177-4189: The audit writes in _run_resume_extract_and_preview
currently call self._audit_command(...) directly which can raise and break the
preview/error response; make these audit calls best-effort by wrapping each
self._audit_command(...) invocation in a try/except that catches all exceptions,
logs the failure (e.g., via self.logger.exception or similar) and continues
without re-raising so command flow still returns the preview/error response;
apply this change to every audit call inside _run_resume_extract_and_preview
(including the calls around extract_execute, the subsequent audit blocks at the
other noted sites) so audit failures never propagate to the user flow.
---
Nitpick comments:
In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 1346-1451: The two branches for IP-literal and DNS-resolved fetch
duplicate the entire response processing (redirect handling, raise_for_status,
content-length checks, iterative chunk reading and calls to
_extract_profile_source_text); extract that shared logic into a helper method
(e.g. _handle_profile_response(response, current_url)) that returns
(extracted_text, redirect_url) or raises on error, reuse it from both branches
by calling it after getting response, updating current_url and continuing on a
redirect, otherwise returning extracted_text; ensure the helper uses the same
symbols/semantics: response.iter_content, response.headers.get("Content-Type"),
response.headers.get("Content-Length"), PROFILE_SOURCE_MAX_BYTES and
_extract_profile_source_text, and still closes the response (or let callers
close) to preserve behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d6c989ff-3493-4a26-92b8-b5eef1fee007
📒 Files selected for processing (4)
- apps/discord_bot/src/five08/discord_bot/cogs/crm.py
- packages/shared/src/five08/resume_profile_processor.py
- tests/unit/test_crm.py
- tests/unit/test_resume_profile_processor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_crm.py
💡 Codex Review
Reviewed commit: b47fac97a4
```python
raw = self.websites_input.value or ""
links = [line.strip() for line in raw.splitlines() if line.strip()]
if links:
    self.confirmation_view.website_edits_override_inferred_candidates = True
```
Avoid dropping inferred websites on no-op modal submit
website_edits_override_inferred_candidates is set before checking whether the submitted website list actually changed, so a no-op submit (open modal, submit unchanged) still forces _refresh_website_reparse_candidates() to filter inferred confirmation_needed websites against an empty proposed set. In that case the reparse candidates disappear, the Reparse With Websites button is removed, and inferred website sources are hidden from the refreshed preview even though the user made no effective change, which can silently prevent the intended enrichment flow.
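The ordering fix the comment implies — compare first, set the override flag only on a real change — can be sketched with a state dict standing in for the view (the keys and flag name are illustrative):

```python
def apply_website_edit(state: dict, submitted: list[str]) -> dict:
    """Set the override flag only when the submitted list actually differs
    from the current proposal, so a no-op submit does not wipe the inferred
    reparse candidates. Identity comparison is stubbed with lowercase keys."""
    current = state.get("cWebsiteLink") or []

    def keys(urls):
        return {u.lower().rstrip("/") for u in urls}

    if keys(submitted) == keys(current):
        return state  # no effective change; keep inferred candidates intact
    state = dict(state, website_edits_override_inferred_candidates=True)
    if submitted:
        state["cWebsiteLink"] = submitted
    else:
        state.pop("cWebsiteLink", None)
    return state
```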
💡 Codex Review
Reviewed commit: ebeeb883ea
```python
if fallback_extracted is not None:
    return fallback_extracted
return self.extractor.extract(resume_text, extra_sources=None)
```
Reset enrichment usage after fail-open fallback
When enrichment extraction fails, this fallback re-runs parsing with extra_sources=None, but the previously fetched sources are still kept as status="used" and treated as already confirmed. In that path, the final preview can claim website/GitHub content was used even though the successful parse ignored external sources, and users lose a way to re-confirm/retry those sources because they are no longer surfaced as confirmation candidates.
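Downgrading stale `status="used"` enrichments after a fail-open fallback could look like the sketch below; the `Enrichment` dataclass is a stand-in for the project's `ResumeSourceEnrichment`, and the detail string is an assumption:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Enrichment:
    url: str
    status: str  # "used" | "failed" | "confirmation_needed"
    detail: str = ""


def downgrade_after_fallback(enrichments: list[Enrichment]) -> list[Enrichment]:
    """When enriched extraction fails open to a plain parse, previously
    fetched sources must not stay marked "used"; downgrade them so the
    preview reflects reality and the sources can be re-confirmed."""
    return [
        replace(e, status="failed", detail="enriched parse failed; source unused")
        if e.status == "used"
        else e
        for e in enrichments
    ]
```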
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
```python
elif links:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates["cWebsiteLink"] = links
else:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
self.confirmation_view._refresh_website_reparse_candidates()
await self.confirmation_view._sync_message_view()
count = len(links)
```
In ResumeEditWebsitesModal.on_submit, you compute submitted_links using normalize_website_url, but when persisting the update you still write the raw links list into proposed_updates["cWebsiteLink"]. This can allow invalid/un-normalized entries to reach the apply step (where they may be dropped), and it can also make dedupe/no-op detection behave differently than what will actually be applied. Consider storing submitted_links (or rejecting invalid lines up-front) so the proposed update matches what will be applied/reparsed.
Suggested change — use the normalized `submitted_links` instead of the raw `links`:

```python
elif submitted_links:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates["cWebsiteLink"] = submitted_links
else:
    self.confirmation_view.website_edits_override_inferred_candidates = True
    self.confirmation_view.proposed_updates.pop("cWebsiteLink", None)
self.confirmation_view._refresh_website_reparse_candidates()
await self.confirmation_view._sync_message_view()
count = len(submitted_links)
```
```diff
         count = len(links)
-        await interaction.response.send_message(
+        if self.confirmation_view.has_reparse_candidates:
+            action_hint = (
+                " Click **Reparse With New Sources** to refresh the extraction."
+            )
+        else:
+            action_hint = " Click **Confirm Updates** to apply."
+        await interaction.followup.send(
             f"✅ Websites updated to {count} link{'s' if count != 1 else ''}. "
-            "Click **Confirm Updates** to apply.",
+            + action_hint,
             ephemeral=True,
```
The success message uses count = len(links) (raw user input) even when the submission is treated as a no-op (submitted_keys == existing_keys) or when some lines are invalid and get filtered out by normalize_website_url. This can misreport what will actually be applied and confuse users (e.g., “updated to 1 link” while cWebsiteLink was popped). Consider computing the count from the normalized/stored list and emitting a distinct message for the no-op case.
```python
await interaction.followup.send(
    f"❌ Resume extraction failed: {result.get('error') or 'Unknown error'}",
    file=debug_file,
```
This failure path still says “Resume extraction failed” even though the command/UI has been renamed to “profile” throughout the rest of the flow. Updating this message to “Profile extraction failed” (and similar wording) will keep user-facing output consistent and reduce confusion.
💡 Codex Review
Reviewed commit: 1507e97320
```python
if not has_external_profile_sources:
    raise ValueError("No resume or external profile sources available")
text = ""
normalized_filename = normalized_filename or "crm-profile-sources"
```
Allow LLM extraction when no resume attachment is present
When attachment_id is missing, this path forces text = "" and still proceeds with extraction from external sources. In ResumeProfileExtractor.extract, an empty resume_text short-circuits to heuristic mode, so the new source-only reprocess flow never uses LLM parsing even when website/GitHub content was fetched successfully. That materially reduces extraction quality for /reprocess-profile contacts without resumes and can miss fields operators expect from enriched sources.
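The mode decision the comment argues for — look at the combined corpus rather than the resume text alone — can be sketched like this; the function and mode names are illustrative, not the extractor's real API:

```python
def extraction_mode(resume_text: str, source_texts: list[str]) -> str:
    """Decide parser mode from resume text plus fetched external sources, so
    a source-only reprocess run still reaches the LLM path instead of
    short-circuiting to heuristics on an empty resume."""
    corpus = "\n\n".join(t for t in [resume_text, *source_texts] if t.strip())
    return "llm" if corpus.strip() else "heuristic"
```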
```python
github_username = str(contact.get("cGitHubUsername") or "").strip()
return bool(github_username)
```
Validate external-source fields before enabling reprocess flow
This gate treats any non-empty cGitHubUsername as a usable profile source, but the processor later applies stricter normalization and can reject malformed values. For contacts with no resume and an invalid GitHub value, the UI now routes into reprocess instead of upload handoff, then extraction fails with “No resume or external profile sources available,” creating a dead-end flow. Reuse the same normalization criteria as the processor when deciding has_profile_sources.
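A stricter gate that mirrors normalization up front might look like the sketch below; the username grammar used here (GitHub's public rules) is an assumption about the processor's actual criteria:

```python
import re

# Assumption: the processor accepts usernames matching GitHub's grammar
# (alphanumeric, hyphens not leading/trailing/doubled, max 39 chars).
_USERNAME_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}$")


def has_usable_github_source(contact: dict) -> bool:
    """Gate the reprocess flow with the same validation applied later, so a
    malformed cGitHubUsername routes to upload handoff instead of a
    dead-end extraction failure."""
    raw = str(contact.get("cGitHubUsername") or "").strip()
    return bool(raw) and _USERNAME_RE.fullmatch(raw) is not None
```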
💡 Codex Review
Reviewed commit: 5273d00368
```python
text = ""
normalized_filename = normalized_filename or "crm-profile-sources"
```
Feed external source text into fallback skill extraction
When no attachment is present this path sets text to an empty string, but later _coerce_profile_skill_result uses that same value for fallback skill extraction (skills_extractor.extract_skills(resume_text)) when the profile extractor returns no skills. In /reprocess-profile source-only runs, that means fallback skill parsing operates on "" even after website/GitHub content was successfully fetched, so skill updates can be silently missed. Use the fetched external-source corpus (or a combined source blob) instead of empty resume text for this fallback path.
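Building the fallback-skills corpus from resume text plus fetched sources, as the comment recommends, can be sketched with a small joiner — the function name is illustrative:

```python
def fallback_skill_text(resume_text: str, source_texts: list[str]) -> str:
    """Combine resume text with fetched external-source text so fallback
    skill extraction never operates on an empty string in source-only
    /reprocess-profile runs."""
    parts = [resume_text, *source_texts]
    return "\n\n".join(p.strip() for p in parts if p and p.strip())
```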
Description
This updates resume profile parsing to enrich from public GitHub and website content, adds curl_cffi mobile impersonation for fetches, and records enrichment attempts in extraction results. Inferred personal websites now require confirmation, while newly added or confirmed website links in the resume preview offer a Reparse With Websites path before applying CRM updates.
Related Issue
None.
How Has This Been Tested?
- `uv run pytest tests/unit/test_resume_profile_processor.py -q`
- `uv run pytest tests/unit/test_crm.py -q -k "resume_update_view_adds_confirm_websites_button_when_needed or resume_update_view_adds_reparse_button_for_new_websites or edit_websites_modal_submit_updates_proposed or edit_websites_modal_submit_removes_field_when_blank or confirm_inferred_websites_button_reruns_preview or resume_preview_embed_marks_confirmation_needed_sources or resume_preview_embed_prompts_for_new_website_reparse or resume_preview_embed_includes_external_sources or build_resume_extract_debug_file_serializes_raw_payload or resume_preview_embed_includes_debug_field or reprocess_confirmation_view_calls_reprocess_preview or run_resume_extract_and_preview_calls_direct_extract_for_reprocess"`
- `uv run ruff check packages/shared/src/five08/resume_profile_processor.py packages/shared/src/five08/resume_processing_models.py apps/discord_bot/src/five08/discord_bot/cogs/crm.py tests/unit/test_resume_profile_processor.py tests/unit/test_crm.py`
New Features
Tests
Chores