Skip to content

refactor CRM resume processing flow#206

Merged
michaelmwu merged 6 commits into
mainfrom
michaelmwu/dry-crm-constants
Mar 19, 2026
Merged

refactor CRM resume processing flow#206
michaelmwu merged 6 commits into
mainfrom
michaelmwu/dry-crm-constants

Conversation

@michaelmwu
Copy link
Copy Markdown
Member

@michaelmwu michaelmwu commented Mar 18, 2026

Description

Move resume extract/apply logic into shared code so the Discord bot can run synchronous resume flows directly while the worker reuses the same processor for background paths.
Share the resume processing models, reuse parsed resume hints, and DRY the bot-side resume validation and create-contact payload building.
Simplify CRM field handling to the concrete Espo fields and formats we actually use, including cLinkedIn, cOnboarder, cOnboardingState, Espo date/datetime parsing, and PDF/DOCX-only resume uploads.

Related Issue

None.

How Has This Been Tested?

uv run pytest tests/unit/test_resume_profile_processor.py tests/unit/test_crm.py tests/unit/test_worker_mailbox_resume_ingest.py
uv run ruff check apps/worker/src/five08/worker/models.py apps/worker/src/five08/worker/crm/resume_profile_processor.py packages/shared/src/five08/resume_processing_models.py packages/shared/src/five08/resume_profile_processor.py apps/discord_bot/src/five08/discord_bot/cogs/crm.py tests/unit/test_crm.py tests/unit/test_resume_profile_processor.py

Summary by CodeRabbit

  • New Features

    • Resume upload validation: accept only PDF/DOCX up to 10 MB with clear user-facing errors and audit logging.
    • Improved skills extraction that returns confidence and per-skill strength indicators.
  • Improvements

    • Faster, direct resume processing for quicker profile previews and CRM updates.
    • Standardized LinkedIn handling and clearer onboarding/status wording across UIs and messages.

Copilot AI review requested due to automatic review settings March 18, 2026 22:30
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 18, 2026

Warning

Rate limit exceeded

@michaelmwu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 19 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa9a9b80-83fd-4a73-9205-5068db36ecf5

📥 Commits

Reviewing files that changed from the base of the PR and between a1cf65c and 86b4e1c.

📒 Files selected for processing (1)
  • packages/shared/src/five08/resume_skills_extractor.py
📝 Walkthrough

Walkthrough

Consolidates resume processing into shared packages: adds shared models, document/ and skills processors, and a ResumeProfileProcessor; simplifies worker to a thin wrapper; and refactors the Discord CRMCog to call processors directly with new validation, constants, and hints propagation.

Changes

Cohort / File(s) Summary
Shared library — models & processors
packages/shared/src/five08/resume_processing_models.py, packages/shared/src/five08/resume_profile_processor.py, packages/shared/src/five08/resume_skills_extractor.py, packages/shared/src/five08/resume_document_processor.py
Adds typed Pydantic models, a ResumeProcessorConfig and ResumeProfileProcessor (extract/apply flows, optional DB persistence), a SkillsExtractor (LLM + heuristic fallback), and a DocumentProcessor (validation, extraction, caching).
Worker integration
apps/worker/src/five08/worker/crm/resume_profile_processor.py, apps/worker/src/five08/worker/models.py, apps/worker/src/five08/worker/config.py
Replaces large local processor with a thin subclass delegating to shared ResumeProfileProcessor; worker models aliased to shared models; removed configurable crm_linkedin_field; default allowed_file_types tightened to pdf,docx.
Discord bot CRM cog
apps/discord_bot/src/five08/discord_bot/cogs/crm.py
Rewires from queued jobs to direct in-process calls: adds _create_resume_profile_processor, _extract_resume_profile_direct, _apply_resume_profile_direct, _validate_resume_attachment, resume constants (allowed extensions, max size, date formats, LINKEDIN_FIELD), and threads hints through inference/summary builders.
Intake & people sync
apps/worker/src/five08/worker/crm/intake_form_processor.py, apps/worker/src/five08/worker/crm/people_sync.py
Introduces LINKEDIN_FIELD = "cLinkedIn" and replaces dynamic settings.crm_linkedin_field usage with the public constant.
Tests
tests/unit/test_crm.py, tests/unit/test_resume_profile_processor.py, tests/unit/test_skills_extractor.py, tests/unit/test_worker_config.py
Updates and adds tests for field renames (onboarding, LinkedIn), direct extract/apply flows, resume validation (file types/size), hint propagation, ResumeProcessorConfig behavior, skills heuristic extraction, and worker defaults.

Sequence Diagram

sequenceDiagram
    participant Bot as Discord Bot (CRMCog)
    participant Proc as ResumeProfileProcessor (Shared)
    participant Extractor as ResumeProfileExtractor (Shared)
    participant Skills as SkillsExtractor (Shared)
    participant CRM as EspoClient (CRM)
    participant DB as Postgres

    rect rgba(100,200,150,0.5)
    Note over Bot,DB: Resume Extraction Flow (Direct, In‑Process)
    Bot->>Proc: extract_profile_proposal(contact_id, attachment_id, filename)
    Proc->>CRM: fetch contact details
    Proc->>Proc: download & validate attachment
    Proc->>Extractor: extract text & parsed fields
    Proc->>Skills: extract skills from text
    Proc->>Proc: normalize/merge with CRM data
    Proc-->>Bot: return ResumeExtractionResult
    end

    rect rgba(100,150,200,0.5)
    Note over Bot,DB: Resume Application Flow (Direct, In‑Process)
    Bot->>Proc: apply_profile_updates(contact_id, updates, link_discord?)
    Proc->>CRM: bulk update contact fields
    alt Bulk update succeeds
        Proc->>CRM: verify updated fields
    else Bulk update fails
        Proc->>CRM: apply per-field with error collection
    end
    Proc->>DB: persist processing run (idempotent, if configured)
    Proc-->>Bot: return ResumeApplyResult
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

Poem

🐰 I hopped through code with nimble paws,

Shared processors stitched without a pause.
Bot and worker now share the same trail,
Resumes parsed clean — no more a long tail.
Carrots for tests, and cheers for the merge!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor CRM resume processing flow' accurately summarizes the main change: restructuring resume processing to move logic into shared code and simplify field handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch michaelmwu/dry-crm-constants
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the CRM resume extraction/apply workflow into shared code so both the Discord bot and worker reuse the same processor, while also simplifying CRM field handling to the concrete Espo fields and formats currently in use.

Changes:

  • Introduces shared resume processing models and a shared ResumeProfileProcessor (plus skills/doc processing helpers).
  • Updates worker to use a compatibility shim that delegates to the shared processor.
  • Updates Discord bot CRM cog to run extract/apply directly (threaded), tightens resume upload validation (PDF/DOCX), and simplifies onboarding fields/date formatting.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/unit/test_resume_profile_processor.py Adjusts tests around configurable LinkedIn field behavior in the worker processor.
tests/unit/test_crm.py Updates tests for onboarding field names/date formatting and stricter resume upload type validation; adds coverage for reusing inferred hints.
packages/shared/src/five08/resume_skills_extractor.py Adds shared skills extraction (LLM + heuristic fallback).
packages/shared/src/five08/resume_profile_processor.py Adds shared end-to-end resume extract/apply processor and configuration.
packages/shared/src/five08/resume_processing_models.py Adds shared Pydantic models for resume processing results and change tracking.
packages/shared/src/five08/resume_document_processor.py Adds shared document validation/text extraction with caching.
apps/worker/src/five08/worker/models.py Re-exports shared resume processing models to keep worker API/types consistent.
apps/worker/src/five08/worker/crm/resume_profile_processor.py Replaces worker implementation with a thin shim over the shared processor.
apps/discord_bot/src/five08/discord_bot/cogs/crm.py Switches from backend-job flow to direct shared processor calls; simplifies onboarding fields; adds centralized resume attachment validation; reuses parsed hints.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +61 to +63
allowed_file_extensions: set[str] = field(
default_factory=lambda: {"pdf", "docx", "txt"}
)
Comment on lines +668 to +695
return ResumeApplyResult(
contact_id=contact_id,
updated_fields=sorted(updated_fields),
updated_values={
field: approved_updates[field]
for field in sorted(updated_fields)
if field in approved_updates
},
link_discord_applied=link_applied,
success=True,
)

if updated_fields:
return ResumeApplyResult(
contact_id=contact_id,
updated_fields=sorted(updated_fields),
updated_values={
field: approved_updates[field]
for field in sorted(updated_fields)
if field in approved_updates
},
link_discord_applied=link_applied,
success=False,
error="; ".join(batch_errors)
if batch_errors
else "Some fields did not persist after update",
)

Comment on lines +66 to +69
RESUME_ALLOWED_EXTENSIONS = frozenset({".pdf", ".docx"})
RESUME_ALLOWED_EXTENSIONS_LABEL = "PDF or DOCX"
RESUME_MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024
RESUME_MAX_FILE_SIZE_MB = RESUME_MAX_FILE_SIZE_BYTES // (1024 * 1024)
Comment on lines +128 to +133
detected: set[str] = set()
for token in token_matches:
canonical = self._normalize_skill_name(token)
if canonical in COMMON_SKILLS and canonical not in DISALLOWED_SKILLS:
detected.add(canonical)

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
packages/shared/src/five08/resume_skills_extractor.py (1)

86-107: Consider adding a timeout for the OpenAI API call.

The LLM call lacks an explicit timeout. While worker-level timeouts may apply, adding a request-level timeout would provide more predictable failure modes.

💡 Optional: Add timeout parameter
             response = self.client.chat.completions.create(
                 model=self.model,
                 messages=[
                     {
                         "role": "system",
                         "content": (
                             "You extract professional skills from resumes for a CRM. "
                             "Focus on white-collar skills for product development orgs: "
                             "engineering, product, data, design, growth, and marketing. "
                             "Return JSON only, no prose. "
                             "Normalize skills to concise canonical names, lowercase. "
                             "Provide a strength from 1-5 when known, where 5 is strongest. "
                             "If uncertain, you may omit it or leave it blank. "
                             "Bias 3 for simple mentions, 4-5 for recent/current project usage, "
                             "and 1-2 for weak, outdated, or minimal exposure."
                         ),
                     },
                     {"role": "user", "content": prompt},
                 ],
                 temperature=0.1,
                 max_tokens=1200,
+                timeout=30.0,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/src/five08/resume_skills_extractor.py` around lines 86 - 107,
The OpenAI call in self.client.chat.completions.create lacks a request-level
timeout; update the call to include a timeout parameter (e.g.,
timeout=self.request_timeout or a literal like timeout=30) so the request fails
fast instead of relying solely on worker-level timeouts, and add a configurable
attribute (e.g., self.request_timeout) on the enclosing class or method
signature so callers can set it; ensure you pass the timeout alongside
temperature and max_tokens in the same call and handle any timeout exceptions
where response is consumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Line 7993: The command help text still reads "Resume file to upload (PDF, DOC,
or DOCX)" but the validator now rejects DOC files; update the help/description
string for the command parameter named "file" (the argument whose value is
currently set to file="Resume file to upload (PDF, DOC, or DOCX)") to reflect
the actual allowed types (e.g., "Resume file to upload (PDF or DOCX)"). Ensure
you only change that description string so it matches the validator behavior.
- Around line 4737-4745: The audit write in the resume validation failure path
currently calls self._audit_command(...) which can raise and abort the Discord
command; change this to use the best-effort helper self._audit_command_safe(...)
(or wrap the call in a try/except that swallows/logs audit errors) when logging
the failure (preserving interaction, action, result=failure_result and metadata
with "filename" and "reason":"invalid_file_type"); do the same replacement for
the other nearby resume-validation audit call that uses _audit_command so audit
failures do not propagate.
- Around line 66-70: The bot duplicates resume validation rules; replace the
hard-coded RESUME_ALLOWED_EXTENSIONS, RESUME_ALLOWED_EXTENSIONS_LABEL,
RESUME_MAX_FILE_SIZE_BYTES and RESUME_MAX_FILE_SIZE_MB with values derived from
the shared processor config so rules never drift: import or call
ResumeProcessorConfig.from_settings(...) (or the shared config provider used
elsewhere) and set the bot's allowed extensions, label and max file size from
its properties (convert bytes->MB for RESUME_MAX_FILE_SIZE_MB and format the
label from the extensions list) so the bot-side checks use the exact same rules
as ResumeProcessorConfig; update all other duplicated blocks (the other
occurrences noted) to follow the same pattern.

In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 680-694: The block that returns a ResumeApplyResult when
updated_fields is truthy is unreachable because an earlier return already
handles the same condition; remove this duplicate if updated_fields: return
ResumeApplyResult(...) block (which references updated_fields, approved_updates,
link_discord_applied, batch_errors) to eliminate dead code and keep only the
intended single return path for updated_fields in resume_profile_processor.py.

---

Nitpick comments:
In `@packages/shared/src/five08/resume_skills_extractor.py`:
- Around line 86-107: The OpenAI call in self.client.chat.completions.create
lacks a request-level timeout; update the call to include a timeout parameter
(e.g., timeout=self.request_timeout or a literal like timeout=30) so the request
fails fast instead of relying solely on worker-level timeouts, and add a
configurable attribute (e.g., self.request_timeout) on the enclosing class or
method signature so callers can set it; ensure you pass the timeout alongside
temperature and max_tokens in the same call and handle any timeout exceptions
where response is consumed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c07a903f-2104-4ada-b52f-7bce7ebe217d

📥 Commits

Reviewing files that changed from the base of the PR and between 77f44c4 and a357680.

📒 Files selected for processing (9)
  • apps/discord_bot/src/five08/discord_bot/cogs/crm.py
  • apps/worker/src/five08/worker/crm/resume_profile_processor.py
  • apps/worker/src/five08/worker/models.py
  • packages/shared/src/five08/resume_document_processor.py
  • packages/shared/src/five08/resume_processing_models.py
  • packages/shared/src/five08/resume_profile_processor.py
  • packages/shared/src/five08/resume_skills_extractor.py
  • tests/unit/test_crm.py
  • tests/unit/test_resume_profile_processor.py

Comment thread apps/discord_bot/src/five08/discord_bot/cogs/crm.py Outdated
Comment thread apps/discord_bot/src/five08/discord_bot/cogs/crm.py Outdated
Comment thread apps/discord_bot/src/five08/discord_bot/cogs/crm.py Outdated
Comment thread packages/shared/src/five08/resume_profile_processor.py Outdated
Copilot AI review requested due to automatic review settings March 18, 2026 22:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the resume extraction/apply flow so both the Discord bot (synchronous path) and the worker (background path) share the same resume processor/config/models, while also tightening CRM field expectations (e.g., cLinkedIn, cOnboarder, cOnboardingState) and restricting resume uploads to PDF/DOCX.

Changes:

  • Introduces shared resume processing primitives (ResumeProcessorConfig, ResumeProfileProcessor, typed result models) under packages/shared.
  • Updates the worker to use a compatibility shim that delegates to the shared processor.
  • Updates the Discord bot to run extract/apply directly (no backend job enqueue/polling), adds centralized resume attachment validation, and adjusts onboarding/LinkedIn field handling.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/test_skills_extractor.py Adds coverage for shared heuristic multi-word skill detection.
tests/unit/test_resume_profile_processor.py Adds tests for shared config clamping + partial-apply warning behavior.
tests/unit/test_crm.py Updates bot behavior tests for direct extract/apply, onboarding field changes, upload validation, and hint reuse.
packages/shared/src/five08/resume_skills_extractor.py New shared skills extractor with LLM + heuristic fallback.
packages/shared/src/five08/resume_profile_processor.py New shared end-to-end resume extract/apply processor and config.
packages/shared/src/five08/resume_processing_models.py New shared Pydantic models for extraction/apply payloads/results.
packages/shared/src/five08/resume_document_processor.py New shared document processor for PDF/DOCX validation + text extraction.
apps/worker/src/five08/worker/models.py Switches worker models to re-export shared resume processing models.
apps/worker/src/five08/worker/crm/resume_profile_processor.py Replaces worker implementation with shim subclassing shared processor.
apps/worker/src/five08/worker/config.py Restricts allowed resume types to pdf,docx.
apps/discord_bot/src/five08/discord_bot/cogs/crm.py Moves bot resume flows to direct shared processor calls, adds resume validation helper, and simplifies CRM field assumptions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +659 to +662
crm_update_payload = dict(approved_updates)
crm_update_payload["cResumeLastProcessed"] = datetime.now(
tz=timezone.utc
).strftime("%Y-%m-%d %H:%M:%S")
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 66-69: Replace the hard-coded LINKEDIN_FIELD constant with a value
read from the shared processor config (use the existing config key
crm_linkedin_field) and fall back to "cLinkedIn" if not present; specifically,
remove or override the top-level LINKEDIN_FIELD constant and instead initialize
the field name from the processor/shared config where EspoClient or CRM
operations reference LINKEDIN_FIELD (search for the symbol LINKEDIN_FIELD and
usages in crm-related functions/classes), then update all occurrences (including
the other noted locations) to use the runtime-configured field name so LinkedIn
search/update/label behavior respects crm_linkedin_field overrides.

In `@packages/shared/src/five08/resume_profile_processor.py`:
- Around line 737-746: The code returns attempted updates in
updated_fields/updated_values even when no writes succeeded; change the
ResumeApplyResult construction in the branch where no updates succeeded to set
updated_fields to an empty list and updated_values to an empty dict (instead of
using approved_updates), while keeping contact_id, link_discord_applied
(link_applied), success=False and error built from batch_errors; ensure
ResumeApplyResult(...) uses
sorted(approved_updates.keys())/dict(approved_updates) only in the successful
path where at least one write actually succeeded.
- Around line 629-637: The early return when approved_updates is empty prevents
the subsequent Discord-link enrichment from running, which rejects link-only
submissions; update the function containing the approved_updates check (the
block that returns a ResumeApplyResult with success=False and error="No valid
profile fields provided") to either remove or defer that return so that the
Discord enrichment logic that sets cDiscordUserID and cDiscordUsername still
executes when approved_updates is empty, and ensure the final ResumeApplyResult
reflects any discord-only updates before returning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3089bfe2-a08e-403e-8179-bb417ebf7fd4

📥 Commits

Reviewing files that changed from the base of the PR and between a357680 and 246e2d2.

📒 Files selected for processing (8)
  • apps/discord_bot/src/five08/discord_bot/cogs/crm.py
  • apps/worker/src/five08/worker/config.py
  • packages/shared/src/five08/resume_processing_models.py
  • packages/shared/src/five08/resume_profile_processor.py
  • packages/shared/src/five08/resume_skills_extractor.py
  • tests/unit/test_crm.py
  • tests/unit/test_resume_profile_processor.py
  • tests/unit/test_skills_extractor.py

Comment on lines +66 to 69
ESPO_DATE_FORMAT = "%Y-%m-%d"
ESPO_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"
LINKEDIN_FIELD = "cLinkedIn"
EspoClient = espo.EspoClient
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid hard-coding the LinkedIn CRM field name.

Line 68 pins LINKEDIN_FIELD to "cLinkedIn", but the shared processor config supports crm_linkedin_field overrides. This can break LinkedIn search/update/label behavior in deployments using a different CRM field.

🔧 Proposed fix
-LINKEDIN_FIELD = "cLinkedIn"
+LINKEDIN_FIELD = (
+    str(getattr(settings, "crm_linkedin_field", "cLinkedIn")).strip() or "cLinkedIn"
+)

Also applies to: 1695-1697, 5171-5172, 5354-5355, 7196-7197

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py` around lines 66 - 69,
Replace the hard-coded LINKEDIN_FIELD constant with a value read from the shared
processor config (use the existing config key crm_linkedin_field) and fall back
to "cLinkedIn" if not present; specifically, remove or override the top-level
LINKEDIN_FIELD constant and instead initialize the field name from the
processor/shared config where EspoClient or CRM operations reference
LINKEDIN_FIELD (search for the symbol LINKEDIN_FIELD and usages in crm-related
functions/classes), then update all occurrences (including the other noted
locations) to use the runtime-configured field name so LinkedIn
search/update/label behavior respects crm_linkedin_field overrides.

Comment thread packages/shared/src/five08/resume_profile_processor.py Outdated
Comment on lines +737 to +746
return ResumeApplyResult(
contact_id=contact_id,
updated_fields=sorted(approved_updates.keys()),
updated_values=dict(approved_updates),
link_discord_applied=link_applied,
success=False,
error="; ".join(batch_errors)
if batch_errors
else "No fields were updated",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not report failed writes as updated fields.

This branch runs when no field update succeeded, but it currently returns updated_fields/updated_values populated from attempted inputs. That misleads callers into treating failed updates as applied.

Suggested fix
             return ResumeApplyResult(
                 contact_id=contact_id,
-                updated_fields=sorted(approved_updates.keys()),
-                updated_values=dict(approved_updates),
+                updated_fields=[],
+                updated_values={},
                 link_discord_applied=link_applied,
                 success=False,
                 error="; ".join(batch_errors)
                 if batch_errors
                 else "No fields were updated",
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/src/five08/resume_profile_processor.py` around lines 737 -
746, The code returns attempted updates in updated_fields/updated_values even
when no writes succeeded; change the ResumeApplyResult construction in the
branch where no updates succeeded to set updated_fields to an empty list and
updated_values to an empty dict (instead of using approved_updates), while
keeping contact_id, link_discord_applied (link_applied), success=False and error
built from batch_errors; ensure ResumeApplyResult(...) uses
sorted(approved_updates.keys())/dict(approved_updates) only in the successful
path where at least one write actually succeeded.

Copilot AI review requested due to automatic review settings March 18, 2026 23:51
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the resume extraction/apply workflow into shared modules so both the Discord bot and the worker can reuse the same processor, while also standardizing EspoCRM field usage (notably cLinkedIn, cOnboarder, cOnboardingState) and tightening resume upload validation to PDF/DOCX only.

Changes:

  • Introduces shared resume processing models + processor + skills extractor in packages/shared, and updates the worker to use a compatibility shim.
  • Updates the Discord bot to run resume extract/apply flows directly (sync via asyncio.to_thread) and reuses parsed resume hints in failure summaries.
  • Simplifies CRM field handling and updates tests to reflect new onboarding/linkedin fields and new resume validation rules.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/test_worker_config.py Updates worker defaults tests to reflect removal of configurable LinkedIn field mapping.
tests/unit/test_skills_extractor.py Adds coverage for shared heuristic skill extraction detecting multi-word phrases.
tests/unit/test_resume_profile_processor.py Adds shared config tests, removes configurable LinkedIn field tests, and expands apply-path behavior tests (aliases, link-only, partial success).
tests/unit/test_crm.py Updates bot CRM tests for new onboarding fields, Espo datetime formatting, resume validation (PDF/DOCX-only), hint reuse, and direct extract/apply paths.
packages/shared/src/five08/resume_skills_extractor.py Adds shared skills extractor with LLM + heuristic fallback.
packages/shared/src/five08/resume_profile_processor.py Adds shared end-to-end resume extraction + CRM apply workflow and persistence of processing runs.
packages/shared/src/five08/resume_processing_models.py Adds shared Pydantic models for extract/apply results and change/skip reporting.
packages/shared/src/five08/resume_document_processor.py Adds shared document text extraction/validation + hashing/caching.
apps/worker/src/five08/worker/models.py Re-exports shared resume processing models instead of duplicating them in worker models.
apps/worker/src/five08/worker/crm/resume_profile_processor.py Replaces worker processor implementation with a thin shim over the shared processor/config.
apps/worker/src/five08/worker/crm/people_sync.py Standardizes LinkedIn field handling to cLinkedIn.
apps/worker/src/five08/worker/crm/intake_form_processor.py Standardizes intake payload LinkedIn field mapping to cLinkedIn.
apps/worker/src/five08/worker/config.py Removes configurable LinkedIn field and restricts allowed resume types to pdf,docx.
apps/discord_bot/src/five08/discord_bot/cogs/crm.py Switches resume flows to direct shared processing, standardizes onboarding/linkedin fields, adds resume attachment validation helper, and reuses parsed hints in summaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +21 to +24
self.allowed_extensions = {ext.strip().lower() for ext in allowed_extensions}
self.max_file_size = max(1, int(max_file_size_mb)) * 1024 * 1024
self._content_cache: dict[str, str] = {}

Comment on lines +211 to +243
def _parse_strength(self, value: Any) -> int | None:
raw: Any = value
if isinstance(value, dict):
raw = value.get("strength")
if raw is None:
return None
if isinstance(raw, str) and not raw.strip():
return None
try:
numeric = int(float(raw))
except Exception:
return None
if numeric < 1 or numeric > 5:
return None
return numeric

def _parse_skill_with_strength(self, value: str) -> tuple[str, int | None]:
raw = value.strip()
match = re.match(r"^(.*)\(\s*(\d*)\s*\)\s*$", raw)
if match is None:
return self._normalize_skill_name(raw), None

base = match.group(1).strip()
parsed_strength = self._parse_strength(match.group(2))
if not base:
return "", None
normalized_base = self._normalize_skill_name(base)
if not normalized_base:
return "", None
if parsed_strength is None:
return normalized_base, None
return normalized_base, parsed_strength

@michaelmwu michaelmwu merged commit 28bdde5 into main Mar 19, 2026
4 checks passed
@michaelmwu michaelmwu deleted the michaelmwu/dry-crm-constants branch March 19, 2026 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants