fix: improve resume location parsing and editing#180
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds end-to-end location editing: UI modal/button, normalization utilities, resume-extraction heuristics for city/state/country/timezone, and tests; integrates location rendering/collapsing into the CRM update confirmation view and preview generation. Changes
Sequence DiagramsequenceDiagram
autonumber
actor User
participant DiscordUI as "Discord UI\n(Button)"
participant Modal as "Location Modal\n(Form)"
participant Normalizer as "Normalization\n(crm_normalization)"
participant CRM as "CRM View\n(ResumeUpdateConfirmationView)"
participant DB as "Database"
User->>DiscordUI: Click Edit Location
DiscordUI->>Modal: Open (pre-populate values)
User->>Modal: Submit city/state/country/timezone
Modal->>Normalizer: normalize_city(state/city)
Normalizer-->>Modal: normalized_city
Modal->>Normalizer: normalize_state(input)
Normalizer-->>Modal: normalized_state
Modal->>Normalizer: normalize_country(input)
Normalizer-->>Modal: normalized_country
Modal->>Normalizer: normalize_timezone(input)
Normalizer-->>Modal: normalized_timezone
Modal->>CRM: Return proposed_updates (location fields)
CRM->>CRM: _has_location_updates()
CRM->>CRM: _format_location_summary()
CRM->>DB: Persist proposed_updates
CRM-->>User: Display Proposed Changes with consolidated "Location" line
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR tightens resume/CRM location normalization (rejecting invalid countries and obvious garbage in city/state) and adds Discord UI affordances to review/edit location fields before applying updates to the CRM.
Changes:
- Strengthen location normalization: canonical-country validation, US state abbreviation expansion, and plausibility checks for city/state text.
- Improve resume location extraction to better repair two-part non-country locations (city + region).
- Add a Discord “Edit Location” modal and group location fields into a single human-readable “Location” line in preview/applied-updates embeds, with expanded unit coverage.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
packages/shared/src/five08/crm_normalization.py |
Adds canonical country validation + aliases, US state abbreviation expansion, and plausibility checks for city/state. |
packages/shared/src/five08/resume_extractor.py |
Updates deterministic parsing to treat 2-part non-country locations as city+region and strengthens LLM prompt constraints. |
apps/discord_bot/src/five08/discord_bot/cogs/crm.py |
Introduces Edit Location modal/button; groups location fields into a single “Location” summary in preview + applied updates. |
tests/unit/test_crm_normalization.py |
Adds/updates normalization tests for country/state/city plausibility and US state expansion. |
tests/unit/test_resume_extractor.py |
Adds extraction test ensuring invalid LLM location fields are overridden by deterministic parsing; updates state expectations. |
tests/unit/test_crm.py |
Adds tests for grouped location rendering and the new Edit Location modal/button behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
|
|
||
| def _normalize_location_text(value: str) -> str: | ||
| return re.sub(r"\s+", " ", value).strip(" ,") |
There was a problem hiding this comment.
_normalize_location_text currently strips spaces/commas but not trailing periods. This makes normalize_country reject common resume punctuation like Taiwan. / Germany. (and then higher-level location parsing may fail for City, Region, Taiwan.). Consider stripping terminal punctuation (at least .) during normalization so canonical country matching still works.
| return re.sub(r"\s+", " ", value).strip(" ,") | |
| # Collapse internal whitespace and strip common trailing punctuation/spaces. | |
| return re.sub(r"\s+", " ", value).strip(" ,.") |
| letters_only = re.sub(r"[^A-Za-z]", "", normalized) | ||
| if len(letters_only) <= 2: |
There was a problem hiding this comment.
normalize_state uses letters_only = re.sub(r"[^A-Za-z]", "", normalized) to reject short tokens. This unintentionally rejects non-ASCII region names that otherwise pass _is_plausible_location_phrase (e.g., Chinese/Japanese/Korean region strings), because letters_only becomes empty and the function returns None. Consider counting Unicode letters instead (e.g., based on str.isalpha()), so the “short token” guard doesn’t exclude valid non-Latin locations.
| letters_only = re.sub(r"[^A-Za-z]", "", normalized) | |
| if len(letters_only) <= 2: | |
| letter_count = sum(1 for ch in normalized if ch.isalpha()) | |
| if letter_count <= 2: |
| if len(parts) == 2: | ||
| if _looks_like_state_region(region): | ||
| return city, _normalize_state(region), None | ||
| country = _normalize_country(region) | ||
| if country: |
There was a problem hiding this comment.
In _parse_location_candidate, the _looks_like_state_region(region) branch runs before trying _normalize_country(region). Since _looks_like_state_region returns true for any 2-letter uppercase token, inputs like London, UK or San Francisco, US will be treated as a “state” and short-circuited, causing country parsing to fail (and may return (city, None, None) with the new stricter normalize_state). Consider attempting _normalize_country(region) first, or tightening _looks_like_state_region to only match known US state abbreviations (instead of [A-Z]{2}).
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
tests/unit/test_crm_normalization.py (1)
55-61: Add the same garbage-string rejection check fornormalize_state.The PR tightens both city and state parsing, but this block only proves the long bullet-prefixed garbage value is rejected for
normalize_city. Ifnormalize_statestarts accepting the same junk, this file will miss it.Suggested assertion
assert normalize_state("CA") == "California" assert normalize_state("Kaohsiung City") == "Kaohsiung City" + assert ( + normalize_state("○ A Python Django Api Handles Account Creation And Management") + is None + ) assert normalize_state("Js") is None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_crm_normalization.py` around lines 55 - 61, The test currently verifies that normalize_city rejects the long bullet-prefixed garbage string but does not assert the same for normalize_state; add an analogous assertion that normalize_state("○ A Python Django Api Handles Account Creation And Management") is None so the test covers state normalization rejecting the same junk. Locate the assertions block containing normalize_city and normalize_state in tests/unit/test_crm_normalization.py and add the new assertion next to the existing normalize_city check to ensure normalize_state behavior is validated.tests/unit/test_resume_extractor.py (1)
759-826: Also assert timezone backfill in this repaired-location path.This test proves the invalid country is replaced and the region is repaired, but it never checks the
Taiwantimezone inference that similar extractor tests already cover. A regression could keepaddress_city/state/countrycorrect while droppingresult.timezone.Suggested assertion
assert result.address_city == "Nanzih" assert result.address_state == "Kaohsiung City" assert result.address_country == "Taiwan" + assert result.timezone == "UTC+08:00"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_resume_extractor.py` around lines 759 - 826, The test test_extract_discards_invalid_country_and_repairs_current_location_region currently verifies address_city/state/country but omits timezone; update this test (which uses ResumeProfileExtractor and assigns result = extractor.extract(...)) to also assert the inferred timezone (expecting "Asia/Taipei") on result.timezone to ensure timezone backfill is present when the invalid country is replaced and the region is repaired.tests/unit/test_crm.py (1)
669-693: Cover non-canonical manual edits in the location modal submit test.This case only feeds already-normalized city/state/country values, so it mainly proves passthrough plus timezone normalization. A regression that skips
normalize_country/normalize_statein the new modal path would still pass.Consider a companion submit case with mixed-case or abbreviated inputs, plus an invalid country value, and assert the payload is normalized or rejected as intended.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_crm.py` around lines 669 - 693, Add a companion async test (e.g., test_edit_location_modal_submit_normalizes_or_rejects) that exercises ResumeEditLocationModal.on_submit using mixed-case/abbreviated inputs (e.g., "nAnZiH", "ca", "uSa" or "US") and an invalid country value to cover non-canonical manual edits; ensure the test sets modal.city_input/_state_input/_country_input/_timezone_input values, calls await modal.on_submit(mock_interaction), and then asserts that ResumeUpdateConfirmationView.proposed_updates contains normalized keys via normalize_country/normalize_state (or that the modal rejects/handles the invalid country as the code path dictates) and that mock_interaction.response.send_message was called (or not) according to the rejection behavior so the normalization and validation paths are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/discord_bot/src/five08/discord_bot/cogs/crm.py`:
- Around line 1153-1154: The ResumeEditLocationButton is only added when
self._has_location_updates(proposed_updates) is true, making the location editor
unreachable if the parser returns no location; change the logic so the button is
added whenever location data is missing or incomplete in proposed_updates (i.e.,
when not self._has_location_updates(proposed_updates) or when specific location
fields like city/state/country/timezone are absent) so users can open the modal
to add a new location; update the condition around
add_item(ResumeEditLocationButton()) in the same block to check for
missing/incomplete location fields rather than only existing updates.
- Around line 954-964: The modal currently only uses
normalize_timezone(raw_timezone) and thus rejects abbreviation inputs like
"PST"/"EST"; update the timezone parsing to mirror _parse_location_input by
first trying normalize_timezone(raw_timezone) and if that returns falsy, check
_LOCATION_TIMEZONE_ABBREV_MAP for a mapped value (or call the same helper used
by _parse_location_input) so timezone accepts both offset-style and abbreviation
inputs; adjust the timezone variable assignment and the invalid_fields check
(referencing raw_timezone and timezone) accordingly.
In `@packages/shared/src/five08/crm_normalization.py`:
- Around line 90-317: The country lookup rejects legitimate names because
_location_lookup_key() only strips periods and the canonical/alias maps
(_CANONICAL_COUNTRY_NAMES, _COUNTRY_CANONICAL_MAP, _COUNTRY_ALIASES) lack many
variants (e.g., "Democratic Republic of the Congo", accented forms like "Côte
d’Ivoire"). Fix by (1) updating _CANONICAL_COUNTRY_NAMES to include missing
canonical names (e.g., "Democratic Republic of the Congo") and common official
variants, (2) enhancing _location_lookup_key() to perform robust Unicode
normalization (unicodedata.normalize to NFKD + strip combining marks), normalize
apostrophes/quotes (curly to straight), remove punctuation and extra whitespace
and then casefold, and (3) rebuild _COUNTRY_CANONICAL_MAP and _COUNTRY_ALIASES
keys using that normalized lookup key so alias entries (add accented and
punctuation variants) map correctly to the canonical values; update any
reference code that uses _COUNTRY_CANONICAL_MAP/_COUNTRY_ALIASES to call the
improved _location_lookup_key() before lookup.
- Around line 425-444: The fallback is too permissive and lacks non-US aliases;
update normalize_state() and the shared fallback logic by first checking a
curated map of non-US region abbreviations/names (e.g.,
CANADA_PROVINCE_ABBREVIATIONS, OTHER_REGION_NAMES) after
_US_STATE_ABBREVIATIONS/_US_STATE_NAMES and return canonical values if found,
then tighten _is_plausible_location_phrase (reduce max_words to ~3, max_length
to ~40) and add a short stopword list (e.g.,
"engineer","developer","javascript","senior","jr","sr","intern") to reject
job/tech phrases; also require letters_only length >= 3 before accepting and
calling _title_case_location(normalized). Apply the same stricter checks in
normalize_city()-related paths so 1–2 letter tokens (like "JS") are not
title-cased while known region abbreviations are still preserved via the curated
alias maps.
---
Nitpick comments:
In `@tests/unit/test_crm_normalization.py`:
- Around line 55-61: The test currently verifies that normalize_city rejects the
long bullet-prefixed garbage string but does not assert the same for
normalize_state; add an analogous assertion that normalize_state("○ A Python
Django Api Handles Account Creation And Management") is None so the test covers
state normalization rejecting the same junk. Locate the assertions block
containing normalize_city and normalize_state in
tests/unit/test_crm_normalization.py and add the new assertion next to the
existing normalize_city check to ensure normalize_state behavior is validated.
In `@tests/unit/test_crm.py`:
- Around line 669-693: Add a companion async test (e.g.,
test_edit_location_modal_submit_normalizes_or_rejects) that exercises
ResumeEditLocationModal.on_submit using mixed-case/abbreviated inputs (e.g.,
"nAnZiH", "ca", "uSa" or "US") and an invalid country value to cover
non-canonical manual edits; ensure the test sets
modal.city_input/_state_input/_country_input/_timezone_input values, calls await
modal.on_submit(mock_interaction), and then asserts that
ResumeUpdateConfirmationView.proposed_updates contains normalized keys via
normalize_country/normalize_state (or that the modal rejects/handles the invalid
country as the code path dictates) and that
mock_interaction.response.send_message was called (or not) according to the
rejection behavior so the normalization and validation paths are covered.
In `@tests/unit/test_resume_extractor.py`:
- Around line 759-826: The test
test_extract_discards_invalid_country_and_repairs_current_location_region
currently verifies address_city/state/country but omits timezone; update this
test (which uses ResumeProfileExtractor and assigns result =
extractor.extract(...)) to also assert the inferred timezone (expecting
"Asia/Taipei") on result.timezone to ensure timezone backfill is present when
the invalid country is replaced and the region is repaired.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2a50a859-4f68-4423-b9a3-c05af581bbe8
📒 Files selected for processing (6)
apps/discord_bot/src/five08/discord_bot/cogs/crm.pypackages/shared/src/five08/crm_normalization.pypackages/shared/src/five08/resume_extractor.pytests/unit/test_crm.pytests/unit/test_crm_normalization.pytests/unit/test_resume_extractor.py
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| raw_city = (self.city_input.value or "").strip() | ||
| raw_state = (self.state_input.value or "").strip() | ||
| raw_country = (self.country_input.value or "").strip() | ||
| raw_timezone = (self.timezone_input.value or "").strip() | ||
|
|
There was a problem hiding this comment.
ResumeEditLocationModal.on_submit treats user-entered strings like "None"/"null" as real values (e.g., normalize_city("None") returns "None"), so submitting the modal can write the literal string "None" into addressCity/addressState rather than clearing the field. Consider pre-normalizing inputs with the same logic as _location_component (treat case-insensitive "none"/"null" as empty) before calling normalize_city/normalize_state, or make the normalization helpers reject these sentinel values.
| raw_city = (self.city_input.value or "").strip() | |
| raw_state = (self.state_input.value or "").strip() | |
| raw_country = (self.country_input.value or "").strip() | |
| raw_timezone = (self.timezone_input.value or "").strip() | |
| def _clean_location_input(value: str | None) -> str: | |
| """Normalize raw user location input before CRM normalization. | |
| Treats case-insensitive 'none'/'null' as empty so those values clear | |
| existing fields instead of being stored literally. | |
| """ | |
| cleaned = (value or "").strip() | |
| if cleaned.lower() in {"none", "null"}: | |
| return "" | |
| return cleaned | |
| raw_city = _clean_location_input(self.city_input.value) | |
| raw_state = _clean_location_input(self.state_input.value) | |
| raw_country = _clean_location_input(self.country_input.value) | |
| raw_timezone = _clean_location_input(self.timezone_input.value) |
| city_input: discord.ui.TextInput = discord.ui.TextInput( | ||
| label="City", | ||
| required=False, | ||
| max_length=100, | ||
| ) | ||
| state_input: discord.ui.TextInput = discord.ui.TextInput( | ||
| label="State / Region", | ||
| required=False, | ||
| max_length=100, | ||
| ) | ||
| country_input: discord.ui.TextInput = discord.ui.TextInput( | ||
| label="Country", | ||
| required=False, | ||
| max_length=100, | ||
| ) | ||
| timezone_input: discord.ui.TextInput = discord.ui.TextInput( | ||
| label="Timezone", | ||
| required=False, | ||
| max_length=100, | ||
| ) |
There was a problem hiding this comment.
The location modal allows up to 100 characters per field, but normalize_city/normalize_state currently reject values over 40 chars (via _is_plausible_location_phrase(..., max_length=40)). This mismatch will cause the UI to accept input that the backend will always mark invalid. Consider aligning max_length on the TextInputs with the normalization constraints (or loosening the normalization max_length if 100 is intentional).
| @classmethod | ||
| def _has_location_updates(cls, values: dict[str, Any]) -> bool: | ||
| return any(values.get(field) for field in cls._LOCATION_FIELDS) | ||
|
|
There was a problem hiding this comment.
ResumeUpdateConfirmationView._has_location_updates is introduced but appears unused in the module. If it’s not needed, consider removing it to keep the view API minimal; otherwise, wire it into the call sites that need it (e.g., deciding whether to render grouped location output).
| @classmethod | |
| def _has_location_updates(cls, values: dict[str, Any]) -> bool: | |
| return any(values.get(field) for field in cls._LOCATION_FIELDS) |
| def _normalize_location_text(value: str) -> str: | ||
| return re.sub(r"\s+", " ", value).strip(" ,.") | ||
|
|
||
|
|
||
| def _location_lookup_key(value: str) -> str: | ||
| normalized = _normalize_location_text(value) | ||
| normalized = normalized.replace("’", "'").replace("‘", "'") | ||
| normalized = unicodedata.normalize("NFKD", normalized) | ||
| normalized = "".join( | ||
| ch for ch in normalized if not unicodedata.combining(ch) | ||
| ).casefold() | ||
| collapsed = "".join(ch if ch.isalnum() else " " for ch in normalized) | ||
| return re.sub(r"\s+", " ", collapsed).strip() | ||
|
|
||
|
|
||
| _COUNTRY_CANONICAL_MAP: dict[str, str] = { | ||
| _location_lookup_key(value): value for value in _CANONICAL_COUNTRY_NAMES | ||
| } | ||
| _COUNTRY_ALIASES: dict[str, str] = { | ||
| _location_lookup_key(key): value for key, value in _RAW_COUNTRY_ALIASES.items() | ||
| } | ||
|
|
||
|
|
||
| def _is_plausible_location_phrase( | ||
| value: str, | ||
| *, | ||
| max_words: int, | ||
| max_length: int, | ||
| ) -> bool: | ||
| if not value or len(value) > max_length: | ||
| return False | ||
| tokens = [ | ||
| token.casefold() | ||
| for token in re.findall(r"[^\W\d_][^\W\d_'-]*", value, flags=re.UNICODE) | ||
| ] | ||
| if not tokens or len(tokens) > max_words: | ||
| return False | ||
| if any(token in _LOCATION_STOPWORDS for token in tokens): | ||
| return False | ||
| for ch in value: | ||
| if ch.isalpha() or ch in {" ", "-", "'", ".", "(", ")"}: | ||
| continue | ||
| return False |
There was a problem hiding this comment.
normalize_city/normalize_state currently reject strings containing typographic apostrophes (e.g., “’” / “‘”) because _is_plausible_location_phrase only allows ASCII ', while _location_lookup_key already normalizes these for countries. This can cause valid city/region names with curly quotes to be treated as invalid. Consider normalizing curly quotes in _normalize_location_text (or allowing them in _is_plausible_location_phrase) so city/state validation matches the country normalization behavior.
Description
Tighten CRM/resume location parsing so country values must be valid countries, city/state fields reject obvious garbage, and two-part locations like
Nanzih, Kaohsiung Cityare repaired as city plus region instead of a bad country.Add a resume-review
Edit Locationmodal for City, State / Region, Country, and Timezone, and render grouped human-readable location summaries in both the preview and applied-updates embeds.Expand unit coverage for normalization, resume extraction, resume confirmation UI, and non-US location parsing.
Related Issue
N/A
How Has This Been Tested?
uv run pytest tests/unit/test_crm_normalization.py tests/unit/test_resume_extractor.py tests/unit/test_crm.py -qSummary by CodeRabbit
New Features
Bug Fixes
Tests