Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 98 additions & 0 deletions COMPLEXITY_IMPROVEMENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
# Code Complexity Improvements Summary

## Overview
This PR reduces code complexity by extracting helper functions and removing code duplication between the synchronous and asynchronous API clients.

## Changes Made

### 1. Created Shared Error Handler Module (`src/dex_python/error_handler.py`)
**Problem:** The `_handle_error` method was duplicated identically in both `client.py` and `async_client.py` (51 lines each, 102 lines total).

**Solution:**
- Created a new `error_handler.py` module with `handle_error_response()` function
- Both clients now delegate to this shared function
- Reduced duplication by ~100 lines
- Improved maintainability - error handling logic is now in one place

### 2. Simplified `merge_cluster` Function (deduplication.py)
**Problem:** Cyclomatic complexity of 16 (Grade C), making it hard to understand and test.

**Solution:** Extracted three helper functions:
- `_select_primary_row()`: Selects which contact should be primary (CC: 7)
- `_merge_contact_fields()`: Merges data from multiple contacts (CC: 6)
- `_consolidate_related_records()`: Consolidates emails and phones (CC: 3)

**Result:** `merge_cluster` reduced from CC 16 → 3 (Grade A)

### 3. Simplified `find_fuzzy_name_duplicates` Function (deduplication.py)
**Problem:** Cyclomatic complexity of 12 (Grade C), complex nested loops.

**Solution:** Extracted two helper functions:
- `_create_soundex_blocks()`: Creates soundex-based blocks for efficient matching (CC: 7)
- `_find_matches_in_block()`: Finds fuzzy matches within a single block (CC: 4)

**Result:** `find_fuzzy_name_duplicates` reduced from CC 12 → 3 (Grade A)

### 4. Simplified `main` Function (scripts/review_duplicates.py)
**Problem:** Cyclomatic complexity of 20 (Grade C), handling too many concerns.

**Solution:** Extracted three helper functions:
- `_fetch_unresolved_groups()`: Fetches duplicate groups from database (CC: 2)
- `_display_contact_group()`: Displays contacts as a table (CC: 10)
- `_handle_user_choice()`: Processes user's selection (CC: 2)

**Result:** `main` reduced from CC 20 → 10 (Grade B, 50% improvement)

### 5. Simplified `save_contacts_batch` Function (scripts/sync_with_integrity.py)
**Problem:** Cyclomatic complexity of 12 (Grade C), doing too much in one function.

**Solution:** Extracted three helper functions:
- `_check_contact_changed()`: Checks if contact needs updating (CC: 3)
- `_enrich_contact_data()`: Parses names and extracts job data (CC: 1)
- `_save_contact_related_data()`: Saves emails and phones (CC: 5)

**Result:** `save_contacts_batch` reduced from CC 12 → 8 (Grade B, 33% improvement)

## Test Coverage

Added comprehensive tests for all new helper functions:

### Error Handler Tests (`tests/unit/test_error_handler.py`)
- 13 new tests covering all error scenarios
- Tests for 401, 429, 400, 404, 500 status codes
- Tests for entity-specific 404 errors (contacts, reminders, notes)
- Tests for edge cases (invalid JSON, missing error messages)

### Deduplication Helper Tests (`tests/unit/deduplication/test_helpers.py`)
- 13 new tests for helper functions
- Tests for `_select_primary_row()` with explicit and auto-selection
- Tests for `_merge_contact_fields()` field merging logic
- Tests for `_create_soundex_blocks()` blocking algorithm
- Tests for `_find_matches_in_block()` fuzzy matching
- Tests for `_consolidate_related_records()` data consolidation

## Testing Results

- All 186 tests pass (160 original + 26 new)
- No regressions in existing functionality
- Type checking passes with `mypy --strict`
- Linting passes with `ruff`

## Benefits

1. **Improved Maintainability**: Functions are smaller and focused on single responsibilities
2. **Better Testability**: Helper functions can be tested independently
3. **Reduced Duplication**: Shared error handler eliminates ~100 lines of duplicate code
4. **Enhanced Readability**: Lower complexity makes code easier to understand
5. **Easier Debugging**: Smaller functions are easier to reason about and debug

## Cyclomatic Complexity Summary

| Function | Before | After | Improvement |
|----------|--------|-------|-------------|
| `merge_cluster` | 16 (C) | 3 (A) | 81% ↓ |
| `find_fuzzy_name_duplicates` | 12 (C) | 3 (A) | 75% ↓ |
| `review_duplicates.main` | 20 (C) | 10 (B) | 50% ↓ |
| `save_contacts_batch` | 12 (C) | 8 (B) | 33% ↓ |

All functions that were Grade C (high complexity) are now Grade A or B.
235 changes: 148 additions & 87 deletions scripts/review_duplicates.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,142 @@ def setup_db(cursor: sqlite3.Cursor) -> None:
pass


def _fetch_unresolved_groups(cursor: sqlite3.Cursor) -> list[str]:
"""Fetch all duplicate groups that haven't been resolved yet.

Args:
cursor: Database cursor.

Returns:
List of group IDs.
"""
cursor.execute("""
SELECT DISTINCT duplicate_group_id
FROM contacts
WHERE duplicate_group_id IS NOT NULL
AND (duplicate_resolution IS NULL OR duplicate_resolution = '')
""")
return [row[0] for row in cursor.fetchall()]


def _display_contact_group(
console: Console, cursor: sqlite3.Cursor, group_id: str, group_num: int, total: int
) -> list[str]:
"""Display contacts in a duplicate group as a table.

Args:
console: Rich console for output.
cursor: Database cursor.
group_id: Group ID to display.
group_num: Current group number.
total: Total number of groups.

Returns:
List of contact IDs in this group.
"""
console.rule(f"Group {group_num}/{total} (ID: {group_id})")

# Fetch contacts in this group
cursor.execute(
"""
SELECT id, first_name, last_name, job_title
FROM contacts
WHERE duplicate_group_id = ?
""",
(group_id,),
)
contacts = cursor.fetchall()

if len(contacts) < 2:
return []

table = Table(show_header=True, header_style="bold magenta")
table.add_column("#", style="dim", width=4)
table.add_column("ID", style="cyan", width=12, overflow="ellipsis")
table.add_column("Name", style="bold")
table.add_column("Job Title")
table.add_column("Emails")
table.add_column("Phones")

contact_ids = []

for idx, row in enumerate(contacts):
c_id, first, last, job = row
contact_ids.append(c_id)

cursor.execute("SELECT email FROM emails WHERE contact_id = ?", (c_id,))
emails = "\n".join([r[0] for r in cursor.fetchall() if r[0]])

cursor.execute("SELECT phone_number FROM phones WHERE contact_id = ?", (c_id,))
phones = "\n".join([r[0] for r in cursor.fetchall() if r[0]])

table.add_row(
str(idx + 1),
c_id,
f"{first or ''} {last or ''}".strip(),
job or "",
emails,
phones,
)

console.print(table)
return contact_ids


def _handle_user_choice(
cursor: sqlite3.Cursor,
conn: sqlite3.Connection,
console: Console,
choice: str,
contact_ids: list[str],
group_id: str,
) -> tuple[int, int]:
"""Handle user's choice for a duplicate group.

Args:
cursor: Database cursor.
conn: Database connection.
console: Rich console for output.
choice: User's choice ('s', 'q', or a number).
contact_ids: List of contact IDs in the group.
group_id: Group ID.

Returns:
Tuple of (labeled_count, rejected_count) increments.
"""
if choice == "s":
# Mark as false positive
cursor.execute(
"""
UPDATE contacts
SET duplicate_resolution = 'false_positive'
WHERE duplicate_group_id = ?
""",
(group_id,),
)
conn.commit()
console.print("[yellow]✔ Marked as false positive.[/yellow]\n")
return 0, 1

# Label as confirmed with selected primary
selected_idx = int(choice) - 1
primary_id = contact_ids[selected_idx]

cursor.execute(
"""
UPDATE contacts
SET duplicate_resolution = 'confirmed',
primary_contact_id = ?
WHERE duplicate_group_id = ?
""",
(primary_id, group_id),
)
conn.commit()
res_msg = f"[green]✔ Confirmed. Primary: ...{primary_id[-8:]}[/green]\n"
console.print(res_msg)
return 1, 0


def main() -> None:
data_dir = Path(os.getenv("DEX_DATA_DIR", "output"))
db_path = data_dir / "dex_contacts.db"
Expand All @@ -42,14 +178,7 @@ def main() -> None:

setup_db(cursor)

# Get all distinct group IDs that haven't been resolved yet
cursor.execute("""
SELECT DISTINCT duplicate_group_id
FROM contacts
WHERE duplicate_group_id IS NOT NULL
AND (duplicate_resolution IS NULL OR duplicate_resolution = '')
""")
groups = [row[0] for row in cursor.fetchall()]
groups = _fetch_unresolved_groups(cursor)

console = Console()
console.clear()
Expand All @@ -68,57 +197,15 @@ def main() -> None:

try:
for i, group_id in enumerate(groups):
console.rule(f"Group {i + 1}/{len(groups)} (ID: {group_id})")

# Fetch contacts in this group
cursor.execute(
"""
SELECT id, first_name, last_name, job_title
FROM contacts
WHERE duplicate_group_id = ?
""",
(group_id,),
contact_ids = _display_contact_group(
console, cursor, group_id, i + 1, len(groups)
)
contacts = cursor.fetchall()

if len(contacts) < 2:
if len(contact_ids) < 2:
continue

table = Table(show_header=True, header_style="bold magenta")
table.add_column("#", style="dim", width=4)
table.add_column("ID", style="cyan", width=12, overflow="ellipsis")
table.add_column("Name", style="bold")
table.add_column("Job Title")
table.add_column("Emails")
table.add_column("Phones")

contact_ids = []

for idx, row in enumerate(contacts):
c_id, first, last, job = row
contact_ids.append(c_id)

cursor.execute("SELECT email FROM emails WHERE contact_id = ?", (c_id,))
emails = "\n".join([r[0] for r in cursor.fetchall() if r[0]])

cursor.execute(
"SELECT phone_number FROM phones WHERE contact_id = ?", (c_id,)
)
phones = "\n".join([r[0] for r in cursor.fetchall() if r[0]])

table.add_row(
str(idx + 1),
c_id,
f"{first or ''} {last or ''}".strip(),
job or "",
emails,
phones,
)

console.print(table)

# Options
choices = [str(x + 1) for x in range(len(contacts))] + ["s", "q"]
# Get user choice
choices = [str(x + 1) for x in range(len(contact_ids))] + ["s", "q"]
choice = Prompt.ask(
"\n[bold]Actions:[/bold]\n"
" [cyan][1-N][/cyan] Label this ID as PRIMARY (Confirm Duplicates)\n"
Expand All @@ -133,38 +220,12 @@ def main() -> None:
console.print("\n[bold red]Exiting...[/bold red]")
break

elif choice == "s":
# Mark as false positive
cursor.execute(
"""
UPDATE contacts
SET duplicate_resolution = 'false_positive'
WHERE duplicate_group_id = ?
""",
(group_id,),
)
conn.commit()
console.print("[yellow]✔ Marked as false positive.[/yellow]\n")
rejected_count += 1

else:
# Label as confirmed with selected primary
selected_idx = int(choice) - 1
primary_id = contact_ids[selected_idx]

cursor.execute(
"""
UPDATE contacts
SET duplicate_resolution = 'confirmed',
primary_contact_id = ?
WHERE duplicate_group_id = ?
""",
(primary_id, group_id),
)
conn.commit()
res_msg = f"[green]✔ Confirmed. Primary: ...{primary_id[-8:]}[/green]\n"
console.print(res_msg)
labeled_count += 1
# Handle the choice
labeled_inc, rejected_inc = _handle_user_choice(
cursor, conn, console, choice, contact_ids, group_id
)
labeled_count += labeled_inc
rejected_count += rejected_inc

except KeyboardInterrupt:
console.print("\n[bold red]Interrupted![/bold red]")
Expand Down
Loading