This repository was archived by the owner on Nov 22, 2025. It is now read-only.

@meabed commented Aug 26, 2025

Summary

  • Enhanced name detection algorithm to intelligently strip numbers from email addresses
  • Improved parsing for patterns like john1.due2@example.com → "John Due" instead of "John1 Due2"
  • Added smart fallback logic to preserve original when cleaning would result in invalid names

Changes

Core Algorithm Improvements

  • Intelligent number stripping: Multiple strategies for cleaning names

    • Removes all numbers to extract pure alphabetic names
    • Falls back to removing only trailing numbers if needed
    • Preserves original if cleaning produces invalid results (e.g., a1.b2 stays as is since a and b are too short)
  • Smart validation: Correctly handles reserved words like "admin", "support" even after number stripping

  • Enhanced confidence scoring: Adjusted confidence levels based on cleaning success

    • 85% confidence for successfully cleaned names with dot separator
    • 75% confidence for cleaned names with other separators
    • 60% confidence fallback when keeping original alphanumeric patterns
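The cleaning strategy described above can be sketched roughly as follows. This is not the code from this PR: `strip_digits`, `detect_name`, `MIN_PART_LENGTH`, and the contents of `RESERVED_WORDS` are hypothetical names and values chosen for illustration. It covers only dot-separated local parts, and it leaves out the intermediate "trailing numbers only" fallback because the description does not say when that step applies.

```python
import re
from typing import Optional

# Hypothetical values: the PR states neither the real threshold nor the full word list.
MIN_PART_LENGTH = 2
RESERVED_WORDS = {"admin", "support"}


def strip_digits(part: str) -> Optional[str]:
    """Return the part without digits, or None if too little is left."""
    letters = re.sub(r"\d", "", part)
    return letters if len(letters) >= MIN_PART_LENGTH else None


def detect_name(email: str) -> Optional[str]:
    """Guess a display name from a dot-separated local part."""
    local_part = email.split("@", 1)[0].lower()
    parts = [p for p in local_part.split(".") if p]
    if not parts:
        return None

    stripped = [strip_digits(p) for p in parts]
    if all(s is not None for s in stripped):
        # Reserved words are checked after stripping, so "admin1" is still rejected.
        if any(s in RESERVED_WORDS for s in stripped):
            return None
        chosen = stripped
    else:
        # Assumption: if any part becomes too short, keep every original part.
        chosen = parts
    return " ".join(p.capitalize() for p in chosen)
```

Under these assumptions, `detect_name("john1.due2@example.com")` yields `"John Due"`, while `detect_name("a1.b2@example.com")` keeps `"A1 B2"` because stripping would leave single letters.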

Examples of Improved Parsing

| Email | Previous Result | New Result | Confidence |
| --- | --- | --- | --- |
| john1.due2@example.com | John1 Due2 | John Due | 85% |
| mary123.smith456@example.com | Mary123 Smith456 | Mary Smith | 85% |
| 2john.smith@example.com | 2john Smith | John Smith | 85% |
| dev3.ops4@example.com | Dev3 Ops4 | Dev Ops | 85% |
| test1.user2@example.com | Test1 User2 | Test User | 85% |
| a1.b2@example.com | A1 B2 | A1 B2 (unchanged, too short) | 60% |
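The confidence values in these examples follow the three tiers listed under "Enhanced confidence scoring". A minimal sketch of that mapping is shown below. The function name `confidence_for` and its parameters are hypothetical, and the separator characters other than the dot are assumptions, since the description does not name them.

```python
def confidence_for(separator: str, cleaned_ok: bool) -> int:
    """Return a confidence percentage for a detected name.

    separator:  the character that split the local part (e.g. "." or "_").
    cleaned_ok: True if number stripping produced valid name parts,
                False if the original alphanumeric parts were kept.
    """
    if not cleaned_ok:
        return 60  # fallback: original alphanumeric pattern kept
    if separator == ".":
        return 85  # cleaned name with a dot separator
    return 75      # cleaned name with any other separator


# Every example above uses a dot separator.
print(confidence_for(".", True))   # john1.due2 was cleaned
print(confidence_for(".", False))  # a1.b2 was kept unchanged
```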

Test Plan

  • All 164 existing tests pass with no regressions
  • Added new test cases for intelligent number stripping
  • Verified backward compatibility for edge cases
  • Tested with various alphanumeric patterns
  • Confirmed reserved words (admin, support, etc.) are still filtered correctly
  • Validated confidence scoring adjustments

Breaking Changes

None. The API is unchanged and the enhancement is backward compatible; only the names returned for local parts containing digits differ, as in the examples above.

meabed and others added 2 commits August 26, 2025 17:42
Enhanced the name detection algorithm to be more robust and smarter when handling numbers and special characters in email addresses. The algorithm now intelligently strips numbers from names while preserving valid name parts.

Key improvements:
- Intelligently strips numbers from patterns like "john1.due2" → "John Due"
- Multiple cleaning strategies: removes all numbers, trailing numbers, or leading numbers based on context
- Smart fallback logic: keeps original when stripping would result in too-short names
- Correctly handles reserved words (admin, support, etc.) even after number stripping
- Maintains backward compatibility with confidence scoring

Examples:
- john1.due2@example.com → John Due (85% confidence)
- mary123.smith456@example.com → Mary Smith (85% confidence)
- 2john.smith@example.com → John Smith (85% confidence)
- a1.b2@example.com → A1 B2 (60% confidence; keeps the original because the cleaned parts would be too short)

All 164 tests pass with no regressions.
@meabed merged commit 68a3218 into master Aug 26, 2025
@meabed deleted the develop branch August 26, 2025 21:49