feat: improve name detection with intelligent number stripping #520

meabed · 2025-08-26T21:48:38Z

Summary

- Enhanced the name detection algorithm to intelligently strip numbers from email addresses
- Improved parsing for patterns like `john1.due2@example.com` → "John Due" instead of "John1 Due2"
- Added smart fallback logic to preserve the original when cleaning would result in invalid names

Changes

Core Algorithm Improvements

- Intelligent number stripping: multiple strategies for cleaning names
  - Removes all numbers to extract pure alphabetic names
  - Falls back to removing only trailing numbers if needed
  - Preserves the original if cleaning produces invalid results (e.g., `a1.b2` stays as is, since `a` and `b` are too short)
- Smart validation: correctly handles reserved words like "admin" and "support" even after number stripping
- Enhanced confidence scoring: confidence levels adjusted based on cleaning success
  - 85% confidence for successfully cleaned names with a dot separator
  - 75% confidence for cleaned names with other separators
  - 60% confidence as a fallback when the original alphanumeric pattern is kept
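
A minimal Python sketch of the cleaning cascade described above, for illustration only. The PR does not include its source, so the names `clean_parts`, `RESERVED_WORDS`, `MIN_PART_LEN`, the strategy order, and the validity rule (at least two characters, starting with a letter) are all hypothetical. The leading-digit strategy is taken from the commit message below.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical values: the PR names "admin" and "support" as reserved words
# but gives neither the full list nor the minimum part length.
RESERVED_WORDS = {"admin", "support"}
MIN_PART_LEN = 2

# Strategies tried in order; the order itself is an assumption.
STRATEGIES: List[Callable[[str], str]] = [
    lambda part: re.sub(r"\d+", "", part),   # remove all digits
    lambda part: re.sub(r"\d+$", "", part),  # remove trailing digits only
    lambda part: re.sub(r"^\d+", "", part),  # remove leading digits only
]


def _is_valid(part: str) -> bool:
    """Hypothetical validity rule: long enough and starts with a letter."""
    return len(part) >= MIN_PART_LEN and part[0].isalpha()


def clean_parts(parts: List[str]) -> Optional[Tuple[List[str], bool]]:
    """Return (title-cased name parts, whether cleaning was applied),
    or None when a reserved word is detected."""
    # Check reserved words on the digit-free form so "admin1" is filtered like "admin".
    if any(re.sub(r"\d+", "", p).lower() in RESERVED_WORDS for p in parts):
        return None
    for strategy in STRATEGIES:
        candidate = [strategy(p) for p in parts]
        # Accept the first strategy that changes something and leaves every part valid.
        if candidate != parts and all(_is_valid(c) for c in candidate):
            return [c.capitalize() for c in candidate], True
    # Fallback: keep the original parts (also the path for digit-free names).
    return [p.capitalize() for p in parts], False
```

Under these assumptions, `clean_parts(["john1", "due2"])` returns `(["John", "Due"], True)`. `["a1", "b2"]` fails every strategy and comes back unchanged as `(["A1", "B2"], False)`, and `["admin1"]` returns `None`.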

Examples of Improved Parsing

| Email | Previous Result | New Result | Confidence |
|---|---|---|---|
| `john1.due2@example.com` | John1 Due2 | John Due | 85% |
| `mary123.smith456@example.com` | Mary123 Smith456 | Mary Smith | 85% |
| `2john.smith@example.com` | 2john Smith | John Smith | 85% |
| `dev3.ops4@example.com` | Dev3 Ops4 | Dev Ops | 85% |
| `test1.user2@example.com` | Test1 User2 | Test User | 85% |
| `a1.b2@example.com` | A1 B2 | A1 B2 (unchanged, too short) | 60% |
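
The Confidence column follows the scoring rules listed under Core Algorithm Improvements. Here is a hedged sketch of those rules as a pure function. The function and parameter names are hypothetical, and treating a single-token name (no separator) like "other separators" is an assumption the PR does not address.

```python
from typing import Optional


def confidence(separator: Optional[str], had_digits: bool, cleaned: bool) -> Optional[float]:
    """Map a parsing outcome to the confidence levels listed in this PR.

    separator:  character between name parts, or None for a single token
    had_digits: whether the original local part contained digits
    cleaned:    whether number stripping produced a valid, changed name
    """
    if cleaned:
        # Single-token names (separator None) land in the 75% branch: an assumption.
        return 0.85 if separator == "." else 0.75
    if had_digits:
        # Original alphanumeric parts kept, e.g. "a1.b2".
        return 0.60
    # Digit-free names are not covered by this change; their scoring is not described here.
    return None
```

For example, the `john1.due2` row corresponds to `confidence(".", True, True)`, which gives 0.85. The `a1.b2` row corresponds to `confidence(".", True, False)`, which gives 0.60.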

Test Plan

- All 164 existing tests pass with no regressions
- Added new test cases for intelligent number stripping
- Verified backward compatibility for edge cases
- Tested with various alphanumeric patterns
- Confirmed reserved words (admin, support, etc.) are still filtered correctly
- Validated confidence scoring adjustments

Breaking Changes

None - this is a backward-compatible enhancement that improves name detection accuracy while maintaining the same API.

Enhanced the name detection algorithm to be more robust and smarter when handling numbers and special characters in email addresses. The algorithm now intelligently strips numbers from names while preserving valid name parts.

Key improvements:
- Intelligently strips numbers from patterns like "john1.due2" → "John Due"
- Multiple cleaning strategies: removes all numbers, trailing numbers, or leading numbers based on context
- Smart fallback logic: keeps the original when stripping would result in too-short names
- Correctly handles reserved words (admin, support, etc.) even after number stripping
- Maintains backward compatibility with confidence scoring

Examples:
- john1.due2@example.com → John Due (85% confidence)
- mary123.smith456@example.com → Mary Smith (85% confidence)
- 2john.smith@example.com → John Smith (85% confidence)
- a1.b2@example.com → A1 B2 (keeps the original, as the cleaned name would be too short)

All 164 tests pass with no regressions.

meabed and others added 2 commits August 26, 2025 17:42

Release 2.7.0-develop.0 (925c614)

meabed merged commit 68a3218 into master Aug 26, 2025

meabed deleted the develop branch August 26, 2025 21:49
