Codec 317 | Fix. ColognePhonetic: Duplicate code in some cases #424
Shalujha0907 wants to merge 5 commits into apache:master
Hello @Shalujha0907
-1: This PR doesn't fix the bug and adds more broken test cases.
Note the rule "Collapse of all multiple consecutive code digits".
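As a rough illustration of that rule, here is a minimal sketch of the post-processing that the published Kölner Phonetik description applies to an already-encoded digit string: consecutive equal digits are collapsed first, then every `0` except a leading one is dropped. The class and method names are hypothetical and this is not the Commons Codec implementation, which works on the input in a single pass.

```java
// Hypothetical helper, for illustration only; not part of Commons Codec.
final class CollapseSketch {

    // Step 2 of the published algorithm: collapse runs of the same digit.
    static String collapse(String digits) {
        StringBuilder out = new StringBuilder();
        char last = '\0';
        for (char c : digits.toCharArray()) {
            if (c != last) {
                out.append(c);
            }
            last = c;
        }
        return out.toString();
    }

    // Step 3: remove every '0' except one in the first position.
    static String dropInnerZeros(String digits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c != '0' || i == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For the digit string `60550750206880022` (the per-letter codes of "Müller-Lüdenscheidt" in the standard worked example), `collapse` yields `6050750206802` and `dropInnerZeros` then yields `65752682`.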
src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
Hello @Shalujha0907 This is now fixed in git master. Please verify your use case against git master or a 1.22.0-SNAPSHOT build from https://repository.apache.org/content/repositories/snapshots/commons-codec/commons-codec/1.22.0-SNAPSHOT/ If appropriate, then close this PR or update it with additional tests or fixes for this duplicate issue. Thank you!
The Jira ticket is https://issues.apache.org/jira/browse/CODEC-317
Hello @garydgregory Understood! Maybe we don't need a fix. Thank you!
Hello @Shalujha0907 Please check git master.
Hello! Sure! I have seen it!
Closing it.
ColognePhonetic: Duplicate code in some cases
This PR fixes the above issue.
Summary
This PR fixes duplicate-code handling in ColognePhonetic when processing characters that do not directly produce output (especially H) and adds regression tests for the affected scenarios.
Root Cause
The duplicate filter depends on the previous effective phonetic code, but skipped/intermediate characters were still influencing lastCode, so adjacent-equivalence checks were performed against the wrong value.
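The mechanism described here can be sketched with a toy encoder. Everything below is an assumption made for illustration: the class name, the `skippedResetsLastCode` flag, and the reduced code table are hypothetical, and the loop is not the Commons Codec implementation. It only shows how letting a non-coding character such as H overwrite `lastCode` makes the duplicate check compare against the wrong value.

```java
import java.util.Locale;
import java.util.Map;

// Toy model of a duplicate filter; names and structure are hypothetical.
final class LastCodeSketch {

    // Reduced table with a few consonant codes; not the full mapping.
    private static final Map<Character, Character> CODES = Map.of(
            'S', '8', 'Z', '8', 'L', '5', 'M', '6', 'N', '6', 'R', '7');

    static String encode(String text, boolean skippedResetsLastCode) {
        StringBuilder out = new StringBuilder();
        char lastCode = '\0';
        for (char ch : text.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch == 'H') {
                // H produces no output. In the faulty variant it still
                // overwrites lastCode, so the next comparison is wrong.
                if (skippedResetsLastCode) {
                    lastCode = '\0';
                }
                continue;
            }
            Character mapped = CODES.get(ch);
            if (mapped == null) {
                throw new IllegalArgumentException("Letter not in toy table: " + ch);
            }
            char code = mapped;
            // Emit only when the code differs from the previous effective code.
            if (code != lastCode) {
                out.append(code);
            }
            lastCode = code;
        }
        return out.toString();
    }
}
```

With this toy table, `encode("SHS", true)` returns `88`, because the H hides the earlier 8 from the filter, while `encode("SHS", false)` returns `8`, which is what collapsing consecutive code digits would give.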
Impact
Fixes incorrect duplicate handling around skipped/intermediate characters.
Preserves expected Cologne Phonetic output rules.
Improves confidence via targeted regression tests.