Fix POSIX character classes for non-ASCII letters in RE (#4841) by liannacasper · Pull Request #4843 · codenameone/CodenameOne

liannacasper · 2026-05-01T03:56:42Z

Summary

RECharacter.getType() returned UNASSIGNED for every char >= 128, so [[:alpha:]], [[:alnum:]], [[:lower:]], and [[:upper:]] silently failed on non-ASCII letters. The reported regex "test:\s*([[:alpha:]][[:alnum:]]*)" did not match "test: ç123" because the leading ç (c-cedilla) was treated as unassigned.
Delegate to java.lang.Character.getType(c) for c >= 128 in the non-RE_UNICODE branch. The RECharacter constants (UPPERCASE_LETTER=1, LOWERCASE_LETTER=2, ...) are the Unicode general-category numeric codes and match Character.getType()'s return values exactly, so a byte cast is safe. The RE_UNICODE preprocessor branch keeps its existing table-based lookup with an UNASSIGNED fallthrough.
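A minimal sketch of this first-commit approach follows. It assumes a simplified stand-in for RECharacter: the real class and its ASCII table are not shown on this page, so `RECharacterSketch`, `asciiType`, and the constants below are hypothetical. It illustrates why the byte cast is lossless, since the JDK's general-category constants use the same numeric codes and `Character.getType(char)` only returns values in 0..30.

```java
public class RECharacterSketch {
    // Hypothetical mirrors of the RECharacter constants named in the PR.
    static final byte UNASSIGNED = 0;
    static final byte UPPERCASE_LETTER = 1;
    static final byte LOWERCASE_LETTER = 2;
    static final byte DECIMAL_DIGIT_NUMBER = 9;

    // Hypothetical stand-in for the existing ASCII lookup (c < 128).
    // Punctuation and other ASCII categories are simplified away here.
    static byte asciiType(char c) {
        if (c >= 'A' && c <= 'Z') return UPPERCASE_LETTER;
        if (c >= 'a' && c <= 'z') return LOWERCASE_LETTER;
        if (c >= '0' && c <= '9') return DECIMAL_DIGIT_NUMBER;
        return UNASSIGNED;
    }

    // First-commit idea: above ASCII, delegate to the JDK instead of
    // returning UNASSIGNED. The int result fits in a byte (range 0..30).
    static byte getType(char c) {
        if (c < 128) return asciiType(c);
        return (byte) Character.getType(c);
    }

    public static void main(String[] args) {
        // The constants line up with java.lang.Character's, so the cast preserves meaning.
        System.out.println(UPPERCASE_LETTER == Character.UPPERCASE_LETTER);         // true
        System.out.println(LOWERCASE_LETTER == Character.LOWERCASE_LETTER);         // true
        System.out.println(DECIMAL_DIGIT_NUMBER == Character.DECIMAL_DIGIT_NUMBER); // true
        System.out.println(getType('\u00E7') == LOWERCASE_LETTER);                  // c-cedilla: true
    }
}
```

As the later commit on this PR explains, `Character.getType` is not available under the CLDC11 bootclasspath, so this version only compiles against a full JDK.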
Add five tests covering Latin with cedilla, Greek, Cyrillic, CJK ideographs, vulgar fractions, and currency symbols. One of them is a regression test for the exact failing case from the issue and asserts both the match and the captured group. The test sources use \uXXXX escapes so they stay ASCII-only, because CI javac uses the platform default encoding.
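Codename One's RE engine cannot run outside the framework, so as a rough analogue only, here is the same regression case expressed with the JDK's java.util.regex. That engine has a comparable pitfall: its POSIX-style classes `\p{Alpha}` and `\p{Alnum}` are US-ASCII only unless `Pattern.UNICODE_CHARACTER_CLASS` is set. The input uses a `\u00E7` escape so the source stays ASCII, as the PR's tests do. The class and method names are illustrative, not the PR's test code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PosixAlphaAnalogue {
    // JDK analogue of the issue's [[:alpha:]][[:alnum:]]* pattern;
    // java.util.regex spells the POSIX classes as \p{Alpha} and \p{Alnum}.
    static final String REGEX = "test:\\s*(\\p{Alpha}\\p{Alnum}*)";

    // c-cedilla written as a Unicode escape keeps this source file ASCII-only.
    static final String INPUT = "test: \u00E7123";

    // Returns the captured identifier, or null if the pattern does not match.
    static String capture(int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(INPUT);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Default flags: \p{Alpha} is ASCII-only, so the leading c-cedilla fails.
        System.out.println(capture(0)); // null
        // Unicode classes: c-cedilla is alphabetic and the digits are alphanumeric.
        System.out.println(capture(Pattern.UNICODE_CHARACTER_CLASS)); // c-cedilla followed by 123
    }
}
```

The ASCII-only default reproduces the failure mode from the issue, and the Unicode-aware run shows the expected match and captured group.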

Test plan

mvn -Dtest=RETest test from maven/core-unittests — 10/10 pass (5 pre-existing + 5 new).
Confirmed regression: temporarily reverting only the RECharacter change makes 4 of the 5 new tests fail with the expected non-ASCII matching errors; the 5th (testPosixDigitIsAsciiOnlyForOtherNumbers) documents that [[:digit:]] remains decimal-digit-only, which holds in both states.
Pre-existing tests (testNestedPosixAlphaCharacterClassSupport, testLegacyPosixAlphaCharacterClassSupport, testPosixClassesAndEscapes, etc.) continue to pass — no behavioral change for ASCII input.
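The distinction behind testPosixDigitIsAsciiOnlyForOtherNumbers can be checked directly against the JDK: vulgar fractions and superscripts belong to Unicode OTHER_NUMBER, not DECIMAL_DIGIT_NUMBER, so a [[:digit:]] class keyed to decimal digits should reject them before and after the fix. This check uses only java.lang.Character; the sample characters are illustrative and not necessarily the ones the PR's test uses.

```java
public class DigitCategoryCheck {
    // True if c is in Unicode category Nd (decimal digit), the only
    // category a decimal-only [[:digit:]] class should accept.
    static boolean isDecimalDigit(char c) {
        return Character.getType(c) == Character.DECIMAL_DIGIT_NUMBER;
    }

    public static void main(String[] args) {
        // '7', VULGAR FRACTION ONE HALF, SUPERSCRIPT TWO
        char[] samples = {'7', '\u00BD', '\u00B2'};
        for (char c : samples) {
            // DECIMAL_DIGIT_NUMBER is 9; OTHER_NUMBER is 11.
            System.out.println(String.format("U+%04X type=%d decimal=%b",
                    (int) c, Character.getType(c), isDecimalDigit(c)));
        }
    }
}
```

Run on a desktop JDK, this reports type 9 for '7' and type 11 (OTHER_NUMBER) for the fraction and the superscript.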

🤖 Generated with Claude Code

RECharacter.getType() returned UNASSIGNED for any char >= 128, so [[:alpha:]], [[:alnum:]], [[:lower:]], and [[:upper:]] silently failed to match non-ASCII letters. As reported in #4841, the regex "test:\s*([[:alpha:]][[:alnum:]]*)" did not match "test: ç123" when the identifier began with a non-ASCII letter.

Delegate to java.lang.Character.getType(c) for c >= 128 in the non-RE_UNICODE branch. The RECharacter constants (UPPERCASE_LETTER=1, LOWERCASE_LETTER=2, ...) are the Unicode general-category numeric codes and match Character.getType()'s return values exactly, so a byte cast is safe. The RE_UNICODE preprocessor branch keeps its existing table-based lookup with an UNASSIGNED fallthrough.

Add five tests covering Latin with cedilla, Greek, Cyrillic, CJK ideographs, vulgar fractions, and currency symbols, including a regression test for the exact failing case from the issue. Tests use \uXXXX escapes to keep sources ASCII-only (CI javac uses the platform default encoding).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-01T04:07:22Z

✅ Continuous Quality Report

Test & Coverage

✅ Tests: 2445 total, 0 failed, 0 skipped
📊 Line coverage: 53.41% [HTML preview] [Download]
- Lowest covered classes
  - com.codename1.ui.Display$EdtException – 0.00%
  - com.codename1.ui.plaf.CSSBorder$LinearGradient – 0.00%
  - com.codename1.util.EasyThread$InQueueRunnable – 0.00%
  - com.codename1.components.FloatingActionButton$ReleaseActionListener – 0.00%
  - com.codename1.io.Oauth2$ShowAuthenticationActionListener – 0.00%
  - com.codename1.io.Oauth2$RefreshTokenActionListener – 0.00%
  - com.codename1.util.EasyThread$RunAndWaitRunnable – 0.00%
  - com.codename1.components.ToastBar$FlushAnimationCallback – 0.00%
  - com.codename1.ui.plaf.CSSBorder$ColorStop – 0.00%
  - com.codename1.ui.ElevationComparator – 0.00%

Static Analysis

SpotBugs [Report archive]
- ✅ ByteCodeTranslator: 0 findings (no issues)
- ✅ android: 0 findings (no issues)
- ✅ codenameone-maven-plugin: 0 findings (no issues)
- ✅ core-unittests: 0 findings (no issues)
- ✅ ios: 0 findings (no issues)
✅ PMD: 0 findings (no issues) [Report archive]
✅ Checkstyle: 0 findings (no issues) [Report archive]

Generated automatically by the PR CI workflow.

shai-almog · 2026-05-01T04:20:50Z

Compared 86 screenshots: 86 matched.

Native Android coverage

📊 Line coverage: 9.75% (5291/54243 lines covered) [HTML preview] (artifact android-coverage-report, jacocoAndroidReport/html/index.html)
- Other counters: instruction 7.67% (26003/339142), branch 3.48% (1132/32522), complexity 4.52% (1410/31163), method 7.92% (1153/14567), class 12.97% (253/1950)
- Lowest covered classes
  - kotlin.collections.kotlin.collections.ArraysKt___ArraysKt – 0.00% (0/6327 lines covered)
  - kotlin.collections.unsigned.kotlin.collections.unsigned.UArraysKt___UArraysKt – 0.00% (0/2384 lines covered)
  - org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.ClassReader – 0.00% (0/1519 lines covered)
  - kotlin.collections.kotlin.collections.CollectionsKt___CollectionsKt – 0.00% (0/1148 lines covered)
  - org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.MethodWriter – 0.00% (0/923 lines covered)
  - kotlin.sequences.kotlin.sequences.SequencesKt___SequencesKt – 0.00% (0/730 lines covered)
  - kotlin.text.kotlin.text.StringsKt___StringsKt – 0.00% (0/623 lines covered)
  - org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.Frame – 0.00% (0/564 lines covered)
  - kotlin.collections.kotlin.collections.ArraysKt___ArraysJvmKt – 0.00% (0/495 lines covered)
  - kotlinx.coroutines.kotlinx.coroutines.JobSupport – 0.00% (0/423 lines covered)

✅ Native Android screenshot tests passed.

Benchmark Results

Detailed Performance Metrics

Metric	Value
Base64 payload size	8192 bytes
Base64 benchmark iterations	6000
Base64 native encode	1115.000 ms
Base64 CN1 encode	165.000 ms
Base64 encode ratio (CN1/native)	0.148x (85.2% faster)
Base64 native decode	734.000 ms
Base64 CN1 decode	243.000 ms
Base64 decode ratio (CN1/native)	0.331x (66.9% faster)
Image encode benchmark status	skipped (SIMD unsupported)
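The ratio and percentage rows appear to be derived from the two durations as ratio = CN1 / native and "faster" = (1 - ratio) * 100, i.e. the share of native time saved. Assuming that is how the CI computes them (the workflow source is not shown here), the reported figures can be reproduced with a small helper; `BenchmarkRatio` and `describe` are hypothetical names.

```java
import java.util.Locale;

public class BenchmarkRatio {
    // Assumed derivation: ratio = cn1 / native, percent faster = (1 - ratio) * 100.
    static String describe(double nativeMs, double cn1Ms) {
        double ratio = cn1Ms / nativeMs;
        return String.format(Locale.ROOT, "%.3fx (%.1f%% faster)", ratio, (1 - ratio) * 100);
    }

    public static void main(String[] args) {
        System.out.println("encode: " + describe(1115, 165)); // encode: 0.148x (85.2% faster)
        System.out.println("decode: " + describe(734, 243));  // decode: 0.331x (66.9% faster)
    }
}
```

Under this reading, "85.2% faster" means the CN1 encode took 85.2% less time than the native one, not that its throughput was 1.852x.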

shai-almog · 2026-05-01T04:30:38Z

Compared 42 screenshots: 42 matched.
✅ Native iOS screenshot tests passed.

Benchmark Results

VM Translation Time: 0 seconds
Compilation Time: 196 seconds

Build and Run Timing

Metric	Duration
Simulator Boot	77000 ms
Simulator Boot (Run)	1000 ms
App Install	15000 ms
App Launch	9000 ms
Test Execution	301000 ms

…stub

The CI Ant build sets -bootclasspath to Ports/CLDC11/dist/CLDC11.jar, whose java.lang.Character stub does not expose getType, isLetter, or isLetterOrDigit. The previous fix used Character.getType(c), which compiled fine under the maven build (full JDK rt.jar) but failed the Ant build with "cannot find symbol: method getType(char)".

Compose the same effect from the methods that the CLDC11 stub does expose: isLowerCase, isUpperCase, isDigit, isSpaceChar. This covers cased letters in Latin (with diacritics), Greek, and Cyrillic, plus decimal digits and space separators -- enough to fix the reported case from #4841 ("test: c-cedilla 123" matching "test:\\s*([[:alpha:]][[:alnum:]]*)").

Limitation: characters whose Unicode general category is OTHER_LETTER (CJK ideographs, Hebrew, Arabic, Devanagari, ...), TITLECASE_LETTER, MODIFIER_LETTER, or LETTER_NUMBER cannot be distinguished from UNASSIGNED with the CLDC11 API surface and remain unmatched by [[:alpha:]] / [[:alnum:]]. Lifting that limitation requires either the RE_UNICODE preprocessor branch or extending the CLDC11 stub -- both out of scope for this fix. Tests document the limitation by asserting only on cased scripts.

Verified: javac -bootclasspath CLDC11.jar -source 1.5 -target 1.5 compiles RECharacter and RE cleanly; mvn test from core-unittests runs all 10 RETest tests, including the regression for the exact failing input from the issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
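A minimal sketch of the composed classification follows. It assumes only the four java.lang.Character methods this commit names; the class name, constants, and branch order are hypothetical and not copied from the merged RECharacter. On a desktop JDK these methods are backed by full Unicode data, so results from the CLDC11 runtime on a device may differ.

```java
public class CldcCharType {
    // Hypothetical mirrors of the Unicode general-category codes used by RECharacter.
    static final byte UNASSIGNED = 0;
    static final byte UPPERCASE_LETTER = 1;
    static final byte LOWERCASE_LETTER = 2;
    static final byte DECIMAL_DIGIT_NUMBER = 9;
    static final byte SPACE_SEPARATOR = 12;

    // Classify a non-ASCII char using only methods the commit says the CLDC11 stub exposes.
    static byte typeAbove127(char c) {
        if (Character.isLowerCase(c)) return LOWERCASE_LETTER;
        if (Character.isUpperCase(c)) return UPPERCASE_LETTER;
        if (Character.isDigit(c)) return DECIMAL_DIGIT_NUMBER;
        // isSpaceChar also accepts line/paragraph separators; collapsed here.
        if (Character.isSpaceChar(c)) return SPACE_SEPARATOR;
        // OTHER_LETTER, TITLECASE_LETTER, etc. cannot be told apart from this point.
        return UNASSIGNED;
    }

    // [[:alpha:]]-style test built on the classification above.
    static boolean isAlpha(char c) {
        byte t = typeAbove127(c);
        return t == LOWERCASE_LETTER || t == UPPERCASE_LETTER;
    }

    public static void main(String[] args) {
        System.out.println(isAlpha('\u00E7')); // c-cedilla (Latin): true
        System.out.println(isAlpha('\u03B1')); // Greek small alpha: true
        System.out.println(isAlpha('\u0416')); // Cyrillic capital ZHE: true
        System.out.println(isAlpha('\u4E2D')); // CJK ideograph: false (known limitation)
        System.out.println(isAlpha('\u01C5')); // titlecase Dz with caron: false (known limitation)
    }
}
```

Checking the lowercase branch before the uppercase one is arbitrary here, since a char cannot be in both categories. Note that on Java 7+ isLowerCase/isUpperCase also accept some characters with the Other_Lowercase/Other_Uppercase property, so a desktop run can be slightly more permissive than the limitation list suggests.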

liannacasper · 2026-05-01T04:54:10Z

Pushed cf8ee6a to address the CI failure: javac complained that Character.getType(char) was missing because the framework is built with -bootclasspath ../Ports/CLDC11/dist/CLDC11.jar, and the CLDC11 stub does not expose getType / isLetter / isLetterOrDigit.

The fix now uses only methods the stub does expose -- isLowerCase, isUpperCase, isDigit, isSpaceChar -- composing the same effect for the cased scripts (Latin with diacritics, Greek, Cyrillic) plus decimal digits and separators. That is enough for the reported failing case "test: c-cedilla 123" against "test:\\s*([[:alpha:]][[:alnum:]]*)".

Known limitation: OTHER_LETTER (CJK, Hebrew, Arabic, ...), TITLECASE_LETTER, MODIFIER_LETTER, and LETTER_NUMBER cannot be distinguished from UNASSIGNED with the CLDC11 API surface, so [[:alpha:]] / [[:alnum:]] still won't match them. Lifting that requires either the existing RE_UNICODE preprocessor branch (full Unicode tables) or extending the CLDC11 stub -- both out of scope for this issue. The tests are scoped accordingly.

Verified locally:

javac -bootclasspath CLDC11.jar -source 1.5 -target 1.5 src/com/codename1/util/regex/*.java -- compiles clean.
mvn -Dtest=RETest test -- 10/10 pass (5 pre-existing + 5 new, including the exact regression from the issue).

shai-almog merged commit 1750d63 into master May 1, 2026
17 checks passed

shai-almog merged 2 commits into master from fix-4841-posix-non-latin-letters

2 participants
