Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CharArrayBuffer.subSequence(beganIndex,endIndex) to return right result. #195

Merged
merged 1 commit into from Apr 18, 2020

Conversation

ufolr
Copy link
Contributor

@ufolr ufolr commented Apr 17, 2020

CharArrayBuffer implemented the CharSequence interface,
and CharSequence define method:
public CharSequence subSequence(final int beginIndex, final int endIndex)

Threre are two problem in original implementation of CharArrayBuffer.subSequence

  1. It call CharBuffer.wrap(buffer.array, beginIndex, endIndex) directly,but the 3rd parameter of the CharBuffer.wrap method is length, it means how many chars need to copy, not endIndex. It makes some problem such as pass CharArrayBuffer as a CharSequence in to Regex's matcher method like following:
        charSequence.append("aabb123c");   
        Pattern p = Pattern.compile("\\D*(\\d+)\\D*");   
        Matcher m = p.matcher(charSequence);   
        Assert.assertEquals("123", m.find()?m.group(1):""); 
  1. CharArrayBuffer's sub-sequence should return the result as same class of CharArrayBuffer. But' the old implementation use CharBuffer.wrap to make a result which instance of HeapCharBuffer.

This problem exists in both 4.x and 5.x. I'd like see this problem to be fixed.

@ufolr ufolr force-pushed the Fix_CharArrayBuffer_SubSequence branch from 9837838 to 6bdce5f Compare April 17, 2020 17:45
@ok2c
Copy link
Member

ok2c commented Apr 17, 2020

CharArrayBuffer's sub-sequence should return the result as same class of CharArrayBuffer. But' the old implementation use CharBuffer.wrap to make a result which instance of HeapCharBuffer.

@ufolr Where is this requirement stated anywhere? I could not find any evidence of such requirement in CharSequence javadocs.

@ufolr
Copy link
Contributor Author

ufolr commented Apr 17, 2020

CharArrayBuffer's sub-sequence should return the result as same class of CharArrayBuffer. But' the old implementation use CharBuffer.wrap to make a result which instance of HeapCharBuffer.

@ufolr Where is this requirement stated anywhere? I could not find any evidence of such requirement in CharSequence javadocs.

@ok2c There is no specification. Just my suggestion. I think its return should be intuitive. Such as sub noodle is a noodle lol.
There maybe some method which accept a CharArrayBuffer as parameter, but can't take the sub-sequence result of CharArrayBuffer.
This will cause me confusion.
Anyway, just my suggestion.

@ok2c
Copy link
Member

ok2c commented Apr 17, 2020

@ufolr While "sub noodle is a noodle" is conceptually nice, it does cause an unnecessary array allocation and an array copy, which is a very expensive operation. It completely defeats the purpose of having efficient access to a sub sequence of characters contained in the buffer.
Could we fix the bug without causing unnecessary buffer allocation?

@ufolr
Copy link
Contributor Author

ufolr commented Apr 18, 2020

@ok2c Yep. I saw that waste of performance. So I make CharArrayBuffer can reuse the existing char array by adding an offset pos tag. If there is no write option, the CharArrayBuffer will not do allocation or array copy. Just like a HeapCharBuffer do.
Too do this, I add an constructor CharArrayBuffer(final char[] b, final int off, final int len) to CharArrayBuffer.

@ok2c
Copy link
Member

ok2c commented Apr 18, 2020

@ufolr I am sorry but this is too much so that "sub noodle could be a noodle". Let's just fix the bug and let the rest stay the same.

@ufolr ufolr force-pushed the Fix_CharArrayBuffer_SubSequence branch from 34cf025 to 045ef92 Compare April 18, 2020 10:26
@ufolr
Copy link
Contributor Author

ufolr commented Apr 18, 2020

@ufolr I am sorry but this is too much so that "sub noodle could be a noodle". Let's just fix the bug and let the rest stay the same.

@ok2c OK, I revert the changes about "sub noodle", just fix the wrong result value problem.
"sub noodle could be a noodle" is just a suggestion, and does not have to be modified.

@ufolr ufolr force-pushed the Fix_CharArrayBuffer_SubSequence branch from 045ef92 to 5d2b151 Compare April 18, 2020 13:00
@ok2c ok2c merged commit 5d8c512 into apache:master Apr 18, 2020
@ok2c
Copy link
Member

ok2c commented Apr 18, 2020

@ufolr Cool! Many thanks for contributing the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants