New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent behavior of String.indexOf() on different platforms #5271
Comments
That's a good overview. However I would prefer to leave indexOf accessible as is and add a consistent version to StringTools instead, for those who need it and are willing to take the performance hit for checking against edge-cases that are more likely than not programming errors and hence need to be taken care of in a context specific manner anyway. |
|
As requested, I created a PR with a test case. The builds failed with error messages I don't really understand. I guess it is a problem with the CI setup and not the test case itself. Do you need me to do anything else here? |
I could't find your PR, where is it located? |
Here #5277 |
It seems that your code trigger an infinite loop somewhere, when running into macro code. Could you check that? |
I can now see the following in the log, so the CI build fails because of the assertions - which is expected.
|
Fixing this probably requires adding something like this to every target's indexOf implementation: if (str == "") {
if (startIndex == null) {
return 0;
}
if (startIndex > length) {
return length;
}
if (startIndex < 0) {
return 0;
}
return startIndex;
} Call me controversial, but this is retarded. :P And good luck fixing this on Flash... Random idea: Let's add a feature that's like a static extension, but more dominant. Like |
That's actually a great idea! |
Huh, how about making a PR with the improvement?;) |
I don't think it should be added to std lib. It's mostly for flash to avoid native |
IMHO:
We should align the behavior on Javascript specification, so
anyString.indexOf("",anyValue) == 0
We should left indexOf(null) unspecified.
…On Mon, Sep 25, 2017 at 11:16 AM, Alexander Kuzmenko < ***@***.***> wrote:
I don't think it should be added to std lib. It's mostly for flash to
avoid native push call.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5271 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA-bwK0pVoe2MgRZMwkWf0psxbspCIsAks5sl29YgaJpZM4IlbJ_>
.
|
This seems so strange to me... you can either have consistency or not, so if you leave one special case unspecified I don't see why we should go out of our way to specify another special case. |
You can agree with me that we rarely specify null values, whereas the empty
string is actually specified in EcmaScript
…On Mon, Sep 25, 2017 at 11:05 PM, Simon Krajewski ***@***.***> wrote:
This seems so strange to me... you can either have consistency or not, so
if you leave one special case unspecified I don't see why we should go out
of our way to specify another special case.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5271 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA-bwMnjqqUWlyLse_KZ19ZHi7IrN6-jks5smBWagaJpZM4IlbJ_>
.
|
Sure, but the thing is, if we have to add additional branching to something as basic as indexOf anyway, we might as well also check for null values. |
RE: push - I am really cautious about array push ever since discovering memory issues in as3/flash, and kind of often paranoid and often use What is the current status of clearing arrays optimally? I am never sure if I should be looking at polygonal structures rather than array or list... could/should some of polygonal be added to / or requested to be added to standard ds? Really it's nice if Haxe std lib points you to optimal fast safe approaches rather than having to trawl github for alternative data structures and methods. RE: for if null - I find it really a pain that we don't have an 'if' for null. ifNull( value ) might be nice, it's really a pain to be doing if( value == null ), not so much about less characters and forgetting to use == null and a specific ifNull maybe easier, I know flash allows if( value ) to be same as if( value == null ) but I have no idea on the downsides of this assumption, but I sometimes would like the simple flash way. Anyway I am off topic as usual let you get back to index discussions. |
Reading through this issue again, I still agree with myself from last year. If we really want to mess up something like |
Surprisingly, I still agree with myself too :) Checking for empty string does not change anything wrt to speed of the implemention (especially if we check for .length == 0 instead of == "") and does provide a specified cross platform behavior regarding this specific case. null should be left unspecified as others parts of the stdlib. Since it's just a precheck it does not prevent us from using native method if there's one. Also, we can leave Flash out of this change unless easy to deal with. |
Just to be clear, that is my position as well and contradicts Nicolas' comment which you thumbed up. |
Ok, I thumbed up to have at least some improvement. I would prefer consistency for both cases however. |
I still think that we should not overly specify what happens when null is passed as argument unless it makes senses for null to be passed to this particular method. That's consistency wrt other API methods. For other (not null) values it makes sense to specify unless it gives us a very slow code path, which is not the case here so let's move forward. |
* started unicode unit tests * more unicode tests and api changes * turn off neko (will not be made unicode compatible) * flash unicode support * added polyfill for missing IE String.fromCodePoint * make ucs2 with utf16 encoding the default for tests compliance * [typer] allow the String module to have multiple module types I think taking the closure of e.g. String's `fromCharCode` on JS causes a `_String_Impl` class to be created. * [eval] disable unicode tests * [java/cs] Update Bytes code to support the RawNative encoding Also fix `String.fromCharCode` when the code point is a surrogate pair * [unicode] move some code from HL to common * [hxcpp] Some work to prepare for utf16 strings * [hxcpp] Remove compiler warnings. * [hxcpp] Allow hxcpp strings to be non-utf8 * [hxcpp] Use the same hash for utf8 and wide-char representations. Remove assumptions about hxcpp string format from crypto.Sha * [hxcpp] Do not assume utf8 on hxcpp. Add optimization for Std.string in the case of passed String * Use native utf8 encoding for hxcpp. Disable unicode tests for cpp without 'hxcpp_smart_strings' define * Add hxcpp_smart_strings define to compile * [cpp] remove unused variables * Add some unicode indexOf tests * [php] load the file with polyfills * [php] Converted String to multibyte * [php] fixed StringTools.fastCodeAt() for utf8 * [php] bytes io seems fixed * [php] fixed xml.Parser for unicode * [php] fixed haxe.JsonParser for unicode * [php] php strings are binary-safe * [php] fixed sha224 & sha256 tests * [hxcpp] Export spcial cpp types as fsUnkown to cppia. Change definition of cpp.Star to allow null setting. Add cpp.Native class for some easier access to cpp.Star pointers. Bump hxcpp_api_version to 400. * [lua] refactor lib installation methods * [lua] lib version adjustment * update luarocks lib name * [lua] Add Utf8 extern, use it for base String class * [lua] use native string tools inside Bytes * [lua] use fast byte decoding for utf8 * [lua] remove slice allocation inside of Bytes * [lua] fix utf8 in xml * [lua] fix stack overflow for byte decoding * [lua] adjust offsets for byte encoding * [lua] get rid of defunct Utf8 implementation * [lua] remove hardcoded stack size limit * [lua] set utf8 handling for related std string methods and tests * [lua] Std.string checks for a userdata metatable and will use that if present * [php] fixed JsonParser and xml Parser * [lua] remove special utf8 handling logic from json/xml parser * [lua] skip sha tests for now * add more tests because I hate myself * fix * [eval] it passes! * [eval] cleanup * [eval] try to get substr/substring right and add some tests * [eval] fix and test Reflect.compare * [as3] make fromCodePoint public * [eval] make (last)indexOf ECMA-compliant with regards to "" see #5271 * [lua] use safe table method for decoding bytes to strings * [lua] reformat NativeStringTools and remove non-existent charCodeAt function * [lua] use NativeStringTools for byte management in Process * [python] get tests to pass
Branch is here: https://github.com/Simn/haxe/tree/String.indexOf neko
HL
PHP
Flash/As3
Java
C#
Python
Lua
C++
|
So:
- string offsets are clamped to length
- the empty string is the only thing that is going to match at the
end (and it will generate the string length)
- negative offsets are clamped to 0
Trying to understand test 60:
"dog".lastIndexOf("", -1) = 0
offset = dog.length - 1 = 2 - should the be the same as test 56?
…On Thu, Sep 6, 2018 at 9:54 PM, Simon Krajewski ***@***.***> wrote:
Branch is here: https://github.com/Simn/haxe/tree/String.indexOf
neko
src/unit/issues/Issue5271.hx:12: 0 should be -1
src/unit/issues/Issue5271.hx:13: 0 should be -1
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/issues/Issue5271.hx:15: 0 should be -1
src/unit/issues/Issue5271.hx:19: 1 should be -1
src/unit/issues/Issue5271.hx:20: 1 should be -1
src/unit/issues/Issue5271.hx:21: 0 should be -1
src/unit/issues/Issue5271.hx:27: 3 should be -1
src/unit/issues/Issue5271.hx:28: 3 should be -1
src/unit/issues/Issue5271.hx:29: 3 should be -1
src/unit/issues/Issue5271.hx:30: 0 should be -1
src/unit/issues/Issue5271.hx:42: 0 should be -1
src/unit/issues/Issue5271.hx:43: 0 should be -1
src/unit/issues/Issue5271.hx:44: 0 should be -1
src/unit/issues/Issue5271.hx:45: 0 should be -1
src/unit/issues/Issue5271.hx:47: 1 should be 0
src/unit/issues/Issue5271.hx:49: 1 should be 0
src/unit/issues/Issue5271.hx:50: 1 should be 0
src/unit/issues/Issue5271.hx:51: 0 should be -1
src/unit/issues/Issue5271.hx:53: 3 should be 2
src/unit/issues/Issue5271.hx:57: 3 should be 2
src/unit/issues/Issue5271.hx:58: 3 should be 2
src/unit/issues/Issue5271.hx:59: 3 should be 2
src/unit/issues/Issue5271.hx:60: 0 should be -1
HL
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/issues/Issue5271.hx:19: 1 should be -1
src/unit/issues/Issue5271.hx:20: 1 should be -1
src/unit/issues/Issue5271.hx:27: 3 should be -1
src/unit/issues/Issue5271.hx:28: 3 should be -1
src/unit/issues/Issue5271.hx:29: 3 should be -1
src/unit/issues/Issue5271.hx:44: 0 should be 1
src/unit/issues/Issue5271.hx:45: 0 should be -1
src/unit/issues/Issue5271.hx:50: 1 should be 2
src/unit/issues/Issue5271.hx:51: 0 should be -1
src/unit/issues/Issue5271.hx:58: 3 should be 4
src/unit/issues/Issue5271.hx:59: 3 should be 10
src/unit/issues/Issue5271.hx:60: 0 should be -1
PHP
src/unit/Test.hx:276: ABORTED : ErrorException: mb_strpos(): Empty delimiter in /home/travis/build/Simn/haxe/tests/unit/bin/php/lib/php/_Boot/HxString.php:68
Stack trace:
#0 [internal function]: php\Boot::php\{closure}(2, 'mb_strpos(): Em...', '/home/travis/bu...', 68, Array)
#1 /home/travis/build/Simn/haxe/tests/unit/bin/php/lib/php/_Boot/HxString.php(68): mb_strpos('', '', 0, 'UTF-8')
#2 /home/travis/build/Simn/haxe/tests/unit/bin/php/lib/unit/issues/Issue5271.php(28): php\_Boot\HxString::indexOf('', '')
#3 /home/travis/build/Simn/haxe/tests/unit/bin/php/lib/Reflect.php(36): unit\issues\Issue5271->test()
#4 /home/travis/build/Simn/haxe/tests/unit/bin/php/lib/unit/TestMain.php(4314): Reflect::callMethod(Object(unit\issues\Issue5271), Object(php\_Boot\HxClosure), Object(Array_hx))
#5 /home/travis/build/Simn/haxe/tests/unit/bin/php/index.php(16): unit\TestMain::main()
#6 {main} in unit.issues.Issue5271.test
src/unit/Test.hx:279: STACK :
Called from local function (/home/travis/build/Simn/haxe/tests/unit/bin/php/lib/php/_Boot/HxString.php line 68)
Called from php._Boot.HxString.indexOf (/home/travis/build/Simn/haxe/tests/unit/bin/php/lib/php/_Boot/HxString.php line 68)
Called from unit.issues.Issue5271.test (/home/travis/build/Simn/haxe/tests/unit/bin/php/lib/unit/issues/Issue5271.php line 28)
Called from Reflect.callMethod (/home/travis/build/Simn/haxe/tests/unit/bin/php/lib/Reflect.php line 36)
Called from /home/travis/build/Simn/haxe/tests/unit/bin/php/lib/unit/TestMain.php line 4314
Flash/As3
src/unit/issues/Issue5271.hx:45: 0 should be -1
src/unit/issues/Issue5271.hx:51: 0 should be -1
src/unit/issues/Issue5271.hx:60: 0 should be -1
Java
src/unit/issues/Issue5271.hx:12: 0 should be -1
src/unit/issues/Issue5271.hx:13: 0 should be -1
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/issues/Issue5271.hx:15: 0 should be -1
src/unit/issues/Issue5271.hx:19: 1 should be -1
src/unit/issues/Issue5271.hx:20: 1 should be -1
src/unit/issues/Issue5271.hx:21: 0 should be -1
src/unit/issues/Issue5271.hx:27: 3 should be -1
src/unit/issues/Issue5271.hx:28: 3 should be -1
src/unit/issues/Issue5271.hx:29: 3 should be -1
src/unit/issues/Issue5271.hx:30: 0 should be -1
src/unit/issues/Issue5271.hx:42: 0 should be -1
src/unit/issues/Issue5271.hx:44: 0 should be -1
src/unit/issues/Issue5271.hx:45: 0 should be -1
src/unit/issues/Issue5271.hx:47: 1 should be 0
src/unit/issues/Issue5271.hx:50: 1 should be 0
src/unit/issues/Issue5271.hx:53: 3 should be 2
src/unit/issues/Issue5271.hx:58: 3 should be 2
src/unit/issues/Issue5271.hx:59: 3 should be 2
src/unit/issues/Issue5271.hx:60: 0 should be 2
C#
src/unit/issues/Issue5271.hx:12: 0 should be -1
src/unit/issues/Issue5271.hx:13: 0 should be -1
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/Test.hx:276: ABORTED : System.ArgumentOutOfRangeException: Argument is out of range.
Parameter name: startIndex
at System.String.IndexOfOrdinal (System.String value, Int32 startIndex, Int32 count, CompareOptions options) [0x00000] in <filename unknown>:0
at System.String.IndexOf (System.String value, Int32 startIndex, Int32 count, StringComparison comparisonType) [0x00000] in <filename unknown>:0
at System.String.IndexOf (System.String value, Int32 startIndex, StringComparison comparisonType) [0x00000] in <filename unknown>:0
at haxe.lang.StringExt.indexOf (System.String me, System.String str, Null`1 startIndex) [0x00000] in <filename unknown>:0
at unit.issues.Issue5271.test () [0x00000] in <filename unknown>:0
at unit.issues.Issue5271.__hx_invokeField (System.String field, Int32 hash, System.Object[] dynargs) [0x00000] in <filename unknown>:0
at haxe.lang.Runtime.callField (System.Object obj, System.String field, Int32 fieldHash, System.Object[] args) [0x00000] in <filename unknown>:0
at haxe.lang.Closure.__hx_invokeDynamic (System.Object[] dynArgs) [0x00000] in <filename unknown>:0
at Reflect.callMethod (System.Object o, System.Object func, Array args) [0x00000] in <filename unknown>:0
at unit.TestMain.main () [0x00000] in <filename unknown>:0 in unit.issues.Issue5271.test
src/unit/Test.hx:279: STACK :
Called from System.String.IndexOfOrdinal (null line 0)
Called from System.String.IndexOf (null line 0)
Called from System.String.IndexOf (null line 0)
Called from haxe.lang.StringExt.indexOf (null line 0)
Called from unit.issues.Issue5271.test (null line 0)
Called from unit.issues.Issue5271.__hx_invokeField (null line 0)
Called from haxe.lang.Runtime.callField (null line 0)
Called from haxe.lang.Closure.__hx_invokeDynamic (null line 0)
Called from Reflect.callMethod (null line 0)
Called from unit.TestMain.main (null line 0)
Python
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/issues/Issue5271.hx:20: 1 should be -1
src/unit/issues/Issue5271.hx:28: 3 should be -1
src/unit/issues/Issue5271.hx:29: 3 should be -1
src/unit/issues/Issue5271.hx:30: 0 should be 2
src/unit/issues/Issue5271.hx:48: 0 should be 1
src/unit/issues/Issue5271.hx:54: 0 should be 1
src/unit/issues/Issue5271.hx:55: 1 should be 2
src/unit/issues/Issue5271.hx:56: 2 should be 3
Lua
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/issues/Issue5271.hx:15: 0 should be -1
src/unit/issues/Issue5271.hx:20: 1 should be -1
src/unit/issues/Issue5271.hx:21: 0 should be -1
src/unit/issues/Issue5271.hx:28: 3 should be -1
src/unit/issues/Issue5271.hx:29: 3 should be -1
src/unit/issues/Issue5271.hx:30: 0 should be -1
src/unit/issues/Issue5271.hx:45: 0 should be -1
src/unit/issues/Issue5271.hx:51: 0 should be -1
src/unit/issues/Issue5271.hx:60: 0 should be -1
C++
src/unit/issues/Issue5271.hx:14: 0 should be -1
src/unit/issues/Issue5271.hx:15: 0 should be -1
src/unit/issues/Issue5271.hx:20: 1 should be -1
src/unit/issues/Issue5271.hx:21: 0 should be -1
src/unit/issues/Issue5271.hx:28: 3 should be -1
src/unit/issues/Issue5271.hx:29: 3 should be -1
src/unit/issues/Issue5271.hx:30: 0 should be -1
src/unit/issues/Issue5271.hx:45: 0 should be -1
src/unit/issues/Issue5271.hx:51: 0 should be -1
src/unit/issues/Issue5271.hx:60: 0 should be -1
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#5271 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABlp1gbhSKf5Prpn3N1CHLLPPMNA_vUeks5uYSkwgaJpZM4IlbJ_>
.
|
I didn't want to think about this too much. In Eval I've fixed all these cases by doing |
fixed for python in Simn@e97ecb6 |
I've fixed neko, php and C#. |
@Simn I don't see changes in neko, maybe not merged yet ? these should be quite compatible with HL, if you can patch as well |
I've opened a PR so you can find the changes. |
Closing this in favor of #7402. |
* started unicode unit tests * more unicode tests and api changes * turn off neko (will not be made unicode compatible) * flash unicode support * added polyfill for missing IE String.fromCodePoint * make ucs2 with utf16 encoding the default for tests compliance * [typer] allow the String module to have multiple module types I think taking the closure of e.g. String's `fromCharCode` on JS causes a `_String_Impl` class to be created. * [eval] disable unicode tests * [java/cs] Update Bytes code to support the RawNative encoding Also fix `String.fromCharCode` when the code point is a surrogate pair * [unicode] move some code from HL to common * [hxcpp] Some work to prepare for utf16 strings * [hxcpp] Remove compiler warnings. * [hxcpp] Allow hxcpp strings to be non-utf8 * [hxcpp] Use the same hash for utf8 and wide-char representations. Remove assumptions about hxcpp string format from crypto.Sha * [hxcpp] Do not assume utf8 on hxcpp. Add optimization for Std.string in the case of passed String * Use native utf8 encoding for hxcpp. Disable unicode tests for cpp without 'hxcpp_smart_strings' define * Add hxcpp_smart_strings define to compile * [cpp] remove unused variables * Add some unicode indexOf tests * [php] load the file with polyfills * [php] Converted String to multibyte * [php] fixed StringTools.fastCodeAt() for utf8 * [php] bytes io seems fixed * [php] fixed xml.Parser for unicode * [php] fixed haxe.JsonParser for unicode * [php] php strings are binary-safe * [php] fixed sha224 & sha256 tests * [hxcpp] Export spcial cpp types as fsUnkown to cppia. Change definition of cpp.Star to allow null setting. Add cpp.Native class for some easier access to cpp.Star pointers. Bump hxcpp_api_version to 400. * [lua] refactor lib installation methods * [lua] lib version adjustment * update luarocks lib name * [lua] Add Utf8 extern, use it for base String class * [lua] use native string tools inside Bytes * [lua] use fast byte decoding for utf8 * [lua] remove slice allocation inside of Bytes * [lua] fix utf8 in xml * [lua] fix stack overflow for byte decoding * [lua] adjust offsets for byte encoding * [lua] get rid of defunct Utf8 implementation * [lua] remove hardcoded stack size limit * [lua] set utf8 handling for related std string methods and tests * [lua] Std.string checks for a userdata metatable and will use that if present * [php] fixed JsonParser and xml Parser * [lua] remove special utf8 handling logic from json/xml parser * [lua] skip sha tests for now * add more tests because I hate myself * fix * [eval] it passes! * [eval] cleanup * [eval] try to get substr/substring right and add some tests * [eval] fix and test Reflect.compare * [as3] make fromCodePoint public * [eval] make (last)indexOf ECMA-compliant with regards to "" see HaxeFoundation#5271 * [lua] use safe table method for decoding bytes to strings * [lua] reformat NativeStringTools and remove non-existent charCodeAt function * [lua] use NativeStringTools for byte management in Process * [python] get tests to pass
Synopsis
The return value of the String's indexOf() method across different platforms is inconsistent, when the search string is null or empty or the startIndex is out of bound.
Examples
"".indexOf(null)
"".indexOf("")
"".indexOf("", 0)
"".indexOf("", 1)
"".indexOf("", -1)
"ab".indexOf(null)
"ab".indexOf("")
"ab".indexOf("", 0)
"ab".indexOf("", 1)
"ab".indexOf("", 4)
"ab".indexOf("", -1)
Reason for different behaviors is that Haxe simply delegates to the platforms native indexOf implementation if present. It should however provide measures that ensure a cross platform consistent behavior of it's own API.
Suggestion
Having out-of-bound handling in mind and being in sync with the behavior of StringTools.startsWith - where it states
If start is the empty String "", the result is true
- I would however suggest the following specification."".indexOf(null)
"".indexOf("")
"".indexOf("", 0)
"".indexOf("", 1)
"".indexOf("", -1)
"ab".indexOf(null)
"ab".indexOf("")
"ab".indexOf("", 0)
"ab".indexOf("", 1)
"ab".indexOf("", 4)
"ab".indexOf("", -1)
This could be achieved by Haxe pre-checking these conditions before delegating to the native indexOf implementations and returning specified values if applicable.
The text was updated successfully, but these errors were encountered: