Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add reverseind(s, i): convert indices in reverse(s) to indices in s #9249

Merged
merged 1 commit into from
Dec 10, 2014

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Dec 4, 2014

The main use of reverse(s::String) appears to be regex searching backwards in a string (see the discussion in #6165), but it is insanely hard to correctly convert the resulting indices (of matches in the reversed string) back to indices in the original string (see e.g. #9227). reverseind(s, i) solves this problem: reverse(s)[i] == s[reverseind(s, i)].

For example, in REPLCompletions this replaces code that is either tricky (nextind(partial, endof(partial)) - m.offset, which relies on the fact that the match is ASCII and that ASCII is encoded by a single code unit in all of our encodings) or slow chr2ind(str, length(str)-ind2chr(rstr, last(r))+1) (which is linear time in length(str)).

(It might be useful to check whether reversed strings are being searched in other packages, and whether the indices are handled correctly.)

cc: @dhoegh, @jiahao

@stevengj stevengj added the domain:unicode Related to unicode characters and encodings label Dec 4, 2014
@stevengj
Copy link
Member Author

stevengj commented Dec 4, 2014

So far, the only example outside of Base that I can find which uses indices of a search result in a reversed string is in URLParse.jl. It appears to get it wrong.

@nalimilan
Copy link
Member

Wouldn't it make sense to have a function which would reverse the string, look for matches, and return the indices in the original string directly? Not saying reverseind wouldn't be useful, though.

@stevengj
Copy link
Member Author

stevengj commented Dec 4, 2014

(Travis failure looks unrelated.)

@stevengj
Copy link
Member Author

stevengj commented Dec 4, 2014

@nalimilan, that could make sense, but seems like it should be a separate PR (that would use reverseind). As long as we are exporting reverse for string processing, I think we should export the index mapping.

(It could even just be an rsearch(s::String, r::Regex) method, although we also have eachmatch and ismatch etcetera. However, it would unfortunately behave somewhat differently than rsearch(s::String, r::String) because it wouldn't be easy to automatically reverse a Regex, so in the former case you'd have to make sure that the user specifies a reversed regex. This was discussed in #6276.)

@stevengj
Copy link
Member Author

stevengj commented Dec 7, 2014

Again, the Travis error (on Linux) seems unrelated:

Worker 9 terminated.
ERROR: ProcessExitedException()
in wait at ./task.jl:284
in wait at ./task.jl:194
in wait_full at ./multi.jl:576
in remotecall_fetch at multi.jl:676
in remotecall_fetch at multi.jl:681
in anonymous at task.jl:1450
while loading /tmp/julia/share/julia/test/runtests.jl, in expression

@tkelman
Copy link
Contributor

tkelman commented Dec 8, 2014

Worker 9 was on bitarray, that's #9176

@stevengj
Copy link
Member Author

stevengj commented Dec 9, 2014

@StefanKarpinski, ok to merge (after rebase)?

@StefanKarpinski
Copy link
Sponsor Member

Seems good to me.

stevengj added a commit that referenced this pull request Dec 10, 2014
add reverseind(s, i): convert indices in reverse(s) to indices in s
@stevengj stevengj merged commit 6695c4c into JuliaLang:master Dec 10, 2014
@stevengj stevengj deleted the reverseind branch December 10, 2014 05:35
@PallHaraldsson
Copy link
Contributor

[Not sure if disallowed to comment on "Merged" as a "closed" issue..]

Just looking at this it seems "return the corresponding index" means (and must mean?) "return the corresponding byte index". Maybe that needs to be explicit.. I want to get rid of any kind of indexing (as a default). Seems this is one function in my way.. Any good way to know how many functions depend on the implementation (byte-indexing)? I guess, I would have too take a look at any function using UFT8String. I understand any function can access private data - and that is "rude", but I guess that it is the rule in Base and an exception to "rude"-rule..

Hopefully this function is not speed-critical, as it seems I would want to let it return code-point indexes eventually..

@stevengj
Copy link
Member Author

It returns whatever index is appropriate for the given string type. This won't change if indexes are changed to opaque values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants