StringIndexError when indexing String with Umlauts #29915

Closed
axsk opened this issue Nov 3, 2018 · 1 comment

Comments

@axsk
Contributor

axsk commented Nov 3, 2018

julia> ("xöb")[3]
ERROR: StringIndexError("xöb", 3)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:216
 [3] getindex(::String, ::Int64) at ./strings/string.jl:209
 [4] top-level scope at none:0

julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

I stumbled upon this while debugging a still-unexplained segfault:

julia> myFunctionInvolvingTheAboveWithASecretString()
Unreachable reached at 0x1324555d6

signal (4): Illegal instruction: 4
in expression starting at no file:0
Type at ./dict.jl:696
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
Type at ./show.jl:193
Type at ./show.jl:271 [inlined]
show_default at ./show.jl:325
show at ./show.jl:315
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
show_delim_array at ./show.jl:695
show_delim_array at ./show.jl:680 [inlined]
show at ./show.jl:714
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
show_default at ./show.jl:332
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
show at ./show.jl:315 [inlined]
print at ./strings/io.jl:31
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
print_to_string at ./strings/io.jl:122
string at ./strings/io.jl:155 [inlined]
macro expansion at ./logging.jl:321 [inlined]
parsedata at /Users/alex/home/dev/winston/src/webcrawl.jl:77
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
macro expansion at ./logging.jl:308 [inlined]
#search#1 at /Users/alex/home/dev/winston/src/webcrawl.jl:46
unknown function (ip: 0x132439283)
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
#search at ./none:0 [inlined]
#crawl#1 at /Users/alex/home/dev/winston/src/webcrawl.jl:137
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
#crawl at ./none:0
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1831
do_call at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:324
eval_stmt_value at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:363 [inlined]
eval_body at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:682
jl_interpret_toplevel_thunk_callback at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:795
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x11f7f734f)
unknown function (ip: 0x6)
jl_interpret_toplevel_thunk at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:804
jl_toplevel_eval_flex at /Users/osx/buildbot/slave/package_osx64/build/src/toplevel.c:813
jl_toplevel_eval_in at /Users/osx/buildbot/slave/package_osx64/build/src/builtins.c:622
eval at ./boot.jl:319
eval_user_input at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
macro expansion at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:259
jl_apply at /Users/osx/buildbot/slave/package_osx64/build/src/./julia.h:1537 [inlined]
start_task at /Users/osx/buildbot/slave/package_osx64/build/src/task.c:268
Allocations: 92944920 (Pool: 92915632; Big: 29288); GC: 232
[1] 21238 illegal hardware instruction julia

@stev47
Contributor

stev47 commented Nov 3, 2018

From the documentation:

String literals are encoded using the UTF-8 encoding. UTF-8 is a variable-width encoding, meaning that not all characters are encoded in the same number of bytes. In UTF-8, ASCII characters – i.e. those with code points less than 0x80 (128) – are encoded as they are in ASCII, using a single byte, while code points 0x80 and above are encoded using multiple bytes – up to four per character. This means that not every byte index into a UTF-8 string is necessarily a valid index for a character. If you index into a string at such an invalid byte index, an error is thrown.
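
Applied to the string from the report: "ö" occupies bytes 2 and 3 of "xöb", so index 3 points into the middle of a character and the valid indices are 1, 2 and 4. A minimal sketch of how to inspect this and reach the third character anyway, using only functions from Base (the variable names are just for illustration):

```julia
s = "xöb"

# 'ö' takes two bytes in UTF-8, so there are 3 characters but 4 code units.
nchars = length(s)        # 3
nbytes = ncodeunits(s)    # 4

# Indices at which a character starts; byte 3 is the second byte of 'ö'.
valid = collect(eachindex(s))   # [1, 2, 4]
ok3 = isvalid(s, 3)             # false

# Reaching the third character without guessing byte offsets:
third_by_step = s[nextind(s, 2)]   # 'b': step from index 2 to the next valid index
third_by_pos  = collect(s)[3]      # 'b': index into a Vector{Char} instead
```

`collect(s)` allocates a `Vector{Char}`, so it is the simpler option for short strings, while `nextind`/`prevind` or iterating over `eachindex(s)` avoids the copy.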
