Possible bug in unique for partial InlineString #32
Comments
It is a bug.
A similar bug exists for last:

julia> using InlineStrings
julia> z = inlinestrings(["$i GBP" for i ∈ rand(1.0:0.01:500.0, 1_000)]);
julia> unique(last.(z, 3))
4-element Vector{String15}:
 "GBP"
 "GBP"
 "GBP"
 "GBP"
I think the bug is in the interaction between

julia> abc = InlineString3("abc")
"abc"
julia> first(abc, 2) == "ab"
true
julia> first(abc, 2) == InlineString15("ab")
true
julia> first(abc, 2) == InlineString3("ab") # whoops!
false

I think because

julia> bitstring(abc)
"01100001011000100110001100000011"
julia> bitstring(first(abc, 2))
"01100001011000100110001100000010"
julia> bitstring(InlineString3("ab"))
"01100001011000100000000000000010"

So I suppose we could change

It seems this bug is fixed, and all current tests pass, if we just remove these definitions:

InlineStrings.jl/src/InlineStrings.jl Line 277 in ffbc1de
InlineStrings.jl/src/InlineStrings.jl Lines 286 to 287 in ffbc1de
Fixes #32. The core issue here is that we're taking a few shortcuts in operations like chop, chomp, first, and last, where we just shuffle the bits around and OR in the new length. The problem is that "extra bits" can be left behind in the inline string, and these can then affect operations like `==`. So in these optimized "modifying" operations we need to zero out the extra bits to ensure a consistent bit representation.
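To make the "extra bits" point concrete, here is a minimal sketch in plain Julia that mimics the 4-byte layout visible in the bitstring output earlier in the thread (three data bytes in the high bits, the length in the lowest byte). The names `pack`, `first_shortcut`, and `first_cleared` are hypothetical helpers written for this illustration. They are not part of InlineStrings.jl and are not the code changed by the fix. They only model the idea on a bare `UInt32`.

```julia
# Toy model of the layout seen in the bitstrings above:
# up to three data bytes in the high bits, length in the lowest byte.
# All names here are hypothetical; this is not InlineStrings.jl code.

function pack(s::AbstractString)
    n = ncodeunits(s)
    n <= 3 || throw(ArgumentError("at most 3 bytes fit in this toy layout"))
    x = UInt32(0)
    for (i, b) in enumerate(codeunits(s))
        x |= UInt32(b) << (8 * (4 - i))   # byte 1 goes into the top byte, and so on
    end
    return x | UInt32(n)                  # length lives in the lowest byte
end

# Shortcut: overwrite only the length byte. Stale data bytes survive.
first_shortcut(x::UInt32, n::Integer) = (x & ~UInt32(0xff)) | UInt32(n)

# Cleared variant: keep only the first n data bytes, zero the rest,
# then OR in the new length.
function first_cleared(x::UInt32, n::Integer)
    keep = ~(typemax(UInt32) >> (8 * n))
    return (x & keep) | UInt32(n)
end

abc = pack("abc")
abd = pack("abd")

# The shortcut leaves the byte for 'c' behind, so the result differs from pack("ab").
println(bitstring(first_shortcut(abc, 2)))
println(first_shortcut(abc, 2) == pack("ab"))   # false

# With the stale byte cleared, the representation is canonical again.
println(bitstring(first_cleared(abc, 2)))
println(first_cleared(abc, 2) == pack("ab"))    # true

# Bitwise-distinct values are what make unique keep apparent duplicates.
println(length(unique([first_shortcut(abc, 2), first_shortcut(abd, 2)])))  # 2
println(length(unique([first_cleared(abc, 2), first_cleared(abd, 2)])))    # 1
```

In this model the shortcut version reproduces the mismatch reported above: two values that both read as "ab" compare unequal because the leftover third byte differs. Masking before ORing in the length restores a single representation per string.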
* Ensure shuffled bits get cleared out in "modifying" operations
* remove old code
* fix tests
Brilliant, thanks a lot all! Now we just need a new release ;)
New release with the fix should be out 🤞 https://github.com/JuliaStrings/InlineStrings.jl/releases/tag/v1.1.3
Hooray! Thanks so much - I use the package for data analysis in the context of litigation where code has to be disclosed in court, so it's always a bit awkward to depend on unreleased versions of packages.
Just came across this when working with a column of prices in British pounds, all of which were prepended with `GBP`. In trying to check whether indeed all entries in the (`Vector{String15}`) column start with `GBP`, I did `unique(first.(my_col, 3))` and was surprised to get back a length 196 vector with all entries showing `GBP`.

A reproducer is:

The fact that there are 994 (rather than 1,000) unique results suggests to me that where the entire string is the same (because the same "price" was randomly chosen twice in setup of the example), the first three characters are seen as the same by `unique` as well.

Is this intended, i.e. is there some fundamental issue with working with "`SubInlineString`s" that means I shouldn't be doing what I'm doing, or is this a bug?