Fix several bugs in reverse(::UTF8String), add full coverage tests #12646
@JeffBezanson might want to comment about the issues with wanting to remove unused functions from |
How does the performance compare to the C version? |
@stevengj I haven't had time to do hard-core benchmarking yet (and got hammered when I mentioned performance as opposed to bug fixes as a goal not that long ago!). So far looks OK, but I've got to test all cases. I'm not worried though about performance, I've learned that I can make Julia generally as fast as C with a small amount of effort! 😀 |
@ScottPJones, just a couple of typical numbers for random strings would be helpful. |
@stevengj I will, I just have some paid work to get done tonight, if I hope to get some sleep! |
Note the test failure:
from @test startswith(readall(`$exename -h`), "julia [options] [program] [args...]") in |
@ScottPJones that was my fault. You just need to rebase this PR on the latest master. |
@stevengj This is mostly faster than the C code, or the same, when dealing with strings made up mostly of ASCII or Latin1 characters; it's slower when dealing with characters that take 4 bytes in UTF-8 (> 0xffff). I put both the testing routine and results from my laptop in this gist: https://gist.github.com/ScottPJones/8feed7aa12f4ab25e76b
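For readers less familiar with the character classes compared above, here is a minimal sketch of how many bytes UTF-8 needs per code point. It is written in Python purely for illustration (the PR itself is Julia replacing C), and `utf8_width` is a hypothetical helper name, not something from the PR or the gist.

```python
def utf8_width(cp: int) -> int:
    """Number of bytes UTF-8 uses to encode code point `cp` (illustrative helper)."""
    if cp < 0x80:        # ASCII
        return 1
    if cp < 0x800:       # includes the Latin1 range 0x80..0xFF
        return 2
    if cp < 0x10000:     # rest of the Basic Multilingual Plane
        return 3
    return 4             # code points above 0xFFFF


# Sample code points, one per width class; cross-checked against Python's encoder.
samples = {
    "ASCII 'a'": 0x61,
    "Latin1 e-acute": 0xE9,
    "BMP euro sign": 0x20AC,
    "non-BMP U+1F600": 0x1F600,
}
for name, cp in samples.items():
    assert utf8_width(cp) == len(chr(cp).encode("utf-8"))
    print(name, utf8_width(cp))
```

Only the last class, code points above 0xFFFF, is the one reported as slower in the comment above.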
@jakebolewski No problem! I had to rebase anyway after I ran full benchmarks and saw I needed to improve the performance a bit more. |
@ScottPJones, looks great! The slight performance penalty for safety looks perfectly fine here. |
I think you should go ahead and delete the old C code and the corresponding flisp code in this PR. No point in keeping buggy unused code in the tree just to implement a non-standard unused function in Scheme. |
Yes, I was planning on doing that as soon as this gets merged, thanks! |
You can do the removal as another commit on this branch. (also need to squash the three existing commits) |
It would be cleaner to have a single PR. |
OK, other people have asked me to do focused single issue PRs, and I would think removing something from flisp is really separate from this. |
(if you really want it as a single PR, and only 2 commits, I'll do so, I just want to be sure what to do, in the face of conflicting requests at different times) |
@ScottPJones, if your PR replaces X with Y, then removing X is a part of the same issue. |
The main reason to separate bugfix commits is so that they can be backported, but that's not going to happen with this commit (it depends on too many other changes that have been made in 0.4). In this case, I would tend to squash everything into a single commit because they are all a logical unit ("replace X with Y and test"). But I don't think it's a big deal as-is. |
How to partition a set of changes into commits and pull requests is a mostly stylistic issue, but for some workflows (eg. |
If X depends on A, and Z depends on A, and a PR rewrites X to no longer depend on A, That is why I really think that that (which might have some bikeshedding about removing functionality from flisp, broken and unused or not), should be in a separate PR, and not hold this up. About squashing, Stefan had praised me the last time for having separate commits for doing what Kate normally does, i.e. separate commits in a PR for the bug fixing vs. the testing, and I also just this last week got told I made a mistake by putting bug fix / test in the same PR. (I hope I'm not being too annoying about this! I don't want to waste anybody's time, including my own) |
Note: I have the removal of |
A bug-fix and a test (that exposes that bug) should always be in the same commit. If you incorrectly resolve a merge conflict (assuming it's incorrect on src), the tests for that commit should fail. A good rule of thumb: if you can't switch the order of two commits in a PR they should be in the same commit. |
OK, that makes sense (although I've seen a lot of exceptions to that being merged in). |
Commits: Improve performance of reverse(str::UTF8String); Fix speedup; Add tests for reverse of UTF8String
@hayd Is that rule of thumb documented somewhere? I find it very useful, thanks for taking the time. |
The intent of the original C code was to get good performance assuming the string is valid. IIRC the comment at the top of the file discusses this. |
The Julia code I wrote is frequently the same speed or faster (only issue is with lots of 3 or 4 byte UTF-8 characters). |
I'm ok with this change, just explaining why things were that way. |
OK, thanks! Is there any way to efficiently do the same sort of type-punning as you had done in C, in Julia? (while keeping the sanity checks to keep it from going past the end, and having a fallback for platforms that don't allow unaligned access) Maybe using the Ptr type?
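To make the "sanity checks to keep it from going past the end" concrete, here is a small sketch in Python (not Julia or C, and not the code from this PR). It reads a lead byte, derives the expected sequence length, and refuses to read beyond the buffer. The names `char_len_checked` and `char_lengths` are hypothetical, and the checks are deliberately incomplete: overlong forms and surrogates are not rejected.

```python
def char_len_checked(buf: bytes, i: int) -> int:
    """Length of the UTF-8 sequence starting at buf[i], with bounds checks."""
    b = buf[i]
    if b < 0x80:
        n = 1
    elif 0xC2 <= b <= 0xDF:
        n = 2
    elif 0xE0 <= b <= 0xEF:
        n = 3
    elif 0xF0 <= b <= 0xF4:
        n = 4
    else:
        raise ValueError(f"invalid lead byte 0x{b:02x} at offset {i}")
    # The check an "assume valid input" fast path would skip:
    if i + n > len(buf):
        raise ValueError(f"sequence at offset {i} runs past the end of the buffer")
    # Every following byte must be a continuation byte (10xxxxxx).
    for k in range(i + 1, i + n):
        if buf[k] & 0xC0 != 0x80:
            raise ValueError(f"expected continuation byte at offset {k}")
    return n


def char_lengths(buf: bytes) -> list[int]:
    """Walk the buffer front to back, returning each character's byte length."""
    lengths = []
    i = 0
    while i < len(buf):
        n = char_len_checked(buf, i)
        lengths.append(n)
        i += n
    return lengths
```

A version that trusts the lead byte and skips the bounds check would do less work per character, which is the trade-off between speed and safety discussed in this thread.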
Anything more needed here, or can it be merged? Thanks! |
`reverse` on a `UTF8String` used the C function `u8_reverse`, which I discovered in testing has several bugs. I have rewritten it in Julia, and added tests that fully cover the function.
I wanted to remove `u8_reverse` from `src/support/utf8.c`, however that function is used by `flisp` for the `string.reverse` function, even though that function is apparently never used anywhere in any of the .scm code I have found in Base. I wonder if the unused string functions in flisp, that are depending on broken C code, can simply be removed and save some space.
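As a rough illustration of what reversing a UTF-8 string at the byte level involves, here is a minimal sketch in Python. It is not the Julia code from this PR and not the C `u8_reverse`; the function name `reverse_utf8` is hypothetical. It walks the buffer from the end, steps back over continuation bytes to find the start of each character, and copies each character's bytes in their original order. Validation is delegated to Python's own decoder, which is an assumption made for brevity.

```python
def reverse_utf8(buf: bytes) -> bytes:
    """Reverse a UTF-8 byte string character by character (illustrative sketch)."""
    out = bytearray()
    end = len(buf)
    while end > 0:
        start = end - 1
        # Step back over continuation bytes (10xxxxxx); a character is at most
        # 4 bytes, and we never step before the start of the buffer.
        while start > 0 and end - start < 4 and (buf[start] & 0xC0) == 0x80:
            start -= 1
        chunk = buf[start:end]
        # Raises UnicodeDecodeError if the chunk is not one well-formed character.
        chunk.decode("utf-8")
        out += chunk
        end = start
    return bytes(out)


text = "a\u00e9\u20ac\U0001F600z"  # 1-, 2-, 3- and 4-byte characters
reversed_bytes = reverse_utf8(text.encode("utf-8"))
print(reversed_bytes.decode("utf-8") == text[::-1])
```

A test suite aiming at full coverage, as the description mentions, would exercise each byte width as well as the error paths (stray continuation bytes, truncated sequences).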