FReD, replacement engine for std.regex #310
Merged. |
The OSX/32 and FreeBSD/32 phobos tests have been failing since this was merged in. Two different asserts: http://d.puremagic.com/test-results/test_data.ghtml?dataid=113354 http://d.puremagic.com/test-results/test_data.ghtml?dataid=113355 |
Now that's awesome, since I have access to neither Mac OS X nor FreeBSD. Yet Mac OS X definitely worked in the past. The first one is not in the regex engine but in the "kickstarter" part, which is rather isolated. The only reason I can see for it to fail is that the bit-table construction somehow got screwed up, and that smacks of a compiler bug. The second one beats me, since getting to 205 means \w+ and \b were already working in various combinations. At any rate, I can enable debug output and see what's going on down there. Can I get access to these boxes? |
@blackwhale: I don't have an OS X VM or something I could give you access to, just my main box, but if you could tell me what exactly you need, I'd gladly send you any debug output (just no time to fully track down the issue atm). |
Ok, first off, try to unittest regex.d separately with full debug, assuming empty.d is your dummy main: if it compiles all right, ./regex should now blurt out a ton of debug info. Then just send me whatever it produces; I'm going to diff it against my win32 test run. |
@blackwhale: Using Phobos 1d8fe48, dlang/dmd@b3bbae5 on OS X Lion:
|
Strange, it means that OSX also doesn't get to run the actual regex test now... |
Here you go:
If anybody is annoyed by the GitHub notification spam, I can keep this private between Dmitry and me as well, since he's probably the only one to whom the debug stuff makes much sense anyway. |
I'd gather it's pretty useless for others; hopefully there will be no need for dumps this big. Let's first sanity-check this: change line 2900 to see if it prints the length of "ababx" (5): Then comment out lines 2747-2792, which should look like this: If this works afterwards, then I'd have to dig somewhere around the memchr stuff on OSX. Plus it seems to be the only platform-specific thing there. |
Eww, this smells like a codegen bug or some serious kind of corruption: If I add the message to the assert on line 2900, it isn't triggered anymore, instead the line 7136 assert mentioned above fires later on. Trying to get an idea what is going on, I added |
Ouch, something funky is going on. Could be something to do with memory layout then. I think codegen here should be more or less the same for all platforms though. Another thing to try in order to isolate the bug is commenting out the lines that invoke the kickstarter at all. bool search(Kickstart)(ref Kickstart kick, ref dchar res, ref size_t pos) And disable the test block around line 2900 for now, then see if the other tests pass. |
@blackwhale: I commented out the two lines you mentioned, and disabled the whole |
That's ridiculous, wordboundary seems to fail utterly (and almost works the opposite way). First stuff in this debug writeln (line 3437): debug writeln("-- wordboundary ch", front, " ", back, ": ", af, " ^ ", ab, "=", af ^ ab); Then add line |
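For readers following the thread: the `af ^ ab` in the debug line above suggests that the word-boundary check XORs two flags, one for the character before the position and one for the character at it. The sketch below illustrates that idea only. It assumes an ASCII-only notion of a word character, and the names `isWordChar` and `atWordBoundary` are hypothetical; it is not the std.regex implementation.

```d
import std.ascii : isAlphaNum;

// Hypothetical helper (assumption): ASCII-only word-character test.
bool isWordChar(dchar c)
{
    return c == '_' || isAlphaNum(c);
}

// True when position idx in s lies on a word boundary, i.e. exactly one
// of the two neighbouring characters is a word character.
bool atWordBoundary(const(dchar)[] s, size_t idx)
{
    bool ab = idx > 0 && isWordChar(s[idx - 1]);    // character before idx
    bool af = idx < s.length && isWordChar(s[idx]); // character at idx
    return af != ab; // for booleans, != is the same as XOR
}

void main()
{
    assert(atWordBoundary("ab cd"d, 0));  // start of "ab"
    assert(!atWordBoundary("ab cd"d, 1)); // inside "ab"
    assert(atWordBoundary("ab cd"d, 2));  // end of "ab"
    assert(atWordBoundary("ab cd"d, 5));  // end of "cd"
}
```

A check of this shape reads `s[idx - 1]` and `s[idx]` directly, so a boundary test that "almost works the opposite way" would be consistent with one of the two flags being computed from the wrong character.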
I spent some time on this one today. Here's what I've found so far. braddr@freebsd$ cat junk.d extern(C) int printf(const char*, ...); void main() $ dmd junk.d && ./junk So, string literals are not word aligned. This causes the code inside struct ShiftOr's search() to misbehave: This seems to be true on both osx/32 and freebsd/32. I haven't tried freebsd/64. I'm pretty sure that it'd be easy enough to trigger with wchars on any platform. |
Sorry, ignore the wchar part, and here's a version with dstrings explicitly: braddr@freebsd$ cat junk.d braddr@freebsd$ ../trunk/dmd/src/dmd junk.d && ./junk I'm still trying to coax the simple test into showing similar alignment issues on OS X, but it's clear from the debug code that the parameter making its way into search is misaligned: haystack.ptr = 0x8e15e, idx = 0 |
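As an illustration of the alignment claim above, here is a minimal sketch of how one might check whether a string's `.ptr` is aligned to the size of its element type. The helper `isAligned` is hypothetical and is not taken from the test program in this thread. The address 0x8e15e is the one reported above; its low two bits are non-zero, so it is not 4-byte aligned.

```d
import core.stdc.stdio : printf;

// Hypothetical helper (assumption): true when p is a multiple of
// `alignment`, which must be a power of two.
bool isAligned(const(void)* p, size_t alignment)
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

void main()
{
    // The address reported in the thread is not a multiple of dchar.sizeof (4).
    assert(!isAligned(cast(const(void)*) 0x8e15e, dchar.sizeof));

    // Report the alignment of a dstring literal on the current platform.
    dstring s = "ababx"d;
    printf("s.ptr = %p, aligned to dchar.sizeof: %d\n",
           cast(const(void)*) s.ptr,
           cast(int) isAligned(s.ptr, dchar.sizeof));
}
```

Whether the literal comes out aligned depends on the compiler and platform, so the printed result is deliberately not stated here.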
Continuing the investigation of the osx bug, I see the same basic symptoms as discussed in the rest of the thread. There need to be a LOT more unit tests in std.internal.uni. There's a lot of non-trivial code in there and very few tests. The one test that claims to be slow isn't: it doesn't even compile. @blackwhale any way you can get access to an osx box and dig into this? It's all your code, right? The bottom line is that this needs to be fixed or reverted. Regressing the regex engine between releases is a bad idea. This has lingered for far too long. |
|
About the small number of tests inside std.internal.uni: these functions are actually used and tested inside std.regex. |
Being used elsewhere isn't a valid excuse. Each module should be independently tested. |
New version of std.regex, with all changes that followed from review.