Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CI #636

Closed
Closed

Conversation

thomasuster
Copy link
Contributor

haxe compile.hxml; bin/TestMain 100

Seems to produce runtime errors like the following...

newest

Disabling HXCPP_GC_MOVING fixes it. Ran it 100 times on my labtop.

Waiting for travis OSX CI but if it passes please merge.

@Simn
Copy link
Member

Simn commented Sep 19, 2017

Shouldn't we instead figure out why these errors occur?

@hughsando
Copy link
Member

hughsando commented Sep 19, 2017 via email

@thomasuster
Copy link
Contributor Author

If it's a quick fix I'd half agree. I spent some hours on it already. The flag is not on by default so it makes more sense to me to not have it on for these tests anyways(maybe run tests both on and off or make generational default).

@hughsando hughsando closed this in b839712 Sep 20, 2017
@hughsando
Copy link
Member

The moving case is a "harder" case than the non-moving, so it is good to test this one - once it works, the non-moving should work too.
In this case, there was a logic error in the non-moving code but maybe it would not usually show up.

@thomasuster
Copy link
Contributor Author

Awesome! Nice work

@hughsando
Copy link
Member

I was asked how I debugged this - here is an outline:
First, I worked out how easy it was to reproduce. If I ran the test fresh (rather than in a code loop) on windows, it would crash about 1 in 5 times. This is good, because windows has the best debugging.
I added "HXCPP_DEBUG_LINK" define to show symbols in release - it still crashed, which is good.
The native debugger showed crashing in the mark phase, so I uncommented the " #define HXCPP_GC_DEBUG_LEVEL 1" line in Immix.cpp. This gives a much better call stack when debugging crashes in marking. After adding this, it still crashed, which is good.
The call stack showed it was always crashing marking the mBase in the mDeque object in the thread object. The deque uses a technique with it inherits from a Array<Dynamic> which I thought maybe I had misspelled one of the overrides, so I wasted some time there.
So I decided to trace changes to the mBase value. Since the issue was related to GC_MOVING, I put a printf in the moving code and indeed saw that the mBase pointer was being moved, but when it crashed, it was using the old pointer, not the moved pointer. The 'visit' operation is supposed to adjust these pointers. so I put a printf in there and it was not being called, so that is the bug.
To work out why visit was not called, I added some code to break in "VisitBlock" when it hit the thread object and worked out that it was the 'allocStart' mask being 0 that preventing visit from being called. This is now the bug.
So now, I stepped though the allocation of the thread object and watched for where it set the start flags - here I saw it was or-ing on gImmixStartFlag, but it was zero - so that was the bug.
From here, it was pretty obvious gImmixStartFlag was not initialized.
Because the gImmixStartFlag is set just after the thread object is created, it was only this object and the deque that were created in a bad state. The bug is random because the threaded test code we added is non-deterministic in order, so sometimes the deque would be moved and sometimes it would not.
This bug may also have shown up in a non-moving case - but I'm not sure.

@thomasuster
Copy link
Contributor Author

@hughsando Thank you for the detailed explanation!

@codeservice
Copy link
Contributor

codeservice commented Sep 21, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants