Simn commented Mar 28, 2025

Investigating CI hangs...

Simn added 4 commits March 28, 2025 18:31
This reverts commit 97cc112.

# Conflicts:
#	src/optimization/analyzer.ml
this is probably not the problem though

Simn commented Mar 28, 2025

I'm having a hard time understanding how what I've done so far could cause compiler hangs. While there's an amount of parallelism, there's almost no actual synchronization that could deadlock. Outside of genjvm, the only mutex that was added is around a mkdir operation.

Up until now, the only problems were caused by unsynchronized access to shared data, which usually causes some explicit exception. Runaway recursion could in theory show up as a hang rather than a stack overflow because of tail-call elimination, but I don't see how we would cause that. The changes in 47c3bf7 could of course be related to something like this, though I'd expect there to be problems even outside of parallel-world if that was the case.

I haven't seen the setup timeout in the server tests since 7b20817, but there was a hang in the misc test and a few seemingly random failures in some server tests. For now, the best I can do is observe.
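
For reference, the kind of mutex-guarded mkdir mentioned above can be sketched roughly as follows. This is an illustration only, assuming OCaml 5 (where `Mutex` and `Domain` are in the standard library); `mkdir_mutex`, `mkdir_p` and `safe_mkdir_p` are hypothetical names, not the compiler's actual code. It holds a single lock around one check-and-create sequence and never takes a second lock while holding it, which is why a lock like this should not be able to deadlock on its own.

```ocaml
(* Sketch only: hypothetical names, not the Haxe compiler's code. Assumes OCaml 5. *)
let mkdir_mutex = Mutex.create ()

(* Create [path] and any missing parents. The existence check followed by
   Sys.mkdir is a check-then-act sequence: two unsynchronized callers can
   both see "missing", and the slower one then fails inside Sys.mkdir. *)
let rec mkdir_p path =
  if not (Sys.file_exists path) then begin
    mkdir_p (Filename.dirname path);
    Sys.mkdir path 0o755
  end

(* Hold the lock for the whole check-and-create sequence.
   Fun.protect releases the lock even if Sys.mkdir raises. *)
let safe_mkdir_p path =
  Mutex.lock mkdir_mutex;
  Fun.protect
    ~finally:(fun () -> Mutex.unlock mkdir_mutex)
    (fun () -> mkdir_p path)
```

Several domains can then call `safe_mkdir_p` on the same path concurrently; whichever gets the lock first creates the directories and the others find them already present.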


Simn commented Mar 29, 2025

Logging some failures:

  • Hang on misc test Running haxe projects/Issue2538/compile-fail.hxml
  • Hang on misc test Running haxe projects/Issue4742/compile-fail.hxml
  • Failure on server test Issue8004 with @:variant("Neko", "neko", "test.n")

A striking commonality between these three is that they all compile to neko.

Edit: Unfortunately, "Setup timeout" has returned too.

Edit: Development hangs on Running haxe projects/Issue4742/compile-fail.hxml once more.


Simn commented Mar 29, 2025

One interesting thing about both 2538 and 4742 is that the tests raise an Error exception during the analyzer. I thought this wouldn't be a problem, but thinking about it some more it's not obvious how the control flow should behave here.

Simn left a comment

I have some doubts that this is how real parallel programmers handle such things, but it makes some sense to me...


Simn commented Mar 30, 2025

Looks like that did the trick! I can't say I fully understand the behavior here, but the conclusion is to not have unhandled exceptions in your task runners.
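
To make that conclusion concrete, here is a minimal sketch of the idea, not the actual fix in this PR. It assumes OCaml 5 domains, and `Analyzer_error`, `guard` and `run_all` are hypothetical names introduced for illustration. Each task is wrapped so that an exception becomes an ordinary value inside the worker; the caller first waits for every worker and only then re-raises the first failure on its own thread.

```ocaml
(* Sketch only: hypothetical names, not the Haxe compiler's task runner.
   Assumes OCaml 5 (Domain is in the standard library). *)

(* Stand-in for the compiler's Error exception raised during the analyzer. *)
exception Analyzer_error of string

(* Run [f] and capture any exception as a value, so nothing escapes the worker. *)
let guard (f : unit -> 'a) : ('a, exn) result =
  try Ok (f ()) with exn -> Error exn

(* Run each task on its own domain. Every domain is joined before any
   failure is reported, so no worker is left unobserved; the first failure
   in task order is then re-raised on the calling thread. *)
let run_all (tasks : (unit -> 'a) list) : 'a list =
  let domains =
    List.map (fun task -> Domain.spawn (fun () -> guard task)) tasks in
  let results = List.map Domain.join domains in
  List.map (function Ok v -> v | Error exn -> raise exn) results
```

With this shape, a task that raises `Analyzer_error` still counts as finished from the runner's point of view, and the error surfaces in the caller only after all other tasks have been joined.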

Simn closed this Mar 30, 2025
Simn deleted the revert_the_world branch March 30, 2025 12:32