Difference between MLton and MPL block-sizes ? #112

Closed
hai-nguyen-van opened this issue Feb 19, 2020 · 6 comments

@hai-nguyen-van
Contributor

hai-nguyen-van commented Feb 19, 2020

Hello! I've been trying to compile our solver (Heron) with MPL instead of MLton. Our goal is to switch to MPL in order to enjoy parallelism in the future.

No runtime-args. Most of our regression tests pass, but not tests that are memory-greedy. In such cases, our program is simply killed without any information.

With runtime-args. I tried to increase the block-size to 256M. Results are slightly better but now the output is

ERROR  [P00|gc/chunk.c:326]: Out of memory. Unable to allocate new chunk of size 268435456.
Aborted (core dumped)

Our solver still executes correctly with MLton. Do you think this problem is related to block sizes?

Known issues. Int.toString and Real.fromString are used but not in parallel so it should not be relevant. None of the unsupported MLton Features are used.

Thanks. Best,

@shwestrick
Collaborator

shwestrick commented Feb 21, 2020

So your programs are definitely running out of memory. Often, this triggers an error message like the one you are seeing, but not always, because Linux might also kill your program without warning.

Do you have a sense for how big the working sets are for the failing programs? To estimate working set size, you could measure the resident set size of the program when compiled with MLton. And how much memory does your machine have?
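
On Linux, one way to gather both numbers is to read them from /proc. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name vm_field_kb is made up for illustration. In /proc/<pid>/status, VmRSS is the current resident set size and VmHWM is its peak; MemTotal in /proc/meminfo is the machine's total memory.

```shell
#!/bin/sh
# Hypothetical helper: print the value (in kB) of one "Key:  value kB"
# field from text on stdin, e.g. VmRSS, VmHWM or MemTotal.
vm_field_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Usage (Linux only; replace <pid> with the process id of the running program):
#   vm_field_kb VmRSS    < /proc/<pid>/status   # current resident set size
#   vm_field_kb VmHWM    < /proc/<pid>/status   # peak resident set size
#   vm_field_kb MemTotal < /proc/meminfo        # total memory of the machine
```

The peak value is usually the more useful of the two here, since a program that gets killed may not be at its maximum when sampled.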

Note that parallel programs use more memory than sequential programs: on P processors it's possible to need as much as a factor P times more memory.

@hai-nguyen-van
Contributor Author

hai-nguyen-van commented Feb 23, 2020

Oh I see. So here is the output of ps u for one of the regression tests:

| ./heron --use examples/PowerWindow.tesl | MLton | MPL* |
|---|---|---|
| Virtual set size (in KB) | 95,192 | 4,735,836 |
| Resident set size (in KB) | 85,824 | 4,249,488 |

*: execution made with runtime-args: @mpl procs 1 --
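
For scale, the gap can be expressed as a single ratio. This is a small sketch that uses only the two resident set sizes reported above; the function name rss_ratio is purely illustrative.

```shell
#!/bin/sh
# Resident set sizes in KB, copied from the figures reported above.
mlton_rss_kb=85824
mpl_rss_kb=4249488

# Print the ratio of two sizes with one decimal place.
# awk is used because POSIX shell arithmetic is integer-only.
rss_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

rss_ratio "$mpl_rss_kb" "$mlton_rss_kb"   # prints 49.5
```

Since the MPL run used procs 1, a factor-of-P overhead alone would not be expected to produce a gap of roughly 50x.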

I believe there's no workaround then... except splitting the working sets to share among procs 🤔

@shwestrick
Collaborator

I'm surprised to see such a large gap. If you're willing to share, could I take a look at your code?

@hai-nguyen-van
Contributor Author

hai-nguyen-van commented Feb 23, 2020

For sure, just clone the repository and check out commit 9a927476bec865fbbf9f09e4c01b6c22dcca8fd2

git clone https://github.com/heron-solver/heron.git
cd heron
git checkout 9a927476bec865fbbf9f09e4c01b6c22dcca8fd2

To compile with MLton, simply do

make

... or with MPL:

docker run -v "$PWD":"$PWD" -w "$PWD" shwestrick/mpl make CC=mpl

Then, to run the solver on the aforementioned example:

./heron --use examples/PowerWindow.tesl

EDIT: Some important optimizations have been made to the working sets, and the MPL-produced program is no longer killed. Still, memory usage remains relatively high. So please check out the above-mentioned commit to see the program being killed.

@shwestrick
Collaborator

Ah, I've figured it out. Here's a commit that fixes the memory problem: shwestrick/heron@e41059c

The GC in MPL is integrated with the scheduler and is actually disabled entirely if the fork/join library is not loaded. So first, I needed to put this in the .mlb:

$(SML_LIB)/basis/fork-join.mlb
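
A quick way to check whether a given .mlb file already pulls in that library is to search for the path as a literal string. The sketch below assumes nothing beyond a POSIX grep; the function name and the file name in the usage line are placeholders, not names from the Heron repository.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given .mlb file contains the
# fork-join library path as a literal string.
loads_fork_join() {
    # -F: fixed-string match, so "$(" and "." are not treated as regex syntax.
    # -q: quiet; only the exit status matters.
    grep -qF '$(SML_LIB)/basis/fork-join.mlb' "$1"
}

# Usage (file name is a placeholder):
#   loads_fork_join project.mlb && echo "fork/join library is loaded"
```

This is a textual check only, so it would also match a line that has been commented out.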

Then, there is also a heuristic in the GC design which I had to work around. At the moment, MPL avoids performing GC at the "top level" because it often interferes with parallelism. For sequential programs this is obviously broken, but we've only been running MPL on parallel programs so far! So, I put in a ForkJoin.par to trick MPL into thinking that your program is parallel ;)

With these changes, when I run ./heron --use examples/PowerWindow.tesl I see a max resident set size of 52428 KB. Much more reasonable!

@hai-nguyen-van
Contributor Author

Great, this is a good start. I will investigate to what extent we can enjoy MPL's parallelism for our problem!
