Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSP: tasks seem to be executed duplicately and concurrently #2818

Closed
pieter-bos opened this issue Oct 4, 2023 · 4 comments · Fixed by #2980
Closed

BSP: tasks seem to be executed duplicately and concurrently #2818

pieter-bos opened this issue Oct 4, 2023 · 4 comments · Fixed by #2980
Milestone

Comments

@pieter-bos
Copy link
Contributor

Setup:

import mill._
import scalalib._
import os._

object tasks extends JavaModule {
    def directory = T.sources {
        Thread.sleep(1000)
        os.write(T.dest / (Thread.currentThread.getId.toString + "-" + System.nanoTime), "hello")
        T.dest
    }

    def resources = T { 
        directory()
    }

    def sources = T { 
        directory()
    }

    def allStuff = T { 
        resources()
        sources()
        ()
    }
}

As far as I understand mill's concurrency model: different tasks may be executed in parallel and their order is not guaranteed (other than implied by task dependencies). However, one specific task is always run with a fresh T.dest and only once at a time. This seems to be the case if I run e.g. mill -j 8 tasks.allStuff here: directory is evaluated only once (assumed from run time) and from a fresh T.dest (afterwards there is only one file there).

Via BSP it seems this is not the case. I tried importing this build configuration from IntelliJ, and can also recreate this with a small python script. The idea is I request buildTarget/sources and buildTarget/resources without waiting for a response for the first - as IntelliJ also seems to do. Afterwards I can see that directory was run twice in parallel within the same T.dest:

$ ls out/tasks/directory.dest/
47-13089034942802  49-13089037254006

It seems to me that this should never be possible, though I apologize if I'm misunderstanding something :)

@pieter-bos
Copy link
Contributor Author

All the above was tested on 0.11.4. I notice there's significant changes to BSP stuff in 0.11.5, but the issue does still reproduce in 0.11.5.

@lefou
Copy link
Member

lefou commented Oct 4, 2023

I'd assume, that if the source and the resource request run in parallel, and as there is no previous cached result, both also need to evaluate the directory target (for the first time). Mill has no mechanism to detect and avoid concurrent execution of the same task. This is rarele needed, as the task dependency tree only runs each dependency once, even in parallel mode. But in case you run two Mill processes in parallel or BSP runs multiple requests in parallel, this can become an issue.

@lefou
Copy link
Member

lefou commented Oct 4, 2023

I opened discussion #2820 to address this

@lefou
Copy link
Member

lefou commented Jan 18, 2024

I opened PR #2980 with a fix.

Thank you @pieter-bos for this excellent bug description and the python script to reproduce!

lefou added a commit that referenced this issue Jan 21, 2024
…2980)

With this change, we synchronize evaluations of the same terminal task.
This isn't necessarily needed for normal Mill executions, but in a BSP
context, where multiple requests where handled concurrently in the same
Mill instance, evaluating the same task concurrently can happen.

We don't synchronize multiple Mill-instances (e.g. run in two shells) or
multiple evaluator-instances (which should have different `out`-dirs
anyway).

Fix: #2818

Pull request: #2980
@lefou lefou added this to the 0.11.7 milestone Jan 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants