Running `bacalhau get` fails to merge results when downloading (#3677)
Yes, this is an expected behaviour, as both executions are generating conflicting output files. The job works when both executions generate randomly named files under the output directory instead.
The result directory structure looks like:
Yes, I understand this behavior is expected given the current implementation of the code, but from a UX perspective it is unexpected, IMO. We're basically saying "you're holding it wrong" for a relatively simple task, and implying we expect users to write jobs that name all their output files differently when run on multiple nodes. Perhaps we could make the merge behavior something users opt into rather than out of?
Agreed - we should save both simultaneously and tell the user that, e.g. `/<output name>-<short node ID>` for all nodes.
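As a rough illustration of that naming scheme — the `perNodePath` helper and the 8-character node-ID truncation are assumptions for the sketch, not bacalhau code:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// perNodePath derives a per-node output path of the form
// <output name>-<short node ID>, as suggested above.
func perNodePath(output, nodeID string) string {
	short := nodeID
	if len(short) > 8 {
		short = short[:8] // assumed truncation length for the "short node ID"
	}
	return fmt.Sprintf("%s-%s", output, short)
}

func main() {
	// Conflicting output.txt files from two nodes land in distinct directories.
	fmt.Println(filepath.Join(perNodePath("output_custom", "QmdXyz1234567890"), "output.txt"))
	fmt.Println(filepath.Join(perNodePath("output_custom", "QmAbc0987654321"), "output.txt"))
}
```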
Would something like:
be acceptable?
The problem is that you won't get deterministic outputs or a deterministic structure in the result directory. At the same time, I hate this kind of magic that works most of the time but fails at others. Also, we are not really doing a good job of merging stdout, stderr, or exit codes.

I also wouldn't group results by node ID, as the same node might run multiple executions for the same job, such as when we have a fat node in the network. We don't do this today, but we should. So the options can be:
Is determinism in output still something we care about? I'm not sure I see this as a problem, rather as a side effect of the system.

I hate the magic too, because it's the furthest thing from magic when it doesn't work. I don't think we should be merging outputs; it's almost always not what I want when I download the results of a job.
Hmm, yeah, this is a fair point. On the other hand, I like seeing which result came from which node. Sooo....
What if we group by node and execution, that is:
I am not sure what this should look like when there is more than one task. Thoughts?
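A sketch of that grouping, with made-up node and execution IDs (`resultPath` is a hypothetical helper, and the layout is one of the options being discussed, not settled behaviour):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resultPath groups downloaded results by node and then by execution, so a
// fat node running several executions of the same job keeps them distinct.
func resultPath(root, nodeID, executionID, output string) string {
	return filepath.Join(root, nodeID, executionID, output)
}

func main() {
	// Two executions on the same node no longer collide on output.txt.
	fmt.Println(resultPath("job-j-1cbabf4e", "n-abc123", "e-def456", "output_custom/output.txt"))
	fmt.Println(resultPath("job-j-1cbabf4e", "n-abc123", "e-789fed", "output_custom/output.txt"))
}
```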
We need to balance correctness against a great UX.

It is not a great user experience if users want to pipe the download command or do some sort of automation and the download path is not predictable. I understand they can play with wildcards, but it's still not a nice UX.

Why do users care about the node ID when reading results? I understand it might be useful for debugging, which users can find out by calling

Too many levels is not a great experience. We can always add a
Steps to reproduce:

    $ bacalhau get j-1cbabf4e-3ac9-4496-adb5-34d43248a308
    Fetching results of job 'j-1cbabf4e-3ac9-4496-adb5-34d43248a308'...
    error downloading job: cannot merge results as output already exists: /home/frrist/workspace/src/github.com/bacalhau-project/bacalhau/job-j-1cbabf4e/output_custom/output.txt. Try --raw to download raw results instead of merging them