Rationalize dummy and simulation modes #1859
Comments
Regarding simulation mode reliability: it currently crashes on reload, but I won't bother fixing that if we can remove the mode instead.
I've actually used both of these and appreciate having the separation there - for suite development, the simulation mode is great as it removes any particulars of the batch system when trying to identify tasks and the relationship between them. Dummy mode is great, especially when moving suites between machines: I like to be able to test that the job submission is set up properly without worrying about the particulars of the scripts, e.g. if not all the data is available yet, etc. I have a hard time remembering which one is which (simulation vs. dummy) but I like having the hierarchy of "check the tasks" => "check job submission" => "run live" without needing to change the suite (although I admittedly will sometimes drop the processor count request for the dummy mode, but I want to check that it works for real before going live!)
@trwhitcomb - good to see you're still keeping tabs on us :-) OK, this raises a couple of questions. Or a question and a statement, anyway:
Still, you have a point. Modified proposal:
(Regarding 2. - can script items, e.g. |
Not sure why having logs (and generating job scripts) is a good thing. Surely there's no need for logs etc. to be generated from a task that sleeps some number of seconds purely for workflow testing? Similarly, I don't see the benefit of creating a bunch of task processes with all the associated overheads (cylc message etc.) when just testing workflow.
Depends where you're running I guess, I don't know that it's something we can safely assume isn't needed. There are some weird and convoluted HPC platforms out there...
@arjclark - the point of a simulation mode IMO is simply to simulate running a suite without the huge overheads and wait-times of real jobs, not just to test the workflow (testing the scheduling is the most important aspect, of course, but it can be useful for other things too, e.g. to quickly populate a suite DB just as a real run would). Simulation by means of dummy jobs achieves this goal with zero complexity and maintenance cost to us - simply dummy out By contrast, the current simulation mode is quite complicated (and currently somewhat broken!) because we have to fake out job submission, job execution, task communications, and everything associated with those processes, and then ensure it doesn't break in future despite all those differences from live mode. Why should we bother with all that just to avoid the small overhead of running dummy jobs on the suite host? (if that's a problem you certainly won't be able to run the real suite!)
To be as clear as mud, I'll put my complete "modified proposal" down in one place: modified proposal
I think I'm OK with this. It simplifies the code, and keeps the ability to test the workflow and the job submission. For the list that @hjoliver gave, (2) and (3) may even be able to be controlled via an option switch rather than a separate mode (so there's just live mode and dummy mode, and dummy mode can use or ignore the job submission settings).
@trwhitcomb - that's good, just gotta convince @arjclark now 😁 . I like your dummy mode option switch idea. I |
@hjoliver - maybe one to discuss in our "issue prioritisation" conversation when we see you?
@arjclark - sure. BTW, I'm not quite as determined to dump simulation mode as the verbosity of this ticket might suggest. I just don't like the way it is currently implemented, and I think the marginal benefit (if any) over dummy mode does not justify maintaining it as is. However, another option might be a better-designed simulation mode modelled (in terms of simplicity) on the dummy mode: alternative proposal
We should probably also check that anything related to real job execution - timeouts, retries, event handlers - is disabled in simulation (and dummy?) mode (this may already be OK).
One issue that I've run into is that the simulation mode/dummy mode doesn't handle it well when you have tasks that use message triggers. I suppose you could handle this by explicitly specifying a dummy mode script, but as long as we're discussing switching how these modes work, since you need to specify the outputs from a task in the suite.rc file, it would be nice to actually emit those from the simulated task (in addition to any sleep commands) so that things dependent on the messages don't stall out.
@trwhitcomb - that's #1420 - easy enough to fix, and yes we might as well do both at once.
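To illustrate the message-trigger point above, here is a minimal Python sketch of a simulated task that emits its declared outputs after sleeping. The class `SimulatedTask`, the helper `ready`, and the message strings are all hypothetical names invented for this example; this is not Cylc's implementation, just one way the idea could look.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SimulatedTask:
    """Hypothetical stand-in for a task; not Cylc's real task class."""
    name: str
    outputs: list                 # custom output messages declared for the task
    sleep_seconds: float = 0.0    # pretend run time
    emitted: list = field(default_factory=list)

    def run(self, sleep=time.sleep):
        # Standard lifecycle message first, as a real job would report.
        self.emitted.append("started")
        sleep(self.sleep_seconds)
        # Emit every declared custom output so message triggers can fire.
        self.emitted.extend(self.outputs)
        self.emitted.append("succeeded")
        return self.emitted


def ready(trigger_message, upstream):
    """A downstream task is ready once its trigger message has been emitted."""
    return trigger_message in upstream.emitted
```

With this shape, a downstream task that triggers off a custom message is released as soon as the simulated upstream task finishes, instead of stalling because the message never arrives.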
[meeting]
(small, later: may need to do dummy and simulation separately - dummy is definitely small)
In dummy mode, tasks are supposed to run (real) dummy jobs instead of the real jobs.
In simulation mode, tasks are supposed to not run real jobs at all, just simulate job execution.
The purpose of both of these is really to test-run the suite quickly without running compute-intensive real jobs.
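The difference between the modes can be sketched in a few lines of Python. This is an illustration only, under the assumption that "dummy" means a real process doing no real work and "simulation" means no process at all; the function `run_task` and its return convention are made up for this example and do not reflect Cylc's code.

```python
import subprocess
import sys
import time


def run_task(mode, real_command, dummy_seconds=0.1):
    """Run one task under the given mode; return (exit_code, spawned_process)."""
    if mode == "live":
        # Real job: execute the task's actual command.
        return subprocess.run(real_command).returncode, True
    if mode == "dummy":
        # Real process, dummy work: sleep instead of running the real command.
        dummy = [sys.executable, "-c",
                 f"import time; time.sleep({dummy_seconds})"]
        return subprocess.run(dummy).returncode, True
    if mode == "simulation":
        # No process at all: pretend the job took some time and succeeded.
        time.sleep(dummy_seconds)
        return 0, False
    raise ValueError(f"unknown run mode: {mode}")
```

The dummy branch still exercises process creation and exit-status handling, which is why it can stand in for a real run more faithfully than the simulation branch, where every step has to be faked.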
As implemented, dummy mode is not very useful on real suites. Task script items (but not env-script, pre-script, etc.) are replaced with dummy scripting, but everything else is left alone - job submission and hosting, for example (and users shouldn't be submitting dummy jobs to a remote HPC with resource directives intended for a huge parallel model run...).
Simulation mode, on the other hand, probably shouldn't be assumed to be a reliable test of the system because it has to fake a bunch of important processes associated with job submission and execution, and it doesn't generate the usual job output files or populate the suite DB properly. Even if it was reliable, I can't really think of a use-case for it that wouldn't be covered by a better dummy mode.
Proposal
script item
This will result in a dummy mode that works for any real suite: every task will run dummy background jobs on the suite host. (Potentially we could rename it "simulation mode").
If a user wants to do more complex things, such as run dummy jobs on the real task hosts, that can be done with a small built-for-purpose test suite, or by editing the real suite.
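As a rough sketch of what "dummy background jobs on the suite host" could mean in practice, the Python function below rewrites one task's settings. It assumes the proposal amounts to replacing the script, dropping the other script items, and overriding submission method, host, and directives. The function `dummify` and the flat dictionary keys are illustrative only and are not Cylc's actual configuration schema.

```python
import copy


def dummify(task_cfg, sleep_seconds=10):
    """Return a copy of a task config that runs a dummy background job locally.

    The key names here are hypothetical, chosen for illustration.
    """
    cfg = copy.deepcopy(task_cfg)
    # Replace the real work with a sleep.
    cfg["script"] = f"sleep {sleep_seconds}"
    # Drop the other script items so no real setup runs (assumed behaviour).
    for key in ("env-script", "pre-script"):
        cfg.pop(key, None)
    # Run in the background on the suite host, ignoring the real platform.
    cfg["job submission method"] = "background"
    cfg["host"] = "localhost"
    # Resource directives meant for a big parallel run make no sense here.
    cfg["directives"] = {}
    return cfg
```

Because the function works on a deep copy, the real settings stay untouched, so the same suite definition could be run live afterwards without edits.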
@cylc/core - do you agree with this?
(motivation: to diagnose this problem #1857 (comment) I had to run a large real suite in dummy mode, and it required extensive surgery to get the result that I'm suggesting should be automatic in dummy mode).