FabSim3 provides multi-threading functionality to decrease the total job submission time for a large number of ensemble/replica runs.
However, when the number of ensemble runs is very high (>30k), the submission process may fail because of the high number of SSH connections:
...
raise SSHException("SSH session not active")
paramiko.ssh_exception.SSHException: SSH session not active
There are a number of ways we can handle this issue, but all of them will increase the total submission time.
What do you think, @djgroen? What are your suggestions for tackling this issue?
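One generic way to trade submission speed for robustness is to cap how many submissions, and therefore SSH sessions, run at the same time. The sketch below assumes a hypothetical `submit_job` function standing in for a single job submission over SSH; it is not FabSim3's API, and the cap of 8 is an arbitrary placeholder.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_SSH_SESSIONS = 8  # hypothetical cap; tune to what the remote host tolerates


def submit_job(job_id: int) -> int:
    """Hypothetical stand-in for submitting one job over its own SSH session."""
    time.sleep(0.01)  # simulate network latency of the remote call
    return job_id


def submit_all(job_ids, max_sessions=MAX_SSH_SESSIONS, submit=submit_job):
    """Submit every job while never running more than max_sessions at once.

    The thread pool only has max_sessions workers, so at most that many
    submit() calls (and hence SSH sessions) are active simultaneously.
    Results are returned in the same order as job_ids.
    """
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        return list(pool.map(submit, job_ids))
```

Lowering `max_sessions` reduces the load on the remote SSH server at the cost of a longer total submission time, which is the trade-off mentioned above.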
Any command with run() in job scales the number of SSH connections with the job count. These commands need to be merged so that file staging is done for all jobs (or at least for several jobs) in one go, rather than separately for each job.
That line changes the permissions of one file per job. It may actually be straightforward to refactor, because I think you can just chmod all subdirectories in that results directory with a single command at the right time (i.e. after everything has been uploaded)?
Lastly, there are several more run commands after this line.
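The idea of replacing one remote command per job with a few merged commands can be sketched as follows. This is a minimal illustration, not the change that was actually committed: `batched_chmod_commands`, the `755` mode and the `max_len` limit are all hypothetical. It groups many job paths into as few `chmod` command strings as possible, so each string could be sent in a single remote call; `max_len` keeps every command below a conservative length because remote shells limit the size of an argument list.

```python
import shlex


def batched_chmod_commands(paths, mode="755", max_len=100_000):
    """Group many paths into as few `chmod -R` command strings as possible.

    Hypothetical helper: each returned string is meant to be executed in
    one remote call instead of issuing one call per job directory.
    """
    prefix = f"chmod -R {shlex.quote(mode)}"
    commands, batch, length = [], [], len(prefix)
    for path in paths:
        arg = shlex.quote(path)  # protect paths containing spaces etc.
        # Start a new command if appending this path would exceed max_len.
        if batch and length + 1 + len(arg) > max_len:
            commands.append(" ".join([prefix, *batch]))
            batch, length = [], len(prefix)
        batch.append(arg)
        length += 1 + len(arg)  # one separating space plus the argument
    if batch:
        commands.append(" ".join([prefix, *batch]))
    return commands
```

If all job directories live under one results directory, a single `chmod -R` on that parent after the upload finishes is simpler still; batching like this only helps when the paths do not share a convenient parent.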
I reduced the number of SSH connections during job submission (c8cfadc, d162081).
Federica (@FGugole), could you please confirm that the new implementation fixes your problem?
Also, could you please mention the campaign size you used and the total submission time?
Yes, the new implementation fixed my problem and I was able to submit a campaign with 60k samples. The job submission took a long time (at least 10 days), but it was successful!