HTCondor support #100
cc @jrbourbeau
…On Wed, Jul 18, 2018 at 7:22 AM, S ***@***.***> wrote:
Does this project have plans for HTCondor support?
|
I don't think we have any current plans to implement an HTCondorCluster, but it does seem in scope for this project. If an interested developer wanted to give it a try, I suspect it would be relatively straightforward. That said, I don't have any experience with HTCondor, so take my measure of "straightforward" with a grain of salt. |
I think that @jrbourbeau has a prototype that he's testing on an in-house cluster
|
Sorry for the delayed response. @mrocklin is correct; I'm currently working on an |
@jrbourbeau That's great! Looking forward to it. |
@jrbourbeau any update? Do you have something to share? Maybe @szs8 would like to take a look at this? |
Apologies for letting this linger. I got side-tracked with other work and haven't come back to this yet. @szs8 if you, or anyone else, are interested in contributing here, please feel free to take over. |
@jrbourbeau Where does the existing implementation live? |
@jrbourbeau do you have any prototype of HTCondorCluster to share, as mentioned above? |
Hi |
If anyone with access to an HTCondor cluster gives it a try, I'd be happy to help! |
Hi |
I was just asked by a supervisor about this compatibility. Google searches brought me to @matyasselmeci and their repository here. I'm hoping they or someone else can give another status update. |
Hi @djhoese and others, my project was written to work with dask/distributed instead of dask/dask-jobqueue, so some work would have to be done to adapt it. My code differs from the other batch system interfaces primarily because we wanted it to work without a shared filesystem: the user builds a special tarball (using build-worker-tarball or build-worker-tarball-conda), and the worker jobs use HTCondor file transfer to transfer it. The main difficulty I ran into in the end was that I couldn't figure out a way of building the worker tarball that was sufficiently user-friendly; because of this, and because of a lack of interest in our department, the project fizzled. I don't have time to work on the project anymore, but I'm happy to answer any questions if someone wants to use the code as a starting point... |
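For readers unfamiliar with HTCondor file transfer, here is a minimal sketch of the submit-description commands that ship a worker tarball to the execute node instead of relying on a shared filesystem. The helper `file_transfer_submit`, the tarball name `dask_worker.tar.gz`, and the wrapper script `run_worker.sh` are hypothetical placeholders, not names taken from dask_condor; `should_transfer_files`, `when_to_transfer_output`, and `transfer_input_files` are standard HTCondor submit commands.

```python
def file_transfer_submit(tarball: str, wrapper: str, scheduler_address: str) -> str:
    """Render a submit description that transfers a worker tarball.

    Hypothetical helper: `wrapper` is assumed to be a script that unpacks
    `tarball` in the job's scratch directory and then starts dask-worker.
    """
    commands = {
        "universe": "vanilla",
        "executable": wrapper,
        # The wrapper receives the scheduler address as its only argument.
        "arguments": f'"{scheduler_address}"',
        # Ask HTCondor to copy inputs to the execute node (no shared filesystem).
        "should_transfer_files": "YES",
        "when_to_transfer_output": "ON_EXIT",
        "transfer_input_files": tarball,
    }
    lines = [f"{key} = {value}" for key, value in commands.items()]
    lines.append("queue")  # submit a single job
    return "\n".join(lines) + "\n"


print(file_transfer_submit("dask_worker.tar.gz", "run_worker.sh", "tcp://10.0.0.1:8786"))
```

The hard part described above (building a user-friendly tarball) is not addressed here; the sketch only shows how little the submit side needs once such a tarball exists.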
@matyasselmeci That's too bad. Do you have any idea how much work would be needed to migrate your distributed PR to jobqueue? What if you assumed that users had a shared/networked file system? |
I haven't looked at jobqueue at all, so I don't know how much the interface is different from distributed. I can get you an estimate later this week. Since it sounds like there's interest in this again, I'll talk to my PI about getting some time to work on it. |
@matyasselmeci That would be great. My "secret" goal is to possibly get a JupyterHub instance running on the University of Wisconsin clusters, or at least at the SSEC where I work. I assume you also work on campus? If you want to meet in person to discuss some of this stuff and my ideas, let me know. |
@matyasselmeci or @simone-codeluppi, again I would be happy to help extract the relevant bits of your scripts to a JobQueueCluster implementation for HTCondor. I think this should be quite easy if you've already run a Dask cluster using |
@guillaumeeb yes, it looks like the changes wouldn't be too severe. One thing is that dask_condor uses HTCondor's Python bindings (which include some compiled code) to submit and control jobs instead of command-line tools (as the other JobQueueClusters do), so it may have to stay a separate project. @djhoese yes, I work at Comp Sci. My colleagues and I would be very interested in meeting with you and discussing your projects; mind if I contact you via email? (I found your SSEC email address in the UW directory, if that's a good one to use.) |
@matyasselmeci Yes that works. |
I've started work on this and it shouldn't be too difficult; I can have a branch in dask_condor that's usable for external testing in a week or two. @mrocklin, if you're OK with adding an optional dependency on htcondor (extras_require=dict(htcondor="htcondor>=8.6.0") in setup.py), then I can turn it into a pull request after that. |
Probably not my decision. I'd check with @guillaumeeb
|
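The optional htcondor dependency proposed in this thread is usually paired with an import guard, so that users who never touch HTCondor are not forced to install its bindings. A minimal sketch of such a guard, assuming the bindings are imported lazily only on the HTCondor code path; `require_htcondor` is a hypothetical helper, not part of dask-jobqueue.

```python
import importlib
from types import ModuleType


def require_htcondor() -> ModuleType:
    """Import the optional HTCondor bindings or fail with an install hint.

    Hypothetical helper: meant to be called only from HTCondor-specific code,
    so that PBS/SLURM users never need the bindings installed.
    """
    try:
        return importlib.import_module("htcondor")
    except ImportError as exc:
        # Re-raise with an actionable message; the version floor mirrors the
        # extras_require entry proposed in the thread.
        raise ImportError(
            "HTCondor support requires the 'htcondor' Python bindings "
            "(htcondor>=8.6.0); install them with `pip install htcondor`."
        ) from exc
```

Keeping the import inside a function, rather than at module top level, is what lets `import dask_jobqueue` keep working on machines without HTCondor.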
I've no strong argument against it, providing it's not too big and doesn't break other things! I have some questions, though:
My main concern is future ease of maintenance; I'd rather not have a solution that is too specific to HTCondor. But I think the priority is to have a solution! |
I'm using JobQueueCluster as the base class; I'll try to make it as compatible as possible and will follow the conventions the other batch system implementations are using. |
I mostly have to override start_workers and stop_workers/stop_jobs, since it doesn't work by creating a submission script and running it. If you want, I can split out some of the advanced features (e.g. file transfer) into a subclass that won't be part of the PR and that I can maintain separately. |
@matyasselmeci - thanks for sticking around here and for your continued interest. I have not personally used HTCondor, so forgive me if my questions are a bit off base. I'm mostly interested in discussing whether there is value in having the It seems we could basically template out a job script (just like in the PBSCluster) that looks like:
and then the submit/cancel commands:
submit_command = 'condor_submit'
cancel_command = 'condor_rm'
It seems like additional arguments are possible in the job script to specify wall time, resources, etc. Now, I could be missing a fundamental part of HTCondor, so please correct me if I'm wrong here. |
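To make the templating idea concrete, here is a minimal sketch of rendering an HTCondor submit description (the HTCondor analogue of a PBS job script) that starts `dask-worker` processes and would then be handed to `condor_submit`. It assumes a shared filesystem and environment (hence `getenv = True`); the helper `render_submit_description` and its parameters are hypothetical and are not dask-jobqueue's API. `request_cpus`, `request_memory` (in MB by default), `getenv`, and `queue N` are standard submit commands.

```python
def render_submit_description(
    dask_worker: str,
    scheduler_address: str,
    cores: int,
    memory_mb: int,
    n_workers: int = 1,
) -> str:
    """Render an HTCondor submit description that launches dask-worker jobs.

    Hypothetical helper; assumes `dask_worker` is a path visible on every
    execute node (shared filesystem).
    """
    worker_args = f"{scheduler_address} --nthreads {cores} --memory-limit {memory_mb}MB"
    lines = [
        "universe = vanilla",
        f"executable = {dask_worker}",
        # New-style (double-quoted) argument syntax.
        f'arguments = "{worker_args}"',
        f"request_cpus = {cores}",
        f"request_memory = {memory_mb}",  # interpreted as MB
        "getenv = True",  # inherit the submitter's environment
        # One `queue` statement can create many jobs in a single cluster.
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"


print(render_submit_description("/usr/bin/dask-worker", "tcp://10.0.0.1:8786",
                                cores=4, memory_mb=4000, n_workers=2))
```

Note the final `queue 2`: unlike a PBS script, one HTCondor submission can create several jobs at once, which is where the job-ID bookkeeping gets more involved.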
It's a little more complicated than that; HTCondor is used for submitting multiple jobs at the same time, so the job ID has the form Basically the devil is in the details. It's possible to do it that way, but you'll need to do a lot of pre-processing of the parameters anyway, or else it won't behave the way you expect it to. |
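One place the job-ID detail shows up in practice: a single `condor_submit` call with `queue N` creates one cluster containing N processes, so per-worker bookkeeping needs `cluster.proc` identifiers rather than one scalar ID per submission. A minimal sketch, assuming `condor_submit` prints its usual `N job(s) submitted to cluster C.` summary line; `parse_submit_output` is a hypothetical helper.

```python
import re

# Matches condor_submit's summary line, e.g. "3 job(s) submitted to cluster 42."
_SUBMIT_RE = re.compile(r"(\d+) job\(s\) submitted to cluster (\d+)\.")


def parse_submit_output(output: str) -> list[str]:
    """Return one "cluster.proc" job ID per job created by condor_submit."""
    match = _SUBMIT_RE.search(output)
    if match is None:
        raise ValueError(f"could not find a cluster ID in: {output!r}")
    n_jobs, cluster = int(match.group(1)), match.group(2)
    # Process IDs within a cluster are numbered from 0.
    return [f"{cluster}.{proc}" for proc in range(n_jobs)]


sample = "Submitting job(s)...\n3 job(s) submitted to cluster 42.\n"
print(parse_submit_output(sample))  # ['42.0', '42.1', '42.2']
```

Each ID can then be passed individually to `condor_rm`, which accepts either a whole cluster or a `cluster.proc` ID, when scaling down one worker at a time.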
I've no strong opinion here. As I said before, I would prefer to have the same scheme for all implementations, but I'm happy with a different working solution as long as it still fits in dask-jobqueue. If you've started work on this and have some interest in it, I say keep going and ping us as soon as you have something that's understandable, so that we can give feedback! |
Just a note: it may be useful to look at |
Closed by #245. |