
Extend OARCluster implementation to let OAR take into account the memory parameter #595

Merged: 4 commits into dask:main on Nov 10, 2022

Conversation

ychiat35 (Contributor)

Following issue #594, we propose an extension of the existing OARCluster implementation to let OAR take the memory parameter into account.

The OAR scheduler does not deal with memory internally. By default, it is not possible to ask OAR to reserve a specific amount of memory on the requested computing resources (e.g., one core with 256 GB of memory). However, it is possible to leverage OAR resource properties to ensure that the selected resources have at least the desired amount of memory.

Since OAR property names are not standardized, they may differ from one cluster to another. Consequently, we introduce a new parameter in the OARCluster class: oar_mem_core_property_name. It lets users specify the name of the memory property of their own OAR cluster. This property is used by adding a new #OAR -p line to the OAR submission. If the parameter is omitted or set to None, the current behavior of the OARCluster class is unchanged, but users are warned that the memory parameter will not be taken into account by OAR.
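
For illustration, here is a minimal sketch of how the proposed parameter might be used (the parameter name follows the description above; it is renamed in the follow-up discussed below). The property name "memcore" is an assumption, since property names vary between OAR clusters; the properties actually defined on a cluster can be listed with the oarnodes command.

```python
from dask_jobqueue import OARCluster

# Sketch only: "memcore" is a hypothetical per-core memory property name,
# to be replaced by the property actually defined on your OAR cluster.
cluster = OARCluster(
    queue="default",
    cores=4,
    memory="16GB",
    # With this set, an "#OAR -p" line is added to the submission so that
    # only resources with enough memory per core are selected.
    # If omitted or None, behavior is unchanged and a warning is emitted.
    oar_mem_core_property_name="memcore",
)
cluster.scale(jobs=2)  # submit two OAR jobs running Dask workers
```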

@guillaumeeb (Member) left a comment

I left a few comments on details, but on the general principle this looks good to me! I don't see any blocker to getting this in, especially if it can ease the use of dask-jobqueue on OAR-based clusters!

Review comments were left on: dask_jobqueue/jobqueue.yaml, dask_jobqueue/oar.py, and dask_jobqueue/tests/test_oar.py.

@ychiat35 (Contributor, Author) commented Nov 4, 2022

Thanks a lot for your quick review!
Updates following your comments have just been committed. About your "worker_memory/worker_cores calculation" question, you can find my explanation above. I hope it's clear; do not hesitate if you have other comments :)

@guillaumeeb (Member) left a comment

Thanks @ychiat35, this looks good to me!

@guillaumeeb merged commit baabb4a into dask:main on Nov 10, 2022
@lesteve (Member) commented Nov 10, 2022

A bit late to the party, sorry! Here are the few comments I had; it would be very welcome if they could be tackled in a further PR:

  • use memory_per_core_property rather than mem_core_property to be more consistent with the memory parameter and also to be more explicit
  • test the case where there is already a property defined and you need to combine both properties on the same line (I don't think it is tested in this PR but maybe I am wrong and I missed it)
  • there is no way to silence the warning right now, right? So maybe have something like memory_per_core_property='not_applicable' for the clusters where there is no such property (see the sketch after this list). If you tell me this rarely happens in practice then maybe it is fine to forget about this item.
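
As an illustration of the last two points, here is a rough sketch (not the actual dask-jobqueue code) of how a job class could combine an already-defined property with the memory constraint on a single #OAR -p line, and skip the warning when a sentinel value is passed. The function name, the 'not_applicable' sentinel, the memory units, and the SQL-like AND syntax are assumptions for illustration.

```python
import warnings


def build_property_line(memory_property, mem_per_core_mb, existing_property=None):
    """Sketch: build a single ``#OAR -p`` line from optional constraints."""
    if memory_property is None:
        # No property name given: OAR cannot enforce the memory request.
        warnings.warn("The memory parameter will not be taken into account by OAR.")
        memory_constraint = None
    elif memory_property == "not_applicable":
        # Sentinel for clusters without such a property: stay silent.
        memory_constraint = None
    else:
        memory_constraint = f"{memory_property}>={mem_per_core_mb}"

    parts = [p for p in (existing_property, memory_constraint) if p]
    return ("#OAR -p " + " AND ".join(parts)) if parts else None


# Both an existing property and the memory constraint end up on one line:
print(build_property_line("memcore", 4096, existing_property="gpu_count=1"))
# #OAR -p gpu_count=1 AND memcore>=4096
```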

More questions (not related to this PR in particular but to OAR support more generally):

  • were you running the tests on an OAR cluster?
  • it would be great to add CI for OAR, if there were a single Docker image with everything that is needed (similar to the SLURM Docker image we are using for the CI), or maybe with oar-docker if such an image does not exist

@ychiat35 (Contributor, Author)

Thanks @lesteve for your comments! Please find my commit in #598

  • I renamed the parameter to memory_per_core_property_name to be more explicit.

  • I also added the possibility to silence the warning for clusters where the property does not exist; a unit test covering this case was added at the end of test_oar.py (see the sketch after this list).

  • About the case where there is already a property defined, the unit test normally already exists. At the end of the test_header() part, I test memory_per_core_property combined with a gpu_count=1 request. Does that correspond to your expectation?
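
A small sketch tying these points together, assuming the renamed parameter from #598 and a hypothetical "memcore" property; the 'not_applicable' sentinel follows the suggestion above (check the merged #598 for the exact value), and the generated header can be inspected by printing the job script.

```python
from dask_jobqueue import OARCluster

# Cluster whose OAR properties include a per-core memory property.
cluster = OARCluster(
    cores=4,
    memory="16GB",
    memory_per_core_property_name="memcore",  # hypothetical property name
)
# Inspect the generated submission header (including the "#OAR -p" line).
print(cluster.job_script())

# Cluster without such a property: silence the warning explicitly.
quiet_cluster = OARCluster(
    cores=4,
    memory="16GB",
    memory_per_core_property_name="not_applicable",
)
```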

By the way, the OAR clusters that I use are Grid5000 and igrida :)

@lesteve (Member) commented Nov 22, 2022

> About the case where there is already a property defined, the unit test normally already exists.

Yep I missed it somehow, thanks!
