Documentation feedback #40
Here are a few high level thoughts on the current documentation:
Looking at the main example on the main page, I'm curious whether it is realistic:
```python
from dask_jobqueue import PBSCluster
cluster = PBSCluster(processes=6, threads=4, memory="16GB")
cluster.start_workers(10)

from dask.distributed import Client
client = Client(cluster)
```
Should we include project, queue, resource specs, and other keywords that might both be necessary for realistic use and also recognizable to users of that kind of system? Similarly I think it would be very useful to include a few real-world examples in the example deployments documentation. I suspect that this was the original intent of that page (nice idea!). Perhaps we can socialize this on the pangeo issue tracker and ask people to submit PRs for their clusters?
I recommend that we remove the history section from the main page.
Description of how it works
My experience trying to explain these projects to users of HPC systems is that most of them are familiar with job scripts. I wonder if we might include a "How does this work?" section that shows the job script we generate, and explains that we submit it several times.
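To illustrate what such a "How does this work?" section might show, here is a hedged sketch: a plain-Python function that renders a simplified PBS job script from the same keywords the examples in this thread use. The template below is an illustration only, not the exact script dask-jobqueue generates, and `pbs_job_script` is a hypothetical helper invented for this sketch.

```python
def pbs_job_script(queue, project, walltime, resource_spec,
                   processes, threads, memory):
    """Render a simplified PBS job script for one dask worker job.

    A cluster object would submit a script like this once per call to
    start_workers(), so ten worker jobs means ten submissions of the
    same script.
    """
    return f"""#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -q {queue}
#PBS -A {project}
#PBS -l {resource_spec}
#PBS -l walltime={walltime}

dask-worker --nthreads {threads} --nprocs {processes} \\
    --memory-limit {memory} <scheduler-address>
"""

script = pbs_job_script(queue='regular', project='DaskOnPBS',
                        walltime='02:00:00',
                        resource_spec='select=1:ncpus=24:mem=100GB',
                        processes=6, threads=4, memory='16GB')
print(script)
```

Showing users the generated script next to the Python call that produced it maps each keyword argument onto a `#PBS` directive they already recognize.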
I agree with all the suggestions.
The main example on the main page is realistic to me (I probably updated it according to my use):
```python
import os

cluster = PBSCluster(queue='regular',
                     project='DaskOnPBS',
                     local_directory=os.getenv('TMPDIR', '/tmp'),
                     threads=4,
                     processes=6,
                     memory='16GB',
                     resource_spec='select=1:ncpus=24:mem=100GB')
```
The `interface` keyword is important too and should appear.
One thing worth noting under "how it works" is that the Dask scheduler is started on the host where the JobQueueCluster is initialized: either the Jupyter Notebook node or, in typical use, a login/interactive node of the HPC cluster. This is not clear to all users, and it may deserve some more thought.
From my point of view, this is fine for prototyping or online analysis of data, but we should perhaps propose a more appropriate way to submit batch processing. My current opinion is to submit the main Python script to the job queueing system with enough resources for the scheduler and main process (say, 4 CPUs and 20 GB of RAM) and with a long enough wall time for the computation (say, 24 hours). Workers would then be spawned through the job queueing system from that node by the JobQueueCluster, potentially with a shorter walltime and different resource needs.
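A minimal sketch of what that batch pattern could look like, assuming PBS directives and a hypothetical `analysis.py` as the main script (resource numbers taken from the figures above; this is an illustration, not a recommended configuration):

```shell
#!/usr/bin/env bash
# Hypothetical job script for the *main* process: the scheduler and
# driver run inside this long-lived PBS job, and analysis.py would
# itself create a JobQueueCluster that submits shorter-lived worker
# jobs back to the queue.
#PBS -N dask-main
#PBS -q regular
#PBS -l select=1:ncpus=4:mem=20GB
#PBS -l walltime=24:00:00

python analysis.py
```

The design trade-off is that the driver job must outlive the whole computation, while worker jobs can cycle through the queue with shorter walltimes.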
I provided this on the dask documentation issue but I'll paste it here as well:
```python
cluster = PBSCluster(processes=18,
                     threads=4,
                     memory="6GB",
                     project='P48500028',
                     queue='premium',
                     resource_spec='select=1:ncpus=36:mem=109G',
                     walltime='02:00:00',
                     interface='ib0')
```
This is what I've been using on NCAR's HPC system Cheyenne.
Thanks @jhamman!
Looking at the most recent build of the docs, we could probably clean up the docstrings a bit for the API docs:
I'd like to see us clean up a few things:
Some of the docstrings probably can go in the