Update docs (#50)

* Add "how this works" section and fill in examples

* add dask stylesheet

* [skip ci] add history.rst

* avoid recommending small jobs

* [skip ci] respond to comments

mrocklin committed May 3, 2018
1 parent 9f7e585 commit e999e5a

Showing 4 changed files with 112 additions and 35 deletions.
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -187,3 +187,6 @@
    'distributed': ('https://distributed.readthedocs.io/en/stable/',
                    'https://distributed.readthedocs.io/en/stable/objects.inv')
}


def setup(app):
    app.add_stylesheet("http://dask.pydata.org/en/latest/_static/style.css")
45 changes: 38 additions & 7 deletions docs/examples.rst
@@ -2,18 +2,49 @@

Example Deployments
===================

Deploying dask-jobqueue on different clusters requires a bit of customization.
Below, we provide a few examples from real deployments in the wild:

Additional examples from other clusters are welcome `here <https://github.com/dask/dask-jobqueue/issues/40>`_.

PBS Deployments
---------------

.. code-block:: python

   import os

   from dask_jobqueue import PBSCluster

   cluster = PBSCluster(queue='regular',
                        project='DaskOnPBS',
                        local_directory=os.getenv('TMPDIR', '/tmp'),
                        threads=4,
                        processes=6,
                        memory='16GB',
                        resource_spec='select=1:ncpus=24:mem=100GB')

   from dask.distributed import Client
   client = Client(cluster)

   # Another real-world configuration, communicating over InfiniBand
   cluster = PBSCluster(processes=18,
                        threads=4,
                        memory="6GB",
                        project='P48500028',
                        queue='premium',
                        resource_spec='select=1:ncpus=36:mem=109G',
                        walltime='02:00:00',
                        interface='ib0')

SGE Deployments
---------------

Examples are welcome `here <https://github.com/dask/dask-jobqueue/issues/40>`_.
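
Until a real-world deployment lands here, a minimal sketch might look like
the following. The queue, project, and resource values are hypothetical and
should be replaced with your site's settings:

.. code-block:: python

   from dask_jobqueue import SGECluster

   # Hypothetical values -- substitute your site's queue and project
   cluster = SGECluster(queue='default.q',
                        project='myproject',
                        threads=4,
                        processes=4,
                        memory='16GB')
   cluster.scale(8)  # submit 8 worker jobs to the SGE queue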

SLURM Deployments
-----------------

.. code-block:: python

   from dask_jobqueue import SLURMCluster

   cluster = SLURMCluster(processes=4,
                          threads=2,
                          memory="16GB",
                          project="woodshole",
                          walltime="01:00",
                          queue="normal")
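
As with the PBS examples above, one would then typically scale out and
connect a client; this short usage sketch is illustrative rather than part
of the original example:

.. code-block:: python

   from dask.distributed import Client

   cluster.scale(4)          # submit 4 worker jobs to the SLURM queue
   client = Client(cluster)  # run computations on workers as they arrive
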
13 changes: 13 additions & 0 deletions docs/history.rst
@@ -0,0 +1,13 @@
History
=======

This package came out of the `Pangeo <https://pangeo-data.github.io/>`_
collaboration and was copy-pasted from a live repository at
`this commit <https://github.com/pangeo-data/pangeo/commit/28f86b9c836bd622daa14d5c9b48ab73bbed4c73>`_.
Unfortunately, development history was not preserved.

Original developers from that repository include the following:

- `Jim Edwards <https://github.com/jedwards4b>`_
- `Joe Hamman <https://github.com/jhamman>`_
- `Matthew Rocklin <https://github.com/mrocklin>`_
86 changes: 58 additions & 28 deletions docs/index.rst
@@ -2,16 +2,16 @@
Dask-Jobqueue
=============

*Easy deployment of Dask Distributed on job queuing systems like
PBS, Slurm, and SGE.*

Motivation
----------

1. While ``dask.distributed`` offers a flexible library for distributed
   parallel computing in Python, it is not always easy to deploy on systems
   that use job queuing systems. Dask-jobqueue provides a Pythonic interface
   for deploying and managing Dask clusters.
2. In practice, deploying distributed requires customization, both for the
   machine(s) that it will be deployed on and for the specific application it
   will be deployed for. Dask-jobqueue provides users with an intuitive
   interface for

@@ -30,6 +30,8 @@ Example

   from dask.distributed import Client
   client = Client(cluster)

See :doc:`Examples <examples>` for more real-world examples.


Adaptivity
----------

@@ -40,39 +42,67 @@ resources when not actively computing.

.. code-block:: python

   cluster.adapt(minimum=1, maximum=100)
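
For illustration, adaptive mode pairs naturally with an ordinary workload,
requesting worker jobs only while a computation runs. This sketch is not
from the original docs and assumes ``dask.array`` is installed:

.. code-block:: python

   import dask.array as da
   from dask.distributed import Client

   client = Client(cluster)
   cluster.adapt(minimum=1, maximum=100)

   # Worker jobs are requested while this computation runs and are
   # released again once the cluster falls idle
   x = da.random.random((10000, 10000), chunks=(1000, 1000))
   print(x.mean().compute())
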
.. toctree::
   :maxdepth: 1
   :hidden:

   install.rst
   examples.rst
   history.rst
   api.rst

How this works
--------------

Creating a cluster object starts a Dask Scheduler in the Python process
where the cluster object is instantiated:

.. code-block:: python

   cluster = PBSCluster(processes=18,
                        threads=4,
                        memory="6GB",
                        project='P48500028',
                        queue='premium',
                        resource_spec='select=1:ncpus=36:mem=109G',
                        walltime='02:00:00')  # <-- scheduler started here

When you ask for more workers, such as with the ``scale`` command:

.. code-block:: python

   cluster.scale(10)

The cluster generates a traditional job script and submits it an appropriate
number of times to the job queue. You can see the job script that it will
generate as follows:

.. code-block:: python

   >>> print(cluster.job_script())

.. code-block:: bash

   #!/bin/bash

   #PBS -N dask-worker
   #PBS -q premium
   #PBS -A P48500028
   #PBS -l select=1:ncpus=36:mem=109G
   #PBS -l walltime=02:00:00

   /home/mrocklin/Software/anaconda/bin/dask-worker tcp://127.0.1.1:43745
   --nthreads 4 --nprocs 18 --memory-limit 6GB --name dask-worker-3
   --death-timeout 60

Each of these jobs is sent to the job queue independently and, once a job
starts, a dask-worker process starts up and connects back to the scheduler
running within this process.
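
To make the round trip concrete, here is an illustrative sketch (not part of
the original docs) that connects a ``Client`` to the scheduler in this
process and submits work, which runs on whichever workers have connected:

.. code-block:: python

   from dask.distributed import Client

   client = Client(cluster)  # connects to the scheduler in this process

   # Tasks run on the dask-worker processes that have joined so far
   futures = client.map(lambda x: x + 1, range(100))
   print(sum(client.gather(futures)))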

If the job queue is busy then it's possible that the workers will take a
while to get through, or that not all of them will arrive. In practice we
find that, because dask-jobqueue submits many small jobs rather than a
single large one, workers are often able to start relatively quickly. This
will depend on the state of your cluster's job queue, though.
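
If a computation needs some minimum number of workers before it is worth
starting, a plain polling loop suffices. This is an illustrative pattern
reusing the ``client`` from the sketch above, not a built-in dask-jobqueue
feature:

.. code-block:: python

   import time

   # Block until at least 5 workers have connected to the scheduler
   while len(client.scheduler_info()['workers']) < 5:
       time.sleep(1)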
