Merge pull request #335 from Parsl/class-based-configs-#133
Class based configs #133
yadudoc committed Jul 3, 2018
2 parents 6b7aa54 + 0ec957a commit b37716e
Showing 103 changed files with 2,000 additions and 2,525 deletions.
9 changes: 7 additions & 2 deletions docs/conf.py
@@ -37,7 +37,8 @@
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.intersphinx',
-   'sphinx.ext.linkcode'
+   'sphinx.ext.linkcode',
+   'sphinx.ext.napoleon'
]

url = 'https://raw.githubusercontent.com/Parsl/parsl-tutorial/master/parsl-introduction.ipynb'
@@ -58,6 +59,10 @@ def linkcode_resolve(domain, info):
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3', None),
+    'libsubmit': ('https://libsubmit.readthedocs.io/en/stable', None)
+}
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
@@ -109,7 +114,7 @@ def linkcode_resolve(domain, info):
# The reST default role (used for this markup: `text`) to use for all
# documents.
#
-default_role = 'all'
+default_role = 'any'

# If true, '()' will be appended to :func: etc. cross-reference text.
#
3 changes: 1 addition & 2 deletions docs/devguide/dev_docs.rst
@@ -288,8 +288,7 @@ srunMpiLauncher
Flow Control
============

-This section deals with functionality related to controlling the flow of tasks to various different
-execution sites.
+This section deals with functionality related to controlling the flow of tasks to various executors.

FlowControl
-----------
73 changes: 37 additions & 36 deletions docs/faq.rst
@@ -56,16 +56,16 @@ to write the object into a file and use files to communicate between Apps.
How do I specify where Apps should be run?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Parsl's multi-site support allows you to define the site (including local threads)
+Parsl's multi-executor support allows you to define the executor (including local threads)
on which an App should be executed. For example:

.. code-block:: python

-    @app('python', dfk, sites=['SuperComputer1'])
+    @app('python', dfk, executors=['SuperComputer1'])
     def BigSimulation(...):
         ...

-    @app('python', dfk, sites=['GPUMachine'])
+    @app('python', dfk, executors=['GPUMachine'])
     def Visualize(...):
         ...
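
For the labels above to resolve at run time, the Config passed to the DataFlowKernel must define executors with matching labels. A minimal sketch (the executor types here are illustrative assumptions, not taken from this commit):

.. code-block:: python

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor

    # Labels must match the names used in the app decorators above.
    # ThreadPoolExecutor is just one choice; any executor can carry a label.
    config = Config(
        executors=[
            ThreadPoolExecutor(label='SuperComputer1'),
            ThreadPoolExecutor(label='GPUMachine'),
        ]
    )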
@@ -78,33 +78,26 @@
the workers can connect back. While our pilot job system, ipyparallel,
can identify the IP address automatically on certain systems,
it is safer to specify the address explicitly.

-Here's how you specify the address in the config dictionary passed to the DataFlowKernel:
+To specify the address in the :class:`~parsl.config.Config` (note this is an example
+using the :class:`libsubmit.providers.cobalt.cobalt.Cobalt` provider; any other provider could
+be substituted below):

.. code-block:: python

-    multiNode = {
-        "sites": [{
-            "site": "ALCF_Theta_Local",
-            "auth": {
-                "channel": "ssh",
-                "scriptDir": "/home/{}/parsl_scripts/".format(USERNAME)
-            },
-            "execution": {
-                "executor": "ipp",
-                "provider": "<SCHEDULER>",
-                "block": {  # Define the block
-                    ...
-                }
-            },
-        }],
-        "globals": {
-            "lazyErrors": True,
-        },
-        "controller": {
-            "publicIp": "<AA.BB.CC.DD>"  # <--- SPECIFY PUBLIC IP HERE
-        }
-    }
+    from libsubmit.providers.cobalt.cobalt import Cobalt
+    from parsl.config import Config
+    from parsl.executors.ipp import IPyParallelExecutor
+    from parsl.executors.ipp_controller import Controller
+
+    config = Config(
+        executors=[
+            IPyParallelExecutor(
+                label='ALCF_theta_local',
+                provider=Cobalt(),
+                controller=Controller(public_ip='<AA.BB.CC.DD>')  # specify public IP here
+            )
+        ],
+    )
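
If the externally visible address is not known ahead of time, a standard-library lookup can discover the outbound interface's address before building the Config. This is a sketch, not a Parsl API; behind NAT the address found may still differ from the true public IP:

.. code-block:: python

    import socket

    def lookup_address():
        # Connecting a UDP socket sends no packets, but it makes the OS
        # choose the outbound interface, whose address we then read back.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.connect(('8.8.8.8', 80))
            return s.getsockname()[0]
        finally:
            s.close()

    # e.g. controller=Controller(public_ip=lookup_address())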
.. _pyversion:

@@ -204,21 +197,29 @@ There are a few common situations in which a Parsl script might hang:
ssh over to a machine that is public facing. Machines provisioned from
cloud-vendors setup with public IPs are another option.

-* Parsl hasn't autodetected the public IP.
-  This can be resolved by manually specifying the public IP via the config:
-
-  .. code-block:: python
-
-      config["controller"]["publicIp"] = '8.8.8.8'
+* Parsl hasn't autodetected the public IP. See `Workers do not connect back to Parsl`_ for more details.

* Firewall restrictions that block certain port ranges.
  If there is a certain port range that is **not** blocked, you may specify
-  that via the config:
+  that via the :class:`~parsl.executors.ipp_controller.Controller` object:

.. code-block:: python

     # Assuming ports 50000 to 55000 are open
-    config["controller"]["portRange"] = "50000,55000"
+    from libsubmit.providers.cobalt.cobalt import Cobalt
+    from parsl.config import Config
+    from parsl.executors.ipp import IPyParallelExecutor
+    from parsl.executors.ipp_controller import Controller
+
+    config = Config(
+        executors=[
+            IPyParallelExecutor(
+                label='ALCF_theta_local',
+                provider=Cobalt(),
+                controller=Controller(port_range='50000,55000')
+            )
+        ],
+    )
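
To sanity-check that a port in the chosen range can actually be bound on the local host, a small standard-library probe works (a sketch only; a successful local bind says nothing about firewall rules between this machine and the workers):

.. code-block:: python

    import socket

    def can_bind(port):
        # True if this host can bind the given TCP port right now.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(('', port))
                return True
            except OSError:
                return False

    print(any(can_bind(p) for p in range(50000, 55001)))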
How can I start a Jupyter notebook over SSH?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7 changes: 7 additions & 0 deletions docs/reference.rst
@@ -9,9 +9,16 @@ Reference guide
    parsl.set_file_logger
    parsl.app.app.App
    parsl.app.futures.DataFuture
+   parsl.config.Config
    parsl.dataflow.futures.AppFuture
+   parsl.dataflow.dflow.DataFlowKernelLoader
    parsl.data_provider.data_manager.DataManager
    parsl.data_provider.files.File
+   parsl.executors.base.ParslExecutor
+   parsl.executors.threads.ThreadPoolExecutor
+   parsl.executors.ipp.IPyParallelExecutor
+   parsl.executors.ipp_controller.Controller
+   parsl.executors.swift_t.TurbineExecutor

.. autosummary::
:toctree: stubs
45 changes: 12 additions & 33 deletions docs/userguide/app_caching.rst
@@ -1,16 +1,17 @@
.. _label-appcaching:

-AppCaching
+App caching
----------

When developing a workflow, developers often run the same workflow
with incremental changes over and over. Often large fragments of
a workflow will not have changed, yet apps will be executed again, wasting
-valuable developer time and computation resources. ``AppCaching``
+valuable developer time and computation resources. App caching
solves this problem by storing results from apps that have completed
-so that they can be re-used. By default caching is **not** enabled.
-It must be explicitly enabled, either globally via the configuration,
-or on each app for which caching is desired.
+so that they can be re-used. App caching can be enabled by setting the `cache`
+argument to the :func:`~parsl.app.app` decorator to `True` (by default it is `False`). App caching
+can be globally disabled by setting `app_cache=False` (which by default is `True`)
+in the :class:`~parsl.config.Config`.

.. code-block:: python
@@ -19,17 +20,17 @@ or on each app for which caching is desired.
        return 'echo {}'.format(msg)
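
Conversely, a minimal sketch of disabling caching globally in the class-based style introduced by this change (the executor choice is an illustrative assumption):

.. code-block:: python

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor

    # app_cache=False turns caching off for every app,
    # regardless of any per-app cache=True setting.
    config = Config(
        executors=[ThreadPoolExecutor()],
        app_cache=False,
    )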
-AppCaching can be particularly useful when developing interactive workflows such as when
+App caching can be particularly useful when developing interactive workflows such as when
using a Jupyter notebook. In this case, cells containing apps are often re-executed
-during development. Using AppCaching will ensure that only modified apps are re-executed.
+during development. Using app caching will ensure that only modified apps are re-executed.

Caveats
^^^^^^^

-It is important to consider several important issues when using AppCaching:
+It is important to consider several issues when using app caching:

-- Determinism: AppCaching is generally useful only when the apps are deterministic.
-  If the outputs may be different for identical inputs, AppCaching will hide
+- Determinism: App caching is generally useful only when the apps are deterministic.
+  If the outputs may be different for identical inputs, app caching will hide
this non-deterministic behavior. For instance, caching an app that returns
a random number will result in every invocation returning the same result.
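
As a sketch of that pitfall (the decorator form follows the FAQ example earlier on this page; the app body is illustrative):

.. code-block:: python

    import random

    # With cache=True, the first result for a given set of arguments is
    # stored, so every later call with the same (empty) arguments returns
    # that same "random" number.
    @app('python', dfk, cache=True)
    def roll():
        return random.random()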

@@ -38,30 +39,8 @@ It is important to consider several important issues when using AppCaching:
result is yet available. Once one such app completes and the result is cached
all subsequent calls will return immediately with the cached result.

-- Performance: If AppCaching is enabled, there is likely to be some performance
+- Performance: If app caching is enabled, there is likely to be some performance
overhead especially if a large number of short duration tasks are launched rapidly.

.. note::
The performance penalty has not yet been quantified.


-Configuration
-^^^^^^^^^^^^^
-
-AppCaching may be disabled globally in the configuration. If the
-``appCache`` is set to ``False`` all AppCaching is disabled.
-By default the global ``appCache`` is **enabled**; however, AppCaching for each
-app is disabled by default. Thus, users must explicitly enable AppCaching
-on each app.
-
-AppCaching can be disabled globally in the config as follows:
-
-.. code-block:: python
-
-    config = {
-        "sites": [{ ... }],
-        "globals": {
-            "appCache": False  # <-- Disable AppCaching globally
-        }
-    }
-
-    dfk = DataFlowKernel(config=config)