Merge pull request #335 from Parsl/class-based-configs-#133
Class based configs #133
yadudoc committed Jul 3, 2018
2 parents 6b7aa54 + 0ec957a commit b37716e
Showing 103 changed files with 2,000 additions and 2,525 deletions.
9 changes: 7 additions & 2 deletions docs/conf.py
@@ -37,7 +37,8 @@
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.intersphinx',
-   'sphinx.ext.linkcode'
+   'sphinx.ext.linkcode',
+   'sphinx.ext.napoleon'
]

url = 'https://raw.githubusercontent.com/Parsl/parsl-tutorial/master/parsl-introduction.ipynb'
@@ -58,6 +59,10 @@ def linkcode_resolve(domain, info):
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3', None),
+    'libsubmit': ('https://libsubmit.readthedocs.io/en/stable', None)
+}
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
@@ -109,7 +114,7 @@ def linkcode_resolve(domain, info):
# The reST default role (used for this markup: `text`) to use for all
# documents.
#
-default_role = 'all'
+default_role = 'any'

# If true, '()' will be appended to :func: etc. cross-reference text.
#
3 changes: 1 addition & 2 deletions docs/devguide/dev_docs.rst
@@ -288,8 +288,7 @@ srunMpiLauncher
Flow Control
============

-This section deals with functionality related to controlling the flow of tasks to various different
-execution sites.
+This section deals with functionality related to controlling the flow of tasks to various executors.

FlowControl
-----------
73 changes: 37 additions & 36 deletions docs/faq.rst
@@ -56,16 +56,16 @@ to write the object into a file and use files to communicate between Apps.
How do I specify where Apps should be run?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Parsl's multi-site support allows you to define the site (including local threads)
+Parsl's multi-executor support allows you to define the executor (including local threads)
on which an App should be executed. For example:

.. code-block:: python

-    @app('python', dfk, sites=['SuperComputer1'])
+    @app('python', dfk, executors=['SuperComputer1'])
     def BigSimulation(...):
         ...

-    @app('python', dfk, sites=['GPUMachine'])
+    @app('python', dfk, executors=['GPUMachine'])
     def Visualize(...):
         ...
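
For the labels above to resolve at run time, the Config passed to the DataFlowKernel must define executors with matching labels. A minimal sketch (the executor types here are illustrative assumptions, not taken from this commit):

.. code-block:: python

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor

    # Labels must match the names used in the app decorators above.
    # ThreadPoolExecutor is just one choice; any executor can carry a label.
    config = Config(
        executors=[
            ThreadPoolExecutor(label='SuperComputer1'),
            ThreadPoolExecutor(label='GPUMachine'),
        ]
    )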
@@ -78,33 +78,26 @@
the workers can connect back. While our pilot job system, ipyparallel,
can identify the IP address automatically on certain systems,
it is safer to specify the address explicitly.

-Here's how you specify the address in the config dictionary passed to the DataFlowKernel:
+To specify the address in the :class:`~parsl.config.Config` (note this is an example
+using the :class:`libsubmit.providers.cobalt.cobalt.Cobalt` provider; any other provider could
+be substituted below):

.. code-block:: python

-    multiNode = {
-        "sites": [{
-            "site": "ALCF_Theta_Local",
-            "auth": {
-                "channel": "ssh",
-                "scriptDir": "/home/{}/parsl_scripts/".format(USERNAME)
-            },
-            "execution": {
-                "executor": "ipp",
-                "provider": "<SCHEDULER>",
-                "block": {  # Define the block
-                    ...
-                }
-            },
-        }],
-        "globals": {
-            "lazyErrors": True,
-        },
-        "controller": {
-            "publicIp": "<AA.BB.CC.DD>"  # <--- SPECIFY PUBLIC IP HERE
-        }
-    }
+    from libsubmit.providers.cobalt.cobalt import Cobalt
+    from parsl.config import Config
+    from parsl.executors.ipp import IPyParallelExecutor
+    from parsl.executors.ipp_controller import Controller
+
+    config = Config(
+        executors=[
+            IPyParallelExecutor(
+                label='ALCF_theta_local',
+                provider=Cobalt(),
+                controller=Controller(public_ip='<AA.BB.CC.DD>')  # specify public IP here
+            )
+        ],
+    )
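
If the externally visible address is not known ahead of time, a standard-library lookup can discover the outbound interface's address before building the Config. This is a sketch, not a Parsl API; behind NAT the address found may still differ from the true public IP:

.. code-block:: python

    import socket

    def lookup_address():
        # Connecting a UDP socket sends no packets, but it makes the OS
        # choose the outbound interface, whose address we then read back.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.connect(('8.8.8.8', 80))
            return s.getsockname()[0]
        finally:
            s.close()

    # e.g. controller=Controller(public_ip=lookup_address())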
.. _pyversion:

@@ -204,21 +197,29 @@ There are a few common situations in which a Parsl script might hang:
ssh over to a machine that is public facing. Machines provisioned from
cloud-vendors setup with public IPs are another option.

-* Parsl hasn't autodetected the public IP.
-  This can be resolved by manually specifying the public IP via the config:
-
-  .. code-block:: python
-
-      config["controller"]["publicIp"] = '8.8.8.8'
+* Parsl hasn't autodetected the public IP. See `Workers do not connect back to Parsl`_ for more details.

* Firewall restrictions that block certain port ranges.
  If there is a certain port range that is **not** blocked, you may specify
-  that via the config:
+  that via the :class:`~parsl.executors.ipp_controller.Controller` object:

.. code-block:: python

     # Assuming ports 50000 to 55000 are open
-    config["controller"]["portRange"] = "50000,55000"
+    from libsubmit.providers.cobalt.cobalt import Cobalt
+    from parsl.config import Config
+    from parsl.executors.ipp import IPyParallelExecutor
+    from parsl.executors.ipp_controller import Controller
+
+    config = Config(
+        executors=[
+            IPyParallelExecutor(
+                label='ALCF_theta_local',
+                provider=Cobalt(),
+                controller=Controller(port_range='50000,55000')
+            )
+        ],
+    )
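
To sanity-check that a port in the chosen range can actually be bound on the local host, a small standard-library probe works (a sketch only; a successful local bind says nothing about firewall rules between this machine and the workers):

.. code-block:: python

    import socket

    def can_bind(port):
        # True if this host can bind the given TCP port right now.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(('', port))
                return True
            except OSError:
                return False

    print(any(can_bind(p) for p in range(50000, 55001)))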
How can I start a Jupyter notebook over SSH?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7 changes: 7 additions & 0 deletions docs/reference.rst
@@ -9,9 +9,16 @@ Reference guide
    parsl.set_file_logger
    parsl.app.app.App
    parsl.app.futures.DataFuture
+   parsl.config.Config
    parsl.dataflow.futures.AppFuture
+   parsl.dataflow.dflow.DataFlowKernelLoader
    parsl.data_provider.data_manager.DataManager
    parsl.data_provider.files.File
+   parsl.executors.base.ParslExecutor
+   parsl.executors.threads.ThreadPoolExecutor
+   parsl.executors.ipp.IPyParallelExecutor
+   parsl.executors.ipp_controller.Controller
+   parsl.executors.swift_t.TurbineExecutor

.. autosummary::
:toctree: stubs
45 changes: 12 additions & 33 deletions docs/userguide/app_caching.rst
@@ -1,16 +1,17 @@
.. _label-appcaching:

-AppCaching
+App caching
----------

When developing a workflow, developers often run the same workflow
with incremental changes over and over. Often large fragments of
a workflow will not have changed, yet apps will be executed again, wasting
-valuable developer time and computation resources. ``AppCaching``
+valuable developer time and computation resources. App caching
solves this problem by storing results from apps that have completed
-so that they can be re-used. By default caching is **not** enabled.
-It must be explicitly enabled, either globally via the configuration,
-or on each app for which caching is desired.
+so that they can be re-used. App caching can be enabled by setting the `cache`
+argument to the :func:`~parsl.app.app` decorator to `True` (by default it is `False`). App caching
+can be globally disabled by setting `app_cache=False` (which by default is `True`)
+in the :class:`~parsl.config.Config`.

.. code-block:: python
@@ -19,17 +20,17 @@ or on each app for which caching is desired.
        return 'echo {}'.format(msg)
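
Conversely, a minimal sketch of disabling caching globally in the class-based style introduced by this change (the executor choice is an illustrative assumption):

.. code-block:: python

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor

    # app_cache=False turns caching off for every app,
    # regardless of any per-app cache=True setting.
    config = Config(
        executors=[ThreadPoolExecutor()],
        app_cache=False,
    )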
-AppCaching can be particularly useful when developing interactive workflows such as when
+App caching can be particularly useful when developing interactive workflows such as when
using a Jupyter notebook. In this case, cells containing apps are often re-executed
-during development. Using AppCaching will ensure that only modified apps are re-executed.
+during development. Using app caching will ensure that only modified apps are re-executed.

Caveats
^^^^^^^

-It is important to consider several important issues when using AppCaching:
+It is important to consider several issues when using app caching:

-- Determinism: AppCaching is generally useful only when the apps are deterministic.
-  If the outputs may be different for identical inputs, AppCaching will hide
+- Determinism: App caching is generally useful only when the apps are deterministic.
+  If the outputs may be different for identical inputs, app caching will hide
this non-deterministic behavior. For instance, caching an app that returns
a random number will result in every invocation returning the same result.
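
As a sketch of that pitfall (the decorator form follows the FAQ example earlier on this page; the app body is illustrative):

.. code-block:: python

    import random

    # With cache=True, the first result for a given set of arguments is
    # stored, so every later call with the same (empty) arguments returns
    # that same "random" number.
    @app('python', dfk, cache=True)
    def roll():
        return random.random()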

@@ -38,30 +39,8 @@ It is important to consider several important issues when using AppCaching:
result is yet available. Once one such app completes and the result is cached
all subsequent calls will return immediately with the cached result.

-- Performance: If AppCaching is enabled, there is likely to be some performance
+- Performance: If app caching is enabled, there is likely to be some performance
overhead especially if a large number of short duration tasks are launched rapidly.

.. note::
The performance penalty has not yet been quantified.


-Configuration
-^^^^^^^^^^^^^
-
-AppCaching may be disabled globally in the configuration. If the
-``appCache`` is set to ``False`` all AppCaching is disabled.
-By default the global ``appCache`` is **enabled**; however, AppCaching for each
-app is disabled by default. Thus, users must explicitly enable AppCaching
-on each app.
-
-AppCaching can be disabled globally in the config as follows:
-
-.. code-block:: python
-
-    config = {
-        "sites": [{ ... }],
-        "globals": {
-            "appCache": False  # <-- Disable AppCaching globally
-        }
-    }
-
-    dfk = DataFlowKernel(config=config)