Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Liu 128 #79

Merged
merged 63 commits into from
Dec 13, 2021
Merged

Liu 128 #79

merged 63 commits into from
Dec 13, 2021

Conversation

pritchardn
Copy link
Collaborator

Full 'end-to-end' multiprocessing tests have been skipped on machines with < 4 (logical) cores.
The added components of a shared-memory manager and process wrapper are still present.
I now repeat the message written in preparation for the pull request:

The summary

  • O-boy this sure was more than a 2-line fix. The overall idea remains to execute each drop in a separate process provided by Python.multiprocessing and to replace the memory-io InMemory drops with a new shared-memory-io implementation but the devil is in the details as with all things.

  • We use a (slightly) custom Multiprocessing.processsub-class and implementation of shared memory to allow for named shared memory blocks

  • In its current form this addition breaks Python 3.7 support (must be >=3.8) and well and truly breaks windows support (would need to implement shared-memory for Windows). Import guards are in place to continue supporting Python 3.7

Changes

  • Drops now re-compute their checksum if it does not exist but the drop has COMPLETED. Shared memory drops present themselves to the main process as if they were written externally

  • Drop Proxys now have their own context, rather than treating the node-manager itself as the RPC endpoint. While this consumes many more resources than is truly necessary, this is the fastest way (to me at least) to implement this functionality.

  • The -t flag for node managers has changed behaviour:

    • 0 (default) - Single process - multi-threading (what was there before)
    • 1 - Maximum processes - multi-threading (maximal performance)
    • 1 - Single process specifically - equivalent to 0 (to avoid overhead)
    • n - Use at most n-processes
    • The decision to make the old behaviour the default is to avoid ‘breaking’ any sort of deployment, while tested, it feels judicious to opt-in to multithreading at first until this is shown to be robust in practice
  • MemoryDrops now check for the existence of a thread-pool argument, which if present uses a shared_memory io handle instead of the normal handle. There is no ‘shared-memory drop' explicitly as such.

Additions

  • process.py implements DlgProcess - a lightweight wrapper around Multiprocessing.process. The only major difference is that a pipe is opened between the parent and child process allowing exceptions to be passed back to the node manager for handling.

    • This offers an opportunity to hand other information up to the manager if needed later.
  • shared_memory.py - (re)-implements the shared-memory primitive from Multiprocessing to allow for named shared memory in addition to catching some peculiar edge-cases this invites (files that exist in one process but not another for instance)

  • io.py - Adds a shared-memory IO module that attempts to ensure a minimally sized shared memory chunk

  • shared_memory_manager.py - A small module that handles the registration and unlinking of shared memory blocks on a per-session basis.

  • Currently, there could be a name-collision for drops with the same oid and the same session name (only a problem if setting the session id explicitly).

    • Multiple managers referring to the same shared memory block will aggressively unlink the same location, this is a caught exception and is reported as a warning
  • Some mention of this feature in the documentation, introduction.rst, graphs.rst and managers.rst specifically

  • All of these features are tested separately, the test_dm suite has been duplicated for parallel testing - mileage seems to vary on Travis but works locally

I hope this provides a decent summary, and I am open to making further changes, but for now, this seems to suffice.

pritchardn and others added 30 commits August 26, 2021 15:33
…f CPU time appending random numbers to a list.

Useful for performance benchmarking.
Otherwise drops run within their own thread (presumably within a single core).
…unctional and efficient (although resizing is not quite working yet)
Makes test_speedup actually test the speedup and increases test size.
… separate files.

DlgSharedMemoryManager now handles duplicate and unregistered session requests.
- Creating random files
- Handling multiple unlinks
- Creating random files
- Handling multiple unlinks (warns)

Adds tests for DlgSharedMemory
…angers may still need them). This functionality is moved to 'destroy_session' and is triggered when the memory manager itself is shutdown.
… 3.8.

Improves formatting of new code-files.
# Conflicts:
#	daliuge-engine/dlg/apps/simple.py
#	docs/architecture/dataflow.rst
#	docs/architecture/index.rst
#	docs/deployment.rst
#	docs/development/data_development.rst
#	docs/development/dev_index.rst
#	docs/index.rst
#	docs/intro.rst
@awicenec
Copy link
Contributor

awicenec commented Nov 3, 2021

Can you resolve the merge conflicts with master first, please?

# Conflicts:
#	daliuge-engine/dlg/apps/crc.py
#	daliuge-engine/dlg/drop.py
#	daliuge-engine/dlg/io.py
#	daliuge-engine/test/apps/test_simple.py
#	daliuge-engine/test/apps/test_socket.py
#	daliuge-engine/test/manager/test_dm.py
#	daliuge-engine/test/test_drop.py
#	docs/intro.rst
@coveralls
Copy link

coveralls commented Nov 4, 2021

Coverage Status

Coverage decreased (-0.3%) to 78.004% when pulling c8d7f15 on LIU-128 into 7fa67ff on master.

@awicenec awicenec requested a review from rtobar November 16, 2021 06:53
@awicenec awicenec requested review from awicenec and removed request for rtobar December 9, 2021 02:14
# Conflicts:
#	daliuge-engine/dlg/apps/simple.py
#	daliuge-engine/dlg/drop.py
#	daliuge-engine/dlg/droputils.py
#	daliuge-engine/dlg/io.py
#	daliuge-engine/dlg/manager/cmdline.py
#	daliuge-engine/dlg/manager/node_manager.py
#	daliuge-engine/dlg/rpc.py
#	daliuge-engine/test/apps/test_crc.py
#	daliuge-engine/test/apps/test_simple.py
#	daliuge-engine/test/manager/test_dm.py
@awicenec awicenec merged commit c9ee801 into master Dec 13, 2021
@awicenec awicenec deleted the LIU-128 branch December 13, 2021 15:16
awicenec added a commit that referenced this pull request May 19, 2022
pritchardn pushed a commit that referenced this pull request May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants