Liu 128 #79

pritchardn · 2021-11-03T06:10:17Z

Full 'end-to-end' multiprocessing tests have been skipped on machines with < 4 (logical) cores.
The added components of a shared-memory manager and process wrapper are still present.
I now repeat the message written in preparation for the pull request:

The summary

Oh boy, this sure was more than a two-line fix. The overall idea remains to execute each drop in a separate process provided by Python's multiprocessing module, and to replace the memory IO of InMemory drops with a new shared-memory IO implementation; but, as with all things, the devil is in the details.
We use a (slightly) custom multiprocessing.Process sub-class and a custom implementation of shared memory to allow for named shared-memory blocks.
In its current form this addition breaks Python 3.7 support (shared memory needs Python >= 3.8) and well and truly breaks Windows support (shared memory would need to be implemented for Windows). Import guards are in place so that Python 3.7 remains supported without this feature.
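A minimal sketch of the import-guard-plus-named-block idea, using the standard library's multiprocessing.shared_memory (Python >= 3.8) rather than the PR's own shared_memory.py re-implementation. The HAS_SHARED_MEMORY flag and the write_named_block/read_named_block helpers are hypothetical names for illustration, not DALiuGE APIs:

```python
# Hypothetical import guard: multiprocessing.shared_memory only exists on Python >= 3.8.
try:
    from multiprocessing import shared_memory
    HAS_SHARED_MEMORY = True
except ImportError:  # Python 3.7 and older: fall back to the non-shared code path
    shared_memory = None
    HAS_SHARED_MEMORY = False


def write_named_block(name: str, payload: bytes):
    """Create a named shared-memory block sized to the payload and copy it in.

    The caller keeps the returned handle and is responsible for close()/unlink().
    """
    if not HAS_SHARED_MEMORY:
        raise RuntimeError("shared memory requires Python >= 3.8")
    block = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    block.buf[: len(payload)] = payload
    return block


def read_named_block(name: str, length: int) -> bytes:
    """Attach to an existing block by name (e.g. from another process) and copy it out."""
    block = shared_memory.SharedMemory(name=name)
    try:
        return bytes(block.buf[:length])
    finally:
        block.close()  # close this handle only; the creator decides when to unlink
```

The length is passed separately because some platforms round a block's size up to the page size, so block.size can exceed the payload. SharedMemory also refuses to create a zero-sized block, so an empty payload would need special handling.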

Changes

Drops now re-compute their checksum if it does not exist but the drop has COMPLETED. Shared-memory drops present themselves to the main process as if they were written externally.
Drop proxies now have their own context, rather than treating the node manager itself as the RPC endpoint. While this consumes far more resources than is strictly necessary, it was (to me, at least) the fastest way to implement this functionality.
The -t flag for node managers has changed behaviour:
- 0 (default) - Single process, multi-threading (the previous behaviour)
- -1 - Maximum number of processes (maximal performance)
- 1 - Single process specifically; equivalent to 0 (avoids process overhead)
- n - Use at most n processes
- The old behaviour is the default to avoid 'breaking' any existing deployment: while it has been tested, it feels judicious to make multiprocessing opt-in at first, until it is shown to be robust in practice.
MemoryDrops now check for the existence of a thread-pool argument; if it is present, they use a shared-memory IO handle instead of the normal handle. There is no explicit 'shared-memory drop' as such.
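One way to read the -t values is sketched below. resolve_process_count is a hypothetical helper, not the node manager's actual code, and it assumes the 'maximum processes' value is -1 (the only reading consistent with 1 also meaning a single process). A return value of 0 stands for the old single-process, multi-threaded mode:

```python
import os
from typing import Optional


def resolve_process_count(t: int, cpu_count: Optional[int] = None) -> int:
    """Map a -t value to a number of worker processes (hypothetical sketch).

    Returns 0 when drops should keep running as threads in a single process,
    otherwise the maximum number of processes to use.
    """
    if cpu_count is None:
        # os.cpu_count() can return None if the count is unknown.
        cpu_count = os.cpu_count() or 1
    if t in (0, 1):
        # 0 is the default; 1 explicitly asks for a single process and is
        # treated the same way to avoid process start-up overhead.
        return 0
    if t == -1:
        return cpu_count  # one process per logical core
    if t > 1:
        return t  # use at most t processes
    raise ValueError(f"invalid -t value: {t}")
```

Keeping both 0 and 1 on the threaded path means a single-process request never pays for spawning a process it does not need.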

Additions

process.py implements DlgProcess - a lightweight wrapper around multiprocessing.Process. The only major difference is that a pipe is opened between the parent and child processes, allowing exceptions to be passed back to the node manager for handling.
- This offers an opportunity to pass other information up to the manager later, if needed.
shared_memory.py - (Re-)implements the shared-memory primitive from multiprocessing to allow for named shared memory, and catches some peculiar edge cases this invites (for instance, files that exist in one process but not in another).
io.py - Adds a shared-memory IO module that attempts to ensure a minimally sized shared memory chunk
shared_memory_manager.py - A small module that handles the registration and unlinking of shared memory blocks on a per-session basis.
Currently, there could be a name collision for drops with the same oid and the same session name (only a problem if the session id is set explicitly).
- Multiple managers referring to the same shared-memory block will aggressively unlink the same location; the resulting exception is caught and reported as a warning.
The feature is mentioned in the documentation, specifically in introduction.rst, graphs.rst and managers.rst.
All of these features are tested separately, and the test_dm suite has been duplicated for parallel testing. Mileage seems to vary on Travis, but it works locally.
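The DlgProcess idea (a Process subclass with a pipe back to the parent for exceptions) might look roughly like this sketch. ErrorReportingProcess is a hypothetical name, and sending repr/traceback strings instead of the exception object is an assumption, made because not every exception pickles cleanly:

```python
import multiprocessing
import traceback


class ErrorReportingProcess(multiprocessing.Process):
    """Hypothetical stand-in for DlgProcess: the child reports errors over a pipe."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # duplex=False returns (receive-only end, send-only end).
        self._parent_conn, self._child_conn = multiprocessing.Pipe(duplex=False)
        self._error = None

    def run(self):
        try:
            super().run()  # calls target(*args, **kwargs)
            self._child_conn.send(None)  # None signals success
        except Exception as e:
            # Send picklable strings rather than the exception object itself.
            self._child_conn.send((repr(e), traceback.format_exc()))

    @property
    def error(self):
        """The child's (repr, traceback) if it raised, otherwise None."""
        if self._error is None and self._parent_conn.poll():
            self._error = self._parent_conn.recv()
        return self._error
```

In use, the manager would call start() and join() and then check error. For short messages like these, reading after join() is safe; large payloads should be read before joining, or the child can block on a full pipe. Calling run() directly executes the target in the current process, which is handy for exercising the error path in tests.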

I hope this provides a decent summary, and I am open to making further changes, but for now, this seems to suffice.
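The per-session bookkeeping described for shared_memory_manager.py could be sketched with the standard library's SharedMemory as below. SessionSharedMemoryRegistry, shutdown_session and the warning text are illustrative assumptions; only the split between closing handles and unlinking blocks in destroy_session, and warning on a repeated unlink, follow the PR description and commit log:

```python
import warnings
from collections import defaultdict
from multiprocessing import shared_memory


class SessionSharedMemoryRegistry:
    """Hypothetical per-session registry of named shared-memory blocks."""

    def __init__(self):
        # session_id -> {block name: SharedMemory handle}
        self._blocks = defaultdict(dict)

    def register(self, session_id: str, block: shared_memory.SharedMemory) -> None:
        self._blocks[session_id][block.name] = block

    def shutdown_session(self, session_id: str) -> None:
        """Close this process's handles only; other managers may still need the blocks."""
        for block in self._blocks.get(session_id, {}).values():
            block.close()  # close() is safe to call more than once

    def destroy_session(self, session_id: str) -> None:
        """Close and unlink every block registered for the session."""
        for name, block in self._blocks.pop(session_id, {}).items():
            block.close()
            try:
                block.unlink()
            except FileNotFoundError:
                # Another manager unlinked this block first: warn instead of failing.
                warnings.warn(f"shared memory block {name!r} was already unlinked")
```

Two registries holding handles to the same block reproduce the double-unlink case: the first destroy_session removes the block and the second only emits a warning.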


awicenec · 2021-11-03T15:11:53Z

Can you resolve the merge conflicts with master first, please?

# Conflicts:
#   daliuge-engine/dlg/apps/crc.py
#   daliuge-engine/dlg/drop.py
#   daliuge-engine/dlg/io.py
#   daliuge-engine/test/apps/test_simple.py
#   daliuge-engine/test/apps/test_socket.py
#   daliuge-engine/test/manager/test_dm.py
#   daliuge-engine/test/test_drop.py
#   docs/intro.rst

coveralls · 2021-11-04T02:31:47Z

Coverage decreased (-0.3%) to 78.004% when pulling c8d7f15 on LIU-128 into 7fa67ff on master.

# Conflicts:
#   daliuge-engine/dlg/apps/simple.py
#   daliuge-engine/dlg/drop.py
#   daliuge-engine/dlg/droputils.py
#   daliuge-engine/dlg/io.py
#   daliuge-engine/dlg/manager/cmdline.py
#   daliuge-engine/dlg/manager/node_manager.py
#   daliuge-engine/dlg/rpc.py
#   daliuge-engine/test/apps/test_crc.py
#   daliuge-engine/test/apps/test_simple.py
#   daliuge-engine/test/manager/test_dm.py


pritchardn and others added 30 commits August 26, 2021 15:33

Adds ListAppendThrashingApp - a toy BarrierAppDROP that wastes lots of CPU time appending random numbers to a list. Useful for performance benchmarking.

a92d837

Drops spawn a process when executed within a thread-pool.

6384139

Otherwise drops run within their own thread (presumably within a single core).

Adds a test for ListAppendThrashingApp

8ef60e0

Added parallel test hooks

23a2e88

Reverted default back to no threads, else dask_emulation fails

06e3bd0

fixed hugegraph test

098eb4c

multiprocessing graphs only work with FileDROPs right now

2fcc18b

Merge branch 'liu-174' into LIU-128

c62f950

Fixed crc32 deprecation warnings

ee8ce54

Fixed doc tree

ae24061

Adds a basic, technically functioning but non-deterministic sharedMemory module.

3c9a435

Adds a deterministic, technically functional but still incorrect memory manager.

e3e0a65

Re-implements shared memory to write directly to posix_shm. This is functional and efficient (although resizing is not quite working yet).

a32edb8

Makes generateArray an N^2 operation to show speedup more clearly.

fcea8d7

When opening an existing shmem location, finds and sets size correctly.

50d324e

Minor readability fix for test_multi_listappendthrashing.

b143b0c

Makes test_speedup actually test the speedup and increases test size.

Condenses _mm attribute for sharedmemory into the single _tp attribute.

a7232b0

Adds session_id information to shared memory drops along with updated test handle

8fc06c3

Adds (untested) addition to NodeManager for shared memory drop initialization.

b6466e1

Separates shared memory manager and shared memory implementation into separate files. DlgSharedMemoryManager now handles duplicate and unregistered session requests.

dc23202

Implementation cleanup for shared_memory_manager.py

02b1fc4

Implementation cleanup for shared_memory.py

0d87648

Shared memory handles shrinking a block.

991c338

shared_memory.py now supports:

0a6c676

- Creating random files
- Handling multiple unlinks

shared_memory.py now supports:

5960def

- Creating random files
- Handling multiple unlinks (warns)

Adds tests for DlgSharedMemory

Shared Memory manager now shuts down when deleted. (Consider changing this behaviour later)

2ccdb94

Adds tests for the DlgSharedMemoryManager

e9ea24d

Adds a DlgProcess class to setup error listeners via pipes

fbda5bc

DropProxys now make their own client to handle multiprocessing.

6905a79

SharedMemory now truncates shared memory blocks aggressively on write - gives a tighter bound on size.

61b6a87

pritchardn added 16 commits October 27, 2021 14:50

Fixes test_averagearraysapp whose behaviour does not work when treating v as an array of bools

bdeb99b

dropWrote from Outside now assumes that an externally written drop can produce a checksum.

1d493e3

Fixes dynamic checksum generation to deal with bytes arrays too.

671f6b4

Fixes test_averagearraysapp to compare only a single average.

1e9f5a1

Changes shared-memory manager behaviour to only close blocks (other managers may still need them). Unlinking is moved to 'destroy_session' and is triggered when the memory manager itself is shut down.

6b26963

Catches the most exquisite edge-case where shared files are simultaneously found, then not found when opening

e305ba5

Adds a first-cut of documentation for the shared memory feature.

73f28dc

Re-orders tests to establish whether tests fail due to networking issues or something else.

058e4b3

Adds import guarding for Python < 3.8 - _posixshm did not exist until 3.8. Improves formatting of new code-files.

8db4ddd

Merge branch 'master' into LIU-128

4598d33

# Conflicts:
#   daliuge-engine/dlg/apps/simple.py
#   docs/architecture/dataflow.rst
#   docs/architecture/index.rst
#   docs/deployment.rst
#   docs/development/data_development.rst
#   docs/development/dev_index.rst
#   docs/index.rst
#   docs/intro.rst

Removes unused .rst file.

0ba3d87

Makes session ids random in test_dm.py

1a2a260

Tests serial dm first, then parallel.

cfae921

Removes pesky tests (investigating if error cleanup is causing issue).

90678bd

Skips parallel error tests (to investigate if these are the true cause).

705a19e

Conditionally skips multiprocessing tests.

3f4ac1f
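The conditional skip mentioned in the first comment (full end-to-end multiprocessing tests only run on machines with at least 4 logical cores) can be expressed with unittest.skipIf, as in this sketch. The helper enough_cores, the constant and the test class are hypothetical names, not the ones used in test_dm.py:

```python
import os
import unittest

# Threshold taken from the PR discussion: at least 4 logical cores.
MIN_CORES_FOR_PARALLEL_TESTS = 4


def enough_cores(minimum: int = MIN_CORES_FOR_PARALLEL_TESTS) -> bool:
    # os.cpu_count() returns None when the count cannot be determined.
    return (os.cpu_count() or 1) >= minimum


class ParallelDropTests(unittest.TestCase):
    @unittest.skipIf(not enough_cores(), "needs at least 4 logical cores")
    def test_end_to_end_multiprocessing(self):
        # Placeholder for a hypothetical end-to-end multiprocessing graph test.
        self.assertTrue(enough_cores())
```

A skipped test still counts as a successful run, so CI machines with fewer cores stay green while the serial test_dm suite continues to run.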

pritchardn added 2 commits November 4, 2021 10:04

Continued merging process.

310b431

awicenec requested a review from rtobar November 16, 2021 06:53

awicenec requested review from awicenec and removed request for rtobar December 9, 2021 02:14

pritchardn added 2 commits December 9, 2021 17:39

Changes crc32c to crc32

c8d7f15

awicenec approved these changes Dec 13, 2021


awicenec merged commit c9ee801 into master Dec 13, 2021

awicenec deleted the LIU-128 branch December 13, 2021 15:16

awicenec added a commit that referenced this pull request May 19, 2022

Merge pull request #79 from ICRAR/LIU-128

481814c

Liu 128

pritchardn pushed a commit that referenced this pull request May 20, 2022

Merge pull request #79 from ICRAR/LIU-128

bfd3197

Liu 128
