Adding zero-copy support on the receiving end of the TCP and MPI parcel ports #6229

hkaiser · 2023-04-30T21:24:11Z

flyby: cleaning up and modernizing TCP parcel port

@JiakunYan this implements what we discussed recently: received parcels are now de-serialized as soon as the chunk information is available. This de-serialization does not assume that the chunk data has already been received; it merely allocates the memory so that the subsequent networking operations can place the received chunk data directly into the internal memory buffers.
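For readers unfamiliar with the approach, the two phases described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `chunk_info`, `received_object`, `preallocate`, and `receive_into` are hypothetical names, and the `memcpy` only stands in for a socket or MPI receive that writes into caller-owned memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the chunk information that arrives before the
// chunk data itself.
struct chunk_info
{
    std::size_t size;    // number of bytes the chunk will carry
};

// Hypothetical stand-in for a de-serialized object that owns its buffers.
struct received_object
{
    std::vector<std::vector<char>> buffers;    // final storage for chunk data
};

// Phase 1: the chunk information is available, the chunk data is not.
// De-serialization only allocates the final storage for every chunk.
inline received_object preallocate(std::vector<chunk_info> const& chunks)
{
    received_object obj;
    obj.buffers.reserve(chunks.size());
    for (chunk_info const& c : chunks)
        obj.buffers.emplace_back(c.size);
    return obj;
}

// Phase 2: stand-in for a network read. The destination is the memory that
// phase 1 allocated, so no intermediate receive buffer is involved.
inline void receive_into(
    char* dest, std::size_t size, char const* wire, std::size_t wire_size)
{
    assert(size == wire_size);
    std::memcpy(dest, wire, size);    // simulates the transport writing data
}
```

The point of the sketch is that the buffer address handed to the transport in phase 2 is the same address the application later reads from, so the receive path needs no staging copy.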

StellarBot · 2023-04-30T23:30:56Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	(=)	(=)

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-04-30T23:21:12+00:00
HPX Commit	`6b6e1e7`	`5cd8a3e`
Datetime	2023-03-10T03:27:49.135034-06:00	2023-04-30T18:29:43.672129-05:00
Envfile
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	-

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-04-30T23:21:12+00:00
HPX Commit	`6b6e1e7`	`5cd8a3e`
Datetime	2023-03-10T03:28:21.991297-06:00	2023-04-30T18:30:17.526947-05:00
Envfile
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	(=)	(=)	(=)
Stream Benchmark - Triad	(=)	(=)	(=)
Stream Benchmark - Copy	(=)	(=)	(=)

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-04-30T23:21:12+00:00
HPX Commit	`6b6e1e7`	`5cd8a3e`
Datetime	2023-03-10T03:28:29.145749-06:00	2023-04-30T18:30:24.677302-05:00
Envfile
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Explanation of Symbols

Symbol	MEANING
=	No performance change (confidence interval within ±1%)
(=)	Probably no performance change (confidence interval within ±2%)
(+)/(-)	Very small performance improvement/degradation (≤1%)
+/-	Small performance improvement/degradation (≤5%)
++/--	Large performance improvement/degradation (≤10%)
+++/---	Very large performance improvement/degradation (>10%)
?	Probably no change, but quite large uncertainty (confidence interval within ±5%)
??	Unclear result, very large uncertainty (±10%)
???	Something unexpected…

StellarBot · 2023-05-01T15:50:47Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	+	(=)

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T15:42:47+00:00
HPX Commit	`6b6e1e7`	`518834f`
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Envfile
Datetime	2023-03-10T03:27:49.135034-06:00	2023-05-01T10:49:34.013465-05:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	-

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T15:42:47+00:00
HPX Commit	`6b6e1e7`	`518834f`
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Envfile
Datetime	2023-03-10T03:28:21.991297-06:00	2023-05-01T10:50:07.093227-05:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	(=)	(=)	(=)
Stream Benchmark - Triad	(=)	(=)	(=)
Stream Benchmark - Copy	=	(=)	(=)

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T15:42:47+00:00
HPX Commit	`6b6e1e7`	`518834f`
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Envfile
Datetime	2023-03-10T03:28:29.145749-06:00	2023-05-01T10:50:14.394349-05:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver_connection.hpp

JiakunYan · 2023-05-01T21:39:41Z

Do you think it is better to add a command line option controlling this zero-copy behavior so we can figure out how much performance improvement we get from this optimization?

hkaiser · 2023-05-01T22:34:06Z

> Do you think it is better to add a command line option controlling this zero-copy behavior so we can figure out how much performance improvement we get from this optimization?

This is a good suggestion. I will add a configuration variable for this.

StellarBot · 2023-05-01T23:01:31Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	?	(=)

Info

Property	Before	After
HPX Commit	`6b6e1e7`	`9f097d8`
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T22:56:03+00:00
Datetime	2023-03-10T03:27:49.135034-06:00	2023-05-01T18:00:20.417841-05:00
Clustername	rostam	rostam
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	-

Info

Property	Before	After
HPX Commit	`6b6e1e7`	`9f097d8`
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T22:56:03+00:00
Datetime	2023-03-10T03:28:21.991297-06:00	2023-05-01T18:00:53.523552-05:00
Clustername	rostam	rostam
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	(=)	=	(=)
Stream Benchmark - Triad	(=)	(=)	(=)
Stream Benchmark - Copy	(=)	(=)	(=)

Info

Property	Before	After
HPX Commit	`6b6e1e7`	`9f097d8`
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T22:56:03+00:00
Datetime	2023-03-10T03:28:29.145749-06:00	2023-05-01T18:01:00.670255-05:00
Clustername	rostam	rostam
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu

hkaiser · 2023-05-01T23:04:13Z

> Do you think it is better to add a command line option controlling this zero-copy behavior so we can figure out how much performance improvement we get from this optimization?
>
> This is a good suggestion. I will add a configuration variable for this.

This will require some acrobatics as the two versions rely on different types representing the chunking data (the old uses a std::vector<char>, the new uses serialization::serialization_chunk). I'll see if using a variant<> isn't making things unwieldy.

JiakunYan · 2023-05-01T23:12:55Z

> This will require some acrobatics as the two versions rely on different types representing the chunking data (the old uses a std::vector<char>, the new uses serialization::serialization_chunk). I'll see if using a variant<> isn't making things unwieldy.

variant<> might complicate things. I think you can use chunk_data for both cases. I guess you can just create a chunk_data from a vector<char>?
I can do this later once you merge this PR.
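
As a rough illustration of the suggestion above (one chunk type for both code paths instead of a variant<>), the idea could look something like the sketch below. `chunk_view`, `make_chunk`, and `total_bytes` are hypothetical names chosen for this example; they are not HPX's `serialization::serialization_chunk` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning chunk descriptor (NOT HPX's serialization_chunk).
struct chunk_view
{
    char* data;          // start of the chunk's memory
    std::size_t size;    // number of bytes in the chunk
};

// Old path: the chunk data lives in a std::vector<char>; wrap it in place.
inline chunk_view make_chunk(std::vector<char>& buffer)
{
    return chunk_view{buffer.data(), buffer.size()};
}

// Zero-copy path: the memory is already owned by the de-serialized object.
inline chunk_view make_chunk(char* data, std::size_t size)
{
    return chunk_view{data, size};
}

// Downstream code sees one type, whichever path produced the chunk.
inline std::size_t total_bytes(std::vector<chunk_view> const& chunks)
{
    std::size_t n = 0;
    for (chunk_view const& c : chunks)
        n += c.size;
    return n;
}
```

Because the descriptor is non-owning, the wrapped `std::vector<char>` has to outlive every `chunk_view` that refers to it; a variant<> would avoid that lifetime constraint at the cost of branching at each use site.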

StellarBot · 2023-05-01T23:23:47Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	?	(=)

Info

Property	Before	After
HPX Commit	`6b6e1e7`	`9f097d8`
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T22:56:03+00:00
Clustername	rostam	rostam
Datetime	2023-03-10T03:27:49.135034-06:00	2023-05-01T18:22:41.027888-05:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Envfile
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	-

Info

Property	Before	After
HPX Commit	`6b6e1e7`	`9f097d8`
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T22:56:03+00:00
Clustername	rostam	rostam
Datetime	2023-03-10T03:28:21.991297-06:00	2023-05-01T18:23:13.760030-05:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Envfile
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	(=)	(=)	=
Stream Benchmark - Triad	(=)	(=)	(=)
Stream Benchmark - Copy	(=)	(=)	(=)

Info

Property	Before	After
HPX Commit	`6b6e1e7`	`9f097d8`
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-01T22:56:03+00:00
Clustername	rostam	rostam
Datetime	2023-03-10T03:28:29.145749-06:00	2023-05-01T18:23:20.915726-05:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Envfile
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1

hkaiser · 2023-05-02T20:33:22Z

@JiakunYan this now supports a configuration key: passing `--hpx:ini=hpx.parcel.zero_copy_receive_optimization=0` on the command line disables the new zero-copy support on the receiving end (the default is 1, i.e. enabled).
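
To make the semantics of the switch concrete, here is a small stand-alone sketch of how such a `--hpx:ini=key=value` override with a default of 1 behaves. It is purely illustrative and is not HPX's configuration code: `apply_ini_override` and `zero_copy_receive_enabled` are hypothetical helpers, and the sketch assumes that any value other than `0` leaves the optimization on.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Illustrative only: splits one "--hpx:ini=key=value" argument and records
// the key/value pair. Returns false if the argument has a different shape.
inline bool apply_ini_override(
    std::string const& arg, std::map<std::string, std::string>& settings)
{
    std::string const prefix = "--hpx:ini=";
    if (arg.compare(0, prefix.size(), prefix) != 0)
        return false;

    std::string const entry = arg.substr(prefix.size());
    std::size_t const eq = entry.find('=');
    if (eq == std::string::npos || eq == 0)
        return false;

    settings[entry.substr(0, eq)] = entry.substr(eq + 1);
    return true;
}

// The optimization is on unless the key is explicitly set to "0"
// (assumption of this sketch, mirroring "default is 1").
inline bool zero_copy_receive_enabled(
    std::map<std::string, std::string> const& settings)
{
    auto const it = settings.find("hpx.parcel.zero_copy_receive_optimization");
    if (it == settings.end())
        return true;
    return it->second != "0";
}
```

Running the same benchmark once with the key unset and once with it set to `0` is what allows the two receive paths to be compared, which was the motivation for the request.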

StellarBot · 2023-05-05T20:03:41Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	-	(=)

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-05T19:52:47+00:00
HPX Commit	`6b6e1e7`	`66ebd6a`
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Datetime	2023-03-10T03:27:49.135034-06:00	2023-05-05T15:02:32.847737-05:00

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	-

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-05T19:52:47+00:00
HPX Commit	`6b6e1e7`	`66ebd6a`
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Datetime	2023-03-10T03:28:21.991297-06:00	2023-05-05T15:03:06.593010-05:00

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	(=)	(=)	(=)
Stream Benchmark - Triad	(=)	(=)	(=)
Stream Benchmark - Copy	(=)	(=)	(=)

Info

Property	Before	After
HPX Datetime	2023-03-06T15:59:25+00:00	2023-05-05T19:52:47+00:00
HPX Commit	`6b6e1e7`	`66ebd6a`
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam
Datetime	2023-03-10T03:28:29.145749-06:00	2023-05-05T15:03:13.757004-05:00

hkaiser · 2023-05-07T01:17:48Z

bors merge

bors · 2023-05-07T01:37:32Z

Build succeeded!

hkaiser added the labels type: enhancement, category: parcel transport, and type: compatibility issue on Apr 30, 2023

hkaiser added this to the 1.10.0 milestone Apr 30, 2023

hkaiser requested review from aurianer, msimberg and biddisco as code owners April 30, 2023 21:24

hkaiser force-pushed the zero_copy_receive branch from 49f82a4 to 71f8ebd Compare April 30, 2023 23:21

hkaiser force-pushed the zero_copy_receive branch 2 times, most recently from 8d035fd to 2d78ba0 Compare May 1, 2023 13:18

hkaiser changed the title ~~Adding zero-copy support on the receiving end of the TCP parcel port~~ Adding zero-copy support on the receiving end of the TCP and MPI parcel ports May 1, 2023

hkaiser force-pushed the zero_copy_receive branch from df446f3 to 7b6a400 Compare May 1, 2023 15:42

hkaiser force-pushed the zero_copy_receive branch from 7b6a400 to bb04eac Compare May 1, 2023 15:55

JiakunYan reviewed May 1, 2023

libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver_connection.hpp (outdated)

hkaiser force-pushed the zero_copy_receive branch 4 times, most recently from bb8568d to f5d09f3 Compare May 1, 2023 22:56

JiakunYan reviewed May 3, 2023

docs/sphinx/manual/launching_and_configuring_hpx_applications.rst

hkaiser force-pushed the zero_copy_receive branch from 60a9b36 to ca3008c Compare May 3, 2023 13:42

hkaiser mentioned this pull request May 5, 2023

Improve MPI initialization #5910

Closed

hkaiser force-pushed the zero_copy_receive branch from 6c77abc to b9dacf1 Compare May 5, 2023 19:32

hkaiser added 4 commits May 5, 2023 14:52

Adding zero-copy support on the receiving end of the TCP parcel port

0ec20f4

- flyby: cleaning up and modernizing TCP parcel port

Adding support for MPI parcel-port

1fe0c46

- flyby: HPX_ASSERT_MSG() now takes arbitrary number of arguments to pass values to hpx::util::format

Adding hpx.parcels.zero_copy_receive_optimization configuration key

97c660d

- this can be used to disable the zero-copy serialization on the receiving end, this option is enabled by default
- flyby: modernize parcelset code

Merge changes from #5910

96db412

hkaiser force-pushed the zero_copy_receive branch from b9dacf1 to 96db412 Compare May 5, 2023 19:52

Fixing use-after-move

058c8b7

bors bot merged commit e82d578 into master May 7, 2023
63 of 69 checks passed

bors bot deleted the zero_copy_receive branch May 7, 2023 01:37

hkaiser modified the milestones: 1.10.0, 1.9.1 May 13, 2023

hkaiser mentioned this pull request May 13, 2023

PRs to be merged for v1.9.1 point release #6244

Closed

33 tasks
