Adding zero-copy support on the receiving end of the TCP and MPI parcel ports #6229

Merged
5 commits merged on May 7, 2023

hkaiser
Member

@hkaiser hkaiser commented Apr 30, 2023

  • flyby: cleaning up and modernizing TCP parcel port

@JiakunYan this implements what we discussed recently: received parcels are now de-serialized as soon as the chunk information is available. This de-serialization, however, does not assume that the chunk data has already been received. It merely allocates the memory so that the subsequent networking operations can place the received chunk data directly into the internal memory buffers.
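The allocate-first, receive-in-place idea can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `chunk_info`, `receive_into`, `allocate_buffers`, and `receive_chunks` are hypothetical names, and the "network" is simulated by a byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the chunk information that arrives ahead of the
// chunk payloads: it only records how large each chunk will be.
struct chunk_info
{
    std::size_t size;
};

// Hypothetical stand-in for a network receive: copies `size` bytes from the
// simulated wire directly to `dest` and returns the new read position.
std::size_t receive_into(std::vector<char> const& wire, std::size_t pos,
    char* dest, std::size_t size)
{
    assert(pos + size <= wire.size());
    if (size != 0)
        std::memcpy(dest, wire.data() + pos, size);
    return pos + size;
}

// Step 1: once the chunk information is known, allocate the final buffers.
// No payload data is required at this point.
std::vector<std::vector<char>> allocate_buffers(
    std::vector<chunk_info> const& infos)
{
    std::vector<std::vector<char>> buffers;
    buffers.reserve(infos.size());
    for (chunk_info const& info : infos)
        buffers.emplace_back(info.size);
    return buffers;
}

// Step 2: each receive operation targets a final buffer, so no staging
// buffer and no second copy are needed.
void receive_chunks(
    std::vector<char> const& wire, std::vector<std::vector<char>>& buffers)
{
    std::size_t pos = 0;
    for (std::vector<char>& buffer : buffers)
        pos = receive_into(wire, pos, buffer.data(), buffer.size());
}
```

Calling `allocate_buffers` with the announced sizes and then `receive_chunks` with the incoming bytes fills the pre-allocated buffers in place; their storage addresses do not change between the two steps.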

@StellarBot

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
| --- | --- | --- | --- |
| For Each | (=) | (=) | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-04-30T23:21:12+00:00 |
| HPX Commit | 6b6e1e7 | 5cd8a3e |
| Datetime | 2023-03-10T03:27:49.135034-06:00 | 2023-04-30T18:29:43.672129-05:00 |
| Envfile | | |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Comparison

| BENCHMARK | NO-EXECUTOR |
| --- | --- |
| Future Overhead - Create Thread Hierarchical - Latch | - |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-04-30T23:21:12+00:00 |
| HPX Commit | 6b6e1e7 | 5cd8a3e |
| Datetime | 2023-03-10T03:28:21.991297-06:00 | 2023-04-30T18:30:17.526947-05:00 |
| Envfile | | |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
| --- | --- | --- | --- |
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | (=) | (=) | (=) |
| Stream Benchmark - Triad | (=) | (=) | (=) |
| Stream Benchmark - Copy | (=) | (=) | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-04-30T23:21:12+00:00 |
| HPX Commit | 6b6e1e7 | 5cd8a3e |
| Datetime | 2023-03-10T03:28:29.145749-06:00 | 2023-04-30T18:30:24.677302-05:00 |
| Envfile | | |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Explanation of Symbols

| Symbol | Meaning |
| --- | --- |
| = | No performance change (confidence interval within ±1%) |
| (=) | Probably no performance change (confidence interval within ±2%) |
| (+)/(-) | Very small performance improvement/degradation (≤1%) |
| +/- | Small performance improvement/degradation (≤5%) |
| ++/-- | Large performance improvement/degradation (≤10%) |
| +++/--- | Very large performance improvement/degradation (>10%) |
| ? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| ?? | Unclear result, very large uncertainty (±10%) |
| ??? | Something unexpected… |

@hkaiser hkaiser force-pushed the zero_copy_receive branch 2 times, most recently from 8d035fd to 2d78ba0, May 1, 2023 13:18
@hkaiser hkaiser changed the title from "Adding zero-copy support on the receiving end of the TCP parcel port" to "Adding zero-copy support on the receiving end of the TCP and MPI parcel ports" May 1, 2023
@StellarBot

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
| --- | --- | --- | --- |
| For Each | (=) | + | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T15:42:47+00:00 |
| HPX Commit | 6b6e1e7 | 518834f |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Envfile | | |
| Datetime | 2023-03-10T03:27:49.135034-06:00 | 2023-05-01T10:49:34.013465-05:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Comparison

| BENCHMARK | NO-EXECUTOR |
| --- | --- |
| Future Overhead - Create Thread Hierarchical - Latch | - |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T15:42:47+00:00 |
| HPX Commit | 6b6e1e7 | 518834f |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Envfile | | |
| Datetime | 2023-03-10T03:28:21.991297-06:00 | 2023-05-01T10:50:07.093227-05:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
| --- | --- | --- | --- |
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | (=) | (=) | (=) |
| Stream Benchmark - Triad | (=) | (=) | (=) |
| Stream Benchmark - Copy | = | (=) | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T15:42:47+00:00 |
| HPX Commit | 6b6e1e7 | 518834f |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Envfile | | |
| Datetime | 2023-03-10T03:28:29.145749-06:00 | 2023-05-01T10:50:14.394349-05:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

@JiakunYan
Contributor

Do you think it is better to add a command line option controlling this zero-copy behavior so we can figure out how much performance improvement we get from this optimization?

@hkaiser
Member Author

hkaiser commented May 1, 2023

> Do you think it is better to add a command line option controlling this zero-copy behavior so we can figure out how much performance improvement we get from this optimization?

This is a good suggestion. I will add a configuration variable for this.

@hkaiser hkaiser force-pushed the zero_copy_receive branch 4 times, most recently from bb8568d to f5d09f3, May 1, 2023 22:56
@StellarBot

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
| --- | --- | --- | --- |
| For Each | (=) | ? | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Commit | 6b6e1e7 | 9f097d8 |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T22:56:03+00:00 |
| Datetime | 2023-03-10T03:27:49.135034-06:00 | 2023-05-01T18:00:20.417841-05:00 |
| Clustername | rostam | rostam |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Comparison

| BENCHMARK | NO-EXECUTOR |
| --- | --- |
| Future Overhead - Create Thread Hierarchical - Latch | - |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Commit | 6b6e1e7 | 9f097d8 |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T22:56:03+00:00 |
| Datetime | 2023-03-10T03:28:21.991297-06:00 | 2023-05-01T18:00:53.523552-05:00 |
| Clustername | rostam | rostam |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
| --- | --- | --- | --- |
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | (=) | = | (=) |
| Stream Benchmark - Triad | (=) | (=) | (=) |
| Stream Benchmark - Copy | (=) | (=) | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Commit | 6b6e1e7 | 9f097d8 |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T22:56:03+00:00 |
| Datetime | 2023-03-10T03:28:29.145749-06:00 | 2023-05-01T18:01:00.670255-05:00 |
| Clustername | rostam | rostam |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |

@hkaiser
Member Author

hkaiser commented May 1, 2023

> > Do you think it is better to add a command line option controlling this zero-copy behavior so we can figure out how much performance improvement we get from this optimization?
>
> This is a good suggestion. I will add a configuration variable for this.

This will require some acrobatics as the two versions rely on different types representing the chunking data (the old uses a std::vector<char>, the new uses serialization::serialization_chunk). I'll see if using a variant<> isn't making things unwieldy.
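To make the trade-off concrete, here is a rough sketch of what a variant-based representation could look like. It is only an illustration: `chunk_view` is a hypothetical, simplified stand-in for `serialization::serialization_chunk`, and `chunk_storage` and `chunk_size` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-in for serialization::serialization_chunk;
// the real type is not reproduced here.
struct chunk_view
{
    char const* data;
    std::size_t size;
};

// One storage type that can hold either representation.
using chunk_storage = std::variant<std::vector<char>, chunk_view>;

// Every consumer has to dispatch on the active alternative, which is where
// the "unwieldy" part comes from.
std::size_t chunk_size(chunk_storage const& storage)
{
    struct visitor
    {
        std::size_t operator()(std::vector<char> const& v) const
        {
            return v.size();
        }
        std::size_t operator()(chunk_view const& c) const
        {
            return c.size;
        }
    };
    return std::visit(visitor{}, storage);
}
```

Each additional operation on the chunk data would need a similar visitor, so the cost of this approach grows with the number of call sites.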

@JiakunYan
Contributor

JiakunYan commented May 1, 2023

> This will require some acrobatics as the two versions rely on different types representing the chunking data (the old uses a std::vector<char>, the new uses serialization::serialization_chunk). I'll see if using a variant<> isn't making things unwieldy.

variant<> might complicate things. I think you can use chunk_data for both cases. I guess you can just create a chunk_data from a vector<char>?
I can do this later once you merge this PR.
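A minimal sketch of that alternative, assuming a chunk can be described by a pointer and a size. `chunk_view` and `make_chunk` are hypothetical names and do not reproduce the actual HPX types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified chunk descriptor (pointer plus size); not the
// actual HPX type.
struct chunk_view
{
    char* data;
    std::size_t size;
};

// Describe an existing buffer as a chunk without copying it. The vector
// must outlive the returned view and must not be resized while it is used.
chunk_view make_chunk(std::vector<char>& buffer)
{
    return chunk_view{buffer.data(), buffer.size()};
}
```

With a helper like this, both code paths could hand the same descriptor type to the rest of the receive logic, at the price of having to manage the lifetime of the underlying vector separately.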

@StellarBot

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
| --- | --- | --- | --- |
| For Each | (=) | ? | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Commit | 6b6e1e7 | 9f097d8 |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T22:56:03+00:00 |
| Clustername | rostam | rostam |
| Datetime | 2023-03-10T03:27:49.135034-06:00 | 2023-05-01T18:22:41.027888-05:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Envfile | | |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |

Comparison

| BENCHMARK | NO-EXECUTOR |
| --- | --- |
| Future Overhead - Create Thread Hierarchical - Latch | - |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Commit | 6b6e1e7 | 9f097d8 |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T22:56:03+00:00 |
| Clustername | rostam | rostam |
| Datetime | 2023-03-10T03:28:21.991297-06:00 | 2023-05-01T18:23:13.760030-05:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Envfile | | |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
| --- | --- | --- | --- |
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | (=) | (=) | = |
| Stream Benchmark - Triad | (=) | (=) | (=) |
| Stream Benchmark - Copy | (=) | (=) | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Commit | 6b6e1e7 | 9f097d8 |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-01T22:56:03+00:00 |
| Clustername | rostam | rostam |
| Datetime | 2023-03-10T03:28:29.145749-06:00 | 2023-05-01T18:23:20.915726-05:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Envfile | | |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |

@hkaiser
Member Author

hkaiser commented May 2, 2023

@JiakunYan this now supports the --hpx:ini=hpx.parcel.zero_copy_receive_optimization=0 key to disable the new zero-copy support on the receiving end (default is 1)
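The semantics of the key (enabled unless explicitly set to 0) can be sketched as follows. This is purely illustrative and is not how HPX processes its command line or configuration; `zero_copy_receive_enabled` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: scans the arguments for an --hpx:ini override of the
// key mentioned above. HPX performs its own configuration handling.
bool zero_copy_receive_enabled(std::vector<std::string> const& args)
{
    std::string const prefix =
        "--hpx:ini=hpx.parcel.zero_copy_receive_optimization=";

    bool enabled = true;    // the default is 1 (enabled)
    for (std::string const& arg : args)
    {
        // The last matching argument wins in this sketch.
        if (arg.compare(0, prefix.size(), prefix) == 0)
            enabled = arg.substr(prefix.size()) != "0";
    }
    return enabled;
}
```

For a before/after comparison, the same benchmark would be run once without the option and once with `--hpx:ini=hpx.parcel.zero_copy_receive_optimization=0`.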

- flyby: cleaning up and modernizing TCP parcel port
- flyby: HPX_ASSERT_MSG() now takes arbitrary number of arguments to pass
  values to hpx::util::format
- this can be used to disable the zero-copy serialization on the receiving
  end, this option is enabled by default
- flyby: modernize parcelset code
@StellarBot

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
| --- | --- | --- | --- |
| For Each | (=) | - | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-05T19:52:47+00:00 |
| HPX Commit | 6b6e1e7 | 66ebd6a |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Datetime | 2023-03-10T03:27:49.135034-06:00 | 2023-05-05T15:02:32.847737-05:00 |

Comparison

| BENCHMARK | NO-EXECUTOR |
| --- | --- |
| Future Overhead - Create Thread Hierarchical - Latch | - |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-05T19:52:47+00:00 |
| HPX Commit | 6b6e1e7 | 66ebd6a |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Datetime | 2023-03-10T03:28:21.991297-06:00 | 2023-05-05T15:03:06.593010-05:00 |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
| --- | --- | --- | --- |
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | (=) | (=) | (=) |
| Stream Benchmark - Triad | (=) | (=) | (=) |
| Stream Benchmark - Copy | (=) | (=) | (=) |

Info

| Property | Before | After |
| --- | --- | --- |
| HPX Datetime | 2023-03-06T15:59:25+00:00 | 2023-05-05T19:52:47+00:00 |
| HPX Commit | 6b6e1e7 | 66ebd6a |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |
| Datetime | 2023-03-10T03:28:29.145749-06:00 | 2023-05-05T15:03:13.757004-05:00 |

@hkaiser
Member Author

hkaiser commented May 7, 2023

bors merge

@bors

bors bot commented May 7, 2023

Build succeeded!

@bors bors bot merged commit e82d578 into master May 7, 2023
63 of 69 checks passed
@bors bors bot deleted the zero_copy_receive branch May 7, 2023 01:37
@hkaiser hkaiser modified the milestones: 1.10.0, 1.9.1 May 13, 2023
3 participants