Infiniband parcelport #2419
Merged
No actual parcel sending takes place yet. Lots of debug material is present here, which will need to be cleaned up.
Static compilation of an external plugin requires certain parameters to be passed back to the root CMake; this is a work in progress and should not be used as is.
The interface for the rdmahelper memory_pool now compiles, and a basic connection between two nodes works.
Add a simple class that wraps timers and helps break down time spent in sections of code. This is intended for a quick look when full profiling is not convenient.
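A minimal sketch of such a section timer, assuming an RAII guard per timed section and `std::chrono::steady_clock` as the time source. The names `section_timer`, `scope`, `add`, `count` and `total_seconds` are hypothetical and are not the class added in this PR:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <ostream>
#include <string>
#include <utility>

// Hypothetical sketch (not the class from this PR): accumulates wall-clock
// time and call counts per named code section.
class section_timer {
public:
    using clock = std::chrono::steady_clock;

    // RAII guard: adds the time between construction and destruction
    // to the named section.
    class scope {
    public:
        scope(section_timer& timer, std::string name)
          : timer_(timer), name_(std::move(name)), start_(clock::now()) {}
        ~scope() { timer_.add(name_, clock::now() - start_); }
        scope(scope const&) = delete;
        scope& operator=(scope const&) = delete;

    private:
        section_timer& timer_;
        std::string name_;
        clock::time_point start_;
    };

    // Record one measurement for a section.
    void add(std::string const& name, clock::duration elapsed) {
        assert(elapsed >= clock::duration::zero());  // steady clock never goes back
        entry& e = entries_[name];
        e.total += elapsed;
        ++e.count;
    }

    std::size_t count(std::string const& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? 0 : it->second.count;
    }

    double total_seconds(std::string const& name) const {
        auto it = entries_.find(name);
        if (it == entries_.end()) return 0.0;
        return std::chrono::duration<double>(it->second.total).count();
    }

    // Print one line per section: name, number of calls, total seconds.
    void print(std::ostream& os) const {
        for (auto const& kv : entries_) {
            os << kv.first << ": " << kv.second.count << " calls, "
               << std::chrono::duration<double>(kv.second.total).count()
               << " s\n";
        }
    }

private:
    struct entry {
        clock::duration total{};
        std::size_t count = 0;
    };
    std::map<std::string, entry> entries_;
};
```

Wrapping a section in `{ section_timer::scope s(t, "serialize"); ... }` is enough to get a per-section breakdown without setting up a full profiler.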
Encode the header in one registered chunk. The main serialization goes in the next chunk if it fits in a standard chunk. Additional chunks are not sent, but their memory handles are, and an RDMA get will be used on the remote end (not yet implemented).
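The decision described in this commit can be sketched as a small planning function. This assumes a single fixed standard chunk size and counts only sizes and handles; `plan_message`, `message_plan` and `transfer` are hypothetical names, not parcelport types:

```cpp
#include <cassert>
#include <cstddef>

// How the main serialization buffer reaches the remote node.
enum class transfer { piggyback, rdma_get };

struct message_plan {
    transfer main_data;           // piggybacked or fetched by RDMA get
    std::size_t eager_bytes;      // bytes copied into the send itself
    std::size_t remote_handles;   // memory handles the receiver must RDMA-get
};

// Hypothetical planning step; `chunk_size` stands for the size of one
// standard registered chunk.
message_plan plan_message(std::size_t header_size, std::size_t main_size,
                          std::size_t num_zero_copy_chunks,
                          std::size_t chunk_size)
{
    // The header must always fit in a single registered chunk.
    assert(header_size <= chunk_size);

    // Additional (zero-copy) chunks are never sent eagerly: only their
    // memory handles travel with the message.
    message_plan plan{transfer::piggyback, header_size, num_zero_copy_chunks};

    if (main_size <= chunk_size) {
        // Main serialization fits: it travels in the next standard chunk.
        plan.eager_bytes += main_size;
    } else {
        // Too large: send a handle and let the receiver fetch it.
        plan.main_data = transfer::rdma_get;
        plan.remote_handles += 1;
    }
    return plan;
}
```

With a 4096-byte chunk, a 64-byte header plus 1000 bytes of serialized data is sent in one go, while 10000 bytes of serialized data would turn into one more handle for the remote side to fetch.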
…er without it being released
…mory pool to keep regions alive. The parcelport works as long as all data is piggybacked. RDMA get for zero_copy chunks is not yet implemented.
… container taking our pointer. In background_work, always repeat the event loop check until nothing is received.
…e function with zero copy
…m destination. Some changes to the header and decode parcels to improve performance of the verbs parcelport. Serialization chunks can store an RDMA key, and the decode parcels function can be entered with the chunk info already supplied.
…e message function
This prevents a zero-copy chunk from being reused later with a size larger than the one allocated, which would cause RDMA protection errors.
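The guard this commit describes amounts to checking the registered capacity before reusing a cached block. A minimal sketch, assuming a simple free list of registered regions; `registered_block` and `block_cache` are hypothetical and no real verbs registration takes place:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a registered memory region.
struct registered_block {
    std::size_t capacity;  // bytes actually registered with the NIC
    std::size_t used;      // bytes the current user asked for
};

class block_cache {
public:
    // Return a block whose registered capacity covers `size` bytes.
    registered_block acquire(std::size_t size) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            // The check that matters: never hand out a block smaller than
            // the request, or RDMA would touch unregistered memory.
            if (it->capacity >= size) {
                registered_block b = *it;
                free_.erase(it);
                b.used = size;
                return b;
            }
        }
        // No cached block is large enough: "register" a new one.
        return registered_block{size, size};
    }

    void release(registered_block b) {
        assert(b.used <= b.capacity);
        b.used = 0;
        free_.push_back(b);
    }

    std::size_t cached() const { return free_.size(); }

private:
    std::vector<registered_block> free_;
};
```

A released 100-byte block is skipped for a 200-byte request but reused for a 50-byte one.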
Add a callback handler to memory_pointer_wrapper and do not release blocks that have been given to the decode function. Lots of cleanups to the RDMA get for zero_copy chunks. Fix the receive refill counter: zero-byte messages also use up a receive.
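The refill counter fix can be illustrated with a small bookkeeping class, assuming a fixed target number of posted receives. `receive_counter` is hypothetical; the point it shows is that every completion is counted, including zero-byte ones:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for receives posted to a receive queue.
class receive_counter {
public:
    explicit receive_counter(std::size_t target) : target_(target), posted_(0) {}

    // How many receives must be posted to get back to the target depth.
    std::size_t refill() {
        std::size_t needed = target_ - posted_;
        posted_ = target_;
        return needed;
    }

    // Every receive completion consumes one posted buffer, including
    // zero-byte messages, so the byte count must not gate the decrement.
    void on_receive_completion(std::size_t bytes) {
        (void)bytes;
        assert(posted_ > 0);
        --posted_;
    }

    std::size_t posted() const { return posted_; }

private:
    std::size_t target_;
    std::size_t posted_;
};
```

If zero-byte completions were skipped, the counter would overestimate the posted receives and the queue would slowly drain.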
…o empty. When flushed messages are received, count down the preposted buffers until zero, then set the client state to terminated so that all clients can be cleaned up when parcelport stop is called.
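The shutdown countdown can be sketched as a tiny state machine, assuming one flushed completion arrives per preposted receive. `client_sketch` and `client_state` are hypothetical names, not the parcelport's client class:

```cpp
#include <cassert>
#include <cstddef>

enum class client_state { connected, disconnecting, terminated };

// Hypothetical client: tracks preposted receives during shutdown.
class client_sketch {
public:
    explicit client_sketch(std::size_t preposted) : preposted_(preposted) {}

    void begin_shutdown() { state_ = client_state::disconnecting; }

    // Called once for each preposted receive that completes as flushed.
    void on_flushed_receive() {
        assert(preposted_ > 0);
        if (--preposted_ == 0) {
            // All buffers are back: safe to clean this client up on stop.
            state_ = client_state::terminated;
        }
    }

    client_state state() const { return state_; }

private:
    std::size_t preposted_;
    client_state state_ = client_state::connected;
};
```

Parcelport stop could then wait until every client reports `terminated` before releasing resources.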
biddisco force-pushed the verbs_rebased branch from 234e88a to 42deb80 (December 23, 2016 07:08)
biddisco force-pushed the verbs_rebased branch from 42deb80 to 33db41d (December 23, 2016 10:44)
Shared Receive Queue (SRQ) simplifies handling of memory for receive queues on many nodes and reduces the amount of polling required. Redesign the connection code to handle rejection/abort of a connection when two nodes race to connect to each other. By maintaining two maps of connection requests (one for outgoing, one for incoming), we can more easily check for and remove aborted connections.
Flyby:
- Lots of cleanup to debugging comments
- Fix hostname reported by the PP
- Remove qp from the locality type (it was unused)
- Store connections by IP address in the main PP code to avoid a second map lookup, and modify the new connection callback accordingly
- Fix some CMake option settings to use better defaults and clean up docs
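The two-map idea can be sketched as follows. The tie-break rule (the request from the higher IPv4 address wins) is an assumption made for this illustration, since the commit does not say how the race is resolved; `connection_tracker` and `reply` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

enum class reply { accept, reject };

// Hypothetical tracker: one map for outgoing and one for incoming
// connection requests, keyed by the remote IPv4 address.
class connection_tracker {
public:
    explicit connection_tracker(std::uint32_t self_ip) : self_(self_ip) {}

    // We initiate a connection to `remote`.
    void start_connect(std::uint32_t remote) { outgoing_[remote] = next_id_++; }

    // A connection request arrives from `remote`.
    reply on_request(std::uint32_t remote) {
        assert(remote != self_);  // a node never connects to itself
        auto out = outgoing_.find(remote);
        if (out != outgoing_.end()) {
            // Race: both sides are connecting to each other. Assumed
            // tie-break: the request from the higher address wins.
            if (remote < self_) {
                return reply::reject;  // our own outgoing request proceeds
            }
            outgoing_.erase(out);      // abandon our attempt, accept theirs
        }
        incoming_[remote] = next_id_++;
        return reply::accept;
    }

    // The remote side rejected our request: drop it from the outgoing map.
    void on_rejected(std::uint32_t remote) { outgoing_.erase(remote); }

    bool pending_outgoing(std::uint32_t remote) const { return outgoing_.count(remote) != 0; }
    bool pending_incoming(std::uint32_t remote) const { return incoming_.count(remote) != 0; }

private:
    std::uint32_t self_;
    std::size_t next_id_ = 0;  // hypothetical request identifier
    std::map<std::uint32_t, std::size_t> outgoing_;
    std::map<std::uint32_t, std::size_t> incoming_;
};
```

Because each side looks up the remote address in its own outgoing map, both nodes reach the same decision independently, and an aborted request can be removed with a single erase.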
…on used on 1 rank
- Fixing compiler errors in pinned_memory_vector: using pointer as iterator
- Fixing segfault at shutdown when not running with ibverbs
- Fixing double free of verbs event channel
- Fixing UB with rdma_event_type
… connection" This reverts commit 7cecab6.
This passed the checks. If it's ever going to be merged, now's the time! (I think most of the review comments were addressed.)
There are still some problems with multiple nodes connecting to each other, but I want to start the process of testing and getting ready for a merge into master.
This branch should fix #1401 and #840.
January 2017: I have updated the PR with new connection code and a number of other improvements. It should be good enough for production use now, at least on small numbers of nodes; it has been tested successfully on up to 128 nodes.
I will now address the original review comments and fix the style and inspect report failures.