Infiniband parcelport #2419

Merged
merged 177 commits into from Jan 17, 2017

@biddisco biddisco commented Dec 4, 2016

There are still some problems with multiple nodes connecting to each other, but I want to start testing and getting this ready for a merge into master.

This branch should fix #1401 and #840

January 2017: I have updated the PR with new connection code and a number of other improvements. It should be 'good enough' for production use now, at least on smallish numbers of nodes; it has been tested successfully up to 128.

I will now address the original review comments and fix the style/inspect report failures.

No actual parcel sending takes place yet.

Lots of debug material is present in here which will need to be cleaned up.
Static compilation of an external plugin requires certain
parameters to be passed back to the root CMake; this is a
work in progress and should not be used 'as is'.
Interface for rdmahelper memory_pool now compiles, and a
basic connection between two nodes works.
Add a simple class that wraps timers and helps break down time
spent in sections of code. This is intended for a quick look when
full profiling is not convenient.
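
As an illustration of the kind of helper described above, here is a minimal sketch of a section timer. It is not the class added in this PR: the names `section_timer`, `scope`, `seconds` and `calls` are hypothetical, and the sketch assumes single-threaded use.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not the PR's class): accumulates wall-clock time
// per named section of code. Not thread-safe.
class section_timer
{
public:
    using clock = std::chrono::steady_clock;

    // RAII guard: measures from construction to destruction and adds the
    // elapsed time to the owning timer under the given section name.
    class scope
    {
    public:
        scope(section_timer& owner, std::string name)
          : owner_(owner), name_(std::move(name)), start_(clock::now())
        {
        }

        ~scope()
        {
            owner_.totals_[name_] += clock::now() - start_;
            ++owner_.counts_[name_];
        }

        scope(scope const&) = delete;
        scope& operator=(scope const&) = delete;

    private:
        section_timer& owner_;
        std::string name_;
        clock::time_point start_;
    };

    // Total seconds spent in a section (0 if it was never entered).
    double seconds(std::string const& name) const
    {
        auto it = totals_.find(name);
        return it == totals_.end() ?
            0.0 : std::chrono::duration<double>(it->second).count();
    }

    // Number of times a section was entered.
    std::size_t calls(std::string const& name) const
    {
        auto it = counts_.find(name);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, clock::duration> totals_;
    std::map<std::string, std::size_t> counts_;
};
```

A caller would wrap a region in `{ section_timer::scope s(timer, "send"); ... }` and read `timer.seconds("send")` afterwards.
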
Encode the header in one registered chunk. The main serialization goes in the
next chunk if it fits in a standard chunk. Additional chunks are not sent, but
their memory handles are, and an RDMA get will be used on the remote end (not implemented).
…mory pool to keep regions alive

Parcelport working as long as all data is piggybacked.
zero_copy rdma get not yet implemented.
… container taking our pointer

In background_work always repeat event loop check until nothing is received
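
The loop described above could look roughly like the following sketch. `poll_once` is a hypothetical stand-in for one pass over the event/completion queues that returns the number of events it handled; the real `background_work` in the parcelport has a different signature.

```cpp
#include <cstddef>

// Hypothetical sketch: keep polling until a pass handles nothing, and
// report whether any work was done at all.
template <typename PollOnce>
bool background_work(PollOnce&& poll_once)
{
    bool did_work = false;
    while (poll_once() > 0)    // repeat the event loop check
        did_work = true;
    return did_work;
}
```
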
…m destination

Some changes to header and decode parcels to improve performance of the verbs
parcelport. Serialization chunks can store an rdma key and the decode parcels
function can be entered with the chunk info already supplied.
This prevents a zerocopy chunk being reused later with a size larger than
that allocated and causing rdma protection errors
Add callback handler to memory_pointer_wrapper and do not release blocks
that have been given to the decode function.

Lots of cleanups to the RDMA Get for zero_copy chunks.

Fix receive refill counter: zero-byte messages also use up a receive.
…o empty

When flushed messages are received, count down the preposted buffers until
zero, then set the client state to terminated so we can clean up all
clients when parcelport stop is called.
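
A minimal sketch of the countdown described above, assuming each flushed receive completion is reported to the client exactly once. The `client` class, its `draining` state and the method names are hypothetical, not the PR's types.

```cpp
#include <cstddef>

// Hypothetical states; only "terminated" is named in the commit message.
enum class client_state { connected, draining, terminated };

class client
{
public:
    // `preposted` is the number of receive buffers currently posted.
    explicit client(std::size_t preposted)
      : preposted_(preposted), state_(client_state::connected)
    {
    }

    // Called once per flushed receive completion.
    void on_flushed_receive()
    {
        state_ = client_state::draining;
        if (preposted_ > 0)
            --preposted_;
        // All preposted buffers are accounted for: safe to clean up.
        if (preposted_ == 0)
            state_ = client_state::terminated;
    }

    client_state state() const { return state_; }
    std::size_t preposted() const { return preposted_; }

private:
    std::size_t preposted_;
    client_state state_;
};
```

On stop, the parcelport would then only need to destroy clients whose state is `terminated`.
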
biddisco and others added 21 commits December 30, 2016 16:22
Shared Receive Queue (SRQ) simplifies handling of memory for receive
queues on many nodes and reduces the amount of polling required.

Redesign connection code to handle rejection/abort of connection
when a race between two nodes connecting to each other happens.
By maintaining two maps (one for outgoing, one for incoming) connection
requests, we can more easily check for and remove aborted connections.
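
The two-map idea can be sketched as follows. This is an illustration only: the `connection_tracker` class is hypothetical, peers are keyed by an IPv4 address stored as an integer, and the tie-break rule (the node with the lower address keeps its outgoing request) is an assumption, not something stated in this PR.

```cpp
#include <cstdint>
#include <map>

enum class decision { accept, reject };
enum class request_state { pending };

// Hypothetical sketch: tracks outgoing and incoming connection requests
// in separate maps so a simultaneous connect can be detected and resolved.
class connection_tracker
{
public:
    explicit connection_tracker(std::uint32_t local_ip)
      : local_ip_(local_ip)
    {
    }

    // We start connecting to `peer`: remember the outgoing request.
    void begin_outgoing(std::uint32_t peer)
    {
        outgoing_[peer] = request_state::pending;
    }

    // A connect request arrives from `peer`.
    decision on_incoming(std::uint32_t peer)
    {
        auto it = outgoing_.find(peer);
        if (it != outgoing_.end())
        {
            // Race: both sides are connecting to each other.
            // ASSUMED tie-break: the lower address keeps its request.
            if (local_ip_ < peer)
                return decision::reject;    // ours wins, refuse theirs
            outgoing_.erase(it);            // theirs wins, abort ours
        }
        incoming_[peer] = request_state::pending;
        return decision::accept;
    }

    bool outgoing_pending(std::uint32_t peer) const
    {
        return outgoing_.count(peer) != 0;
    }

    bool incoming_pending(std::uint32_t peer) const
    {
        return incoming_.count(peer) != 0;
    }

private:
    std::uint32_t local_ip_;
    std::map<std::uint32_t, request_state> outgoing_;
    std::map<std::uint32_t, request_state> incoming_;
};
```

Because both nodes apply the same rule with the roles swapped, exactly one of the two racing requests survives.
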

Flyby:
Lots of cleanup to debugging comments
Fix hostname reported by the PP
Remove qp from the locality type (it was unused)
Store connections by ip address in main PP code to avoid a second map lookup
  and modify new connection callback accordingly.
Fix some CMake option settings to use better defaults and clean up docs.
    - Fixing compiler errors in pinned_memory_vector:
        Using pointer as iterator
    - Fixing segfault at shutdown when not running with ibverbs
    - Fixing double free of verbs event channel
    - Fixing UB with rdma_event_type
@biddisco
Contributor Author

This passed the checks. If it's ever going to be merged - now's the time! (I think most of the review comments were addressed).

@hkaiser hkaiser merged commit 0cb47e2 into master Jan 17, 2017
@hkaiser hkaiser deleted the verbs_rebased branch January 17, 2017 17:31
Development

Successfully merging this pull request may close these issues.

Need an efficient infiniband parcelport