New Access Traits #74

Open
Nugine opened this issue Jun 3, 2022 · 5 comments

Comments

@Nugine
Contributor

Nugine commented Jun 3, 2022

@Nugine
Contributor Author

Nugine commented Jun 6, 2022

@GTwhy
Collaborator

GTwhy commented Jun 10, 2022

Hey, that's a good article and an ingenious design.
So the main idea is to use types to restrict access, transfer ownership of the buf to the ops, and return it after completion to ensure safety?
We define three access traits here because even a LocalMr needs an rkey when it is sent to the remote end as a RemoteMr. That might seem a little strange.

@Nugine
Contributor Author

Nugine commented Jun 10, 2022

> So the main idea is to use types to restrict access, transfer ownership of the buf to the ops, and return it after completion to ensure safety?

What gets transferred is an access value, not an ownership value. Any value that satisfies some of the access traits represents an access value.
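
A rough sketch of that distinction, in case it helps. Every name here (`ReadAccess`, `WriteAccess`, `Region`, `send`, `recv`) is hypothetical and is not the async-rdma API; the "wire" is just a `Vec<u8>` standing in for the network. The point is only that an operation takes an access value by value and hands the same value back on completion, and that a shared handle such as `Arc<Region>` can be an access value without being the owner.

```rust
use std::sync::Arc;

/// Hypothetical trait: the value grants read access to a byte range.
pub trait ReadAccess {
    fn bytes(&self) -> &[u8];
}

/// Hypothetical trait: the value additionally grants write access.
pub trait WriteAccess: ReadAccess {
    fn bytes_mut(&mut self) -> &mut [u8];
}

/// Toy stand-in for a registered memory region.
pub struct Region(pub Vec<u8>);

impl ReadAccess for Region {
    fn bytes(&self) -> &[u8] {
        &self.0
    }
}

impl WriteAccess for Region {
    fn bytes_mut(&mut self) -> &mut [u8] {
        &mut self.0
    }
}

/// A shared handle is a read access value, but not an ownership value.
impl<T: ReadAccess> ReadAccess for Arc<T> {
    fn bytes(&self) -> &[u8] {
        (**self).bytes()
    }
}

/// Stand-in for a send operation: it holds the access value while it runs
/// and returns it to the caller together with the number of bytes sent.
pub fn send<A: ReadAccess>(access: A, wire: &mut Vec<u8>) -> (usize, A) {
    let n = access.bytes().len();
    wire.extend_from_slice(access.bytes());
    (n, access)
}

/// Stand-in for a receive operation: it needs write access to the target.
pub fn recv<A: WriteAccess>(mut access: A, wire: &[u8]) -> (usize, A) {
    let dst = access.bytes_mut();
    let n = dst.len().min(wire.len());
    dst[..n].copy_from_slice(&wire[..n]);
    (n, access)
}
```

With this shape, `send(Arc::new(Region(..)), &mut wire)` compiles because `Arc<Region>` is readable, while `recv(Arc::new(Region(..)), &wire)` does not, because a shared handle gives no write access.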

> We define three access traits here because even a LocalMr needs an rkey when it is sent to the remote end as a RemoteMr. That might seem a little strange.

We send a remote access value to the remote peer. A memory region can produce either multiple read access values or a single write access value.
For example, the RDMA agent keeps a memory region alive and sends a "remote readable token" to the remote peer. The region is then not writable locally until the agent confirms that the remote token has been destroyed.
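
The "multiple readers or a single writer" rule can be sketched roughly as below. This is a single-process model with hypothetical names (`AccessState`, `ReadToken`, `WriteToken`, `try_read`, `try_write`), not the async-rdma implementation. In the real setting the read token lives on another machine, so the `Drop` of a `ReadToken` here stands in for whatever confirmation tells the agent that the remote token is gone.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical bookkeeping the agent keeps for one memory region.
#[derive(Default)]
pub struct AccessState {
    readers: Cell<usize>,
    writer: Cell<bool>,
}

/// Read access value; any number may exist at once.
pub struct ReadToken(Rc<AccessState>);

/// Write access value; at most one may exist, and only with no readers.
pub struct WriteToken(Rc<AccessState>);

/// Hands out a read token unless a write token is outstanding.
pub fn try_read(state: &Rc<AccessState>) -> Option<ReadToken> {
    if state.writer.get() {
        return None;
    }
    state.readers.set(state.readers.get() + 1);
    Some(ReadToken(Rc::clone(state)))
}

/// Hands out the write token only when no token of any kind is outstanding.
pub fn try_write(state: &Rc<AccessState>) -> Option<WriteToken> {
    if state.writer.get() || state.readers.get() > 0 {
        return None;
    }
    state.writer.set(true);
    Some(WriteToken(Rc::clone(state)))
}

impl Drop for ReadToken {
    // Models "the agent confirms that the token is destroyed".
    fn drop(&mut self) {
        self.0.readers.set(self.0.readers.get() - 1);
    }
}

impl Drop for WriteToken {
    fn drop(&mut self) {
        self.0.writer.set(false);
    }
}
```

While any `ReadToken` is alive, `try_write` returns `None`, which is the "not writable locally" state described above.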

@Nugine
Contributor Author

Nugine commented Jun 10, 2022

I just realized that a timeout enforced from a single side is unsound. We cannot be sure that the remote access value has been destroyed unless the remote side explicitly notifies the local side.

@GTwhy
Collaborator

GTwhy commented Jun 10, 2022

> I just realized that a timeout enforced from a single side is unsound. We cannot be sure that the remote access value has been destroyed unless the remote side explicitly notifies the local side.

Yes, the timeout mechanism is unsound.
When using je, a timed-out LocalMr is part of a RawMr, which will not be deregistered immediately.
So the remote end can still access the timed-out remote MR if its timer is wrong. The software cannot detect that, and the NIC will not stop it.
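
One way to picture the conclusion, as a sketch only: the local side tracks a grant handed to the remote peer, and a local timer firing does not by itself make the region writable again. The type `RemoteGrant` and its methods are hypothetical and do not exist in async-rdma; the release notice is assumed to be some explicit message from the remote side.

```rust
/// Hypothetical local view of a read grant handed to a remote peer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoteGrant {
    /// The peer may read the region.
    Active,
    /// The local timer fired, but the peer has not confirmed anything.
    Expired,
    /// The peer explicitly reported that its token is destroyed.
    Released,
}

impl RemoteGrant {
    /// A local timeout only records that the grant is overdue.
    pub fn on_local_timeout(self) -> Self {
        match self {
            RemoteGrant::Active => RemoteGrant::Expired,
            other => other,
        }
    }

    /// An explicit notice from the remote side ends the grant.
    pub fn on_release_notice(self) -> Self {
        RemoteGrant::Released
    }

    /// Local writes are safe only after the explicit release, because an
    /// expired grant may still be used by a peer whose timer is wrong.
    pub fn local_write_allowed(self) -> bool {
        self == RemoteGrant::Released
    }
}
```

In this model `Expired` is deliberately treated the same as `Active` for write purposes; only `Released` unlocks the region.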
