-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
READY: Latest TrustChain block via DHT #349
Conversation
2e60cd2
to
917e854
Compare
I have a couple of questions regarding an aspect of the implementation. In order to have the latest TC block published to the DHT, as soon as it gets added to the database, I added a callback named
|
@DanGraur You are right that I would try to avoid adding another type of callback if possible and use |
I would also vote for making use of |
Ok, then I'll make the required changes to make use of the |
Ok, I removed the explicit callback and instead added a wildcard to the |
In this latest commit, I enhanced the latest TC block via DHT publication mechanism, such that a block which is already in the DHT will not be published again under a different (embedded) version. This will increase the cost of adding a block to the DHT (due to the extra overhead of checking for a duplicate block), but will decrease the load on the DHT. |
I've added the blank line at the end of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, make sure to still rebase and fix my two comments.
Also, make sure python create_test_coverage_report.py
still works.
f7d26d2
to
2c3684b
Compare
Added the extra blank line at the end of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
Should there be an additional check that a peer can only publish its latest TC block only under its own public key and not on somebody else key? Similarly, validation of the received blocks under the DHT key. Other than that, seems good. |
Sorry for the late response. @xoriole I think adding the validation should be easy enough, but for ensuring that a peer will only published by a node under its own PK, I'm not particularly sure what strategy to employ. Also, it should be mentioned that currently, the TrustChain block is published under the peer's |
@DanGraur Some comments regarding the DHT. The DHT uses 20-byte keys, so I'd definitely stick to using mid's. Currently, any peer can store information under whatever key they want. I'm not sure about changing this since other features rely on the DHT as well. The DHT does offer the option to sign values before storing them in the DHT. When this feature is used, and the signed value is stored under the mid of the signing peer, it will get a higher priority when returning to value (i.e., it will always be at the top of the results list). However, using this feature only one signed value can be stored under a particular key at any given time. Since you're splitting the block over multiple DHT entries, this feature can't be used. Is there a reason to prefer storing the entire block instead of, for instance, just the SHA-1? |
I think it is still possible to make use of the PK, since the code I've written produces an SHA-1 hash of the raw key (which was previously the I've implemented the code to publish the entire block (in chunks), since that was my initial understanding of the task, i.e. the requesting peer wants to get the block itself and reconstruct it via the DHT. I think a final goal of this task (not of this particular PR, though), is to make it possible to get the entire TrustChain via the DHT, although this, in my opinion, will impose quite a lot of strain on the DHT, due to the possibly large size of the TrustChain. I think, technically, my implementation can to a certain extent do that (or at least get the last few published blocks, which haven't been discarded by the DHT due to staleness). That being said some more code needs to be written to get the last few blocks, and not just the last one. As far as the mechanism for signing multiple published blocks goes, I've started writing (and I think I'm nearly done) some code which signs the block's chunks using the publishing node's private key. The retrieving node, will then use the public key of the publishing node, to ensure the chunks are faithful. This is why I'm starting to think that the public key might also be a good idea for both publishing and getting the block from the DHT, since it's use is two fold: for computing the key, and validating the block's chunks. Also, this validation mechanism also solves the issue of ensuring the block is indeed published by the true node, since the true node will be the only one with that particular SK. What might be an issue is that this won't stop a malicious node from still publishing fake TrustChain blocks under some other peer's public key. These blocks won't be valid, but they will incur some (spatial) strain on the DHT. This is however a general problem, I think, since it is possible to currently publish data under any key, without many restrictions. Perhaps a future task would be to allow a node to publish data to the DHT under a specific key, which will then be 'restricted' just to it, i.e. other nodes can get data under that key, but not publish under it. This solution could be general though, and not just for this use case. |
I didn't realize by PK you meant the hash of the public key. You're right, you can use any 20 byte key you'd like. Publishing the latest block on the DHT is likely to already introduce a significant train on the DHT, since the latest block will change frequently. Having said that, the mere idea of having the complete chain on the DHT terrifies me. I agree, signing the block-pieces yourself is the best way to do this at the moment. Your idea of restricting write access to certain keys has a similar effect to that of the signed-value feature already in place (i.e., you restrict write access to the first result). |
@egbertbouman as you are the author of the DHT community, does this PR need any changes before merging? |
I still have to push a new commit which adds validation to the chunks of the block being published to the DHT. Also, I wanted to mention this earlier, but I added a method in the test for this PR which expands the I still have to fix another test, since the changes I made by adding verification, and using the PK instead of the mid meant that some of the tests would break. I'll push this commit later tonight hopefully. |
Everything looks good! One thing that maybe we could add to the DHT in the future is the ability to store multiple values at once. This would avoid superfluous find/store-requests, and probably also stop the need to change the |
Sorry for the long delay. I pushed a new commit which does the following:
|
@devos50 could you review this? (Specifically the TrustChain changes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be sure to fix the merge conflict still.
@DanGraur I just noticed that you're not doing DHT requests, but you're just looking at the local storage: py-ipv8/ipv8/REST/dht_endpoint.py Line 154 in c52cbea
Assuming that this is unintentional, you should be using find_values. See SpecificDHTValueEndpoint for an example of how to use that function. |
…he DHT mechanism. Added a wildcard listener in the listener_map (of the TrustChainCommunity class) whose associated listeners are called regardless of block type. This wildcard is used to publish newly added to the TrustChainDB blocks to the DHT. The callback for this mechanism also defines behavior which disallows the same block from being published twice to the DHT. It is now possible to retrieve the latest TrustChain block of some peer via a HTTP request. Created a new test set for the newly added block retrieval mechanism. In addition to this, made a few changes to the REST API test suite in terms of the communication modules, such as moving classes from one module to another and creating new modules which host logic for HTTP requests and communication for specific endpoints.
…he node's secret key. In addition to this, this signature will be used on the receiving end for verifying the faithfulness of the chunks and, implicitly, of the block. Chunks will also be published under a new type of payload class, called DHTBlockPayload. The public key will be used (together with a special suffix) for publishing a block to the DHT (it should be mentioned that the raw key will be hashed before publishing).
2b998c6
to
9ca7075
Compare
@egbertbouman thank you very much for noticing this! It was indeed a mistake. I fixed it now, and replaced it with |
…BlockPayload. Removed the dht.storage.get and replaced them with dht.find_values calls, which are handled asynchronously. Finally, made a few changes and additions to the test_dht_endpoint module.
Do you have updated Gumby graphs for this new |
As far as the Gumby experiments which I wrote and forwarded as PRs here and here, both of these were using the |
@DanGraur that's fine then, if only this PR had the |
Ok, I'll leave it as it is then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devos50 since this affects the TrustChain, I'll wait for your formal approval before merging this.
retest this please |
Fixes #369
Added a new endpoint for retrieving the latest TrustChain block via the DHT mechanism. Added a new callback in the TrustChainCommunity class which is called whenever a block is being added in the DB, irrelevant of the block's type. It is now possible to retrieve the latest TrustChain block of some peer via HTTP requests. Created a new test set for the newly added block retrieval mechanism. In addition to this, made a few changes to the REST API test suite in terms of the communication modules, such as moving classes from one module to another and creating new modules which host logic for HTTP requests and communication for specific endpoints.