Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prefix lookup implementation #2

Open
wants to merge 41 commits into
base: noot/double-hash
Choose a base branch
from
Open

Conversation

noot
Copy link

@noot noot commented Sep 13, 2022

  • add prefixLength as a config option to the DHT
  • lookups: if prefixLength is set, only lookup prefix of hash(CID)
  • queried nodes: if the query key is a prefix (ie. <32 bytes), look up and return providers via prefix
  • also return the full key the provider providers, so the lookup node knows what peers to discard
  • had to modify the Message_Peer message type to include the key the peer provides (maybe there's a nicer way to do this?)

@noot noot changed the title [wip] prefix lookup implementation prefix lookup implementation Sep 20, 2022
@noot noot marked this pull request as ready for review September 20, 2022 10:07
Copy link

@guillaumemichel guillaumemichel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@@ -317,6 +361,7 @@ func (pm *ProviderManager) getProviderSetForKey(ctx context.Context, k []byte) (
}

if len(pset.providers) > 0 {
// TODO: if this is a prefix, do we want to cache it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add only the target CID to the cache (if found), and not all content matching the prefix?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I can change this to only cache full CIDs!

@@ -296,6 +319,27 @@ func (pm *ProviderManager) GetProviders(ctx context.Context, k []byte) ([]peer.A
}
}

// GetProvidersForPrefix returns the set of providers with the given prefix, as well
// as the full key they provide.
func (pm *ProviderManager) GetProvidersForPrefix(ctx context.Context, k []byte) (map[peer.ID][][]byte, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we return map[peer.ID][][]byte here? Wouldn't it be better to return map[[]byte]peer.ID? In the end we only care about one provider record associated with the CID. If we have a map[peer.ID][][]byte we need to iterate on the full map.
How does it integrate with provider record decryption?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the map value is what keys with the prefix the peer has. it's possible the peer has multiple keys with the same prefix, so that's why it's [][]byte.

for provider record decryption, the peer.ID is actually encrypted.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when requesting a specific key, we expect Pr[many peerID for one CID matching prefix] >> Pr[many CIDs matching prefix provided by the same peerID] as if a provider provides multiples CIDs, they will be stored at random locations in the DHT, hence we don't expect to have multiple hits for the same content provider for a single request (targeting a specific location).

I think that it would make more sense to have a map: second hash (dht identifier) -> list of encrypted peer.ID

What do you think?

Copy link
Author

@noot noot Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I guess even though that case is unlikely, it still could potentially happen (especially if the prefix length chosen isn't that short). =

I think a map of double hash -> encrypted peer IDs would also work, I think the reason I didn't do that is due to the format of the Message_Peer type and how the pb messages are sent generally - they are sent by peer, not by provided data (since before there were no prefixes). let me revisit this and try to find a better solution!

update: I've updated the code to now use map[string][]*peer.AddrInfo for prefix lookups, where the map key is the full lookup key. I think it's cleaner now, let me know what you think!

dht.go Outdated Show resolved Hide resolved
handlers.go Outdated Show resolved Hide resolved
@guillaumemichel
Copy link

Would it make things easier to add a Prefix type? It may be easier to manage prefix length and could be more efficient than bit strings in the map[string][]*peer.AddrInfo

@@ -419,6 +431,21 @@ func makeRoutingTable(dht *IpfsDHT, cfg dhtcfg.Config, maxLastSuccessfulOutbound
return rt, err
}

// SetPrefixLength sets the prefix length for DHT provider lookups.
func (dht *IpfsDHT) SetPrefixLength(prefixLength int) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SetPrefixLength called somewhere in go-libp2p-kad-dht? Or is this function meant to be called by kubo or go-libp2p to configure the DHT settings?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's meant to be called externally. I needed this for testing, but thought it could be useful for users of the library also, since the only other way to set the prefix length is at DHT initialization.

@@ -97,7 +97,7 @@ func (m *messageSenderImpl) SendRequest(ctx context.Context, p peer.ID, pmes *pb

stats.Record(ctx,
metrics.SentRequests.M(1),
metrics.SentBytes.M(int64(pmes.Size())),
metrics.SentBytes.M(int64(pmes.XXX_Size())),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the XXX stand for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, it's auto-generated from protobuf, haha

set map[peer.ID]time.Time
providers []peer.ID
set map[peer.ID]time.Time
keyToProviders map[string][]peer.ID // TODO change to map of maps, otherwise there can be duplicate peer IDs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would indeed make sense to have providers as a map, and keyToProviders as a map of maps 👍🏻

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually now that I thought about it, I think these are okay as slices because there can't be duplicate provider records in the datastore, since the DB key is the provider ID + provided CID essentially. so when these slices are written to while the database is being iterated over, it shouldn't be possible to have duplicate peer IDs.

Type: typ,
Key: &Message_Key{
Key: key,
PrefixBitLength: uint32(prefixBitLength),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint16 should be enough, but I am totally unfamiliar with how pb works and whether the type can actually be changed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no uint16 type in protobuf sadly :( uint32 is the smallest

@@ -197,6 +203,12 @@ func New(ctx context.Context, h host.Host, options ...Option) (*IpfsDHT, error)
return nil, err
}

dht.prefixLength = cfg.PrefixLookupLength
if dht.prefixLength >= 256 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question: Is the prefix length set globally for the kubo node, or can it be changed for each request (e.g with Options, or argument)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be changed for each request by calling SetPrefixLength right before the lookup. Another option I just thought of (that might be nicer API-wise?) would be to create an option specifically for the FindProviders function that the caller can optionally pass in. what do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be awesome if the prefix can be set on a per request basis, with a global default value (either static e.g taken from the conf, or adaptive) for if the option isn't specifically set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants