I'm storing keywords and documents in the DHT, so if you search for "cancer software", it will first retrieve the key "cancer", then the key "software" from the network. These keys will contain document id arrays, e.g.:
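For illustration, the stored values might look like this (the first two ids are the ones used in the intersection example below; the third is invented here to show a document that only matches one keyword):

```javascript
// Hypothetical contents of the two keyword keys. Each key maps to an
// array of document ids; the third id under "cancer" is made up to
// show an entry that won't survive the intersection.
var keywordKeys = {
  "cancer":   ["10.1039/cancer.research.1", "10.1039/cancer.research.2", "10.1039/unrelated.paper.9"],
  "software": ["10.1039/cancer.research.1", "10.1039/cancer.research.2"]
};
```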
It then performs an intersection of these keys, and retrieves the documents from the DHT network. So for the above example, roughly:
```javascript
// Document ids present in both keyword arrays:
intersection = ["10.1039/cancer.research.1", "10.1039/cancer.research.2"]

// Fetch each matching document from the DHT:
_.each(intersection, function (docId) { chord.retrieve(docId) })
```
So for this search, 4 requests are made to the DHT: cancer, software, 10.1039/cancer.research.1 and 10.1039/cancer.research.2. (In effect, there are even more requests, because each keyword is queried once per document field, so title, abstract, authors, journal, etc., using keys of the form "[fieldname]keyword". With 5 fields per document, that's 10 requests for just two keywords, plus 2 more to get the actual documents.)
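The per-field fan-out can be sketched like this (the field names come from the list above, with a fifth field invented to make the count work; I'm also assuming the "[fieldname]keyword" key is plain concatenation):

```javascript
var fields = ["title", "abstract", "authors", "journal", "fulltext"]; // fifth field is a guess
var keywords = ["cancer", "software"];

// One DHT key per (field, keyword) pair, e.g. "titlecancer".
var keys = [];
fields.forEach(function (field) {
  keywords.forEach(function (kw) {
    keys.push(field + kw);
  });
});

console.log(keys.length); // 10 keyword lookups, before the 2 document fetches
```

Each of those keys is an independent DHT request that can fail, which is why the failure rate compounds so quickly.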
If any of these requests fails, the whole search fails. I cache the document id lookups, since those are static, but even so the failure rate for searches is quite high.
I guess document lookup could fall back to dx.doi.org when the id is a DOI (it isn't in all cases), but even so, there should be a way to make this more resilient: either by failing partially in a controlled way, or by contacting replicas for keywords whose primary node can't be reached.
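One way the partial-failure idea could look, as a sketch: issue all keyword lookups, drop the ones that fail, and intersect whatever came back. Here `lookup` is a stand-in for the real retrieve call, not the actual webrtc-chord API:

```javascript
// Look up every keyword, tolerate individual failures, and intersect
// the document-id arrays that did arrive. Only fail outright when no
// keyword lookup succeeded at all.
function resilientSearch(keywords, lookup) {
  return Promise.allSettled(keywords.map(lookup)).then(function (results) {
    var hits = results
      .filter(function (r) { return r.status === "fulfilled"; })
      .map(function (r) { return r.value; });
    if (hits.length === 0) throw new Error("all keyword lookups failed");
    // Intersect the arrays that came back.
    return hits.reduce(function (acc, ids) {
      return acc.filter(function (id) { return ids.indexOf(id) !== -1; });
    });
  });
}
```

Note that a partial result is always a superset of the true intersection (a missing keyword can only fail to narrow it), so the UI could show it flagged as approximate instead of showing nothing.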
Related to this issue: tsujio/webrtc-chord#5