[CLOSED] Similarity matrix #1

Closed
hardbyte opened this issue May 30, 2017 · 0 comments

@hardbyte
Collaborator

Issue by tho802
Monday Apr 11, 2016 at 16:48 GMT
Originally opened as https://github.csiro.au/magic/AnonymousLinking/issues/1


When determining Bloom filter similarity, the score is computed in C for all N×M combinations, but only the single best match for each filter is returned. This limits the next step, which solves pairwise matches for the entire network at once, because it only ever sees one candidate per filter.

Currently, each returned tuple contains:

  • the index in filters1
  • the similarity score, between 0 and 1, of the best match
  • the original index in entity A
  • the original index in entity B
  • the index in filters2 of the best match
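
To make the current behaviour concrete, here is a minimal pure-Python sketch of a best-match search that returns the five fields listed above. It is not the project's C implementation. The names `BestMatch`, `dice` and `best_matches` are hypothetical, Bloom filters are represented as plain Python integers paired with their original index, and the Dice coefficient is assumed as the similarity score.

```python
from typing import List, NamedTuple, Tuple


class BestMatch(NamedTuple):
    """Hypothetical container mirroring the five fields of the current tuple."""
    index_a: int      # index in filters1
    score: float      # similarity score in [0, 1] of the best match
    original_a: int   # original index in entity A
    original_b: int   # original index in entity B
    index_b: int      # index in filters2 of the best match


def dice(a: int, b: int) -> float:
    """Dice coefficient of two bit patterns (assumed similarity measure)."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2.0 * bin(a & b).count("1") / total if total else 0.0


def best_matches(
    filters1: List[Tuple[int, int]],
    filters2: List[Tuple[int, int]],
) -> List[BestMatch]:
    """Score every pair, but keep only the best filters2 match per filters1 entry.

    Each filter is a (bits, original_index) pair; filters2 must be non-empty.
    """
    results = []
    for i, (bits_a, orig_a) in enumerate(filters1):
        scores = [dice(bits_a, bits_b) for bits_b, _ in filters2]
        j = max(range(len(scores)), key=scores.__getitem__)
        results.append(BestMatch(i, scores[j], orig_a, filters2[j][1], j))
    return results


filters1 = [(0b1111, 10), (0b0011, 11)]
filters2 = [(0b1100, 20), (0b1111, 21)]
matches = best_matches(filters1, filters2)
```

In this toy input both entries of `filters1` pick the same entry of `filters2` as their best match, and the alternatives have already been discarded by the time a later step tries to resolve the conflict.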

Instead of computing a single tuple for each record in entity A, we could explore the memory/accuracy trade-off of computing a full similarity matrix that records the n-gram similarity score between every pair.
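
The proposal could be sketched as follows, again in plain Python rather than C. The function names `similarity_matrix` and `matrix_bytes` are hypothetical, filters are assumed to be Python integers, the Dice coefficient is assumed as the score, and 8 bytes per score (a 64-bit float) is an assumption used only to illustrate the memory side of the trade-off.

```python
from typing import List


def dice(a: int, b: int) -> float:
    """Dice coefficient of two bit patterns (assumed similarity measure)."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2.0 * bin(a & b).count("1") / total if total else 0.0


def similarity_matrix(filters1: List[int], filters2: List[int]) -> List[List[float]]:
    """Return an N x M matrix with the score of every (filters1, filters2) pair."""
    return [[dice(a, b) for b in filters2] for a in filters1]


def matrix_bytes(n: int, m: int, bytes_per_score: int = 8) -> int:
    """Rough size of a dense N x M matrix, assuming a fixed width per score."""
    return n * m * bytes_per_score


matrix = similarity_matrix([0b1111, 0b0011], [0b1100, 0b1111])
# Row i holds the scores of filters1[i] against every filter in filters2,
# so a later matching step can weigh all candidates instead of one.

dense_size = matrix_bytes(100_000, 100_000)  # 80_000_000_000 bytes
```

A dense matrix keeps every candidate for the network-wide solver, but its size grows with N×M: under the 8-byte assumption, two datasets of 100,000 records each would need about 80 GB.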
