[CLOSED] Similarity matrix #1

hardbyte · 2017-05-30T01:41:24Z

Issue by tho802
Monday Apr 11, 2016 at 16:48 GMT
Originally opened as https://github.csiro.au/magic/AnonymousLinking/issues/1

When determining Bloom filter similarity, the score is computed in C for all N×M combinations, but only the single best match is returned for each entry. This limits the next step, which solves pairwise matches for the entire network at once.

Currently, the structure of each returned tuple is:

The index in filters1
The similarity score (between 0 and 1) of the best match
The original index in entity A
The original index in entity B
The index in filters2 of the best match
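
For illustration, the five-field tuple described above might be modelled as follows. This is only a sketch, not the project's C implementation: the names `BestMatch`, `dice`, and `best_matches` are hypothetical, each filter is assumed to be a Python integer used as a bit set paired with its original index, and the Dice coefficient is assumed as the similarity score because the issue does not name the measure.

```python
from typing import List, NamedTuple, Tuple


class BestMatch(NamedTuple):
    """Hypothetical record mirroring the five fields listed in the issue."""
    index_in_filters1: int
    score: float
    original_index_a: int
    original_index_b: int
    index_in_filters2: int


def dice(a: int, b: int) -> float:
    """Dice coefficient of two bit sets stored as ints (assumed measure)."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 0.0
    return 2.0 * bin(a & b).count("1") / total


def best_matches(
    filters1: List[Tuple[int, int]],
    filters2: List[Tuple[int, int]],
) -> List[BestMatch]:
    """Score all N x M pairs but keep only the best match per filters1 entry.

    Each input item is a (bits, original_index) pair; filters2 is assumed
    to be non-empty.
    """
    results = []
    for i, (bits_a, orig_a) in enumerate(filters1):
        best_j, best_score = max(
            ((j, dice(bits_a, bits_b)) for j, (bits_b, _) in enumerate(filters2)),
            key=lambda pair: pair[1],
        )
        results.append(
            BestMatch(i, best_score, orig_a, filters2[best_j][1], best_j)
        )
    return results
```

Every score other than the winner is discarded inside the loop, which is the limitation the issue describes.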

Instead of computing a single tuple for each entry in entity A, we could explore the memory/accuracy trade-off of computing a full similarity matrix that records the n-gram similarity score between every pair.
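
A minimal sketch of the proposed alternative, under the same assumptions (filters as Python integers used as bit sets, Dice coefficient as the score). The function names `similarity_matrix` and `matrix_bytes` are hypothetical, and the 8 bytes per score is an assumed dense double-precision layout rather than anything stated in the issue.

```python
from typing import List


def dice(a: int, b: int) -> float:
    """Dice coefficient of two bit sets stored as ints (assumed measure)."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 0.0
    return 2.0 * bin(a & b).count("1") / total


def similarity_matrix(filters_a: List[int], filters_b: List[int]) -> List[List[float]]:
    """Return an N x M matrix where entry [i][j] scores filters_a[i] vs filters_b[j]."""
    return [[dice(a, b) for b in filters_b] for a in filters_a]


def matrix_bytes(n: int, m: int, bytes_per_score: int = 8) -> int:
    """Rough memory estimate for a dense N x M matrix of scores."""
    return n * m * bytes_per_score
```

With the full matrix, runner-up candidates remain available to the network-wide matching step. The cost is memory: under the assumed dense layout, `matrix_bytes(100_000, 100_000)` is 80,000,000,000 bytes, so a threshold or a top-k cut per row may be needed in practice.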

hardbyte added the enhancement label May 30, 2017

hardbyte mentioned this issue May 30, 2017

[CLOSED] Feature similarity matrix #4

Closed

hardbyte closed this as completed May 30, 2017
