This repository has been archived by the owner on Dec 20, 2019. It is now read-only.

Similarity measures

Daniele Guido edited this page Feb 19, 2016 · 1 revision

(wip)

Queries

Below are some queries that help histograph identify clone candidates:

// name: get_clones
// get two people having same specificity, sorted by jaccard and union
MATCH (p1)-[r:appear_in_same_document]-(p2)
WHERE id(p1) < id(p2) AND r.union > 2 AND r.intersections > 2 AND p2.specificity = p1.specificity
WITH p1, p2, r
RETURN p1.name, p2.name, p1.specificity, p2.specificity, r.jaccard, r.union, r.intersections
ORDER BY r.jaccard DESC, r.union DESC
LIMIT 500
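
// name: get_top_similar_entity
// get the entity most similar to the one whose node id is passed as {id}, sorted by jaccard and union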
MATCH (p1)-[r:appear_in_same_document]-(p2)
WHERE id(p1) = {id}
WITH p1, p2, r
RETURN p1.name, p2.name, p2.wiki_id, p1.specificity, p2.specificity, r.jaccard, r.union, r.intersections, r.difference
ORDER BY r.jaccard DESC, r.union DESC
LIMIT 1
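
The queries above read `r.jaccard`, `r.union` and `r.intersections` from the `appear_in_same_document` relationship but do not show how those values are produced. The sketch below is one plausible reading, not histograph's actual code: it assumes each entity is described by the set of ids of the documents it appears in, that `intersections` and `union` are the sizes of the set intersection and union, and that `jaccard` is their ratio. The function names, the `entity_docs` input and the `min_size` parameter are hypothetical. The specificity check from `get_clones` is left out.

```python
from itertools import combinations


def cooccurrence_measures(docs_a, docs_b):
    """Compute set-based measures for two entities from their document ids.

    Assumption: this mirrors the r.intersections, r.union and r.jaccard
    properties used in the queries; the real computation is not shown
    on this page.
    """
    a, b = set(docs_a), set(docs_b)
    intersections = len(a & b)  # documents mentioning both entities
    union = len(a | b)          # documents mentioning at least one of them
    jaccard = intersections / union if union else 0.0
    return {"intersections": intersections, "union": union, "jaccard": jaccard}


def rank_candidates(entity_docs, min_size=2):
    """Rank entity pairs roughly the way get_clones does.

    Keeps pairs whose union and intersections both exceed min_size, then
    sorts by jaccard and union, both descending (hypothetical helper).
    """
    rows = []
    for n1, n2 in combinations(sorted(entity_docs), 2):
        m = cooccurrence_measures(entity_docs[n1], entity_docs[n2])
        if m["union"] > min_size and m["intersections"] > min_size:
            rows.append((n1, n2, m["jaccard"], m["union"], m["intersections"]))
    rows.sort(key=lambda row: (-row[2], -row[3]))
    return rows


if __name__ == "__main__":
    # Toy data: entity name -> ids of the documents it appears in.
    entity_docs = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 3, 4}}
    for row in rank_candidates(entity_docs):
        print(row)
```

In the toy data, "A" and "C" appear in exactly the same documents, so that pair ranks first, which is the kind of pair `get_clones` is meant to surface.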