Updated docs to show output of match method
Eric van Zanten committed Jul 15, 2014
1 parent 1234ca8 commit 97b4f94
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions docs/common_dedupe_methods.rst
@@ -20,9 +20,10 @@
 .. py:method:: match(data, [threshold = 0.5])
-Identifies records that all refer to the same entity, returns tuples of
-record ids, where the record_ids within each tuple should refer to the
-same entity
+Identifies records that all refer to the same entity, returns tuples
+containing a set of record ids and a confidence score as a float between 0
+and 1. The record_ids within each set should refer to the
+same entity and the confidence score is a cophenetic distance of the cluser.

 This method should only used for small to moderately sized datasets for
 larger data, use matchBlocks
@@ -42,7 +43,7 @@
 > duplicates = deduper.match(data, threshold=0.5)
 > print duplicates
-[(3,6,7), (2,10), ..., (11,14)]
+[(set([3,6,7]), 0.96778509), (set([2,10]),0.750963245) ..., (set([11,14]),0.1256734)]
 .. py:method:: blocker(data)
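For readers adapting code to the new return shape, here is a minimal sketch of consuming a list of (set of record ids, confidence score) tuples. It does not call the library: the sample values are copied from the example output in the diff with the elided entries dropped, and the helper names `confident_clusters` and `cluster_membership` are hypothetical, introduced only for illustration.

```python
# Sample shaped like the documented return value of match():
# a list of (set of record ids, confidence score) tuples.
# Values are taken from the example output above, minus the elided entries.
duplicates = [
    ({3, 6, 7}, 0.96778509),
    ({2, 10}, 0.750963245),
    ({11, 14}, 0.1256734),
]


def confident_clusters(clusters, min_score):
    """Keep clusters whose score is at least min_score (hypothetical helper)."""
    return [(ids, score) for ids, score in clusters if score >= min_score]


def cluster_membership(clusters):
    """Map each record id to the index of its cluster (hypothetical helper)."""
    membership = {}
    for cluster_index, (ids, _score) in enumerate(clusters):
        for record_id in ids:
            membership[record_id] = cluster_index
    return membership


kept = confident_clusters(duplicates, min_score=0.5)
membership = cluster_membership(kept)
```

Code written against the old output, which iterated over plain tuples of record ids, would need to unpack each item into its id set and score as shown here.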

1 comment on commit 97b4f94

@fgregg (Contributor) commented on 97b4f94 Jul 16, 2014

Thanks! Also need this for the other classes. RecordLink and Gazetteer
