Library Documentation

`Dedupe` Objects

dedupe.Dedupe

# initialize from a defined set of fields
variables = [
    {'field' : 'Site name', 'type': 'String'},
    {'field' : 'Address', 'type': 'String'},
    {'field' : 'Zip', 'type': 'String', 'has missing':True},
    {'field' : 'Phone', 'type': 'String', 'has missing':True},
]
deduper = dedupe.Dedupe(variables)
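The records handed to the training and matching methods are typically held in a dictionary that maps a record ID to a dictionary of field values, with missing values given as `None`. As a minimal sketch of preparing input for the fields above, the helper `clean_records` and the `Id` column name below are hypothetical and not part of dedupe:

```python
FIELDS = ["Site name", "Address", "Zip", "Phone"]


def clean_records(rows, id_field="Id"):
    """Build {record_id: record} with blank values replaced by None.

    `rows` is any iterable of dicts (for example from csv.DictReader).
    `id_field` is an assumed name for the column holding a unique ID.
    """
    data = {}
    for row in rows:
        record = {}
        for field in FIELDS:
            # Treat absent keys, None, and whitespace-only strings as missing.
            value = (row.get(field) or "").strip()
            record[field] = value if value else None
        data[row[id_field]] = record
    return data


rows = [
    {"Id": "1", "Site name": " Acme Day Care ", "Address": "12 Main St",
     "Zip": "60601", "Phone": ""},
    {"Id": "2", "Site name": "Acme Daycare", "Address": "12 Main Street",
     "Zip": "", "Phone": "555-0100"},
]
data = clean_records(rows)
```

Fields declared with `'has missing': True` (here `Zip` and `Phone`) are the ones where `None` values are expected.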

prepare_training

uncertain_pairs

mark_pairs

train

write_training

write_settings

cleanup_training

partition

`StaticDedupe` Objects

dedupe.StaticDedupe

with open('learned_settings', 'rb') as f:
    matcher = dedupe.StaticDedupe(f)

partition

`RecordLink` Objects

dedupe.RecordLink

# initialize from a defined set of fields
variables = [
    {'field' : 'Site name', 'type': 'String'},
    {'field' : 'Address', 'type': 'String'},
    {'field' : 'Zip', 'type': 'String', 'has missing':True},
    {'field' : 'Phone', 'type': 'String', 'has missing':True},
]
linker = dedupe.RecordLink(variables)

prepare_training

uncertain_pairs

mark_pairs

train

write_training

write_settings

cleanup_training

join

`StaticRecordLink` Objects

dedupe.StaticRecordLink

with open('learned_settings', 'rb') as f:
    matcher = dedupe.StaticRecordLink(f)

join

`Gazetteer` Objects

dedupe.Gazetteer

# initialize from a defined set of fields
variables = [
    {'field' : 'Site name', 'type': 'String'},
    {'field' : 'Address', 'type': 'String'},
    {'field' : 'Zip', 'type': 'String', 'has missing':True},
    {'field' : 'Phone', 'type': 'String', 'has missing':True},
]
matcher = dedupe.Gazetteer(variables)

prepare_training

uncertain_pairs

mark_pairs

train

write_training

write_settings

cleanup_training

index

unindex

search

`StaticGazetteer` Objects

dedupe.StaticGazetteer

with open('learned_settings', 'rb') as f:
    matcher = dedupe.StaticGazetteer(f)

index

unindex

search

blocks

score

many_to_n

Lower Level Classes and Methods

With the methods documented above, you can work with data into the millions of records. However, if you are working with larger data, you may not be able to load all of it into memory. In that case, you'll need to interact with some of the lower level classes and methods.

The PostgreSQL and MySQL examples use these lower level classes and methods.
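To illustrate why working at this lower level helps with larger data, here is a conceptual sketch in plain Python of the blocking idea: each record is reduced to one or more block keys, and only records that share a key are compared. The functions `toy_fingerprints` and `candidate_pairs` and their blocking rules are hypothetical stand-ins written for this illustration; they are not dedupe's implementation, and with data that does not fit in memory the grouping step would normally be pushed into a database.

```python
from collections import defaultdict
from itertools import combinations


def toy_fingerprints(records):
    """Yield (block_key, record_id) pairs using two simple, made-up rules."""
    for record_id, record in records:
        if record.get("Zip"):
            yield "zip:" + record["Zip"], record_id
        if record.get("Site name"):
            yield "name3:" + record["Site name"][:3].lower(), record_id


def candidate_pairs(records):
    """Yield each pair of record IDs sharing at least one block key, once."""
    blocks = defaultdict(list)
    for key, record_id in toy_fingerprints(records):
        blocks[key].append(record_id)
    seen = set()
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            if pair not in seen:
                seen.add(pair)
                yield pair


data = {
    1: {"Site name": "Acme Day Care", "Zip": "60601"},
    2: {"Site name": "Acme Daycare", "Zip": "60601"},
    3: {"Site name": "Bright Start", "Zip": "60614"},
    4: {"Site name": "Bright Starts", "Zip": None},
}
pairs = sorted(candidate_pairs(data.items()))
```

With these four records, only two of the six possible pairs, `(1, 2)` and `(3, 4)`, become candidates for scoring.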

Dedupe and StaticDedupe

dedupe.Dedupe

fingerprinter

Instance of dedupe.blocking.Fingerprinter class if train has been run, else None.

pairs

score

cluster

dedupe.StaticDedupe

fingerprinter

Instance of dedupe.blocking.Fingerprinter class

pairs(data)

Same as dedupe.Dedupe.pairs

score(pairs)

Same as dedupe.Dedupe.score

cluster(scores, threshold=0.5)

Same as dedupe.Dedupe.cluster
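As a rough illustration of what thresholded clustering of scored pairs means, the sketch below links every pair that scores above the threshold and returns the connected components. This is a deliberately simplified stand-in technique, not the clustering algorithm dedupe uses, and `toy_cluster` is a hypothetical name.

```python
def toy_cluster(scored_pairs, threshold=0.5):
    """Group record IDs connected by pairs scoring above `threshold`.

    Records that only appear in low-scoring pairs are left out.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scored_pairs:
        if score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for record_id in list(parent):
        groups.setdefault(find(record_id), set()).add(record_id)
    return sorted(sorted(group) for group in groups.values())


scored = [((1, 2), 0.9), ((2, 3), 0.7), ((4, 5), 0.8), ((3, 4), 0.2)]
clusters = toy_cluster(scored)
```

Here the weak `(3, 4)` link is ignored, so the result is two clusters, `[1, 2, 3]` and `[4, 5]`. Raising the threshold yields fewer, tighter clusters.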

RecordLink and StaticRecordLink

dedupe.RecordLink

fingerprinter

Instance of dedupe.blocking.Fingerprinter class if train has been run, else None.

pairs

score

one_to_one

many_to_one

dedupe.StaticRecordLink

fingerprinter

Instance of dedupe.blocking.Fingerprinter class

pairs(data_1, data_2)

Same as dedupe.RecordLink.pairs

score(pairs)

Same as dedupe.RecordLink.score

one_to_one(scores, threshold=0.0)

Same as dedupe.RecordLink.one_to_one

many_to_one(scores, threshold=0.0)

Same as dedupe.RecordLink.many_to_one
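To make the two link constraints concrete, the following plain-Python sketch applies them to a small list of scored pairs. It assumes that one-to-one means no record on either side is linked twice, and that many-to-one means each record from the first data set keeps only its best match while records from the second may be reused. Both functions are hypothetical simplifications and may differ from dedupe's own matching logic.

```python
def toy_one_to_one(scored_pairs, threshold=0.0):
    """Greedily keep the best-scoring pairs so that no ID is used twice."""
    used_left, used_right, links = set(), set(), []
    for (left, right), score in sorted(scored_pairs, key=lambda p: -p[1]):
        if score <= threshold:
            break  # remaining pairs score even lower
        if left not in used_left and right not in used_right:
            used_left.add(left)
            used_right.add(right)
            links.append(((left, right), score))
    return links


def toy_many_to_one(scored_pairs, threshold=0.0):
    """Keep the best match for each left ID; right IDs may repeat."""
    best = {}
    for (left, right), score in scored_pairs:
        if score > threshold and (left not in best or score > best[left][1]):
            best[left] = ((left, right), score)
    return sorted(best.values(), key=lambda p: -p[1])


scored = [
    (("a1", "b1"), 0.9),
    (("a2", "b1"), 0.8),
    (("a2", "b2"), 0.4),
    (("a3", "b2"), 0.3),
]
one = toy_one_to_one(scored, threshold=0.35)
many = toy_many_to_one(scored, threshold=0.35)
```

Under the one-to-one sketch, `a2` falls back to `b2` because `b1` is already taken by `a1`; under the many-to-one sketch, both `a1` and `a2` link to `b1`.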

Gazetteer and StaticGazetteer

dedupe.Gazetteer

fingerprinter

Instance of dedupe.blocking.Fingerprinter class if train has been run, else None.

blocks

score

many_to_n

dedupe.StaticGazetteer

fingerprinter

Instance of dedupe.blocking.Fingerprinter class

blocks(data)

Same as dedupe.Gazetteer.blocks

score(blocks)

Same as dedupe.Gazetteer.score

many_to_n(score_blocks, threshold=0.0, n_matches=1)

Same as dedupe.Gazetteer.many_to_n
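The `threshold` and `n_matches` parameters suggest a top-n selection for each messy record. A minimal sketch of that idea, assuming flat `((messy_id, canonical_id), score)` tuples rather than dedupe's actual score-block structures, with `toy_many_to_n` as a hypothetical name:

```python
import heapq
from collections import defaultdict


def toy_many_to_n(scored_pairs, threshold=0.0, n_matches=1):
    """Return up to `n_matches` best canonical matches per messy record."""
    by_messy = defaultdict(list)
    for (messy_id, canonical_id), score in scored_pairs:
        if score > threshold:
            by_messy[messy_id].append((canonical_id, score))
    # Keep only the highest-scoring candidates for each messy record.
    return {
        messy_id: heapq.nlargest(n_matches, matches, key=lambda m: m[1])
        for messy_id, matches in by_messy.items()
    }


scored = [
    (("m1", "c1"), 0.9),
    (("m1", "c2"), 0.6),
    (("m1", "c3"), 0.7),
    (("m2", "c1"), 0.1),
]
matches = toy_many_to_n(scored, threshold=0.5, n_matches=2)
```

In this example `m1` keeps `c1` and `c3`, while `m2` has no match above the threshold and is absent from the result.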

`Fingerprinter` Objects

dedupe.blocking.Fingerprinter

__call__

index_fields

index

unindex

reset_indices

Convenience Functions

dedupe.console_label

dedupe.training_data_dedupe

dedupe.training_data_link

dedupe.canonicalize

dedupe.read_training

dedupe.write_training
