Record Linkage using MySQL #691

Closed
coommark opened this issue Aug 24, 2018 · 2 comments

Comments

@coommark

First I apologize for having to ask a question here.

The question is: is there an example of record linkage using MySQL (for a large dataset)? I have tried about six times to convert the record linkage example to MySQL without success.

Our specific usage scenario is:

  1. We have a large dataset which we have deduped successfully using dedupe (about 500k records).
  2. However, every other week new messy data is acquired, which we dedupe separately using our trained settings. We then need to link these two canonical datasets, the first with about 500k records and the second with maybe 1k records. As you can see, we cannot use CSV to link such large data (or can we?), so we're desperate for a sample MySQL record linkage.

I am fairly new to Python and started learning it only so I can use this library (Dedupe), but I am not yet good enough to understand the nuances of this awesome lib and create the needed MySQL record linkage.

PLEASE HELP!

@mehrrsaa

mehrrsaa commented Aug 30, 2018

I think this may help you out:

https://github.com/dedupeio/address-matching/blob/sqlclass/address_matching.py

But you would have to change whatever is necessary to make it work, depending on your MySQL version, use case, etc.

There is no ready-made solution; you will need to spend time on it.
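
As a starting point, here is a minimal sketch of just the data-loading half of the scenario described above: reading two tables into the `{record_id: {field: value}}` dictionaries that dedupe works with, with missing values as `None`. It is not taken from the linked script. It uses the standard-library `sqlite3` module as a stand-in so that it runs anywhere; with MySQL you would presumably swap in a DB-API driver such as `MySQLdb` or `PyMySQL`. The table names (`canonical`, `incoming`), the columns, the sample rows, and the `load_records` helper are all hypothetical, and the sketch assumes both tables fit in memory.

```python
import sqlite3  # stand-in for a MySQL driver; both follow the Python DB-API


def load_records(conn, query, id_column, batch_size=10000):
    """Return {record_id: {column: value}} for every row produced by `query`.

    Hypothetical helper, not part of dedupe.
    """
    cursor = conn.cursor()
    cursor.execute(query)
    columns = [desc[0] for desc in cursor.description]
    records = {}
    while True:
        # Fetch in batches rather than one row at a time.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            record = dict(zip(columns, row))
            # Represent empty strings as None so they count as missing values.
            record = {k: (v if v != "" else None) for k, v in record.items()}
            records[record.pop(id_column)] = record
    cursor.close()
    return records


# Made-up demo data: a "canonical" table (the large, already deduped set)
# and an "incoming" table (the small bi-weekly batch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE canonical (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE incoming  (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO canonical VALUES (1, 'Acme Corp', 'Lagos'), (2, 'Globex', '');
    INSERT INTO incoming  VALUES (1, 'ACME Corporation', 'Lagos');
""")

canonical = load_records(conn, "SELECT id, name, city FROM canonical", "id")
incoming = load_records(conn, "SELECT id, name, city FROM incoming", "id")

print(canonical[2])  # {'name': 'Globex', 'city': None}
```

The two dictionaries would then be handed to dedupe's record-linkage class. That step is left out here on purpose, because the exact method names differ between dedupe versions and are best checked against the documentation for the version you have installed.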

@fgregg
Contributor

fgregg commented Sep 13, 2018

Hi, we don't have a full example for that yet. Sorry.

Duplicate of dedupeio/dedupe-examples#23

@fgregg fgregg closed this as completed Sep 13, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2022