should we integrate with scikit learn #991

Closed
fgregg opened this issue Apr 25, 2022 · 6 comments · Fixed by #992

Comments

@fgregg
Contributor

fgregg commented Apr 25, 2022

Thinking about #990, I'm wondering if we should make scikit-learn a dependency.

When we started dedupe 10 years ago, it was really hard to get scipy and scikit-learn set up on users' machines, but it isn't anymore.

It would be nice to get out of the game of implementing some algos ourselves.

Thoughts, @fjsj @NickCrews?

@fjsj
Contributor

fjsj commented Apr 25, 2022

I agree. IMO scikit-learn is a fairly standard choice, and I sometimes see it used just for model evaluation / visualization, so it ends up being a very common dependency in ML projects.

@NickCrews
Contributor

I think this move would be well worth it.

Presumably after doing this it would be easy to swap out different classifiers, so a to-do is figuring out how to configure which classifier you want to use.

@fgregg
Contributor Author

fgregg commented Apr 25, 2022

This is actually very easy, because I've already made it so the classifier has the sklearn API.
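
For readers following along, here is a minimal sketch of why a shared sklearn-style interface makes classifiers swappable. It assumes only that a classifier exposes `fit(X, y)` and `predict_proba(X)`. The names `TinyLogisticRegression`, `MajorityClassifier`, and `score_pairs` are hypothetical and written for illustration; this is not dedupe's or scikit-learn's actual code.

```python
import math


class TinyLogisticRegression:
    """Hypothetical stand-in exposing the sklearn-style fit/predict_proba methods."""

    def __init__(self, learning_rate=0.5, n_iter=500):
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self.weights = []
        self.bias = 0.0

    def _positive_probability(self, row):
        # Logistic function applied to the linear score of one feature row.
        z = self.bias + sum(w * x for w, x in zip(self.weights, row))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        # Plain stochastic gradient descent on the log loss.
        self.weights = [0.0] * len(X[0])
        self.bias = 0.0
        for _ in range(self.n_iter):
            for row, label in zip(X, y):
                error = self._positive_probability(row) - label
                self.weights = [
                    w - self.learning_rate * error * x
                    for w, x in zip(self.weights, row)
                ]
                self.bias -= self.learning_rate * error
        return self

    def predict_proba(self, X):
        # One [P(class 0), P(class 1)] pair per row, the shape sklearn classifiers use.
        return [[1.0 - p, p] for p in map(self._positive_probability, X)]


class MajorityClassifier:
    """Second hypothetical classifier with the same interface; ignores the features."""

    def fit(self, X, y):
        self.rate = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [[1.0 - self.rate, self.rate] for _ in X]


def score_pairs(classifier, train_X, train_y, candidate_X):
    """Hypothetical helper: train any fit/predict_proba object, return match probabilities."""
    classifier.fit(train_X, train_y)
    return [proba[1] for proba in classifier.predict_proba(candidate_X)]


# Each row holds made-up field similarity scores for one record pair; 1 = match.
train_X = [[0.9, 0.8], [0.95, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_y = [1, 1, 0, 0]
candidates = [[0.85, 0.9], [0.15, 0.1]]

scores = score_pairs(TinyLogisticRegression(), train_X, train_y, candidates)
baseline = score_pairs(MajorityClassifier(), train_X, train_y, candidates)
```

Because `score_pairs` only relies on the two methods, any object that implements them, including a real scikit-learn estimator, could be passed in its place without changing the calling code.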

@fjsj
Contributor

fjsj commented Apr 25, 2022

Here I adapted the code via inheritance to support sklearn classifiers: https://github.com/vintasoftware/deduplication-slides/blob/pycon-2020/rf_dedupe.py

But it's worth checking the conditionals on def fit against the current Dedupe version.

@fgregg fgregg linked a pull request Jun 2, 2022 that will close this issue
3 tasks
@fgregg
Contributor Author

fgregg commented Jun 2, 2022

closed by #992

@fgregg fgregg closed this as completed Jun 2, 2022
@NickCrews
Contributor

Nice job @fgregg, this looks awesome!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 22, 2022