Sqlite example #51

Merged
merged 128 commits into from
Jan 28, 2013


derekeder
Contributor

Create an example of using dedupe with a SQLite database of campaign contribution data:

  • create tables and load in the raw data
  • create a pattern for sampling, training, and creating blocking rules
  • run the rules and dedupe functions to create a table of unique donors
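
As a rough illustration of the first two steps, here is a minimal sketch using only Python's standard `sqlite3` and `random` modules. The `donors` table, its columns, the sample rows, and the helper names `load_raw_data` and `sample_pairs` are all assumptions made for this sketch; they are not taken from the branch, and the dedupe training calls themselves are deliberately left out.

```python
import random
import sqlite3

# Hypothetical rows; the real example loads IL campaign contribution data.
RAW_ROWS = [
    (1, "John Smith", "123 Main St", "Chicago"),
    (2, "Jon Smith", "123 Main Street", "Chicago"),
    (3, "Mary Jones", "9 Oak Ave", "Springfield"),
    (4, "Mary K. Jones", "9 Oak Avenue", "Springfield"),
]


def load_raw_data(con, rows):
    """Create the (hypothetical) donors table and load the raw rows into it."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS donors ("
        "donor_id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT)"
    )
    con.executemany("INSERT INTO donors VALUES (?, ?, ?, ?)", rows)
    con.commit()


def sample_pairs(con, n_pairs, seed=0):
    """Draw distinct random pairs of donor ids that could be labeled for training."""
    ids = [row[0] for row in con.execute("SELECT donor_id FROM donors ORDER BY donor_id")]
    rng = random.Random(seed)
    max_pairs = len(ids) * (len(ids) - 1) // 2
    pairs = set()
    while len(pairs) < min(n_pairs, max_pairs):
        a, b = rng.sample(ids, 2)
        pairs.add((min(a, b), max(a, b)))  # store each pair in a canonical order
    return sorted(pairs)


con = sqlite3.connect(":memory:")
load_raw_data(con, RAW_ROWS)
pairs = sample_pairs(con, 3)
```

Sampling ids rather than whole rows keeps the working set small; the sampled pairs can then be fetched from the database when they are needed for labeling.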

relates to #49

@derekeder derekeder mentioned this pull request Oct 24, 2012
derekeder and others added 7 commits October 24, 2012 17:34
+ added cosineSimilarity, createCanopies and coverage functions
+ added basic test cases (not in testing framework)
+ wired up TF-IDF to trainBlocking in blocking.py
+ BUG: TF-IDF not covering blocked data the way we expect
+ TODO: investigate TF-IDF blocking logic
…s, continued working on blocking sqlite data
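
For readers unfamiliar with the ideas named in the commit message above, the following is a minimal, self-contained sketch of TF-IDF cosine similarity, a simplified canopy pass, and a coverage measure. It is not the code from this branch: the snake_case function names, the whitespace tokenizer, the smoothed IDF formula, and the single-threshold greedy canopy rule are all assumptions chosen to keep the sketch short.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each doc id to a {token: weight} dict using a smoothed IDF."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[doc_id] = {
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in counts.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v[tok] for tok, weight in u.items() if tok in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def create_canopies(vectors, threshold):
    """Greedy single-threshold canopies: each record joins exactly one center."""
    remaining = sorted(vectors)
    canopies = {}
    while remaining:
        center = remaining[0]
        members = [center] + [
            other for other in remaining[1:]
            if cosine_similarity(vectors[center], vectors[other]) >= threshold
        ]
        canopies[center] = members
        remaining = [r for r in remaining if r not in members]
    return canopies


def coverage(canopies, duplicate_pairs):
    """Fraction of known duplicate pairs that share a canopy."""
    if not duplicate_pairs:
        return 0.0
    center_of = {m: c for c, members in canopies.items() for m in members}
    covered = sum(
        1 for a, b in duplicate_pairs
        if a in center_of and center_of.get(a) == center_of.get(b)
    )
    return covered / len(duplicate_pairs)


docs = {
    1: "123 main st chicago",
    2: "123 main street chicago",
    3: "9 oak ave springfield",
}
canopies = create_canopies(tfidf_vectors(docs), threshold=0.5)
```

A coverage value well below 1.0 on labeled duplicates would be one way to observe the kind of problem the BUG note describes, though the actual cause in the branch is not stated here.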
@derekeder
Contributor Author

Done:

  • implemented non-compound tf-idf predicates

To do:

  • compound predicates
  • use learned tf-idf thresholds to define blocking
  • fix reading/writing settings, which broke with the addition of tf-idf
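
To make the "compound predicates" item concrete, here is one possible reading, sketched under assumptions: a simple predicate maps a field value to a tuple of blocking keys, and a compound predicate combines one key from each of its parts, so two records share a block only if every part agrees. The helpers `first_token`, `first_three_chars`, `compound`, and `block` are hypothetical names for this sketch and are not dedupe's API.

```python
from itertools import product


def first_token(value):
    """Simple predicate: the first whitespace-separated token, lowercased."""
    tokens = value.lower().split()
    return (tokens[0],) if tokens else ()


def first_three_chars(value):
    """Simple predicate: the first three characters, lowercased."""
    cleaned = value.strip().lower()
    return (cleaned[:3],) if cleaned else ()


def compound(*parts):
    """Combine (predicate, field) pairs into one predicate over whole records."""
    def predicate(record):
        key_sets = [pred(record[field]) for pred, field in parts]
        # One combined key per combination of the parts' keys; if any part
        # yields no key, the record gets no combined key at all.
        return tuple(":".join(combo) for combo in product(*key_sets))
    return predicate


def block(records, predicate):
    """Group record ids by every blocking key the predicate emits."""
    blocks = {}
    for record_id, record in records.items():
        for key in predicate(record):
            blocks.setdefault(key, []).append(record_id)
    return blocks


records = {
    1: {"name": "John Smith", "city": "Chicago"},
    2: {"name": "john smyth", "city": "Chicago"},
    3: {"name": "John Doe", "city": "Springfield"},
}
name_and_city = compound((first_token, "name"), (first_three_chars, "city"))
blocks = block(records, name_and_city)
```

Compared with either simple predicate alone, the conjunction produces smaller blocks, which cuts the number of pairwise comparisons at the risk of missing duplicates that disagree on one field.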

@derekeder
Contributor Author

To do:

  • compound predicates
  • test against sqlite_example

fgregg and others added 26 commits January 15, 2013 07:14
write entity_map table in illinois_contributions db
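
As a hedged sketch of what writing an `entity_map` table might look like, the snippet below stores one row per clustered record and then joins it back to the raw data to count records per unique donor. The column names (`donor_id`, `canon_id`), the `donors` table, the sample rows, and the choice of the smallest id as the canonical one are assumptions for illustration; an in-memory database stands in for the real one.

```python
import sqlite3


def write_entity_map(con, clusters):
    """Store one (donor_id, canon_id) row per clustered record."""
    con.execute("DROP TABLE IF EXISTS entity_map")
    con.execute(
        "CREATE TABLE entity_map ("
        "donor_id INTEGER PRIMARY KEY, canon_id INTEGER NOT NULL)"
    )
    # Assumption: the smallest id in each cluster serves as the canonical id.
    rows = [(donor_id, min(cluster)) for cluster in clusters for donor_id in cluster]
    con.executemany("INSERT INTO entity_map VALUES (?, ?)", rows)
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE donors (donor_id INTEGER PRIMARY KEY, name TEXT)")
con.executemany(
    "INSERT INTO donors VALUES (?, ?)",
    [(1, "John Smith"), (2, "Jon Smith"), (3, "Mary Jones")],
)
write_entity_map(con, [{1, 2}])

# Records with no entity_map row are treated as their own entity.
unique_donors = con.execute(
    "SELECT COALESCE(e.canon_id, d.donor_id) AS entity_id, COUNT(*) AS n_records "
    "FROM donors d LEFT JOIN entity_map e ON e.donor_id = d.donor_id "
    "GROUP BY entity_id ORDER BY entity_id"
).fetchall()
```

Keeping the mapping in its own table leaves the raw data untouched, so the clustering can be rerun and the map rewritten without reloading anything.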
@derekeder
Contributor Author

OMG this feature branch is ready to be merged into master!

Accomplished all of the goals stated above, using a SQLite database of IL campaign contributions.

derekeder added a commit that referenced this pull request Jan 28, 2013
@derekeder derekeder merged commit 4381ee5 into master Jan 28, 2013

2 participants