Refactor labeler.py #1065
Commits on Jun 17, 2022
- Use conventions for fit(), fit_transform(), etc. (70f8f31)
  These methods did not match how sklearn uses them. In sklearn, fit_transform(X, y) means fit(X, y).transform(X), but we were using it as fit(transform(X), y). Renaming fit_transform() to fit() gives it the correct meaning. The transform method also isn't used outside the class, and I don't think it should be: the public API should only deal with TrainingPairs, not with the calculated distances. So rename it to _distances(). The old fit() method is now just an internal helper that works on already-calculated distance data.
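As a rough illustration of the convention described in this commit, here is a minimal sketch. `MeanCenterer` and `SketchLearner` are hypothetical stand-ins written for this example, not dedupe's or scikit-learn's actual classes; only the method names `fit()`, `_fit()`, and `_distances()` come from the commit messages.

```python
class MeanCenterer:
    """Toy transformer following the scikit-learn convention (hypothetical)."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self  # returning self allows chaining

    def transform(self, X):
        return [x - self.mean_ for x in X]

    def fit_transform(self, X, y=None):
        # scikit-learn meaning: fit on X, then transform that same X.
        return self.fit(X, y).transform(X)


class SketchLearner:
    """Hypothetical learner laid out the way the commit describes."""

    def __init__(self, distance_fn):
        self._distance_fn = distance_fn

    def _distances(self, pairs):
        # Internal: turn record pairs into distance values.
        return [self._distance_fn(a, b) for a, b in pairs]

    def _fit(self, distances, y):
        # Internal helper that works on already-computed distances.
        self.X = distances
        self.y = y

    def fit(self, pairs, y):
        # Public API: callers pass pairs and labels, never distances.
        self._fit(self._distances(pairs), y)
        return self


centered = MeanCenterer().fit_transform([1.0, 2.0, 3.0])
learner = SketchLearner(lambda a, b: abs(len(a) - len(b)))
learner.fit([("ann", "anne"), ("bob", "robert")], [1, 0])
```

The point of the split is that distances never cross the public boundary: a caller of `fit()` only ever sees pairs and labels.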
- Remove redundant setting of self.X, self.y (7f3842b)
  It is already done in self._fit().
- Commit e4e8697
- Commit 56e2685
- Show error codes from mypy runs (685a0db)
  mypy now prints an error code such as `[arg-type]` when it reports an error, so you can add `# type: ignore[arg-type]` and silence only that specific error.
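A small sketch of what a scoped ignore looks like in practice. It assumes the option enabled here is mypy's `--show-error-codes` (the commit message does not name it), and `double` is a hypothetical helper that exists only to trigger the error.

```python
def double(n: int) -> int:
    """Hypothetical helper used only to trigger a mypy error."""
    return n * 2


# Without the trailing comment, mypy run with --show-error-codes reports
# an error tagged [arg-type] for this call, because a str is passed where
# an int is expected. The scoped ignore silences only that error code.
result = double("ab")  # type: ignore[arg-type]
```

A bare `# type: ignore` would also hide any unrelated error that later appears on the same line, which is the reason to prefer the specific form.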
- Commit cd8782e
- Don't store data_model in BlockLearner (d59d538)
  It isn't used anywhere except to create the initial set of candidate predicates.
- Make BlockLearner.predict private (5bad271)
  It's only used internally, so make that obvious.
- Remove unneeded type hint BlockLearner.candidates (0b0a117)
  It is already part of the Learner base class.
- Delegate DisagreementLearner.learn_predicates() (3ded0bd)
  DisagreementLearner was reaching far down into the objects it contains. The lower classes themselves should be responsible for this sort of thing. Sure, it makes the code more complicated, but the old way fooled us into thinking it was simple and made it prone to breaking in the future.
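The delegation idea can be sketched roughly as follows. Everything here is hypothetical: the class names are stand-ins, and the `learn_predicates(matches)` signature and the toy selection rule are assumptions made for illustration, not dedupe's real logic.

```python
class SketchBlockLearner:
    """Hypothetical sub-learner that owns its own predicate-learning logic."""

    def __init__(self, candidate_predicates):
        self._candidate_predicates = list(candidate_predicates)

    def learn_predicates(self, matches):
        # Toy criterion: keep predicates that agree on every labeled match.
        return [
            predicate
            for predicate in self._candidate_predicates
            if all(predicate(a) == predicate(b) for a, b in matches)
        ]


class SketchDisagreementLearner:
    """Hypothetical owner that delegates instead of reaching inside."""

    def __init__(self, blocker):
        self.blocker = blocker

    def learn_predicates(self, matches):
        # Delegate: the blocker decides how to use its own internals.
        return self.blocker.learn_predicates(matches)


def first_letter(value):
    return value[0]


blocker = SketchBlockLearner([first_letter, len])
learned = SketchDisagreementLearner(blocker).learn_predicates([("ann", "anne")])
```

With this layout the outer class depends only on one method of the blocker, so changes to how the blocker stores predicates do not ripple upward.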
- Commit 925030b
- Commit ced6f4e
- Remove fit() from DisagreementLearner API (e3442ef)
  It's not actually used anywhere. See dedupeio#1065 (comment)
- Commit 5386d6f
- Commit 62ad7ca
- Commit 24a2609
- Test more-public interface of labeler (d77c118)
  We were only testing a very small component of labeler. Now the tests actually go through most lines of code. Check out the coverage.
Commits on Jun 18, 2022
- Commit 3e36957
- Commit 1b230ac
  - Remove many unused methods of MatchLearner, like mark() and pop(). I think these were used when this was the only learner class, but now they aren't used anywhere.
  - Just set MatchLearner.candidates in the constructor. It makes it way easier to reason about.
  - Adjust the inheritance. DisagreementLearner is now out of the hierarchy that MatchLearner and BlockLearner are in. This is good, because DisagreementLearner OWNS these other two; it is not an "is a" relationship.
  - Remove the mark() function from the sub-learners. They just have the fit() method now, and they don't actually persist the training data, which is in line with the naming of fit().
  - Make `candidates` a read-only attribute. It is easier to reason about because we don't have to worry about someone outside the class coming in and changing it.
  - Fix a bug in BlockLearner where `remove` never actually removed the entry from `candidates`, so if you broke the cache of _cached_scores and ended up calling `self._predict(self.candidates)`, you would get the result for all of the original candidates.
  - In the test, actually check the values of the candidates, not just the number of them.
  - Rename `_remove` to `remove` in the sub-learners, since it is used publicly by DisagreementLearner.
  - Remove the unused candidate_scores from the DisagreementLearner public API.
  - Always make a copy in sample_records() to avoid footguns.
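Two of the points above, the read-only `candidates` attribute and the `remove` fix, can be sketched together. This is a hypothetical reconstruction under stated assumptions: the class body, the index-based `remove(index)` signature, and the toy scoring rule are invented for illustration; only the names `candidates`, `remove`, `_predict`, and `_cached_scores` appear in the commit message.

```python
class SketchBlockLearner:
    """Hypothetical learner showing a read-only attribute and a safe remove."""

    def __init__(self, candidates):
        self._candidates = list(candidates)
        self._cached_scores = None

    @property
    def candidates(self):
        # No setter is defined, so outside code cannot rebind the attribute.
        return self._candidates

    def _predict(self, pairs):
        # Toy score: 1.0 when both records have the same length.
        return [float(len(a) == len(b)) for a, b in pairs]

    def candidate_scores(self):
        if self._cached_scores is None:
            self._cached_scores = self._predict(self.candidates)
        return self._cached_scores

    def remove(self, index):
        # Drop the pair from the candidates AND from the cache, so a later
        # cache rebuild cannot bring the removed pair back.
        del self._candidates[index]
        if self._cached_scores is not None:
            del self._cached_scores[index]


learner = SketchBlockLearner([("a", "b"), ("aa", "b"), ("cc", "dd")])
learner.candidate_scores()     # fills the cache
learner.remove(1)
learner._cached_scores = None  # simulate the cache being invalidated
rescored = learner.candidate_scores()
```

In the buggy version described above, the rebuild step would have scored all three original pairs; here it scores only the two that remain.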
- Commit 2605b29
- Commit f9d88ec
Commits on Jul 9, 2022
- Commit c21398f
- Only have pytest config in pyproject.toml (3dc3a96)
  The config in setup.cfg is ignored.
- Commit 5b330cf
- Commit 81bf3f5
- ValueError if candidate_scores() used before fit() (5d3a100)
  This makes the behavior consistent between MatchLearner and BlockLearner. It also fixes a bug where self._fitted was never set to True.
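A minimal sketch of the guard this commit describes. `SketchMatchLearner`, its toy scoring, and the error message text are assumptions for illustration; the commit only tells us that `candidate_scores()` raises `ValueError` before `fit()` and that `self._fitted` must actually be set.

```python
class SketchMatchLearner:
    """Hypothetical learner that refuses to score before it is fitted."""

    def __init__(self, candidates):
        self._candidates = list(candidates)
        self._fitted = False

    def fit(self, pairs, y):
        # Toy model: remember the share of positive labels.
        self._positive_rate = sum(y) / len(y)
        # The bug fixed in the commit was that this flag was never set.
        self._fitted = True
        return self

    def candidate_scores(self):
        if not self._fitted:
            raise ValueError("Must call fit() before candidate_scores()")
        return [self._positive_rate for _ in self._candidates]


learner = SketchMatchLearner([("a", "b"), ("c", "d")])
try:
    learner.candidate_scores()
    raised_before_fit = False
except ValueError:
    raised_before_fit = True

learner.fit([("a", "a"), ("a", "z")], [1, 0])
scores = learner.candidate_scores()
```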
- Commit 83b5293
- Commit 918b824
Commits on Sep 1, 2022
- Commit 1212a7b