Right now, blocking and scoring are two distinct phases.
All the information about how two records came to be blocked together is unused by the scorer. This is a bit silly, as the fact that two records are blocked together by multiple predicates could be a pretty good indicator of co-reference.
I'm not really clear on the best way to take advantage of blocking information in scoring, though.
A few ideas:
1. Ensemble model: treat each blocking predicate as a classifier, and put them in an ensemble with the scorer.
2. Blocking as feature: add dummy features indicating which predicate rules cover a pair. These features get fed into the scorer.
In both cases, I'm not quite sure how to set up the training.
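To make the "blocking as feature" idea concrete, here is a minimal sketch, not the library's actual API. It assumes a predicate is a function that maps a record to a set of block keys, and that a predicate "covers" a pair when the two records share at least one key. The helper names (`first_three_chars`, `whole_field`, `coverage_features`, `pair_features`) are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, FrozenSet, List

Record = Dict[str, str]
# Assumption: a predicate maps a record to the set of block keys it falls under.
Predicate = Callable[[Record], FrozenSet[str]]


def first_three_chars(field: str) -> Predicate:
    """Hypothetical predicate: block on the first three characters of a field."""
    def predicate(record: Record) -> FrozenSet[str]:
        value = record.get(field, "").strip().lower()
        return frozenset({value[:3]}) if value else frozenset()
    return predicate


def whole_field(field: str) -> Predicate:
    """Hypothetical predicate: block on the exact (normalized) field value."""
    def predicate(record: Record) -> FrozenSet[str]:
        value = record.get(field, "").strip().lower()
        return frozenset({value}) if value else frozenset()
    return predicate


def coverage_features(a: Record, b: Record,
                      predicates: List[Predicate]) -> List[float]:
    """One dummy feature per predicate: 1.0 if both records share a block key."""
    return [1.0 if p(a) & p(b) else 0.0 for p in predicates]


def pair_features(a: Record, b: Record, distances: List[float],
                  predicates: List[Predicate]) -> List[float]:
    """Append the coverage dummies to the scorer's existing distance features."""
    return list(distances) + coverage_features(a, b, predicates)


a = {"name": "Robert Smith", "city": "Chicago"}
b = {"name": "Rob Smith", "city": "Chicago"}
predicates = [first_three_chars("name"), whole_field("name"), whole_field("city")]

coverage = coverage_features(a, b, predicates)
# 0.25 stands in for whatever distance features the scorer already computes.
features = pair_features(a, b, [0.25], predicates)
```

One thing to keep in mind for training: if the scorer only ever sees pairs that were blocked together, at least one of these dummies is always 1, so the model learns from which and how many predicates fired rather than from whether any fired at all.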
Splink uses something very similar to method 2. See https://youtu.be/msz3T741KQI?t=2035 for a nice explanation of how they think about the different "types" of comparisons that can happen. I thought the whole video had some other great ideas and visualizations in it too.