We first implement matching between two entries. The result of matching two entries is a distance score between them. The distance score is the sum, over all attributes, of the distances between the corresponding components. Two entries are thus considered equal if their distance score is 0, i.e. if all the attributes match.
The components are computed as follows: The distance between the values is the absolute value of their arithmetic difference.
The distance between the dates is the absolute difference in days.
The distance between the descriptions is a bit more complex. Descriptions are compared case-insensitively. If one is a substring of the other, the distance is 0.
It may be worth weighting the different components.
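A minimal sketch of the entry distance described above, assuming each entry carries a value, a date, and a description. The `Entry` class, its field names, the `weights` tuple, and `mismatch_penalty` are all hypothetical names introduced for illustration. In particular, the text above does not say what distance two non-matching descriptions should get, so the penalty of 1.0 is a placeholder assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    # Hypothetical record layout; the field names are assumptions.
    value: float
    date: date
    description: str


def description_distance(a: str, b: str, mismatch_penalty: float = 1.0) -> float:
    # Case-insensitive comparison: 0 if one description is a substring of the other.
    a, b = a.lower(), b.lower()
    return 0.0 if a in b or b in a else mismatch_penalty


def entry_distance(x: Entry, y: Entry, weights=(1.0, 1.0, 1.0)) -> float:
    # Weighted sum of the value, date, and description components.
    w_value, w_date, w_desc = weights
    value_part = abs(x.value - y.value)
    date_part = abs((x.date - y.date).days)
    desc_part = description_distance(x.description, y.description)
    return w_value * value_part + w_date * date_part + w_desc * desc_part
```

With all weights equal to 1 this is the plain sum of components. Setting a weight to 0 drops that attribute from the comparison.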
The next step is to match the entries in a left-hand-side list against the entries in a right-hand-side list. The result is a matching matrix whose rows are the entries of the left-hand-side list and whose columns are the entries of the right-hand-side list. The minimum of each row and of each column then identifies the best matches.
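The matching matrix can be sketched as follows. This is only an illustration: the function names are hypothetical, the `distance` argument stands in for whatever entry distance is used, and both lists are assumed to be non-empty. Ties are resolved in favour of the first minimum, which is a choice of this sketch rather than something the description prescribes.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def matching_matrix(lhs: Sequence[T], rhs: Sequence[T],
                    distance: Callable[[T, T], float]) -> List[List[float]]:
    # Rows are left-hand-side entries, columns are right-hand-side entries.
    return [[distance(left, right) for right in rhs] for left in lhs]


def row_minima(matrix: List[List[float]]) -> List[int]:
    # For each row, the column index holding the smallest distance.
    return [min(range(len(row)), key=row.__getitem__) for row in matrix]


def column_minima(matrix: List[List[float]]) -> List[int]:
    # For each column, the row index holding the smallest distance.
    n_cols = len(matrix[0]) if matrix else 0
    return [min(range(len(matrix)), key=lambda i: matrix[i][j])
            for j in range(n_cols)]


# Toy example with plain numbers standing in for entries.
m = matching_matrix([1, 5, 9], [8, 2], lambda a, b: abs(a - b))
print(m)                 # [[7, 1], [3, 3], [1, 7]]
print(row_minima(m))     # [1, 0, 0]
print(column_minima(m))  # [2, 0]
```

Note that the row minima and the column minima need not agree: in the toy example the second row has a tie, and two rows point at column 0 while column 0 itself points back at only one of them.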
Suggestion for defining the matching score:
Order: description, date.
The distance between descriptions is calculated as:
0 if one string is a substring of the other (case-insensitive),
and 5 (a chosen threshold) if neither string is a substring of the other (case-insensitive).
The distance between dates is the absolute number of days between them.
The date and description distances are added together to give the final distance between two entries.
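The suggested score can be written down directly. The sketch below assumes descriptions are plain strings and dates are `datetime.date` objects; the function and constant names are hypothetical.

```python
from datetime import date

DESCRIPTION_MISMATCH = 5  # the chosen threshold from the suggestion


def suggested_description_distance(a: str, b: str) -> int:
    # 0 if one string is a substring of the other, ignoring case; 5 otherwise.
    a, b = a.casefold(), b.casefold()
    return 0 if a in b or b in a else DESCRIPTION_MISMATCH


def suggested_date_distance(a: date, b: date) -> int:
    # Absolute number of days between the two dates.
    return abs((a - b).days)


def suggested_distance(desc_a: str, date_a: date,
                       desc_b: str, date_b: date) -> int:
    # Final distance: description distance plus date distance.
    return (suggested_description_distance(desc_a, desc_b)
            + suggested_date_distance(date_a, date_b))


print(suggested_distance("Rent", date(2020, 1, 1),
                         "RENT January", date(2020, 1, 3)))  # 2
print(suggested_distance("Rent", date(2020, 1, 1),
                         "Groceries", date(2020, 1, 1)))     # 5
```

With this definition a description mismatch weighs the same as a five-day gap between dates, so the threshold effectively sets how many days of drift are tolerated before a differently described entry becomes the better match.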
Depending on how the matching algorithm is implemented, it might not be commutative, meaning that matching file A against B may give different results from matching file B against A. For a non-commutative implementation, it is suggested to match the budget file (which is more likely to be incomplete) against the bank file.
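To illustrate how the direction can matter, here is a small sketch that assumes the simplest possible matcher: each entry of the file being matched takes its closest entry in the other file. This is an assumption for the sake of the example, not necessarily how the matching will be implemented. Entries are reduced to a single day number, and all names and data are hypothetical.

```python
def best_matches(source, target, distance):
    # For each source entry, the index of the closest target entry
    # (the first one wins on ties).
    return {
        i: min(range(len(target)), key=lambda j: distance(source[i], target[j]))
        for i in range(len(source))
    }


def day_gap(a, b):
    return abs(a - b)


budget = [3, 10]     # hypothetical, incomplete file
bank = [2, 4, 11]    # hypothetical, complete file

# Express both results as (budget index, bank index) pairs.
budget_to_bank = {(i, j) for i, j in best_matches(budget, bank, day_gap).items()}
bank_to_budget = {(i, j) for j, i in best_matches(bank, budget, day_gap).items()}

print(sorted(budget_to_bank))  # [(0, 0), (1, 2)]
print(sorted(bank_to_budget))  # [(0, 0), (0, 1), (1, 2)]
```

Matching the shorter budget list against the bank list gives every budget entry exactly one partner, whereas the reverse direction pairs two bank entries with the same budget entry.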