Design matching algorithm #6

Mitigooli · 2016-01-24T17:38:37Z

We first implement a matching between two entries. The result of matching two entries is a distance score: the sum of the distances between their components, one per attribute. Two entries are thus considered equal when their distance score is 0, i.e. when all the attributes match.
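A minimal sketch of the entry-level score, assuming a hypothetical `Entry` record with the three attributes discussed in this issue (value, date, description) and per-attribute distance functions passed in as a list. The names `Entry`, `entry_distance` and `value_only` are illustrative only.

```python
import datetime
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Entry:
    # Hypothetical record type; the attributes are the ones discussed in this issue.
    value: float
    date: datetime.date
    description: str


# A component distance compares one attribute of two entries.
ComponentDistance = Callable[[Entry, Entry], float]


def entry_distance(a: Entry, b: Entry, components: Sequence[ComponentDistance]) -> float:
    """Sum the per-attribute distances; 0 means every attribute matches."""
    return sum(component(a, b) for component in components)


# Illustration with a single component: the absolute difference of the values.
value_only = [lambda a, b: abs(a.value - b.value)]
coffee = Entry(100.0, datetime.date(2016, 1, 20), "Coffee")
cafe = Entry(95.0, datetime.date(2016, 1, 21), "coffee shop")
print(entry_distance(coffee, coffee, value_only))  # 0.0
print(entry_distance(coffee, cafe, value_only))    # 5.0
```

Keeping the components as separate functions makes it easy to add, drop or reweight attributes without touching the summation.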

The components are computed as follows. The distance between the values is the absolute value of their arithmetic difference.

The distance between the dates is the difference in days.
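The value and date components could be sketched as below. Taking the absolute number of days is an assumption here so that the distance does not depend on argument order (the later suggestion in this thread also uses the absolute number of days).

```python
from datetime import date


def value_distance(a: float, b: float) -> float:
    # Absolute value of the arithmetic difference between the two amounts.
    return abs(a - b)


def date_distance(a: date, b: date) -> int:
    # Subtracting two dates gives a timedelta; .days is the whole-day difference.
    # abs() (an assumption) makes the distance independent of argument order.
    return abs((a - b).days)


print(value_distance(120.5, 100.0))                         # 20.5
print(date_distance(date(2016, 1, 20), date(2016, 1, 24)))  # 4
```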

The distance between the descriptions is a bit more complex. We compare them case-insensitively: if one is a substring of the other, the distance is 0.
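A sketch of the description component. This paragraph only defines the substring case, so the distance for non-matching descriptions is a hypothetical `mismatch` parameter supplied by the caller.

```python
def description_distance(a: str, b: str, mismatch: float) -> float:
    # Normalise case; str.casefold() is a more aggressive lower() for Unicode text.
    a_norm, b_norm = a.casefold(), b.casefold()
    # If one description is contained in the other, they match perfectly.
    if a_norm in b_norm or b_norm in a_norm:
        return 0.0
    # Not specified at this point in the issue: `mismatch` is a hypothetical
    # caller-supplied penalty for descriptions that do not contain each other.
    return mismatch


print(description_distance("COFFEE", "Coffee shop", mismatch=5))  # 0.0
print(description_distance("Rent", "Coffee shop", mismatch=5))    # 5
```

Note that the empty string is a substring of every string, so an empty description matches anything under this rule; it may be worth treating that case separately.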

It may be worth weighting the different components.
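Weighting could be a weighted sum in place of the plain sum, sketched below. The weights are hypothetical tuning parameters that the issue leaves open; with every weight set to 1.0 this reduces to the unweighted score.

```python
from typing import Sequence


def weighted_distance(component_distances: Sequence[float], weights: Sequence[float]) -> float:
    # One weight per component; the weight values are assumptions, not given in the issue.
    if len(component_distances) != len(weights):
        raise ValueError("expected exactly one weight per component")
    return sum(w * d for w, d in zip(weights, component_distances))


# E.g. a value distance of 2.0 at full weight plus a date distance of 3 days at half weight.
print(weighted_distance([2.0, 3.0], [1.0, 0.5]))  # 3.5
```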

The next step is to match the entries of one list (the left-hand side) against the entries of another list (the right-hand side). The result is a matching matrix whose rows are the entries of the left-hand-side list and whose columns are the entries of the right-hand-side list. The best matchings can then be found at the row and column minimums.
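One possible reading of "the minimums for rows and columns" is to keep a pair only when its cell is the smallest in both its row and its column. The sketch below builds the matrix and applies that rule; the function names are hypothetical and ties are broken by taking the first index.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def distance_matrix(left: Sequence[L], right: Sequence[R],
                    distance: Callable[[L, R], float]) -> List[List[float]]:
    # Row i holds the distances from left[i] to every entry of the right-hand list.
    return [[distance(x, y) for y in right] for x in left]


def mutual_minimums(matrix: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (row, column) pairs whose cell is the minimum of both its row and its column."""
    pairs = []
    for i, row in enumerate(matrix):
        if not row:
            continue
        # Column with the smallest distance in this row (first one on ties).
        j = min(range(len(row)), key=row.__getitem__)
        # Row with the smallest distance in that column (first one on ties).
        best_i = min(range(len(matrix)), key=lambda k: matrix[k][j])
        if best_i == i:
            pairs.append((i, j))
    return pairs


left = [10.0, 20.0, 12.0]
right = [11.0, 19.0]
m = distance_matrix(left, right, lambda a, b: abs(a - b))
print(mutual_minimums(m))  # [(0, 0), (1, 1)]
```

Here 12.0 is as close to 11.0 as 10.0 is, but loses the tie and stays unmatched. If a globally optimal one-to-one assignment is wanted instead, the Hungarian algorithm is the usual alternative.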

Suggestion for defining the matching score:

Order: description, date.
The distance between descriptions is calculated as:
- 0 if one string is a substring of the other (case-insensitive),
- 5 (a chosen threshold) if neither is a substring of the other (case-insensitive).
The distance between dates is the absolute number of days between them.
The date and description distances are added together to give the final distance between two entries.
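The suggested score could be sketched as follows. The function name `suggested_distance` and the use of `casefold()` for case-insensitivity are illustrative choices; the threshold of 5 comes from the suggestion.

```python
from datetime import date

DESCRIPTION_MISMATCH = 5  # the threshold chosen in the suggestion


def suggested_distance(desc_a: str, date_a: date, desc_b: str, date_b: date) -> int:
    a, b = desc_a.casefold(), desc_b.casefold()
    # 0 if one description is a substring of the other (case-insensitive), else 5.
    description = 0 if (a in b or b in a) else DESCRIPTION_MISMATCH
    # Absolute number of days between the two dates.
    days = abs((date_a - date_b).days)
    return description + days


print(suggested_distance("Coffee", date(2016, 1, 20), "coffee shop", date(2016, 1, 22)))  # 2
print(suggested_distance("Rent", date(2016, 1, 1), "Coffee", date(2016, 1, 1)))          # 5
```

Because the mismatch penalty equals five days, a pair with different descriptions on the same day scores the same as a pair with matching descriptions five days apart; the threshold controls that trade-off.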

Depending on how the matching algorithm is implemented, it might not be commutative, meaning that matching file A against file B may not give the same result as matching file B against file A. For a non-commutative implementation, we suggest matching the budget file (which is more likely to be incomplete) against the bank file.
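To illustrate how direction can matter, here is a hypothetical greedy one-to-one matcher (an assumption for illustration, not the algorithm described above): each left entry, in order, takes its closest right entry that is still free. Running it in both directions on the same numbers yields different pairings.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_match(left: Sequence[float], right: Sequence[float],
                 distance: Callable[[float, float], float]) -> List[Tuple[int, int]]:
    """Match each left entry, in order, to its closest right entry not yet taken."""
    taken = set()
    pairs = []
    for i, item in enumerate(left):
        free = [j for j in range(len(right)) if j not in taken]
        if not free:
            break  # more left entries than right entries; the rest stay unmatched
        j = min(free, key=lambda k: distance(item, right[k]))
        taken.add(j)
        pairs.append((i, j))
    return pairs


def diff(a: float, b: float) -> float:
    return abs(a - b)


a = [10, 14]
b = [13, 20]
print(greedy_match(a, b, diff))  # [(0, 0), (1, 1)]: A0-B0, A1-B1
print(greedy_match(b, a, diff))  # [(0, 1), (1, 0)]: B0-A1, B1-A0
```

With such a matcher, iterating over the budget entries means every budget entry gets a chance at a bank entry, while surplus bank entries simply remain unmatched.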

dbosk added a commit that referenced this issue Jan 26, 2016

Adds Mitra's outline of the algorithm (from #6)

5fe41c5
