Our goal is to devise a probabilistic (fuzzy) data-matching algorithm for use in a client data platform, which acts as a central hub for data from multiple sources. The algorithm compares field values between a master record and a record to be matched, and returns to the master record a confidence score for each potential match.
This program is not suitable for general-purpose data matching; it was designed for a use case specific to our client. However, the underlying matching methods (the similarity calculation and the indexing method) are useful for general purposes.
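To make the idea of field-by-field comparison concrete, here is a minimal sketch of how a confidence score between a master record and a record to be matched could be computed. It is not the code shipped in the JAR: the class name `MatchSketch`, the method names, the field names, and the weights are all hypothetical, and normalized Levenshtein distance is only one possible similarity calculation, chosen here for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; none of these names come from DataMatching_v1.jar.
public class MatchSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 means identical after trimming and lower-casing.
    // Assumption: a null value is treated as an empty string.
    public static double similarity(String a, String b) {
        String x = a == null ? "" : a.trim().toLowerCase(Locale.ROOT);
        String y = b == null ? "" : b.trim().toLowerCase(Locale.ROOT);
        int maxLen = Math.max(x.length(), y.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(x, y) / maxLen;
    }

    // Weighted average of per-field similarities over the fields listed in weights.
    public static double confidence(Map<String, String> master,
                                    Map<String, String> candidate,
                                    Map<String, Double> weights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            double w = e.getValue();
            weightedSum += w * similarity(master.get(e.getKey()), candidate.get(e.getKey()));
            totalWeight += w;
        }
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical fields and weights, chosen only for the example.
        Map<String, String> master = Map.of("name", "Jonathan Smith", "city", "Toronto");
        Map<String, String> candidate = Map.of("name", "Jonathon Smith", "city", "toronto");
        Map<String, Double> weights = Map.of("name", 0.7, "city", 0.3);
        System.out.println("confidence = " + confidence(master, candidate, weights));
    }
}
```

A weighted average keeps the score in the range 0 to 1 and lets more discriminating fields count for more. In practice the weights and the acceptance threshold would need to be tuned against the client's data.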
Download the JAR file above (DataMatching_v1.jar) and add it as a dependency in your project. Then simply call the classes and methods as needed.