SQL Server wrapper for the SimMetrics string matching algorithms

This project is a C# wrapper and set of SQL Server installation scripts to make the SimMetrics string matching algorithms available in SQL Server.

  • SimMetrics was originally released at SourceForge. This project uses version 1.5 of that library. Subsequent versions were migrated to Java.
  • The C# wrapper was inspired by a blog post by Anastasios Yalanopoulos.

Descriptions of the supported fuzzy string matching functions are provided on the wiki home page.

Motivation

This project was motivated by the frequent need for fuzzy matching (approximate string matching) algorithms in data analytics and data science work. These algorithms are missing from SQL Server. Many projects do not have the time, licensing, or budget to install additional SQL Server packages such as SSIS. Furthermore, it is best to do as much data science work as possible through program code rather than through manual graphical wizards, as outlined in the Guerrilla Analytics Principles. You can read more about Guerrilla Analytics in the book.

Dependencies

The project has minimal dependencies.

Installation, Configuration, Examples and how to contribute

Installation and configuration are controlled by an Apache Ant build file. Configure your database settings and you should be good to go.

Please see the GitHub wiki page for details.

Simple Code Example

The functions are installed under a schema named after the Similarity library version, e.g. Similarity_<Major version>_<minor version>_<patch version>.
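If you are unsure which version is installed, the schema and its functions can be looked up in the system catalog. The query below is a minimal sketch: it assumes the schema name starts with `Similarity_` as described above, and that the functions are registered as CLR scalar functions (type `FS`), which is an assumption based on the C# wrapper rather than something this README states. Whether the `LIKE` match is case-sensitive depends on your database collation.

```sql
-- List scalar functions in any schema whose name starts with "Similarity_".
-- [_] escapes the underscore, which is otherwise a single-character wildcard.
-- Assumption: the functions are CLR scalar functions (FS); plain T-SQL
-- scalar functions (FN) are included in case the install differs.
SELECT
    s.name AS SchemaName,
    o.name AS FunctionName,
    o.type_desc AS FunctionType
FROM sys.objects AS o
INNER JOIN sys.schemas AS s
    ON s.schema_id = o.schema_id
WHERE s.name LIKE 'Similarity[_]%'
  AND o.type IN ('FS', 'FN')
ORDER BY s.name, o.name;
```

The `SchemaName` value returned here is the prefix to use when calling a function.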

To use these functions in SQL code, call the function by its schema-qualified name. For example:

    SELECT SIMILARITY_1_1_0.Levenshtein('THE QUICK BROWN FOX', 'THE QUICK FOX')
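In practice the functions are usually applied to a column rather than to two literals. The following is a rough sketch of that pattern. The table variable `@Customers`, its column, and the sample names are hypothetical, and the schema name `SIMILARITY_1_1_0` is taken from the example above and may differ in your installation. It also assumes that a higher score means a closer match; check the wiki for the exact return range of each function.

```sql
-- Hypothetical sample data: names to compare against a search term.
DECLARE @Customers TABLE (CustomerName VARCHAR(100));

INSERT INTO @Customers (CustomerName)
VALUES ('Jon Smith'), ('John Smith'), ('Joan Smythe'), ('Mary Jones');

-- Score every name against the search term and list the best matches first.
-- Assumption: the library is installed under the SIMILARITY_1_1_0 schema
-- and Levenshtein returns a similarity score where higher means closer.
SELECT
    c.CustomerName,
    SIMILARITY_1_1_0.Levenshtein(c.CustomerName, 'John Smith') AS Score
FROM @Customers AS c
ORDER BY Score DESC;
```

Scoring in a `SELECT` like this keeps the matching logic in code, in line with the motivation described earlier, and makes it easy to add a threshold in a `WHERE` clause once you know which score range suits your data.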

For more detailed examples, please see the Quick Guide on the wiki.

License

The project as a whole is released under the GPLv3.

  • The SimMetrics library was released under GPLv2 and can be downloaded from here.
  • This project was inspired by a blog post by Anastasios Yalanopoulos at http://anastasiosyal.com/. Please see that author's licence terms in the associated code files.