Linkage is a library for record linkage between one or two database tables.
Linkage uses Sequel to talk to databases, so any database that Sequel can talk to, Linkage can talk to. You just give Linkage the Sequel-style URI and the database table name:
ds = Linkage::Dataset.new('mysql://example.com/database_name', 'table_name')
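To make the shape of such a URI concrete, here is a small sketch that uses only Ruby's standard `uri` library (not Sequel or Linkage) to pull apart the example above. The helper name `describe_database_uri` is hypothetical and exists only for illustration; Sequel does its own URI handling internally.

```ruby
require 'uri'

# Hypothetical helper: split a Sequel-style URI into its visible parts.
# This only illustrates the URI layout; it is not how Sequel parses it.
def describe_database_uri(uri_string)
  uri = URI.parse(uri_string)
  {
    adapter:  uri.scheme,                 # e.g. "mysql" or "postgres"
    host:     uri.host,                   # database server
    database: uri.path.sub(%r{\A/}, '')   # path without the leading slash
  }
end

p describe_database_uri('mysql://example.com/database_name')
```

The scheme names the database adapter, the host names the server, and the path names the database; the table name is passed to Linkage separately.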
To describe a linkage, you use the Dataset#link_with method:
parents = Linkage::Dataset.new('postgres://example.com/foo', 'parents')
children = Linkage::Dataset.new('mysql://some-other-host.net/bar', 'children')
config = parents.link_with(children) do
  lhs[:first_name].must == rhs[:parent_first_name]
  lhs[:last_name].must == rhs[:parent_last_name]
  lhs[:last_name].must_not == "Smith"    # exclude parents with the last
                                         # name "Smith"
  save_results_in('sqlite://results.db') # see below
end
Note that the datasets don't have to be in the same database, or even on the same machine.
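To show what the three expectations above ask for, the following is a plain-Ruby sketch that applies the same conditions to in-memory arrays. It is not how Linkage works internally (Linkage operates on database tables through Sequel); the helper `matching_pairs` and all sample rows are made up for illustration.

```ruby
# Plain-Ruby illustration of the criteria above. All sample rows are invented.
PARENTS = [
  { id: 1, first_name: 'Ann',  last_name: 'Lee' },
  { id: 2, first_name: 'Bob',  last_name: 'Smith' },
  { id: 3, first_name: 'Cara', last_name: 'Diaz' }
].freeze

CHILDREN = [
  { id: 10, parent_first_name: 'Ann', parent_last_name: 'Lee' },
  { id: 11, parent_first_name: 'Bob', parent_last_name: 'Smith' },
  { id: 12, parent_first_name: 'Ann', parent_last_name: 'Lee' }
].freeze

# Hypothetical helper: returns [parent_id, child_id] pairs that satisfy
# all three expectations (two "must" rules and one "must_not" rule).
def matching_pairs(parents, children)
  pairs = parents.product(children).select do |lhs, rhs|
    lhs[:first_name] == rhs[:parent_first_name] &&
      lhs[:last_name] == rhs[:parent_last_name] &&
      lhs[:last_name] != 'Smith'
  end
  pairs.map { |lhs, rhs| [lhs[:id], rhs[:id]] }
end

p matching_pairs(PARENTS, CHILDREN)
```

In this sample, Bob Smith matches child 11 on both name fields but is dropped by the "Smith" exclusion, so only parent 1 is paired with children 10 and 12.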
To run a linkage, use a Runner with the resulting configuration from link_with:
runner = Linkage::SingleThreadedRunner.new(config)
runner.execute
The runner saves results in a database that you specify in the configuration (via the save_results_in method). It stores its results in two database tables. The groups table contains all of the unique combinations of values in your datasets, and a second table maps records to groups.
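As a rough mental model of that two-table layout, here is a plain-Ruby sketch that builds a list of groups (one per unique combination of values) and a list that maps each record to its group. The helper `build_result_tables`, the field names (`id`, `values`, `record_id`, `group_id`), and the sample data are assumptions for illustration and are not taken from Linkage's actual schema.

```ruby
# Conceptual sketch only: field names here are assumed, not Linkage's schema.
def build_result_tables(records, keys)
  groups = []        # one row per unique combination of values
  memberships = []   # maps each record to its group
  group_ids = {}     # value combination => group id

  records.each do |record|
    values = record.values_at(*keys)
    # Reuse the group id for a known combination, otherwise create a new group.
    group_id = group_ids[values] ||= begin
      groups << { id: groups.size + 1, values: values }
      groups.last[:id]
    end
    memberships << { record_id: record[:id], group_id: group_id }
  end

  [groups, memberships]
end

sample = [
  { id: 1, first_name: 'Ann', last_name: 'Lee' },
  { id: 2, first_name: 'Bob', last_name: 'Ray' },
  { id: 3, first_name: 'Ann', last_name: 'Lee' }
]
groups, memberships = build_result_tables(sample, %i[first_name last_name])
p groups
p memberships
```

Records 1 and 3 share the same name combination, so they point at the same group, while record 2 gets a group of its own.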
You can also link a dataset to itself:
births = Linkage::Dataset.new('postgres://example.com/hospital_data', 'births')
config = births.link_with(births) do
  lhs[:mother_first_name].must == rhs[:mother_first_name]
  lhs[:mother_last_name].must == rhs[:mother_last_name]
end
runner = Linkage::SingleThreadedRunner.new(config, 'sqlite://results.db')
runner.execute
The above example would find birth records that have mothers with the same name.
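For intuition, linking a table to itself amounts to finding values that occur in more than one row. The sketch below shows that idea in plain Ruby on invented data; `records_sharing_mother_name` is a hypothetical helper, not part of Linkage.

```ruby
# Hypothetical helper: lists of record ids whose mother's first and last
# name appear on more than one birth record.
def records_sharing_mother_name(births)
  by_name = births.group_by { |b| [b[:mother_first_name], b[:mother_last_name]] }
  shared = by_name.values.select { |group| group.size > 1 }
  shared.map { |group| group.map { |b| b[:id] } }
end

births = [
  { id: 1, mother_first_name: 'Dana', mother_last_name: 'Kim' },
  { id: 2, mother_first_name: 'Erin', mother_last_name: 'Fox' },
  { id: 3, mother_first_name: 'Dana', mother_last_name: 'Kim' }
]
p records_sharing_mother_name(births)
```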
Contributing to linkage
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
- Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
- Fork the project
- Start a feature/bugfix branch
- Commit and push until you are happy with your contribution
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
Copyright (c) 2011 Vanderbilt University. See LICENSE.txt for further details.