A record linkage library for Ruby

linkage

Linkage is a library for record linkage between one or two database tables.

Usage

Linkage uses Sequel to talk to databases, so any database that Sequel can talk to, Linkage can talk to. You just give Linkage the Sequel-style URI and the database table name:

```ruby
ds = Linkage::Dataset.new('mysql://example.com/database_name', 'table_name')
```
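A Sequel-style URI packs the adapter, host, and database name into one string. As a rough illustration, using Ruby's standard `uri` library rather than Linkage or Sequel, the URI above breaks down as follows. The helper `describe_connection` is hypothetical; Sequel does its own parsing when it connects.

```ruby
require 'uri'

# Hypothetical helper: split a Sequel-style connection URI into its parts.
# Illustration only; Sequel parses the URI itself when connecting.
def describe_connection(uri_string)
  uri = URI.parse(uri_string)
  {
    adapter:  uri.scheme,                 # e.g. "mysql", "postgres", "sqlite"
    host:     uri.host,                   # database server
    database: uri.path.sub(%r{\A/}, '')   # path without the leading slash
  }
end

p describe_connection('mysql://example.com/database_name')
```

For the example URI this yields the adapter `"mysql"`, the host `"example.com"`, and the database `"database_name"`.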

To describe a linkage, you use the Dataset#link_with method.

```ruby
parents = Linkage::Dataset.new('postgres://example.com/foo', 'parents')
children = Linkage::Dataset.new('mysql://some-other-host.net/bar', 'children')
config = parents.link_with(children) do
  lhs[:first_name].must == rhs[:parent_first_name]
  lhs[:last_name].must == rhs[:parent_last_name]
  lhs[:last_name].must_not == "Smith"  # exclude parents with the last name "Smith"

  save_results_in('sqlite://results.db') # see below
end
```

Note that the datasets don't have to be in the same database, or even on the same machine.
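Conceptually, each `must ==` line is an equality test between a left-hand record and a right-hand record, and `must_not ==` excludes left-hand records with the given value. The sketch below spells that rule out in plain Ruby over in-memory arrays of hashes. It is an illustration under that reading of the DSL, not how Linkage evaluates a linkage (Linkage works against the databases through Sequel); `link_pairs` and the sample records are hypothetical.

```ruby
# Hypothetical nested-loop illustration of the conditions in the link_with block.
# Records are plain hashes whose keys mirror the example's column names.
def link_pairs(parents, children)
  pairs = []
  parents.each do |lhs|
    next if lhs[:last_name] == 'Smith'                        # must_not == "Smith"
    children.each do |rhs|
      next unless lhs[:first_name] == rhs[:parent_first_name] # must ==
      next unless lhs[:last_name] == rhs[:parent_last_name]   # must ==
      pairs << [lhs, rhs]
    end
  end
  pairs
end

parents = [
  { id: 1, first_name: 'Ann', last_name: 'Jones' },
  { id: 2, first_name: 'Bob', last_name: 'Smith' }
]
children = [
  { id: 10, parent_first_name: 'Ann', parent_last_name: 'Jones' },
  { id: 11, parent_first_name: 'Bob', parent_last_name: 'Smith' }
]

link_pairs(parents, children).each do |lhs, rhs|
  puts "parent #{lhs[:id]} -> child #{rhs[:id]}"
end
```

Only parent 1 and child 10 are linked: the Smiths match on both names but are excluded by the `must_not` condition. A nested loop compares every pair, so it only suits small in-memory samples.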

To run a linkage, pass the configuration returned by Dataset#link_with to a runner:

```ruby
runner = Linkage::SingleThreadedRunner.new(config)
runner.execute
```

The runner saves its results in the database you specify in the configuration with the save_results_in method. The results go into two tables: groups and groups_records. The groups table contains every unique combination of values in your datasets, and groups_records maps each record to its group.
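One way to picture the two result tables: bucket records by the combination of values being matched, give each distinct combination a group id (a row in groups), and note which records landed in which group (rows in groups_records). The plain-Ruby sketch below does this with `Enumerable#group_by`. The helper `build_groups` and the column names `id`, `group_id`, and `record_id` are hypothetical and need not match Linkage's actual schema.

```ruby
# Build in-memory analogues of the groups and groups_records tables.
# Hypothetical layout: Linkage's real column names may differ.
def build_groups(records, key_fields)
  # One bucket per distinct combination of the key field values.
  buckets = records.group_by { |r| r.values_at(*key_fields) }

  groups = []
  groups_records = []
  buckets.each_with_index do |(values, members), index|
    group_id = index + 1
    groups << { id: group_id }.merge(key_fields.zip(values).to_h)
    members.each { |r| groups_records << { group_id: group_id, record_id: r[:id] } }
  end
  [groups, groups_records]
end

people = [
  { id: 1, first_name: 'Mary', last_name: 'Lee' },
  { id: 2, first_name: 'Mary', last_name: 'Lee' },
  { id: 3, first_name: 'Jane', last_name: 'Doe' }
]

groups, groups_records = build_groups(people, [:first_name, :last_name])
groups.each { |g| p g }
groups_records.each { |gr| p gr }
```

Records 1 and 2 share group 1 and record 3 sits alone in group 2; because group ids come from a hash, each combination appears only once in groups.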

You can also link a dataset to itself:

```ruby
births = Linkage::Dataset.new('postgres://example.com/hospital_data', 'births')
config = births.link_with(births) do
  lhs[:mother_first_name].must == rhs[:mother_first_name]
  lhs[:mother_last_name].must == rhs[:mother_last_name]

  save_results_in('sqlite://results.db')
end
runner = Linkage::SingleThreadedRunner.new(config)
runner.execute
```

This example finds birth records whose mothers have the same first and last name.
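When a dataset is linked to itself, every record is also a candidate match for itself, and each matching pair could show up twice, as (A, B) and (B, A). The plain-Ruby sketch below, over hypothetical in-memory records, uses `Array#combination(2)` so that each unordered pair of distinct records is compared exactly once. It illustrates the idea only; how Linkage handles this internally is not described here, and `same_mother_name_pairs` is a hypothetical helper.

```ruby
# Find pairs of distinct birth records whose mothers share a first and last name.
# combination(2) yields each unordered pair once and never pairs a record with itself.
def same_mother_name_pairs(births)
  births.combination(2).select do |a, b|
    a[:mother_first_name] == b[:mother_first_name] &&
      a[:mother_last_name] == b[:mother_last_name]
  end
end

births = [
  { id: 1, mother_first_name: 'Mary', mother_last_name: 'Lee' },
  { id: 2, mother_first_name: 'Jane', mother_last_name: 'Doe' },
  { id: 3, mother_first_name: 'Mary', mother_last_name: 'Lee' }
]

same_mother_name_pairs(births).each do |a, b|
  puts "births #{a[:id]} and #{b[:id]} share a mother's name"
end
```

Here only records 1 and 3 are reported, once, as a single pair.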

Contributing to linkage

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
  • Check out the issue tracker to make sure no one has already requested it and/or contributed it
  • Fork the project
  • Start a feature/bugfix branch
  • Commit and push until you are happy with your contribution
  • Make sure to add tests for it. This is important so that I don't unintentionally break it in a future version.

Copyright

Copyright (c) 2011 Vanderbilt University. See LICENSE.txt for further details.
