Skip to content

Simple triples storage for projects too small to install an RDF database.

License

Notifications You must be signed in to change notification settings

davidrichards/slender_t

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

slender_t

SlenderT is a triples store It is simple for the simple jobs. I decided to replace a recommendation engine with a set of triples and some simple query tools, and this is the result. I didn’t want to break out the bigger tools for something that will only have hundreds of triples.

The ideas started with of “Programming the Semantic Web” by Toby Segaran et al. Toby’s code was in Python, and of course this is a Ruby version. I’ve used some Ruby idioms and created an IRB application (slender_t).

Usage

From the command line:

tmp : slender_t
Loading SlenderT version: 0.1.1
>> st = SlenderT.new
# => SlenderT: []
>> st.add :david, :writes, :code
# => [:david, :writes, :code]
>> st.add :david, :name, "David Richards"
# => [:david, :name, "David Richards"]
>> st.add :code, :name, 'Software'
# => [:code, :name, "Software"]
>> st.find nil, :name
# => [[:david, :name, "David Richards"], [:code, :name, "Software"]]

I’ve tried this with some larger data sets. The business data set that the O’Reilly provides can load about 36,500 triples in 22.3 seconds. I’d say that somewhere around that size of a data set is about where this library begins to need something a little more robust, such as Sesame or Redland. It’s up to you, of course. I haven’t made much effort at optimization, with the exception of the add_to_index method, which was exponential before, and is nearly linear now.

I’ve implemented the query concept from the book. The syntax is a little strange, but it works fine. Here is an example:

>> db.query(['?company', 'headquarters', 'New_York_New_York'], 
?> ['?company', 'industry', 'Investment Banking'],            
?> ['?contribution', 'contributor', '?company'],              
?> ['?contribution', 'recipient', 'Orrin Hatch'],             
?> ['?contribution', 'amount', '?dollars'])                   
=> [{"?contribution"=>"contrib285", "?dollars"=>30700.0, "?company"=>"BSC"}]

What this means is that BSC contributed $30,700 to Senator Orrin Hatch. If I dig around a little, I can find out some more information.

>> val = db.query(['?company', 'headquarters', 'New_York_New_York'], 
?> ['?company', 'industry', 'Investment Banking'],            
?> ['?contribution', 'contributor', '?company'],              
?> ['?contribution', 'recipient', '?recipient'],             
?> ['?contribution', 'amount', '?dollars'])
>> val.size
=> 110

That tells us that we know of 110 contributions from New York investment banks.

>> db.find('BSC', 'name', nil)
=> [["BSC", "name", "Bear Stearns"]]

That tells us that BSC is Bear Stearns.

>> val = db.query(['?contribution', 'contributor', 'BSC'],              
?> ['?contribution', 'recipient', '?recipient'],                        
?> ['?contribution', 'amount', '?dollars'])                             
=> [{"?contribution"=>"contrib285", "?dollars"=>30700.0, "?recipient"=>"Orrin Hatch"}, {"?contribution"=>"contrib284",
"?dollars"=>168335.0, "?recipient"=>"Hillary Rodham Clinton"}, {"?contribution"=>"contrib287", "?dollars"=>5600.0,
"?recipient"=>"Christopher Shays"}, {"?contribution"=>"contrib288", "?dollars"=>205100.0, "?recipient"=>"Christopher Dodd"},
{"?contribution"=>"contrib290", "?dollars"=>17300.0, "?recipient"=>"Frank Lautenberg"}, {"?contribution"=>"contrib286",
"?dollars"=>5000.0, "?recipient"=>"Barney Frank"}, {"?contribution"=>"contrib289", "?dollars"=>13000.0, "?recipient"=>"Michael
Dean Crapo"}, {"?contribution"=>"contrib294", "?dollars"=>4600.0, "?recipient"=>"Pete Sessions"},
{"?contribution"=>"contrib295", "?dollars"=>5000.0, "?recipient"=>"Paul E. Kanjorski"}, {"?contribution"=>"contrib292",
"?dollars"=>6600.0, "?recipient"=>"Nita Lowey"}, {"?contribution"=>"contrib293", "?dollars"=>5000.0, "?recipient"=>"Deborah
Pryce"}, {"?contribution"=>"contrib291", "?dollars"=>102260.0, "?recipient"=>"Joe Lieberman"}]
>> val.size
=> 12

That tells us that we know about 12 contributions that Bear Stearns made, including to Hillary Rodham Clinton for $168,335 and Joe Lieberman for $102,260.

You get the picture. I think I’ll create some classes soon to play with some inference, graph merging, and query patterns.

You should also know that you can load and save your triplet stores:

# st = SlenderT.load(some_csv_content_filename_or_url)
# st.save(some_filename)

Also, I was not trying to optimize anything. There is a really slow operation in the add method that I should probably fix, at least. This isn’t meant to be a fast data store, just a simple one. You may want to look at Sesame or Redland if you want to use a data set over 10,000 or 20,000 records. Like I said, I needed 200 or 300 triplets for a mini-recommendation engine, and I just couldn’t justify a full-on RDF system for that.

I’m also very interested in thinking about queries on graphs. I have several other projects that I think could benefit from some thoughts on a simple query syntax. I’m thinking of marginal, fathom, and overalls. These are all belief maintenance systems of some sort (joint distribution tables, Bayesian network, and nonmonotonic reasoning respectively) that got tabled until I thought through a better way to interface them.

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but

    bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

Copyright © 2009 David Richards. See LICENSE for details.

About

Simple triples storage for projects too small to install an RDF database.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages