Introducing the repository
davidrichards committed Apr 10, 2012
1 parent f72805a commit df64393
Showing 9 changed files with 259 additions and 51 deletions.
33 changes: 8 additions & 25 deletions README.html
@@ -4,7 +4,7 @@ <h1 id="gearbox">Gearbox</h1>

<p>To get to wherever I am now, I&rsquo;ve been playing with semantic models for a while. I started by demonstrating working code from various corners of my imagination. After a while, I thought I had enough to create a useful gem. In the past, I&rsquo;ve worked with <a href="https://github.com/bhuga/spira">Spira</a> and examples given by <a href="http://greggkellogg.net/">Gregg Kellogg</a>. These have been very useful in forming ideas.</p>

<p>As I worked on this code, I decided to go to <a href="https://github.com/bhuga/spira">Spira</a> to see how similar the two gems are. I was surprised how similar they actually are. The similarities are:</p>
<p>As I worked on this code, I decided to go to <a href="https://github.com/bhuga/spira">Spira</a> to see how similar the two gems are. I&rsquo;ve read Ben&rsquo;s code a dozen times before, but this time I was surprised how similar they actually are. The similarities are:</p>

<ul>
<li>both are ORMs for semantic data, written in Ruby</li>
@@ -14,7 +14,7 @@ <h1 id="gearbox">Gearbox</h1>
<li>value types</li>
</ul>

<p>However, Gearbox is quite a bit different. It came from a different place. I&rsquo;ve been trying to embrace the nature of semantic models, as opposed to relational models or other types of models. The major differences, I imagine, are:</p>
<p>However, Gearbox is quite a bit different. It came from a different place. I&rsquo;ve been trying to embrace the nature of semantic models (or find a way to express my imagination with semantic tools). The major differences, I imagine, are:</p>

<ul>
<li>support for SPARQL-based scopes and finder methods</li>
@@ -23,34 +23,17 @@ <h1 id="gearbox">Gearbox</h1>
<li>object factory from triples</li>
</ul>

<h1 id="development-workflow-for-semantic-data">Development Workflow for Semantic Data</h1>

<p>For a typical domain, I test-drive some models to define the nature of the data. These are custom-built to support the user scenarios and behavior an application is built to serve. For semantic models, I&rsquo;m going for something different. The relationships between resources are much more dynamic. It&rsquo;s tough to build an application on dynamic domain models. </p>

<p>To work with this, I&rsquo;d rather optimize for different things. Instead of trying to canonize the data, concretize the interaction. Given a head/buffer/file full of data, what&rsquo;s the easiest way to save it? Which values have to be recorded? Which values can be inferred or classified offline? </p>

<p>I think that there&rsquo;s going to be an evolution of the data graph. I&rsquo;m looking to start with the mundane, and hope to be able to create a broad view from the details collected. Possibly, I will have alternative broad views, such as a full view of the topic or a chronological account of the topic as it transpires. From here, I want to look for insightful information: inferences that show the nature of our subject. </p>

<p>That&rsquo;s the goal.</p>

<p>Practically, the work changes over time. We start with defining the attributes and associations that are needed. This is the mundane. We then start to qualify the data by writing validations. Probably, there will be overlapping models to reflect the interaction. For example, browsing email might have a cursory recording of the people involved. Looking up Twitter information might have a more-specific user model. Facebook, the same. We&rsquo;re building a summary of the users, or whatever we&rsquo;re studying. From here, we might be able to use various analytical methods to classify and enhance the models, to clarify the story we are able to gather.</p>

<p>As you can see, this is a very different process from what we might do with typical relational data. I don&rsquo;t think there is much that we do with semantic data that can&rsquo;t be done with relational data; the technologies are simply optimized for different purposes.</p>

<h1 id="practical-example">Practical Example</h1>

<p>It would be good to offer practical examples. I&rsquo;ll get to these in a bit. I don&rsquo;t need to have the documentation get ahead of the released code.</p>
<p>It would be good to offer practical examples. I&rsquo;ll get to these in a bit. They will probably start appearing around version 0.2 or 0.3, and they probably won&rsquo;t be very exciting until around version 0.4.</p>

<h1 id="todo">TODO</h1>
<h1 id="road-ahead">Road Ahead</h1>

<ul>
<li>bring in the association and combination code</li>
<li>implement mutability</li>
<li>implement persistence</li>
<li>implement queryable</li>
<li>implement named scopes</li>
<li>implement some sort of repository access</li>
<li>use Gearbox in the examples (they were written to concretize an example in my head)</li>
<li>bring in the associations and type system (related, believe it or not)</li>
<li>implement mutability (version 0.2)</li>
<li>workflows with SPIN and full-text indexing (version 0.3)</li>
<li>exploration utilities (version 0.4)</li>
</ul>

<h1 id="contributing-to-gearbox">Contributing to Gearbox</h1>
32 changes: 9 additions & 23 deletions lib/gearbox.rb
@@ -4,6 +4,7 @@
require 'linkeddata'
require 'ostruct'
require 'active_model'
require 'open-uri'

module Gearbox

@@ -107,6 +108,11 @@ def type_alias(new, original)
end
module_function :type_alias

# ==========
# = Errors =
# ==========
class NotImplemented < StandardError; end

# ========================
# = Helper Utility: path =
# ========================
@@ -122,39 +128,19 @@ def path(path)
# =======================
require path('type')
require path('types')

autoload :Attribute, path('attribute')
autoload :AttributeCollection, path('attribute_collection')
autoload :RDFCollection, path('rdf_collection')
autoload :Repository, path('repository')

autoload :AdHocProperties, path('mixins/ad_hoc_properties')
autoload :ActiveModelImplementation, path('mixins/active_model_implementation')
autoload :AttributeMethods, path('mixins/attribute_methods')
autoload :DomainQueryBuilder, path('mixins/domain_query_builder')
autoload :QueryableImplementation, path('mixins/queryable_implementation')
autoload :Resource, path('mixins/resource')
autoload :SemanticAccessors, path('mixins/semantic_accessors')
autoload :SubjectMethods, path('mixins/subject_methods')

end

module Example
# ========================
# = Helper Utility: path =
# ========================
# @private
def path(path)
File.expand_path("../examples/#{path}", __FILE__)
end
private :path
module_function :path

# ============
# = Examples =
# ============

# Will separate these after things get to a solid 0.1 state.
autoload :Audience, path('audience')
autoload :Person, path('person')
autoload :Reference, path('reference')
autoload :Theme, path('theme')

end
9 changes: 9 additions & 0 deletions lib/gearbox/mixins/domain_query_builder.rb
@@ -0,0 +1,9 @@
module Gearbox

# Understands how to construct CRUD queries from domain logic.
# I.e., the attributes, attribute options, associations and context taught
# in the domain logic are a necessary and complete description of the triples
# as they should be handled in the Repository.
module DomainQueryBuilder
end
end
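The module above is still an empty stub, so the following is only a minimal sketch of what "constructing queries from domain logic" could look like for the read side. It assumes a model can be described as an RDF type plus a map of attribute names to predicate IRIs; `SelectSketch` and `select_all` are hypothetical names, not part of Gearbox, and the sketch uses plain strings rather than RDF.rb objects.

```ruby
# Hypothetical sketch, not Gearbox code: build a SPARQL SELECT for all
# instances of a model from its declared attributes.
module SelectSketch
  FOAF = "http://xmlns.com/foaf/0.1/"

  # rdf_type:   IRI string of the class the model represents (assumption)
  # attributes: hash of attribute name => predicate IRI string (assumption)
  def self.select_all(rdf_type, attributes)
    variables = attributes.keys.map { |name| "?#{name}" }
    patterns  = attributes.map do |name, predicate|
      "  ?subject <#{predicate}> ?#{name} ."
    end
    [
      "SELECT ?subject #{variables.join(' ')}",
      "WHERE {",
      "  ?subject a <#{rdf_type}> .",
      *patterns,
      "}"
    ].join("\n")
  end
end

query = SelectSketch.select_all(
  SelectSketch::FOAF + "Person",
  { name: SelectSketch::FOAF + "name" }
)
puts query
```

Each declared attribute becomes one triple pattern and one projected variable, which is the sense in which the attribute definitions could serve as a complete description of the triples to fetch.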
3 changes: 2 additions & 1 deletion lib/gearbox/mixins/resource.rb
@@ -18,7 +18,8 @@ def self.included(base)
base.send :include, SubjectMethods
base.send :include, SemanticAccessors
base.send :include, QueryableImplementation
base.send :include, RDF::Mutable
base.send :include, DomainQueryBuilder

end

def inspect
85 changes: 85 additions & 0 deletions lib/gearbox/repository.rb
@@ -0,0 +1,85 @@
module Gearbox
class Repository < ::SPARQL::Client::Repository

MAX_TRIES = 10

attr_reader :data_uri, :update_uri, :status_uri, :size_uri

def initialize(endpoint="http://localhost:8000", options = {})
super
assert_uris!
end

def each(&block)
raise NotImplemented, "each is not yet implemented in Gearbox::Repository"
end

# def insert_statement
# raise NotImplemented, "insert_statement is not yet implemented in Gearbox::Adapter"
# end

# def delete_statement
# raise NotImplemented, "delete_statement is not yet implemented in Gearbox::Adapter"
# end

# def load(filename, options = {})
# return super(filename, options) if /^https?:\/\//.match(filename)
#
# uri = nil
#
# if options[:context]
# uri = @dataURI + options[:context]
# else
# uri = @dataURI + 'file://' + File.expand_path(filename)
# end
#
# uri = URI.parse(uri)
# content = open(filename).read
# begin
# req = Net::HTTP::Put.new(uri.path)
# Net::HTTP.start(uri.host, uri.port) do |http|
# http.request(req, content)
# end
# rescue Errno::ECONNREFUSED, Errno::ECONNRESET, TimeoutError
# retry
# end
# end

# alias_method :load!, :load

attr_writer :load_handler
def load_handler
@load_handler ||= lambda do |*args|
filename = args.shift
options = args.shift
options ||= {}
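target = options[:context] ? File.join(data_uri, options[:context]) : File.join(data_uri, "file://#{File.expand_path(filename)}")
# Parse into a URI object: the Net::HTTP calls below need uri.path, uri.host and uri.port.
uri = URI.parse(target)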
content = open(filename).read
begin
request = Net::HTTP::Put.new(uri.path)
Net::HTTP.start(uri.host, uri.port) do |http|
http.request(request, content)
end
rescue Errno::ECONNRESET, Errno::ECONNREFUSED, Timeout::Error
retries ||= 0
retries += 1
retries <= MAX_TRIES ? retry : raise
end
end
end

def load(*args)
load_handler.call(*args)
end
alias :load! :load

private
def assert_uris!
uri = self.client.url.to_s
@data_uri = File.join(uri, 'data', '/')
@update_uri = File.join(uri, 'update', '/')
@status_uri = File.join(uri, 'status', '/')
@size_uri = File.join(uri, 'size', '/')
end
end
end
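The bounded retry inside `load_handler` can be looked at in isolation. The following is a minimal sketch of the same idea, assuming only the Ruby standard library; `with_retries` is a hypothetical helper, not something Gearbox defines. Note one difference: it counts total attempts, whereas the handler above counts retries after the first failure.

```ruby
# Hypothetical helper, not Gearbox code: run a block, retrying on the
# listed error classes until max_tries attempts have been made.
def with_retries(max_tries, *error_classes)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *error_classes
    # Re-run the begin block while attempts remain; otherwise re-raise
    # the error that was just rescued.
    retry if attempts < max_tries
    raise
  end
end

calls = 0
result = with_retries(3, Errno::ECONNREFUSED, Errno::ECONNRESET) do |attempt|
  calls += 1
  # Simulate an endpoint that refuses the first two connections.
  raise Errno::ECONNREFUSED if attempt < 3
  :stored
end
```

Keeping the counter outside the `begin` block is what lets it survive each `retry`. The sketch has no delay between attempts; a real endpoint would probably want some backoff.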
79 changes: 79 additions & 0 deletions spec/gearbox/mixins/domain_query_builder_spec.rb
@@ -0,0 +1,79 @@
require_relative '../../spec_helper'

include Gearbox

describe DomainQueryBuilder do

before do
@class = Class.new do
include DomainQueryBuilder
end

@omni_model = Class.new do
include Resource

attribute :name, :predicate => RDF::FOAF.name

end
end

# let(:base_uri) { "http://example.com" }
let(:omni) { @omni_model.new }
subject { @class.new }

it "just works" do
omni
end

end

=begin
This is the good stuff. At this point, I start reading and writing data from the repository.
I have been relying on 4Store locally to offer an endpoint. The concept is that this is standard-
enough to be a good pattern for other SPARQL-endpoint-aware repositories. This is a departure from
a lot of the RDF.rb libraries, where they are storing triples in various data stores (mongo, postgres,
sqlite, redis, cassandra, couchdb, and probably others) and using query patterns to extract these
triples. Here, we are allowing the SPARQL endpoints to understand their own architecture and deliver
our data as we need it. This is partly due to having a near-standardized SPARQL Update 1.1 which defines
how the basic CRUD can be accomplished in these endpoints. That, and I need the SPIN to maintain the graph,
rather than just assume the model got it right the first time. That, and SPARQL is more expressive than the
query patterns as currently constituted in the RDF.rb world. That, and I want a knowledge base that is
implementation agnostic once the data is stored in it. I should be able to write an application in any
language I'd like and only have to rely on SPARQL to get the data in and out of the repository.
OK, that needed to be said. There are quite a few big-picture assumptions that I've been gleaning from
the source code.
Meanwhile, we have a task before us today: start getting data in and out of a repository. Tasks:
* Read the query pattern codes, to make sure I'm not full of shit
* Figure out how to mock this environment, I don't want tests to rely on a repository running.
* Figure out the raw SPARQL for a single-model query
* SPARQL for all instances of a model
* SPARQL for a filtered list of instances
* " + ordered
* SPARQL to insert a model
* SPARQL to insert a set of models
* SPARQL to update a model
* SPARQL to update a set of models
* SPARQL to delete a model
* SPARQL to delete a set of models
At this point, it would probably be good to build a knowledge base regarding everything related to
database design, transactions, principles of data integrity, and similar concepts. There will be more
features once I understand some of these design principles better. Plus, building knowledge bases will
tease out some of these same concerns empirically.
RDF::Query
==========
* limited to select, ask, construct and describe
* I didn't see support for union, and there may be other gaps
* used to create SPARQL with a syntax like query.select.where(:variable => value).order([:v1, :v2])
* would have to be extended to be complete, even for today's work
* probably an unnecessary nicety in the short-run, maybe generally
=end
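For the write-side items in the task list above (insert, update, delete), the raw SPARQL Update 1.1 strings might look roughly like the output of this sketch. It assumes a resource is a subject IRI plus a hash of predicate IRI to plain string value; `UpdateSketch` and its methods are hypothetical names, and the literal escaping only covers backslashes and double quotes.

```ruby
# Hypothetical sketch, not Gearbox code: build SPARQL Update 1.1 strings
# for a single resource.
module UpdateSketch
  # Quote a plain string literal. Escapes backslashes and double quotes
  # only; newlines, language tags and datatypes are out of scope here.
  def self.literal(value)
    '"' + value.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' } + '"'
  end

  # INSERT DATA with one triple per property.
  def self.insert_data(subject, properties)
    triples = properties.map do |predicate, value|
      "  <#{subject}> <#{predicate}> #{literal(value)} ."
    end
    "INSERT DATA {\n#{triples.join("\n")}\n}"
  end

  # Remove every triple that has the resource as its subject.
  def self.delete_resource(subject)
    "DELETE WHERE { <#{subject}> ?p ?o }"
  end

  # Naive update: delete everything, then insert the new state.
  # SPARQL Update lets one request carry several operations separated by ';'.
  def self.replace_resource(subject, properties)
    "#{delete_resource(subject)} ;\n#{insert_data(subject, properties)}"
  end
end

subject_iri = "http://example.com/people/1"
name_iri    = "http://xmlns.com/foaf/0.1/name"
puts UpdateSketch.replace_resource(subject_iri, { name_iri => "Ada" })
```

Sets of models could be handled by concatenating several such operations into one request, though whether that is the right unit of work depends on the transaction questions raised in the notes above.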
4 changes: 2 additions & 2 deletions spec/gearbox/mixins/resource_spec.rb
@@ -31,8 +31,8 @@
@class.included_modules.must_include Gearbox::ActiveModelImplementation
end

it "uses RDF::Mutable" do
@class.included_modules.must_include RDF::Mutable
it "uses DomainQueryBuilder" do
@class.included_modules.must_include Gearbox::DomainQueryBuilder
end

describe "Load order" do
57 changes: 57 additions & 0 deletions spec/gearbox/repository_spec.rb
@@ -0,0 +1,57 @@
require_relative '../spec_helper'

include Gearbox

describe Repository do

subject { Repository.new }

it "subclasses a SPARQL::Client::Repository" do
Gearbox::Repository.ancestors.must_include ::SPARQL::Client::Repository
end

it "defaults the endpoint to http://localhost:8000" do
subject.client.url.to_s.must_equal 'http://localhost:8000'
end

it "creates the data_uri as uri/data/" do
uri = subject.client.url.to_s
subject.data_uri.must_equal "#{uri}/data/"
end

it "creates the update_uri as uri/update/" do
uri = subject.client.url.to_s
subject.update_uri.must_equal "#{uri}/update/"
end

it "creates the status_uri as uri/status/" do
uri = subject.client.url.to_s
subject.status_uri.must_equal "#{uri}/status/"
end

it "creates the size_uri as uri/size/" do
uri = subject.client.url.to_s
subject.size_uri.must_equal "#{uri}/size/"
end

it "implements each" do
subject.respond_to?(:each).must_equal true
end

it "implements insert_statement" do
subject.respond_to?(:insert_statement).must_equal true
end

it "implements delete_statement" do
subject.respond_to?(:delete_statement).must_equal true
end

it "uses a load_handler to abstract the load (testing and async would need a different load)" do
subject.load_handler = lambda{:loaded}
subject.load!.must_equal :loaded
end

# load, load_data, update, update_data, delete, insert... (possibly BGP can assemble these)
# select, ask, describe, construct (possibly SPARQL client can do all of this)
# has_statement?, dump_statement, has_triple? has_quad?
end
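The `load_handler` spec above relies on the handler being swappable, which is what keeps the tests from needing a running repository. A stripped-down sketch of that pattern, assuming nothing beyond plain Ruby, is shown here; `LoaderSketch` is a hypothetical stand-in for `Gearbox::Repository` and its default handler simply raises instead of doing HTTP.

```ruby
# Hypothetical stand-in, not Gearbox code: an object whose load behavior
# is delegated to a replaceable callable.
class LoaderSketch
  attr_writer :load_handler

  # Default handler for this sketch; the real one would talk to the endpoint.
  def load_handler
    @load_handler ||= lambda do |*_args|
      raise NotImplementedError, "no default handler in this sketch"
    end
  end

  def load(*args)
    load_handler.call(*args)
  end
  alias load! load
end

# In a test, record the arguments instead of performing a request.
recorded = []
loader = LoaderSketch.new
loader.load_handler = lambda do |filename, options = {}|
  recorded << [filename, options]
  :loaded
end
outcome = loader.load!("people.nt", { context: "people" })
```

The same seam could carry an asynchronous loader later, as the spec description suggests, without changing the callers of `load`.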
8 changes: 8 additions & 0 deletions spec/gearbox_spec.rb
@@ -13,5 +13,13 @@
it "depends on ActiveModel" do
defined?(ActiveModel).must_equal('constant')
end

it "depends on open-uri" do
defined?(OpenURI).must_equal('constant')
end

it "defines NotImplemented for stubbing an interface" do
defined?(::Gearbox::NotImplemented).must_equal 'constant'
end
end
