
Initial upload.

ulbrich committed on Jul 8, 2009 (commit 81d7f89ba0a0aaf1e368a25b96ab31a90b2cbcfe, 0 parents)
Showing with 903 additions and 0 deletions.
  1. +3 −0 .gitignore
  2. +171 −0 README.rdoc
  3. +82 −0 Rakefile
  4. +17 −0 couchsphinx.gemspec
  5. +27 −0 couchsphinx.rb
  6. +216 −0 lib/indexer.rb
  7. +255 −0 lib/mixins/indexer.rb
  8. +76 −0 lib/mixins/properties.rb
  9. +56 −0 lib/multi_attribute.rb
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+doc
+pkg
+tmp
--- /dev/null
+++ b/README.rdoc
@@ -0,0 +1,171 @@
+= CouchSphinx
+
+The CouchSphinx library implements an interface between CouchDB and Sphinx
+that lets CouchRest models be indexed in Sphinx automatically. It tries to
+be as transparent as possible: Just one additional method in the CouchRest
+domain-specific language and some Sphinx configuration are needed to get
+going.
+
+== Prerequisites
+
+CouchSphinx needs the CouchRest and Riddle gems as well as running Sphinx
+and CouchDB installations.
+
+ sudo gem sources -a http://gems.github.com # Only needed once!
+ sudo gem install riddle
+ sudo gem install couchrest
+ sudo gem install ulbrich-couchsphinx
+
+No additional configuration is needed for interfacing with CouchDB: Setup is
+complete once CouchRest is able to talk to the CouchDB server.
+
+Interfacing with Sphinx requires a proper "sphinx.conf" file and a script
+that retrieves the data to index: Sorry, no UltraSphinx-like magic... :-)
+Depending on the amount of data, more than one index may be used, and indexes
+may be consolidated from time to time.
+
+This is a sample configuration for a single "main" index:
+
+ searchd {
+   address = 0.0.0.0
+   port = 3312
+
+   log = ./sphinx/searchd.log
+   query_log = ./sphinx/query.log
+   pid_file = ./sphinx/searchd.pid
+ }
+
+ source couchblog {
+   type = xmlpipe2
+
+   xmlpipe_command = ./sphinxsource.rb
+ }
+
+ index couchblog {
+   source = couchblog
+
+   charset_type = utf-8
+   path = ./sphinx/sphinx_index_main
+ }
+
+The script "sphinxsource.rb", which provides the data to index, may vary
+depending on the number of CouchDB instances it talks to. This is a simple
+script for a single instance:
+
+ #!/usr/bin/env ruby
+
+ require 'rubygems'
+ require 'lib/models' # Depends on location of model files
+
+ data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
+ rows = data['rows'] rescue []
+
+ puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
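For orientation, Sphinx's xmlpipe2 source reads an XML docset from the standard output of the configured command. The following is a minimal sketch of that format, not the CouchSphinx::Indexer::XMLDocset implementation; the helper names build_xmlpipe2 and xml_escape, the field list and the sample row are hypothetical.

```ruby
# Hypothetical helper, not part of CouchSphinx: escapes the characters that
# would otherwise break XML character data.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Hypothetical helper: builds an xmlpipe2 docset from [id, {field => text}]
# pairs. Sphinx expects every document ID to be a unique, non-zero integer.
def build_xmlpipe2(fields, documents)
  lines = ['<?xml version="1.0" encoding="utf-8"?>', '<sphinx:docset>']

  # The schema tells Sphinx which child elements are full-text fields.
  lines << '<sphinx:schema>'
  fields.each { |name| lines << %(<sphinx:field name="#{name}"/>) }
  lines << '</sphinx:schema>'

  documents.each do |id, attributes|
    lines << %(<sphinx:document id="#{Integer(id)}">)
    fields.each do |name|
      lines << "<#{name}>#{xml_escape(attributes[name])}</#{name}>"
    end
    lines << '</sphinx:document>'
  end

  lines << '</sphinx:docset>'
  lines.join("\n")
end

puts build_xmlpipe2([:title, :body],
                    [[38497238, { :title => 'First Post', :body => 'Fish & chips' }]])
```

Escaping the text is the part that matters most here: a single unescaped "&" in a post body would make the indexer reject the whole docset.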
+
+== Models
+
+Use the method <tt>fulltext_index</tt> to enable indexing of a model. By
+default all attributes are indexed, but providing an explicit list of
+attribute keys is recommended.
+
+A side effect of calling this method is that CouchSphinx overrides CouchDB's
+default of generating new IDs itself: Sphinx only allows numeric IDs, so
+CouchSphinx gives new objects an ID made of the class name, a hyphen and an
+integer (e.g. <tt>Post-38497238</tt>). Again: Due to this restriction of
+Sphinx, only objects with IDs of this form are indexed.
+
+Sample:
+
+ class Post < CouchRest::ExtendedDocument
+   use_database SERVER.default_database
+
+   property :title
+   property :body
+
+   fulltext_index :title, :body
+ end
+
+Add the options <tt>:server</tt> and <tt>:port</tt> to <tt>fulltext_index</tt>
+if the Sphinx daemon to query runs on a different host or port (the defaults
+are "localhost" and port 3312).
+
+If you are sure your Sphinx is compiled with 64-bit document ID support, you
+may add the option <tt>:idsize</tt> with value <tt>64</tt> to generate 64-bit
+IDs for CouchDB (the default is 32 bits).
+
+Here is a full-featured sample that sets all additional options:
+
+ fulltext_index :title, :body, :server => 'my.other.server', :port => 3313,
+                :idsize => 64
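The ID scheme described above can be sketched as follows. This is an illustration under assumptions, not the code in lib/: the helper names are hypothetical, and it assumes the integer part is drawn at random from the unsigned range allowed by <tt>:idsize</tt>, excluding zero because Sphinx document IDs must be non-zero.

```ruby
# Hypothetical helpers illustrating the "<ClassName>-<integer>" ID scheme.
# Assumption: the integer is random, non-zero and fits into an unsigned
# integer of idsize bits (32 by default, 64 for Sphinx built with 64-bit IDs).
def generate_document_id(class_name, idsize = 32)
  raise ArgumentError, 'idsize must be 32 or 64' unless [32, 64].include?(idsize)

  max_id = 2**idsize - 1
  "#{class_name}-#{rand(max_id) + 1}" # rand(n) is 0...n, so this is 1..max_id
end

# Splits a document ID back into class name and numeric Sphinx ID.
# Returns nil for IDs that do not follow the scheme (e.g. CouchDB UUIDs).
def parse_document_id(doc_id)
  match = doc_id.to_s.match(/\A([A-Za-z_][\w:]*)-(\d+)\z/)
  match && [match[1], match[2].to_i]
end

id = generate_document_id('Post')
class_name, sphinx_id = parse_document_id(id)
puts "#{id} -> #{class_name}, #{sphinx_id}"
```

Documents created before <tt>fulltext_index</tt> was added keep their CouchDB-generated UUIDs, which is why the parser returns nil for them rather than guessing a number.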
+
+== Indexing
+
+CouchSphinx also adds a design document of its own to CouchDB: To collect
+all relevant objects for running the Sphinx indexer, it defines its own
+views. Have a look at the CouchDB design document "CouchSphinxIndex" for
+details.
+
+To start reindexing automatically the moment new objects are created, add a
+save_callback to the model class:
+
+ save_callback :after do |object|
+   `sudo indexer --all --rotate` # Configure sudo to allow this call...
+ end
+
+Add this or a similar callback to every model that needs instant indexing.
+If indexing is less critical or the load is high, add a check on when the
+indexer last ran so that it is not started on every single save.
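One way to implement such a check is to remember when the indexer last ran and skip calls within a minimum interval. The sketch below is hypothetical: the class name ReindexThrottle, the 60-second default and the injectable clock and runner are assumptions, and the state is per process and not thread-safe.

```ruby
# Hypothetical helper: runs the Sphinx indexer at most once per interval.
class ReindexThrottle
  def initialize(interval = 60, clock = lambda { Time.now }, &runner)
    @interval = interval # minimum number of seconds between two runs
    @clock = clock       # injectable so the timing can be tested
    # Default runner mirrors the callback above; sudo must allow this call.
    @runner = runner || lambda { system('sudo', 'indexer', '--all', '--rotate') }
    @last_run = nil
  end

  # Returns true if the indexer was started, false if the call was skipped.
  def trigger
    now = @clock.call
    return false if @last_run && now - @last_run < @interval

    @last_run = now
    @runner.call
    true
  end
end
```

In a model, the callback body would then become <tt>REINDEX.trigger</tt>, with <tt>REINDEX = ReindexThrottle.new(300)</tt> defined once at load time. Saves that fall into a skipped window are only picked up by the next run, so a periodic indexer run (e.g. from cron) is a sensible complement.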
+
+== Queries
+
+An additional class method <tt>by_fulltext_index</tt> is added to each
+fulltext-indexed model. This method takes a Sphinx query like
+"foo @title bar", runs it within the context of the current class and returns
+an Array of matching CouchDB documents. Use
+<tt>CouchRest::ExtendedDocument.by_fulltext_index</tt> if you want to find
+any document matching the query, not only documents of a certain class.
+
+Samples:
+
+ Post.by_fulltext_index('first')
+ => [...]
+
+ post = Post.by_fulltext_index('this is @title post').first
+ post.title
+ => "First Post"
+ post.class
+ => Post
+
+The additional options <tt>:match_mode</tt>, <tt>:limit</tt> and
+<tt>:max_matches</tt> customize the behaviour of Riddle. Set the option
+<tt>:raw</tt> to <tt>true</tt> to skip looking up the documents and return
+the raw document IDs instead.
+
+Sample:
+
+ Post.by_fulltext_index('my post', :limit => 100)
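How these options might flow through a query can be sketched with a stand-in client. The code copies <tt>:match_mode</tt>, <tt>:limit</tt> and <tt>:max_matches</tt> onto any object with matching setters (Riddle::Client is assumed to expose these) before the query runs, and afterwards either returns the raw IDs or maps them back to "ClassName-ID" strings for lookup. The helper names, FakeClient, the lookup lambda and the match shape (hashes with a <tt>:doc</tt> key) are assumptions, not the CouchSphinx implementation.

```ruby
# Stand-in for Riddle::Client in this sketch (assumed to have the same setters).
FakeClient = Struct.new(:match_mode, :limit, :max_matches)

SEARCH_OPTIONS = [:match_mode, :limit, :max_matches]

# Hypothetical helper: copies the supported options onto the client before
# the query is run; other keys such as :raw are ignored here.
def configure_client(client, options = {})
  SEARCH_OPTIONS.each do |key|
    client.send("#{key}=", options[key]) if options.key?(key)
  end
  client
end

# Hypothetical helper: turns matches ([{ :doc => 38497238 }, ...]) into either
# raw numeric IDs or documents fetched through `lookup`, which receives
# "ClassName-ID" strings. Documents that no longer exist (nil) are dropped.
def resolve_matches(matches, class_name, lookup, raw = false)
  ids = matches.map { |match| match[:doc] }
  return ids if raw

  ids.map { |id| lookup.call("#{class_name}-#{id}") }.compact
end

client = configure_client(FakeClient.new, :limit => 100, :raw => true)
puts client.limit
```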
+
+== Copyright & License
+
+Copyright (c) 2009 Holtzbrinck Digital GmbH, Jan Ulbrich
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
--- /dev/null
+++ b/Rakefile
@@ -0,0 +1,82 @@
+# CouchSphinx, a full text indexing extension for CouchDB/CouchRest.
+
+require 'rubygems'
+require 'rake/gempackagetask'
+
+require 'find'
+
+spec = Gem::Specification.new do |spec|
+  files = FileList['README.rdoc', 'couchsphinx.rb', 'tests/*.rb'].to_a
+
+  Find.find('lib') { |path|
+    files << path if not File.stat(path).directory? }
+
+  spec.platform = Gem::Platform::RUBY
+  spec.name = 'couchsphinx'
+  spec.homepage = 'http://github.com/ulbrich/couchsphinx'
+  spec.version = '0.2'
+  spec.author = 'Jan Ulbrich'
+  spec.email = 'jan.ulbrich @nospam@ holtzbrinck.com'
+  spec.summary = 'A full text indexing extension for CouchDB/CouchRest.'
+  spec.files = files
+  spec.require_path = '.'
+  spec.test_files = Dir.glob('tests/*.rb')
+  spec.has_rdoc = true
+  spec.executables = nil
+  spec.extra_rdoc_files = ['README.rdoc']
+  spec.rdoc_options << '--exclude' << 'pkg' << '--exclude' << 'tmp' <<
+    '--all' << '--title' << 'CouchSphinx' << '--main' << 'README.rdoc'
+end
+
+Rake::GemPackageTask.new(spec) do |pkg|
+  pkg.need_tar = true
+end
+
+task :default => "pkg/#{spec.name}-#{spec.version}.gem" do
+  puts 'Generated latest version.'
+end
+
+desc 'Remove directories "pkg" and "doc"'
+task :clean do
+  puts 'Remove directories "pkg" and "doc".'
+  `rm -rf pkg doc`
+end
+
+desc 'Create rdoc documentation from the code'
+task :doc do
+  `rm -rf doc`
+
+  puts 'Create rdoc documentation from the code'
+  puts `(rdoc --exclude pkg --exclude tmp \
+    --all --title "CouchSphinx" README.rdoc lib couchsphinx.rb) 1>&2`
+end
+
+desc 'Update the couchsphinx.gemspec file with new snapshot of files to bundle'
+task :gemspecs do
+  puts 'Update the couchsphinx.gemspec file with new snapshot of files to bundle.'
+
+  # !!Warning: We can't use spec.to_ruby as this generates executable code
+  # which would break Github gem generation...
+
+ template = <<EOF
+# CouchSphinx, a full text indexing extension for CouchDB/CouchRest.
+
+Gem::Specification.new do |spec|
+ spec.platform = #{spec.platform.inspect}
+ spec.name = #{spec.name.inspect}
+ spec.homepage = #{spec.homepage.inspect}
+ spec.version = "#{spec.version}"
+ spec.author = #{spec.author.inspect}
+ spec.email = #{spec.email.inspect}
+ spec.summary = #{spec.summary.inspect}
+ spec.files = #{spec.files.inspect}
+ spec.require_path = #{spec.require_path.inspect}
+ spec.has_rdoc = #{spec.has_rdoc}
+ spec.executables = #{spec.executables.inspect}
+ spec.extra_rdoc_files = #{spec.extra_rdoc_files.inspect}
+ spec.rdoc_options = #{spec.rdoc_options.inspect}
+end
+EOF
+
+  # The block form closes the file handle once the template is written.
+  File.open('couchsphinx.gemspec', 'w') { |file| file.write(template) }
+end
--- /dev/null
+++ b/couchsphinx.gemspec
@@ -0,0 +1,17 @@
+# CouchSphinx, a full text indexing extension for CouchDB/CouchRest.
+
+Gem::Specification.new do |spec|
+ spec.platform = "ruby"
+ spec.name = "couchsphinx"
+ spec.homepage = "http://github.com/ulbrich/couchsphinx"
+ spec.version = "0.2"
+ spec.author = "Jan Ulbrich"
+ spec.email = "jan.ulbrich @nospam@ holtzbrinck.com"
+ spec.summary = "A full text indexing extension for CouchDB/CouchRest."
+ spec.files = ["README.rdoc", "couchsphinx.rb", "lib/multi_attribute.rb", "lib/mixins/properties.rb", "lib/mixins/indexer.rb", "lib/indexer.rb"]
+ spec.require_path = "."
+ spec.has_rdoc = true
+ spec.executables = []
+ spec.extra_rdoc_files = ["README.rdoc"]
+ spec.rdoc_options = ["--exclude", "pkg", "--exclude", "tmp", "--all", "--title", "CouchSphinx", "--main", "README.rdoc"]
+end
--- /dev/null
+++ b/couchsphinx.rb
@@ -0,0 +1,27 @@
+# CouchSphinx, a full text indexing extension for CouchDB/CouchRest.
+#
+# This file requires the files implementing this library. Have a look at
+# README.rdoc as a starting point.
+
+require 'rubygems'
+
+require 'couchrest'
+require 'riddle'
+
+# Version number to use for updating CouchDB design document CouchSphinxIndex
+# if needed.
+
+module CouchSphinx
+  if (match = __FILE__.match(/couchsphinx-([0-9.-]*)/))
+    VERSION = match[1]
+  else
+    VERSION = 'unknown'
+  end
+end
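The VERSION lookup above relies on RubyGems installing the library into a directory whose name contains "couchsphinx-" followed by the version. Extracted into a helper (hypothetical name, same regular expression), its behaviour on a few made-up sample paths can be checked directly:

```ruby
# Same pattern as in couchsphinx.rb: capture the digits, dots and hyphens
# that follow the first "couchsphinx-" in the path.
def couchsphinx_version_from(path)
  match = path.match(/couchsphinx-([0-9.-]*)/)
  match ? match[1] : 'unknown'
end

puts couchsphinx_version_from('/usr/lib/ruby/gems/1.8/gems/ulbrich-couchsphinx-0.2/couchsphinx.rb')
puts couchsphinx_version_from('/home/jan/src/couchsphinx/couchsphinx.rb')
```

Note that a checkout directory such as "couchsphinx-master" yields an empty string rather than 'unknown', because the character class is allowed to match zero characters; using <tt>+</tt> instead of <tt>*</tt> would avoid that.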
+
+# Require the stuff implementing this library...
+
+require 'lib/multi_attribute'
+require 'lib/indexer'
+require 'lib/mixins/indexer'
+require 'lib/mixins/properties'