jkraemer / rdig
= RDig
RDig provides an HTTP crawler and content extraction utilities
to help build a site search for web sites or intranets. Internally,
Ferret is used for full-text indexing. After creating a config file
for your site, you can build the index with a single call to rdig.
RDig depends on Ferret (>= 0.10.0) and, for parsing HTML, on either
Hpricot (>= 0.4) or the RubyfulSoup library (>= 1.0.4). Since I know of no way
to specify such an OR dependency in a gem specification, the gem depends
on Hpricot. If this is a problem for you, install the gem with --force and
then manually run +gem install rubyful_soup+.
== basic usage
=== Index creation
- create a config file based on the template in doc/examples
- to create an index:
    rdig -c CONFIGFILE
- to run a query against the index (just to try it out):
    rdig -c CONFIGFILE -q 'your query'
  this will dump the first 10 search results to STDOUT
=== Handling search in your application
  require 'rdig'
  require 'rdig_config'   # load your config file here
  # query is the search string; options is an optional hash
  search_results = RDig.searcher.search(query, options)
See RDig::Search::Searcher for more information.
== usage in rails
- add to config/environment.rb:
    require 'rdig'
    require 'rdig_config'
- place rdig_config.rb into the config/ directory.
- build the index:
    rdig -c config/rdig_config.rb
- in the controller that handles the search form:
    search_results = RDig.searcher.search(params[:query])
    @results = search_results[:list]
    @hitcount = search_results[:hitcount]
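The result hash used in the controller can be turned into a short status line for the view. The following is only a sketch: +result_summary+ is a hypothetical helper that is not part of RDig, and the sample hash is stand-in data that merely mimics the :list and :hitcount keys shown above.

```ruby
# Hypothetical helper, not part of RDig: builds a one-line summary
# from a hash that has the :list and :hitcount keys used above.
def result_summary(search_results)
  list     = search_results[:list] || []
  hitcount = search_results[:hitcount] || 0
  return "No results" if hitcount.zero?
  "Showing #{list.size} of #{hitcount} results"
end

# Stand-in data; a real hash would come from RDig.searcher.search.
sample = { :list => [:doc_a, :doc_b], :hitcount => 42 }
puts result_summary(sample)
```

In a Rails view, the string could be rendered next to the result list.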
=== search result paging
Use the :first_doc and :num_docs options to page through search results.
:num_docs defaults to 10, so without these options only the first 10
results are retrieved.
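As a rough sketch of how a page number might be mapped onto these two options: +paging_options+ and +page_count+ below are hypothetical helpers, not part of RDig, and the code assumes that :first_doc is a zero-based offset into the result list. Check RDig::Search::Searcher before relying on that assumption.

```ruby
# Hypothetical helper, not part of RDig: turns a 1-based page number
# into the :first_doc / :num_docs options described above.
# Assumption: :first_doc is a zero-based offset.
def paging_options(page, per_page = 10)
  page = [page.to_i, 1].max   # treat nil, 0 and negatives as page 1
  { :first_doc => (page - 1) * per_page, :num_docs => per_page }
end

# Hypothetical helper: number of pages needed for a given hit count.
def page_count(hitcount, per_page = 10)
  (hitcount.to_f / per_page).ceil
end

# Intended use in a controller (not executed here):
#   search_results = RDig.searcher.search(params[:query],
#                                         paging_options(params[:page]))
#   @pages = page_count(search_results[:hitcount])
```

Clamping the page number keeps malformed request parameters from producing a negative offset.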
== sample configuration
Taken from doc/examples/config.rb. The tag_selector properties are called
with a BeautifulSoup instance as their parameter. See the RubyfulSoup
Site[http://www.crummy.com/software/RubyfulSoup/documentation.html] for more info about this cool lib.
You can also have a look at the +html_content_extractor+ unit test.
:include:doc/examples/config.rb

