forked from sunspot/sunspot
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
http://outoftime.lighthouseapp.com/projects/20339/tickets/98/a/462987/solr_cell_attachment_indexing_v2.patch http://outoftime.lighthouseapp.com/projects/20339/tickets/98
Showing 17 changed files with 434 additions and 15 deletions.
@@ -0,0 +1,45 @@
module Sunspot
  class RichDocument < RSolr::Message::Document
    include Enumerable

    def contains_attachment?
      @fields.each do |field|
        if field.name.to_s.include?("_attachment")
          return true
        end
      end
      return false
    end

    def add(connection)
      params = {
        :wt => :ruby,
        'idx.attr' => false, # don't index any attributes, unless explicitly mapped
        'ignore.und.fl' => true, # ignore all undefined fields
        'map.title' => 'title_text',
      }

      @fields.each do |f|
        puts f.name.to_s + " " + f.value.to_s

        if f.name.to_s.include?("_attachment")
          params["resource.name"] = f.value # TIKA-154 workaround
          params["stream.file"] = f.value
          params['def.fl'] = f.name # all extracted text goes to the attachment field (a stored field, for highlighting)
          params['fmap.content'] = f.name
        else
          param_name = "literal.#{f.name.to_s}"
          params[param_name] = [] unless params.has_key?(param_name)
          params[param_name] << f.value
        end

        # if f.boost
        #   params["boost.#{f.name.to_s}"] = f.boost
        # end
      end

      solr_message = params
      pp connection.send('update/extract', solr_message)
    end
  end
end
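The parameter mapping performed by `RichDocument#add` can be tried out without a Solr connection. The following is a minimal sketch, not the commit's code: `Field` is a hypothetical stand-in for the RSolr document field (only `name` and `value`), `build_extract_params` is an illustrative helper name, and the file path is made up. The parameter keys are copied verbatim from the diff above; field names are stored as strings here for simplicity.

```ruby
# Hypothetical stand-in for an RSolr document field (assumption: only
# name and value matter for building the request parameters).
Field = Struct.new(:name, :value)

# Illustrative helper mirroring the parameter mapping in RichDocument#add.
# It only builds the hash; it does not talk to Solr.
def build_extract_params(fields)
  params = {
    :wt => :ruby,
    'idx.attr' => false,       # don't index attributes unless explicitly mapped
    'ignore.und.fl' => true,   # ignore all undefined fields
    'map.title' => 'title_text'
  }

  fields.each do |f|
    name = f.name.to_s
    if name.include?('_attachment')
      # Attachment fields carry a file path; the extracted text is mapped
      # back onto the attachment field itself.
      params['resource.name'] = f.value
      params['stream.file'] = f.value
      params['def.fl'] = name
      params['fmap.content'] = name
    else
      # Every other field is passed through as a (multi-valued) literal.
      (params["literal.#{name}"] ||= []) << f.value
    end
  end

  params
end

fields = [
  Field.new(:id, 'RichTextPost 1'),
  Field.new(:title_text, 'This is the title'),
  Field.new(:rich_attachment, '/tmp/TestPDF.pdf') # made-up path
]
params = build_extract_params(fields)
```

Note that when `def.fl` is assigned without a trailing comma, as above, it holds a single field name rather than an array.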
@@ -0,0 +1,32 @@
require File.join(File.dirname(__FILE__), 'spec_helper')
require 'pp'

describe 'attachment keyword highlighting' do
  before :all do
    test_docs = File.expand_path(File.join(File.dirname(File.dirname(__FILE__)), 'test_docs'))
    @posts = []
    @posts << RichTextPost.new(:rich_attachment => File.join(test_docs, 'TestPDF.pdf'))
    @posts << RichTextPost.new(:rich_attachment => File.join(test_docs, 'JustAnotherTest.pdf'), :title => "This is the title")
    Sunspot.index!(*@posts)
    @search_result = Sunspot.search(RichTextPost) { keywords 'lorem', :highlight => true }
  end

  it 'should include highlights in the results' do
    @search_result.hits.first.highlights.length.should == 1
  end

  it 'should return formatted highlight fragments' do
    @search_result.hits.first.highlights(:rich_attachment).should_not be_empty
    @search_result.hits.first.highlights(:rich_attachment).first.format.should == "This is a test \nPDF file. <em>Lorem</em> ipsum dolor sit amet, consectetur adipiscing elit"
  end

  it 'should be empty for non-keyword searches' do
    search_result = Sunspot.search(RichTextPost) { with :title, "This is the title" }
    search_result.hits.first.highlights.should be_empty
  end

  it 'should return multiple hits for multiple occurrences' do
    pp @search_result.hits.first.highlights(:rich_attachment)
    @search_result.hits.first.highlights(:rich_attachment).length.should > 1
  end
end