Add support for custom search document models. #92

Closed
wants to merge 12 commits into
from

Conversation

3 participants
Contributor

inertialbit commented Feb 27, 2013

MultisearchableModels can now optionally declare one or more document
models. Useful e.g. maintaining private and public global search indexes.

Added:

  • document model generator w/ optional tsvector column migrations

Updated:

  • PgSearch.multisearch('DocClass', query)
    PgSearch.multisearch(query) still delegates to PgSearch::Document.search
  • MultisearchableModel.multisearchable('DocClass' => opts)
    MultisearchableModel.multisearchable({:against => :col}) still backed by
    PgSearch::Document
  • PgSearch::Multisearch.rebuild(MultisearchableModel, clean_up, DocumentModel)
    PgSearch::Multisearch.rebuild(MultisearchableModel) rebuilds all document models
  • PgSearch::Rebuilder.new(MultisearchableModel, time_source, DocumentModel)
    PgSearch::Rebuilder.new(MultisearchableModel) rebuilds all document
    models
Contributor

inertialbit commented Feb 27, 2013

Thoughts? This change should not break backward compatibility.

There are a couple of wanky test workarounds I want to point out (hopefully there's a better way).
https://github.com/Casecommons/pg_search/pull/92/files#L10R32
https://github.com/Casecommons/pg_search/pull/92/files#L10R60

Oh and the 1.8 test failures are expected since they're running against i18n 0.6.2 - not sure if I should update the Gemfile here to force 0.6.1?

Collaborator

nertzy commented Feb 27, 2013

Could you make a separate pull request real quick for just the i18n change? I'll accept it right away and then that should get everything cleaned up.

Contributor

inertialbit commented Feb 27, 2013

Ok sounds good. Submitted #93

Contributor

inertialbit commented Feb 28, 2013

Ok got the i18n change merged into this. Thanks!

One more thing I want to point out that I missed earlier. The rake task for rebuilding documents used the connection from PgSearch::Document, which may not be present given this pr. I updated it to use the connection of the model passed via args but can see cases where this may not work, though they may be edge-cases. Do you think this will be ok or should I update the rake task args to require the document model (and default to PgSearch::Document) or ...?

Collaborator

nertzy commented Feb 28, 2013

Hey, this will take me a while to process. I'm wondering, is there any way you could split this one commit into smaller ones?

I generally like to try to keep it to one new test and all the corresponding code changes in one small commit. It makes it a lot easier to go commit-by-commit and make sure that the test is correctly failing, then passing with the code change. Now with all the shared examples and everything it's a bit overwhelming.

Also, there are some directions I've been wanting to go, and I think we can use your work as a jumping off point to achieve them. (These don't need to be achieved in this pull request; but they are future plans I'm keeping in mind while reviewing it)

  1. I want to slowly get rid of PgSearch::Document. I think it's surprising to most people to have an Active Record model that is provided purely by a gem. I'd rather have a module that provides all the behavior and a generator to jump-start building a class named by the user. You've gotten very close to this ideal already. The harder part for this would be to figure out a good deprecation plan. I'm thinking maybe for the entire 0.6.x series to deprecate PgSearch::Document in some way, and then 0.7.x removes it wholesale.
  2. After the change above, I want to move away from things like PgSearch.multisearch and instead prefer calling search scopes directly on these new classes. So if you generate a SearchDocument multisearch model, you would call something like SearchDocument.search("foo") and a simple good default pg_search_scope would be generated right into your model. That better represents how it all works anyway. Essentially I don't want to have a "default" multisearch model like PgSearch::Document anymore.
  3. I want to move away from shared examples in RSpec. They can get very confusing. I also hope to slowly get more unit coverage of individual classes, and slim down the current integration coverage a bit. pg_search_spec has grown too large.

brandon-beacher and others added some commits Feb 12, 2013

Allow jruby failures on Travis CI for now.
There's some issue with the SQL generated by the JDBC adapter.
use i18n < 0.6.2 or > 0.6.2
Due to svenfuchs/i18n#192
i18n 0.6.2 breaks ruby 1.8 compat
alter .pg_search_multisearchable_options
nest :against and friends under model name that
they should apply to e.g.
'PgSearch::Document' => {:against => :col}
verify multisearchable w/ multiple document models
duplicate and update tests against
PgSearch::Document to confirm identical behavior
w/ different model
intro PgSearch.multisearch('MySearchDocument', query)
PgSearch.multisearch(query) still delegates to
PgSearch::Document.
verify Multisearch::Rebuilder w/ multiple document models
Update and duplicate tests to confirm same behavior
whether PgSearch::Document or CustomDocument.
Heavily refactored to use shared examples.
update PgSearch::Multisearch.rebuild
Add logic to utilize new document_model arg.
add multisearch_model generator
Generate custom document models identical to
PgSearch::Document.
use connection from MultisearchableModel
PgSearch::Document may not be present.
Contributor

inertialbit commented Feb 28, 2013

Ok, just split that commit into several. Does that help? There are a couple of commits that just add/update tests to verify things work with a e.g. CustomSearchDocument - really they test the changes from 57eeec6.

Like the idea to generate and use SearchDocs directly. That deprecation plan sounds good to me.

I should have reached out before starting this. Moving PgSearch::Document behavior into a module would have made life a little easier.

I'll see about un-sharing those rspec examples as soon as I can.

Collaborator

nertzy commented Feb 28, 2013

Thanks so much!

No worries about reaching out. I'm very glad that you spent the time to figure this all out. It's great to get outside eyes on the code.

I'm hoping to get this reviewed soon. I'm going to leave comments on the individual commits as I go through them...

nertzy commented on 57eeec6 Mar 3, 2013

I'd like to skip this commit. Let's keep PgSearch.pg_search_multisearchable_options around for backward-compatibility for now, but modify your generator to put a default pg_search_scope directly on the generated subclass. That way, we can keep the configuration in the call to pg_search_scope where it belongs.

Never mind, I think I was forgetting why this variable is there. The idea is that we store the searchable-class-specific options in this variable. I'd like to keep it pretty low-impact on the actual mixin, so that we don't add unnecessary methods to already bloated ActiveRecord::Base subclasses. Instead, we could have code at the point that reads the configuration and it could pre-process the user's input. Essentially when we ready it in the default PgSearch::Document class, we will make a special case to handle when it's the old format, and for newly generated classes, we will just use the correct format by default.

Maybe the way forward is to make a new method with a first argument which is the class you want it to be multisearchable within. So for example, we would ask users to change this:

class BlogPost < ActiveRecord::Base
  include PgSearch
  multisearchable :against => [:title, :content]
end

to this:

class BlogPost < ActiveRecord::Base
  include PgSearch
  searchable_as "PgSearch::Document", :against => [:title, :content]
end

And you could call searchable_as multiple times to register with multiple document classes. And .pg_search_multisearchable_options could become a Hash keyed by the document class name, and all we'd have to do is have .multisearchable call .searchable_as with a hard-coded first parameter of PgSearch::Document and print out a deprecation warning.

Owner

inertialbit replied Mar 5, 2013

That makes sense. I really like your .searchable_as idea/syntax.

I think it would also be nice to support a searchable-model-wide :against option to help reduce duplication in cases where a searchable model should have the same attributes indexed across all search documents.

Maybe this a) would be better done as a future improvement rather than rolling it into this change set; b) doesn't provide enough benefit to be worth implementing; c) increases potential for mis-reading what attributes are searchable? I'm also not sure what a nice interface would be.

The best I've come up w/ so far is to have either 2 class methods available for searchable models or to pass in a big hash of options to a single method, closer to the way it is now. Of course this would increase option parsing complexity.

2 class methods

class BlogPost < ActiveRecord::Base
  include PgSearch
  searchable_against :col_1, :attr_a # applies to Public & Private below
  searchable_as "PublicDocument", :if => :public?
  searchable_as "PrivateDocument", :unless => :public?
  searchable_as "GeneralDocument", :against => [:col_1, :col_2], :if => :y?

big hash

class BlogPost < ActiveRecord::Base
  include PgSearch
  searchable_as({
    :against => [:col_1, :attr_a], # applies to Public & Private below
    "PublicDocument" => {:if => :public?},
    "PrivateDocument" => {:unless => :public?},
    "GeneralDocument" => {:against => [:col_1, :col_2], :if => :blah?}
  })

I'd like to move all the common logic for a document class like PgSearch::Document into a module. That way, instead of needing to write all this in the test, we can simply include the module in the model block.

And we can have your generator just put an include line directly into the generated model.

The natural things to put in the module are the belongs_to :searchable polymorphic association, the pg_search_scope :search, and the #update_content private method. And instead of reading PgSearch.multisearch_options it should just generate a generic one. Something like:

pg_search_scope :search, :against => :content

nertzy commented on c981996 Mar 3, 2013

I'd like to skip this commit as well. We can back up from the centralized design instead of building more onto it.

Collaborator

nertzy commented May 8, 2013

Closing this for now. I like the spirit of it, so if you want to re-submit smaller pull requests for each change, I could probably pull them more easily.

If I get around to doing some of this work according to my previous plans, I'll refer to this code. If I use any of your ideas, I will give you credit in the changelog.

Thanks!

@nertzy nertzy closed this May 8, 2013

Contributor

inertialbit commented Jul 6, 2013

@nertzy Sorry to drop off like that; time became scarcer than usual. I'm looking at picking it back up in the next couple of weeks. Wanted to check-in first. Have you had a chance to begin this work yourself? Any change to what you'd like to see? Thanks!

Collaborator

nertzy commented Jul 7, 2013

@inertialbit That's great news! I haven't had too much of a chance to dig into this, in fact in the last couple months I've been scarcer than I've wanted to be as well.

I think the goal is roughly the same. Anything where we can eventually get rid of the PgSearch::Document model being checked into the gem source would be great. That's a way off so we'll work toward it together.

I think it would be easiest if you can do things step-by-step, working towards the goal, and do several small pull requests along the way. I'll do my best to get the smaller pull requests in right away so that you don't end up on a long-running feature branch again.

One note, I just cut version 0.7.0 which drops Ruby 1.8.7 support, so for work on master, don't worry about backwards compatibility. If you're still on 1.8.x (which was just end of lifed, by the way) then you might want to work off of the 0.6-stable branch, and then I can merge any relevant changes into master as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment