
Search Posts API #306

Closed
ErisDS opened this issue Jul 22, 2013 · 18 comments

Comments

@ErisDS
Member

ErisDS commented Jul 22, 2013

Our API for browse posts should accept a query option which will perform a full text search on the post model. This should use the title, content, and possibly meta_* fields of a post.

At the moment, posts are self contained, but in the near future we will be adding additional tables such as tags/categories which will also need to be searchable.

Search should return a paginated set of matching posts, using the existing settings for limit, offset etc.
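A minimal sketch of how the proposed `query` option could compose with the existing pagination settings. The `Post` shape, the `browsePosts` name and the default limit of 10 are assumptions for illustration, not Ghost's actual API. The matching is a naive in-memory substring filter rather than real full-text search; the point is only that search narrows the set first and `limit`/`offset` are applied afterwards.

```typescript
// Hypothetical post shape; field names are assumptions for illustration.
interface Post {
  id: number;
  title: string;
  content: string;
  meta_title?: string;
  meta_description?: string;
  tags?: string[]; // future related table (tags/categories), flattened here
}

interface BrowseOptions {
  query?: string; // the proposed free-text search option
  limit?: number; // page size
  offset?: number; // number of matches to skip
}

interface BrowseResult {
  posts: Post[];
  total: number; // number of matches before pagination
}

// Collect the searchable text of a post: title, content, meta_* fields and tags.
function searchableText(post: Post): string {
  return [
    post.title,
    post.content,
    post.meta_title ?? "",
    post.meta_description ?? "",
    ...(post.tags ?? []),
  ]
    .join(" ")
    .toLowerCase();
}

// Naive search: a post matches when every whitespace-separated query term
// appears somewhere in its searchable text. Pagination is applied afterwards.
function browsePosts(all: Post[], opts: BrowseOptions = {}): BrowseResult {
  const { query, limit = 10, offset = 0 } = opts; // default limit is an assumption
  const terms = (query ?? "")
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
  const matches =
    terms.length === 0
      ? all
      : all.filter((p) => {
          const text = searchableText(p);
          return terms.every((t) => text.includes(t));
        });
  return { posts: matches.slice(offset, offset + limit), total: matches.length };
}
```

Returning `total` alongside the page lets the caller build pagination metadata for the search results in the same way as for an unfiltered browse.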

ghost assigned julesbravo Jul 22, 2013
ErisDS mentioned this issue Aug 9, 2013
@ErisDS
Member Author

ErisDS commented Aug 21, 2013

Do you have any details as to what schema changes you need yet? If you do, perhaps push up some stuff to a fork so I can take a look? I am doing a bit of a schema audit is all.

@ErisDS
Member Author

ErisDS commented Sep 8, 2013

@julesbravo Any further updates to this?

@julesbravo

Hannah, I've been slacking. I'll try to wrap it up Tuesday.


@ErisDS
Member Author

ErisDS commented Sep 9, 2013

No probs, I've just been wondering whether to put this into the current schema migration or not. I think it might be worth waiting, purely because there are so many other changes ongoing.

@julesbravo

Hannah,

I've got what I think should work done, but I'm having issues with Knex. I'm going to try to hop into IRC tomorrow to hopefully get some pointers. I've just been having the damnedest time with this.


@ErisDS
Member Author

ErisDS commented Oct 24, 2013

This work was started in #489, but now needs to be picked up by a developer willing to give it some serious love, including considering how it might be written as a Bookshelf or Knex plugin.

@halfdan
Contributor

halfdan commented Oct 25, 2013

@ErisDS Can you assign me to this one?

@ErisDS
Member Author

ErisDS commented Oct 25, 2013

FYI: This is for 0.5, and there is a search branch to submit PRs to. I recommend looking at what has been done so far in julesbravo@d9944de (not merged).

The big questions I have are:

  • Should this be broken down into smaller chunks? It seems like a big chunk.
  • Is it possible to do this as a plugin for Bookshelf, or at least in knex? Otherwise it will need to be re-implemented in Ghost for each and every data store. People are already using SQLite, MySQL and even Postgres, although we don't officially support that yet. That's a lot of code; it seems to me it should live elsewhere.
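To make the "re-implemented for each data store" concern concrete, here is a sketch of how one search would be phrased for each of the three databases mentioned. The table and column names (`posts`, `posts_fts`, `title`, `content`) are hypothetical. The SQL uses SQLite's FTS `MATCH`, MySQL's `MATCH ... AGAINST` and PostgreSQL's `to_tsvector`/`plainto_tsquery`, and each of these needs its own index setup before it works.

```typescript
// The databases the thread mentions Ghost users running on.
type Dialect = "sqlite3" | "mysql" | "pg";

// Returns a query selecting the ids of posts matching a search string.
// `?` is knex's positional binding placeholder (knex.raw(sql, [term])).
// Table/column names are hypothetical, and each variant needs its own setup:
//   sqlite3: an FTS virtual table (posts_fts) kept in sync with posts
//   mysql:   a FULLTEXT index on (title, content)
//   pg:      ideally a GIN index on the tsvector expression
function searchSql(dialect: Dialect): string {
  switch (dialect) {
    case "sqlite3":
      return "SELECT rowid AS id FROM posts_fts WHERE posts_fts MATCH ?";
    case "mysql":
      return "SELECT id FROM posts WHERE MATCH(title, content) AGAINST (? IN NATURAL LANGUAGE MODE)";
    case "pg":
      return "SELECT id FROM posts WHERE to_tsvector('english', title || ' ' || content) @@ plainto_tsquery('english', ?)";
  }
}
```

Three query syntaxes and three different index setups, each with its own ranking behaviour, is the kind of per-database code the comment suggests should live in Bookshelf or knex rather than in Ghost.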

@Swaagie

Swaagie commented Oct 25, 2013

I caught a bit of the discussion on IRC. I think search should be handled by something like https://github.com/olivernn/lunr.js; this would allow plugins to be written against an API, which solves part of the puzzle. Lunr.js provides a tf-idf algorithm that also ranks documents, so results are not simply a list of the posts that contain the word but are sorted by relevance.

For reference, the search for the Nodejitsu handbook is done with lunr.js. It's just a matter of pushing the text and titles in at startup and letting lunr.js do its magic.
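A from-scratch sketch of tf-idf ranking, to illustrate what sorting on relevance means in practice. This is not lunr.js's API or its exact scoring formula; `rankTfIdf`, `Doc` and the smoothed idf, log(1 + N/df), are choices made for this example.

```typescript
interface Doc {
  id: string;
  text: string;
}

// Split text into lowercase alphanumeric word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score each document against the query:
//   score(d) = sum over query terms t of tf(t, d) * idf(t)
// where tf(t, d) = count of t in d / number of tokens in d
// and   idf(t)   = log(1 + N / df(t)), N = number of documents,
//                  df(t) = number of documents containing t.
// The smoothing keeps idf > 0, so every matching document scores > 0.
function rankTfIdf(docs: Doc[], query: string): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const n = docs.length;

  // Document frequency of each term.
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = [...new Set(tokenize(query))];
  const results = docs.map((d, i) => {
    const tokens = tokenized[i];
    let score = 0;
    for (const t of queryTerms) {
      const dft = df.get(t);
      if (!dft || tokens.length === 0) continue; // term absent from the collection
      const tf = tokens.filter((x) => x === t).length / tokens.length;
      score += tf * Math.log(1 + n / dft);
    }
    return { id: d.id, score };
  });

  // Keep only matching documents, most relevant first.
  return results.filter((r) => r.score > 0).sort((a, b) => b.score - a.score);
}
```

Rarer terms get a larger idf, so a post that mentions an unusual query word often ranks above one that only shares a common word, rather than both being listed in insertion order.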

@ErisDS
Member Author

ErisDS commented Oct 25, 2013

A bit like Solr, but much smaller and not as bright
http://lunrjs.com

👯

Excellent, thanks 👍 I think this is probably well worth having a bit of a play with?

@halfdan
Contributor

halfdan commented Oct 25, 2013

The issue with lunr.js seems to be that all items have to be indexed on app startup and kept in memory for the lifetime of the application. This increases the memory footprint and will cause scalability issues at some point (imagine indexing 5000 posts at app startup).

@Swaagie

Swaagie commented Oct 26, 2013

I agree with @halfdan there; we discussed this a bit over IRC. Using in-database full-text search (FTS) would be best, but not every database is equally capable. For smaller blogs lunr.js will do just fine; however, the upfront indexing will indeed increase the memory footprint, reducing scalability. Perhaps lunr.js could be provided as a plugin, serving as an intermediate API for databases that do not support FTS.
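A sketch of the layering suggested in this comment: a hypothetical `SearchProvider` interface that Ghost would call, backed by database FTS where the store supports it and by an in-memory fallback otherwise. All names here are assumptions. The fallback is a plain inverted index of the kind full-text libraries such as lunr.js maintain, not lunr.js itself, and it keeps every indexed term and post id in process memory, which is exactly the footprint concern @halfdan raised.

```typescript
// Hypothetical interface Ghost code would program against.
interface SearchProvider {
  index(id: string, text: string): void;
  search(query: string): string[]; // ids of matching posts
}

// In-memory fallback: an inverted index mapping each term to the ids containing it.
class InMemorySearchProvider implements SearchProvider {
  private postings = new Map<string, Set<string>>();

  index(id: string, text: string): void {
    for (const term of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(term, ids);
      }
      ids.add(id);
    }
  }

  // Return the ids of posts containing every query term (AND semantics).
  search(query: string): string[] {
    const terms = query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
    if (terms.length === 0) return [];
    let result: Set<string> | undefined = undefined;
    for (const t of terms) {
      const ids = this.postings.get(t) ?? new Set<string>();
      // Intersect with the ids matched so far.
      result = result ? new Set([...result].filter((id) => ids.has(id))) : new Set(ids);
    }
    return [...(result ?? [])];
  }
}

// Use the database-backed provider when the store supports FTS, else fall back.
function chooseProvider(dbSupportsFts: boolean, dbProvider?: SearchProvider): SearchProvider {
  return dbSupportsFts && dbProvider ? dbProvider : new InMemorySearchProvider();
}
```

Because callers only see `SearchProvider`, a later database-backed implementation could replace the in-memory one without changing the browse API.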

ErisDS modified the milestones: Future, Multi-user Apr 15, 2014
ErisDS added api and removed data labels Apr 15, 2014
@ErisDS
Member Author

ErisDS commented Jan 5, 2015

This issue is pretty old and floundering. We're looking for someone to take the lead! See: https://ghost.org/contribute/search-lead/

@seesharper

You might want to check out my new project that adds full text search to the Ghost platform.
It is rather simple and only supports SQLite, but it works :)
https://github.com/seesharper/GhostSearch

@dwstevens

I think a standalone database for the search index would be best. That way you don't have to deal with the inconsistent (or nonexistent) fuzzy text search support of the various DBMSs out there.

One option would be a LevelDB-backed index (a file-based key/value store) using the search-index module.

https://github.com/fergiemcdowall/search-index

Yes, it adds another file to the mix that could grow quite large, but I believe it would have a lower memory footprint than lunr.js.

@ErisDS
Member Author

ErisDS commented Apr 3, 2015

@dwstevens Tell me if I'm wrong, but I really don't think LevelDB is an option. As far as I am aware, it would add a new dependency that is far more complicated to install than even sqlite3 (and that has the wonderful node-pre-gyp feature), meaning it's not suitable for our user base.

@dwstevens

@ErisDS Ah, my bad. You are correct.

@ErisDS
Member Author

ErisDS commented Oct 8, 2015

Closing this issue in favour of #5321 which has plenty of discussion & traction. Having 2 issues is just confusing at this point.

ErisDS closed this as completed Oct 8, 2015

6 participants