use MongoDB full text search #149
- native/indexed DB support for searching
- supports the full search syntax of MongoDB, i.e. `multiple words`, `"phrase search"`, `word -exclusion`
- higher weights for name and description
- stemming of search terms (uses English by default; can be overridden on a per-package basis by adding a language field)
- ignores stopwords
- does not perform proximity search (typos), but our Levenshtein search returned a lot of nonsense
- can't currently index the readme because it's not in the DB, so people will have to optimize their short description to be nicely searchable
- doesn't index categories, which should be an orthogonal search filter
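As a rough illustration of the points above, a weighted text index and a `$text` query could look like the following in the mongo shell. This is only a sketch: the field names, the weights, the index name `packages_text`, and the helper functions are assumptions for illustration, not the code from this PR. `weights`, `default_language`, `language_override`, and `textScore` are standard MongoDB text-index features.

```javascript
// Hypothetical field names and weights; adjust to the real package schema.
// Any other indexed text field would keep MongoDB's default weight of 1.
const TEXT_INDEX_KEYS = { name: "text", description: "text" };
const TEXT_INDEX_OPTIONS = {
  name: "packages_text",            // hypothetical index name
  weights: { name: 4, description: 2 },
  default_language: "english",      // stemming and stopwords default to English
  language_override: "language",    // per-document field that overrides the language
};

// Create the text index on the `packages` collection of the given db handle.
function createPackageTextIndex(db) {
  return db.packages.createIndex(TEXT_INDEX_KEYS, TEXT_INDEX_OPTIONS);
}

// Run a full-text query and order the results by relevance score.
function searchPackages(db, query) {
  const score = { $meta: "textScore" };
  return db.packages
    .find({ $text: { $search: query } }, { score })
    .sort({ score });
}
```

For example, `searchPackages(db, '"phrase search" word -exclusion')` would combine a phrase, a plain term, and an excluded term in one query.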
Not perfect, but a lot better than the current search, which returns almost arbitrary results.
- use `db.packages.dropIndex('searchTerms_1')` and `db.packages.update({}, {$unset: {searchTerms: ''}}, {multi: true})` to actually remove the data
As a long-term solution, and for autocomplete search, we could use the Solr adapter of mongo-connector.
Sounds/looks good. I'll merge and switch live together with the other changes. @wilzbach: It would be good to have it on alpha.dub.pm for testing. I guess there could be a migration statement to clear the "searchTerms" field, but I can do that once from the CLI, too.
Indeed, plain Levenshtein turned out not to be well suited to this. It would at least have to weight the edit operations depending on their context, their place on the keyboard, and linguistic meaning/similarity.
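To make the keyboard-weighting idea concrete, here is a minimal sketch of an edit distance where substituting a neighbouring key is cheaper than substituting an arbitrary one. The QWERTY table, the 0.5 cost, and the function names are assumptions chosen for illustration; this is not the registry's previous search code, and it does not cover the context or linguistic weighting mentioned above.

```javascript
// QWERTY rows used to decide whether two keys are neighbours (an assumption;
// other layouts would need a different table).
const QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"];

// Map each letter to its (row, column) position on the keyboard.
const KEY_POS = new Map();
QWERTY_ROWS.forEach((row, r) => {
  [...row].forEach((ch, c) => KEY_POS.set(ch, [r, c]));
});

// Substitution cost: 0 for equal characters, 0.5 for adjacent keys, 1 otherwise.
function substitutionCost(a, b) {
  if (a === b) return 0;
  const pa = KEY_POS.get(a);
  const pb = KEY_POS.get(b);
  if (!pa || !pb) return 1; // characters outside the table get the plain cost
  const adjacent =
    Math.abs(pa[0] - pb[0]) <= 1 && Math.abs(pa[1] - pb[1]) <= 1;
  return adjacent ? 0.5 : 1;
}

// Levenshtein distance with unit insert/delete cost and the weighted
// substitution cost above, computed row by row.
function weightedLevenshtein(s, t) {
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      curr[j] = Math.min(
        prev[j] + 1,                                        // deletion
        curr[j - 1] + 1,                                    // insertion
        prev[j - 1] + substitutionCost(s[i - 1], t[j - 1])  // substitution
      );
    }
    prev = curr;
  }
  return prev[t.length];
}
```

With this table, `weightedLevenshtein("vibe", "vube")` costs less than `weightedLevenshtein("vibe", "vipe")`, because `u` sits next to `i` while `p` is far from `b`.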
There you go: http://alpha.dub.pm/ I just registered a couple of packages, but if you send me the package dump, I can put it there too ;-)
Thanks a lot. It took quite a while, though, until it refreshed its cache. It had to (re-)check the latest version of every package. Should be online now ;-)
Thanks! Makes a good impression. I'll merge now.
It's in the commit message; it was too much hassle to check whether the index still exists. With WiredTiger as the storage backend you can't read system.indexes, and runCommand listIndexes returns a weird format.
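For a one-off cleanup from the mongo shell, the existence check is less awkward, since the shell's `getIndexes()` helper returns a plain array of index documents. The sketch below assumes it is run manually against the registry database; the function name `removeSearchTerms` is hypothetical, and this is not what the server code does.

```javascript
// Drop the old index only if it is still there, then unset the stale field.
// `db` is the mongo shell database handle (or any object with the same shape).
function removeSearchTerms(db) {
  const hasIndex = db.packages
    .getIndexes()
    .some((ix) => ix.name === "searchTerms_1");
  if (hasIndex) {
    db.packages.dropIndex("searchTerms_1");
  }
  // Remove the obsolete searchTerms field from every package document.
  db.packages.update({}, { $unset: { searchTerms: "" } }, { multi: true });
  return hasIndex;
}
```

Calling `removeSearchTerms(db)` a second time should skip the `dropIndex` call and only re-run the (then no-op) update.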