FEATURE: Adds option to search only in titles #5538
You've signed the CLA, jorgemanrubia. Thank you! This pull request is ready for review.
app/services/search_indexer.rb
# first one will have a priority A; the second one a priority B, etc.
# When only 1 entry is provided, no weights will be used.
def self.update_index(table, id, *raw_entries)
  raw_data = Search.prepare_data(raw_entries.join(' '), :index)
This method now supports providing multiple entries to index and, when more than 1 entry is provided, it will assign them a weight.
I didn't add support for providing the specific weights and, instead, used the convention of deducing them from the order. So the first entry gets an A, the second a B, and so on. I think this works well for what we need and keeps the code simpler.
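The ordering convention described in this comment could be sketched roughly as follows. This is only an illustration, not the code in the PR: `WEIGHT_LABELS` and `assign_weights` are hypothetical names, and the only facts taken from the discussion are that weights follow the order of the entries and that a single entry gets no weight.

```ruby
# Hypothetical helper, not part of the PR: pairs each entry with a weight
# label deduced from its position in the argument list.
WEIGHT_LABELS = %w[A B C D].freeze # the four labels PostgreSQL full-text search offers

def assign_weights(entries)
  # A single entry is indexed without any weight.
  return [[entries.first, nil]] if entries.size == 1

  if entries.size > WEIGHT_LABELS.size
    raise ArgumentError, "at most #{WEIGHT_LABELS.size} weighted entries are supported"
  end

  # First entry gets A, second gets B, and so on.
  entries.each_with_index.map { |entry, index| [entry, WEIGHT_LABELS[index]] }
end

p assign_weights(["Topic title", "Post body", "Category name"])
```

Deducing the label from the position keeps the method signature small; the trade-off is that callers must pass entries in priority order.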
end.join(" || ")
end
end

def self.update_topics_index(topic_id, title, cooked)
While weights were only needed for posts, I added them for topics too for consistency
This pull request has been mentioned on Discourse Meta. There might be relevant details there: https://meta.discourse.org/t/search-only-within-topic-titles/27600/10
# don't allow concurrency to mess up saving a post
end

# for user login and name use "simple" lowercase stemmer
I extracted a few methods from SearchIndexer#update_index
updateSearchTermForSpecialInLikes() {
  const match = this.filterBlocks(REGEXP_SPECIAL_IN_LIKES_MATCH);
  const inFilter = this.get('searchedTerms.special.in.likes');
updateSearchTermForSpecialIn(key, regexp) {
Extracted method to remove some duplicated code for the "searching in" logic
app/services/search_indexer.rb
"indexable_fragment_#{index}".to_sym
end

def self.build_ts_vector(stemmer, indexable_entries)
This is the place where we add weights to entries when more than 1 entry is provided. When only 1 entry is provided (users, categories) it works like it used to.
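As a rough illustration of the behaviour described here, the weighted vector expression could be assembled like this. It is a sketch under assumptions, not the PR's `build_ts_vector`: the helper name `ts_vector_sql` and the constant `TS_WEIGHT_LABELS` are made up, the `:indexable_fragment_N` placeholders mirror the symbol naming visible in the diff, and `SETWEIGHT`, `TO_TSVECTOR` and `||` are the standard PostgreSQL full-text search functions and tsvector concatenation operator.

```ruby
# Hypothetical sketch: builds the SQL expression for a (possibly weighted)
# tsvector. The stemmer name is interpolated directly, so it is assumed to
# come from trusted configuration, never from user input.
TS_WEIGHT_LABELS = %w[A B C D].freeze

def ts_vector_sql(stemmer, entry_count)
  unless (1..TS_WEIGHT_LABELS.size).cover?(entry_count)
    raise ArgumentError, "expected 1 to #{TS_WEIGHT_LABELS.size} entries"
  end

  fragments = (0...entry_count).map do |index|
    vector = "TO_TSVECTOR('#{stemmer}', COALESCE(:indexable_fragment_#{index}, ''))"
    # With a single entry (users, categories) the vector is left unweighted.
    entry_count == 1 ? vector : "SETWEIGHT(#{vector}, '#{TS_WEIGHT_LABELS[index]}')"
  end

  # Concatenate the per-entry vectors into one tsvector.
  fragments.join(" || ")
end

puts ts_vector_sql("english", 3)
```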
When searching exact terms with The reason is that these queries check |
That 1.freeze I may have added there sure is odd, I would just bump version to 2 and rely on the indexing job.
Can you review the existing code that attempts to give a bump to title? Perhaps there is a bunch of code that can be removed now, e.g. https://github.com/discourse/discourse/blob/master/lib/search.rb#L746
I think that checkbox in the UI has a bit too much prominence; I would bump it to the bottom of the advanced area.
Overall the PR looks great, nice feature.
@SamSaffron I added some commits addressing your comments, thanks!
Regarding the ordering, this commit will use the new weights to bump titles. With the default weights of
A problem with this change is that, until the database is reindexed, title bumping in results won't work. Not sure if that's OK. If it is not, we can roll back that change and deploy it a few days after getting this shipped to be safe.
I was having a look at
Something like:
posts = Post.joins(:topic)
  .select(:id)
  .where(...)
posts.find_each { |post| indexing_stuff }
OK, I took some time today to re-do this work based on your awesome changes. per: 86d12bd
I opted to hack this in myself because I was very concerned about maintenance and needed to know exactly how everything works together. I opted out of UI changes for now for a couple more betas so the index has some time to rebuild.
Overall I was very happy with this change and I think it provides Discourse with a significantly better search. Ranking-wise I went with
Yes, this was an excellent nudge to improve search on titles, which we did get a lot of complaints about. Glad we were actually doing something wrong, that makes it easier to fix. Thank you @jorgemanrubia
It adds a new search option, in:title, that will make it search only in titles.
It also adds a new checkbox at the top of the advanced search UI:
See:
Overview of changes
It changes the way posts and topics are indexed to use PostgreSQL full-text search weights. Then, it uses the weight associated with title to filter by title when searching posts.
Before, it was concatenating title + body + category and storing the indexed result. Now, it will store the concatenation of weighted indexed parts.
This is my first experience with PostgreSQL full-text search. I think it is the proper way to do it after checking the docs. The weights system is a little bit bizarre with 4 values (A, B, C, D) but seems to work pretty well.
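To illustrate how a weight can double as a title filter, here is a minimal sketch that restricts every search term to weight A using PostgreSQL's tsquery weight-label syntax (`'term':A`). The helper name `title_only_tsquery` is hypothetical, and the sketch assumes titles are the entries indexed with weight A; it is not the query-building code from this PR.

```ruby
# Hypothetical helper: turns search terms into a tsquery string in which
# every term must match a lexeme stored with weight A (assumed: the title).
def title_only_tsquery(terms)
  terms.map do |term|
    # Inside a quoted tsquery lexeme, quotes and backslashes are doubled.
    escaped = term.gsub(/['\\]/) { |char| char * 2 }
    "'#{escaped}':A"
  end.join(" & ")
end

puts title_only_tsquery(%w[search titles])
```

The resulting string is meant to be passed as a bound parameter to `TO_TSQUERY`, so that terms found only in the body or category do not match.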
As a side effect, results will take weight into consideration when sorting. So title > body > category. From 12.3.3. Ranking Search Results:
So for posts, weights will be 1.0 for title, 0.4 for body and 0.2 for category. I think this is ok, but it's definitely something good to check before merging the PR. I tested this locally with a database with 19k posts from a forum I used to run, and the sorting made sense to me.
I will add some comments in the code to explain the changes next.
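The mapping from fields to ranking weights stated above can be written out as a small lookup. The numbers are PostgreSQL's documented default `ts_rank` weights ({0.1, 0.2, 0.4, 1.0} for D, C, B, A); the constant and method names are hypothetical, and the sketch only shows the mapping, not how `ts_rank` combines matches into a final score.

```ruby
# PostgreSQL's default ranking weights per label (D, C, B, A = 0.1, 0.2, 0.4, 1.0).
DEFAULT_RANK_WEIGHTS = { "A" => 1.0, "B" => 0.4, "C" => 0.2, "D" => 0.1 }.freeze

# Labels assumed for post fields, following the order title, body, category.
POST_FIELD_LABELS = { title: "A", body: "B", category: "C" }.freeze

def post_field_rank_weights
  POST_FIELD_LABELS.transform_values { |label| DEFAULT_RANK_WEIGHTS.fetch(label) }
end

p post_field_rank_weights
```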
Reindexing required
in:title will only work after re-indexing all the posts. Not sure how to do it automatically when shipping this: INDEX_VERSION and rely on the indexing job that runs every day?
In development I have used rake search:reindex for testing this.
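One way to picture the version-based approach is the sketch below. It rests on an assumption the discussion does not confirm: that each indexed row records the index version it was built with, so a periodic job can pick up stale rows. `IndexedPost`, `stale_post_ids` and the value `2` are illustrative only.

```ruby
# Illustrative only: assumes every indexed row stores the version of the
# indexing scheme that produced it.
INDEX_VERSION = 2 # bumped whenever the indexing format changes

IndexedPost = Struct.new(:id, :index_version)

# Returns the ids a periodic indexing job would need to rebuild.
def stale_post_ids(indexed_posts, current_version = INDEX_VERSION)
  indexed_posts.reject { |post| post.index_version == current_version }.map(&:id)
end

rows = [IndexedPost.new(1, 1), IndexedPost.new(2, 2), IndexedPost.new(3, nil)]
p stale_post_ids(rows)
```

With this shape, deploying a new version requires no manual step; the cost is that in:title results stay incomplete until the job has caught up.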