Improve ranking of exact matches in page titles #437
Ah! Interesting. So first for reasoning on why the results are in that order: headings are given very strong priority — though
On an intuitive level I agree! The trouble is that currently, with the data Pagefind has on hand at the time of ranking, this isn't known. By the time that level of data is loaded into the front end, the rankings are locked in. When ranking, all we see for these results is:
// Page A word match locations
[{
"weight": 6,
"location": 23
}, {
"weight": 1,
"location": 27
}]
// Page B word match locations
[{
"weight": 7,
"location": 0
}, {
"weight": 1,
"location": 12
}, {
"weight": 1,
"location": 21
},
/* -- more -- */
]
That I'm sure there's a creative solution here, but nothing immediately comes to mind. One option would just be to bump I do want to expose a new configuration option for mapping element selectors to custom weightings, meaning the default h1 rank could be a per-site implementation if people want to tailor their results. But this doesn't solve the "making Pagefind smart enough" goal of doing better by default. Another option is finding some way to opt h1 elements (or generally high-ranked words) out of the term frequency penalty, but I'll need to think on that further. Sorry for the essay! Since I have no immediate bright ideas, I think implementing manual ranking is a good first step. (Changing your |
No need to apologize, I love the essay! 😄 ...very interesting to get to know more about pagefind's internals. From my experience with implementing search, it's very possible that too many built-in assumptions ("smartness") could hurt more than help. After giving it more thought I realized that maybe even the assumption that a perfect match in the From other search engines I know the concept of "pinning" pages to the very top for specific search terms. Maybe that could be a feature idea for pagefind, as well. Something like It could also be a meta tag in the
<!-- pin the page to the top when searching for "plugin" or "plugins": -->
<meta name="pagefind:pin" content="plugin,plugins">
Just an idea without knowing if this could be feasible with the architecture of pagefind 🙂 |
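A rough sketch of how the pinning idea above could behave on the display side. This is purely illustrative: `pagefind:pin` is only the tag proposed in this comment, not an existing Pagefind feature, and the `pin` property, `parsePinTerms`, and `applyPins` are hypothetical names. It assumes each page object already carries the raw `content` string of its pin tag.

```javascript
// Hypothetical: `pin` holds the content of the proposed <meta name="pagefind:pin"> tag.
// Split a comma-separated pin list into lowercase terms, dropping empty entries.
function parsePinTerms(content) {
  return String(content || "")
    .split(",")
    .map((term) => term.trim().toLowerCase())
    .filter((term) => term.length > 0);
}

// Move pages pinned for this query to the front; keep the original order otherwise.
function applyPins(pages, query) {
  const wanted = String(query || "").trim().toLowerCase();
  const pinned = [];
  const others = [];
  for (const page of pages) {
    const isPinned = wanted !== "" && parsePinTerms(page.pin).includes(wanted);
    (isPinned ? pinned : others).push(page);
  }
  return [...pinned, ...others];
}
```

Matching here is case-insensitive and exact per term, so a page pinned for "plugin,plugins" moves up for either word but not for "plug".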
Given the implementation details kindly explained by @bglw, I'd opt for manual ranking here. Intuitively, for most sites, ranking pages the highest that have an exact match between search term and h1/title makes sense, but if that's not encoded in the ranking data, ranking them during display should work as well. I can imagine treating the h1 as a special |
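A minimal sketch of the "rank during display" approach described above, assuming the results have already been loaded into plain objects with a `meta.title` string. That shape and the helper name `promoteExactTitleMatches` are assumptions for illustration, not part of Pagefind itself.

```javascript
// Lowercase and trim a value so comparisons ignore case and surrounding whitespace.
function normalizeTitle(text) {
  return String(text == null ? "" : text).trim().toLowerCase();
}

// Return a new array with exact title matches first.
// Relative order inside each group is preserved, so the original ranking still applies.
function promoteExactTitleMatches(results, query) {
  const wanted = normalizeTitle(query);
  if (wanted === "") return results.slice();
  const exact = [];
  const rest = [];
  for (const result of results) {
    const title = normalizeTitle(result.meta && result.meta.title);
    (title === wanted ? exact : rest).push(result);
  }
  return [...exact, ...rest];
}
```

Since the reordering happens after ranking, it only affects results whose data has actually been loaded, so in practice it would apply to the first batch of results rather than the whole index.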
Good directions to think about, thanks to both of ya 🙏 I need to brush up CloudCannon's documentation and search, so that might prove a good place to experiment with some pinning or special casing on a site I'm familiar with. Will update here if I land on anything! |
Some new thoughts on this; I think I'm going to try to get metadata into the index in a way that it can be queried as part of a search. This would allow you to do a freeform search for the word Still more ideation needed, but I think having that data combined with exposing more configuration on how it is used when ranking will allow people to tailor search to their site content. |
My request is not to exactly match titles, but filenames: It would be nice IMHO if a search that exactly matched the name of a page would deliver that page as the top hit. E.g. searching for "I2C" would put a page called "I2C.html" at the top (without case sensitivity of course.) Are filenames even used at present? |
Filenames are currently unused apart from building URLs, but the URLs are present in the result fragments, so what you're after would still be solved by #532, as it is the precursor to matching files based on any "non-content" fields |
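Until filenames can be matched in the index, the URLs in the result fragments could be used for a display-time workaround. The sketch below assumes each loaded result exposes a site-relative `url` string; `fileStem` and `promoteFilenameMatches` are hypothetical helpers, not Pagefind functions.

```javascript
// Extract the lowercase file name without extension from a site-relative URL,
// e.g. "/docs/I2C.html" and "/docs/i2c/" both give "i2c".
function fileStem(url) {
  const path = String(url || "").split(/[?#]/)[0];
  const segments = path.split("/").filter((segment) => segment.length > 0);
  const last = segments.length > 0 ? segments[segments.length - 1] : "";
  return last.replace(/\.[^.]+$/, "").toLowerCase();
}

// Put results whose file name equals the query (ignoring case) ahead of the rest.
function promoteFilenameMatches(results, query) {
  const wanted = String(query || "").trim().toLowerCase();
  if (wanted === "") return results.slice();
  const matches = results.filter((result) => fileStem(result.url) === wanted);
  const others = results.filter((result) => fileStem(result.url) !== wanted);
  return [...matches, ...others];
}
```

With this, a search for "i2c" would move a page at "/docs/I2C.html" to the top while leaving the remaining order untouched.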
I agree that matches in page titles are not adequately weighted. After tinkering with all available parameters I could not get exact matches in the title tag with There are also some other rankings which I do not understand - e.g. here: In the above image, the first result has 6 matches while the second one has 49 matches. Changing the parameters did not seem to do much. If you want, you can try it yourself here: https://www.zhlaw.ch/ Anyways - just also wanted to say: Amazing project, thank you so much for your work and offering this under a permissive license! 🙏 |
Hi there,
we just updated pagefind to 1.0.2 for the docs for swup and it's amazing! Thanks for all your work on this project.
Playing around with it, I noticed something that might or might not be possible to generalize: When I search for "Plugins", I'm getting the following results:
Intuitively I'd think that a page whose main heading (h1) exactly matches the search term ("Plugins", highlighted in the screenshot above) should be rated highest. Sure it's possible for us to manually give the "plugins" page a very high rating – but maybe there is a way for pagefind to get smart enough to return pages with an exact match in the main heading first?
I'd be happy to hear your opinion on that before we start implementing manual ranking.