Boolean search operators / search syntax documentation #329

Open
chrisspen opened this issue Jun 25, 2023 · 5 comments

Comments

@chrisspen

Is there any formal documentation on the search syntax supported?

For example, is an "AND" operator supported? If I search for "term1 term2", Pagefind seems to treat the terms as if they were ORed, so a result will contain at least one of the terms, and maybe the others if I'm lucky.

How would I tell pagefind to only return results that contain all the keywords?

@bglw
Contributor

bglw commented Jun 26, 2023

No formal syntax has been implemented yet — it's something I'm hoping to do before a 1.0 release but I can't guarantee I'll get to it. There's a small conversation about this in #70 but no work has been started.

For some context on the current state:

The current search strategy could be thought of as "best effort". Specifically in your case, term1 term2 will be treated as term1 AND term2 if both words exist in the corpus — so Pagefind will bias to showing only the most specific pages in the case that it recognizes both words.

If one of the two words isn't found anywhere in the search index, then that word will be ignored. So in this case if term2 doesn't exist anywhere on the site being indexed, then Pagefind will execute the search as simply term1. In this sense it's biased toward returning some results, rather than none.
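The behaviour described above can be modelled with a toy inverted index. This is only an illustrative sketch, not Pagefind's actual implementation: `bestEffortSearch` and the `index` data are hypothetical names made up for the example.

```javascript
// Toy model only: this is NOT Pagefind's implementation.
// `index` maps each word to the set of page IDs containing it (made-up data).
function bestEffortSearch(index, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Ignore any term that appears nowhere in the index.
  const known = terms.filter((t) => index.has(t));
  if (known.length === 0) return [];
  // AND the remaining terms: keep pages present in every term's set.
  return [...index.get(known[0])].filter((page) =>
    known.every((t) => index.get(t).has(page))
  );
}

const index = new Map([
  ["term1", new Set(["a", "b"])],
  ["term2", new Set(["b", "c"])],
]);

console.log(bestEffortSearch(index, "term1 term2")); // [ 'b' ]
console.log(bestEffortSearch(index, "term1 missing")); // [ 'a', 'b' ]
```

The second call shows the bias toward returning some results: the unknown word is dropped and the search runs as `term1` alone.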

There shouldn't be a case where you see term1 term2 returning ORed results — let me know if this is definitely happening. I can't see a way this would be getting through the current search function, though. The excerpts generated sometimes aren't the best, and won't contain both words, so sometimes the matches might look worse than reality. Another explanation is that Pagefind does search all word extensions, so term1 term2 will also return a page containing term1 and term22.
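The word-extension behaviour can be pictured as prefix matching. The sketch below is a simplified model, assuming a query term matches any indexed word that starts with it; `expandTerm` and `pageMatchesAll` are hypothetical helpers, not part of Pagefind.

```javascript
// Toy model only, not Pagefind's implementation: a query term is assumed
// to match any indexed word that starts with it.
function expandTerm(indexedWords, term) {
  return indexedWords.filter((word) => word.startsWith(term));
}

// A page matches when every query term has at least one extension on it.
function pageMatchesAll(pageWords, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return terms.every((t) => expandTerm(pageWords, t).length > 0);
}

console.log(expandTerm(["term1", "term2", "term22"], "term2")); // [ 'term2', 'term22' ]
console.log(pageMatchesAll(["term1", "term22"], "term1 term2")); // true
```

Under this model a page can satisfy the AND of both terms without containing the literal word `term2`, which may look like an OR match in the results.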

Hopefully that context helps! In summary:

> How would I tell pagefind to only return results that contain all the keywords?

As long as both keywords exist (and aren't common prefixes) then this is the current behaviour. But I am keen on supporting a more formal search documentation 🙂

@eklausme

First of all, I would like to thank the authors of Pagefind for this really easy to use search-tool!

I stumbled upon this issue because I also thought that Pagefind does not have an AND condition. This perception is obviously wrong, as illustrated by the above answer from bglw.

What is "missing", though, is to specify word groups, i.e., a sequence of two or more words to search for and require that they be found together. For example, for the famous sentence in Shakespeare's Hamlet:

To be, or not to be, that is the question

it is difficult to find "to" and "be" on their own. It is the combination of those two words that makes them stand out. So what might be needed is searching for something like to+be, or that+is+the+question.

Also see Pagefind: Searching in Static Sites. As stated there, it is not a pressing issue, and mostly not important for technical blogs.

@bglw
Contributor

bglw commented Oct 25, 2023

👋 Hey @eklausme!

Yes, that kind of adjacency would be great! Ideally, I would like Pagefind to take that into account by default. Given a plain search for to be, pages where those words are close or adjacent should rank higher than pages where those words are paragraphs apart.

That data does already exist when searching — if you search for "to be" in quotes you'll see only pages with those words adjacent are in the results. To do the better generic ranking, it's just a matter of finding a good algorithm to calculate that ranking, given Pagefind's available data, without blowing out the search performance.
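As a minimal sketch of the quoted search mentioned above, the helper below wraps a phrase in double quotes before passing it to `pagefind.search`. It assumes `pagefind` is the already-loaded Pagefind JS API module; `phraseSearch` is a hypothetical helper name.

```javascript
// `pagefind` is assumed to be the already-loaded Pagefind JS API module.
// `phraseSearch` is a hypothetical helper, not part of Pagefind.
async function phraseSearch(pagefind, phrase) {
  // Strip stray quotes, then wrap the phrase in double quotes so the
  // words are searched as a phrase rather than as separate terms.
  const cleaned = phrase.replace(/"/g, "").trim();
  const search = await pagefind.search(`"${cleaned}"`);
  // Each result exposes an async data() method that loads its details.
  return Promise.all(search.results.map((result) => result.data()));
}
```

In a module context this could be called as `const pages = await phraseSearch(pagefind, "to be");`.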

Not something I have had time for yet, but hopefully will one day! 🙂

@bglw changed the title from "Search syntax documentation?" to "Boolean search operators / search syntax documentation" on Nov 16, 2023
@leancept

I'm using Pagefind to show a list of related articles using the current article's tags. Problem is, it only shows articles that have exactly the same tags as the one being viewed. I've solved it by reducing the keyword set until Pagefind returns results. A fuzzy search matching, or one based on OR would be great though.
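The workaround described above might look roughly like the following. This is a sketch under assumptions: `pagefind` is the loaded Pagefind JS API module, `searchWithFallback` is a hypothetical helper, and dropping keywords from the end of the list is an arbitrary choice (the comment does not say which keywords were dropped first).

```javascript
// Hypothetical helper: retry with fewer keywords until something matches.
// Assumption: keywords are ordered by importance, so the last one is dropped first.
async function searchWithFallback(pagefind, keywords) {
  for (let n = keywords.length; n > 0; n--) {
    const query = keywords.slice(0, n).join(" ");
    const search = await pagefind.search(query);
    if (search.results.length > 0) {
      return { query, results: search.results };
    }
  }
  // No subset produced any results.
  return { query: null, results: [] };
}
```

Each retry is a separate search, so long keyword lists mean more round trips than a single query would.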

@bglw
Contributor

bglw commented Nov 23, 2023

@leancept if you're showing a list based on a known set of tags, then filtering sounds like a good path that does support this :)

https://pagefind.app/docs/js-api-filtering/#using-compound-filters

You would be able to do something like:

await pagefind.search(null, {
    filters: {
        tag: {
            any: ["tag one", "tag_two", "tag_three"]
        }
    },
});
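To turn that filter search into a related-articles list, the results still need to be loaded and the current page skipped. The following is a rough sketch, assuming pages are indexed with a filter named `tag` and that each page has a `title` in its metadata; `relatedByTags`, `currentUrl`, and `limit` are hypothetical names.

```javascript
// Hypothetical helper built on the `any` compound filter shown above.
// Assumes `pagefind` is the loaded Pagefind JS API module and that pages
// were indexed with a filter named "tag".
async function relatedByTags(pagefind, tags, currentUrl, limit = 5) {
  const search = await pagefind.search(null, {
    filters: { tag: { any: tags } },
  });
  const related = [];
  for (const result of search.results) {
    if (related.length >= limit) break;
    // data() loads the page details for a single result.
    const data = await result.data();
    // Skip the article that is currently being viewed.
    if (data.url !== currentUrl) {
      related.push({ url: data.url, title: data.meta?.title });
    }
  }
  return related;
}
```

Loading results one at a time stops as soon as `limit` is reached, so data for results that would never be shown is not fetched.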
