Use Levenshtein distance to order search results #7918

Closed


pmario
Contributor

@pmario pmario commented Jan 5, 2024

This PR fixes: #7917 [BUG] The core search result list should return relevant info for basic search terms - early

It replaces the default filter string in $:/core/ui/DefaultSearchResultList

  • replace: [!is[system]search:title<userInput>sort[title]limit[250]]
  • with: [!is[system]search:title<userInput>] :sort:integer[levenshtein<userInput>] :and[limit[250]]
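To make the difference between the two filter strings concrete, here is a minimal Python sketch of the idea. It is not TiddlyWiki code: `search_alphabetical` and `search_by_distance` are hypothetical names, the title match is modelled as a plain case-insensitive substring test, and the distance is the textbook Levenshtein definition, which may differ in detail from what the `levenshtein` filter operator computes.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: each insert, delete or substitute costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def title_matches(titles, user_input):
    # Assumption: stand-in for search:title<userInput> as a substring match.
    needle = user_input.lower()
    return [t for t in titles if needle in t.lower()]


def search_alphabetical(titles, user_input, limit=250):
    # Models: search:title<userInput> sort[title] limit[250]
    return sorted(title_matches(titles, user_input), key=str.lower)[:limit]


def search_by_distance(titles, user_input, limit=250):
    # Models: search:title<userInput> :sort:integer[levenshtein<userInput>] :and[limit[250]]
    ranked = sorted(title_matches(titles, user_input),
                    key=lambda t: levenshtein(t, user_input))
    return ranked[:limit]


titles = [
    "Introduction to tiddlers",
    "TiddlerFields",
    "tiddler",
    "Tiddlers",
    "A list of tiddler operators",
]
print(search_alphabetical(titles, "tiddler"))
print(search_by_distance(titles, "tiddler"))
```

In this toy data set the exact title `tiddler` moves from third place to first, and the long titles that merely contain the word sink to the end.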

It may not be optimal, but I think it's a huge improvement.

All Example Screenshots can be seen at: #7917 -- I'll only add one here for tiddler

[Screenshot: 01-tiddler]


vercel bot commented Jan 5, 2024

The latest updates on your projects.

Name: tiddlywiki5
Status: ✅ Ready
Preview updated (UTC): Jan 5, 2024 2:26pm

@Jermolene
Owner

Thanks @pmario it is useful to be able to try it out.

I find that it improves the quality of the first few search results, but makes it much, much harder to visually scan the main body of results because they are no longer alphabetised.

[Screenshot]

I think that this Levenshtein technique might be useful but not sufficient to fix our search shortcomings. I wonder if it might be useful for picking out a small number of top matches, for example.

@Jermolene changed the title from "improve default search filter-string" to "Use Levenshtein distance to order search results" on Jan 5, 2024
@pmario
Contributor Author

pmario commented Jan 5, 2024

I find that it improves the quality of the first few search results, but makes it much, much harder to visually scan the main body of results because they are no longer alphabetised.

We have completely contrary views here. For me personally, the shorter results are easier to scan and most of the time reflect the context.

For me, the alphabetical order destroys the context and makes the search results largely useless.

@mateuszwilczek
Contributor

I think we should look at two uses of search.

One is "navigating", that is, getting to a tiddler whose name we more or less know. This would be my most frequent use of search. In this case, having the closest matches on top is a great convenience.

The second would be "browsing": looking through a list of tiddlers without knowing precisely which one we want to open. In this case, having a longer list of titles organized in some way (alphabetically) is useful. I rarely do this, and when I do, I often use filter search rather than standard search. The "closest matches" lose their meaning a bit in this case.

I agree with Jeremy that looking through a longer Levenshtein-sorted list is impractical. But I do not see it as a disadvantage of this sorting, because I expect the result of interest to be near the top anyway (or I refine the search until it is).

Maybe a viable compromise would be to split the results not in two, but in three segments:

  1. Top title matches. First 5 closest matches with the Levenshtein sorting (or whatever better we conceive in the future).
  2. Further title matches. The rest of title matches, sorted alphabetically. Should this part be very short, like 5, then make the first segment longer (up to 10) and do not show the second segment.
  3. All matches without changes.

This should satisfy both conflicting needs quite well. If I know what I want to navigate to, I can get it to show in the top 5 result without much effort. If I want to browse a list of 50 titles, I think that I can get around reading the first 5 as a separate list before the next 45 alphabetically sorted ones.
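The three-segment proposal can be sketched in a few lines of Python. This is only an illustration under assumptions: `segment_results` and its parameter names are hypothetical, the thresholds (5, 5, 10) are taken from the proposal text, and the distance function is the textbook Levenshtein definition rather than TiddlyWiki's operator.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: each insert, delete or substitute costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def segment_results(title_matches, all_matches, user_input,
                    top_n=5, short_rest=5, extended_top=10):
    """Return (top title matches, further title matches, all matches)."""
    ranked = sorted(title_matches, key=lambda t: levenshtein(t, user_input))
    rest = ranked[top_n:]
    if len(rest) <= short_rest:
        # Second segment would be very short: fold it into a longer first one.
        return ranked[:extended_top], [], list(all_matches)
    # Otherwise keep the top few by distance and alphabetise the remainder.
    return ranked[:top_n], sorted(rest, key=str.lower), list(all_matches)


# Twelve hypothetical titles whose distance to "abc" is 0, 1, ..., 11.
many = ["abc" + "x" * k for k in reversed(range(12))]
top, further, everything = segment_results(many, many, "abc")
print(top)
print(further)
```

With twelve title matches the first segment holds the five closest titles and the second holds the other seven in alphabetical order; with eight or fewer, everything lands in the first segment and the second stays empty.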

It would be far from ideal, but possible right away, and still much better than the current search.

@Jermolene
Owner

Thanks @pmario @mateuszwilczek, I like @mateuszwilczek's proposal.

@saqimtiaz
Contributor

Maybe a viable compromise would be to split the results not in two, but in three segments:

  1. Top title matches. First 5 closest matches with the Levenshtein sorting (or whatever better we conceive in the future).
  2. Further title matches. The rest of title matches, sorted alphabetically. Should this part be very short, like 5, then make the first segment longer (up to 10) and do not show the second segment.
  3. All matches without changes.

I second this proposal. I do something similar in my own notes wiki with exactly those three groups of matches, though using a custom sort instead of Levenshtein, as it predates the operator and also because it prioritizes results that are prefixed with the search term.
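The custom sort itself is not shown in this thread. As a purely hypothetical illustration of what "prioritizing results that are prefixed with the search term" could look like, the Python sketch below sorts prefix matches first and falls back to alphabetical order within each group; `prefix_first` is an invented name and the case-insensitive comparison is an assumption.

```python
def prefix_first(titles, user_input):
    """Titles starting with the search term come first, then the rest.

    Hypothetical sketch: both groups are ordered alphabetically,
    ignoring case.
    """
    needle = user_input.lower()
    return sorted(
        titles,
        # False sorts before True, so prefix matches lead the list.
        key=lambda t: (not t.lower().startswith(needle), t.lower()),
    )


titles = [
    "A list of tiddler operators",
    "TiddlerFields",
    "Introduction to tiddlers",
    "tiddler",
]
print(prefix_first(titles, "tiddler"))
```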

@pmario
Contributor Author

pmario commented Jan 6, 2024

  1. Top title matches. First 5 closest matches with the Levenshtein sorting (or whatever better we conceive in the future).
  2. Further title matches. The rest of title matches, sorted alphabetically. Should this part be very short, like 5, then make the first segment longer (up to 10) and do not show the second segment.
  3. All matches without changes.

Changing the DefaultSearchResultList into 3 or more different segments will be a backwards incompatible change.

The current implementation of the DefaultSearchResultList converts the "shadow" tiddler into a "real" tiddler if one of the two search filter strings is changed. This prevents automatic core updates.

So if users ever changed the first-search-filter or second-search-filter fields to optimize their search results, they will have problems with a future update.
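The override problem can be shown with a toy model in Python. This is a simplified sketch and not TiddlyWiki's implementation: `ToyWiki` and its methods are hypothetical, and a tiddler is reduced to a single string instead of a set of fields.

```python
class ToyWiki:
    """Toy model: a real tiddler with the same title hides the shadow one."""

    def __init__(self, shadows):
        self.shadows = dict(shadows)  # shipped with the core
        self.real = {}                # saved by the user

    def get(self, title):
        # A real tiddler always wins over a shadow tiddler.
        return self.real.get(title, self.shadows.get(title))

    def edit(self, title, text):
        # Editing a shadow tiddler stores a real copy that overrides it.
        self.real[title] = text

    def upgrade_core(self, new_shadows):
        # An upgrade only replaces the shadows, never the user's tiddlers.
        self.shadows = dict(new_shadows)


TITLE = "$:/core/ui/DefaultSearchResultList"
untouched = ToyWiki({TITLE: "old default filter"})
customised = ToyWiki({TITLE: "old default filter"})
customised.edit(TITLE, "my tuned filter")

for wiki in (untouched, customised):
    wiki.upgrade_core({TITLE: "new three-segment layout"})

print(untouched.get(TITLE))
print(customised.get(TITLE))
```

The untouched wiki picks up the new default after the upgrade, while the customised one keeps showing the user's old version, which is the incompatibility described above.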


The default search dropdown already allows us to configure additional search result lists using the $:/tags/SearchResults tag.

So as I see it, we'll need to go the plugin route and add such a new tab to the search dropdown in a backwards compatible way. That's not pretty, but it does not need any change to the core.

I'll close this PR.


I may take a closer look at a new option for my own field-search plugin, which will allow me to also search in fields and titles.
