Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to search multiple words across multiple columns in Grid #1946

Merged
merged 4 commits into from
Dec 12, 2022

Conversation

mkrecek234
Copy link
Contributor

@mkrecek234 mkrecek234 commented Dec 5, 2022

related with #1911

src/Grid.php Outdated Show resolved Hide resolved
@mvorisek
Copy link
Member

mvorisek commented Dec 6, 2022

I still belive we can improve the search as outlined in #1911 (comment).

Enclosing words in quotes that should be matched as one substring is quite common, but non tech user usually does not know it. I will try to find some time to impl. a solution /wo the need for quotes.

Also this change needs test and the search should work the same for ex. for Lookup (server side Dropdown).

Co-authored-by: Michael Voříšek <mvorisek@mvorisek.cz>
@mkrecek234
Copy link
Contributor Author

@mvorisek I suggest to keep it simple. Non-tech users often don't even search for concatenated strings, but enter simply multiple words which works perfect here. And having quotes to combine a sub-string is state of the art if you look at Google for example. Let us keep it simple and not spend too much time on jt. If we add negation, this search is more powerful than most searches I know on applications. Having "smart" searches like 1,000 matching 1000 to me is over-the-top, and should be part of a separate PR - we don't need it in my opinion for a small and fast grid implementation.

@mvorisek
Copy link
Member

mvorisek commented Dec 6, 2022

@mkrecek234 I mean some serious problems like not matching "Elon Musk" when stored in two separate columns

it can be fixed quite easy, what we need to do is build a scoring expression and use it in WHERE/HAVING and in ORDER BY

the goal should be to provide at least some results sorted by the most relevant one, returning nothing for "not exact" query is bad and often lead users to a situation they think the record does not exist

fuzzy search should be then supported by highlighting on the client side like:

image

@mkrecek234
Copy link
Contributor Author

Sure this could be nice, but it is hard to generalize: If you separate given name and last name, and want to search for "Elon Musk" with quotations, sure this won't work. You could easily addExpression where you CONCAT_WS(" ", given_name, last_name) and add it to field list. However, there are languages where last name is first, so Musk Elon would be searched.
My experience is: 99% of search cases are simple, not quotation, you find what you want. An a relevance score is nice but very specialised for highly complex searches or extremely long lists.

If you have simple improvements would be great to get them into this PR - or we first merge the current PR as it heavily improves the usefulness of the grid search function already.

@mvorisek mvorisek changed the title Enhance quick search term handling in grid Enhance Grid quick search Dec 12, 2022
@mvorisek mvorisek changed the title Enhance Grid quick search Allow to seach across multiple columns in Grid Dec 12, 2022
@mvorisek
Copy link
Member

@mkrecek234 how critical is the quoted multiple word seach for you?

I have dropped the multiple word seach now in favor of simpler code. We should not support quoting at all, instead of matching with AND scopes, we should match everything with OR, and sort better row match (multiple words matched together like if they were quoted together, words starting with the actually seached term, ...) on top. This way, we would also return results when not all search terms were matched.

for record, we can keep #1911 open and merge this PR as now it has tests :)

@mkrecek234
Copy link
Contributor Author

@mvorisek Fine for me to merge this now without quoted search terms. Simpler code however to me is not really a satisfactory argument, as quoted search terms to me seem standard and give a bit more freedom to filter. But we can leave this for a later improvement for sure.

@mkrecek234 mkrecek234 changed the title Allow to seach across multiple columns in Grid Allow to search multiple words across multiple columns in Grid Dec 12, 2022
@mvorisek
Copy link
Member

Once we admit fuzzy support is great, quoted support does not make any sense to be supported, as we should even then return (sorted) partial matches.

@mvorisek mvorisek merged commit ba8c288 into develop Dec 12, 2022
@mvorisek mvorisek deleted the grid_search_enhancement branch December 12, 2022 13:58
@mkrecek234
Copy link
Contributor Author

mkrecek234 commented Jan 17, 2023

@mvorisek Working now extensively with the improved grid search function, it really is a pain that we skipped the multi-word search with quotes. Users often search for e.g., customer company names like Miller Electronics or John Miller Pumps - since you dropped the quotation there is no easy way to search for that, even though the user knows the word-order and that it is located in one field. I strongly suggest - since there was no other contribution by anyone on fuzzy or search order that we get this back in. We should not limit Atk4's functionality only because we see complex first-best alternatives which are never realised. Until then we should work with the second-best, but realised solution.

I would appreciate if you would contribute improved code to your liking, or if not I strongly appreciate to get back quoted search. This is industry standard in any relevant search engine.

@mvorisek
Copy link
Member

your example is perfect, we need to sort at least by matched words count

@mkrecek234
Copy link
Contributor Author

mkrecek234 commented Jan 17, 2023

No, if the user knows that only results with certain words in the correct order and in one field are a match, then ordering is not helpful as you simply list irrelevant results. If there is no match in a single field for John Miller Pumps it simply is not a match. Another example is New York Lawyers Inc - we do not want to show all matches in York, or in New York where a lawyers and an Inc is in any other field… What is so bad about just concatenating words using quotes? It is just a different regex?

For such a simple function we should not aim to out-smart common standards, I think there are many more important areas where we should focus on.

So as said I suggest to re-introduce it if there is not a better code contribution available.

@mvorisek
Copy link
Member

I diasagree, any typo in the query will result in no match which is not good. There can be also typo in the data which will be otherwise almost impossible to find. So I strongly belive we need to keep to result results even without full match.

Given this POV, your request cannot be satisfied and we need to ORDER the results based on match quality.

@mkrecek234
Copy link
Contributor Author

mkrecek234 commented Jan 17, 2023

These are two things:

  1. Clearness on combined words: If the user knows he has a single term (even if it is concatenated out of multiple words), the search term is more precisely defined and as such the query has to return only those results which are satisfying his search term.
  2. Fuzziness then comes on top if this to result not only strictly matching results but fuzzy results.

I don't care about all lawyers in York being a Inc. having a tag New or residing in New streetif I search for "New York Lawyers Inc." USA. If I search for "New Yrk Lawyers" I might still get a correct result as there is a fuzzy match to a company called New York Lawyers Inc..
My personal opinion is as long as we are not talking about a concrete better solution in coding, the concatenation is a clear improvement over the existing solution.

@atk4/trusted-maintainers Any opinion from the community?

@mkrecek234
Copy link
Contributor Author

mkrecek234 commented Jan 17, 2023

One add-on on the topic fuzziness: This PR is not about fuzziness, only about appropriate precise search results which today is the implementation in grid search. In my opinion until we talk about fuzzy search in a separate PR we should first improve search results based on what the user actualally wants:

  • Search for multiple terms/words which have to be present minimum once in a set of fields (this is implemented here)

  • Have multiple search words where a word might be also be comprised of multiple words, like Coca Cola Company or New York or Manhattan Av (not implemented, but clear industry standard)

  • Negation: Define words that should be EXCLUDED (not implemented, second priority, EASY)

As said, fuzziness can later come on top and is not related at all to this.

The search string expresses what the user wants. Result order might help but only if there is not a ton of irrelevant results given.

@mvorisek
Copy link
Member

Real fuzzy search can be hard/slow. As a first step I propose also quite simple solution - count matched words (like +1 for each matched word from search query, ie. +1 even if the word is present twice) and order by this simply score.

IMHO it will cover your usecase well and also help when some query word/part is not found. Such results will be rendered below better matches.

@mkrecek234
Copy link
Contributor Author

Just a hint for developers who still want support for concatenating multiple words as one word, so search terms like:
Lawyer "New York" or "Laurel and Hardy Industries" Spain just replace the regex in this line here:

ui/src/Grid.php

Line 340 in b113eb5

$qWords = preg_split('~\s+~', $q, -1, \PREG_SPLIT_NO_EMPTY);

by '~(?:\'[^\']*\'|"[^"]*")(*SKIP)(*F)|\h+~'

Then, quick search supports multi-words that can be glued together using apostrophes or double-quotes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants