Allow searches that are sorted by, rather than restricted by, a criterion #1804

Open
alanfgh opened this issue Mar 2, 2019 · 5 comments
Labels
enhancement Issue that describes a problem that requires a change in the current functionalities of Tatoeba.

Comments

@alanfgh
Contributor

alanfgh commented Mar 2, 2019

Currently, we can only require that search results satisfy a criterion. We can't ask for results that may or may not satisfy it, sorted so that the ones that do satisfy it appear at the top of the list. This means that I need to guess beforehand how many hits I'm likely to get. If I guess too low, I have to either page through lots of results that are not what I'm looking for or follow up with a more restrictive search. If I guess too high, I have to follow up with a less restrictive search.
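To make the difference between restricting and sorting concrete, here is a minimal Python sketch. It assumes search hits are plain sentence strings and that the criterion is a hypothetical predicate function; none of these names come from Tatoeba's actual search code.

```python
from typing import Callable, List

Criterion = Callable[[str], bool]


def restrict_by(hits: List[str], criterion: Criterion) -> List[str]:
    """Current behaviour: keep only the hits that satisfy the criterion."""
    return [hit for hit in hits if criterion(hit)]


def sort_by(hits: List[str], criterion: Criterion) -> List[str]:
    """Requested behaviour: keep every hit, but put the matching ones first."""
    # False sorts before True, so negating the criterion brings matches to the top.
    # sorted() is stable, so each group keeps its original relative order.
    return sorted(hits, key=lambda hit: not criterion(hit))


def contains_ate(sentence: str) -> bool:
    """Hypothetical criterion: the sentence contains the word "ate"."""
    return "ate" in sentence.split()


hits = ["I ate bread.", "She eats apples.", "He is eating.", "We ate together."]
print(restrict_by(hits, contains_ate))  # ['I ate bread.', 'We ate together.']
print(sort_by(hits, contains_ate))
# ['I ate bread.', 'We ate together.', 'She eats apples.', 'He is eating.']
```

Because Python's `sorted()` is stable, the engine's original ranking survives inside each group, so sorting by a criterion only moves hits that the criterion actually distinguishes.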

This commonly happens with exact versus stemmed matching. Let's say I'm looking for a verb. First, I use the default match, which (for the languages I'm learning) is stemmed. However, since the stemmer is imperfect, the results include many hits containing words that merely start with the same prefix but are not the verb I'm looking for. Then I need to run a second, exact-match search for the infinitive (prefixing it with "=").
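An exact-match-first ordering could look roughly like the sketch below. It assumes a hypothetical `naive_stem` function standing in for a real stemmer (Tatoeba's actual stemming happens inside the search engine and is not shown here). Hits whose words match the query exactly come first, followed by hits that only match after stemming, so a false positive such as "news" for the query "new" sinks below the exact matches instead of requiring a second search.

```python
import re
from typing import List


def tokens(sentence: str) -> List[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", sentence.lower())


def naive_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def exact_first_search(sentences: List[str], query: str) -> List[str]:
    """Return stemmed matches, with exact word matches placed before the rest."""
    q = query.lower()
    q_stem = naive_stem(q)
    exact, stemmed_only = [], []
    for sentence in sentences:
        words = tokens(sentence)
        if q in words:
            exact.append(sentence)
        elif any(naive_stem(w) == q_stem for w in words):
            stemmed_only.append(sentence)
        # Sentences matching neither way are not hits at all.
    return exact + stemmed_only


corpus = ["I read the news.", "This is new.", "A new car.", "The cat sleeps."]
print(exact_first_search(corpus, "new"))
# ['This is new.', 'A new car.', 'I read the news.']
```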

Note that Sphinx supports sorting modes:

http://sphinxsearch.com/docs/current.html#sorting-modes

so maybe we could take advantage of that functionality.

I mentioned this subject on the Wall:

https://tatoeba.org/eng/wall/show_message/31420#message_31420

jiru added the enhancement label Mar 4, 2019
@ckjpn

ckjpn commented Mar 6, 2019

Not exactly the same, but somewhat related.

#1713

@agrodet
Contributor

agrodet commented Jan 25, 2020

Is this solved by the "Relevance" sort option? (I'm not sure I fully understood your problem.)

@alanfgh
Contributor Author

alanfgh commented Jan 25, 2020

The "Relevance" option is indeed an example of the kind of "fallback sort" that I was envisioning. But it's a single case with fixed criteria. When I wrote this proposal, I was hoping we could generalize it so that we could perform custom fallback sorts using other criteria.

Since then, it has occurred to me that while it's more work to have to perform two searches, it's also work to figure out the point where results from the narrower (preferred) set of criteria end and the other results begin. That may in fact require more effort than simply starting a second search. So the feature I was asking for may not be as useful as I originally envisioned.
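The generalized fallback sort described above could be sketched as a lexicographic sort on a tuple of booleans, one per criterion in priority order. The criteria below (`has_cat`, `is_short`) are hypothetical examples, and this is only an illustration of the idea, not how the "Relevance" option is implemented.

```python
import re
from typing import Callable, List, Sequence

Criterion = Callable[[str], bool]


def fallback_sort(hits: List[str], criteria: Sequence[Criterion]) -> List[str]:
    """Sort hits by several criteria given in priority order.

    The key holds one boolean per criterion; `not c(hit)` is False (sorts first)
    when the hit satisfies c. Tuples compare element by element, so the first
    criterion dominates and later ones only break ties. sorted() is stable, so
    hits that tie on every criterion keep their original order.
    """
    return sorted(hits, key=lambda hit: tuple(not c(hit) for c in criteria))


def has_cat(sentence: str) -> bool:
    """Hypothetical criterion: the sentence contains the word "cat"."""
    return "cat" in re.findall(r"\w+", sentence.lower())


def is_short(sentence: str) -> bool:
    """Hypothetical criterion: the sentence has at most three words."""
    return len(re.findall(r"\w+", sentence)) <= 3


hits = ["The dog sleeps on the sofa.", "A cat.", "The cat sleeps on the sofa.", "A dog."]
print(fallback_sort(hits, [has_cat, is_short]))
# ['A cat.', 'The cat sleeps on the sofa.', 'A dog.', 'The dog sleeps on the sofa.']
```

Note that the sorted list itself does not show where one tier ends and the next begins, which is exactly the extra effort described in the paragraph above.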

@jiru
Member

jiru commented Mar 13, 2020

it's also work to figure out the point where results from the narrower (preferred) set of criteria end and the other results begin

I have been thinking about that too, since it’s already some work with the Relevance sort. In particular, for languages without word separators, like Chinese and Japanese, each character is treated as a word, so the "words not in same order" results that come at the end are mostly irrelevant, and it takes effort to filter them out mentally.

What if we somehow separated the results into different sections with horizontal bars that say something like "Exact matches with words in same order", "Exact matches with words in different order" or "Approximate matches"?
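The sectioned layout proposed above could be sketched roughly as follows. It assumes the search engine has already returned the hits and that a hypothetical `tier()` function re-classifies each one: same-order exact matches when the query words appear as one contiguous run, different-order exact matches when all query words appear somewhere, and everything else as approximate. This classification is an illustration only, not how Sphinx actually ranks results.

```python
import re
from itertools import groupby
from typing import Dict, List

SECTION_TITLES = [
    "Exact matches with words in same order",
    "Exact matches with words in different order",
    "Approximate matches",
]


def tokens(text: str) -> List[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def tier(sentence: str, query: str) -> int:
    """Hypothetical classifier: 0 = same order, 1 = different order, 2 = approximate."""
    words, q = tokens(sentence), tokens(query)
    n = len(q)
    # Query words appear as one contiguous run, in the query's order.
    if any(words[i:i + n] == q for i in range(len(words) - n + 1)):
        return 0
    # Every query word appears somewhere, but not as one contiguous run.
    if set(q) <= set(words):
        return 1
    # Anything else the engine returned (e.g. stemmed matches).
    return 2


def sectioned(hits: List[str], query: str) -> Dict[str, List[str]]:
    """Group hits under section titles; empty sections are omitted."""
    ordered = sorted(hits, key=lambda h: tier(h, query))  # stable within a tier
    return {
        SECTION_TITLES[t]: list(group)
        for t, group in groupby(ordered, key=lambda h: tier(h, query))
    }


def render(sections: Dict[str, List[str]]) -> str:
    """Plain-text rendering with a horizontal bar before each section."""
    lines: List[str] = []
    for title, group in sections.items():
        lines.append(f"---- {title} ----")
        lines.extend(group)
    return "\n".join(lines)


hits = ["Tea is green.", "I drink green tea.", "I like green teas."]
print(render(sectioned(hits, "green tea")))
```

A real implementation would presumably take the tier from the search engine's own match data rather than recompute it, but the grouping and rendering steps would look much the same.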

@alanfgh
Contributor Author

alanfgh commented Mar 13, 2020

What if we somehow separated the results into different sections with horizontal bars that say something like "Exact matches with words in same order", "Exact matches with words in different order" or "Approximate matches"?

Yes, that would be useful.


4 participants