Prioritizing Search Results (Enhancement) #1713

Closed
ckjpn opened this issue Nov 2, 2018 · 6 comments
Labels
enhancement Issue that describes a problem that requires a change in the current functionalities of Tatoeba.

Comments

@ckjpn

ckjpn commented Nov 2, 2018

Perhaps it's not possible now, and maybe never will be, but it would likely be useful to prioritize, or at least have the option to prioritize, search results as follows.

  1. sentences with audio
  2. sentences tagged OK
  3. sentences by native speakers
  4. and finally all other sentences.

For people searching for English sentences, the following would be a useful sequence.

  1. sentences with audio
  2. sentences on List 907
  3. sentences by native speakers
  4. and finally all other sentences.

It would probably also be good to give members the option to skip the List 907 step if they think that's not a good way to prioritize sentences.

jiru added the enhancement label Nov 2, 2018
@alanfgh
Contributor

alanfgh commented Nov 3, 2018

What is the problem that this suggestion is meant to fix?

People can do this prioritization right now. That is, they can conduct advanced searches for sentences with audio, and/or sentences on any list they desire, and/or by native speakers. As for forcing this prioritization on them, I think it's a bad idea. First of all, there are bound to be people who want to search preferentially for sentences without these characteristics. For instance, they might want to find sentences WITHOUT audio (because they want to add it themselves, or because they like longer or more complicated sentences, which are less likely to have audio). Secondly, entrenching a list written by one person in the site's search algorithm would subject others to that person's prejudices and vastly reduce the diversity of sentences found by people using the search. The sentences in the Tatoeba corpus are already far more homogeneous than they should be, due largely to the efforts of a single person in this direction (in part by discouraging others from translating or recording audio for sentences not on or originating from his list). It almost seems like the problem that the suggestion is meant to fix is other people's ability to choose sentences for themselves.

@ckjpn
Author

ckjpn commented Nov 5, 2018

I should have said to offer members this as an option.

I think it would be a good default option, though.

New users would likely have a more positive experience on the website if this were implemented for the main "simple" search.

Members who prefer other searches should be allowed to override this in advanced searches.

@ckjpn
Author

ckjpn commented Nov 21, 2018

> People can do this prioritization right now.

People can't really do this prioritization right now without doing multiple searches.

This is just a follow up, which may not matter if this kind of prioritization is not possible.

What happens now is that a member can search for sentences with audio, and if such sentences aren't found, the member can then do a search for sentences by native speakers, and if none are found, then do another search with no limitations.

It would be nice if, when searching, you could see the sentences with audio first, followed by those without audio by native speakers, followed by all remaining sentences. This would be especially useful for search results with only a few audio files.

The point is to give visitors a positive experience, much like Google does when they prioritize search results.

For students of English, for example my students, it would be useful and beneficial if they could see sentences that they could trust, with some certainty, to be correct and natural-sounding. Sentences that have been recorded have gone through one extra round of proofreading, so are likely to be the most trustworthy, followed by sentences on List 907, followed by sentences owned by people who claim they are native speakers. The least trustworthy would be the remaining owned sentences, followed by the unowned sentences. It's often difficult for non-native speakers to assess whether a sentence is good or not.

For students of Japanese, and maybe most other languages, the sequence would likely be 1) sentences with audio 2) sentences tagged OK by native speakers 3) sentences by those claiming to be native speakers 4) all other sentences. At least with Japanese sentences, it seems that a lot of the orphan sentences and sentences owned by non-native speakers are about the same quality.
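The ordering described above could be sketched as a stable sort on a trust tier. This is only an illustration: the `Sentence` record, its field names, and the `tier` and `prioritize` helpers are hypothetical and do not reflect Tatoeba's actual schema or search code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    # Hypothetical record; these field names are not Tatoeba's actual schema.
    text: str
    has_audio: bool = False
    vetted: bool = False     # on List 907 (English) or tagged OK (other languages)
    by_native: bool = False  # owner claims to be a native speaker


def tier(s: Sentence) -> int:
    """Return 0 for the most trusted tier, 3 for the least trusted."""
    if s.has_audio:
        return 0
    if s.vetted:
        return 1
    if s.by_native:
        return 2
    return 3


def prioritize(results):
    # sorted() is stable, so the incoming order is preserved within each tier.
    return sorted(results, key=tier)


results = [
    Sentence("d: plain"),
    Sentence("c: native", by_native=True),
    Sentence("a: audio", has_audio=True),
    Sentence("b: vetted", vetted=True, by_native=True),
]
ordered = [s.text for s in prioritize(results)]
print(ordered)  # ['a: audio', 'b: vetted', 'c: native', 'd: plain']
```

Because each sentence lands in the first tier it qualifies for, a sentence that is both vetted and by a native speaker is shown once, in the higher tier.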

@ckjpn
Author

ckjpn commented Mar 14, 2019

This is just to help clarify what I think would be useful.

If a student wants to study sentences with the word "email", it would be useful if sentences were shown in this order.

  1. With Audio
    https://tatoeba.org/eng/sentences/search?query=email&from=eng&to=und&orphans=no&unapproved=no&user=&tags=&list=&has_audio=yes&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=words
    email - with audio (108 results)

  2. And then the additional 80 sentences that can be found with this search.
    https://tatoeba.org/eng/sentences/search?query=email&from=eng&to=und&orphans=no&unapproved=no&user=&tags=&list=907&has_audio=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=words
    email - on List 907 (188 results)

  3. And then the additional sentences shown on this search, that haven't already been shown.
    https://tatoeba.org/eng/sentences/search?query=email&from=eng&to=und&orphans=no&unapproved=no&native=yes&user=&tags=&list=&has_audio=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=words
    email - by self-proclaimed native speakers (232 results)

  4. And then the additional sentences shown on this search, that haven't already been shown.
    https://tatoeba.org/eng/sentences/search?query=email&from=eng&to=und
    email - all results (329 results)

If this is not possible, perhaps you could make it possible to do the 2nd search with a way to filter out sentences already found by the first search (and so on). This would allow students to manually do 4 searches to get similar results.
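As a rough sketch of the "filter out what was already shown" idea, the four result lists could be concatenated in priority order while skipping repeats. The `merge_in_order` helper and the sentence IDs below are made up for illustration; they are not real Tatoeba data or code.

```python
def merge_in_order(*searches):
    """Concatenate result lists, skipping IDs an earlier search already returned."""
    seen = set()
    merged = []
    for results in searches:
        for sentence_id in results:
            if sentence_id not in seen:
                seen.add(sentence_id)
                merged.append(sentence_id)
    return merged


# Made-up sentence IDs standing in for the four searches above.
with_audio = [11, 12]
on_list_907 = [11, 12, 13]
by_native = [12, 13, 14]
all_results = [11, 12, 13, 14, 15]

merged = merge_in_order(with_audio, on_list_907, by_native, all_results)
print(merged)  # [11, 12, 13, 14, 15]
```

Each sentence appears once, at the position of the highest-priority search that returned it.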

@ckjpn
Author

ckjpn commented Jun 28, 2019

This Wall post is somewhat related.

https://tatoeba.org/eng/wall/show_message/32118#message_32118
Feature suggestion: Sort sentences by what should be translated first.

@trang
Member

trang commented Jul 28, 2019

The prioritization that is suggested here is way too biased for us to consider it as a default. I think we've started to agree that randomizing, rather than trying to guess what the user wants, is the better default option (cf. Wall thread).

Implementing custom levels of prioritization as an option might be possible, but it would overcomplicate the search feature. In your example, @ckjpn, even if someone could configure the search to show sentences with audio first, then sentences on List 907, then sentences from native speakers, they would end up looking only at the sentences with audio. People rarely go past the first few pages when using the search.

For me it's clear we won't implement this feature, so I will close this issue.

trang closed this as completed Jul 28, 2019
4 participants