Prioritizing Search Results (Enhancement) #1713

ckjpn opened this issue Nov 2, 2018 · 4 comments



commented Nov 2, 2018

Perhaps it's not possible now, and maybe never will be, but it would likely be useful to prioritize, or at least have the option to prioritize, search results as follows.

  1. sentences with audio
  2. sentences tagged OK
  3. sentences by native speakers
  4. and finally all other sentences.

For people searching for English sentences, the following would be a useful sequence.

  1. sentences with audio
  2. sentences on List 907
  3. sentences by native speakers
  4. and finally all other sentences.

It would probably also be good to give members the option to skip the List 907 step if they think that's not a good way to prioritize sentences.

@jiru jiru added the enhancement label Nov 2, 2018



commented Nov 3, 2018

What is the problem that this suggestion is meant to fix?

People can do this prioritization right now. That is, they can conduct advanced searches for sentences with audio, and/or sentences on any list they desire, and/or by native speakers. As for forcing this prioritization on them, I think it's a bad idea. First of all, there are bound to be people who want to search preferentially for sentences without these characteristics. For instance, they might want to find sentences WITHOUT audio (because they want to add it themselves, or because they like longer or more complicated sentences, which are less likely to have audio). Secondly, entrenching a list written by one person in the site's search algorithm would subject others to that person's prejudices and vastly reduce the diversity of sentences found by people using the search. The sentences in the Tatoeba corpus are already far more homogeneous than they should be, due largely to the efforts of a single person in this direction (in part by discouraging others from translating or recording audio for sentences not on or originating from his list). It almost seems like the problem that the suggestion is meant to fix is other people's ability to choose sentences for themselves.



commented Nov 5, 2018

I should have said to offer members this as an option.

I think it would be a good default option, though.

New users would likely have a more positive experience on the website if this were implemented for the main "simple" search.

Members who prefer other searches should be allowed to override this in advanced searches.



commented Nov 21, 2018

"People can do this prioritization right now."

People can't really do this prioritization right now without doing multiple searches.

This is just a follow-up, which may not matter if this kind of prioritization is not possible.

What happens now is that a member can search for sentences with audio and, if none are found, then search for sentences by native speakers, and if none are found there either, do another search with no limitations.

It would be nice if, when searching, you could see the sentences with audio first, followed by those without audio by native speakers, followed by all remaining sentences. This would be especially useful for search results with only a few audio files.

The point is to give visitors a positive experience, much like Google does when they prioritize search results.

For students of English, for example my students, it would be useful and beneficial if they could see sentences that they could trust, with some certainty, to be correct and natural-sounding. Sentences that have been recorded have gone through one extra proofreading, so are likely to be the most trustworthy, followed by sentences on List 907, followed by sentences owned by people who claim they are native speakers. The least trustworthy would be the remaining owned sentences, followed by the unowned sentences. It's often difficult for non-native speakers to assess whether a sentence is good or not.

For students of Japanese, and maybe most other languages, the sequence would likely be 1) sentences with audio 2) sentences tagged OK by native speakers 3) sentences by those claiming to be native speakers 4) all other sentences. At least with Japanese sentences, it seems that a lot of the orphan sentences and sentences owned by non-native speakers are about the same quality.
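The two orderings above could be expressed as a per-language list of tier checks with a stable sort over the search results. The following is only a rough sketch of that idea, not how Tatoeba's search works: the `Sentence` fields, the tier lists, and the function names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    # Hypothetical record; the field names are assumptions, not Tatoeba's schema.
    text: str
    has_audio: bool = False
    on_list_907: bool = False
    tagged_ok: bool = False
    by_native: bool = False


# Tier checks, most trusted first. Removing the List 907 check from
# ENGLISH_TIERS would model the "skip the List 907 step" option.
ENGLISH_TIERS = [
    lambda s: s.has_audio,
    lambda s: s.on_list_907,
    lambda s: s.by_native,
]
DEFAULT_TIERS = [
    lambda s: s.has_audio,
    lambda s: s.tagged_ok,
    lambda s: s.by_native,
]


def tier_of(sentence, tiers):
    """Return the index of the first tier the sentence satisfies."""
    for rank, check in enumerate(tiers):
        if check(sentence):
            return rank
    return len(tiers)  # "all other sentences"


def prioritize(results, tiers):
    """Order results by tier; sorted() is stable, so the original
    order is kept within each tier."""
    return sorted(results, key=lambda s: tier_of(s, tiers))
```

Because each sentence is assigned to the first tier it matches, a sentence that has audio and is also on List 907 appears only once, in the audio tier.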



commented Mar 14, 2019

This is just to help clarify what I think would be useful.

If a student wants to study sentences with the word "email", it would be useful if sentences were shown in this order.

  1. With Audio
    email - with audio (108 results)

  2. And then the additional 80 sentences that can be found with this search.
    email - on List 907 (188 results)

  3. And then the additional sentences shown on this search, that haven't already been shown.
    email - by self-proclaimed native speakers (232 results)

  4. And then the additional sentences shown on this search, that haven't already been shown.
    email - all results (329 results)

Perhaps if this is not possible, then maybe you could make it possible to do the 2nd search with a way to filter out sentences found by the first search (etc.). This would allow students to manually do 4 searches to get similar results.
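Both variants described above, the combined listing and the manual "filter out what the earlier search found" fallback, amount to merging result sets in order while dropping sentences already shown. A minimal sketch follows, assuming each search returns a list of sentence IDs; the function names and the toy IDs are made up for illustration.

```python
def merge_in_order(*searches):
    """Concatenate search results in the given order, keeping only the
    first occurrence of each sentence ID."""
    seen = set()
    merged = []
    for results in searches:
        for sentence_id in results:
            if sentence_id not in seen:
                seen.add(sentence_id)
                merged.append(sentence_id)
    return merged


def exclude_seen(results, already_shown):
    """Manual fallback: drop IDs that an earlier search already returned."""
    shown = set(already_shown)
    return [sid for sid in results if sid not in shown]


# Toy IDs only; these are not real Tatoeba sentence numbers.
with_audio = [3, 7]
on_list_907 = [3, 5, 7, 9]
by_native = [5, 7, 11]
all_results = [1, 3, 5, 7, 9, 11]

combined = merge_in_order(with_audio, on_list_907, by_native, all_results)
additional_from_list = exclude_seen(on_list_907, with_audio)
```

Here `combined` lists the audio sentences first, then the additional List 907 sentences, then the additional native-speaker sentences, then everything else, which mirrors steps 1 to 4.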
