Mass-tagging of sentences #785

Open
RyckRichards opened this issue Sep 27, 2015 · 48 comments
Labels
enhancement: Issue that describes a problem that requires a change in the current functionalities of Tatoeba.
unclear: The issue, its scope or the goal are not clearly identified.

Comments

@RyckRichards
Member

By Hybrid:

Here's my request:
Feature request: Mass-tagging of sentences

Could there be a way to tag a large number of sentences with the same tag quickly? There could be, like, checkboxes that you would check next to the sentences and then click on a button to apply a tag: "by Tom".

This would be useful when adding many sentences from the same source. For example, adding many sentences from the story Pinocchio by Carlo Collodi.

@RyckRichards
Member Author

Requested by Hybrid on Tatoeba Day #8 and Tatoeba Day #9, and by Ricardo14 on Tatoeba Day #11.

@ckjpn

ckjpn commented Oct 11, 2015

One possible way to implement what Hybrid seems to want to do is to come up with a way to easily add the previously-used tag, similar to how some software has the feature to repeat previously used formatting, etc.

This wouldn't really be "mass tagging", perhaps, but would allow members who sequentially contribute things that need the same tag to easily do so. I think doing something like this would help Hybrid do what he wants without unnecessarily doing a lot of programming and introducing the possibility that some members may accidentally mis-tag a lot of sentences.

@Guybrush88

Personally, I would find checkboxes very useful for tagging lots of Italian sentences (with tags like 'OK' and tags for verb tenses). This would greatly improve the proofreading of sentences by users who have contributed a lot of sentences (more than 1,000), since it is quite painful to do that one by one. An example: from time to time I proofread riccioberto's Italian sentences (https://tatoeba.org/ita/sentences/of_user/riccioberto/ita) to check for typos, but it takes a long time to tag them accordingly when I find the sentences correct (or in need of changes), because he owns more than 6,000.

@trang added the enhancement label Oct 11, 2015
@RyckRichards
Member Author

+1

@trang
Member

trang commented Oct 11, 2015

When I think about the situations where mass-tagging is needed, I imagine the following:

  1. You are adding new sentences, and several of these sentences could be tagged with the same tag.
  2. You are browsing sentences of a certain user, and you would like to tag some of them.
  3. You are searching sentences, and you would like to tag the sentences in the results.

Right now I don't have a clear vision of what would be the best way to implement this.

One possibility would be:

  • We display a button "Tag sentences" somewhere in the right-side column.
  • When the user clicks on the button, checkboxes appear next to each sentence and the button "Tag sentences" is replaced by a form (textinput + "Apply" and "Cancel" buttons).
  • If the user clicks on "Cancel", the form disappears, and the "Tag sentences" button appears again.
  • Otherwise, the user clicks on the sentences that they want to tag.
  • The user enters the name of the tag and clicks on "Apply", then all the sentences that were selected are tagged.
  • Once the tagging has been done, the checkboxes disappear, as well as the tag form, and the "Tag sentences" button appears again.

Some problems:

  • What happens if the user tags several sentences with the wrong tag? We would need the possibility to easily revert the mass-tagging.
  • We don't have good visual feedback to confirm that the sentences have been tagged. The user won't clearly see which sentences have indeed been tagged, and what they have been tagged with. We probably need to think of a way to implement "Display tags in sentences block" (#790) before we implement mass-tagging.

@ckjpn

ckjpn commented Jan 3, 2016

While this isn't exactly the solution being asked for, one easy solution for anyone wanting to add the same tag to several sentences in a series is to copy the tag, and then just paste it in on the next sentence. That's what I often do.

Another, more geeky solution would be to explain to members how they can create a JavaScript bookmarklet to add a certain tag to the sentence of the page they are on. A member could make several different bookmarklets, one for each tag they often use. This would accomplish what the person requesting this feature wants without cluttering up the pages on tatoeba.org.

@RyckRichards
Member Author

We could also add an extra "field" on the main page (maybe accessible to CMs and Admins) which would display the latest tagged sentences as we have for Latest Contributions , for example - https://tatoeba.org/eng/contributions/latest - and last comments - https://tatoeba.org/eng/sentence_comments/index .

@RyckRichards self-assigned this Feb 24, 2016
@RyckRichards
Member Author

Here's why I really would like that implemented:

Since I am learning several languages at the same time and I'm a teacher too, I don't really like to talk to anyone or read anything except in his/her/its native language. I do like to understand what the person is saying, which might not be possible if we don't do that in our native language. There are words that can't even be translated. For example, "tag": it is roughly equivalent to "label", but we don't say that in Spanish, Portuguese or French. We use SIMILAR (not equal) words. Another good example is "pet". It was not translated into Portuguese, for example: "(Eu) fui a um pet shop ontem." (I went to a pet shop (a vet clinic) yesterday.) Etc.

OK, and how is that related to tags?

There are so many scenarios in which tags are important to me.

  1. Personal studies

I don't take courses here in Brazil because they are too expensive: 150 to 200 euros a month, while a teacher here earns about 200 euros a month (the "luckier" ones earn 300 to 400 euros).
I usually browse for sentences on Tatoeba and then tag them: verb tense, person, "topic" (weather, wish, relatives, etc.). After doing that, I browse for the tags that I want to study.
Example: I want to study the present perfect in English. So, I browse sentences tagged as present perfect and I do the following:

  • English sentences
    ► If it's a sentence that I can easily translate and that has no words which I haven't studied before, I add it to the list https://tatoeba.org/eng/sentences_lists/show/3351 (About classes / To use in my classes). This way, I save sentences that I can show to my students later.
    ► If it's a sentence that I can translate fairly easily and that contains (no) words which I haven't studied before, I don't translate it right away. I "mark as OK", study the words, and then translate the sentence and remove the mark. After this process, I usually add it to list 3351 and to the list https://tatoeba.org/eng/sentences_lists/show/3547 ( !! English ).
    ► If it's a sentence that I can't translate, I just mark it as OK to study it later. I ask someone else what it means. I also study it on websites such as lang8, duolingo, busuu and livemocha, or even talk to my friends on WhatsApp. Once I understand its meaning completely, I remove the mark, translate it and add it to both lists (3351 and 3547).

  • Sentences in other languages

I follow the same principles, but I add them to the list https://tatoeba.org/eng/sentences_lists/show/4065 . I often ask someone to translate the sentences in there.

As you see, all of this began using just one Tag.

A big problem I found is that just English sentences have been tagged (sometimes Italian and Portuguese ones). I'd like to have all sentences tagged someday...

  2. For my students

I teach English, Spanish and Portuguese. I also prepare my materials using Tatoeba and its tags.

For example, when I am going to teach how we talk about the weather, no matter in which language, I compare the student's native language with the language s/he is studying with me. (In Portuguese, we say "Está frio" (without a pronoun), and in English "It's cold" (pronoun added).) So I browse for the tag "weather" and see which sentences I can use (especially the ones that are often "omitted" from textbooks here in Brazil). I take notes and add those sentences to a list that I usually delete after the class. Each time I am going to teach about the weather, I do the same thing.

As an exercise, I ask my "high-level" students to translate sentences on Tatoeba (after I have watched them translating sentences by themselves and "offline"). I always ask them to translate sentences on a specific topic or tag (example: Present Simple).

Many students come and ask me for sentences in Portuguese in the "presente do indicativo", for example. I tag those sentences and send the link to them (or, after tagging, I just send some sentences to them).

  3. Translating on Tatoeba

There are too many sentences in English that have not been translated into Portuguese, and that worries me a lot since many people around the world are studying this language. I mostly use the advanced search, enter what "kind" of sentence I want to translate (Present Simple, Past Simple, location, etc.), and then start translating (e.g.: https://tatoeba.org/eng/sentences/search?query=&from=eng&to=por&orphans=no&unapproved=no&user=&tags=present+simple&has_audio=yes&trans_filter=exclude&trans_to=por&trans_link=direct&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=random ).

@RyckRichards
Member Author

Hybrid says that he usually adds many sentences from the same actor or with the same verb partner, and that it would be easier if there were checkboxes on the sentences page, so that he could easily add more than one tag.

Personally, I would find checkboxes very useful for tagging lots of Italian sentences (with tags like 'OK' and tags for verb tenses). This would greatly improve the proofreading of sentences by users who have contributed a lot of sentences (more than 1,000), since it is quite painful to do that one by one. An example: from time to time I proofread riccioberto's Italian sentences (https://tatoeba.org/ita/sentences/of_user/riccioberto/ita) to check for typos, but it takes a long time to tag them accordingly when I find the sentences correct (or in need of changes), because he owns more than 6,000.

One problem for him might be that you can only show 100 sentences or something per page.

"What happens if the user tags several sentences with the wrong tag? We would need the possibility to easily revert the mass-tagging."

He says that's a good point and maybe there could be a button on the right for "add tag" with a textbox and a button for "remove tag" with a textbox (to write which one you want to remove).

"We could also add an extra "field" on the main page (maybe accessible to CMs and Admins) which would display the latest tagged sentences as we have for Latest Contributions , for example - https://tatoeba.org/eng/contributions/latest - and last comments - https://tatoeba.org/eng/sentence_comments/index ."

He thinks that it would be a good idea to create a log of all the tags that are being added and removed and that it should be accessible to everyone.

@Guybrush88

He thinks that it would be a good idea to create a log of all the tags that are being added and removed and that it should be accessible to everyone.

Just my two cents: I don't think this would be a good idea. Limiting it to admins and corpus maintainers (any suggestions about advanced contributors?) would probably prevent abuse of this feature. Mistakes can still happen to anyone, but this could prevent serious damage (e.g. people/bots deliberately vandalizing the corpus by mass-tagging sentences with harmful tags, such as spam, insults, etc.).

@RyckRichards
Member Author

One way to make this feature reliable is to make it available only to some users (members who requested it, members who have expertise, members who know the basics of coding, etc.). I believe it's not that hard to write a kind of script which tags a sentence whenever it's in Portuguese and starts with "Eu", for example.

It'd save us a lot of time and also improve the corpus quality, since we would have more sentences carrying tags, which were created to clarify sentences.
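
As a rough illustration of the kind of script mentioned above, here is a minimal Python sketch. It assumes the sentences are available offline as tab-separated `id`, `lang`, `text` rows (an assumed layout), and the tag name `first person singular` is only a hypothetical example. It does not tag anything on the site; it only prints candidate lines that a member could review.

```python
import csv
import io


def select_sentences(rows, lang, first_word):
    """Yield (sentence_id, text) for rows in `lang` whose text starts with `first_word`."""
    for sentence_id, row_lang, text in rows:
        # Require a following space so that "Europa ..." does not match "Eu".
        if row_lang == lang and text.startswith(first_word + " "):
            yield int(sentence_id), text


# Hypothetical sample data in the assumed id<TAB>lang<TAB>text layout.
SAMPLE = (
    "1\tpor\tEu gosto de café.\n"
    "2\teng\tI like coffee.\n"
    "3\tpor\tEuropa é um continente.\n"
    "4\tpor\tEu sou professor.\n"
)

rows = csv.reader(io.StringIO(SAMPLE), delimiter="\t", quoting=csv.QUOTE_NONE)
matches = list(select_sentences(rows, "por", "Eu"))

for sentence_id, _text in matches:
    # "first person singular" is a made-up tag name for this example.
    print(f"{sentence_id}\tfirst person singular")
```

A rule this simple will miss sentences such as "Eu, ..." and cannot judge meaning, so the output is better treated as a list of suggestions than as final tags.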

@RyckRichards
Member Author

It has been discussed again today: https://tatoeba.org/fra/wall/show_message/32221#message_32221

Again, it would help members who want to help members and non-members study a certain language. I myself use both CK's and Guybrush's tags to study English and Italian. However, as soliloquist says, "I, too, have thousands of sentences that need to be tagged, but it's discouraging having to visit each sentence's page." In other words, we are losing a chance to get more sentences tagged.

@soliloquist-tatoeba

I think making it easier to tag sentences might be possible without adding a complicated mass-tagging feature.

Similar to the add-to-list icon, a tag icon that opens a text box for adding tags when clicking on it could save a lot of time. In this way, we could tag all sentences on pages with multiple sentences without leaving that page and going to sentences' pages separately.

It's not mass-tagging, but it'll make tagging easier.


@ckjpn

ckjpn commented Jul 18, 2019

1. Here's one possibility.

Allow the data to be imported in a tab-delimited text file, similar to the way sentences used to be able to be imported.

Sentence_number + tab + tag_name

20763	imperative
20763	present simple
22016	imperative
22016	present simple
22061	imperative
22061	present simple
37902	imperative
37902	present simple
266065	imperative
266065	present simple
348091	imperative
348091	present simple
433491	imperative
433491	present simple
565931	imperative
565931	present simple
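
To make the proposed format concrete, here is a minimal Python sketch of how such a file might be read and validated before anything is applied. The function name `parse_tag_lines` and the choice to collect bad lines instead of stopping are assumptions for illustration, not an existing Tatoeba importer.

```python
from collections import defaultdict


def parse_tag_lines(lines):
    """Parse 'sentence_number<TAB>tag_name' lines.

    Returns (tags_by_sentence, errors), where errors holds
    (line_number, raw_line) for every line that could not be parsed.
    """
    tags_by_sentence = defaultdict(list)
    errors = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        sentence_id, separator, tag = line.partition("\t")
        tag = tag.strip()
        if not separator or not sentence_id.strip().isdigit() or not tag:
            errors.append((line_number, line))
            continue
        sid = int(sentence_id)
        # Skip a tag that is repeated for the same sentence.
        if tag not in tags_by_sentence[sid]:
            tags_by_sentence[sid].append(tag)
    return dict(tags_by_sentence), errors


sample = [
    "20763\timperative\n",
    "20763\tpresent simple\n",
    "22016\timperative\n",
    "not a valid line\n",
]
parsed, rejected = parse_tag_lines(sample)
print(parsed)
print(rejected)
```

Reporting rejected lines with their line numbers would let whoever runs the import send the file back for correction instead of importing half of it silently.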

Members could work with the sentences.csv file and the tags.csv file offline.

Step 1.
Grab all lines in the sentences.csv file in your own language.

Step 2.
Filter out sentences that already have the tags one is going to be using.

Step 3.
Find all sentences that need a given tag.

Step 4.
Create the tab-delimited text file to be imported.

This would work for me, and would be similar to the way we used to be able to do this by sending URLs.
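
The four steps above can be sketched in Python roughly as follows. This assumes the sentence data has already been loaded as `(id, lang, text)` tuples and the tag data as `(id, tag)` tuples (assumed layouts), and it uses a hypothetical `needs_tag` function as a stand-in for the human decision in Step 3.

```python
def build_import_lines(sentences, existing_tags, lang, tag, needs_tag):
    """Return 'sentence_number<TAB>tag_name' lines for sentences that should get `tag`."""
    # Sentences that already carry this tag (used in Step 2).
    already_tagged = {sid for sid, name in existing_tags if name == tag}
    lines = []
    for sid, row_lang, text in sentences:
        if row_lang != lang:           # Step 1: keep only one language
            continue
        if sid in already_tagged:      # Step 2: skip sentences that have the tag
            continue
        if needs_tag(text):            # Step 3: decide whether the tag applies
            lines.append(f"{sid}\t{tag}")  # Step 4: emit the import line
    return lines


# Hypothetical sample data.
sentences = [
    (1, "eng", "Come here."),
    (2, "eng", "He came here."),
    (3, "eng", "Sit down."),
    (4, "fra", "Viens ici."),
]
existing_tags = [(3, "imperative")]


def looks_imperative(text):
    """Crude stand-in for a member's judgement; real tagging needs a human."""
    return text.split()[0] in {"Come", "Sit"}


import_lines = build_import_lines(
    sentences, existing_tags, "eng", "imperative", looks_imperative
)
print("\n".join(import_lines))
```

In practice Step 3 is where the member's time goes; the script only removes the bookkeeping around it.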

2. Here's another possibility.

Have a form that only allows importing one type of tag at a time.
This would be less convenient, especially for tagging things like tenses, since it would be nice to be able to go through sentences and add different tense names to sentences as you read them.
This would also be less convenient for sentences that could be tagged with several tags, such as a tense (simple present), situation (restaurant), function (request), etc.

3. Similar to 1.

Maybe I'd like this one best.

Allow one sentence number to have several tags on the same line, comma-delimiting the tags

Sentence_number + tab + tag_name,tag_name2,tagname3,tagname4

2218475	compliment,SVC,present simple
1056859	SVC,job,present simple
2698873	frequency,location,present simple
1846830	restaurant,bar,future simple
955157	SVC,possession,present simple
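
A minimal Python sketch of reading this variant, and of expanding it into the one-tag-per-line form from option 1, might look like the following. The function names are hypothetical and this is not an existing importer.

```python
def parse_multi_tag_line(line):
    """Parse 'sentence_number<TAB>tag1,tag2,...' into (sentence_id, [tags])."""
    sentence_id, separator, tag_field = line.rstrip("\n").partition("\t")
    if not separator or not sentence_id.strip().isdigit():
        raise ValueError(f"expected '<sentence_number><TAB><tags>': {line!r}")
    # Tag names may contain spaces ("present simple"), so split on commas only.
    tags = [tag.strip() for tag in tag_field.split(",") if tag.strip()]
    return int(sentence_id), tags


def expand_to_single_tag_lines(line):
    """Convert one multi-tag line into several 'sentence_number<TAB>tag' lines."""
    sentence_id, tags = parse_multi_tag_line(line)
    return [f"{sentence_id}\t{tag}" for tag in tags]


example = "2218475\tcompliment,SVC,present simple"
print(parse_multi_tag_line(example))
print(expand_to_single_tag_lines(example))
```

One consequence of this format is that a tag name containing a comma could not be represented, so it would only suit tags without commas.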

Note

For importing, it could be similar to the way sentences were imported, allowing an admin to import tags for other members. Maybe this wouldn't be the ideal way, but it could be a temporary solution until we see how things work.

@ckjpn

ckjpn commented Jul 19, 2019

If soliloquist-tatoeba's idea were to be used, it would be a good idea to also show the same thing when any new sentence or translation were added, since that would likely be a time when people would want to tag a sentence.

Link to the idea.
#785 (comment)

Perhaps that idea should be another issue. It would be useful to have that function in addition to mass tagging.

@soliloquist-tatoeba

Allow one sentence number to have several tags on the same line, comma-delimiting the tags

Sentence_number + tab + tag_name,tag_name2,tagname3,tagname4

@ckjpn

That's a good idea. It would be even better if it included mass-adding sentences to lists, too.

Sentence_number + tab + list_number1,list_number2,list_number3.......

1234567 876,678,1234

Perhaps that idea should be another issue. It would be useful to have that function in addition to mass tagging.

I've opened a new issue for that: #1923

@trang
Member

trang commented Jul 27, 2019

Before we consider implementing mass-tagging and how it could be done, we have to take into account that tags are quite messy, and mass-tagging might just make things worse.

For the context, tags were implemented back in 2010. After several months, we figured it was quite a mess and tried to tidy them up.

Someone rightfully wrote in the comments of the second blog post:

I'm a new member here but I think that you're opening a bad can of worms if you allow too many people to create tags.

This is something that should be sorted out by moderators first. I'm not usually in favor of top-down solutions but this is one where it makes sense.

And I think it turned out very true. We were not able to maintain the tags in the long run. Today, tagging is a free-for-all activity. Contributors are not consulting each other before creating new tags. We have many duplicate tags and many "personal" tags.

There are some questions I've asked myself, for which I do not really have a clear answer.

  • What's really the difference between tags and lists?
    • When you add a tag on a sentence, what actually justifies that you tag a sentence instead of adding it to a list?
    • Why is it more useful with a tag than with a list?
  • Is it actually okay to let other members tag sentences that they do not own?

Mass-tagging is a solution to something. But to what exactly? If we take the original request, the use-case provided by Hybrid is about adding many sentences from the same source or same author (if the sentence is taken from a book, an article, etc).

This doesn't have to be solved with mass-tagging.

First of all, there's the assumption that the original author of a sentence should be mentioned as a tag. But why should it be a tag actually? Why don't we have a new field that would store this information? Just like we have a field that indicates what's the language of a sentence, another field that indicates who's the owner of the sentence, we could have a field that indicates who is the original author in case of a copied sentence. Why not?

Second of all, wouldn't it be more practical to be able to add this information before creating the sentences and not after? To make the analogy with the language selection, let's imagine our language detection was extremely bad and never detected the correct language. If you know that the next 20 sentences you're adding are going to be in English, you would select it in the language dropdown. It's more practical that way than adding 20 sentences with "auto-detect", then having a feature to mass-edit the language. The same can be done for Hybrid's use-case. If you know that your next 20 sentences are from the same author, then why not have a way to select the author before adding the sentences?

So let's forget about mass-tagging for a moment. We need to identify the problems. What are the scenarios in which people have to repeatedly add the same tag? Each scenario could be an issue on its own, and may not be solved with mass-tagging but with something else.

Hybrid's scenario is:

I'm adding several sentences in a row from the same author. But then I have to go to the sentence's page for each sentence to add the "by " tag and it's tedious.

CK's scenario, I assume, is something like:

I'm proofreading sentences offline from the downloads file. Along the way, I note in a text file the ID of each sentence and tense of the verb used in the sentence (imperative, present simple, etc). But then I have to manually set them on the website and it's tedious.

I need concrete scenarios that describe the current working mode and indicate:

  • What are the sentences involved (is it sentences from search results, from latest contributions, from the download files, from the sentences added...)
  • What are the tags involved ("by " tags, verb tense tags...)

Note: I'm leaving this issue open for discussion, but it will eventually be closed and each use-case/scenario of mass-tagging will be handled in a separate issue.

@trang added the unclear label Jul 27, 2019
@soliloquist-tatoeba

So assuming there was a way for you to publish your lists on the sentence's page, to make them visible to everyone, what would then be the difference between lists and tags?

Then the difference would be rather cosmetic and it wouldn't matter much if I used tags or lists.

@soliloquist-tatoeba

soliloquist-tatoeba commented Jul 27, 2019

By the way, one difference between tags and lists is that tags are collaborative. All sentences with the same tag can be viewed together no matter how many different users added them. It's more difficult with lists as the list needs to be collaborative and the other users need to be aware of that list. Also, it's not possible yet to merge contents of multiple lists created by different users as discussed on #1704.

@ckjpn

ckjpn commented Jul 28, 2019

By the way, one difference between tags and lists is that tags are collaborative ...

I agree with this and think this is a very powerful function of tags.
In fact, perhaps we could even do away with collaborative lists, which would help with the problem mentioned in #1911.

Also, I think I've mentioned this earlier, but tags can be a very useful tool for students who want to search for sentences that have been tagged with tenses (present simple, etc.), situations (restaurant, etc.), functions (requests, etc.), and so on.

Students may want to search for all such sentences with translations into their own languages, or members may want to search for all such sentences that yet need translations into their own native languages.

I think it would benefit such people, and thus also the Tatoeba Project, to make it easier for educators and researchers to tag sentences, similar to what I've been able to do up to now.

@trang
Member

trang commented Jul 28, 2019

Just to be clear, when I ask what's the difference between list and tags, I'm not exactly looking to know what's the difference functionally speaking. I have been closely involved in the implementation of both these features, so I know very well what's the difference in terms of functionalities :)

I'm more looking to understand how everyone is interpreting the notion of lists and the notion of tags. In other words, forget about how things are implemented now. Just imagine that we implemented all the possible features for lists and all the possible features for tags that you've dreamed of. What then, would be the difference between lists and tags?

On the collaborative aspect, just imagine that instead of the endless dropdown to add a sentence to a list, you have a text input with auto-suggestion. Just like tags have auto-suggestion. And so if there was a collaborative list named "Present simple", you would see it as a suggestion when you start typing "Pre...". And imagine it is easy to merge lists. So if I created a list "Present simple" and you created one too, I could easily transfer the sentences from my list into yours if you would agree. And imagine we also have macro-lists. What now?

As a contributor, what would drive you to add a tag instead of adding to a list? And I should also ask, as a learner, what would drive you to search in the tags rather than searching in the lists?

@trang
Member

trang commented Jul 28, 2019

I posted on the Wall in case non-GitHub users want to participate in the discussion: https://tatoeba.org/eng/wall/show_message/32260#message_32260

@alanfgh
Contributor

alanfgh commented Jul 28, 2019

For me, conceptually, a list is an enumeration. Associating a list with a sentence is saying "This sentence is a member of a group." I generally think of a list as serving a particular purpose (keeping track of the next batch of 100 sentences that I want to upload to Anki, for instance). It tends to be of a conceptually finite size that is manageable for its purpose.

By contrast, I think of a tag as a descriptor. Associating a tag with a sentence is saying "This sentence has this attribute." It says nothing about how many other sentences have that attribute.

As we know, historically, lists could only be downloaded if they had 100 or fewer sentences, and there was no simple way to download sentences with a particular tag. Furthermore, from a list of sentences, a sentence could be assigned to a list without leaving the window, while adding a tag to a sentence required going to that sentence's page. Also, there was no obvious association between a tag and a contributor, while most lists are associated with a single contributor, and even when they're collaborative, there are generally only a small number of contributors who use it (or so I surmise). While the first restriction I mentioned has been changed, and others could as well, history matters. Now that I've created 60+ lists for my personal use, I've gotten used to the rhythm of finishing a list at 100 sentences, uploading it to Anki, marking it inactive, and starting a new list. I wouldn't want to shift to using tags for the same purpose just because I suddenly had that option. That would introduce an inconsistency. I would feel the same way about seeing other people make that shift: after years of seeing lists like "100 Chuvash sentences I want to learn" and tags like "simple present", I would not want to see sentences associated with tags like "100 Chuvash sentences I want to learn" and lists like "simple present". I would find that jarring and disorienting (and I think it would seriously confuse new contributors).

I can easily see the utility of eliminating the restriction on the size of lists to be downloaded, or introducing mass tagging of sentences, or otherwise allowing sentences to be tagged from list views. But I don't think this should serve as an opportunity for us to erode the useful connotations of "list" and "tag" that we have built up over the course of Tatoeba's existence. That seems like introducing chaos for no good reason.

@trang
Member

trang commented Jul 28, 2019

I realize I may have sounded like I want to merge lists and tags into one feature, but be assured that it's not the case. I do have my own answer as to what the difference between lists and tags is, but it is my personal definition. While the distinction is somewhat clear to me, what is not clear is whether my definition of lists and tags is a valid definition for Tatoeba. Hence this discussion. I am interested to know what people have to say, because when looking at how lists and tags are used in practice, I feel that not everyone may share the same definition.

I would not want to see sentences associated with tags like "100 Chuvash sentences I want to learn" and lists like "simple present".

So this is interesting. Let's imagine for a moment that we introduce mass-tagging and someone starts tagging sentences with "100 Chuvash sentences I want to learn". Their reasoning is: "it's easier for me to use tags because there's no way of mass-listing".

  • How much control should Tatoeba have over that? Would it be okay for us to force them to use the lists instead?
  • Conversely, should we force someone who created a collaborative and public list named "present simple" to remove their list?

To elaborate, I'm asking all of this because of several reasons.

  1. If we agree that Tatoeba should have control over what tags are being created, this would be a clear difference between tags and lists: tags are regulated and fulfilling a common need, while lists are not. But then, surely, we want to do things in the right order and take care of issues like "Prevent users from adding unnecessary new tags" (#305), "Allow admins/corpus maintainers to merge tags" (#961) or "Add a way to delete tags from the set of possible tags through the UI" (#330) before we consider mass-tagging. We should as well nominate an official tag maintainer.

  2. If on the other hand we agree that regulating tags is unsustainable and we should let people tag as they want, then we can implement mass-tagging anytime. But then we should consider, among other things, opening up tags to all contributors. Because why should it be just reserved for advanced contributors and corpus maintainers?

  3. The request of mass-tagging is coming from those who can tag, but I have not heard much from those who cannot tag. Is this really impacting and benefiting them? Assuming we keep tags as a restricted feature, wouldn't it be unfair to implement mass-tagging in priority over mass-listing? Especially if in the end, what is being done with tags can be achieved by lists too.

  4. The tags and the lists features need to evolve, that is for sure. But they have to evolve in a way that they don't overlap each other. If a user has the choice between tagging and adding to a list, and they feel it makes no difference whichever they choose, then something doesn't make sense. If a user ends up using tags when using lists would have been better (or vice versa), then something doesn't make sense. Obviously we want to avoid that. Understanding the difference between tags and lists is necessary for the long term, to build features that make sense.

@ckjpn

ckjpn commented Jul 29, 2019

what would drive you to search in the tags rather than searching in the lists?

One reason is that it's possible to search with more than one tag.

Here is an example of a search for imperative sentences that would be used in a restaurant.

https://tatoeba.org/eng/sentences/search?from=eng&tags=imperative%2C+restaurant
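
For anyone who wants to generate such links for several tag combinations, here is a small Python sketch that builds the same URL. The parameter names `from` and `tags` are taken from the link above; the helper name `tag_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def tag_search_url(lang, tags):
    """Build a Tatoeba search URL for sentences in `lang` carrying all of `tags`."""
    # Tags are passed as one comma-separated value, as in the link above.
    query = urlencode({"from": lang, "tags": ", ".join(tags)})
    return f"https://tatoeba.org/eng/sentences/search?{query}"


url = tag_search_url("eng", ["imperative", "restaurant"])
print(url)
```

With lists, the equivalent would need a way to intersect two lists, which is the point being made here.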

@ckjpn

ckjpn commented Jul 29, 2019

(2) ... Because why should it be just reserved for advanced contributors and corpus maintainers?

My feeling is that just about anybody who has been on the website for long enough to understand how it works should be given rights to tag sentences. Perhaps we need to be a little more careful about who gets the rights to link and unlink sentences, especially unlinking.

@ckjpn

ckjpn commented Jul 29, 2019

I think of a tag as a descriptor.

I agree with this, and for the most part, with the exception of the @ tags and the tags for quality, that's how they seem to be used, at least the ones with the most tags.
https://tatoeba.org/eng/tags/view_all

@ckjpn

ckjpn commented Jul 29, 2019

If you want to regulate tags, one possibility would be to not allow members to create new tags, but request new tags and have them added by an admin, or perhaps a corpus maintainer.

We could further limit tags to those that fit into certain categories if you wanted to. This would prevent tags like "100 Chuvash sentences I want to learn".

At one time, I suggested sorting tags into categories and have a demo page for that online somewhere.

  • tenses
  • functions
  • situations
  • politeness levels
  • gender
  • author (by ...)
  • etc.

Being able to look at a list of tags sorted by categories would make it more obvious to people what tags are used for.

@ckjpn

ckjpn commented Jul 29, 2019

On a related note, it might be a good idea to also allow mass untagging of one's own tags.
This would allow a member to undo errors.

@ckjpn

ckjpn commented Jul 29, 2019

Mass-tagging is a solution to something. But to what exactly?

A member could go through the exported data, choosing appropriate tags for sentences and then add the tags to sentences without needing to spend all the time that it would take to visit each page, choose a tag, wait for the tag to be added, and then choose the next tag.

Mass tagging would save a lot of time, making more efficient use of volunteers' time.
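The offline workflow described above might look roughly like the sketch below. The two-column, tab-separated format (sentence ID, tag) is an assumption made for illustration, not an actual Tatoeba import format, and the sentence IDs are invented. The script only groups the chosen tags so that each tag could be submitted for many sentences at once; how the result would be uploaded is left open.

```python
import csv
import io
from collections import defaultdict


def group_by_tag(tsv_text):
    """Map each tag to the sorted list of sentence IDs chosen for it.

    Assumes one "sentence_id<TAB>tag" pair per line (hypothetical format).
    """
    groups = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        sentence_id, tag = row
        groups[tag.strip()].add(int(sentence_id))
    return {tag: sorted(ids) for tag, ids in groups.items()}


# Made-up sentence IDs, for illustration only.
sample = "101\tpresent simple\n102\tpast simple\n103\tpresent simple\n"
print(group_by_tag(sample))
```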

Imagine how nice it would be to have a large number of our sentences tagged with at least tenses, functions, situations and maybe a few others.

@Guybrush88

Mass tagging would save a lot of time, making more efficient use of volunteers' time.

Imagine how nice it would be to have a large number of our sentences tagged with at least tenses, functions, situations and maybe a few others.

That's my main idea for mass tagging sentences.
I generally add tags for tenses to give more detailed examples to users who want to study and/or teach languages with Tatoeba through complete sentences. With mass tagging, I could add many more tags for this purpose in the same amount of time.

@soliloquist-tatoeba

It's better to use lists for some purposes like:

  • "X sentences I want to learn in the Y language"
  • "100 must-know phrases for beginners"
  • "My original sentences"

On the other hand, tags like '@change' and 'literal translation' may not work well with lists.

But for many other purposes (e.g. weather, football, maths etc.), both tags and lists would work fine. It's just a matter of choice which one we use. If we focus on this (rather large) gray area, then it might seem redundant to have both these features.

@ckjpn

ckjpn commented Jul 30, 2019

But for many other purposes (e.g. weather, football, maths etc.), both tags and lists would work fine.

I don't agree with this. Tags would be much better, since someone looking for such sentences would want all such sentences, and not just one member's listed sentences.

If I wanted to find English sentences in the present simple tense, I'd want to find sentences tagged as such by others and not just sentences on a list I made.

@soliloquist-tatoeba

@ckjpn
Yes, I'm aware of that current disadvantage with lists, but my comment was rather based on Trang's hypothetical approach, so I ignored it.

I'm more looking to understand how everyone is interpreting the notion of lists and the notion of tags. In other words, forget about how things are implemented now. Just imagine that we implemented all the possible features for lists and all the possible features for tags that you've dreamed of. What then, would be the difference between lists and tags?

On the collaborative aspect, just imagine that instead of the endless dropdown to add a sentence to a list, you have a text input with auto-suggestion. Just like tags have auto-suggestion. And so if there was a collaborative list named "Present simple", you would see it as a suggestion when you start typing "Pre...". And imagine it is easy to merge lists. So if I created a list "Present simple" and you created one too, I could easily transfer the sentences from my list into yours if you would agree. And imagine we also have macro-lists. What now?
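To make the two imagined features concrete, here is a small Python sketch of name auto-suggestion and list merging. Everything in it is hypothetical: the function names, the use of plain sets of sentence IDs to stand in for lists, and the case-insensitive prefix matching are assumptions for illustration, not a description of how Tatoeba works.

```python
def suggest(prefix, names):
    """Return the names that start with `prefix`, ignoring case."""
    wanted = prefix.casefold()
    return sorted(name for name in names if name.casefold().startswith(wanted))


def merge_lists(source, target):
    """Move every sentence ID from `source` into `target`.

    Both arguments are sets of sentence IDs; `source` is left empty.
    """
    if source is not target:
        target |= source
        source.clear()
    return target


# Typing "Pre" suggests the existing collaborative list.
print(suggest("Pre", ["Present simple", "Past simple", "Weather"]))

# Two members each made a "Present simple" list (IDs are made up).
mine, yours = {1, 2}, {2, 3}
print(sorted(merge_lists(mine, yours)))
```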

@jiru
Member

jiru commented Jul 30, 2019

This may be a little off-topic, but has anyone considered tagging a sentence group rather than a single sentence? By "sentence group" I mean all sentences that are linked to each other, no matter how "far" the link is, no matter how many levels of indirection there are in the group.
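In graph terms, a sentence group as defined here is the connected component of the link graph that contains a given sentence. A minimal Python sketch, assuming the links are available as pairs of sentence IDs and are symmetric (the IDs and the function name are invented for illustration):

```python
from collections import defaultdict, deque


def sentence_group(start, links):
    """Return every sentence reachable from `start` through any chain of links."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)  # links are treated as symmetric (assumption)

    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Visit only the neighbours that have not been reached yet.
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


# Sentence 4 is three links away from sentence 1 but still in its group.
links = [(1, 2), (2, 3), (3, 4), (10, 11)]
print(sorted(sentence_group(1, links)))
```

Tagging a group would then mean applying the tag to every ID in the returned set, however indirect the link.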

I believe that some tags, like sports or mathematics, are quite universal, and if a member decides to tag a sentence in their own language, it would be very beneficial for all the other languages involved in the sentence group to get the same tag, translated accordingly. This would allow sentences to be classified in languages where few or no members tag sentences. On the other hand, if the tag conveys a concept that cannot be easily transposed to some other languages, it becomes a problem (or even carries a danger of imposing a foreign classification on a different culture).

Anyway, my point is that the universality of a tag (or lack thereof) may be another way to look at how to organize and define tags vs. lists.

Lists, by contrast, tend to be tied to a single language, even though it’s possible to create multi-language lists. Unless there are valid use cases for multi-language lists, we might want to enforce a single language for each list in order to better distinguish them from tags.

@Guybrush88

This may be a little off-topic, but has anyone considered tagging a sentence group rather than a single sentence? By "sentence group" I mean all sentences that are linked to each other, no matter how "far" the link is, no matter how many levels of indirection there are in the group.

Personally, I'd prefer to mass tag sentences in the same language, so that I can be sure the tag I want to add is used properly.

Mass tagging whole groups of sentences, in my opinion, might be confusing, since one of the distant, indirectly linked sentences might convey a completely different meaning or context (e.g. a word in one language may have a single form covering several meanings, while a second language has a different form for each meaning), so a single tag might not suit all the sentences in the group.

@trang
Member

trang commented Jul 31, 2019

Mass tagging would save a lot of time, making more efficient use of volunteers' time.

@ckjpn, @Guybrush88
I don't want you to feel like you have to defend the case of mass-tagging itself. Please know that as long as tags have a purpose in Tatoeba, then it will make sense to implement mass-tagging someday. I think no one will disagree with the fact that being able to do something in 10 seconds, instead of 10 minutes, is useful.

I've identified two scenarios of mass-tagging and as I've said: each use-case/scenario of mass-tagging will be handled in a separate issue.

I will create these issues in due time; we are not in a situation of urgency. But if you have another scenario of mass-tagging, then now is a good time to share it here.

If not, the rest of the discussion is to figure out what the difference between tags and lists should be. With regard to mass-tagging, this discussion mostly affects its prioritization. In the larger scheme, this discussion will help shape the tags and the lists features in the long term.

Anyway, my point is that the universality of a tag (or lack thereof) may be another way to look at how to organize and define tags vs. lists.

This is an interesting point of view but we would have an issue with the verb tags ("present simple", "past simple", etc). Tenses are not universal but specific to a language.

Unless there are valid use cases for multi-language lists, we might want to enforce a single language for each list in order to better distinguish them from tags.

I searched for "music" in the lists and found this one:
https://tatoeba.org/eng/sentences_lists/show/466

It includes sentences in multiple languages. At the time this list was created, tags were already implemented. The contributor had access to the tag feature but nonetheless chose to make a list.

This led me to think of the following use-case: the user may not want to have all the sentences about music, just a custom list. Maybe they find some sentences too boring for their taste, maybe they want to avoid near-duplicates. The user may be browsing the sentences at random and can understand several languages, so whenever they see an interesting sentence about the topic they're collecting sentences for, they would just add it to their list, regardless of the language.

I don't think this use-case was the reason why the list I found had multiple languages though. But it would be a valid use-case to me.

@RyckRichards
Member Author

RyckRichards commented Jul 31, 2019 via email

@jiru
Member

jiru commented Aug 8, 2019

This led me to think of the following use-case: the user may not want to have all the sentences about music, just a custom list. Maybe they find some sentences too boring for their taste, maybe they want to avoid near-duplicates. The user may be browsing the sentences at random and can understand several languages, so whenever they see an interesting sentence about the topic they're collecting sentences for, they would just add it to their list, regardless of the language.

It makes sense as a use-case for multi-language lists. And this is very similar to Ricardo's use-case. Use tags as a way to browse sentences and add some to your personal list as you see fit.

Actually, maybe what really distinguishes lists from tags is their scope of use. The scope of tags is the whole corpus, while the scope of lists is restricted to one or more individuals’ needs.

We can also think about the wording. The word list (and especially my lists) conveys the idea of an individual writing down an enumeration of sentences on a sheet of paper, which already gives some vague idea about the purpose of lists. However, I think tag is very vague. It doesn’t give me any hint about the purpose. A word like category gives me the hint that this can be used to classify. (It’s just an example; I don’t want tags to be reworded into category.)

@agrodet
Contributor

agrodet commented Mar 16, 2020

I'm thinking of summarizing, compacting, and filtering ideas about this topic. I think a lot has already been said so please do not restart the conversation to say or ask the same thing again. That will only make the work more discouraging and more difficult.

I just want to ask one question to the people who contributed here: Would a solution similar to how we can add sentences to lists in the new design be enough to fit your needs? Something like what @soliloquist-tatoeba described in a comment above.
For example, you click on an "Add a tag" icon, and a small window appears to let you input your tag(s). Maybe the window will show the list of tags already applied, etc. The point is, would that fit your needs or not?

@Guybrush88

@agrodet at least for me, this solution would work fine to speed up mass tagging, i think

@RyckRichards
Member Author

Same here, @agrodet

@ckjpn

ckjpn commented Mar 17, 2020

The advantage of allowing mass tagging as I used to be able to do it would be that it would be easier to deal with sentences already in the database when tagging things like tenses, situations and functions. It is much easier and faster to work with text files offline than to have to use a web interface.

Adding a way to tag sentences similar to how sentences can be added to lists in the new design would definitely be a good idea too; it is somewhat similar to @soliloquist-tatoeba's idea.

@soliloquist-tatoeba

There is an HTTP request (sorry if I named it wrong) for removing a tag from a sentence.

https://tatoeba.org/eng/tags/remove_tag_from_sentence/TAG_ID/SENTENCE_ID

There are also similar requests for adding/removing sentences to/from lists.

My question is: is it possible to add a particular tag (using the tag ID) to a particular sentence in a similar manner? If it isn't, would it be possible to add such a function, like the others? Some users could use it to mass-tag sentences without much programming skill or effort.
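As an illustration of the "without much programming" point, the sketch below expands the removal URL pattern quoted above for a batch of sentences. Only the URL pattern comes from this comment; the function name and the IDs are made up, no equivalent URL for adding a tag is assumed to exist, and the script sends nothing. Actually calling such URLs would presumably require a logged-in session with tagging rights.

```python
# URL pattern quoted in the comment, with TAG_ID and SENTENCE_ID as placeholders.
REMOVE_TAG_URL = (
    "https://tatoeba.org/eng/tags/remove_tag_from_sentence/{tag_id}/{sentence_id}"
)


def remove_tag_urls(tag_id, sentence_ids):
    """Build one removal URL per sentence for the given tag (no request is sent)."""
    return [
        REMOVE_TAG_URL.format(tag_id=tag_id, sentence_id=sentence_id)
        for sentence_id in sentence_ids
    ]


# Made-up tag and sentence IDs, for illustration only.
for url in remove_tag_urls(42, [1001, 1002]):
    print(url)
```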

@ckjpn

ckjpn commented Jul 3, 2022

Related Wall Post: https://tatoeba.org/en/wall/show_message/38846#!#message_38846

If mass tagging were possible, then I, or someone else, could easily do what is suggested by this post.


8 participants