0.5 is a Card, not Frac? #884
Comments
This seems so counterintuitive to me that I would like to suggest this should be discussed again. What do you think @dan-zeman @nschneid ? I don't see why "one" and "1" should have the same NumType but "half" and "0.5" shouldn't. |
Because "0.5" can be read as "zero point five" or just "point five". It is only through mathematical knowledge (not linguistic knowledge) that we know this is equivalent to 1/2 = one half. |
I actually wouldn't object to adding |
I don't have much to add here. I'd be fine with using just To sum up, I suppose that the original idea was that |
In English I suppose there is value in distinguishing the two senses of "third", the fractional one and the ordinal one, both of which are separate from cardinal "three". Corresponding to cardinal "two" are the fractional "half" and ordinal "second". A simple rule could be that NumType is strictly for the morphological paradigm, so in the case of words formed from digits without overt morphological marking, NumType=Card applies regardless of what is represented mathematically. |
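To make the "strictly morphological" rule above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `numtype_by_form`, the toy lexicon, and the regular expression are hypothetical and are not part of the UD guidelines or of any UD tool. Ambiguous forms such as "third" are deliberately left out of the lexicon, since the form alone does not decide between the ordinal and the fractional reading.

```python
import re

# Hypothetical toy lexicon of English number words (illustration only).
WORD_NUMTYPE = {
    "one": "Card", "two": "Card", "three": "Card",
    "half": "Frac", "quarter": "Frac",
    "second": "Ord",
}

# Digit strings with optional internal separators: "1", "0.5", "1/3", "1,000".
DIGIT_FORM = re.compile(r"^\d+(?:[.,/]\d+)*$")

def numtype_by_form(form):
    """Return a NumType value based on the word form alone, or None if unknown."""
    if DIGIT_FORM.match(form):
        # No overt morphological marking, so Card regardless of the
        # mathematical value the digits represent.
        return "Card"
    return WORD_NUMTYPE.get(form.lower())

print(numtype_by_form("0.5"))   # digits only: Card under this rule
print(numtype_by_form("half"))  # marked fractional word: Frac
```

Under this rule "0.5" and "1/3" both come out as Card, while forms with overt marking such as "2nd" or "2." fall outside the digit pattern and would need a separate decision.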
This was originally an issue in UD_English-GUM. I moved it to docs because it has become a general question of how certain values of the |
Note: This Teitok query returns various occurrences across languages but note that the query operates on UD 2.7. |
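For anyone who wants to run a comparable check on a local treebank file, a rough sketch follows. It is not the Teitok query; it only assumes the standard 10-column, tab-separated CoNLL-U layout (FORM in column 2, FEATS in column 6). The function name `count_digit_numtypes` and the two-token sample are invented for illustration and do not come from any treebank.

```python
import re
from collections import Counter

HAS_DIGIT = re.compile(r"\d")

def count_digit_numtypes(conllu_text):
    """Count NumType values on tokens whose FORM contains at least one digit."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        form, feats = cols[1], cols[5]
        if not HAS_DIGIT.search(form):
            continue
        numtype = "_"  # "_" stands for "no NumType feature present"
        if feats != "_":
            for feat in feats.split("|"):
                name, _, value = feat.partition("=")
                if name == "NumType":
                    numtype = value
        counts[numtype] += 1
    return counts

# Invented sample, not taken from any real treebank.
SAMPLE = (
    "# text = 0.5 1/3\n"
    "1\t0.5\t0.5\tNUM\t_\tNumType=Card\t0\troot\t_\t_\n"
    "2\t1/3\t1/3\tNUM\t_\tNumType=Frac\t1\tconj\t_\t_\n"
)
print(count_digit_numtypes(SAMPLE))
```

Running this per treebank would show which ones put a value other than Card on digit-based tokens, which is the kind of variation the query above surfaces.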
I think that third should always be an ordinal, as it in fact is. The same goes for the Italian equivalent terzo, and the motivation behind it is always that fractions are expressed as "the third part of a unit", and so on. While Drittel in German is a candidate for I see a more general problem with symbolic expressions, which I recently stumbled upon with Roman numerals. One problem here, for example, is that a Roman numeral like XIX (in digits: 19) might be used for a cardinal or an ordinal, and maybe in general for any numeric expression. So I cannot sensibly choose a I wonder if we just shouldn't use |
Maybe the word "Cardinal" is confusing to people. I think it has been interpreted in the guidelines as "unmarked numeric" (whether expressed as a word or with digits or Roman numerals), as opposed to an expressly ordinal, fractional, or multiplicative form. But as @dan-zeman's treebank query shows, there are a few treebanks which use I don't see how this is relevant to the choice of |
It is relevant because we are not dealing with a lexical representation of numbers, but a symbolic one, so it becomes very problematic to associate lexical features like How should one label something like 2. intended as second? Not as PS: don't let me even start with |
I see, yes, it is unfortunate that there is some mixing of morphological and orthographic considerations. But I think saying that "two" is
Is the "." explicitly indicating that it is read as "second" as opposed to "two" (analogous to the suffix in 2nd)? If so then I think it's perfectly fine to call it |
If we agree on this (and I see a thumbs up from @dan-zeman too), then I am even more confused about why "1/3" in Arabic numerals would not be If the criterion for being |
That's not true in the case of "One third of the population ..." or other sentences where it represents a fraction |
And the representation of a fraction is achieved by means of a (substantivised) ordinal. The two forms are the same thing, we are just giving different contextual interpretations. |
I see the theoretical argument that there is a syntactic construction that expresses fractions using the ordinal forms of words...but this doesn't work for "half" in "one half" or "three halves". As a practical measure it seems reasonable to say that there are two slots in the paradigm with syncretism between the fractional and ordinal for most values. |
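One way to picture the "two slots with syncretism" view is as a small paradigm table. The sketch below is only an illustration: the `PARADIGM` dictionary, the helper names, and the choice of values are hypothetical, and the table covers just a few English forms mentioned in this thread plus "five/fifth".

```python
# Hypothetical mini-paradigm for English; illustration only.
PARADIGM = {
    2: {"Card": "two", "Ord": "second", "Frac": "half"},
    3: {"Card": "three", "Ord": "third", "Frac": "third"},
    5: {"Card": "five", "Ord": "fifth", "Frac": "fifth"},
}

def ord_frac_syncretic(value):
    """True if the ordinal and fractional slots share one form for this value."""
    slots = PARADIGM[value]
    return slots["Ord"] == slots["Frac"]

def slots_for(form):
    """Return the sorted (value, NumType) slots that a given form can fill."""
    return sorted(
        (value, numtype)
        for value, slots in PARADIGM.items()
        for numtype, slot_form in slots.items()
        if slot_form == form
    )

print(ord_frac_syncretic(2))  # False: "second" vs. "half"
print(ord_frac_syncretic(3))  # True: "third" fills both slots
print(slots_for("third"))     # [(3, 'Frac'), (3, 'Ord')]
```

On this view "third" is one form filling two slots, so choosing between Ord and Frac for a given token depends on context rather than on the form itself.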
I agree that half does appear to be a specifically fractional numeral (to continue the parallel with Italian: mezzo), and it might well be the only such form in English. Lower numbers often show peculiar patterns, especially a fraction as significant as 1/2. As far as I know, the syncretism would be total apart from this single case, and this argues for not making a distinction! |
I understood this discussion to be about the universal category |
There's also "quarter" and arguably "percent". Still, I think those are two different meanings of third. If you order a hamburger and they ask if you want "1/4, 1/3, or 1/2 lb patty" and you respond "the third", it's pretty clear which one you want |
Actually I think that's ambiguous :) But that ambiguity supports there being two different meanings... |
Clearly it is ambiguous in terms of the English, but if you do that at a restaurant, you will 100% get the 1/3lb hamburger. And yes, the conceptual difference is why I came up with such a carefully crafted example |
In some languages, ordinals expressed in digits are different from cardinals: 3 = "three", 3. = "third". Czech is one of those languages but unfortunately the tokenization used in the Czech data separates the period from the digit, so we don't really have a token that could get |
Agreed. But I presented them because I am not convinced that we need to distinguish |
Yes, quarter looks like that, nice! But about percent I am not so sure, it seems rather distributive, or something different altogether. Anyway, the ambiguity or absence of such a term is given by the context: both the real world, and also the words that accompany it, so we will see that when we have
I would still lean on seeing this as a symbolic, conventional representation beyond lexicon. Maybe we could just be content with a relation of |
Not unless we also say that the lemma of 3 is three, which we don't do in Czech. |
It seems the UD standard is to label things like 0.5 a Card, not a Frac