Update normalize.tsv #106

Merged
merged 1 commit into from
Aug 1, 2021

Conversation

cormacanderson
Collaborator

Removing potentially incorrect normalisations: ł is sometimes used for ɫ (i.e. lˠ), not ɬ, and ƀ is sometimes used for β, not v.
(I am currently checking the IE-CoR transcriptions and have found such cases.)

@LinguList
Contributor

This is fine, but it will cause a LOT of trouble, as many people use the damn L in our data, so this HAS side effects. I'd be inclined to leave the normalization in: if something has been normalized, this can always be retrieved from the code, and we even throw warnings now when this happens.

@LinguList
Contributor

But an alias is a good idea: this means, you will retain the original character, and can from there check that it is different. Is that a good compromise?

@cormacanderson
Collaborator Author

cormacanderson commented May 19, 2021

I get that and it's annoying. But if people sometimes use ł to mean ɫ and sometimes to mean ɬ I don't see how either normalisation or an alias is likely to work. Is this not rather something that should be dealt with by an orthography profile?
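To make the orthography-profile idea concrete, here is a minimal sketch in plain Python. It does not use the real profile tooling; the two profiles, their contents, and the `segment` helper are all hypothetical and only illustrate that the same source grapheme can be resolved differently per dataset.

```python
# Hypothetical orthography profiles for two datasets (illustrative only).
# In dataset A the source grapheme "ł" stands for a velarized lateral,
# in dataset B it stands for a voiceless lateral fricative.
PROFILE_A = {"a": "a", "ł": "ɫ"}
PROFILE_B = {"a": "a", "ł": "ɬ"}


def segment(word, profile):
    """Split `word` by greedy longest match and map each grapheme via `profile`."""
    graphemes = sorted(profile, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(word):
        for grapheme in graphemes:
            if word.startswith(grapheme, i):
                segments.append(profile[grapheme])
                i += len(grapheme)
                break
        else:
            # No profile entry matches: fail loudly instead of guessing.
            raise ValueError(f"no profile entry for {word[i]!r} in {word!r}")
    return segments


print(segment("ała", PROFILE_A))
print(segment("ała", PROFILE_B))
```

With a per-dataset mapping like this, the decision about what ł means sits with whoever knows the source, not in a global table.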

@cormacanderson
Collaborator Author

Leaving it in as a normalisation seems very risky to me. ɫ and ɬ are quite different beasts...

@LinguList
Contributor

So make an alias, as I said.

@LinguList
Contributor

If you use orthography profiles now, they WILL point you to the use of aliases.

@LinguList
Contributor

Also to normalizations, so you can trace this as a researcher.

@LinguList
Contributor

E.g., this is the output we now (review pending) receive on a profile check in CLDF:

Found 17 generated graphemes

| Grapheme | BIPA | Modified | Segments | Graphemes | Count |
|---|---|---|---|---|---|
| eu | eu | | k i *5/⁵ + f eu *1/¹ | ki5_feu1 | 1 |
| ɯɘ | ɯɘ | | ts ɯɘ ŋ *5/⁵ | tsɯɘŋ5 | 1 |
| ɛi | ɛi | | n ɛi *3/³ | nɛi3 | 2 |
| oːi | oːi | | ɬ oːi *1/¹ | ɬo:i1 | 3 |
| oi | oi | | f oi l *1/¹ | foil1 | 3 |
| iu | iu | | kʰ iu *1/¹ | khiu1 | 5 |
| ɔi | ɔi | | tθ ɔi *1/¹ | tθɔi1 | 6 |
| iːu | iːu | | r iːu *1/¹ | ri:u1 | 7 |
| uːi | uːi | | b uːi *3/³ + b uːi *3/³ | bu:i3_bu:i3 | 10 |
| ui | ui | | x ui *3/³ | xui3 | 11 |
| aːi | aːi | | b aːi *3/³ + b aːi *3/³ | ba:i3_ba:i3 | 25 |
| ou | ou | | h ou *1/¹ | hou1 | 29 |
| ai | ai | | ʔ ai *1/¹ + l a *3/³ | Ɂai1_la3 | 37 |
| ei | ei | | n ei *2/² | nei2 | 43 |
| | | | m eɯ *1/¹ | meɯ1 | 45 |
| au | au | | f au *1/¹ | fau1 | 59 |
| aːu | aːu | | t aːu *3/³ | ta:u3 | 63 |

Found 13 modified graphemes

| Grapheme | BIPA | Segments | Graphemes | Count |
|---|---|---|---|---|
| lh/l̥ | | d u ŋ *2/² + lh/l̥ a n *4/⁴ | duŋ2_lhan4 | 1 |
| *9/⁵⁴ | ⁵⁴ | t i k *9/⁵⁴ | tik9 | 4 |
| ɘ/ə | ə | k ɘ/ə *5/⁵ | kɘ5 | 5 |
| i/j | j | m ɔ *5/⁵ + m i/j ɘ/ə | mɔ5_miɘ | 12 |
| u/w | w | ɬ u/w ai *1/¹ | ɬuai1 | 12 |
| *6/⁵¹ | ⁵¹ | ʔ a *1/¹ + r a *6/⁵¹ | Ɂa1_ra6 | 33 |
| *8/⁵³ | ⁵³ | m eɯ ʔ *8/⁵³ | meɯɁ8 | 34 |
| *5/⁵ | | k ɘ/ə *5/⁵ | kɘ5 | 54 |
| *4/⁴ | | m eɯ *4/⁴ | meɯ4 | 83 |
| *2/² | ² | ʔ a *2/² + r ou *1/¹ | Ɂa2_rou1 | 85 |
| *7/⁵² | ⁵² | h ou ʔ *7/⁵² | houɁ7 | 96 |
| *3/³ | ³ | d e *3/³ | de3 | 203 |
| *1/¹ | ¹ | h ou *1/¹ | hou1 | 432 |

@cormacanderson
Collaborator Author

Okay, but can you clarify what exactly you are saying I should add as an alias? You mean ł as an alias of lˠ, right, alongside ɫ? Or do you mean I should rather add it as an alias of ɬ?

I thought normalisation was for lookalikes, like : for ː etc. not for cases like this...
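For comparison, this is roughly what I understand by lookalike normalisation: a sketch in plain Python, not the actual CLTS code. The `LOOKALIKES` table and the `normalize` function are hypothetical; only the colon example comes from this thread, and the ASCII g entry is an added assumption for illustration.

```python
# Hypothetical lookalike table: each key has exactly one intended IPA reading.
LOOKALIKES = {
    ":": "\u02d0",  # ASCII colon -> IPA length mark (ː)
    "g": "\u0261",  # ASCII g -> IPA script g (ɡ); assumed example, not from the thread
}

# str.maketrans accepts a dict of single-character keys.
_TABLE = str.maketrans(LOOKALIKES)


def normalize(text):
    """Replace unambiguous lookalike characters with their IPA counterparts."""
    return text.translate(_TABLE)


print(normalize("ba:i3"))
```

A one-to-one table like this only works when the lookalike has a single intended reading, which is exactly what ł lacks.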

@LinguList
Contributor

Sorry, I never saw your reply, @cormacanderson. I'll have to look into this tomorrow!

@LinguList
Contributor

So for me, the ł is a lookalike. It is less clear what KIND of lookalike it is, but you find it frequently in the literature, for both cases you mention. It may indeed be useful to say: let us drop our normalization, as it only normalizes arbitrarily in one direction, and let us force people to be more exact.
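As a rough illustration of what "forcing people to be more exact" could look like, here is a sketch in plain Python. It is not the behaviour of our code: the `AMBIGUOUS` table and `find_ambiguous` are hypothetical names, and the candidate readings are just the two cases raised in this PR.

```python
# Hypothetical table of graphemes with more than one attested reading.
# The candidate readings are the ones mentioned in this PR.
AMBIGUOUS = {
    "ł": ("ɫ", "ɬ"),
    "ƀ": ("β", "v"),
}


def find_ambiguous(segments):
    """Return (segment, candidates) pairs for segments that must be resolved by hand."""
    return [(seg, AMBIGUOUS[seg]) for seg in segments if seg in AMBIGUOUS]


for seg, candidates in find_ambiguous(["a", "ł", "a"]):
    print(f"{seg!r} is ambiguous, choose one of {candidates}")
```

Reporting the candidates instead of picking one keeps the choice with the person who knows the source.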

@LinguList
Contributor

So yes, @cormacanderson, just merge the PR if you still agree with the decision. With this update, we may have to adjust some tests, but it is for the sake of clarity.

@cormacanderson cormacanderson merged commit ac42154 into master Aug 1, 2021
@xrotwang xrotwang deleted the cormacanderson-patch-2 branch October 13, 2021 12:59