Update normalize.tsv #106

cormacanderson · 2021-05-19T10:53:35Z

Removing potentially incorrect normalisation: ł sometimes used for ɫ (i.e. lˠ) not ɬ, ƀ sometimes used for β not v.
(Currently checking IE-CoR transcriptions and have found cases)

LinguList · 2021-05-19T11:16:54Z

This is fine, but it will cause a LOT of trouble, as many people use the damn L in our data, so this HAS side effects. I'd be inclined to leave the normalization in: if something has been normalized, this can always be retrieved from the code, and we even throw warnings now when this happens.

LinguList · 2021-05-19T11:19:12Z

But an alias is a good idea: this means, you will retain the original character, and can from there check that it is different. Is that a good compromise?
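To make the alias idea concrete, here is a minimal sketch in plain Python. It is not the pyclts implementation: the `ALIASES` table, the `Resolved` record, and `resolve` are all hypothetical names, and the target chosen for ł only mirrors the mapping this PR removes. The point is that the original character is kept next to the resolved one and a warning is emitted, so the substitution stays traceable.

```python
import warnings
from dataclasses import dataclass

# Hypothetical alias table. Which sound "ł" should point to is exactly
# the open question in this thread; "ɬ" mirrors the mapping being removed.
ALIASES = {"ł": "ɬ"}


@dataclass(frozen=True)
class Resolved:
    source: str    # character as it appeared in the data
    target: str    # character it was resolved to
    is_alias: bool # True if a substitution took place


def resolve(grapheme, aliases=ALIASES):
    """Resolve a grapheme, keeping the original and warning on substitution."""
    target = aliases.get(grapheme)
    if target is None:
        return Resolved(grapheme, grapheme, False)
    warnings.warn(f"{grapheme!r} resolved to {target!r} via alias")
    return Resolved(grapheme, target, True)
```

Because `source` survives, a researcher can later list every resolved segment whose `is_alias` flag is set and check the reading by hand.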

cormacanderson · 2021-05-19T11:20:33Z

I get that and it's annoying. But if people sometimes use ł to mean ɫ and sometimes to mean ɬ I don't see how either normalisation or an alias is likely to work. Is this not rather something that should be dealt with by an orthography profile?

cormacanderson · 2021-05-19T11:21:35Z

Leaving it in as a normalisation seems very risky to me. ɫ and ɬ are quite different beasts...
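The concern can be sketched as follows: a normalization table maps each source character to exactly one target, so a character with two attested readings is silently forced into one of them. The tables below are illustrative only; they contain just the mappings mentioned in this thread, and `risky_normalizations` is a hypothetical helper, not part of the repository.

```python
# One-directional normalization rows discussed in this PR, plus the
# uncontroversial lookalike ":" for the length mark.
NORMALIZE = {"ł": "ɬ", "ƀ": "v", ":": "ː"}

# Readings reported for each source character in this thread.
READINGS = {"ł": {"ɬ", "ɫ"}, "ƀ": {"v", "β"}, ":": {"ː"}}


def risky_normalizations(normalize, readings):
    """Return source characters whose normalization picks one of several readings."""
    return sorted(
        source
        for source, target in normalize.items()
        if len(readings.get(source, {target})) > 1
    )


print(risky_normalizations(NORMALIZE, READINGS))
```

A true lookalike such as `:` passes the check, while ł and ƀ are flagged because the table cannot tell which reading a given dataset intended.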

LinguList · 2021-05-19T12:24:20Z

so make an alias, as I said.

LinguList · 2021-05-19T12:24:39Z

If you use orthography profiles, they WILL now point you to the use of aliases.

LinguList · 2021-05-19T12:24:55Z

Also to normalizations, so you can trace this as a researcher.

LinguList · 2021-05-19T12:27:35Z

E.g., this is the output we now (review pending) receive on a profile check in CLDF:

Found 17 generated graphemes

Grapheme	BIPA	Segments	Graphemes	Count
eu	eu	k i 5/⁵ + f eu 1/¹	ki5_feu1	1
ɯɘ	ɯɘ	ts ɯɘ ŋ *5/⁵	tsɯɘŋ5	1
ɛi	ɛi	n ɛi *3/³	nɛi3	2
oːi	oːi	ɬ oːi *1/¹	ɬo:i1	3
oi	oi	f oi l *1/¹	foil1	3
iu	iu	kʰ iu *1/¹	khiu1	5
ɔi	ɔi	tθ ɔi *1/¹	tθɔi1	6
iːu	iːu	r iːu *1/¹	ri:u1	7
uːi	uːi	b uːi 3/³ + b uːi 3/³	bu:i3_bu:i3	10
ui	ui	x ui *3/³	xui3	11
aːi	aːi	b aːi 3/³ + b aːi 3/³	ba:i3_ba:i3	25
ou	ou	h ou *1/¹	hou1	29
ai	ai	ʔ ai 1/¹ + l a 3/³	Ɂai1_la3	37
ei	ei	n ei *2/²	nei2	43
eɯ	eɯ	m eɯ *1/¹	meɯ1	45
au	au	f au *1/¹	fau1	59
aːu	aːu	t aːu *3/³	ta:u3	63

Found 13 modified graphemes

Grapheme	BIPA	Segments	Graphemes	Count
lh/l̥	l̥	d u ŋ 2/² + lh/l̥ a n 4/⁴	duŋ2_lhan4	1
*9/⁵⁴	⁵⁴	t i k *9/⁵⁴	tik9	4
ɘ/ə	ə	k ɘ/ə *5/⁵	kɘ5	5
i/j	j	m ɔ *5/⁵ + m i/j ɘ/ə	mɔ5_miɘ	12
u/w	w	ɬ u/w ai *1/¹	ɬuai1	12
*6/⁵¹	⁵¹	ʔ a 1/¹ + r a 6/⁵¹	Ɂa1_ra6	33
*8/⁵³	⁵³	m eɯ ʔ *8/⁵³	meɯɁ8	34
*5/⁵	⁵	k ɘ/ə *5/⁵	kɘ5	54
*4/⁴	⁴	m eɯ *4/⁴	meɯ4	83
*2/²	²	ʔ a 2/² + r ou 1/¹	Ɂa2_rou1	85
*7/⁵²	⁵²	h ou ʔ *7/⁵²	houɁ7	96
*3/³	³	d e *3/³	de3	203
*1/¹	¹	h ou *1/¹	hou1	432
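In the modified-graphemes table above, each Grapheme entry has the form source/target, and the part after the slash matches the BIPA column. A minimal sketch, assuming the report is available as tab-separated text, that pulls those pairs out for review. The function name `modified_pairs` is hypothetical and not part of any CLDF tooling, and `REPORT` holds just three rows copied from the table.

```python
import csv
import io

# Three rows copied from the "modified graphemes" table, tab-separated.
REPORT = (
    "Grapheme\tBIPA\tSegments\tGraphemes\tCount\n"
    "lh/l̥\tl̥\td u ŋ 2/² + lh/l̥ a n 4/⁴\tduŋ2_lhan4\t1\n"
    "ɘ/ə\tə\tk ɘ/ə *5/⁵\tkɘ5\t5\n"
    "i/j\tj\tm ɔ *5/⁵ + m i/j ɘ/ə\tmɔ5_miɘ\t12\n"
)


def modified_pairs(report):
    """Return (source, target, count) for each row of a modified-graphemes report."""
    rows = csv.DictReader(
        io.StringIO(report), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    pairs = []
    for row in rows:
        # Split "source/target" at the first slash.
        source, _, target = row["Grapheme"].partition("/")
        pairs.append((source, target, int(row["Count"])))
    return pairs


for source, target, count in modified_pairs(REPORT):
    print(f"{source} -> {target} ({count}x)")
```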

cormacanderson · 2021-05-19T12:38:36Z

Okay, can you clarify for me, though, what exactly you are saying I should add as an alias? You mean ł as an alias of lˠ, right, alongside ɫ? Or do you mean I should rather add it as an alias of ɬ?

I thought normalisation was for lookalikes, like : for ː etc. not for cases like this...

LinguList · 2021-07-28T16:23:56Z

sorry, I never saw your reply, @cormacanderson. I'll have to look into this tomorrow!

LinguList · 2021-08-01T09:31:27Z

So for me, the ł is a lookalike. It is less clear what KIND of lookalike it is, but you find it frequently in the literature, for both cases you mention. It may indeed be useful to say: let us drop our normalization, as it only normalizes arbitrarily in one direction, and let us force people to be more exact.

LinguList · 2021-08-01T09:32:23Z

So yes, @cormacanderson, just merge the PR if you still agree with the decision. For an update, this may force us to adjust some tests, but it is for the sake of clarity.

Update normalize.tsv

2a00465

Removing potentially incorrect normalisation: ł sometimes used for ɫ (i.e. lˠ) not ɬ, ƀ sometimes used for β not v

cormacanderson requested a review from LinguList May 19, 2021 10:53

cormacanderson merged commit ac42154 into master Aug 1, 2021

cormacanderson mentioned this pull request Sep 2, 2021

Removing double vowels from vowels.tsv #120

Closed

xrotwang deleted the cormacanderson-patch-2 branch October 13, 2021 12:59
