Data by Nishi1999 #77

LinguList · 2016-11-01T15:58:29Z

Nishi is another dataset that, like Mann1998, consists of morphemes. We need to do the following:

link data to concepticon
write orthography profile
check phoneme inventories in the original source and, if there are any, type them up
provide a small description of the dataset

The data are pages 100-107 of Nishi 1999 (Four Papers on Burmese: Toward the history of Burmese (the Myanmar language). Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa (ILCAA), Tokyo University of Foreign Studies). These data amount to 359 proposed cognates among eight languages, viz. Written Burmese, Spoken Burmese, Achang, Xiandao, Zaiwa, Leqi, Langsu, and Bola. The non-Burmese data are cited from "ZYC, except for a few Achang and Zaiwa forms which are supplied from (Xu and Xu 1984) and (Dai and Cai 1985). Note that entries of all the four Burmish languages, Burmese (and Mod. WB. transliterated by the Beijing method), Achang, Zaiwa and Langsu contained in ZYHC are supplied by the same authors as those in ZYC" (Nishi 1999: 96). (It looks like he also cites Dai et al. 1991.)

The phoneme inventories are from the same work pp. 90-94. Nishi uses ñ in place of a sign the Chinese use for palatal n (not the usual one) and he uses ï for the apical vowel. He marks irregular cognates in bold, and he notes with the abbreviations x/x, d/c, and d.

LinguList · 2016-11-02T16:04:40Z

This is a first, automatically created profile of the sequences that need to be explained.

Nishi1999.xlsx

The data looks rather messy, unfortunately, as there are many idiosyncratic characters, it seems.

nh36 · 2016-11-04T23:01:40Z

Nathan has made a new version of Nishi. The data are now cleaner. Mattis must recompile the orthography profile for Nathan to check.

nh36 · 2016-11-04T23:02:15Z

Nishi.ods.zip

LinguList · 2016-11-05T09:09:57Z

Short question: the bold things in Nishi, are they meaningful? If so, I'd try to search-and-replace them by * and code them differently, maybe adding a note for those words...

LinguList · 2016-11-05T09:38:07Z

Looks much nicer, just preparing the concept list. There was one hidden row, though, that you should have a look at (but I resolved it): line 367 was hidden, and so it was exported, but with many blank lines. I have now moved it up to where it belongs, and also corrected one Nishi gloss: to dye instead of dye (cloth), line 100. BTW: it's good having the original source noted there, as in this way we can trace back to Sun 1991 (that is ZMYYC, right?).

LinguList · 2016-11-05T09:43:24Z

so here is the currently mapped data for Nishi, automatic mapping, percentage: 0.79, not bad actually:

Nishi-1999-mapped.ods.zip

I leave that to @nh36 to have a closer look at it, but will later double-check your cleaned version. The algorithm is better now, but also yields a lot of possibilities, yet I consider this as important, as we should be as strict as possible with those mappings.
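For reference, a share like the 0.79 above can be read as the fraction of glosses that received an automatic match. The sketch below assumes that reading of the number; the function name and the gloss-to-concept dictionary are made up for illustration and are not taken from the actual mapping script.

```python
def mapped_share(mappings):
    """Return the fraction of glosses that received a concept label.

    `mappings` is a hypothetical dict from source gloss to a concept
    label, with None standing for "no automatic match found".
    """
    if not mappings:
        return 0.0
    mapped = sum(1 for concept in mappings.values() if concept is not None)
    return mapped / len(mappings)


# Illustrative glosses only, not taken from the Nishi data.
example = {"to dye": "DYE", "hand": "HAND", "sky": "SKY", "to dream": None}
print(mapped_share(example))
```

A share computed this way only says how many glosses got some match, not how many matches are correct, so the manual check is still needed.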

LinguList · 2016-11-05T10:00:45Z

and here's the new test for the profile. Not much changed, to be honest, but it looks clearer now. I suppose, it's time to just work with the data as is, there are some five exceptions with tones, but I will handle them explicitly once I run the profile to re-create the data.

Nishi1999-prf.xlsx

nh36 · 2016-11-05T10:01:30Z

The things in bold are those that Nishi himself identified to be irregular.
ZMYYC is Sun1991, it is his usual source. But note that he uses several
other sources, when he thought he got better data there.


LinguList · 2016-11-05T10:01:47Z

Just saw: even better, as you halved the number of rows, so this is really working nicely now!

LinguList · 2016-11-05T10:03:39Z

Once we have linked Sun1991 to concepticon, we can directly compare across the sources (also provided we have orthography profiles for Sun1991).

BTW: the workflow we are following now could definitely be optimized. I think it is time I start thinking about a script that creates an initial orthography profile for a given dataset. I'll make that an issue, and I'll probably handle it by writing a new function for either lingpy or the original orthography profile code, as it is of general interest to users, I'd say.
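To make the idea concrete, here is a minimal sketch of what such a first pass could look like. It is not the lingpy function or the existing orthography profile code; it only uses the Python standard library, and the function names are hypothetical. It assumes a grapheme is a base character plus any combining marks that follow it, and simply counts how often each one occurs.

```python
import unicodedata
from collections import Counter


def graphemes(form):
    """Split a form into base characters plus their combining marks."""
    clusters = []
    for char in unicodedata.normalize("NFD", form):
        if unicodedata.combining(char) and clusters:
            # Attach the combining mark to the preceding base character.
            clusters[-1] += char
        else:
            clusters.append(char)
    # Recompose so that each grapheme is reported in NFC form.
    return [unicodedata.normalize("NFC", cluster) for cluster in clusters]


def initial_profile(forms):
    """Count graphemes over all forms, most frequent first."""
    counts = Counter()
    for form in forms:
        counts.update(graphemes(form))
    return counts.most_common()


# Illustrative forms only; "\u0323" is the combining dot below.
for grapheme, frequency in initial_profile(["ta", "ta\u0323", "mi"]):
    print(grapheme, frequency)
```

The resulting frequency list would still have to be checked by hand, since multi-character sounds such as ts are not grouped by this sketch.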

nh36 · 2016-11-05T10:09:05Z

So, I guess I will hold off on the concepticon mapping and orthography profile for Nishi1999, since they can instead be done as part of Sun1991 or using a (semi-)automatic system that you will develop. Right?

LinguList · 2016-11-05T10:14:11Z

Every source has its own right, and as far as I can see, we don't know whether Sun1991 uses the same concept labels and the same orthography. Nishi may well have adjusted those. And since Sun1991 is also originally Chinese, there may be some divergences in translation. So I prefer to do the work twice, once for Nishi and once for Sun, and then check the overlap, which will also be interesting as a scientific study on the sociology of research, as I guess we may find some coding errors, and it is interesting to see how they could influence an analysis.
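The overlap check could be as simple as a set comparison once both sources are mapped. The following is only a sketch under that assumption; the function name and the concept labels in the example are invented and do not come from either source.

```python
def compare_concepts(source_a, source_b):
    """Report which concept labels two sources share and which they do not."""
    shared = source_a & source_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(source_a - shared),
        "only_b": sorted(source_b - shared),
    }


# Invented labels, standing in for the mapped concepts of two sources.
first = {"HAND", "SKY", "DYE"}
second = {"HAND", "SKY", "DREAM (SOMETHING)"}
print(compare_concepts(first, second))
```

Concepts that appear in only one list are the natural place to start looking for translation divergences or coding errors.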

nh36 · 2016-11-05T10:16:47Z

Ok, in that case I will do the Nishi. You can work on automating things,
but at least the Nishi will be done already.


LinguList · 2016-11-05T10:19:45Z

Yep, exactly what I was thinking. There is a possibility that we are doing more than necessary here, but I prefer taking that risk over risking further changes to the data. Sun1991 is extremely interesting for us, but Nishi is a lower-hanging fruit and also important for the QPA, as with this source, and with Mann, we then have concrete tests that we can compare with your analysis of Huang1992. Already that comparison will be some research that has not been carried out yet, I'd say.

nh36 · 2016-11-05T17:34:06Z

Here is the Nishi concepticon mapping. In many cases I have left some
ambiguities; generally this is where Nishi seems to want to combine two
concepticon concepts. In some cases I changed the automatic map to '???'
because none of the concepticon concepts seems to work (e.g. dream vi,
which is certainly not the same as dream (something)).

I also attach the Nishi orthography profile, but I am not sure it is done
correctly. I have fixed mistakes where I have seen them (mostly changing t
s into ts and things like that), but I find it odd that whole words come up
unsegmented into initial and final.


LinguList · 2016-11-05T18:21:13Z

Alright, thanks! As to the point about whole words appearing unsegmented: this means that the algorithm did not find a vowel. This is easy to explain: as I said, LingPy doesn't know that an a with a dot under it is a vowel, as this is the first time LingPy has been confronted with it. I should add the dotty things to LingPy, but it is a bit tedious and not fun, so I know I need to do it, but I don't want to do it now. And I keep being annoyed by the Zipfian distributions: LingPy recognizes an enormous number of sounds correctly now, but each dataset keeps having just one more sound I have not seen before. The rule is: if the algorithm does not find a vowel, it just shows the full word form, which also works in cases of syllabic nasals, for example, where we need to re-map anyway. I'll work from there and prepare an updated version of the profile, so you can see what I would do in those cases.
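The fallback rule can be illustrated with a small sketch. This is not LingPy's implementation: the vowel set, the function names and the `decompose` switch are assumptions made for the example. It splits a form at the first known vowel and returns the whole form when no vowel is found, which is exactly what happens to a precomposed dotted a that is missing from the vowel set.

```python
import unicodedata

# Hypothetical vowel inventory; a real one would be much larger.
VOWELS = set("aeiouəɛɔɯ")


def is_vowel(char, decompose=False):
    """Check a character against VOWELS, optionally ignoring diacritics."""
    if decompose:
        # Look only at the base character, e.g. "a" for dotted a.
        char = unicodedata.normalize("NFD", char)[0]
    return char in VOWELS


def split_initial_final(form, decompose=False):
    """Split at the first vowel; return the whole form if there is none."""
    for index, char in enumerate(form):
        if is_vowel(char, decompose):
            return [form[:index], form[index:]]
    return [form]


print(split_initial_final("ta"))                       # plain vowel
print(split_initial_final("t\u1ea1"))                  # dotted a, unknown
print(split_initial_final("t\u1ea1", decompose=True))  # dotted a, base checked
print(split_initial_final("m\u0329"))                  # syllabic nasal
```

Checking the base character is only one possible way to handle the dotted vowels; listing them explicitly in the sound inventory would be the alternative.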

LinguList · 2016-11-05T18:22:47Z

Ah: could you upload the profile and the mapping on the website? If you attach it in an email, it does not get submitted...

nh36 · 2016-11-05T21:46:48Z

Nishi1999-prf corrected by NH.xlsx

nh36 · 2016-11-05T21:47:28Z

Nishi-1999-mapped corrected by NH.ods.zip

nh36 · 2016-12-11T16:45:50Z

Nathan still needs to--

_ check phoneme inventories in the original source and, if there are any, type them up
_ provide a small description of the dataset

nh36 · 2017-01-02T22:05:17Z

Here is the phoneme inventory for Nishi 1999, I am not sure it is the format you will want, but it should work one way or another.

x means 'doesn't have'
check means 'has'
airplane means 'only in loans'

LinguList · 2017-01-03T07:18:02Z

Nishi phoneme inventory.xlsx

Excellent, I just uploaded it here, but have it locally as well. I'll change tone letters to upper case, but otherwise, the format is very convenient, and it probably directly qualifies as an orthography profile (but I will need to test this).

nh36 · 2017-01-03T10:37:22Z

So, this issue can be closed, right? Although my data description, at the top, probably needs to be moved somewhere else.

LinguList · 2017-01-03T10:48:03Z

don't close right now, as I'll need to add the profile to the repository, I just assigned myself to get this finalized.

nh36 · 2017-02-11T16:26:12Z

Please send an update on this thread.

LinguList · 2017-02-11T18:51:54Z

Okay, Nishi1999 is the next target, as Mann1998, Nishi1999 and Huang1992 seem to be central (and the other Chinese source, whose name I keep forgetting...).

nh36 · 2017-02-17T19:32:19Z

This issue may now be superseded. Please review and confirm.

LinguList · 2017-02-17T19:42:06Z

Yes, the issue which follows on this is the wrong concept list in the csv-file #90, all relevant data should be there. Please look in the folder called "raw" in Nishi for the csv-file I have been using (downloading and opening in openoffice should be straightforward, I hope).

LinguList added the new dataset label Nov 1, 2016

LinguList mentioned this issue Nov 5, 2016

write a preparse-code for initial orthography profile construction #83

Closed

LinguList self-assigned this Jan 3, 2017

LinguList closed this as completed Feb 17, 2017
