Data by Nishi1999 #77
This is a first automatically created profile of sequences that need to be explained. The data looks rather messy, unfortunately, as there seem to be many idiosyncratic characters. |
Nathan has made a new version of Nishi. The data are now cleaner. Mattis must recompile the orthography profile for Nathan to check. |
Short question: the bold things in Nishi, are they meaningful? If so, I'd try to search-and-replace them by |
Looks much nicer, just preparing the concept list. There was one hidden row, though, that you should have a look at (but I resolved it): line 367 was hidden, and so it was exported, but with many blank lines. I have now moved it up where it belongs, and also corrected one Nishi gloss: to dye instead of dye (cloth), line 100. BTW: it's good having the original source noted there, as in this way, we can trace back to Sun 1991 (that is ZMYYC, right?). |
so here is the currently mapped data for Nishi, automatic mapping, percentage: 0.79, not bad actually: I leave that to @nh36 to have a closer look at it, but will later double-check your cleaned version. The algorithm is better now, but also yields a lot of possibilities, yet I consider this as important, as we should be as strict as possible with those mappings. |
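For readers who want to reproduce a figure like the 0.79 above: it is presumably the share of glosses that received a mapping. A minimal sketch of that calculation follows, assuming a hypothetical list of (gloss, mapped concept or None) pairs. The function name, the glosses and the concept labels are invented for illustration and are not taken from the Nishi data or from the mapping algorithm itself.

```python
def mapping_coverage(mappings):
    """Return the fraction of glosses that were mapped to some concept."""
    if not mappings:
        return 0.0
    mapped = sum(1 for _, concept in mappings if concept is not None)
    return mapped / len(mappings)


# Hypothetical example rows: (source gloss, mapped concept label or None).
example = [
    ("to dye", "DYE"),
    ("sky", "SKY"),
    ("hand", "HAND"),
    ("a kind of basket", None),
]

coverage = mapping_coverage(example)
print(f"{coverage:.2f}")
```

The unmapped rows (those with `None`) are the ones that would need a closer manual look.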
and here's the new test for the profile. Not much changed, to be honest, but it looks clearer now. I suppose, it's time to just work with the data as is, there are some five exceptions with tones, but I will handle them explicitly once I run the profile to re-create the data. |
The things in bold are those that Nishi himself identified as irregular. Dr Nathan W. Hill Profile -- http://www.soas.ac.uk/staff/staff46254.php Tibetan Studies at SOAS -- http://www.soas.ac.uk/cia/tibetanstudies/
|
Just saw: even better, as you halved the number of rows, so this is really working nicely now! |
Once we have linked Sun1991 to concepticon, we can directly compare across the sources (also provided we have orthography profiles for Sun1991). BTW: this workflow we are following up now could definitely be optimized. I think it is time I start thinking about a script to run to create an initial orthography profile for a given dataset. I'll make that an issue, and I'll probably handle it by writing a new function for either lingpy or the original orthography profile code, as it is of general interest to users, I'd say. |
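A script of the kind mentioned above could start out as little more than a frequency list of graphemes. The following is a minimal sketch under stated assumptions, not the lingpy or orthography-profile implementation: the function names `split_graphemes` and `draft_profile` are hypothetical, the example forms are invented, and a grapheme is taken to be a base character plus any combining marks that follow it.

```python
import unicodedata
from collections import Counter


def split_graphemes(form):
    """Split a form into base characters with their combining marks attached."""
    clusters = []
    for char in unicodedata.normalize("NFD", form):
        if clusters and unicodedata.combining(char):
            clusters[-1] += char  # attach the mark to the preceding base
        else:
            clusters.append(char)
    return [unicodedata.normalize("NFC", cluster) for cluster in clusters]


def draft_profile(forms):
    """Count every grapheme in the forms, most frequent first."""
    counts = Counter()
    for form in forms:
        counts.update(split_graphemes(form))
    return counts.most_common()


# Hypothetical word forms, not taken from the dataset.
forms = ["m\u1ea1", "ma", "\u00f1i"]  # "mạ", "ma", "ñi"
for grapheme, frequency in draft_profile(forms):
    print(grapheme, frequency, sep="\t")
```

Rare graphemes end up at the bottom of the list, which is where idiosyncratic characters would show up for manual checking. Spacing modifier letters (for example a superscript h) are not combining marks, so this sketch would list them as separate graphemes.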
So, I guess I will hold off on the concepticon mapping and orthography
profile for Nishi1999, since they can instead be done as part of Sun1991 or
using a (semi-)automatic system that you will develop. Right?
|
Every source has its own right, and as far as I can see, we don't know whether Sun1991 uses the same concept labels and the same orthography. Nishi may well have adjusted those. And since Sun1991 is also originally Chinese, there may be some divergences in translation. So I prefer to do the work two times, one time for Nishi and one time for Sun, and then check the overlap, which will also be interesting as a scientific study on the sociology of research, as I guess we may find some coding errors, and it is interesting to see how they could influence an analysis. |
Ok, in that case I will do the Nishi. You can work on automating things.
|
yep, exactly what I was thinking. There is a possibility that we are
doing more than necessary here, but I prefer taking the risk over
risking to further change the data in any way. Sun1991 is extremely
interesting for us, but Nishi is a lower-hanging fruit and also
important for the QPA, as with this source, and with Mann, we have then
concrete tests where we can compare with your analysis of Huang1992.
Already that comparison will be some research that has not been carried
out yet, I'd say.
|
Here is the Nishi concepticon mapping. In many cases I have left some [...] I also attach the Nishi orthography profile, but I am not sure it is done [...]
|
alright, thanks! As to the point with whole words unsegmented: this
means that the algorithm did not find a vowel. This is easy to explain
as, as I said, LingPy doesn't know that an a with a dot under it is a
vowel, as this is the first time LingPy is confronted with it. I should
add the dotty things to LingPy, but it is a bit tedious and not fun, so
I know I need to do it, but I don't want to do it now. And I keep being
annoyed by the Zipfian distributions. LingPy recognizes an enormous
number of sounds correctly now, but each dataset keeps having just one
other sound I did not see before. The rule was: if the algorithm does not
find a vowel, it just shows the full word form, which works in cases
of syllabic nasals, for example, where we need to re-map anyway. I'll
work from there and prepare an updated version of the profile, so you
can see what I would do in those cases.
|
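The fallback rule described above can be sketched roughly as follows. This is an illustration only, not LingPy's actual code: the vowel set is hypothetical and deliberately incomplete, the function name `segment` is an assumption, and a plain character split stands in for real sound segmentation.

```python
import unicodedata

# Hypothetical, deliberately incomplete vowel set (an assumption, not LingPy's).
KNOWN_VOWELS = set("aeiou")


def segment(word, vowels=KNOWN_VOWELS):
    """Split a word into characters, or return it whole if no vowel is found."""
    chars = list(unicodedata.normalize("NFC", word))
    if not any(char in vowels for char in chars):
        return [word]  # fallback: show the full word form, unsegmented
    return chars


dotted = "m\u1ea1"  # "mạ": an a with a dot under it, unknown to the vowel set

print(segment("ma"))                              # a known vowel is found
print(segment(dotted))                            # no known vowel: whole word
print(segment(dotted, KNOWN_VOWELS | {"\u1ea1"}))  # after adding the dotted a
print(segment("m"))                               # syllabic-nasal-like case
```

Adding the dotted vowel to the set is enough to get the word segmented, which is the kind of extension the comment describes as still missing.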
Ah: could you upload the profile and the mapping on the website? If you attach it in an email, it does not get submitted... |
Nathan still needs to: check for phoneme inventories in the original source and, if there are any, type them up |
Here is the phoneme inventory for Nishi 1999, I am not sure it is the format you will want, but it should work one way or another. x means 'doesn't have' |
Excellent, I just uploaded it here, but have it locally as well. I'll change tone letters to upper case, but otherwise, the format is very convenient, and it probably directly qualifies as an orthography profile (but I will need to test this). |
So, this issue can be closed, right? Although my data description, at the top, probably needs to be moved somewhere else. |
don't close right now, as I'll need to add the profile to the repository, I just assigned myself to get this finalized. |
Please send an update on this thread. |
Okay, Nishi1999 is the next target, as Mann1998, Nishi1999 and Huang1992 seem to be central (and the other Chinese source, whose name I keep forgetting...). |
This issue may now be superseded. Please review and confirm. |
Yes, the issue which follows on this is the wrong concept list in the csv-file #90, all relevant data should be there. Please look in the folder called "raw" in Nishi for the csv-file I have been using (downloading and opening in openoffice should be straightforward, I hope). |
Nishi is another dataset, just as Mann1998, consisting of morphemes. We need to do the following
The data is pages 100-107 of Nishi 1999 (Four Papers on Burmese: Toward the history of Burmese (the Myanmar language) Tokyo: Institute for the Study of Language and Cultures of Asia and Africa (ILLCAA), Tokyo University of Foreign Studies). These data amount to 359 proposed cognates among eight languages, viz. Written Burmese, Spoken Burmese, Achang, Xiandao, Zaiwa, Leqi, Langsu, and Bola. The non-Burmese data are cited from "ZYC, except for a few Achang and Zaiwa forms which are supplied from (Xu and Xu 1984) and (Dai and Cai 1985). Note that entries of all the four Burmish languages, Burmese (and Mod. WB. transliterated by the Beijing method), Achang, Zaiwa and Langsu contained in ZYHC are supplied by the same authors as those in ZYC" (Nishi 1999: 96). (It looks like he also cites from Dai, et al. 1991).
The phoneme inventories are from the same work pp. 90-94. Nishi uses ñ in place of a sign the Chinese use for palatal n (not the usual one) and he uses ï for the apical vowel. He marks irregular cognates in bold, and he notes with the abbreviations x/x, d/c, and d.