[Transcription System] X-SAMPA and/or Kirshenbaum #84
Yes, I was thinking about that. But given that there's a JS-sampa application which I regularly use (and which people could use to insert data), I was asking myself whether it is needed in the end, since SAMPA is thought of more as a system that renders a certain subset of Unicode symbols using ASCII characters, right? One could, however, also just test how far we can go with this. For rendering, though, I recommend checking both the lingpy-sampa symbols and BXS.vim, which I have further extended over the last years: bxs.vim.txt <https://github.com/cldf/clts/files/1606915/bxs.vim.txt>. In fact, one could probably use this to replicate BIPA in SAMPA form, maybe a good idea? |
I was thinking about adding them so that clts would not need to rely on such tools (which are good and useful, of course), not even through the web interface. The benefits of a single programmatic interface don't need to be mentioned -- I am actually being a bit selfish here, as I am starting to rely on clts while developing my model.
A full clts integration would also bring some advantages, such as being able to translate a feature description into an X-SAMPA "grapheme". It would also provide some additional statistics, which is always good.
I could take care of this once I finish UPA and Ruhlen, if you want.
|
Yes, excellent! I agree that having this would also facilitate handling for those who use the python interface. |
The more I think about it, the more I think that SAMPA is not a transcription system, but a transliteration of IPA. What we could consider instead, maybe, is to take the sampa2uni function from lingpy and plant it into the util module or another part of the package, to make clear that SAMPA conversion is a task of CLTS, but not treat SAMPA as a transcription system. Since SAMPA can be further extended to cover more than the usual symbols, we could even then add a sampa keyword to all transcription-system methods, and would be able to query strings using SAMPA without forcing it to be used as a full-fledged way to transcribe things. LingPy's sampa2uni function in fact handles most cases we would expect, so it would be straightforward to just take it from there and later kick it out of lingpy. |
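For readers unfamiliar with what such a conversion involves: the core of a sampa2uni-style converter is a greedy longest-match substitution over a symbol table. The sketch below is not lingpy's actual implementation, and its tiny mapping table (a handful of invented entries) is illustrative only; the real table is much larger.

```python
# Minimal sketch of a sampa2uni-style converter: greedy longest-match
# replacement of X-SAMPA substrings by Unicode IPA. Toy mapping only.
XSAMPA_TO_IPA = {
    "t_h": "tʰ",   # aspirated t: base character plus sub-character sequence
    "N": "ŋ",
    "S": "ʃ",
    "@": "ə",
    "t": "t",
    "a": "a",
}

def sampa2uni(text: str) -> str:
    """Convert an X-SAMPA string to Unicode IPA by longest-match scanning."""
    out, i = [], 0
    # Try longer X-SAMPA sequences first, so "t_h" wins over plain "t".
    keys = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(XSAMPA_TO_IPA[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # pass unknown characters through unchanged
            i += 1
    return "".join(out)

print(sampa2uni("t_haN"))  # tʰaŋ
```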
That's a bit philosophical, isn't it? What would be the difference between a "transliteration of IPA" and a "transcription system"? When you say that sampa2uni "handles most cases we would expect",
then that's exactly the problem CLTS should solve: i.e. turning implicit, possibly complex code into declarative, transparent data. So resorting to the old code after all, when the aim was to describe what it actually is that "we expect", seems like failure. |
But the essence of SAMPA is completely different from the essence of transcription systems. The parsing algorithm needs to be different, since SAMPA does not mark the distinction between diacritics and base characters, but instead uses sub-characters to turn a base character into a diacritic. As a result, you cannot parse the grapheme the same way. |
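The parsing difference can be seen directly in how Unicode classifies the characters. In IPA, a diacritic like ʰ is its own code point with a distinctive category, so a character-by-character parser can tell bases from diacritics; in X-SAMPA, "_" and "h" are ordinary ASCII, and only the multi-character sequence "_h" signals a diacritic, forcing the parser to look ahead. A minimal illustration:

```python
import unicodedata

# IPA: the aspiration diacritic is a distinct code point whose Unicode
# category (Lm, "modifier letter") identifies it as diacritic-like.
for ch in "tʰ":
    print(ch, unicodedata.category(ch), unicodedata.name(ch))
# t Ll LATIN SMALL LETTER T
# ʰ Lm MODIFIER LETTER SMALL H

# X-SAMPA: the same sound is "t_h", three plain ASCII characters; no
# single character is a diacritic, only the sequence "_h" is.
for ch in "t_h":
    print(ch, unicodedata.category(ch))
# t Ll
# _ Pc
# h Ll
```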
Ah ok, I see. So basically, SAMPA is orthography, thus should be handled via orthography profiles. That's ok, since this uses a different, but also well-described and transparent mechanism :) |
Yes, I think this is the best way to go: we make a huge orthography profile (no need to use lingpy's algorithm) with all 6000+ symbols, converting them to SAMPA where possible, provide it as an orthography profile, and allow it to be loaded quickly. I am just wondering: as orthography profiles are in some way important for CLTS, should we consider putting the sampa profile into the segments package, or rather into the clts package? |
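To make the mechanism concrete: an orthography profile is essentially a tab-separated table whose first column lists graphemes and whose further columns give mappings, applied by greedy longest match from the left. The sketch below uses an invented four-row toy profile and a hand-rolled tokenizer, not the real 6000+-row profile or the segments package itself:

```python
import csv
import io

# Toy orthography profile in the usual tab-separated layout:
# first column the grapheme, a further column its IPA mapping.
PROFILE_TSV = """Grapheme\tIPA
ts_h\ttsʰ
t_h\ttʰ
E\tɛ
t\tt
"""

rules = {row["Grapheme"]: row["IPA"]
         for row in csv.DictReader(io.StringIO(PROFILE_TSV), delimiter="\t")}

def tokenize(text):
    """Segment `text` by greedy longest match against the profile graphemes."""
    segments, i = [], 0
    graphemes = sorted(rules, key=len, reverse=True)  # longest graphemes first
    while i < len(text):
        for g in graphemes:
            if text.startswith(g, i):
                segments.append(rules[g])
                i += len(g)
                break
        else:
            segments.append("\ufffd")  # replacement mark for unknown input
            i += 1
    return segments

print(tokenize("ts_hEt"))  # ['tsʰ', 'ɛ', 't']
```

The longest-match rule is what lets the profile segment "ts_h" as one grapheme rather than as "t" plus leftover characters.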
I guess it would make sense in the segments package, considering that SAMPA is one of the most prominent ways to write IPA. This would increase the immediate usefulness of the segments package, beyond "just" proper Unicode tokenization. |
Okay. I suppose we transfer this issue now and make the argument that with help of our ~ 6000 segments, we could easily just provide a huge number of possible segmentations even of SAMPA (maybe excluding clusters, as they will mess it up), and it can be included in the next release of segments. At the same time, we may also use our 6000+ graphemes in our BIPA system to produce an orthography profile that could at some point be used instead of lingpy's segmentation algorithm. |
Should these systems be considered? They would be good candidates for inclusion if we plan to develop a method to recognize an unknown transcription system. They might also serve as a fall-back in situations where Unicode is still not acceptable.
The inclusion would be straightforward; a preliminary mapping could be done simply by using the Wikipedia articles.