thai pronunciation data missing #231

guttenberger · 2023-04-13T17:48:13Z

Hi, first of all, thank you for this project. I downloaded the Thai dictionary from https://kaikki.org/dictionary/Thai/index.html and noticed that the romanization is missing from the JSON entries.

For example, the romanization for https://en.wiktionary.org/wiki/เขียว

is missing from the JSON file:

{ 
...
    "word": "เขียว",
    "lang": "Thai",
    "lang_code": "th",
    "sounds": [
        {
            "ipa": "/kʰia̯w˩˩˦/",
            "tags": [
                "standard"
            ]
        }
    ],
...
}

even though the romanization is present for synonyms:

"synonyms": [
    {
        "roman": "kǐao",
        "word": "ขยว",
        "_dis1": "0 0 0 0 0"
    }
],

I find the Paiboon romanization particularly useful because it clearly indicates which tone to use. That matters a lot: Thai is a tonal language, and using the correct tone is essential to convey meaning accurately.


jmviz · 2023-04-14T01:19:54Z

The romanization appears to be intact in forms:

"forms": [
    {
        "form": "kǐao",
        "tags": [
            "romanization"
        ]
    }
],
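For anyone consuming the dump, pulling the romanization out of forms could look like the minimal sketch below. It assumes the entry layout quoted in this thread (a "forms" list whose items carry "form" and "tags"); the helper names and the file path are hypothetical, and the line-by-line reader assumes the download has one JSON object per line.

```python
import json


def romanizations(entry: dict) -> list[str]:
    """Return every form tagged "romanization" in a wiktextract-style entry."""
    return [
        form["form"]
        for form in entry.get("forms", [])
        if "romanization" in form.get("tags", [])
    ]


def iter_entries(path: str):
    """Hypothetical reader: yields one entry per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Entry trimmed to the fields quoted in this thread.
entry = json.loads("""
{
  "word": "เขียว",
  "lang": "Thai",
  "lang_code": "th",
  "forms": [{"form": "kǐao", "tags": ["romanization"]}],
  "sounds": [{"ipa": "/kʰia̯w˩˩˦/", "tags": ["standard"]}]
}
""")

print(romanizations(entry))  # ['kǐao']
```

Filtering on the tag rather than taking the first form keeps this working if other tagged forms end up in the same list.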

kristian-clausal · 2023-04-14T04:52:03Z

Romanizations belong in forms, and I don't think we'll start adding them to pronunciation data just for Thai. Ideally, romanization data is collected and added to forms, just like we separate hyphenation out of pronunciation sections. The alternative Royal Institute romanization is missing, though, because we don't collect romanization data from Pronunciation sections; that might be something to look at later.

The IPA has tones at the end: ˩˩˦.
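If the tone is all you need, the Chao tone letters (˥ ˦ ˧ ˨ ˩, code points U+02E5 to U+02E9) can be read straight off the IPA string. This is a rough sketch with a hypothetical helper name; it assumes the tone letters trail the syllable, as in the example above, and for multi-syllable words it only returns the tone of the last syllable.

```python
# Chao tone letters ˥ ˦ ˧ ˨ ˩ occupy U+02E5..U+02E9.
TONE_LETTERS = {chr(cp) for cp in range(0x02E5, 0x02EA)}


def trailing_tone(ipa: str) -> str:
    """Return the run of Chao tone letters at the end of an IPA transcription."""
    body = ipa.strip("/[]")  # drop the /.../ or [...] delimiters
    start = len(body)
    while start > 0 and body[start - 1] in TONE_LETTERS:
        start -= 1
    return body[start:]


print(trailing_tone("/kʰia̯w˩˩˦/"))  # ˩˩˦ (low-low-high, i.e. the rising tone)
```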

guttenberger · 2023-04-14T07:16:08Z

OK, thank you :)
I also noticed that the audio is missing from the JSON files. For instance,
https://en.wiktionary.org/wiki/ฝรั่ง

has audio, but no reference to it can be found in
https://kaikki.org/dictionary/All%20languages%20combined/meaning/ฝ/ฝร/ฝรั่ง.html

kristian-clausal · 2023-04-14T07:51:56Z

Thai uses its own special formatting with a table, which no other language does. This will have to go on the back burner. And it's not even a well-formatted table: the cells are visually combined (at least they're still separate cells), the second column's header is part of a normal cell, and the first column doesn't have any header information at all. I am tempted to just sneak in and rewrite the whole thing so that it's more like other pronunciation sections. And it's all generated by a Lua module, which doesn't make any of this easier.
