
Add Journal of Semitic Studies schema #73

Closed
charlesLoder opened this issue Jun 14, 2023 · 5 comments
Labels: enhancement (New feature or request)

@charlesLoder
Owner

See @camilstaps's original issue here.

I don't think it's possible to recognize short/long vowels(?) to distinguish e.g. i and ī, so I used ī for hireq-yod and i for plain hireq, which may require some manual fixes from the user (what would be helpful for this is a separate field for hireq+meteg and qibbuts+meteg).

Could you provide examples? I think the ADDITIONAL_FEATURES may be able to account for that.



The style guide also prescribes that qamets before hatef qamets be transliterated as long qamets: בַּֽצָּהֳרָֽיִם should be baṣṣå̄hå̆rå̄yim, not baṣṣåhå̆rå̄yim. I am not sure if this can be specified in the current system.

Interesting: so they say the qamets under the tsade should be a qamets qatan, but they maintain a distinction between qamets qatan and qamets gadol in transliteration. The stylesheet says:

This transcription of the quality of the vowels corresponds to the Tiberian reading tradition of Biblical Hebrew, with the exception of the shewa. The distribution of vocalic and silent shewa, however, follows the Tiberian tradition.

Given that Khan is the editor, I would assume that means there is no distinction between qamets qatan and qamets gadol. Maybe I'll have to pry into this one.



Tsere-he is not recognized correctly, and I'm not sure why: וְהִנֵּ֥ה should be wǝhinnē, not wǝhinnɛ.

I'll research that.



Let me know what you think of the two questions above.


Initial JSON

{
  "VOCAL_SHEVA": "ǝ",
  "HATAF_SEGOL": "ɛ̆",
  "HATAF_PATAH": "ă",
  "HATAF_QAMATS": "å̆",
  "HIRIQ": "i",
  "TSERE": "ē",
  "SEGOL": "ɛ",
  "PATAH": "a",
  "QAMATS": "å̄",
  "HOLAM": "ō",
  "QUBUTS": "u",
  "DAGESH": "",
  "DAGESH_CHAZAQ": true,
  "MAQAF": " ",
  "PASEQ": "",
  "SOF_PASUQ": "",
  "QAMATS_QATAN": "å",
  "FURTIVE_PATAH": "a",
  "HIRIQ_YOD": "ī",
  "TSERE_YOD": "ē",
  "SEGOL_YOD": "ɛ",
  "SHUREQ": "ū",
  "HOLAM_VAV": "ō",
  "QAMATS_HE": "å̄",
  "SEGOL_HE": "ɛ",
  "TSERE_HE": "ē",
  "MS_SUFX": "å̄yw",
  "ALEF": "ʾ",
  "BET_DAGESH": "b",
  "BET": "",
  "GIMEL": "",
  "GIMEL_DAGESH": "g",
  "DALET": "",
  "DALET_DAGESH": "d",
  "HE": "h",
  "VAV": "w",
  "ZAYIN": "z",
  "HET": "",
  "TET": "",
  "YOD": "y",
  "FINAL_KAF": "",
  "KAF": "",
  "KAF_DAGESH": "k",
  "LAMED": "l",
  "FINAL_MEM": "m",
  "MEM": "m",
  "FINAL_NUN": "n",
  "NUN": "n",
  "SAMEKH": "s",
  "AYIN": "ʿ",
  "FINAL_PE": "",
  "PE": "",
  "PE_DAGESH": "p",
  "FINAL_TSADI": "",
  "TSADI": "",
  "QOF": "q",
  "RESH": "r",
  "SHIN": "š",
  "SIN": "ś",
  "TAV": "",
  "TAV_DAGESH": "t",
  "DIVINE_NAME": "yhwh",
  "SYLLABLE_SEPARATOR": "",
  "ADDITIONAL_FEATURES": [],
  "STRESS_MARKER": {
    "location": "",
    "mark": ""
  },
  "longVowels": true,
  "qametsQatan": true,
  "sqnmlvy": true,
  "wawShureq": true,
  "article": true
}
@camilstaps
Contributor

Thanks for handling this so quickly!

An example where recognizing hireq-meteg could be helpful would be הִֽתְקַבְּצ֔וּ. With the schema above this is transliterated as hiṯǝ-, while it should be hīṯǝ. The system does correctly recognize that the meteg means that the schwa is vocal, which is very nice. I think there are still other cases where an etymologically long ī or ū is not written with either vowel letter or meteg, and that JSS would want these to be transcribed with macron as well. But there is no way to recognize these cases (except for having a builtin dictionary).

The style sheet specifically gives the example of צָחֳרַיִם, which should be transliterated ṣå̄ḥå̆rayim. I myself have learned to pronounce qamets before hatef qamets as short, but I don't know what tradition that is based on, and I cannot find this rule in Khan 2020. In Blau 2010 I do find the rule, but only for the Sephardic tradition (in §3.5.3.4; but see also the note, which says that in genuine Sephardic pronunciation the qamets is unaffected by a following hatef qamets). However, in §3.5.3.7 Blau writes that "The Tiberian vocalization marks only qualitative differences and not quantitative ones (with the exception of the ultra-short vowels ...)", which leaves me confused about why the JSS system distinguishes qamets gadol and qamets qatan at all. I'm sorry I can't be of more help (but let me know if you need a copy of Blau).

@charlesLoder
Owner Author

I think there are still other cases where an etymologically long ī or ū is not written with either vowel letter or meteg, and that JSS would want these to be transcribed with macron as well. But there is no way to recognize these cases (except for having a builtin dictionary).

Yeah, SBL requires the same, and there's no way to do it without a dictionary. I've tossed around the idea before; maybe I'll try to incorporate one.
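A dictionary could sit in front of the rule-based transliteration as an override table. The sketch below only illustrates that idea: `buildLexicon` and `transliterateWithLexicon` are hypothetical names, not part of the library, and `fallback` stands in for whatever normal transliteration call would be used. Keys are NFC-normalized so that entries typed with their combining marks in a different order still match the input.

```javascript
// Hypothetical helper: build a lookup table whose keys are NFC-normalized,
// so the same pointed word matches regardless of the order its marks were typed in.
function buildLexicon(entries) {
  return new Map(entries.map(([hebrew, latin]) => [hebrew.normalize("NFC"), latin]));
}

// Hypothetical wrapper: prefer a lexicon hit (e.g. a word whose etymologically
// long vowel has no vowel letter or meteg); otherwise use the rule-based output.
function transliterateWithLexicon(word, lexicon, fallback) {
  const hit = lexicon.get(word.normalize("NFC"));
  return hit !== undefined ? hit : fallback(word);
}
```

The lexicon itself would still have to be compiled by hand or from an existing lexical resource; no entries are included here.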

if you need a copy of Blau

If you could, that would be wonderful! I'll send you a Twitter DM.

@charlesLoder
Owner Author

According to the senior editor:

In the Tiberian reading tradition there is no distinction in the quality of qameṣ gadol and qameṣ qaṭan. However, there is a difference in the quantity, which is indicated in the different diacritics in our transcription system.

In Khan's The Tiberian Pronunciation Tradition of Biblical Hebrew, vol. 1, the difference between qamets qatan and qamets gadol (though he never uses those terms) is one of length.

Qatan
See: קָדְשֵׁי [qɔðˈʃeː]
and: כָּל־ [kʰɔl]
Both have a short vowel, as indicated by the absence of the ː mark.

Gadol
The "typical" qamets usually has a ː mark.
See: יָמִים [jɔːˈmiːim]
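Put as a rule, Khan's quantity marks line up with the two qamets values in the JSON above (`QAMATS` å̄, `QAMATS_QATAN` å). A minimal sketch of that mapping, assuming Khan-style IPA for a single syllable as input; `qametsFromKhanIPA` is a hypothetical helper for illustration only.

```javascript
// IPA characters used in Khan's transcriptions above.
const OPEN_O = "\u0254";      // ɔ, the shared quality of both qamets values
const LENGTH_MARK = "\u02D0"; // ː, marks a long vowel

// Hypothetical helper: return the JSS transliteration of a qamets-quality vowel
// in one syllable of Khan-style IPA: å̄ when it carries ː, å otherwise,
// or null if the syllable has no [ɔ] at all.
function qametsFromKhanIPA(ipaSyllable) {
  if (!ipaSyllable.includes(OPEN_O)) {
    return null;
  }
  return ipaSyllable.includes(OPEN_O + LENGTH_MARK) ? "\u00E5\u0304" : "\u00E5";
}
```

For example, the first syllable of קָדְשֵׁי [qɔð] yields å, while that of יָמִים [jɔː] yields å̄.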


As for the schema, I think an ADDITIONAL_FEATURE may work. Something like this untested code:

{
  ADDITIONAL_FEATURES: [
    {
      FEATURE: "syllable",
      // match syllables that contain a qamets qatan (U+05C7);
      // the `u` flag is required for the \u{...} escape inside a regex
      HEBREW: /\u{05C7}/u,
      TRANSLITERATION: (syllable) => {
        const next = syllable?.next?.value?.text;
        // if the next syllable includes a hateph qamets (U+05B3),
        // replace the qamets qatan with a regular qamets (U+05B8)
        if (next && next.includes("\u05B3")) {
          return syllable.text.replace("\u{05C7}", "\u{05B8}");
        }
        return syllable.text;
      }
    }
  ]
}

@charlesLoder charlesLoder added this to the v2.5.0 milestone Jun 16, 2023
@camilstaps
Contributor

That's great! I didn't realize an ADDITIONAL_FEATURE could contain code; in that case, something similar should definitely work to recognize hireq/qibbuts + meteg as well.
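A hireq/qibbuts + meteg rule might follow the same pattern. This sketch covers only the detection and rewrite step, written as a standalone post-processing function for one syllable: `lengthenMetegVowel` is a hypothetical name, and it assumes the caller already has the syllable's pointed Hebrew and its rule-based transliteration, with hireq → i and qibbuts → u as in the JSON above. Because U+05BD serves as both meteg and silluq, a hireq or qibbuts under silluq would be lengthened too.

```javascript
const HIRIQ = "\u05B4";
const QUBUTS = "\u05BB";
const METEG = "\u05BD"; // the same code point is also used for silluq

// Hypothetical helper: if a syllable has hireq or qibbuts plus meteg, lengthen
// the short vowel in its transliteration (i -> ī, u -> ū). Syllables without a
// meteg, or whose vowel is already long (e.g. hireq-yod -> ī), are unchanged.
function lengthenMetegVowel(hebrewSyllable, latinSyllable) {
  if (!hebrewSyllable.includes(METEG)) {
    return latinSyllable;
  }
  if (hebrewSyllable.includes(HIRIQ)) {
    return latinSyllable.replace("i", "\u012B");
  }
  if (hebrewSyllable.includes(QUBUTS)) {
    return latinSyllable.replace("u", "\u016B");
  }
  return latinSyllable;
}
```

For the example above, the first syllable of הִֽתְקַבְּצ֔וּ would go from hi to hī, giving the expected hīṯǝ-.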

If you want, I can have a go, but it may take a while; I have a lot on my plate at the moment. Totally understandable if that's also the case for you, of course!

@charlesLoder
Owner Author

@camilstaps have at it! I'm totally swamped too :)

There's a folder for schema tests; adding tests there would be my only ask. You can just duplicate sblSimple and add tests for these special cases.
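The special cases raised in this thread could seed those tests. A minimal, library-agnostic sketch: `JSS_CASES` and `runSchemaCases` are hypothetical names, and `transliterate` is any function from Hebrew text to a transliteration (in the real test folder it would be the library call with the JSS schema). The expected outputs are the ones quoted earlier in the thread.

```javascript
// Cases quoted in this thread: [pointed Hebrew, expected JSS transliteration].
const JSS_CASES = [
  ["צָחֳרַיִם", "ṣå̄ḥå̆rayim"],          // style sheet: qamets before hatef qamets is long
  ["בַּֽצָּהֳרָֽיִם", "baṣṣå̄hå̆rå̄yim"], // same rule
  ["וְהִנֵּ֥ה", "wǝhinnē"],              // tsere-he
];

// Hypothetical runner: pass every case through `transliterate` and
// collect the mismatches so a failing test can report all of them at once.
function runSchemaCases(transliterate, cases) {
  const failures = [];
  for (const [hebrew, expected] of cases) {
    const actual = transliterate(hebrew);
    if (actual !== expected) {
      failures.push({ hebrew, expected, actual });
    }
  }
  return failures;
}
```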


2 participants