double marks #61

Closed
asherlporetz opened this issue Mar 8, 2023 · 4 comments

@asherlporetz

asherlporetz commented Mar 8, 2023

I'm not sure which settings are involved, but the following cases fail when added to the tests. The text is from Sefaria.
Produces a double dagesh mark instead:

    ${"sin dagesh "}   | ${"הַשָּׂדֶֽה"} | ${"haśādê"}   | ${{ DAGESH_CHAZAQ: "\u0301" }}

Produces a double stress mark instead:

    ${"geresh"}   | ${"עֵ֝ינֶ֗יךָ"}   | ${"ʿênêˈkā"}   | ${{ STRESS_MARKER: { location: "after-syllable", mark: "ˈ" } }}

Produces "ha" instead of "ah":

    ${"furtive patach, sof pasuq"}   | ${"רֽוּחַ׃"}    | ${"rûaḥ"}

Does not separate the words at the maqaf:

    ${"psalms 2:12 maqaf"}  |  ${"נַשְּׁקוּ־בַ֡ר"}  |  ${"naššǝqû-bar"}

By the way, is it possible to have SILENT_SHEVA and MAPPIQ settings (defaulting to blank strings)? For example, "שַׁוְעִ֗י" can become "shavi" if the silent sheva is not marked, instead of "shav,i". And "הִ֛וא" occurs often enough that it would be great to have a setting for it instead of relying on the CPU-consuming ADDITIONAL_FEATURES; right now it transliterates to "hiv'" instead of just "hi". Thank you for this project!

@charlesLoder
Owner

@asherlporetz

Just noting that I've seen this.

I had a baby a few months ago, and my time on this project has slowed down.

@asherlporetz
Author

No rush. The library is very useful as it is. Have a great time!

@charlesLoder
Owner

Found some free time!

I assume these are your own tests?

I'll try to take them one by one.




> Produces a double dagesh mark instead:

The \u0301 (combining acute accent) character is being applied to ś, which already has an acute on it.

You would have to write an ADDITIONAL_FEATURE to fix that.
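A minimal sketch of the kind of post-processing such a feature could do. It is written as a plain JavaScript function rather than against the library's ADDITIONAL_FEATURES API, whose exact shape isn't shown in this thread, and the name `collapseDoubleAcute` is hypothetical. It assumes the output places U+0301 directly after a precomposed ś: decomposing with NFD exposes both acutes as separate marks, the regex collapses them, and NFC recomposes the letter. Note that this drops the doubling mark on ś entirely, which matches the expected "haśādê" in the test case.

```javascript
// Hypothetical post-processing step, not part of the library's API:
// collapse a doubled combining acute (U+0301) so a letter that already
// carries an acute (like ś) does not receive a second one.
function collapseDoubleAcute(text) {
  // NFD splits precomposed ś (U+015B) into "s" + U+0301,
  // so every acute becomes a separate combining mark.
  const decomposed = text.normalize("NFD");
  // Replace two or more consecutive acutes with a single one.
  const collapsed = decomposed.replace(/\u0301{2,}/g, "\u0301");
  // Recompose so "s" + U+0301 becomes ś again.
  return collapsed.normalize("NFC");
}

// ś + an extra U+0301 added for the dagesh chazaq
console.log(collapseDoubleAcute("ha\u015B\u0301\u0101d\u00EA")); // → haśādê
```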



> Produces a double stress mark instead:

It's treating both the geresh and the revia as marks of stressed syllables.

What text is that from? That may be a bug, or my limited understanding of taamim (almost certainly the latter).
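One possible reading, offered as an assumption rather than a confirmed diagnosis: in the poetic books the accent revia mugrash is written as a geresh muqdam (assumed here to be U+059D) on the first letter plus a revia (U+0597) on the stressed syllable, so only the revia should mark stress. The hypothetical helper `stressedSyllableIndex` skips U+059D when looking for the accented syllable. It assumes the word has already been split into syllables and keeps the last accented one, which ignores other prepositive and postpositive accents that would need their own handling.

```javascript
// Hypothetical helper, not the library's API.
// Assumption: U+059D (geresh muqdam) is part of revia mugrash and does not mark stress.
const GERESH_MUQDAM = /\u059D/g;
const ACCENT = /[\u0591-\u05AE]/; // Hebrew cantillation marks

function stressedSyllableIndex(syllables) {
  let index = -1;
  syllables.forEach((syl, i) => {
    // Remove the geresh muqdam before checking for a real accent.
    if (ACCENT.test(syl.replace(GERESH_MUQDAM, ""))) {
      index = i; // keep the last accented syllable
    }
  });
  // Fall back to the final syllable when no accent is found.
  return index === -1 ? syllables.length - 1 : index;
}

// עֵ֝ינֶ֗יךָ split as עֵ֝י | נֶ֗י | ךָ
const syllables = ["\u05E2\u05B5\u059D\u05D9", "\u05E0\u05B6\u0597\u05D9", "\u05DA\u05B8"];
console.log(stressedSyllableIndex(syllables)); // → 1
```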



> Produces "ha" instead of "ah":

Interesting: if you copy and paste from a text like the Mishneh Torah, the sof pasuq after רוּחַ is actually a colon (:), but in Psalms 18:11 it's a real sof pasuq (׃, U+05C3).

This is definitely a bug 🐞
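A minimal sketch of one way to avoid this class of bug, assuming (not confirmed) that the furtive patach rule only looks at the very last character of the word: strip a trailing sof pasuq or colon first, then check the final letter. `splitTrailingPunctuation` and `hasFurtivePatach` are hypothetical names, and the check is simplified to a final ח or ע with patach (ה with mappiq is left out).

```javascript
// Hypothetical helpers, not the library's API.
// Assumption: a word-final rule (here, furtive patach) fails because it sees
// the trailing punctuation instead of the last letter.
const TRAILING_PUNCT = /[\u05C3:]+$/; // sof pasuq (U+05C3) or an ASCII colon
const STRIP_MARKS = /[\u0591-\u05AF\u05BD]/g; // cantillation marks and meteg

// Split a word into its letters/points and any trailing punctuation.
function splitTrailingPunctuation(word) {
  const match = word.match(TRAILING_PUNCT);
  const punct = match ? match[0] : "";
  return { core: word.slice(0, word.length - punct.length), punct };
}

// Simplified check: final ח (U+05D7) or ע (U+05E2) followed by patach (U+05B7).
// ה with mappiq is deliberately left out to keep the sketch short.
function hasFurtivePatach(word) {
  const { core } = splitTrailingPunctuation(word);
  return /[\u05D7\u05E2]\u05B7$/.test(core.replace(STRIP_MARKS, ""));
}

console.log(hasFurtivePatach("\u05E8\u05BD\u05D5\u05BC\u05D7\u05B7\u05C3")); // → true (sof pasuq)
console.log(hasFurtivePatach("\u05E8\u05D5\u05BC\u05D7\u05B7:")); // → true (colon)
```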



> Does not separate the words at the maqaf:

Another bug! 🐞
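A sketch of the obvious workaround, assuming a maqaf-joined phrase can be transliterated part by part: split on the maqaf (U+05BE), transliterate each part, and join the results with a hyphen. `transliterateMaqafPhrase` is hypothetical, and the per-word function is passed in so the example runs without the library (with it, something like `(w) => heb.transliterate(w)` could be supplied). Splitting this way would lose any rule that depends on the neighboring word across the maqaf.

```javascript
// Hypothetical wrapper, not the library's API: split a maqaf-joined phrase,
// transliterate each part separately, and rejoin the parts with a hyphen.
const MAQAF = "\u05BE"; // Hebrew punctuation maqaf

function transliterateMaqafPhrase(phrase, transliterateWord) {
  return phrase
    .split(MAQAF)
    .map((part) => transliterateWord(part))
    .join("-");
}

// Stand-in transliterator for demonstration only: reports each part's length.
const fakeTransliterate = (part) => `[${part.length}]`;
console.log(transliterateMaqafPhrase("\u05D0\u05D1\u05BE\u05D2", fakeTransliterate)); // → [2]-[1]
```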



> By the way, is it possible to have SILENT_SHEVA and MAPPIQ settings (defaulting to blank strings)?

I like this idea, but I'm not totally sure I follow the first example.

> For example, "שַׁוְעִ֗י" can become "shavi" if the silent sheva is not marked, instead of "shav,i".

With the default settings, I get:

    console.log(heb.transliterate(`שַׁוְעִ֗י`));
    // šawʿî

The ʿ character is the ayin. Is that what you mean by "shav,i"?

> And "הִ֛וא" occurs often enough that it would be great to have a setting for it instead of relying on the CPU-consuming ADDITIONAL_FEATURES;

Yeah, the ADDITIONAL_FEATURES are not optimized at all.

I'm not too sure how to check a word like this other than with a regex, which would be pretty similar to an ADDITIONAL_FEATURE.

I'll have to mull over it. If you have any ideas, drop them here.
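One idea, sketched here as an assumption rather than a worked-out design: since הִוא is a fixed spelling (a perpetual qere, written with waw but read as if spelled היא), it could be matched by an exact lookup on the word with accents and meteg removed, before the general rules run. A single Map lookup per word is cheaper than running a regex-based feature over every word. `FIXED_WORDS` and `transliterateWithOverrides` are hypothetical, and the output string "hi" is just the value requested in the original post.

```javascript
// Hypothetical fast path, not the library's API: look up fixed spellings
// (after stripping accents and meteg) before running the general rules.
const ACCENTS_AND_METEG = /[\u0591-\u05AF\u05BD]/g;

// Perpetual qere: הִוא written with waw, read as if spelled היא.
const FIXED_WORDS = new Map([["\u05D4\u05B4\u05D5\u05D0", "hi"]]);

function transliterateWithOverrides(word, transliterateWord) {
  const key = word.replace(ACCENTS_AND_METEG, "");
  const fixed = FIXED_WORDS.get(key);
  // Fall back to the general transliteration when the word is not listed.
  return fixed !== undefined ? fixed : transliterateWord(word);
}

// הִ֛וא with a tevir (U+059B) still matches after accents are stripped.
console.log(transliterateWithOverrides("\u05D4\u05B4\u059B\u05D5\u05D0", () => "?")); // → hi
```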

@charlesLoder
Owner

Closing this. New issues have been created.
