Show sutra text #33

Merged
merged 6 commits into from
Jan 9, 2023
Conversation

shreevatsa
Contributor

Hacky code for #26, but it actually seems to work:

image

Before merging, I guess at minimum we should copy over https://raw.githubusercontent.com/ashtadhyayi-com/data/master/sutraani/data.txt so that it's hosted here, and maybe convert it to a simple data/sutrapatha.tsv, as suggested at #26 (comment), with entries like

8.4.68	अ अ

For now, I'm just pushing this commit as a savepoint for when I return to this later (not today). (Or if I don't return 😱 )

(Curious how much faster it will get when we change from array of 4000 items to JS object for faster lookup…)
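
As a rough sketch of the difference in question (in Python rather than the app's JS, with made-up placeholder entries and hypothetical helper names), scanning an array of pairs costs time proportional to the number of sutras, while a dict or JS object keyed by sutra number is a single hashed lookup:

```python
# Hypothetical stand-in data: about 4000 (sutra number, text) pairs.
# The texts are placeholders, not real sutras.
entries = [(f"1.1.{n}", f"text-{n}") for n in range(1, 4001)]


def find_in_list(pairs, code):
    """Linear scan, as with an array of items: O(n) per lookup."""
    for key, text in pairs:
        if key == code:
            return text
    return None


# Build the keyed index once; each later lookup is O(1) on average.
index = dict(entries)


def find_in_index(table, code):
    """Keyed lookup, analogous to a JS object or Map."""
    return table.get(code)
```

Both functions return the same text for a given key. Whether the speedup is noticeable at this size would need measuring in the actual app.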

Generated with the following python script:

```py
import requests, json
from indic_transliteration import sanscript

data = json.loads(requests.get('https://raw.githubusercontent.com/ashtadhyayi-com/data/master/sutraani/data.txt').text)['data']

print(len(data))

out = {}
for sutra in data:
    name = f"{sutra['a']}.{sutra['p']}.{sutra['n']}"
    text = sutra['s']
    slp1 = sanscript.transliterate(text, sanscript.DEVANAGARI, sanscript.SLP1)
    if slp1 == 'kftyErfRe':
        back = sanscript.transliterate('kftyEr fRe', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    elif slp1 == 'urft':
        back = sanscript.transliterate('ur ft', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    else:
        back = sanscript.transliterate(slp1, sanscript.SLP1, sanscript.DEVANAGARI)
    assert back == text, (text, slp1, back)
    out[name] = slp1

with open('sutrapatha.json', 'w') as f:
    json.dump(out, f, indent=2)
    f.write('\n')
```
@shreevatsa
Contributor Author

The async loading is kind of weird, but it works, and loading sutrapatha.json doesn't block the rest of the app from loading, so this is probably the best approach? Please take a look and see whether it's ready to merge.

Aside: note that transliterating to SLP1 makes it hard (AFAICT) to recover the original Devanagari (which is debatable in the first place) for:

  • kftyErfRe (turns कृत्यैर्ऋणे into कृत्यैरृणे) and
  • urft (turns उर्ऋत् into उरृत्)

— see indic-transliteration/indic_transliteration_py#75 (this is the kind of thing that I'm hoping a Rust transliteration library would fix by being "pedantic" and requiring specifying a strategy instead of making ad-hoc choices, but it's probably fine for now).
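
To make the ambiguity concrete, here is a toy Devanagari-to-SLP1 converter that covers only the characters in these two sutras. It is not how indic_transliteration works internally (the function name and tables are made up for illustration). It only shows that र् + ऋ and र + ृ both collapse to the SLP1 sequence `rf`, so a reverse step cannot tell them apart without extra information:

```python
# Toy tables: only the characters needed for the two examples above.
CONSONANTS = {'क': 'k', 'त': 't', 'य': 'y', 'र': 'r', 'ण': 'R'}
VOWELS = {'उ': 'u', 'ऋ': 'f'}            # independent vowel letters
MATRAS = {'ृ': 'f', 'ै': 'E', 'े': 'e'}   # dependent vowel signs
VIRAMA = '्'


def toy_to_slp1(text):
    """Convert a Devanagari string to SLP1 using the toy tables above."""
    out = []
    pending_a = False  # True while a consonant still carries its inherent 'a'
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append('a')
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in MATRAS:
            # A vowel sign replaces the inherent 'a'.
            out.append(MATRAS[ch])
            pending_a = False
        elif ch == VIRAMA:
            # Virama suppresses the inherent 'a'.
            pending_a = False
        elif ch in VOWELS:
            if pending_a:
                out.append('a')
            out.append(VOWELS[ch])
            pending_a = False
        else:
            raise ValueError(f"character not in toy tables: {ch!r}")
    if pending_a:
        out.append('a')
    return ''.join(out)


# Each pair: (spelling in the source data, spelling after the round trip).
pairs = [('कृत्यैर्ऋणे', 'कृत्यैरृणे'), ('उर्ऋत्', 'उरृत्')]
for original, round_tripped in pairs:
    print(toy_to_slp1(original), toy_to_slp1(round_tripped))
```

Within each pair, the two spellings differ as Devanagari strings but give the same SLP1, which is why the generation script has to special-case them.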

@shreevatsa
Contributor Author

shreevatsa commented Jan 9, 2023

Made a small change, updated screenshot (tested with manual removal of "1.3.9": "tasya lopaH" from sutrapatha.json) — no error messages in JS console:

image

Generated with:

```py
import requests, json, csv
from indic_transliteration import sanscript

data = json.loads(requests.get('https://raw.githubusercontent.com/ashtadhyayi-com/data/master/sutraani/data.txt').text)['data']

print(len(data))

out = []
for sutra in data:
    name = f"{sutra['a']}.{sutra['p']}.{sutra['n']}"
    text = sutra['s']
    slp1 = sanscript.transliterate(text, sanscript.DEVANAGARI, sanscript.SLP1)
    # if slp1 == 'kftyErfRe':
    #     back = sanscript.transliterate('kftyEr fRe', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    # elif slp1 == 'urft':
    #     back = sanscript.transliterate('ur ft', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    # else:
    #     back = sanscript.transliterate(slp1, sanscript.SLP1, sanscript.DEVANAGARI)
    # assert back == text, (text, slp1, back)
    out.append((name, slp1))

with open('sutrapatha.tsv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='excel-tab')
    writer.writerows(out)
```
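
For reference, a minimal sketch of reading such a file back, mirroring the `excel-tab` writer above. The helper names are hypothetical, and the sample rows are an in-memory stand-in for the real file, assuming the two-column layout (sutra number, SLP1 text). A missing sutra simply yields `None`, like the manual-removal test described above:

```python
import csv
import io


def load_sutrapatha(f):
    """Read (sutra number, SLP1 text) rows from a tab-separated file object."""
    reader = csv.reader(f, dialect='excel-tab')
    # Skip empty rows so a trailing blank line does not raise IndexError.
    return {row[0]: row[1] for row in reader if row}


def sutra_text(table, code):
    """Return the SLP1 text for a sutra number, or None if it is missing."""
    return table.get(code)


# Stand-in for open('sutrapatha.tsv', newline=''), with two sample rows.
sample = io.StringIO("1.3.9\ttasya lopaH\r\n8.4.68\ta a\r\n")
table = load_sutrapatha(sample)

print(sutra_text(table, "1.3.9"))
print(sutra_text(table, "1.3.10"))  # not in the sample data
```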
@akprasad
Contributor

akprasad commented Jan 9, 2023

wonderful -- thank you!!

akprasad merged commit 2edcd9b into ambuda-org:main on Jan 9, 2023
@akprasad
Contributor

> this is the kind of thing that I'm hoping a Rust transliteration library would fix by being "pedantic" and requiring specifying a strategy instead of making ad-hoc choices, but it's probably fine for now

I'd love to discuss this with you further now that we have a starter implementation (https://ambuda-org.github.io/vidyut-lipi/).
