Show sutra text #33
Generated with the following Python script:

```
import requests, json
from indic_transliteration import sanscript

data = json.loads(requests.get('https://raw.githubusercontent.com/ashtadhyayi-com/data/master/sutraani/data.txt').text)['data']
print(len(data))
out = {}
for sutra in data:
    name = f"{sutra['a']}.{sutra['p']}.{sutra['n']}"
    text = sutra['s']
    slp1 = sanscript.transliterate(text, sanscript.DEVANAGARI, sanscript.SLP1)
    if slp1 == 'kftyErfRe':
        back = sanscript.transliterate('kftyEr fRe', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    elif slp1 == 'urft':
        back = sanscript.transliterate('ur ft', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    else:
        back = sanscript.transliterate(slp1, sanscript.SLP1, sanscript.DEVANAGARI)
    assert back == text, (text, slp1, back)
    out[name] = slp1
with open('sutrapatha.json', 'w') as f:
    json.dump(out, f, indent=2)
    f.write('\n')
```
The async is kind of weird but it works, and loading

Aside: note that transliterating to SLP1 makes it hard (AFAICT) to recover the original Devanagari spelling (which is debatable in the first place) for the two sutras special-cased in the script above, `kftyErfRe` and `urft`; see indic-transliteration/indic_transliteration_py#75. (This is the kind of thing that I'm hoping a Rust transliteration library would fix by being "pedantic" and requiring you to specify a strategy instead of making ad-hoc choices, but it's probably fine for now.)
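To make the round-trip concern concrete, here is a minimal sketch of the kind of check the script's `assert` performs, pulled out into a helper. The name `find_round_trip_failures` and the toy `toy_encode`/`toy_decode` functions are hypothetical stand-ins, not part of `indic_transliteration`; the toy encoder just drops a separator so that two different inputs collapse to the same output, which is (as far as I can tell) the shape of the problem here.

```python
def find_round_trip_failures(texts, encode, decode):
    """Return (original, encoded, decoded) for each text that does not
    survive encode followed by decode unchanged."""
    failures = []
    for text in texts:
        encoded = encode(text)
        decoded = decode(encoded)
        if decoded != text:
            failures.append((text, encoded, decoded))
    return failures


# Toy stand-ins (NOT the real transliterator): encoding drops a "-"
# separator, so "ab" and "a-b" both encode to "ab" and the second one
# cannot be recovered.
def toy_encode(s):
    return s.replace("-", "")


def toy_decode(s):
    return s


samples = ["ab", "a-b"]
failures = find_round_trip_failures(samples, toy_encode, toy_decode)
print(failures)  # [('a-b', 'ab', 'ab')]
```

With the real library one would presumably pass functions wrapping `sanscript.transliterate` in each direction; the list of failures is then exactly the set of sutras that need a special case.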
Generated with:

```py
import requests, json, csv
from indic_transliteration import sanscript

data = json.loads(requests.get('https://raw.githubusercontent.com/ashtadhyayi-com/data/master/sutraani/data.txt').text)['data']
print(len(data))
out = []
for sutra in data:
    name = f"{sutra['a']}.{sutra['p']}.{sutra['n']}"
    text = sutra['s']
    slp1 = sanscript.transliterate(text, sanscript.DEVANAGARI, sanscript.SLP1)
    # if slp1 == 'kftyErfRe':
    #     back = sanscript.transliterate('kftyEr fRe', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    # elif slp1 == 'urft':
    #     back = sanscript.transliterate('ur ft', sanscript.SLP1, sanscript.DEVANAGARI).replace(' ', '')
    # else:
    #     back = sanscript.transliterate(slp1, sanscript.SLP1, sanscript.DEVANAGARI)
    # assert back == text, (text, slp1, back)
    out.append((name, slp1))
with open('sutrapatha.tsv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='excel-tab')
    writer.writerows(out)
```
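For reading the file back, a sketch along these lines should work, assuming the TSV has exactly two columns (sutra number, SLP1 text) as written by the script above. `read_sutrapatha` is a hypothetical helper name, and the sample rows are illustrative rather than copied from the generated file.

```python
import csv
import io


def read_sutrapatha(fileobj):
    """Read (name, slp1) pairs from a tab-separated file object."""
    reader = csv.reader(fileobj, dialect='excel-tab')
    # Skip blank rows; keep only the first two columns.
    return [(row[0], row[1]) for row in reader if row]


# Round-trip through an in-memory buffer instead of a real file.
rows = [("1.1.1", "vfdDirAdEc"), ("1.1.2", "adeN guRaH")]  # illustrative rows
buffer = io.StringIO(newline='')
csv.writer(buffer, dialect='excel-tab').writerows(rows)
buffer.seek(0)
loaded = read_sutrapatha(buffer)
print(loaded == rows)  # True
```

For the real file, open it with `open('sutrapatha.tsv', newline='')` so that the `csv` module handles line endings itself.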
wonderful -- thank you!!
I'd love to discuss this with you further now that we have a starter implementation (https://ambuda-org.github.io/vidyut-lipi/).
Hacky code for #26, but it actually seems to work.
Before merging, I guess at minimum we should copy over https://raw.githubusercontent.com/ashtadhyayi-com/data/master/sutraani/data.txt so that it's hosted here, maybe converted to a simple `data/sutrapatha.tsv` as suggested at #26 (comment). I'm just pushing this commit for now as a savepoint for when I return to this later (not today). (Or if I don't return 😱)
(Curious how much faster it will get when we change from an array of 4000 items to a JS object for faster lookup…)
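The array-versus-object idea, sketched in Python rather than JS to match the scripts in this thread: a linear scan over a list of records versus a dict keyed by sutra number. The record shape (`name`, `text`) and the helper names are assumptions for illustration, not the actual data layout in this PR, and the records are generated placeholders.

```python
def find_by_scan(records, name):
    """Linear search: checks records one by one until the name matches."""
    for record in records:
        if record["name"] == name:
            return record
    return None


def build_index(records):
    """Build a name -> record mapping once, so later lookups avoid the scan."""
    return {record["name"]: record for record in records}


# Placeholder records, roughly the size mentioned above.
records = [{"name": f"x.{i}", "text": f"text-{i}"} for i in range(4000)]
index = build_index(records)

# Both approaches return the same record object.
print(find_by_scan(records, "x.3999") is index["x.3999"])  # True
# A missing key is None for the scan and for dict.get().
print(find_by_scan(records, "nope"), index.get("nope"))  # None None
```

The scan touches up to all 4000 entries per lookup, while the dict lookup is a single hash probe on average; whether that difference is noticeable in practice would need measuring.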