

suspect bigrams and trigrams #9

Closed
drdhaval2785 opened this issue Mar 5, 2016 · 3 comments

Comments

@drdhaval2785
Collaborator

I have extracted some base bigrams and trigrams from Sanskrit dictionaries.
I compared the n-grams of each word in the grammar commentaries against these base n-grams and wrote every word containing an n-gram not found in the base into suspect bigram and suspect trigram files.
A systematic analysis of these files should help us find spelling mistakes.
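A minimal Python sketch of the extraction and flagging step, under stated assumptions: words are already in a one-character-per-phoneme transliteration such as SLP1 (the flagged n-grams in the examples below, e.g. `प्भ्` for `pB`, look phoneme-level), n-grams are not padded with word-boundary markers, and the function names are hypothetical rather than the project's actual scripts.

```python
from collections.abc import Iterable


def char_ngrams(word: str, n: int) -> set[str]:
    """Contiguous character n-grams of `word`.

    Assumes one character = one phoneme (e.g. SLP1), so these are phoneme n-grams.
    No word-boundary padding is added (an assumption, not stated in the issue).
    """
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_base(words: Iterable[str], n: int) -> set[str]:
    """Union of the n-grams of every word in the base vocabulary (e.g. dictionary headwords)."""
    base: set[str] = set()
    for w in words:
        base |= char_ngrams(w, n)
    return base


def suspect_words(words: Iterable[str], base: set[str], n: int) -> list[tuple[str, str]]:
    """(word, ngram) pairs for every n-gram of a test word that is missing from `base`."""
    flagged = []
    for w in words:
        for g in sorted(char_ngrams(w, n) - base):
            flagged.append((w, g))
    return flagged


if __name__ == "__main__":
    base = build_base(["pOtrapraBfti"], 2)          # toy one-word "dictionary"
    print(suspect_words(["pOtrapBfti"], base, 2))   # [('pOtrapBfti', 'pB')]
```

The toy run reproduces the first example below: the misspelling `pOtrapBfti` contains the bigram `pB`, which never occurs in the correctly spelled `pOtrapraBfti`.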

@drdhaval2785
Collaborator Author

https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/balamanorama_2gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/balamanorama_3gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/nyasa_2gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/nyasa_3gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/kashika_2gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/kashika_3gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/laghu_2gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/laghu_3gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/samhita_2gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/samhita_3gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/tattvabodhini_2gram_suspect.txt
https://github.com/drdhaval2785/ashtadhyayi/blob/master/scripts/ngram/tattvabodhini_3gram_suspect.txt

These files list potential errors, commentary by commentary and sutra by sutra.
For example:

../../balamanorama/pada-1.2\1.2.65.md
पौत्रप्भृति:पौत्रप्भृति:balamanorama:प्भ्
../../balamanorama/pada-1.3\1.3.2.md
द्वौझष्:द्वौझष्:balamanorama:औझ्
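A hypothetical parser for the suspect-file format shown above, to make the files easier to sort and review. It assumes each `.md` path line is followed by one or more colon-separated lines of the form `word:word:commentary:ngram`, and that the sutra number is the file name. The issue does not say why the word appears twice, so the sketch keeps both fields unchanged.

```python
import re
from dataclasses import dataclass

# Assumed: the sutra number is the file name, e.g. "...pada-1.2\1.2.65.md" -> "1.2.65".
SUTRA_RE = re.compile(r"(\d+\.\d+\.\d+)\.md$")


@dataclass
class Suspect:
    sutra: str       # taken from the preceding path line
    word: str        # first field
    word_alt: str    # second field (identical to the first in the examples)
    commentary: str  # e.g. "balamanorama"
    ngram: str       # the n-gram not found in the base


def parse_suspects(text: str) -> list[Suspect]:
    """Turn alternating path lines and record lines into Suspect objects."""
    records: list[Suspect] = []
    sutra = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SUTRA_RE.search(line)
        if match:
            sutra = match.group(1)
            continue
        parts = line.split(":")
        if len(parts) == 4:  # skip lines that do not fit the assumed layout
            records.append(Suspect(sutra, *parts))
    return records
```

Grouping the parsed records by `sutra` or by `ngram` then lets one review a single sutra, or a single suspicious n-gram, across all flagged words.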

Entries like these are mostly genuine errors.

The logic is that the bigrams and trigrams of a particular commentary (test) are compared against the bigrams and trigrams of all other commentaries (base); n-grams missing from the base are flagged together with the words they occur in and the corresponding sUtra details.
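The leave-one-out comparison described above can be sketched in Python as follows. This is an illustration, not the repository's script: the `corpus` layout (commentary -> sutra -> list of words), the SLP1-style one-character-per-phoneme assumption, and all names are hypothetical.

```python
def ngrams(word: str, n: int) -> set[str]:
    """Contiguous character n-grams (assumes one character = one phoneme, e.g. SLP1)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def leave_one_out_suspects(
    corpus: dict[str, dict[str, list[str]]], n: int
) -> list[tuple[str, str, str, str]]:
    """Return (commentary, sutra, word, ngram) for n-grams absent from every other commentary.

    `corpus` is a hypothetical layout: commentary -> sutra -> list of words.
    """
    # Collect the n-grams each commentary contributes.
    per_commentary: dict[str, set[str]] = {}
    for name, sutras in corpus.items():
        grams: set[str] = set()
        for words in sutras.values():
            for word in words:
                grams |= ngrams(word, n)
        per_commentary[name] = grams

    flagged = []
    for test, sutras in corpus.items():
        # Base = union of the n-grams of all *other* commentaries.
        base = set().union(*(g for name, g in per_commentary.items() if name != test))
        for sutra, words in sutras.items():
            for word in words:
                for gram in sorted(ngrams(word, n) - base):
                    flagged.append((test, sutra, word, gram))
    return flagged


if __name__ == "__main__":
    corpus = {
        "kashika": {"1.2.65": ["pOtrapraBfti"]},
        "nyasa": {"1.2.65": ["pOtrapraBfti"]},
        "balamanorama": {"1.2.65": ["pOtrapBfti"]},
    }
    print(leave_one_out_suspects(corpus, 2))
    # [('balamanorama', '1.2.65', 'pOtrapBfti', 'pB')]
```

One consequence of this design: an n-gram that occurs in only one commentary is flagged even when it is spelled correctly, so the output is a list of candidates for manual review rather than a list of confirmed errors.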

@drdhaval2785
Collaborator Author

drdhaval2785 commented Feb 25, 2017

@aupasana
Once PR #12 is merged, I will open a new PR for this correction work.
A further enhancement I have in mind is some kind of auto-suggestion based on edit distance (Levenshtein distance) from words in other commentaries.
This would give us some clue as to the correct reading.
Otherwise, manual checking is the simplest option available.
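A rough Python sketch of the proposed Levenshtein-based suggestion, assuming words are compared in an SLP1-style transliteration (so that one edit corresponds to one phoneme, unlike Devanagari code points where a vowel sign or virama is a separate character). The `suggest` helper and its `max_dist`/`k` parameters are hypothetical.

```python
from collections.abc import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def suggest(word: str, vocabulary: Iterable[str],
            max_dist: int = 2, k: int = 3) -> list[tuple[int, str]]:
    """Up to k (distance, candidate) pairs within max_dist, closest first."""
    scored = sorted((levenshtein(word, v), v) for v in set(vocabulary) if v != word)
    return [(d, v) for d, v in scored if d <= max_dist][:k]


if __name__ == "__main__":
    vocab = ["pOtrapraBfti", "pOtraH", "dvO"]  # toy vocabulary from other commentaries
    print(suggest("pOtrapBfti", vocab))         # [(2, 'pOtrapraBfti')]
```

Comparing against every word is quadratic in practice; restricting the vocabulary to words from the same sutra in other commentaries would keep it small.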

@drdhaval2785
Collaborator Author

This is being handled separately at https://github.com/sanskrit/ashtadhyayi/issues. Closing this issue.
