Skip to content

Experiments on language modeling for indic languages

License

Notifications You must be signed in to change notification settings

ashim95/IndicLM

Repository files navigation

IndicLM

Experiments on language modeling for indic languages

  1. generate_sentences_Yoav_Goldberg.ipynb : Use this file to generate nonce sentences for Yoav Goldberg like experiments.

  2. The sentence pairs for Yoav Goldberg like experiments in Malayalam can be found at this link. The counts for each of the dependency types are as follows:

{
'k7': 29304,
'k4': 19469,
'vmod': 61257,
'r6': 29717,
'k5': 3724,
'nmod': 60813,
'k1': 21190,
'k2': 16209,
'k3': 715
}

The format of the file (tab separated) is as follows:

Column 0 : Dependecy type to which the example belongs
Column 1 : Sentence ID of the original (correct) sentence
Column 2 : Index of the word in the sentence that is being replaced (to generate the incorrect example)
Column 3 : Correct Sentence
Column 4 : Incorrect Sentence with one word replaced

About

Experiments on language modeling for indic languages

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published