KaniyamFoundation/all_tamil_nouns

all_tamil_nouns

We are building an open source spellchecker for the Tamil language using the Open-Tamil Python library.

We found that we need a list of all the nouns in Tamil for quick checking and validation.

It seems our world is full of nouns.

In this repo, we are collecting as many nouns as possible.
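As a rough illustration of the quick checking mentioned above, a noun list with one word per line can be loaded into a set and queried by exact match. This is only a minimal sketch: the function names `load_nouns` and `is_known_noun` are hypothetical, it does not use the Open-Tamil library, and it assumes the file is UTF-8 text with one noun per line.

```python
from pathlib import Path


def load_nouns(path):
    """Read one noun per line into a set, skipping blank lines.

    Assumption: the file is UTF-8 encoded with one noun per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_known_noun(word, nouns):
    """Return True if the word appears verbatim in the noun set."""
    return word.strip() in nouns
```

A set gives constant-time lookups on average, which matters when every word of a document has to be checked against a list of this size. Exact matching will not recognise inflected forms, so a real spellchecker would need more than this lookup.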

Read about our explorations in building a Tamil spellchecker here - https://goinggnu.wordpress.com/category/spellchecker/

counts

nouns - 97875

peyar.in_names/boy - 20391

peyar.in_names/girl - 24030

random_collections - 1115

tamilsurangam.in - 1249

wiktionary - 85256

total - 2,29,916 (all_nouns.txt)

only unique_nouns - 1,92,122 (unique_all_nouns.txt)
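The counts above can be cross-checked with a few lines of Python, and the step from the combined list to the unique list can be sketched in the same way. This README does not say how the de-duplication was actually done, so `merge_unique` below is a hypothetical helper, the source labels are shortened, and the sample words are illustrative rather than taken from the repo.

```python
# Counts as reported in this README, keyed by (shortened) source label.
reported = {
    "nouns": 97875,
    "names/boy": 20391,
    "names/girl": 24030,
    "random_collections": 1115,
    "tamilsurangam.in": 1249,
    "wiktionary": 85256,
}

total = sum(reported.values())   # 229916, written 2,29,916 above
duplicates = total - 192122      # entries dropped when keeping unique nouns


def merge_unique(*word_lists):
    """Merge several word lists into one sorted list without repeats."""
    seen = set()
    for words in word_lists:
        for word in words:
            word = word.strip()
            if word:              # ignore blank entries
                seen.add(word)
    return sorted(seen)           # sorted by Unicode code point


# Illustrative sample data, not taken from the repo.
sample_a = ["மரம்", "வீடு", "பூ"]
sample_b = ["பூ", "நாய்", "மரம்"]
merged = merge_unique(sample_a, sample_b)
```

The per-source counts add up to the stated total, and the difference between the total and the unique count is the number of repeated entries across sources.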

===

We further removed the unique sub names and saved the result as unique_sorted_noun_master.txt.

We will use this file as the master list of nouns.

wc -l unique_sorted_noun_master.txt

1,53,548 unique_sorted_noun_master.txt
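Where `wc` is not available, the same count can be approximated in Python. The sketch below mirrors `wc -l` by counting newline characters; the function name `count_lines` is hypothetical, and the result assumes one noun per line.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count newline characters in a file, as `wc -l` does."""
    count = 0
    with open(path, "rb") as f:
        # Read in binary chunks so large files are not loaded at once.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```

Like `wc -l`, this reports one line fewer than expected if the last line has no trailing newline.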

===

Read more about this repo here - https://goinggnu.wordpress.com/2020/05/24/building-tamil-spellchecker-day-3-collecting-all-tamil-nouns/

TODO

  • Collect more nouns and add them to this repo.
  • Check these files for errors and fix them.
  • Collect all verbs and other word forms in Tamil too.
