Language Resource

Collection of stopwords, frequent words and other things.

Use it to help build applications that rely on NLP (Natural Language Processing), such as:

  • Stemming
  • Text simplification
  • Text-to-speech
  • Text-proofing
  • Natural language search
  • Query expansion
  • Automated essay scoring
  • Truecasing

or search engines.

Languages

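| Language (ISO 639-1) | Name       | Stopwords | Frequent Words | Obs   |
|----------------------|------------|-----------|----------------|-------|
| bg                   | Bulgarian  | Yes       | No             | UTF-8 |
| cz                   | Czech      | Yes       | No             | UTF-8 |
| de                   | German     | Yes       | Yes            |       |
| en                   | English    | Yes       | Yes            |       |
| es                   | Spanish    | Yes +     | Yes            |       |
| fi                   | Finnish    | Yes       | Yes            |       |
| fr                   | French     | Yes       | Yes            |       |
| hu                   | Hungarian  | Yes       | No             | UTF-8 |
| it                   | Italian    | Yes       | Yes            | UTF-8 |
| pl                   | Polish     | Yes       | No             | UTF-8 |
| pt                   | Portuguese | Yes +     | No             |       |
| ru                   | Russian    | Yes       | No             | UTF-8 |
| sv                   | Swedish    | Yes       | Yes            |       |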
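
As a rough illustration of how one of these stopword lists might be used, the sketch below filters stopwords out of a tokenized sentence. It assumes a list file holds one word per line, which is an assumption about the file format and not something this README states. The function names `parse_stopwords`, `load_stopwords`, and `remove_stopwords` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path


def parse_stopwords(text: str) -> set[str]:
    """Build a stopword set from text, assuming one word per line (assumed format)."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stopwords(path: str) -> set[str]:
    """Read a stopword file from disk; `path` is whatever list file you want to use."""
    # Decode explicitly as UTF-8, since several lists are marked UTF-8 in the table.
    return parse_stopwords(Path(path).read_text(encoding="utf-8"))


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Return the tokens that are not stopwords, compared case-insensitively."""
    return [token for token in tokens if token.lower() not in stopwords]


# Inline sample standing in for a real list file.
sample = "the\na\nof\n"
stopwords = parse_stopwords(sample)
print(remove_stopwords("The history of a language".split(), stopwords))
# prints ['history', 'language']
```

Keeping the words in a set makes each lookup constant time, which matters once a list grows to a few hundred entries.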

Reference

Almost everything was extracted from http://members.unine.ch/jacques.savoy/clef/

Contributing

Fork the repository, make your changes, and open a pull request.

Please also update this README file to reflect your changes!

Thanks for your help!