Stopwords for 50 languages in JSON format

stopwords-json

Stopwords for various languages in JSON format. Per Wikipedia:

Stop words are words which are filtered out prior to, or after, processing of natural language data [...] these are some of the most common, short function words, such as the, is, at, which, and on.

You can load every stopword list at once from stopwords-all.json (keyed by ISO 639-1 language code), or use the table below to find the stopword file for an individual language.
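As a rough illustration of how the language-keyed structure might be used, the sketch below filters stopwords out of a sentence. The `removeStopwords` helper is hypothetical and not part of this package, and the inline `stopwordsByLang` object is a tiny stand-in for the real file, assuming each language code maps to an array of lowercase strings.

```javascript
// Hypothetical helper (not part of this package): drops stopwords from
// a string after lowercasing it and splitting on whitespace.
function removeStopwords(text, stopwords) {
  const stopSet = new Set(stopwords); // Set gives constant-time lookups
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0 && !stopSet.has(token));
}

// In a real project the map would be loaded from the JSON file, e.g.
//   const stopwordsByLang = require('./stopwords-all.json');
// (the path is an assumption). A small inline excerpt stands in here so
// the sketch runs on its own.
const stopwordsByLang = { en: ['the', 'is', 'at', 'which', 'on'] };

const tokens = removeStopwords('The cat is on the mat', stopwordsByLang.en);
console.log(tokens); // [ 'cat', 'mat' ]
```

Whitespace splitting is only a reasonable tokenizer for languages that separate words with spaces. Languages such as Chinese, Japanese, or Thai would need a proper word segmenter before the lookup.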

Languages

There are a total of 50 supported languages:

Language Stopword count Filename
Afrikaans 51 af.json
Arabic 162 ar.json
Armenian 45 hy.json
Basque 98 eu.json
Bengali 116 bn.json
Breton 126 br.json
Bulgarian 259 bg.json
Catalan 218 ca.json
Chinese 542 zh.json
Croatian 179 hr.json
Czech 346 cs.json
Danish 101 da.json
Dutch 275 nl.json
English 570 en.json
Esperanto 173 eo.json
Estonian 35 et.json
Finnish 772 fi.json
French 606 fr.json
Galician 160 gl.json
German 596 de.json
Greek 75 el.json
Hausa 39 ha.json
Hebrew 194 he.json
Hindi 225 hi.json
Hungarian 781 hu.json
Indonesian 355 id.json
Irish 109 ga.json
Italian 619 it.json
Japanese 109 ja.json
Korean 679 ko.json
Latin 49 la.json
Latvian 161 lv.json
Marathi 99 mr.json
Norwegian 172 no.json
Persian 332 fa.json
Polish 260 pl.json
Portuguese 408 pt.json
Romanian 282 ro.json
Russian 539 ru.json
Slovak 110 sk.json
Slovenian 446 sl.json
Somali 30 so.json
Southern Sotho 31 st.json
Spanish 577 es.json
Swahili 74 sw.json
Swedish 401 sv.json
Thai 115 th.json
Turkish 279 tr.json
Yoruba 60 yo.json
Zulu 29 zu.json

Sources

License and Copyright

Copyright (c) 2017 Peter Graham, contributors. Released under the Apache-2.0 license.
