A collection of language stemmers and stop words for the Lunr JavaScript library
Files (name, latest commit message, commit time):
.circleci Update node Sep 28, 2019
build Add Arabic language support (PR #44) Sep 28, 2019
demos remove unused file Jun 13, 2017
min Add Arabic language support (PR #44) Sep 28, 2019
test Add Arabic language support (PR #44) Sep 28, 2019
.gitignore made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
.gitmodules add build support for lunr-languages Jul 27, 2014
CONTRIBUTING.md made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
LICENSE lunr languages - first commit Apr 20, 2014
README.md Add Arabic language support (PR #44) Sep 28, 2019
bower.json update bower.json version Apr 3, 2017
lunr.ar.js Added arabic language support Mar 20, 2018
lunr.da.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.de.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.du.js rebuild languages Feb 22, 2019
lunr.es.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.fi.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.fr.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.hu.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.it.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.ja.js build japanese files Jul 28, 2017
lunr.jp.js build japanese files Jul 28, 2017
lunr.multi.js Prevent error when adding multiple languages if one of them doesn't h… Jul 21, 2019
lunr.nl.js Renamed DU to NL, and add a warning message to DU Feb 22, 2019
lunr.no.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.pt.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.ro.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.ru.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.stemmer.support.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.sv.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.th.js able to use wordcut on browser Jun 13, 2017
lunr.tr.js made compatible with all versions of Lunr, added integration tests wi… Apr 3, 2017
lunr.vi.js Added minified files for Vietnamese language Sep 28, 2019
package.json bump package number, added Arabic Sep 28, 2019
tinyseg.js regexp.compile is deprecated, fixing this Sep 28, 2019
wordcut.js able to use wordcut on browser Jun 13, 2017

README.md

Lunr Languages

Join the chat at https://gitter.im/lunr-languages/Lobby

Lunr Languages is a Lunr addon that helps you search in documents written in the following languages:

  • German
  • French
  • Spanish
  • Italian
  • Japanese
  • Dutch
  • Danish
  • Portuguese
  • Finnish
  • Romanian
  • Hungarian
  • Russian
  • Norwegian
  • Thai
  • Vietnamese
  • Arabic
  • Contribute a new language

Lunr Languages is compatible with Lunr versions 0.6, 0.7, 1.0, and 2.x.

How to use

Lunr Languages works well with script loaders (Webpack, RequireJS) and can be used both in the browser and on the server.

In a web browser

The following example is for the German language (de).

Add the following JS files to the page:

<script src="lunr.js"></script> <!-- lunr.js library -->
<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script> <!-- or any other language you want -->

Then, use the language when initializing lunr:

var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});

That's it: add your documents and you're done. When searching, the index uses the stemmer and stop-word list of the language you selected.

In a web browser, with RequireJS

Add require.js to the page:

<script src="lib/require.js"></script>

Then, use the language when initializing lunr:

require(['lib/lunr.js', '../lunr.stemmer.support.js', '../lunr.de.js'], function(lunr, stemmerSupport, de) {
  // stemmerSupport and de add keys to the lunr object, so we pass lunr to each of them;
  // in the end, we only need lunr.
  stemmerSupport(lunr); // adds lunr.stemmerSupport
  de(lunr); // adds lunr.de key

  // at this point, lunr can be used
  var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
    // now you can call this.add(...) to add documents written in German
  });
});

With Node.js

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.de.js')(lunr); // or any other language you want

var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});

Indexing multi-language content

If your documents are written in more than one language, you can enable multi-language indexing. This ensures every word is properly trimmed and stemmed, every stop word is removed, and no words are lost (indexing in just one language would remove words from every other one).

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);

var idx = lunr(function () {
  // "en" is not loaded above because English is built into lunr.js
  this.use(lunr.multiLanguage('en', 'ru'));
  // then, the normal lunr index initialization
  // ...
});

You can combine any number of supported languages this way. The corresponding lunr language scripts must be loaded (English is built in).
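Why a single-language setup loses words can be seen in a small sketch of trimming. It assumes, as a simplification rather than a reading of lunr's source, that a trimmer is a pair of regular expressions built from a set of "word characters"; `makeTrimmer` is a hypothetical helper, and the Cyrillic range `\u0400-\u04FF` is an assumed stand-in for whatever character set the Russian module actually declares.

```javascript
// Minimal sketch, NOT lunr's implementation: a trimmer built from a string of
// regex "word characters". makeTrimmer is a hypothetical helper.
function makeTrimmer(wordCharacters) {
  // Strip any run of non-word characters at the start and at the end of a token.
  const start = new RegExp("^[^" + wordCharacters + "]+");
  const end = new RegExp("[^" + wordCharacters + "]+$");
  return (token) => token.replace(start, "").replace(end, "");
}

// English only: in JavaScript, \w matches just [A-Za-z0-9_].
const enTrimmer = makeTrimmer("\\w");
// English + Russian: also accept the basic Cyrillic block (assumed range).
const enRuTrimmer = makeTrimmer("\\w\\u0400-\\u04FF");

const tokens = ["Languages!", "привет!"];
console.log(tokens.map(enTrimmer));   // → ["Languages", ""]
console.log(tokens.map(enRuTrimmer)); // → ["Languages", "привет"]
```

With the English-only character set, every character of the Russian token counts as "special", so the whole word is trimmed away to an empty token; combining the character sets of all enabled languages keeps it.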

If you serialize the index and load it in another script, you'll have to initialize the multi-language support in that script, too, like this:

lunr.multiLanguage('en', 'ru');
var idx = lunr.Index.load(serializedIndex);
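The reason the loading script must call `lunr.multiLanguage(...)` first can be illustrated with a toy model. It assumes (our reading, not a guarantee) that a serialized index records its pipeline functions by registered label rather than by code, so the same labels must be registered again before `lunr.Index.load` runs. `register`, `loadPipeline`, and the label strings are hypothetical; this is not lunr's code.

```javascript
// Toy model, NOT lunr's implementation: a serialized pipeline stores only
// function labels, so the loading script must register functions under the
// same labels before loading. register/loadPipeline/labels are hypothetical.
const registry = new Map();

function register(fn, label) {
  registry.set(label, fn);
}

function loadPipeline(labels) {
  // Resolve each stored label back to a function, failing on unknown labels.
  return labels.map((label) => {
    const fn = registry.get(label);
    if (!fn) throw new Error("Cannot load unregistered function: " + label);
    return fn;
  });
}

// What script A stored when it serialized its index (made-up labels).
const serializedPipeline = ["multi-trimmer-en-ru", "stemmer-ru"];

// Script B: loading before registering anything fails...
let error = null;
try {
  loadPipeline(serializedPipeline);
} catch (e) {
  error = e.message;
}
console.log(error); // → "Cannot load unregistered function: multi-trimmer-en-ru"

// ...and succeeds once functions are registered under the same labels
// (placeholder function bodies, for illustration only).
register((t) => t.trim(), "multi-trimmer-en-ru");
register((t) => t, "stemmer-ru");
const pipeline = loadPipeline(serializedPipeline);
console.log(pipeline.length); // → 2
```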

How to add a new language

Check the Contributing section

How does Lunr Languages work?

Searching inside documents is not as straightforward as using indexOf(), since there are many things to consider in order to get quality search results:

  • Tokenization
    • Given a string like "Hope you like using Lunr Languages!", the tokenizer would split it into individual words, becoming an array like ['Hope', 'you', 'like', 'using', 'Lunr', 'Languages!']
    • Though this seems trivial for text in Latin script (just split on spaces), it is more complicated for languages like Japanese, which do not put spaces between words. Lunr Languages includes a tokenizer for Japanese.
  • Trimming
    • After tokenization, trimming ensures that the words contain just what is needed in them. In our example above, the trimmer would convert Languages! into Languages
    • In short, the trimmer removes leading and trailing special characters that add no value for search.
  • Stemming
    • What happens if our text contains the word consignment but we search for consigned? The search should still find it, since the meaning is the same and only the form differs.
    • A stemmer extracts the root of words that can have many forms and stores it in the index. Then, any search is also stemmed and searched in the index.
    • Lunr Languages does stemming for all the included languages, so you can capture all the forms of words in your documents.
  • Stop words
    • There's no point in indexing or searching words like the, it, so, etc. These words are called stop words.
    • Stop words are removed so your index will only contain meaningful words.
    • Lunr Languages includes stop words for all the included languages.
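The four steps above can be sketched as a toy pipeline in plain JavaScript. Everything in it is a simplifying assumption for illustration: a whitespace tokenizer, an ASCII-only trimmer, a four-word stop-word list, and a crude suffix stripper standing in for a stemmer. Lunr Languages uses its own per-language tokenizers, stop-word lists, and stemmers instead; `tokenize`, `trim`, `stem`, and `analyze` are hypothetical names.

```javascript
// Toy pipeline, NOT Lunr Languages' code: tokenize -> trim -> lowercase ->
// remove stop words -> stem. All lists and rules below are illustrative only.
const STOP_WORDS = new Set(["the", "it", "so", "you"]);
const SUFFIXES = ["ment", "ing", "ed", "s"]; // checked in this order

function tokenize(text) {
  // Split on runs of whitespace (only adequate for space-separated scripts).
  return text.split(/\s+/).filter((t) => t.length > 0);
}

function trim(token) {
  // Remove leading/trailing non-word characters, e.g. "Languages!" -> "Languages".
  return token.replace(/^\W+/, "").replace(/\W+$/, "");
}

function stem(word) {
  // Strip the first matching suffix, keeping a root of at least 3 characters.
  for (const suffix of SUFFIXES) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

function analyze(text) {
  return tokenize(text)
    .map(trim)
    .map((t) => t.toLowerCase())
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t)) // drop empties and stop words
    .map(stem);
}

console.log(analyze("Hope you like using Lunr Languages!"));
// → ["hope", "like", "using", "lunr", "language"]
console.log(stem("consignment") === stem("consigned")); // → true (both "consign")
```

Because documents and queries go through the same steps, a query for consigned is reduced to the same root as consignment in the index. A real stemmer handles far more word forms than these four suffixes.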

Technical details & Credits

I created this project by compiling stemmers and stop words from various sources and wrapping them together so they can be used directly with all current versions of Lunr.

I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).
