Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to support lunr 2.x #30

Closed
wants to merge 1 commit into from

Conversation

olivernn
Copy link

This commit adds support for the upcoming release of lunr 2.x. This has not been released yet so its probably best off waiting to merge this until I do that release. Ideally I'd like to be able to have the same great language support in the new Lunr from day one though, so getting this out here now to get feedback.

Lunr has changed the interface for pipeline functions. Before, tokens were strings passed to pipeline functions, Lunr 2.x changes this, wrapping them in a lunr.Token object. This means that all pipeline functions that expect to be working with a string need to be updated to work with a lunr.Token.

This change covers most of the language plugins in this repository. The Japanese plugin required a few more changes, specifically to make use of the new, per index, tokeniser. This should allow a Japanese and non-Japanese indexes to coexist.

There is one potential issue though, searches are now parsed by lunr.QueryParser which expects terms to be whitespace separated. I don't know enough (none) Japanese to get from the demos if this is an issue or not, perhaps someone can lend a hand here.

I have not changed any of the versions etc, I don't know what you want to do here. I'd like to suggest still keeping support for the 0.x and 1.x branches of Lunr, as well as the new interfaces in Lunr 2.x. Perhaps lune-languages could have a similar versioning scheme, to indicate which major version of Lunr is supported. I'm open to ideas here though.

Lunr 2.x has changed the interface for pipeline functions, instead of
just passing around strings, tokens are now represented as an instance
of lunr.Token.

Additionally, Lunr now supports per index tokenisers, this allows the JP
plugin to not have to clobber the global lunr.tokenizer function. This
is fine for when building the index, but queries are parsed by
lunr.QueryParser which may need some help when it comes to parsing
Japanese searches. Currently there is no extension point for the query
parser so we may need to do some work here.

This commit also includes updates for the demos.
@amsdamsgram
Copy link

I've tried it with the french language and it works good.
It would be nice if it could be merged.

@MihaiValentin
Copy link
Owner

MihaiValentin commented Apr 3, 2017

Thanks @olivernn for this PR. It helped me understand the upcoming changes in Lunr 2.

So, I took the insight from here and overhauled Lunr Languages to be compatible with all Lunr versions (0.6.0, 0.7.0, 1.0.0, 2.0.0-alpha.5). The code is now in master and I bumped Lunr Languages to version 1.0.0

This will help users, since no matter what Lunr version they will use, they'll just to have to make sure they use the latest Lunr Languages version.

In order to enforce this, I added integration tests that test the combination between Lunr versions X Lunr Languages languages.

In this way, we'll achieve two things:

  • you will be able to quickly perform a smoke test when releasing new Lunr versions by just adding the newest version to the Lunr Languages tests (2 lines of code)
  • any contributor to Lunr Languages will know as early as possible if his changes/bug-fixes do not break any existing functionality, thus ensuring stability

I will close this MR now.

Here's the commit in which these changes were made: 4c64ac6 . The key changes were made in:

  • lunr.template - forward-compatibility to Lunr 2
  • test/testdata/<languages>.js - testcases for all the languages
  • test/VersionsAndLanguagesTest.js - the test that tests all Lunr versions with all the languages testscases
  • lunr.jp.de (this is not generated from lunr.template) - support for the Japanese tokenizer across all Lunr versions
  • lunr.multi.js - the multi language support also required using the searchPipeline for correctly stemming the search terms in Lunr 2
  • lunr.stemmer.support.js - forward-compatibility to Lunr 2

Should you have any questions, please comment on the commit linked above, or let's talk in Gitter.

@iDams , you can now use it :).

@olivernn
Copy link
Author

olivernn commented Apr 4, 2017

💯 Nice work @MihaiValentin!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants