Upgrade to Lucene 6.4.1 in jena-text and jena-spatial (JENA-1250) #219

Merged
merged 3 commits into from Mar 10, 2017

osma
Contributor

osma commented Mar 3, 2017

These two commits will upgrade jena-text and jena-spatial to use Lucene 6.4.1 (and spatial4j 0.6).

Things to note:

  • The Lucene index format is incompatible. Old indexes need to be recreated. There is no backward compatibility, since we are going two major versions up and Lucene generally supports only the index format of the previous major version.
  • The multilingual analyzer has been reimplemented to use a different field layout where every language gets a separate field (e.g. label_en, label_fr) as well as a single generic field (label). This had to be done due to API changes in Lucene 5 that made it impossible to use the old mechanism where a single field was used for all languages.
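The field layout described in the second note can be sketched roughly as follows. This is only an illustration, not the jena-text implementation: the class `LangFieldLayout` and the method `fieldsFor` are hypothetical names, and the sketch assumes that a literal with a language tag is written both to the generic field and to a field named by appending `_` and the tag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the jena-text API. */
class LangFieldLayout {

    /**
     * Returns the names of the index fields a literal would be written to,
     * assuming one generic field plus one field per language tag.
     */
    static List<String> fieldsFor(String baseField, String lang) {
        List<String> fields = new ArrayList<>();
        fields.add(baseField);                      // generic field, e.g. "label"
        if (lang != null && !lang.isEmpty()) {
            fields.add(baseField + "_" + lang);     // per-language field, e.g. "label_en"
        }
        return fields;
    }
}
```

Under these assumptions, `LangFieldLayout.fieldsFor("label", "fr")` yields the names `label` and `label_fr`, while a literal without a language tag goes only to `label`.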

@osma
Contributor Author

osma commented Mar 7, 2017

I've tested this briefly.

  • Unit tests pass for both jena-text and jena-spatial
  • I can create a text index and query it normally
  • I can create a spatial index and query it normally
  • jena.textindexer seems to work
  • jena.textindexdump seems to work
  • jena.spatialindexer seems to work
  • jena.spatialindexdump seems to work

If I try to use a text index created before the Lucene upgrade, I get this exception:

org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/tmp/lucene/segments_2"))): 3 (needs to be between 4 and 6). This version of Lucene only supports indexes created with release 5.0 and later.

This is expected, since Lucene 6 is incompatible with old Lucene 4 indexes; see JENA-1250 for details.

Unless anyone objects I will merge this within the next 2 days.

@rvesse
Member

rvesse commented Mar 7, 2017

Is there a logical place where we could catch that specific exception and provide a more obvious error, e.g. "Index format is no longer supported, please rebuild the text index with Jena X.Y.Z or higher"?

If we do that proactively, we are likely to get fewer support questions than if we leave the default error.
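A minimal sketch of that idea, under stated assumptions: to keep it runnable without Lucene on the classpath, the nested class `FormatTooOldStandIn` stands in for `org.apache.lucene.index.IndexFormatTooOldException` (which is likewise an `IOException`), and `IndexOpener`, `openOrExplain`, and the message wording are hypothetical, not what jena-text or jena-spatial actually use.

```java
import java.io.IOException;

/** Illustrative sketch only; names here are hypothetical, not Jena or Lucene API. */
class IndexOpenSketch {

    /** Stand-in for Lucene's IndexFormatTooOldException so the sketch compiles alone. */
    static class FormatTooOldStandIn extends IOException {
        FormatTooOldStandIn(String message) {
            super(message);
        }
    }

    /** Hypothetical callback that opens an index and may fail with an IOException. */
    interface IndexOpener {
        void open() throws IOException;
    }

    /** Opens the index, translating a "format too old" failure into a clearer error. */
    static void openOrExplain(IndexOpener opener, String indexPath) {
        try {
            opener.open();
        } catch (FormatTooOldStandIn e) {
            // More specific catch first: tell the user what to do, keep the cause.
            throw new IllegalStateException(
                "The index at " + indexPath + " uses a format that is no longer "
                + "supported. Please rebuild the index.", e);
        } catch (IOException e) {
            // Any other I/O failure is reported unchanged in meaning.
            throw new IllegalStateException("Could not open index at " + indexPath, e);
        }
    }
}
```

Keeping the original exception as the cause means the low-level Lucene details stay available in the stack trace, while the top-level message tells the user to rebuild.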

@osma
Contributor Author

osma commented Mar 7, 2017

@rvesse Good point! I'll see what I can do. I don't think there are many places where this has to be caught, one or two per module (jena-text and jena-spatial).

@osma
Contributor Author

osma commented Mar 10, 2017

Added more informative error messages for both jena-text and jena-spatial, as suggested by @rvesse.

asfgit merged commit 8bba58b into apache:master Mar 10, 2017

@rvesse
Member

rvesse commented Mar 10, 2017

Thanks for adding the more informative error message. The wording is extremely clear and should avoid a lot of user confusion.
