LUCENE-9662: Update concurrent index checking usage instructions #281

Merged

zacharymorn
Contributor

Description

Update concurrent index checking usage instructions

Sample output

> Task :lucene:core:CheckIndex.main() FAILED

ERROR: index path not specified
Usage: java org.apache.lucene.index.CheckIndex pathToIndex [-exorcise] [-slow] [-segment X] [-segment Y] [-threadCount X] [-dir-impl X]

  -exorcise: actually write a new segments_N file, removing any problematic segments
  -fast: just verify file checksums, omitting logical integrity checks
  -slow: do additional slow checks; THIS IS VERY SLOW!
  -codec X: when exorcising, codec to write the new segments_N file with
  -verbose: print additional details
  -segment X: only check the specified segments.  This can be specified multiple
              times, to check more than one segment, eg '-segment _2 -segment _a'.
              You can't use this with the -exorcise option
  -threadCount X: number of new threads created and used to check index concurrently.
                  When not specified, this will default to the number of CPU cores up to 4.
                  When '-threadCount 1' is used, index checking will be performed sequentially.
  -dir-impl X: use a specific FSDirectory implementation. If no package is specified the org.apache.lucene.store package will be used.

**WARNING**: -exorcise *LOSES DATA*. This should only be used on an emergency basis as it will cause
documents (perhaps many) to be permanently removed from the index.  Always make
a backup copy of your index before running this!  Do not run this tool on an index
that is actively being written to.  You have been warned!

Run without -exorcise, this tool will open the index, report version information
and report any exceptions it hits and what action it would take if -exorcise were
specified.  With -exorcise, this tool will remove any segments that have issues and
write a new segments_N file.  This means all documents contained in the affected
segments will be removed.

This tool exits with exit code 1 if the index cannot be opened or has any
corruption, else 0.
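
The option rules in the usage text above (repeatable `-segment`, a numeric `-threadCount`, and the ban on combining `-segment` with `-exorcise`) can be illustrated with a small stand-alone parser. This is only a sketch under stated assumptions: `CheckIndexArgsSketch` and `Parsed` are hypothetical names, it handles just three of the listed options, and it is not the code in `CheckIndex.parseOptions`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT Lucene's CheckIndex.parseOptions.
public class CheckIndexArgsSketch {

    // Hypothetical holder for the few options this sketch understands.
    static final class Parsed {
        String indexPath;
        boolean exorcise;
        int threadCount = -1; // -1 means "-threadCount was not specified"
        final List<String> segments = new ArrayList<>();
    }

    static Parsed parse(String[] args) {
        Parsed p = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-exorcise")) {
                p.exorcise = true;
            } else if (arg.equals("-segment") || arg.equals("-threadCount")) {
                // Both options take exactly one value.
                if (i == args.length - 1) {
                    throw new IllegalArgumentException("missing value for " + arg);
                }
                String value = args[++i];
                if (arg.equals("-segment")) {
                    p.segments.add(value); // may be given several times
                } else {
                    p.threadCount = Integer.parseInt(value);
                    if (p.threadCount < 1) {
                        throw new IllegalArgumentException("-threadCount must be >= 1");
                    }
                }
            } else if (p.indexPath == null) {
                p.indexPath = arg;
            } else {
                throw new IllegalArgumentException("unexpected argument: " + arg);
            }
        }
        if (p.indexPath == null) {
            throw new IllegalArgumentException("index path not specified");
        }
        // The usage text says -segment cannot be combined with -exorcise.
        if (p.exorcise && !p.segments.isEmpty()) {
            throw new IllegalArgumentException("can't use -segment with -exorcise");
        }
        return p;
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {"/tmp/index", "-segment", "_2", "-segment", "_a", "-threadCount", "1"});
        System.out.println(p.indexPath + " " + p.segments + " threads=" + p.threadCount);
    }
}
```

Keeping "not specified" as a distinct value (`-1` here) leaves the choice of default to the caller, which is the part of the behaviour discussed in the review.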

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

mikemccand pushed a commit to mikemccand/lucene that referenced this pull request Sep 3, 2021
…mpression (apache#281)

Co-authored-by: Matthew Sporleder <matt.sporleder@multiply.com>
Member

@mikemccand mikemccand left a comment

Thanks @zacharymorn -- I left a couple small comments.

@@ -3994,6 +3994,9 @@ public static Options parseOptions(String[] args) {
+ " -segment X: only check the specified segments. This can be specified multiple\n"
+ " times, to check more than one segment, eg '-segment _2 -segment _a'.\n"
+ " You can't use this with the -exorcise option\n"
+ " -threadCount X: number of new threads created and used to check index concurrently.\n"
Member

Maybe just say number of threads used to check index concurrently? I.e. drop the "new" because then I start to wonder if main thread counts :)

Contributor Author

Ah yes sorry :D . Updated the instruction.

@@ -3994,6 +3994,9 @@ public static Options parseOptions(String[] args) {
+ " -segment X: only check the specified segments. This can be specified multiple\n"
+ " times, to check more than one segment, eg '-segment _2 -segment _a'.\n"
+ " You can't use this with the -exorcise option\n"
+ " -threadCount X: number of new threads created and used to check index concurrently.\n"
+ " When not specified, this will default to the number of CPU cores up to 4.\n"
Member

Maybe we should change this default to not cap at 4? Just use number of cores? This is command-line execution, which is typically done only once at a time (versus when CheckIndex is invoked from our tests...).

Contributor Author

Sounds good. Updated the code and instruction here.
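
For readers following the thread, the two defaults under discussion can be contrasted in a few lines. This is a rough sketch, not the code changed in this PR: `ThreadCountDefaultSketch` and both method names are hypothetical, and the core count is passed in as a parameter so the behaviour is easy to check.

```java
// Hypothetical illustration of the two defaults discussed above; NOT Lucene code.
public class ThreadCountDefaultSketch {

    // Original wording: "the number of CPU cores up to 4".
    static int cappedDefault(int cores) {
        return Math.max(1, Math.min(cores, 4));
    }

    // Suggested change: just use the number of cores, with no cap.
    static int uncappedDefault(int cores) {
        return Math.max(1, cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
            + " capped=" + cappedDefault(cores)
            + " uncapped=" + uncappedDefault(cores));
    }
}
```

On a machine with 4 or fewer cores the two defaults agree; they differ only on larger machines, which is where an uncapped default helps a one-off command-line run.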

Member

@mikemccand mikemccand left a comment

Looks great! Thanks @zacharymorn!

@zacharymorn
Contributor Author

> Looks great! Thanks @zacharymorn!

Thanks Michael for the review and approval!

@zacharymorn zacharymorn merged commit 7f8607b into apache:main Sep 10, 2021