From d921918e932d1a5c98555e2b7507a6b1f60f9e0c Mon Sep 17 00:00:00 2001
From: Robert Muir
Date: Thu, 8 Apr 2021 09:36:36 -0400
Subject: [PATCH 1/2] LUCENE-9916: add a simple regeneration help doc

---
 help/regeneration.txt | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 help/regeneration.txt

diff --git a/help/regeneration.txt b/help/regeneration.txt
new file mode 100644
index 000000000000..040fa04f49b3
--- /dev/null
+++ b/help/regeneration.txt
@@ -0,0 +1,23 @@
+Regeneration
+============
+
+Lucene makes use of some generated code (e.g. jflex tokenizers).
+
+Examples below assume cwd at the gradlew script in the top directory of
+the project's checkout.
+
+
+Generic regeneration commands
+------------------------------
+
+Regenerate code:
+
+gradlew regenerate tidy
+
+Force-regenerate code, even when it isn't necessary:
+
+gradlew --rerun-tasks regenerate tidy
+
+Force-regenerate code, except for one tokenizer which is extremely slow:
+
+gradlew --rerun-tasks regenerate tidy -x generateUAX29URLEmailTokenizer

From 484257f67901fd3507259d64e1f5f696a463ef9c Mon Sep 17 00:00:00 2001
From: Dawid Weiss
Date: Thu, 8 Apr 2021 21:32:35 +0200
Subject: [PATCH 2/2] Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it.
:)
---
 gradle/generation/regenerate.gradle |   6 +-
 gradle/help.gradle                  |   1 +
 help/regeneration.txt               | 147 +++++++++++++++++++++++++---
 3 files changed, 142 insertions(+), 12 deletions(-)

diff --git a/gradle/generation/regenerate.gradle b/gradle/generation/regenerate.gradle
index 2f7b8b517ea2..7b83bc1f8c9a 100644
--- a/gradle/generation/regenerate.gradle
+++ b/gradle/generation/regenerate.gradle
@@ -119,7 +119,7 @@ configure([
         expected = expected - same
 
         throw new GradleException("Checksums mismatch for derived resources; you might have" +
-          " modified a generated source file?:\n" +
+          " modified a generated resource (regenerate task: ${sourceTask.path}IfChanged):\n" +
           "Actual:\n ${actual.entrySet().join('\n ')}\n\n" +
           "Expected:\n ${expected.entrySet().join('\n ')}"
         )
@@ -172,6 +172,10 @@ configure([
     project.afterEvaluate {
       conditionalTask.group sourceTask.group
       conditionalTask.description sourceTask.description + " (if sources changed)"
+
+      // Hide low-level tasks from help.
+      sourceTask.group = null
+      sourceTask.description sourceTask.description + " (low-level)"
     }
 
     // Set conditional execution only if checksum mismatch occurred.
diff --git a/gradle/help.gradle b/gradle/help.gradle
index eab6a429a7a0..fdc7b03ad9d2 100644
--- a/gradle/help.gradle
+++ b/gradle/help.gradle
@@ -26,6 +26,7 @@ configure(rootProject) {
       ["Deps", "help/dependencies.txt", "Declaring, inspecting and excluding dependencies."],
       ["ForbiddenApis", "help/forbiddenApis.txt", "How to add/apply rules for forbidden APIs."],
       ["LocalSettings", "help/localSettings.txt", "Local settings, overrides and build performance tweaks."],
+      ["Regeneration", "help/regeneration.txt", "How to refresh generated and derived resources."],
       ["Git", "help/git.txt", "Git assistance and guides."],
       ["IDEs", "help/IDEs.txt", "IDE support."]
   ]
diff --git a/help/regeneration.txt b/help/regeneration.txt
index 040fa04f49b3..a9cd1700b0c8 100644
--- a/help/regeneration.txt
+++ b/help/regeneration.txt
@@ -1,23 +1,148 @@
 Regeneration
 ============
 
-Lucene makes use of some generated code (e.g. jflex tokenizers).
+Lucene has a number of machine-generated resources - some of these are
+resource (binary) files, others are Java source files that are stored
+(and compiled) with the rest of Lucene source code.
 
-Examples below assume cwd at the gradlew script in the top directory of
-the project's checkout.
+If you're reading this, chances are that:
 
+1) you've hit a precommit check error that said you've modified a generated
+   resource and some checksums are out of sync.
 
-Generic regeneration commands
-------------------------------
+2) you need to regenerate one (or more) of these resources.
 
-Regenerate code:
+In many cases hitting (1) means you'll have to do (2) so let's discuss
+these in order.
 
-gradlew regenerate tidy
 
-Force-regenerate code, even when it isn't necessary:
+Checksum validation errors
+--------------------------
 
-gradlew --rerun-tasks regenerate tidy
+LUCENE-9868 introduced a system of storing (and validating) checksums of
+generated files so that they are not accidentally modified.
+This checksum system will fail the build with a message similar to this one:
 
-Force-regenerate code, except for one tokenizer which is extremely slow:
+Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
+> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
+  Actual:
+    lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326
 
-gradlew --rerun-tasks regenerate tidy -x generateUAX29URLEmailTokenizer
+  Expected:
+    lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8
+
+The message shows which resources have checksum mismatches (in this case
+StandardTokenizerImpl.java), as well as the *module* where the generated
+resource lives and the *task name* that should be used to regenerate it:
+
+:lucene:core:generateStandardTokenizerIfChanged
+
+To resolve the problem, try to:
+
+1) "git diff" the changes that caused the build failure (to see why the checksums
+changed) and then decide whether to update the generated resource's template (or whatever
+is used to emit the generated resource);
+
+2) regenerate the derived resources, possibly saving new checksums. If you decide to
+regenerate, just run the task hinted at in the error message, for example:
+
+gradlew :lucene:core:generateStandardTokenizerIfChanged
+
+This regenerates all resources the task "generateStandardTokenizer" produces
+and updates the corresponding checksums.
+
+
+Resource regeneration
+---------------------
+
+The "convention" task for regenerating all derived resources in a given
+module is called "regenerate" and you can apply it to all Lucene modules
+by running:
+
+gradlew regenerate
+
+It is usually much wiser, though, to limit the scope of regeneration to
+only the module you're working with:
+
+gradlew -p lucene/analysis/common regenerate
+
+If you're interested in what specific generation tasks are available, see
+the task list for the generation group:
+
+gradlew tasks --group generation
+
+or limit the output to a particular module:
+
+gradlew -p lucene/analysis/common tasks --group generation
+
+which displays (at the time of writing):
+
+generateClassicTokenizerIfChanged - Regenerate ClassicTokenizerImpl.java (if sources changed)
+generateHTMLStripCharFilterIfChanged - Regenerate HTMLStripCharFilter.java (if sources changed)
+generateTldsIfChanged - Regenerate top-level domain jflex macros and tests (if sources changed)
+generateUAX29URLEmailTokenizerIfChanged - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
+generateWikipediaTokenizerIfChanged - Regenerate WikipediaTokenizerImpl.java (if sources changed)
+regenerate - Rerun any code or static data generation tasks.
+snowball - Regenerates snowball stemmers.
+
+You may wonder what all those *IfChanged tasks are...
+
+
+Resource checksums, incremental generation and advanced topics
+--------------------------------------------------------------
+
+Many resource generation tasks require specific tools (perl, python, bash shell)
+and resources that may not be available on all platforms. In LUCENE-9868 we tried
+to make resource generation tasks "incremental" so that they only run if their
+sources (or outputs) have changed.
+So if you run the generic "regenerate" task, many of the actual regeneration sub-tasks will
+be "skipped" - you can see this if you run gradle with plain console, for example:
+
+gradlew -p lucene/analysis/common regenerate --console=plain
+
+...
+> Task :lucene:analysis:common:generateUnicodePropsIfChanged
+Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodeProps
+...
+
+This shouldn't worry you at all. The "*IfChanged" tasks wrap the actual generation
+tasks and verify whether the inputs and outputs of a task have changed. If so, the task
+is run (and follow-up tasks such as tidy are scheduled). If the checksums are identical to
+what was previously saved, the regeneration task is skipped.
+
+Of course, sometimes you may want to *force* the regeneration task to run, even if the
+checksums indicate nothing has changed. You may want this for several reasons:
+
+- the generation task has outputs but no inputs or the inputs are volatile. In this case
+only the outputs have checksums and the task will be skipped if the outputs haven't changed.
+
+- you may want to run the regeneration task just to see that it actually runs and produces
+the same checksums (git diff should be clean). This would be a wise periodic sanity check
+to ensure everything works as expected.
+
+If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:
+
+gradlew regenerate --rerun-tasks
+
+Scoping the call to a particular module will also work:
+
+gradlew -p lucene/analysis/common regenerate --rerun-tasks
+
+Scoping the call to a particular task will also work:
+
+gradlew -p lucene/analysis/common generateUnicodePropsIfChanged --rerun-tasks
+
+You *should not* call the underlying generation task directly; this is possible
+but discouraged:
+
+gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks
+
+The reason is that some of these generation tasks require follow-up (for example
+source code tidying) and, more importantly, the checksums for these
+regenerated resources won't be saved (so the next time you run 'check' it'll fail
+with checksum mismatches).
+
+Finally, if you do feel like force-regenerating everything, remember to exclude this
+monster...
+
+gradlew regenerate -x generateUAX29URLEmailTokenizer --rerun-tasks
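
Illustration (not part of the patch): the "*IfChanged" behaviour described in the help text can be sketched roughly as below. This is a minimal Python sketch, not Lucene's implementation, which lives in gradle/generation/regenerate.gradle and is written in Groovy. The function names, the JSON checksum file and the use of SHA-1 are all assumptions made for the example (the 40-character digests in the error message are merely consistent with SHA-1).

```python
import hashlib
import json
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def current_checksums(files):
    """Map each existing file (as a string path) to its checksum."""
    return {str(p): sha1_of(p) for p in sorted(files) if p.exists()}


def run_if_changed(inputs, outputs, checksum_file, generate):
    """Hypothetical stand-in for a "*IfChanged" wrapper task.

    Runs `generate` only when the saved checksums no longer match the
    current inputs and outputs. Returns True if the generator ran and
    False if it was skipped.
    """
    saved = json.loads(checksum_file.read_text()) if checksum_file.exists() else None
    if saved is not None and saved == current_checksums(inputs + outputs):
        return False  # checksums consistent with sources, skip the task

    generate()
    # Save checksums only after the generator has rewritten the outputs.
    checksum_file.write_text(json.dumps(current_checksums(inputs + outputs), indent=2))
    return True
```

In this sketch, calling generate() directly corresponds to running the low-level task: the outputs change but no checksums are saved, so a later consistency check would report a mismatch. That is the reason the help text recommends the wrapper tasks.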