Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system #582

Merged
merged 49 commits into from
Jan 5, 2022

Conversation

uschindler
Copy link
Contributor

@uschindler uschindler self-assigned this Jan 4, 2022
@uschindler uschindler marked this pull request as draft January 4, 2022 14:19
@rmuir
Copy link
Member

rmuir commented Jan 4, 2022

This is gonna be fun :) Thanks for starting this! It is almost guaranteed to find some bugs. Previously only analyzers-common got this level of testing. I will help wrestle these tests some this evening.

@dweiss
Copy link
Contributor

dweiss commented Jan 4, 2022

@uschindler - can you take a look at #571? Or we can merge that branch here too. I've converted non-distribution .tests projects there to utilize the test framework and generally improved the handling of modular paths (even though the module patching defeated me).

@uschindler
Copy link
Contributor Author

@uschindler - can you take a look at #571? Or we can merge that branch here too. I've converted non-distribution .tests projects there to utilize the test framework and generally improved the handling of modular paths (even though the module patching defeated me).

I will check later, I am a bit busy with argument providers now.

Copy link
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, looks great.

@rmuir
Copy link
Member

rmuir commented Jan 5, 2022

@rmuir I found a failure:

In both these failures I see the JapaneseIterationMarkCharFilter. Maybe shitlist this temporarily until we have time to debug why it messes up offsets?

@uschindler
Copy link
Contributor Author

The name does not matter, only the suffix .tests matters.

It kind of matters, I just think we should do this consistently. If you really like integration.tests name, that is fine, but please also fix the ones in core and distribution to match this scheme?

There's a bit of difference: There is no "analysis" module. The other modules like core have core.tests (i dont like the name, but Dawid has chosen it).

Sure, I can move the classes around no issue. But then analysis.test would be top-level next to the analysis subdirectory. If this is fine to you, I will do.

@uschindler
Copy link
Contributor Author

@rmuir I found a failure:

In both these failures I see the JapaneseIterationMarkCharFilter. Maybe shitlist this temporarily until we have time to debug why it messes up offsets?

I will try to shitlist that one and run another beaster round. If all is fine, I'd like to commit and merge this soon.

@rmuir
Copy link
Member

rmuir commented Jan 5, 2022

Sure, I can move the classes around no issue. But then analysis.test would be top-level next to the analysis subdirectory. If this is fine to you, I will do.

ok, this makes sense to me, if its easy to make it work.

@uschindler
Copy link
Contributor Author

uschindler commented Jan 5, 2022

Another failure:

  2> stage 0: e<[2-3] +1> ek<[4-6] +1> oy<[8-10] +1> 1<[11-12] +1> zzkuxp<[13-19] +1>
  2> stage 1: e<[2-3] +1> ek<[4-6] +1> oy<[8-10] +1> 1<[11-12] +1> zzkuxp<[13-19] +1>
  2> stage 2: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> oy<[8-10] +1> oy 1<[8-12] +0> 1<[11-12] +1> 1 zzkuxp<[11-19] +0>
  2> stage 3: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> oy<[8-10] +1> oy 1<[8-12] +0> 1<[11-12] +1> 1 zzkuxp<[11-19] +0>
  2> last stage: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> oy<[8-10] +1> oy 1<[8-12] +0> 1 zzkuxp<[11-19] +0>
  2> TEST FAIL: useCharFilter=false text='?.e|ek|]oy{1 zzkuxp ZyzzV ycuqjnv axtpppvk \u233b\u23c8\u2314\u232e\u236e\u238d\u235e x d  \"</p>'
  2> Exception from random analyzer:
  2> charfilters=
  2>   org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, ifywufhi, java.io.StringReader@48586999)
  2>   org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@65036838, org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@11d4ba35)
  2> tokenizer=
  2>   org.apache.lucene.analysis.ko.KoreanTokenizer()
  2> filters=
  2>   org.apache.lucene.analysis.en.KStemFilter(ValidatingTokenFilter@595d7938 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false)
  2>   org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@13d08b48 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false, u)
  2>   org.apache.lucene.analysis.util.ElisionFilter(ValidatingTokenFilter@6396b917 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false, [fh, hiiwwxyyd, fcpodqor, qogvhmywr, l, icad])
  2>   Conditional:org.apache.lucene.analysis.ko.KoreanNumberFilter(OneTimeWrapper@5f0558f6 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false)
   >     java.lang.IllegalStateException: last stage: inconsistent startOffset at pos=2: 8 vs 11; token=1 zzkuxp

gradlew :lucene:analysis:integration.tests:test --tests "org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings" -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=E4552C7844FC2DA3 -Ptests.file.encoding=US-ASCII

@dweiss
Copy link
Contributor

dweiss commented Jan 5, 2022

The name does not matter, only the suffix .tests matters.

The "*.tests" convention for project naming is primarily used to exclude these subprojects from being published as Maven artifacts and in various other places in the gradle build code. The prefix does not matter. I think it would be nice to (softly) encourage people to write XYZ.tests projects containing fully modular tests for module XYZ - there are two advantages to this I can think of. For example, a separate source set within the project (src/test-modular...) would complicate the build, add special exclusion rules to tasks and in general be less intuitive. It is an arbitrary choice though - if you can think of a better naming/ solution then it can be certainly changed.

@rmuir
Copy link
Member

rmuir commented Jan 5, 2022

I successfully got 100 runs of this test to pass:

./gradlew :lucene:analysis:integration.tests:beast -Dtests.dups=100 --tests TestRandomChains -Dtests.nightly=true

I think let's leave remaining beasting for jenkins and main branch? This test may find corner case bugs forever if we run it enough. And better to continue work on strengthening the test (e.g. NULL injection).

@uschindler
Copy link
Contributor Author

It is similar now for me. I will now move the integration package around and then wait for another round.

Otherwise I think it is ready.

@uschindler
Copy link
Contributor Author

uschindler commented Jan 5, 2022

I wanted to originally backport this to 9.x. I am just afraid that the two branches get too different (there are many changes in this PR).

@rmuir what do you think? Maybe on 9.x add the filter on "common" and "core"?

@uschindler
Copy link
Contributor Author

I moved the project now. If you git pull, clean all before, otherwise it will find leftover files on precommit

@rmuir
Copy link
Member

rmuir commented Jan 5, 2022

if we can backport it, it is ideal. What is the harm? It is just tests.

@uschindler
Copy link
Contributor Author

if we can backport it, it is ideal. What is the harm? It is just tests.

and it is running more often! :-)

Copy link
Member

@rmuir rmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work here! This will really help keep everything in order

@uschindler uschindler merged commit 475fbd0 into apache:main Jan 5, 2022
@uschindler uschindler deleted the draft/random-chains branch January 5, 2022 14:35
asfgit pushed a commit that referenced this pull request Jan 5, 2022
…ins to a global integration test and discover classes to check from module system (#582)

Co-authored-by: Robert Muir <rmuir@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants