LUCENE-9651: Make benchmarks run again, correct javadocs #71

Merged
mikemccand merged 1 commit into apache:main from dweiss:LUCENE-9651 on May 14, 2021

dweiss (Contributor) commented Apr 7, 2021

No description provided.

dweiss (Contributor, Author) commented Apr 8, 2021

Thanks Robert. I'll go through these benchmark files and correct them so that they work. It is a bit worrying that nobody noticed they're broken. :) Anybody using these at all?

rmuir (Member) commented Apr 8, 2021

> Thanks Robert. I'll go through these benchmark files and correct them so that they work. It is a bit worrying that nobody noticed they're broken. :) Anybody using these at all?

I've not used this benchmark mechanism for any performance benchmarking. It seems most performance benchmarking by contributors/committers uses https://github.com/mikemccand/luceneutil, or ad-hoc benchmarks.

Personally, I do use this benchmarking package, but via QualityRun's main method, to measure relevance. I always write my own parser (because every TREC-like dataset differs oh-so-slightly and the generic TREC parser we supply never works), and I use it in a minimal way: generate submission.txt, then run trec_eval etc. from the command line myself.
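For readers unfamiliar with the submission.txt step, here is a minimal stdlib-only sketch of writing results in the six-column layout that trec_eval reads (query id, the literal `Q0`, document id, rank, score, run tag). It is not the code QualityRun uses; the class name `TrecSubmission`, its methods, the 1-based rank, and the sample ids are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene's benchmark module.
public class TrecSubmission {

    /** Formats one result line: qid Q0 docno rank score runtag. */
    public static String formatLine(String queryId, String docId, int rank,
                                    double score, String runTag) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s",
                queryId, docId, rank, score, runTag);
    }

    /** Formats a ranked list (best hit first) for a single query; ranks start at 1 (an assumption). */
    public static List<String> formatQuery(String queryId, List<String> rankedDocIds,
                                           List<Double> scores, String runTag) {
        if (rankedDocIds.size() != scores.size()) {
            throw new IllegalArgumentException("doc ids and scores must have the same length");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rankedDocIds.size(); i++) {
            lines.add(formatLine(queryId, rankedDocIds.get(i), i + 1, scores.get(i), runTag));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up query and document ids, purely for demonstration.
        List<String> lines = formatQuery("301", List.of("DOC-A", "DOC-B"),
                List.of(12.5, 9.25), "myrun");
        lines.forEach(System.out::println);
    }
}
```

The lines produced this way can be written to submission.txt and passed to trec_eval together with the dataset's relevance judgments file.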

The reason it isn't used might be the dataset: I'm unfamiliar with this Reuters dataset, and maybe it's not big enough for useful benchmarks? I think in general people tend to use these datasets more often for performance benchmarks, often ad hoc:

  • wikipedia english
  • geonames
  • apache httpd logs
  • NYC Taxis
  • OpenStreetMap

Or maybe it's just because perf issues are usually complicated? For example, to reproduce LUCENE-9827 I downloaded geonames and wrote a simple standalone .java Indexer (attached to the issue) that essentially changes IW's config (flush every doc, SerialMergeScheduler, LZ4 and DEFLATE codec compression) to keep it simple, measuring with only a single thread. It ran so slowly that I had to limit the number of docs to the first N as well.

mikemccand (Member) left a comment

Thank you for fixing this @dweiss! Alas these benchmarks indeed do not get much love/attention.

janhoy pushed a commit to cominvent/lucene that referenced this pull request May 12, 2021
… updates are distributed (apache#71)

Fixes PerReplicaStatesIntegrationTest.testRestart()