
Choose correct similarity fns during benchmark runs & re-run benchmarks #773

Merged
merged 19 commits into master from rebenchmark on Feb 3, 2021

Conversation

brandenchan (Contributor) commented Jan 26, 2021

This PR ensures that the right similarity functions are chosen during benchmarking (i.e. dot_product for DPR and cosine for ES retrieval). This solves #653. This PR also includes a rerun of the retriever query benchmarks.
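
As a rough illustration of what this means in the benchmark setup (not the exact diff of this PR), the similarity function is picked based on the retriever; the similarity keyword argument matches what Haystack's document stores expose, but the helper below is just for this sketch:

# Illustrative sketch only: pick the similarity function that matches the retriever.
# DPR embeddings are trained for dot product, while the Elasticsearch embedding
# retrieval in these benchmarks uses cosine.
def choose_similarity(retriever_type: str) -> str:
    return "dot_product" if retriever_type == "dpr" else "cosine"

# e.g. at benchmark setup time (document store classes and their similarity
# argument as in Haystack; the exact wiring in the benchmark scripts may differ):
# doc_store = ElasticsearchDocumentStore(similarity=choose_similarity("elastic"))
# doc_store = FAISSDocumentStore(similarity=choose_similarity("dpr"))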

TODO:

  • 0.84 instead of 84 bug
  • Run benchmarks on 0.7.0 @tholor
  • Specify cosine / dot product on website
  • Update benchmarks page on website
  • Post results to social media

brandenchan self-assigned this on Jan 26, 2021

brandenchan (Contributor, Author) commented:

Reran the retriever query benchmarks and compared them to the numbers reported for Haystack 0.7.0. MAP numbers look a bit better than before across all settings (perhaps because of the filtering of misaligned samples, #774).

DPR + FAISS HNSW speed has improved significantly

# old
n_docs - queries per second
500k - 3.3168707580865915
100k - 12.84692505158515
10k -   31.34417509568776

# new
n_docs - queries per second
500k - 35.5682201186989
100k - 37.19018429167806
10k   - 39.10424804647853

However, ES + BM25 speed has dropped.

# new
n_docs - queries per second
500k - 64.52084335269566
100k - 96.5938217421237
10k   - 121.87842069254714


# old 
n_docs - queries per second
500k - 91.38510941614904
100k - 162.59167924109505
10k   - 248.9647289083211

Note that ES + DPR speed is about the same as before.

# new
n_docs - queries per second
500k - 1.4722693734451493
100k - 6.123145993917852
10k   - 21.642577719925317

# old
n_docs - queries per second
500k - 1.45036114184423
100k - 6.234155953220104
10k   - 24.796429587106445

@tholor Is this expected? Any ideas for what accounts for the drop in speed for BM25?

tholor (Member) commented Jan 27, 2021

DPR + FAISS HNSW speed has improved significantly

Nice, this could be related to @lalitpagaria's recent changes to the SQL queries 🤔

@tholor Is this expected? Any ideas for what accounts for the drop in speed for BM25?

No, nothing obvious comes to mind. @tanaysoni, any idea?

lalitpagaria (Contributor) commented Jan 28, 2021

Wow! A 10x increase at the larger document counts. At this rate it will surpass ES performance once there are more than 1M documents.

I think the changes made by @tanaysoni and me are what improved performance.

How about having benchmarks for the SQL document store as well (e.g. with SQLite, MySQL, and Postgres)?

Regarding ES performance: how are we running ES? As a single node or in cluster mode? How much heap is assigned to it? Is swapping enabled or disabled?
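
For checking this, something like the following could be used (a sketch assuming a local instance on localhost:9200; the endpoints are standard Elasticsearch APIs):

# Sketch: inspect the Elasticsearch setup used for benchmarking.
# Assumes a local instance on localhost:9200.
import requests

# node count, roles, and heap size
print(requests.get("http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max").text)

# whether memory locking (swap prevention via bootstrap.memory_lock) is active
print(requests.get("http://localhost:9200/_nodes?filter_path=**.mlockall").text)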

Just a suggestion: could we also use Cassandra here? ScyllaDB, a very high-performing open-source Cassandra-compatible store written in C++, could be used.

lalitpagaria mentioned this pull request on Jan 28, 2021
tholor changed the title from "Choose correct similarity fns during benchmark runs" to "Choose correct similarity fns during benchmark runs & re-run benchmarks" on Feb 1, 2021
tholor (Member) commented Feb 1, 2021

Just reran all benchmarks on the EC2 image that we previously used + Elasticsearch 7.9.2.
From what I can see, there is a slight drop in BM25 speed when moving from 7.9.2 to 7.10. The remaining difference from your runs is still unclear to me, @brandenchan. Hypotheses: i) differences in the EC2 image (e.g. Ubuntu 20 vs 18, CUDA version, versions of other dependencies, ...) or ii) warmup / caching in Elasticsearch (it's possible there are effects between subsequent runs that cause different numbers, e.g. if you run 500k in isolation vs. after the 100k runs 🤔).
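
One way to probe hypothesis ii) would be to clear Elasticsearch's caches (or fire a fixed warmup query set) between configurations; a sketch assuming the official elasticsearch Python client and an index named "document" (the index name is illustrative):

# Sketch: start each benchmark run from a cold cache so warmup effects don't
# leak between the 10k / 100k / 500k configurations.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local single-node instance

# drop query, request, and fielddata caches for the benchmark index
es.indices.clear_cache(index="document")  # index name is illustrative

# alternatively: run a fixed set of warmup queries before timing, so every
# configuration is measured in the same warmed-up state.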

tholor (Member) commented Feb 1, 2021

@brandenchan if the numbers are looking good to you, can you please update the markdown files for the benchmarks on the website so that we have the correct numbers for each version in the dropdown?

brandenchan (Contributor, Author) commented:

I had a look through the newly reported numbers and compared them to the benchmark for 0.6.0 (0.7.0 is the same).

Retriever query speed
=====================
DPR/Elastic      about 30% slower
BM25/Elastic     significant negative divergence at 10k, 10% divergence for the rest
DPR/FAISS flat   generally faster
DPR/FAISS HNSW   scales significantly better

Retriever Performance
=====================
DPR/Elastic      index speed 5% slower, query speed 30% slower, MAP 4.5 percentage points better
BM25/Elastic     no significant change
DPR/FAISS flat   index speed 10% slower, query speed 50% faster, MAP about the same
DPR/FAISS HNSW   index speed about 10% slower, query speed 200% faster, MAP about the same

Retriever MAP
=============
DPR/Elastic      improvement across the board
BM25/Elastic     no significant change
DPR/FAISS flat   no significant change
DPR/FAISS HNSW   no significant change

Reader speed
============
All models between 5% and 50% faster
Very slight degradation in F1

The speed of DPR/Elastic seems to have dropped. One factor could be the switch from cosine to dot product similarity; it may be that Elasticsearch's dot product implementation is slower. On the flip side, its retrieval performance (MAP) is noticeably better.
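
For context (a toy illustration, not Haystack code): cosine similarity is just a dot product over L2-normalized vectors, so switching the similarity function changes both the scores and, potentially, the per-query work Elasticsearch has to do:

# Toy example of the two similarity functions discussed above.
import numpy as np

q = np.array([0.2, 1.5, -0.3])   # query embedding (illustrative values)
d = np.array([1.0, 0.5, 0.1])    # document embedding (illustrative values)

dot = float(q @ d)
cosine = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

print(f"dot product: {dot:.3f}, cosine: {cosine:.3f}")
# Rankings can differ whenever embedding norms vary, which is why DPR
# (trained with dot product) should be benchmarked with dot product.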

Reader speed improvements may be due to the implementation of fast tokenizers.

Not totally sure where the ES/BM25 speed instability is coming from. It could be variability between runs, or the building up and tearing down of document stores may be interfering with the speed measurements.

brandenchan (Contributor, Author) commented:

In the future, we should do more to control the environment in which we run benchmarks (see the sketch after this list). We should:

  • always run using pip install -e . --upgrade
  • pin the versions of PyTorch / Transformers
  • pin Elasticsearch to 7.9.2
  • ensure we're using an Ubuntu 18 image on EC2
  • run benchmarks from a Docker container
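
A rough sketch of how the environment could be recorded alongside each run, complementing the version pinning above (library names are just the ones mentioned in this thread; file and field names are illustrative):

# Sketch: store the environment next to each benchmark result so runs stay comparable.
import json
import platform

import torch
import transformers

environment = {
    "python": platform.python_version(),
    "platform": platform.platform(),
    "torch": torch.__version__,
    "cuda": torch.version.cuda,          # None on CPU-only builds
    "transformers": transformers.__version__,
}

with open("benchmark_environment.json", "w") as f:
    json.dump(environment, f, indent=2)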

brandenchan merged commit f3a3b73 into master on Feb 3, 2021
brandenchan deleted the rebenchmark branch on February 3, 2021 10:45