Add latest benchmark run #652

Merged
tholor merged 3 commits into master from new_benchmark_run on Dec 10, 2020

tholor
Member

@tholor tholor commented Dec 3, 2020

Add results from the latest full benchmark run

Reader

|   | EM | f1 | top_n_accuracy | top_n | reader_time | seconds_per_query | passages_per_second | reader | error |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.783668 | 0.826298 | 0.974296 | 5 | 124.916 | 0.0105272 | 98.8664 | deepset/roberta-base-squad2 | |
| 1 | 0.743974 | 0.789003 | 0.972021 | 5 | 67.8706 | 0.00571976 | 181.964 | deepset/minilm-uncased-squad2 | |
| 2 | 0.694758 | 0.743267 | 0.955756 | 5 | 116.457 | 0.00981437 | 106.047 | deepset/bert-base-cased-squad2 | |
| 3 | 0.790072 | 0.832949 | 0.976909 | 5 | 305.629 | 0.0257567 | 40.4085 | deepset/bert-large-uncased-whole-word-masking-squad2 | |
| 4 | 0.803472 | 0.846217 | 0.974212 | 5 | 305.064 | 0.0257091 | 40.4833 | deepset/xlm-roberta-large-squad2 | |
| 5 | 0.373083 | 0.423425 | 0.953902 | 5 | 76.9868 | 0.00648802 | 160.417 | distilbert-base-uncased-distilled-squad | |

Retriever Indexing

|   | retriever | doc_store | n_docs | indexing_time | docs_per_second | date_time | error |
|---|---|---|---|---|---|---|---|
| 1 | dpr | elasticsearch | 10000 | 135.805 | 73.6351 | 2020-12-02 06:51:48.587178 | |
| 5 | dpr | elasticsearch | 100000 | 1352.51 | 73.9364 | 2020-12-02 07:23:04.264694 | |
| 9 | dpr | elasticsearch | 500000 | 6781.02 | 73.7352 | 2020-12-02 10:10:42.147031 | |
| 0 | elastic | elasticsearch | 10000 | 20.6943 | 483.224 | 2020-12-02 06:49:00.317977 | |
| 4 | elastic | elasticsearch | 100000 | 206.471 | 484.329 | 2020-12-02 06:59:54.055199 | |
| 8 | elastic | elasticsearch | 500000 | 1032.15 | 484.427 | 2020-12-02 08:16:15.828533 | |
| 2 | dpr | faiss_flat | 10000 | 95.1017 | 105.151 | 2020-12-02 06:53:59.472952 | |
| 6 | dpr | faiss_flat | 100000 | 954.461 | 104.771 | 2020-12-02 07:39:56.194345 | |
| 10 | dpr | faiss_flat | 500000 | 4865.15 | 102.772 | 2020-12-02 11:34:34.726687 | |
| 3 | dpr | faiss_hnsw | 10000 | 103.255 | 96.8477 | 2020-12-02 06:56:14.230579 | |
| 7 | dpr | faiss_hnsw | 100000 | 1093.96 | 91.4109 | 2020-12-02 07:58:43.508489 | |
| 11 | dpr | faiss_hnsw | 500000 | 5784.85 | 86.4327 | 2020-12-02 13:11:43.328380 | |
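
For readers checking the numbers: docs_per_second appears to be n_docs divided by indexing_time (in seconds). A minimal Python sketch of that relation, using a few rows copied from the table above. The names `indexing_runs` and `docs_per_second` are illustrative assumptions, not taken from the benchmark scripts.

```python
# (n_docs, indexing_time in seconds) copied from the Retriever Indexing table.
# The dictionary layout and function name are illustrative, not from the benchmark code.
indexing_runs = {
    ("dpr", "elasticsearch"): [(10000, 135.805), (100000, 1352.51), (500000, 6781.02)],
    ("elastic", "elasticsearch"): [(10000, 20.6943), (100000, 206.471), (500000, 1032.15)],
    ("dpr", "faiss_hnsw"): [(10000, 103.255), (100000, 1093.96), (500000, 5784.85)],
}


def docs_per_second(n_docs, indexing_time):
    """Indexing throughput: documents indexed per second of wall-clock time."""
    return n_docs / indexing_time


for (retriever, doc_store), rows in indexing_runs.items():
    for n_docs, indexing_time in rows:
        rate = docs_per_second(n_docs, indexing_time)
        print(f"{retriever:8} {doc_store:14} {n_docs:>7} docs: {rate:8.2f} docs/s")
```
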

Retriever Querying

|   | retriever | doc_store | n_docs | n_queries | retrieve_time | queries_per_second | seconds_per_query | recall | map | top_k | date_time | error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | dpr | elasticsearch | 10000 | 5791 | 233.542 | 24.7964 | 0.0403284 | 0.96909 | 0.880845 | 10 | 2020-12-02 13:18:27.808539 | |
| 5 | dpr | elasticsearch | 100000 | 5791 | 928.915 | 6.23416 | 0.160407 | 0.939734 | 0.821224 | 10 | 2020-12-02 13:53:44.689757 | |
| 9 | dpr | elasticsearch | 500000 | 5791 | 3992.8 | 1.45036 | 0.689483 | 0.891901 | 0.730208 | 10 | 2020-12-02 17:35:25.795083 | |
| 0 | elastic | elasticsearch | 10000 | 5791 | 23.2603 | 248.965 | 0.00401663 | 0.810395 | 0.660997 | 10 | 2020-12-02 13:13:03.957613 | |
| 4 | elastic | elasticsearch | 100000 | 5791 | 35.6168 | 162.592 | 0.00615038 | 0.716802 | 0.559593 | 10 | 2020-12-02 13:33:30.417021 | |
| 8 | elastic | elasticsearch | 500000 | 5791 | 63.3692 | 91.3851 | 0.0109427 | 0.623899 | 0.452459 | 10 | 2020-12-02 16:08:13.070376 | |
| 2 | dpr | faiss_flat | 10000 | 5791 | 257.674 | 22.4742 | 0.0444955 | 0.974616 | 0.897899 | 10 | 2020-12-02 13:23:51.002905 | |
| 6 | dpr | faiss_flat | 100000 | 5791 | 1182.71 | 4.89638 | 0.204233 | 0.95752 | 0.863012 | 10 | 2020-12-02 14:18:14.837806 | |
| 3 | dpr | faiss_hnsw | 10000 | 5791 | 184.755 | 31.3442 | 0.0319039 | 0.972198 | 0.896188 | 10 | 2020-12-02 13:28:33.415220 | |
| 7 | dpr | faiss_hnsw | 100000 | 5791 | 450.769 | 12.8469 | 0.0778396 | 0.939907 | 0.848688 | 10 | 2020-12-02 15:10:44.114148 | |
| 8 | dpr | faiss_flat | 500000 | 5791 | 5365.81 | 1.07924 | 0.926577 | 0.929546 | 0.804583 | 10 | 2020-12-02 23:14:44.503864 | |
| 9 | dpr | faiss_hnsw | 500000 | 5791 | 1745.92 | 3.31687 | 0.301489 | 0.882058 | 0.765677 | 10 | 2020-12-03 00:18:53.376265 | |
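
To make the scaling behaviour easier to read off, here is a small Python sketch that compares per-query latency at 500,000 documents with latency at 10,000 documents for each setup. The seconds_per_query values are copied from the table above; the names `latency` and `slowdown` are illustrative assumptions and not part of the benchmark scripts.

```python
# seconds_per_query at 10k and 500k docs, copied from the Retriever Querying table.
# Keys are (retriever, doc_store); the structure is illustrative only.
latency = {
    ("dpr", "elasticsearch"): {10000: 0.0403284, 500000: 0.689483},
    ("elastic", "elasticsearch"): {10000: 0.00401663, 500000: 0.0109427},
    ("dpr", "faiss_flat"): {10000: 0.0444955, 500000: 0.926577},
    ("dpr", "faiss_hnsw"): {10000: 0.0319039, 500000: 0.301489},
}

# How many times slower a query gets when the index grows 50x (10k -> 500k docs).
slowdown = {
    setup: times[500000] / times[10000]
    for setup, times in latency.items()
}

for (retriever, doc_store), factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{retriever:8} {doc_store:14} {factor:5.1f}x slower at 500k docs")

# queries_per_second in the table is the reciprocal of seconds_per_query.
qps_dpr_es_10k = 1 / latency[("dpr", "elasticsearch")][10000]
```

On these numbers, the BM25-style `elastic` retriever degrades the least as the index grows, and `faiss_hnsw` degrades less than `faiss_flat`.
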

@tholor
Member Author

tholor commented Dec 3, 2020

Noticeable changes:

  • Reader speed improved significantly (e.g. MiniLM from 98 passages/sec to 181)
  • Reader accuracy improved a bit (e.g. RoBERTa)
  • mAP of DPR with ElasticsearchDocumentStore decreased and is no longer in line with FAISS flat. (Not sure why 🤔. Possibly due to the new Elasticsearch version 7.9.0. We need to investigate this in a separate issue.)
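
To put a number on the mAP discrepancy, a quick Python sketch using the `map` values for the dpr retriever from the Retriever Querying table. The variable and function names are illustrative assumptions; the values are copied from this run only and say nothing about the cause.

```python
# mAP for the dpr retriever (top_k = 10), copied from the Retriever Querying table.
map_elasticsearch = {10000: 0.880845, 100000: 0.821224, 500000: 0.730208}
map_faiss_flat = {10000: 0.897899, 100000: 0.863012, 500000: 0.804583}


def map_gap(n_docs):
    """Absolute mAP difference: FAISS flat minus Elasticsearch, same n_docs."""
    return map_faiss_flat[n_docs] - map_elasticsearch[n_docs]


for n_docs in sorted(map_elasticsearch):
    print(f"{n_docs:>7} docs: FAISS flat is ahead by {map_gap(n_docs):.4f} mAP")
```

In this run the gap widens as the index grows, from under 0.02 at 10,000 documents to over 0.07 at 500,000.
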

@lalitpagaria
Contributor

Bit curious about this odd number 5791 for n_queries 🤔

A few minor things I observed:

  • retriever benchmark tables are not sorted according to serial number unlike reader's benchmark
  • map should be printed as mAP
  • reader's benchmark does not have date_time
  • It would be good to include the host machine specification where the benchmark is performed
  • Not sure if relevant: How about including (avg, peak, 99th percentile) CPU and memory consumption during each step?
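
As a rough illustration of the last point, the (avg, peak, 99th percentile) summary could be computed from periodically sampled readings along these lines. This is only a sketch: `summarize_usage` is a hypothetical helper, the sampling itself is assumed to happen elsewhere, and the percentile uses the simple nearest-rank definition.

```python
from statistics import mean


def summarize_usage(samples):
    """Summarize sampled readings (e.g. CPU % or memory in MB) for one benchmark step.

    Returns the average, the peak, and the 99th percentile (nearest-rank method).
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed with integer arithmetic.
    rank = max(1, (99 * len(ordered) + 99) // 100)
    return {
        "avg": mean(ordered),
        "peak": ordered[-1],
        "p99": ordered[rank - 1],
    }


# Hypothetical readings: 1..100, e.g. memory in MB sampled during one step.
example = summarize_usage(list(range(1, 101)))
```
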

@tholor
Member Author

tholor commented Dec 5, 2020

> Bit curious about this odd number 5791 for n_queries 🤔

We run this on the natural questions eval dataset, which includes 5791 questions.

> It would be good to include the host machine specification where the benchmark is performed

At this point, it's always a p3.2xlarge with a V100 GPU, and we list those details on https://haystack.deepset.ai/bm/benchmarks/

> map should be printed as mAP
> reader's benchmark does not have date_time
> retriever benchmark tables are not sorted according to serial number unlike reader's benchmark

Seems like very minor things to me, but feel free to fix it :)

> Not sure if relevant: How about including (avg, peak, 99th percentile) CPU and memory consumption during each step?

Not a priority for now, but I could see that becoming relevant at a later point in time...

@brandenchan
Contributor

The scales of the accuracy and speed measures are very different by default. #675 should change this so we don't have to edit the data manually.

Contributor

@brandenchan brandenchan left a comment

I manually checked this on the staging website environment and it looks good to me

@tholor tholor merged commit 149d98a into master Dec 10, 2020
@tholor tholor deleted the new_benchmark_run branch December 10, 2020 15:25