Add latest benchmark run #652

tholor · 2020-12-03T06:29:50Z

Add results from the latest full benchmark run

Reader

	EM	f1	top_n_accuracy	top_n	reader_time	seconds_per_query	passages_per_second	reader
0	0.783668	0.826298	0.974296	5	124.916	0.0105272	98.8664	deepset/roberta-base-squad2
1	0.743974	0.789003	0.972021	5	67.8706	0.00571976	181.964	deepset/minilm-uncased-squad2
2	0.694758	0.743267	0.955756	5	116.457	0.00981437	106.047	deepset/bert-base-cased-squad2
3	0.790072	0.832949	0.976909	5	305.629	0.0257567	40.4085	deepset/bert-large-uncased-whole-word-masking-squad2
4	0.803472	0.846217	0.974212	5	305.064	0.0257091	40.4833	deepset/xlm-roberta-large-squad2
5	0.373083	0.423425	0.953902	5	76.9868	0.00648802	160.417	distilbert-base-uncased-distilled-squad
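As a quick consistency check on the Reader table, the derived columns can be used to back out the workload each run processed. This is a sketch that assumes `seconds_per_query = reader_time / n_queries` and `passages_per_second = n_passages / reader_time`; neither formula is stated in this PR. If all models were evaluated on the same set, every row should imply roughly the same numbers.

```python
# Rows copied from the Reader table:
# (model, reader_time [s], seconds_per_query, passages_per_second)
READER_ROWS = [
    ("deepset/roberta-base-squad2", 124.916, 0.0105272, 98.8664),
    ("deepset/minilm-uncased-squad2", 67.8706, 0.00571976, 181.964),
    ("deepset/bert-base-cased-squad2", 116.457, 0.00981437, 106.047),
    ("deepset/bert-large-uncased-whole-word-masking-squad2", 305.629, 0.0257567, 40.4085),
    ("deepset/xlm-roberta-large-squad2", 305.064, 0.0257091, 40.4833),
    ("distilbert-base-uncased-distilled-squad", 76.9868, 0.00648802, 160.417),
]


def implied_workload(reader_time, seconds_per_query, passages_per_second):
    """Back out how many queries and passages a run processed.

    Assumption: seconds_per_query = reader_time / n_queries and
    passages_per_second = n_passages / reader_time.
    """
    n_queries = reader_time / seconds_per_query
    n_passages = reader_time * passages_per_second
    return n_queries, n_passages


for model, t, spq, pps in READER_ROWS:
    q, p = implied_workload(t, spq, pps)
    print(f"{model}: ~{q:.0f} queries, ~{p:.0f} passages")
```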

Retriever Indexing

	retriever	doc_store	n_docs	indexing_time	docs_per_second	date_time
1	dpr	elasticsearch	10000	135.805	73.6351	2020-12-02 06:51:48.587178
5	dpr	elasticsearch	100000	1352.51	73.9364	2020-12-02 07:23:04.264694
9	dpr	elasticsearch	500000	6781.02	73.7352	2020-12-02 10:10:42.147031
0	elastic	elasticsearch	10000	20.6943	483.224	2020-12-02 06:49:00.317977
4	elastic	elasticsearch	100000	206.471	484.329	2020-12-02 06:59:54.055199
8	elastic	elasticsearch	500000	1032.15	484.427	2020-12-02 08:16:15.828533
2	dpr	faiss_flat	10000	95.1017	105.151	2020-12-02 06:53:59.472952
6	dpr	faiss_flat	100000	954.461	104.771	2020-12-02 07:39:56.194345
10	dpr	faiss_flat	500000	4865.15	102.772	2020-12-02 11:34:34.726687
3	dpr	faiss_hnsw	10000	103.255	96.8477	2020-12-02 06:56:14.230579
7	dpr	faiss_hnsw	100000	1093.96	91.4109	2020-12-02 07:58:43.508489
11	dpr	faiss_hnsw	500000	5784.85	86.4327	2020-12-02 13:11:43.328380
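The docs_per_second column in the Retriever Indexing table appears to be plain indexing throughput. A minimal sketch to verify that on a few rows, assuming `docs_per_second = n_docs / indexing_time` (the helper name is hypothetical, not part of Haystack):

```python
import math

# (retriever, doc_store, n_docs, indexing_time [s], docs_per_second) from the indexing table
INDEXING_ROWS = [
    ("dpr", "elasticsearch", 10000, 135.805, 73.6351),
    ("dpr", "elasticsearch", 500000, 6781.02, 73.7352),
    ("elastic", "elasticsearch", 10000, 20.6943, 483.224),
    ("elastic", "elasticsearch", 500000, 1032.15, 484.427),
    ("dpr", "faiss_flat", 500000, 4865.15, 102.772),
    ("dpr", "faiss_hnsw", 500000, 5784.85, 86.4327),
]


def docs_per_second(n_docs: int, indexing_time: float) -> float:
    """Indexing throughput, assuming it is simply n_docs / indexing_time."""
    return n_docs / indexing_time


for retriever, store, n_docs, t, reported in INDEXING_ROWS:
    computed = docs_per_second(n_docs, t)
    # Allow a small relative tolerance because the table values are rounded
    match = math.isclose(computed, reported, rel_tol=1e-4)
    print(f"{retriever}/{store} @ {n_docs}: {computed:.2f} docs/s (matches: {match})")
```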

Retriever Querying

	retriever	doc_store	n_docs	n_queries	retrieve_time	queries_per_second	seconds_per_query	recall	map	top_k	date_time
1	dpr	elasticsearch	10000	5791	233.542	24.7964	0.0403284	0.96909	0.880845	10	2020-12-02 13:18:27.808539
5	dpr	elasticsearch	100000	5791	928.915	6.23416	0.160407	0.939734	0.821224	10	2020-12-02 13:53:44.689757
9	dpr	elasticsearch	500000	5791	3992.8	1.45036	0.689483	0.891901	0.730208	10	2020-12-02 17:35:25.795083
0	elastic	elasticsearch	10000	5791	23.2603	248.965	0.00401663	0.810395	0.660997	10	2020-12-02 13:13:03.957613
4	elastic	elasticsearch	100000	5791	35.6168	162.592	0.00615038	0.716802	0.559593	10	2020-12-02 13:33:30.417021
8	elastic	elasticsearch	500000	5791	63.3692	91.3851	0.0109427	0.623899	0.452459	10	2020-12-02 16:08:13.070376
2	dpr	faiss_flat	10000	5791	257.674	22.4742	0.0444955	0.974616	0.897899	10	2020-12-02 13:23:51.002905
6	dpr	faiss_flat	100000	5791	1182.71	4.89638	0.204233	0.95752	0.863012	10	2020-12-02 14:18:14.837806
3	dpr	faiss_hnsw	10000	5791	184.755	31.3442	0.0319039	0.972198	0.896188	10	2020-12-02 13:28:33.415220
7	dpr	faiss_hnsw	100000	5791	450.769	12.8469	0.0778396	0.939907	0.848688	10	2020-12-02 15:10:44.114148
8	dpr	faiss_flat	500000	5791	5365.81	1.07924	0.926577	0.929546	0.804583	10	2020-12-02 23:14:44.503864
9	dpr	faiss_hnsw	500000	5791	1745.92	3.31687	0.301489	0.882058	0.765677	10	2020-12-03 00:18:53.376265
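Likewise, queries_per_second and seconds_per_query in the Retriever Querying table look like two views of the same measurement. A sketch, assuming both are derived from retrieve_time and n_queries (the helper is hypothetical):

```python
import math

# (retriever, doc_store, n_docs, n_queries, retrieve_time [s], queries_per_second, seconds_per_query)
QUERY_ROWS = [
    ("dpr", "elasticsearch", 10000, 5791, 233.542, 24.7964, 0.0403284),
    ("elastic", "elasticsearch", 500000, 5791, 63.3692, 91.3851, 0.0109427),
    ("dpr", "faiss_flat", 500000, 5791, 5365.81, 1.07924, 0.926577),
    ("dpr", "faiss_hnsw", 500000, 5791, 1745.92, 3.31687, 0.301489),
]


def query_throughput(n_queries: int, retrieve_time: float):
    """Return (queries_per_second, seconds_per_query) for one run.

    Assumption: both columns are computed from the total retrieve_time.
    """
    return n_queries / retrieve_time, retrieve_time / n_queries


for retriever, store, n_docs, n_q, t, qps, spq in QUERY_ROWS:
    c_qps, c_spq = query_throughput(n_q, t)
    print(f"{retriever}/{store} @ {n_docs}: {c_qps:.3f} q/s, {c_spq:.4f} s/q")
```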

tholor · 2020-12-03T06:45:06Z

Noticeable changes:

Reader speed improved significantly (e.g. MiniLM from 98 passages/sec to 181)
Reader accuracy improved a bit (e.g. RoBERTa)
mAP of DPR with ElasticsearchDocumentStore decreased and is no longer in line with FAISS flat. (Not sure why 🤔. Possibly due to the new Elasticsearch version 7.9.0. We need to investigate this in a separate issue.)
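To quantify how far the two document stores drifted apart, the DPR mAP values from the Retriever Querying table can be compared per corpus size. This is a plain arithmetic sketch; the dictionary layout and helper name are hypothetical.

```python
# DPR mAP values (top_k=10) from the Retriever Querying table
MAP_DPR = {
    # n_docs: (elasticsearch, faiss_flat)
    10000: (0.880845, 0.897899),
    100000: (0.821224, 0.863012),
    500000: (0.730208, 0.804583),
}


def map_gap(values):
    """Absolute mAP difference (FAISS flat minus Elasticsearch) per corpus size."""
    return {n: flat - es for n, (es, flat) in values.items()}


for n_docs, gap in sorted(map_gap(MAP_DPR).items()):
    print(f"{n_docs:>7} docs: FAISS flat ahead by {gap:.4f} mAP")
```

In this run the gap widens as the corpus grows, which is worth keeping in mind when investigating it.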

lalitpagaria · 2020-12-04T23:56:01Z

A bit curious about this odd number 5791 for n_queries 🤔

A few minor things I observed:

The retriever benchmark tables are not sorted by serial number, unlike the reader's benchmark.
map should be printed as mAP.
The reader's benchmark does not have date_time.
It would be good to include the specification of the host machine on which the benchmark is performed.
Not sure if relevant: how about including CPU and memory consumption (avg, peak, 99th percentile) during each step?
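If resource usage were recorded, each step could be summarized from periodic samples. The sketch below only does the summarizing part; how samples are collected (e.g. polling the process at a fixed interval) is left open, and `summarize_usage` is a hypothetical helper, not part of Haystack. The 99th percentile uses the nearest-rank method.

```python
from statistics import mean


def summarize_usage(samples):
    """Summarize resource samples (e.g. CPU % or memory in MB) from one benchmark step.

    Returns the average, the peak and the 99th percentile (nearest-rank method).
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest rank = ceil(99/100 * n), computed with integer arithmetic to avoid float error
    rank = -(-99 * len(ordered) // 100)
    return {
        "avg": mean(ordered),
        "peak": ordered[-1],
        "p99": ordered[rank - 1],
    }


print(summarize_usage([512, 530, 498, 1024, 540]))
```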

tholor · 2020-12-05T06:47:06Z

> Bit curious about this odd number 5791 for n_queries 🤔

We run this on the Natural Questions eval dataset, which includes 5791 questions.

> It is good to include host machine specification where benchmark is performed

At this point, it's always a p3.2xlarge instance with a V100 GPU, and we list those details on https://haystack.deepset.ai/bm/benchmarks/

> map should printed as be mAP
> reader's benchmark does not have date_time
> retriever benchmark tables are not sorted according to serial number unlike reader's benchmark

These seem like very minor things to me, but feel free to fix them :)

> Not sure if relevant: How about including (avg, peak, 99%tile) CPU and Memory consumption during each step.

Not a priority for now, but I could see that becoming relevant at a later point in time...

brandenchan · 2020-12-10T15:07:00Z

The scales of the accuracy and speed measures are very different by default. #675 should change this so that we don't have to edit the data manually.

brandenchan

I manually checked this on the staging website environment and it looks good to me

tholor added 2 commits December 3, 2020 07:29

add latest benchmark run

bdd91e8

update templates and fix small json errors

abef6ba

tholor requested a review from brandenchan December 3, 2020 06:41

This was referenced Dec 3, 2020

Investigate performance difference between DPR+Elastic and DPR+FAISS Flat #653

Closed

Using Columns names instead of ORM to get all documents #620

Merged

tholor self-assigned this Dec 3, 2020

brandenchan self-assigned this Dec 3, 2020

brandenchan mentioned this pull request Dec 10, 2020

Elastic DocStore with DPR Retriever Performance Drop #674

Closed

Change scale

68c65a9

brandenchan approved these changes Dec 10, 2020


tholor merged commit 149d98a into master Dec 10, 2020

tholor deleted the new_benchmark_run branch December 10, 2020 15:25
