
# Question Answering over Linked Data (QALD)

QALD is a series of evaluation campaigns on question answering over linked data, which aims at providing an up-to-date benchmark for assessing and comparing state-of-the-art systems that mediate between a user, expressing his or her information need in natural language, and RDF data. Thus, it targets all researchers and practitioners working on querying Linked Data, natural language processing for question answering, multilingual information retrieval and related topics. The main goal is to gain insights into the strengths and shortcomings of different approaches and into possible solutions for coping with the large, heterogeneous and distributed nature of Semantic Web data.

The QALD challenge began in 2011 and has since developed a series of benchmarks that are increasingly used as a standard evaluation venue for question answering over Linked Data. Overviews of past editions of the challenge are available in the CLEF working notes, CEUR workshop proceedings, and the ESWC proceedings.

The key challenge for QA over Linked Data is to translate a user's natural language query into a form that can be evaluated using standard Semantic Web query processing and inferencing techniques. The main task of QALD is therefore the following:

> Given one or several RDF dataset(s) as well as additional knowledge sources and natural language questions or keywords, return the correct answers or a SPARQL query that retrieves these answers.
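As a concrete illustration, here is a minimal sketch of such a question–query pair and the per-question F1 used to score answer sets. The question, SPARQL query, and field names are hypothetical (the exact JSON schema varies across QALD editions):

```python
# Hypothetical QALD-style record: a natural-language question paired with
# a gold SPARQL query over DBpedia and its gold answer set.
example = {
    "question": "What is the capital of Germany?",
    "sparql": (
        "PREFIX dbo: <http://dbpedia.org/ontology/> "
        "PREFIX dbr: <http://dbpedia.org/resource/> "
        "SELECT ?capital WHERE { dbr:Germany dbo:capital ?capital }"
    ),
    "answers": {"http://dbpedia.org/resource/Berlin"},
}

def question_f1(gold, predicted):
    """Per-question F1 over answer sets: a system is scored by comparing
    the answers its generated query returns against the gold answers."""
    gold, predicted = set(gold), set(predicted)
    if not gold or not predicted:
        return 0.0
    overlap = len(gold & predicted)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# A system returning exactly the gold answer scores 1.0 on this question.
print(question_f1(example["answers"], {"http://dbpedia.org/resource/Berlin"}))
```

Returning a superset or subset of the gold answers lowers precision or recall respectively, which is why the leaderboards below report all three measures.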

## Table of contents

- QALD-9-Plus
- QALD-9
- QALD-8
- QALD-7
- QALD-6
- QALD-5
- QALD-4
- QALD-3
- QALD-2
- QALD-1

## QALD-9-Plus

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| QAnswer | 2022 | - | - | 30.39 (Macro F1) | EN | Perevalov et al. |
| QAnswer | 2022 | - | - | 19.98 (Macro F1) | DE | Perevalov et al. |
| QAnswer | 2022 | - | - | 15.06 (Macro F1) | FR | Perevalov et al. |
| QAnswer | 2022 | - | - | 9.57 (Macro F1) | RU | Perevalov et al. |
| QAnswer | 2022 | - | - | 5.27 (Micro F1) | EN | Perevalov et al. |
| QAnswer | 2022 | - | - | 2.19 (Micro F1) | DE | Perevalov et al. |
| QAnswer | 2022 | - | - | 4.06 (Micro F1) | FR | Perevalov et al. |
| QAnswer | 2022 | - | - | 1.53 (Micro F1) | RU | Perevalov et al. |
| DeepPavlov | 2022 | - | - | 12.40 (Macro F1) | EN | Perevalov et al. |
| DeepPavlov | 2022 | - | - | 0.13 (Micro F1) | EN | Perevalov et al. |
| Platypus | 2022 | - | - | 15.03 (Macro F1) | EN | Perevalov et al. |
| Platypus | 2022 | - | - | 1.26 (Micro F1) | EN | Perevalov et al. |
| DeepPavlov | 2022 | - | - | 8.70 (Macro F1) | RU | Perevalov et al. |
| DeepPavlov | 2022 | - | - | 0.05 (Micro F1) | RU | Perevalov et al. |
| Platypus | 2022 | - | - | 4.17 (Macro F1) | FR | Perevalov et al. |
| Platypus | 2022 | - | - | 0.00 (Micro F1) | FR | Perevalov et al. |
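The QALD-9-Plus leaderboard reports both macro and micro F1. As a minimal sketch of the difference, using made-up answer sets rather than QALD data: macro F1 averages the per-question F1 scores, so every question counts equally, while micro F1 pools true positives, predicted answers, and gold answers across all questions before computing a single precision and recall:

```python
# Toy data (not QALD results): (gold, predicted) answer sets per question.
def prf(gold, pred):
    """True positives, precision, recall, F1 for one question."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return tp, p, r, f

questions = [
    ({"Berlin"}, {"Berlin"}),       # fully correct
    ({"a", "b", "c", "d"}, {"a"}),  # one of four gold answers found
    ({"x"}, set()),                 # unanswered
]

# Macro F1: average the per-question F1 scores.
macro_f1 = sum(prf(g, p)[3] for g, p in questions) / len(questions)

# Micro F1: pool counts across all questions, then compute one P/R/F1.
tp = sum(prf(g, p)[0] for g, p in questions)
n_pred = sum(len(p) for _, p in questions)
n_gold = sum(len(g) for g, _ in questions)
micro_p = tp / n_pred if n_pred else 0.0
micro_r = tp / n_gold if n_gold else 0.0
micro_f1 = (2 * micro_p * micro_r / (micro_p + micro_r)
            if micro_p + micro_r else 0.0)

print(round(macro_f1, 4), round(micro_f1, 4))
```

Unanswered questions drag macro F1 down directly, while micro F1 is dominated by questions with many answers, which is one reason the two numbers in the table diverge so strongly for the same system.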

## QALD-9

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| SGPT_Q,K | 2022 | - | - | 67.82 | EN | Al Hasan Rony et al. |
| SPARQLGEN | 2023 | - | - | 67.07 | EN | Kovriguina et al. |
| SGPT_Q | 2022 | - | - | 60.22 | EN | Al Hasan Rony et al. |
| Stage I No Noise [2] | 2022 | 80.40 | 42.10 | 55.30 | EN | Purkayastha et al. |
| LingTeQA [1] | 2020 | 52.60 | 64.20 | 53.50 | EN | P. Nhuan et al. |
| qaSQP | 2019 | 45.80 | 47.10 | 46.30 | EN | Zheng et al. |
| chatGPT | 2023 | - | - | 45.71 | EN | Tan et al. |
| GPT-3.5v3 | 2023 | - | - | 46.19 | EN | Tan et al. |
| NSpM | 2022 | - | - | 45.34 | EN | Al Hasan Rony et al. |
| GPT-3.5v2 | 2023 | - | - | 44.95 | EN | Tan et al. |
| KGQAn | 2023 | 49.81 | 39.39 | 43.99 | EN | Omar et al. |
| Ensemble BR framework | 2023 | 42.40 | 47.60 | 43.00 | EN | Chen et al. |
| KGQAn | 2021 | 50.61 | 34.67 | 41.15 | EN | Omar et al. |
| Light-QAWizard | 2022 | 39.80 | 42.60 | 40.60 | EN | Chen et al. |
| Stage-I Part Noise [7] | 2022 | 63.90 | 28.70 | 39.60 | EN | Purkayastha et al. |
| GPT-3 | 2023 | - | - | 38.54 | EN | Tan et al. |
| Stage-II w/o type [5] | 2022 | 59.40 | 26.10 | 36.20 | EN | Purkayastha et al. |
| Stage-II w/ type [6] | 2022 | 59.40 | 26.10 | 36.20 | EN | Purkayastha et al. |
| Stage-I Full Noise [8] | 2022 | 82.60 | 23 | 36.00 | EN | Purkayastha et al. |
| QAWizard | 2022 | 31.10 | 46.90 | 33.00 | EN | - |
| QAmp | 2019 | 25 | 50 | 33 | EN | Vakulenko et al. |
| QAwizard | 2023 | 31.10 | 46.90 | 33 | EN | Chen et al. |
| WDAqua-core0 | 2021 | - | - | 32 | EN | Orogat et al. |
| NSQA | 2021 | 31.89 | 32.05 | 31.26 | EN | P. Kapanipathi et al. |
| DTQA | 2021 | 31.41 | 32.16 | 30.88 | EN | Abdelaziz et al. |
| NSQA | 2021 | 31.40 | 32.10 | 30.80 | EN | M. Borroto et al. |
| sparql-qa | 2021 | 31 | 32.48 | 30.60 | EN | M. Borroto et al. |
| FLAN-T5 | 2023 | - | - | 30.17 | EN | Tan et al. |
| DTQA | 2023 | 31.40 | 32.20 | 30.10 | EN | Chen et al. |
| gAnswer | 2021 | - | - | 30 | EN | Orogat et al. |
| gAnswer | 2021 | 29.34 | 32.68 | 29.81 | EN | Abdelaziz et al. |
| gAnswer [3] | 2021 | 29.30 | 32.70 | 29.80 | EN | Purkayastha et al. |
| gAnswer2 | 2019 | 29.30 | 32.70 | 29.80 | EN | Zheng et al. |
| gAnswer2 | 2023 | 29.30 | 32.70 | 29.80 | EN | Chen et al. |
| gAnswer | 2021 | 60.70 | 31.60 | 29.60 | EN | L. Siciliani et al. |
| TeBaQA | 2022 | - | - | 28.81 | EN | Al Hasan Rony et al. |
| WDAqua-core1 | 2019 | 22 | 38 | 28 | EN | Vakulenko et al. |
| SQG | 2022 | - | - | 27.85 | EN | Al Hasan Rony et al. |
| WDAqua-core1 | 2019 | 26.10 | 26.70 | 25 | EN | Zheng et al. |
| WDAqua | 2023 | 26.10 | 26.70 | 25 | EN | Chen et al. |
| WDAqua-core1 | 2021 | 26.09 | 26.70 | 24.99 | EN | Abdelaziz et al. |
| qaSearch | 2019 | 23.60 | 24.10 | 23.70 | EN | Zheng et al. |
| QAnswer | 2021 | 45.90 | 22.20 | 19.70 | EN | L. Siciliani et al. |
| QASparql | 2021 | - | - | 19 | EN | Orogat et al. |
| TeBaQA | 2021 | 64.40 | 14.10 | 13.90 | EN | L. Siciliani et al. |
| TeBaQA | 2019 | 12.90 | 13.40 | 13 | EN | Zheng et al. |
| QASystem | 2019 | 9.70 | 11.60 | 9.80 | EN | Zheng et al. |
| AskNow | 2021 | - | - | 8 | EN | Orogat et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 7 | EN | Orogat et al. |
| Elon | 2021 | 4.90 | 5.30 | 5 | EN | Steinmetz et al. |

- [1]–[8] DBpedia 2016-10.

## QALD-8

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Accuracy | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ensemble BR framework | 2023 | 52.20 | 56.10 | 51.70 | - | EN | Chen et al. |
| qaSQP | 2019 | 45.90 | 46.30 | 46.10 | - | EN | Zheng et al. |
| Light-QAWizard | 2022 | 46.20 | 50 | 45.70 | - | EN | Chen et al. |
| gAnswer2 | 2023 | 38.62 | 39.02 | 38.80 | - | EN | Chen et al. |
| gAnswer2 | 2019 | 38.60 | 39 | 38.80 | - | EN | Zheng et al. |
| gAnswer | 2021 | 38.62 | 39.02 | 38.80 | - | EN | Steinmetz et al. |
| WDAqua-core0 | 2021 | 39.12 | 40.65 | 38.72 | - | EN | Steinmetz et al. |
| WDAqua-core0 | 2019 | 39.10 | 40.70 | 38.70 | - | EN | Zheng et al. |
| WDAqua | 2023 | 39.10 | 40.70 | 38.70 | - | EN | Chen et al. |
| QAwizard | 2023 | 37.50 | 35.80 | 34.30 | - | EN | Chen et al. |
| WDAqua-core0 | 2021 | - | - | 33 | - | EN | Orogat et al. |
| QASparql | 2021 | - | - | 30 | - | EN | Orogat et al. |
| qaSearch | 2019 | 24.40 | 24.40 | 24.40 | - | EN | Zheng et al. |
| AskNow | 2021 | - | - | 13 | - | EN | Orogat et al. |
| Platypus | 2021 | - | - | 6 | - | EN | Orogat et al. |
| QAKiS | 2021 | 6.10 | 5.28 | 5.63 | - | EN | Steinmetz et al. |
| QAKiS | 2019 | 6.10 | 5.30 | 5.60 | - | EN | Zheng et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 4 | - | EN | Orogat et al. |
| Entity Type Tags Modified | 2022 | - | - | - | 88.15 | EN | Lin and Lu |
| SPARQL Generator | 2022 | - | - | - | 40.09 | EN | Lin and Lu |

## QALD-7

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Accuracy | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LAMA | 2019 | - | - | 90.50 | - | EN | Radoev et al. |
| LingTeQA [1] | 2020 | 63.40 | 73.50 | 64.20 | - | EN | D. Nhuan et al. |
| Liang et al. | 2021 | 81.30 | 52.70 | 63.90 | - | EN | Liang et al. |
| Ensemble BR framework | 2023 | 59.80 | 69.60 | 61.20 | - | EN | Chen et al. |
| Light-QAWizard | 2022 | 56.50 | 65.20 | 59.40 | - | EN | Chen et al. |
| QAwizard | 2023 | 59 | 59 | 59 | - | EN | Chen et al. |
| gAnswer2 | 2020 | 55.70 | 59.20 | 55.60 | - | EN | Athreya et al. |
| WDAqua-core0 | 2021 | 48.80 | 53.50 | 51.10 | - | EN | Liang et al. |
| WDAqua-core0 | 2020 | 49 | 54 | 51 | - | EN | Athreya et al. |
| gAnswer2 | 2023 | 46.90 | 49.80 | 48.70 | - | EN | Chen et al. |
| TeBaQA RNN | 2020 | 41.60 | 42.30 | 41.70 | - | EN | Athreya et al. |
| GSM | 2022 | 38 | 39 | 38 | - | EN | Liu et al. |
| G. Maheshwari et al. Pointwise | 2019 | 28 | 43 | 34 | - | EN | G. Maheshwari et al. |
| AQG-Net | 2022 | 30 | 37 | 33 | - | EN | Liu et al. |
| gRGCN | 2021 | 31.33 | 35.41 | 30.24 | - | EN | Wu et al. |
| WDAqua-core0 | 2021 | - | - | 29 | - | EN | Orogat et al. |
| G. Maheshwari et al. Pairwise | 2019 | 22 | 38 | 28 | - | EN | G. Maheshwari et al. |
| gGCN | 2021 | 23.34 | 31.09 | 24.37 | - | EN | Wu et al. |
| GGNN | 2021 | 21.76 | 27.51 | 21.10 | - | EN | Wu et al. |
| Luo et al. | 2021 | 21.17 | 24.38 | 20.16 | - | EN | Wu et al. |
| HR-BiLSTM | 2022 | 20 | 19 | 19 | - | EN | Liu et al. |
| Yu et al. | 2021 | 19.72 | 21.03 | 19.23 | - | EN | Wu et al. |
| STAGG | 2021 | 19.34 | 24.63 | 18.61 | - | EN | Wu et al. |
| QASparql | 2021 | - | - | 17 | - | EN | Orogat et al. |
| WDAqua | 2023 | 16 | 16.20 | 16.30 | - | EN | Chen et al. |
| AskNow | 2021 | - | - | 15 | - | EN | Orogat et al. |
| Platypus | 2021 | - | - | 8 | - | EN | Orogat et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 6 | - | EN | Orogat et al. |
| Entity Type Tags Modified | 2022 | - | - | - | 76.69 | EN | Lin and Lu |
| SPARQL Generator | 2022 | - | - | - | 60.74 | EN | Lin and Lu |

- [1] Wikidata.

## QALD-6

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| gAnswer | 2017 | 70 | 89 | 78 | EN | Hu et al. |
| gAnswer | 2021 | - | - | 25 | EN | Orogat et al. |
| WDAqua-core0 | 2021 | - | - | 24 | EN | Orogat et al. |
| QASparql | 2021 | - | - | 17 | EN | Orogat et al. |
| AskNow | 2021 | - | - | 9 | EN | Orogat et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 2 | EN | Orogat et al. |

## QALD-5

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| Xser | 2020 | 74 | 72 | 73 | EN | Diefenbach et al. |
| UTQA | 2016 | - | - | 65.2 | EN | Ben Veyseh |
| UTQA | 2020 | - | - | 65 | EN | Diefenbach et al. |
| UTQA | 2020 | 55 | 53 | 54 | ES | Diefenbach et al. |
| UTQA | 2020 | 53 | 51 | 52 | FA | Diefenbach et al. |
| WDAqua-core1 | 2020 | 56 | 41 | 47 | EN | Diefenbach et al. |
| AskNow | 2020 | 32 | 34 | 33 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 88 | 18 | 30 | IT | Diefenbach et al. |
| QAnswer | 2020 | 34 | 26 | 29 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 92 | 16 | 28 | DE | Diefenbach et al. |
| WDAqua-core1 | 2020 | 90 | 16 | 28 | FR | Diefenbach et al. |
| WDAqua-core1 | 2020 | 88 | 14 | 25 | ES | Diefenbach et al. |
| gAnswer | 2021 | - | - | 20 | EN | Orogat et al. |
| SemGraphQA | 2020 | 19 | 20 | 20 | EN | Diefenbach et al. |
| WDAqua-core0 | 2021 | - | - | 18 | EN | Orogat et al. |
| YodaQA | 2020 | 18 | 17 | 18 | EN | Diefenbach et al. |
| QASparql | 2021 | - | - | 12 | EN | Orogat et al. |
| AskNow | 2021 | - | - | 9 | EN | Orogat et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 2 | EN | Orogat et al. |

## QALD-4

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| Zhang et al. | 2016 | 89 | 88 | 88 | EN | Zhang et al. |
| POMELO | 2016 | 82 | 87 | 85 | EN | Zhang et al. |
| SINA | 2016 | 80 | 78 | 79 | EN | Zhang et al. |
| Xser | 2020 | 72 | 71 | 72 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 56 | 30 | 39 | EN | Diefenbach et al. |
| gAnswer | 2020 | 37 | 37 | 37 | EN | Diefenbach et al. |
| CASIA | 2020 | 32 | 40 | 36 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 90 | 20 | 32 | DE | Diefenbach et al. |
| WDAqua-core1 | 2020 | 92 | 20 | 32 | IT | Diefenbach et al. |
| WDAqua-core1 | 2020 | 90 | 20 | 32 | ES | Diefenbach et al. |
| WDAqua-core1 | 2020 | 86 | 18 | 29 | FR | Diefenbach et al. |
| Intui3 | 2020 | 23 | 25 | 24 | EN | Diefenbach et al. |
| ISOFT | 2020 | 21 | 26 | 23 | EN | Diefenbach et al. |
| Hakimov | 2020 | 52 | 13 | 21 | EN | Diefenbach et al. |
| gAnswer | 2021 | - | - | 16 | EN | Orogat et al. |
| RO FII | 2016 | 16 | 16 | 16 | EN | Zhang et al. |
| WDAqua-core0 | 2021 | - | - | 12 | EN | Orogat et al. |
| QASparql | 2021 | - | - | 8 | EN | Orogat et al. |
| AskNow | 2021 | - | - | 8 | EN | Orogat et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 1 | EN | Orogat et al. |

## QALD-3

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| virtual player | 2015 | - | - | 64.29 | EN | Molino et al. |
| virtual player | 2015 | - | - | 59.47 | EN | Molino et al. |
| WDAqua-core1 | 2020 | 64 | 42 | 51 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 79 | 28 | 42 | DE | Diefenbach et al. |
| WDAqua-core1 | 2020 | 83 | 27 | 41 | FR | Diefenbach et al. |
| gAnswer | 2020 | 40 | 40 | 40 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 70 | 26 | 38 | FR | Diefenbach et al. |
| Zhu et al. | 2020 | 38 | 42 | 38 | EN | Diefenbach et al. |
| WDAqua-core1 | 2020 | 77 | 24 | 37 | ES | Diefenbach et al. |
| CASIA | 2013 | 35 | 36 | 36 | EN | S. He et al. |
| WDAqua-core1 | 2020 | 79 | 23 | 36 | IT | Diefenbach et al. |
| RTV | 2020 | 32 | 34 | 33 | EN | Diefenbach et al. |
| Intui2 | 2020 | 32 | 32 | 32 | EN | Diefenbach et al. |
| SINA | 2020 | 32 | 32 | 32 | EN | Diefenbach et al. |
| Intui2 [1] | 2013 | 32 | 32 | 32 | EN | Corina Dima |
| DEANNA | 2020 | 21 | 21 | 21 | EN | Diefenbach et al. |
| SWIP | 2020 | 16 | 17 | 17 | EN | Diefenbach et al. |
| gAnswer | 2021 | - | - | 16 | EN | Orogat et al. |
| AskNow | 2021 | - | - | 13 | EN | Orogat et al. |
| WDAqua-core0 [2] | 2021 | - | - | 11 | EN | Orogat et al. |
| QASparql | 2021 | - | - | 6 | EN | Orogat et al. |
| Qanary(TM+DP+QB) | 2021 | - | - | 2 | EN | Orogat et al. |

- [1] DBpedia 3.8.
- [2] DBpedia 2016-04.

## QALD-2

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| robustQA [1] | 2013 | 68 | 68 | 68 | EN | Yahya et al. |
| BELA | 2012 | 73 | 62 | 67 | EN | Walter et al. |
| TLDRet [2] | 2018 | 63 | 63 | 63 | EN | Rahoman and Ichise |
| SenseAware | 2013 | 51 | 53 | 52 | EN | Elbedweihy et al. |
| semanticQA | 2013 | 83 | 32 | 46 | EN | Hakimov et al. |
| SemSeK [3] | 2013 | 44 | 48 | 46 | EN | Lopez et al. |
| Alexandria | 2013 | 43 | 46 | 45 | EN | Lopez et al. |
| QAKiS | 2013 | 39 | 37 | 38 | EN | Cabrio et al. |
| MHE | 2013 | 36 | 40 | 38 | EN | Lopez et al. |
| QAKiS | 2013 | 39 | 37 | 38 | EN | Lopez et al. |
| WolframAlpha | 2012 | 32 | 30 | 30.9 | EN | Walter et al. |
| robustQA [4] | 2013 | 50 | 15 | 23 | EN | Yahya et al. |
| gAnswer | 2021 | - | - | 21 | EN | Orogat et al. |
| WDAqua-core0 | 2021 | - | - | 16 | EN | Orogat et al. |
| AskNow | 2021 | - | - | 10 | EN | Orogat et al. |
| QASparql | 2021 | - | - | 1 | EN | Orogat et al. |

- [1] Factoid-type questions.
- [2] Only temporal questions.
- [3] DBpedia 3.7.
- [4] List-type questions.

## QALD-1

Please see the original paper for details about the dataset creation process, data format, task and participating systems.

### Leaderboard

| Model / System | Year | Precision | Recall | F1 | Language | Reported by |
| --- | --- | --- | --- | --- | --- | --- |
| FREyA [1] | 2013 | 63 | 54 | 58 | EN | Lopez et al. |
| PowerAqua | 2013 | 52 | 48 | 50 | EN | Lopez et al. |
| gAnswer | 2021 | - | - | 24 | EN | Orogat et al. |
| WDAqua-core0 | 2021 | - | - | 14 | EN | Orogat et al. |
| AskNow | 2021 | - | - | 7 | EN | Orogat et al. |
| QASparql | 2021 | - | - | 1 | EN | Orogat et al. |

- [1] DBpedia 3.6.

Go back to the README