AAQAD 17,000+ Arabic Questions & Answers dataset

Adel Atef, Bassam Mattar, Sandra Sherif, Eman Elrefai, Marwan Torki
Alexandria University, Egypt
eng-adel.meleka1520@alexu.edu.eg, eng-bassam.mattar1520@alexu.edu.eg, sandra.sherif@yahoo.com, eman.lotfy.elrefai@gmail.com, mtorki@alexu.edu.eg

Current Arabic Machine Reading for Question Answering datasets suffer from important shortcomings: the available datasets are either small, high-quality collections or large, low-quality ones. To address this problem, we present the AlexU Arabic Question-Answer dataset (AAQAD), a new large, high-quality Arabic reading comprehension dataset consisting of 17,000+ questions. To collect AAQAD, we present a fully automated data collector. The collector works on a set of Arabic Wikipedia articles for the extractive question answering task. The chosen articles match the articles used in the well-known SQuAD dataset. We provide evaluation results on AAQAD using two state-of-the-art models for machine-reading question answering, namely BERT and BiDAF, which achieve F1 scores of 0.37 and 0.32 on AAQAD, respectively.

Keywords

Automated dataset collection, Arabic Machine Comprehension, Question Answering System, Arabic Machine Reading for Question Answering (A-MRQA), Arabic NLP, Answer Extraction.

Project Description

The project is divided into 2 main phases:

Data Collection

We implemented an automatic dataset generator capable of producing an Arabic dataset of question-answer pairs.
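Since the task is extractive, every answer has to appear verbatim as a span of its source passage. The snippet below is a minimal sketch of how a single question-answer pair could be packaged and checked under that constraint. It assumes a SQuAD-style layout (`context`, `question`, `answers`, `answer_start`); these field names and the helper `make_record` are illustrative assumptions, not the collector's actual schema or code.

```python
from typing import Optional


def make_record(context: str, question: str, answer: str) -> Optional[dict]:
    """Build one SQuAD-style record (hypothetical helper, assumed schema).

    Returns None when the answer is empty or is not a literal span of the
    context, since such a pair cannot be used for extractive QA.
    """
    if not answer:
        return None
    start = context.find(answer)  # character offset of the first occurrence
    if start == -1:
        return None
    return {
        "context": context,
        "question": question,
        "answers": [{"text": answer, "answer_start": start}],
    }


context = "الإسكندرية مدينة في مصر"
record = make_record(context, "أين تقع الإسكندرية؟", "مصر")
```

Pairs for which `make_record` returns `None` would simply be dropped in this sketch; the real collector may handle them differently.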

Baseline Models

We ran known baseline models (BERT and BiDAF) on our dataset and report their evaluation results.
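For readers unfamiliar with the F1 measure used to score extractive QA baselines, here is a minimal sketch of a token-overlap F1 between a predicted answer and a reference answer. It assumes SQuAD-style scoring with plain whitespace tokenization; the function name `token_f1` is illustrative, and the notebooks in this repository may normalize Arabic text differently before scoring.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two answer strings (assumed SQuAD-style metric)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("القاهرة", "القاهرة")
partial = token_f1("مدينة القاهرة", "القاهرة")
```

A dataset-level score under this sketch would be the mean of `token_f1` over all questions.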

Getting Started & Prerequisites

To reproduce the results reported in the paper, open the required .ipynb notebook in Colab and run the cells in order from top to bottom; this also installs the requirements.

