Arabic Text Summarization with AraBert Project Report

Executive Summary

The "Arabic Text Summarization with AraBERT" project aimed to harness the power of AraBERT, a transformer-based model, for summarizing Arabic language text. AraBERT was fine-tuned on an Arabic summarization dataset to produce concise and coherent summaries.

Objectives

Explore AraBERT Architecture: Investigate the architecture of AraBERT and its applicability to Arabic text summarization.
Fine-Tuning Process: Describe the process of fine-tuning AraBERT on an Arabic summarization dataset.
Performance Evaluation: Evaluate the performance of the fine-tuned model on a validation dataset.
Case Studies: Present case studies showcasing the model's effectiveness in summarizing diverse Arabic text.

Definitions

ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

ROUGE is a set of metrics used for evaluating the quality of summaries by comparing them to reference summaries. It includes measures such as ROUGE-N (unigrams, bigrams, etc.), ROUGE-L (longest common subsequence), and ROUGE-W (weighted longest common subsequence). These metrics provide a quantitative assessment of the overlap and content similarity between generated and reference summaries.

AraBERT (Arabic BERT)

AraBERT is a transformer-based model introduced for processing Arabic language text. It has been fine-tuned for various natural language processing tasks, including summarization, by leveraging transfer learning.

Methodology

Data Collection

The project utilized the csebuetnlp/xlsum dataset for Arabic summarization. This dataset consists of a variety of documents with corresponding summaries.

Preprocessing

Data preprocessing involved tokenization using the AraBERT tokenizer, handling input-output sequences, and ensuring the proper format for training.

Fine-Tuning

AraBERT was fine-tuned on the Arabic summarization dataset using a sequence-to-sequence training approach.

Results and Findings

Model Evaluation

ROUGE Scores

The model's performance was evaluated using various ROUGE metrics, including ROUGE-N, ROUGE-L, and ROUGE-W. The following table presents the key ROUGE scores:

Metric	Score
ROUGE-1	2.3202
ROUGE-2	0.0977
ROUGE-L	2.3113

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
arabic_summarization_with_araBert.ipynb		arabic_summarization_with_araBert.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Arabic Text Summarization with AraBert Project Report

Executive Summary

Objectives

Definitions

ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

AraBERT (Arabic BERT)

Methodology

Data Collection

Preprocessing

Fine-Tuning

Results and Findings

Model Evaluation

About

Releases

Packages

Languages

aliasharf123/arabic-summarization-with-araBert

Folders and files

Latest commit

History

Repository files navigation

Arabic Text Summarization with AraBert Project Report

Executive Summary

Objectives

Definitions

ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

AraBERT (Arabic BERT)

Methodology

Data Collection

Preprocessing

Fine-Tuning

Results and Findings

Model Evaluation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages