This project implements a Transformer-based deep learning model for multi-class text classification using the Reuters newswire dataset. The main objective is to analyze how Transformer depth (number of encoder layers) affects classification performance and to compare Transformers with traditional RNN-based architectures.
The task involves classifying news articles into 46 distinct categories, making it a realistic and challenging Natural Language Processing (NLP) problem.
Traditional sequence models such as RNNs, LSTMs, and GRUs often face limitations including:
- Difficulty handling long-range dependencies
- Sequential computation that limits parallelism
- Degradation of contextual understanding in long sequences
This project addresses the question:
How effectively can Transformer architectures model textual data, and what is the optimal model depth for this task?
A custom Transformer encoder architecture is implemented using TensorFlow and Keras and trained with a varying number of encoder layers to study how depth affects performance. The core building blocks are:
- Token embedding and positional embedding
- Multi-head self-attention mechanism
- Feed-forward neural networks
- Residual connections with layer normalization
- Global average pooling for document-level classification
The main hyperparameters, which the sketch after this list reuses, are:
- Vocabulary size: 10,000
- Maximum sequence length: 200
- Embedding dimension: 32
- Number of attention heads: 4
- Number of output classes: 46
- Optimizer: Adam
- Loss function: Sparse Categorical Crossentropy
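The snippet below is a minimal Keras sketch of such an encoder, wired together with the hyperparameters listed above. The feed-forward width `FF_DIM` and all class, function, and variable names are illustrative assumptions rather than values taken from the project code.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hyperparameters from the list above; FF_DIM is an assumed value.
VOCAB_SIZE = 10_000
MAX_LEN = 200
EMBED_DIM = 32
NUM_HEADS = 4
FF_DIM = 64
NUM_CLASSES = 46


class TokenAndPositionEmbedding(layers.Layer):
    """Token embedding plus a learned positional embedding."""

    def __init__(self, max_len, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(vocab_size, embed_dim)
        self.pos_emb = layers.Embedding(max_len, embed_dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.token_emb(x) + self.pos_emb(positions)


def transformer_block(x, embed_dim, num_heads, ff_dim):
    # Multi-head self-attention with a residual connection and layer normalization.
    attn_out = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(x, x)
    x = layers.LayerNormalization(epsilon=1e-6)(x + attn_out)
    # Position-wise feed-forward network, again with residual + layer norm.
    ffn_out = layers.Dense(ff_dim, activation="relu")(x)
    ffn_out = layers.Dense(embed_dim)(ffn_out)
    return layers.LayerNormalization(epsilon=1e-6)(x + ffn_out)


def build_model(num_layers):
    inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = TokenAndPositionEmbedding(MAX_LEN, VOCAB_SIZE, EMBED_DIM)(inputs)
    for _ in range(num_layers):                      # stack the encoder blocks
        x = transformer_block(x, EMBED_DIM, NUM_HEADS, FF_DIM)
    x = layers.GlobalAveragePooling1D()(x)           # document-level representation
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```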
Three Transformer configurations were evaluated (a sketch of this depth sweep follows the list):
- 3 encoder layers
- 5 encoder layers
- 7 encoder layers
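A depth sweep over these configurations might be run as in the hypothetical loop below, reusing `build_model` and the constants from the sketch above; the epoch count, batch size, and validation split are assumptions rather than settings reported here.

```python
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the Reuters data and pad/truncate every article to MAX_LEN tokens.
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=VOCAB_SIZE)
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)

results = {}
for depth in (3, 5, 7):
    model = build_model(num_layers=depth)
    model.fit(x_train, y_train, epochs=10, batch_size=64,
              validation_split=0.1, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    results[depth] = acc

for depth, acc in results.items():
    print(f"{depth} layers: test accuracy = {acc:.4f}")
```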
Model performance was evaluated using the following metrics (illustrated in the sketch after this list):
- Classification accuracy
- Weighted F1-score
- Confusion matrices
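The sketch below shows one way these metrics could be computed with scikit-learn, assuming `model`, `x_test`, and `y_test` from the training sketch above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Convert per-class probabilities into hard class predictions.
y_pred = np.argmax(model.predict(x_test), axis=-1)

accuracy = accuracy_score(y_test, y_pred)
weighted_f1 = f1_score(y_test, y_pred, average="weighted")
cm = confusion_matrix(y_test, y_pred)   # 46 x 46 matrix of true vs. predicted counts

print(f"accuracy = {accuracy:.4f}, weighted F1 = {weighted_f1:.4f}")
```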
| Encoder layers | Accuracy | Weighted F1-score |
|---|---|---|
| 3 | 0.7511 | 0.7461 |
| 5 | 0.7427 | 0.7377 |
| 7 | 0.7235 | 0.7156 |
Best-performing model: Transformer with 3 encoder layers
Increasing the depth beyond three layers reduced both accuracy and F1-score, suggesting overfitting and optimization difficulties for deeper architectures on a dataset of this size.
The Transformer models were compared with previously implemented sequence models:
- Simple RNN
- LSTM
- GRU
- Bidirectional RNN variants
The Transformer architecture demonstrated competitive and often superior F1-scores, highlighting its ability to capture contextual relationships more effectively than traditional recurrent models.
The project generates:
- Confusion matrices for each Transformer configuration
- F1-score comparison plots between RNN-based models and Transformer models
These visualizations provide deeper insight into class-wise performance and overall model behavior.
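For example, a confusion-matrix heatmap could be rendered roughly as follows with Seaborn, using the `cm` matrix from the evaluation sketch above; the figure size, colour map, and output filename are arbitrary choices.

```python
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 10))
sns.heatmap(cm, cmap="Blues", cbar=True)   # rows = true class, columns = predicted class
plt.xlabel("Predicted class")
plt.ylabel("True class")
plt.title("Transformer encoder (3 layers) - Reuters confusion matrix")
plt.tight_layout()
plt.savefig("confusion_matrix_3_layers.png")
```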
Technologies used:
- Python
- TensorFlow / Keras
- NumPy
- Matplotlib and Seaborn
- Scikit-learn
- Reuters Newswire Dataset
- Training samples: 8,982
- Test samples: 2,246
- Number of categories: 46
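As a quick sanity check, the split sizes and label count above can be reproduced straight from the Keras loader, and an article can be decoded back to words; the offset of 3 follows the loader's convention of reserving indices 0-2 for padding, sequence-start, and unknown tokens.

```python
from tensorflow.keras.datasets import reuters

# Default test_split=0.2 yields the 8,982 / 2,246 split quoted above.
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=10_000)
print(len(x_train), len(x_test), max(y_train) + 1)   # 8982 2246 46

# Decode one article back to text (indices 0-2 are reserved tokens).
word_index = reuters.get_word_index()
index_to_word = {i + 3: w for w, i in word_index.items()}
print(" ".join(index_to_word.get(i, "?") for i in x_train[0])[:200])
```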
Key learnings:
- Transformer architectures are highly effective for text classification tasks
- Deeper models do not always guarantee better performance
- Optimal depth depends on dataset size and complexity
- Weighted F1-score is more informative than raw accuracy for imbalanced multi-class problems such as Reuters
- Hyperparameter tuning for attention heads and embedding size
- Integration of pretrained word embeddings
- Learning rate scheduling and regularization
- Fine-tuning pretrained language models such as BERT
Aadithya K L
This project focuses on understanding model behavior and architectural trade-offs rather than treating deep learning as a black box.