Transformer-Based Text Classification on Reuters Dataset

Project Overview

This project implements a Transformer-based deep learning model for multi-class text classification using the Reuters newswire dataset. The main objective is to analyze how Transformer depth (number of encoder layers) affects classification performance and to compare Transformers with traditional RNN-based architectures.

The task involves classifying news articles into 46 distinct categories, making it a realistic and challenging Natural Language Processing (NLP) problem.


Problem Statement

Traditional sequence models such as RNNs, LSTMs, and GRUs often face limitations including:

  • Difficulty handling long-range dependencies
  • Sequential computation that limits parallelism
  • Degradation of contextual understanding in long sequences

This project addresses the question:

How effectively can Transformer architectures model textual data, and what is the optimal model depth for this task?


Solution Approach

A custom Transformer encoder architecture is implemented using TensorFlow and Keras. The model is trained and evaluated with a varying number of Transformer encoder layers to study how depth affects performance. A sketch of the core building blocks follows the component list below.

Core Components

  • Token embedding and positional embedding
  • Multi-head self-attention mechanism
  • Feed-forward neural networks
  • Residual connections with layer normalization
  • Global average pooling for document-level classification
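A minimal Keras sketch of these building blocks, assuming an illustrative feed-forward width (FF_DIM) that is not part of the stated configuration; the repository's implementation may differ in its details:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative hyperparameters (see the configuration list below).
# FF_DIM is an assumed feed-forward width, not stated in this README.
VOCAB_SIZE, MAX_LEN, EMBED_DIM, NUM_HEADS, FF_DIM = 10_000, 200, 32, 4, 64

class TokenAndPositionEmbedding(layers.Layer):
    """Sums a learned token embedding with a learned position embedding."""
    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(vocab_size, embed_dim)
        self.pos_emb = layers.Embedding(maxlen, embed_dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.token_emb(x) + self.pos_emb(positions)

class TransformerEncoderBlock(layers.Layer):
    """Multi-head self-attention + feed-forward, each with residual + layer norm."""
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = layers.Dropout(rate)
        self.drop2 = layers.Dropout(rate)

    def call(self, x, training=False):
        attn_out = self.drop1(self.att(x, x), training=training)
        x = self.norm1(x + attn_out)      # residual connection 1
        ffn_out = self.drop2(self.ffn(x), training=training)
        return self.norm2(x + ffn_out)    # residual connection 2
```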

Model Architecture and Configuration

  • Vocabulary size: 10,000
  • Maximum sequence length: 200
  • Embedding dimension: 32
  • Number of attention heads: 4
  • Number of output classes: 46
  • Optimizer: Adam
  • Loss function: Sparse Categorical Crossentropy

Three Transformer configurations were evaluated (a build-and-compile sketch follows this list):

  • 3 encoder layers
  • 5 encoder layers
  • 7 encoder layers
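Building on the layers and constants sketched above, a hedged example of how a classifier of a given depth could be assembled and compiled with the listed settings (build_classifier and its signature are illustrative, not the repository's exact code):

```python
def build_classifier(num_layers, maxlen=MAX_LEN, vocab_size=VOCAB_SIZE,
                     embed_dim=EMBED_DIM, num_heads=NUM_HEADS,
                     ff_dim=FF_DIM, num_classes=46):
    """Stacks `num_layers` encoder blocks on top of the embedding layer."""
    inputs = layers.Input(shape=(maxlen,), dtype="int32")
    x = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)(inputs)
    for _ in range(num_layers):
        x = TransformerEncoderBlock(embed_dim, num_heads, ff_dim)(x)
    x = layers.GlobalAveragePooling1D()(x)   # document-level representation
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# The three configurations compared in this project.
transformer_models = {depth: build_classifier(depth) for depth in (3, 5, 7)}
```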

Experiments and Evaluation

Model performance was evaluated using the following metrics; a short evaluation sketch follows the list:

  • Classification accuracy
  • Weighted F1-score
  • Confusion matrices
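A minimal evaluation sketch, assuming `model` is one of the compiled classifiers and `x_test`, `y_test` are the padded test arrays (the variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Predicted class = argmax over the 46-way softmax output.
y_prob = model.predict(x_test)
y_pred = np.argmax(y_prob, axis=-1)

accuracy = accuracy_score(y_test, y_pred)
weighted_f1 = f1_score(y_test, y_pred, average="weighted")
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}  Weighted F1: {weighted_f1:.4f}")
```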

Performance Summary

Number of Transformer Layers | Accuracy | Weighted F1-Score
3                            | 0.7511   | 0.7461
5                            | 0.7427   | 0.7377
7                            | 0.7235   | 0.7156

Best-performing model: Transformer with 3 encoder layers

Increasing the number of layers beyond this point resulted in reduced performance, indicating overfitting and optimization challenges for deeper architectures on this dataset.


Comparison with RNN-Based Models

The Transformer models were compared with previously implemented sequence models (a minimal baseline sketch follows this list):

  • Simple RNN
  • LSTM
  • GRU
  • Bidirectional RNN variants
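For context, a minimal sketch of the kind of recurrent baseline referred to here (the layer sizes are assumptions, not the repository's exact settings):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A simple LSTM baseline: embedding -> LSTM -> 46-way softmax.
lstm_baseline = tf.keras.Sequential([
    layers.Embedding(input_dim=10_000, output_dim=32),
    layers.LSTM(64),
    layers.Dense(46, activation="softmax"),
])
lstm_baseline.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```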

The Transformer architecture demonstrated competitive and often superior F1-scores, highlighting its ability to capture contextual relationships more effectively than traditional recurrent models.


Visual Analysis

The project generates:

  • Confusion matrices for each Transformer configuration
  • F1-score comparison plots between RNN-based models and Transformer models

These visualizations provide deeper insight into class-wise performance and overall model behavior.
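A hedged plotting sketch, reusing `cm` from the evaluation sketch above and filling the F1 comparison only with the Transformer scores from the results table (the RNN scores are not reproduced here):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Class-wise view: heatmap of the confusion matrix `cm` from the evaluation step.
plt.figure(figsize=(12, 10))
sns.heatmap(cm, cmap="Blues")
plt.xlabel("Predicted class")
plt.ylabel("True class")
plt.title("Confusion matrix (3-layer Transformer)")
plt.tight_layout()
plt.show()

# Weighted F1 comparison across configurations.
f1_scores = {
    "Transformer (3 layers)": 0.7461,
    "Transformer (5 layers)": 0.7377,
    "Transformer (7 layers)": 0.7156,
}
plt.figure(figsize=(8, 4))
sns.barplot(x=list(f1_scores), y=list(f1_scores.values()))
plt.ylabel("Weighted F1-score")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()
```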


Technologies Used

  • Python
  • TensorFlow / Keras
  • NumPy
  • Matplotlib and Seaborn
  • Scikit-learn

Dataset

  • Reuters Newswire Dataset (bundled with Keras; see the loading sketch below)
  • Training samples: 8,982
  • Test samples: 2,246
  • Number of categories: 46
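A minimal loading and padding sketch consistent with the configuration above:

```python
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Keep only the 10,000 most frequent tokens, as in the configuration above.
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=10_000)

# Pad/truncate every article to 200 tokens.
x_train = pad_sequences(x_train, maxlen=200)
x_test = pad_sequences(x_test, maxlen=200)

print(x_train.shape, x_test.shape)   # (8982, 200) (2246, 200)
```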

Key Takeaways

  • Transformer architectures are highly effective for text classification tasks
  • Deeper models do not always guarantee better performance
  • Optimal depth depends on dataset size and complexity
  • Weighted F1-score is a more informative metric than plain accuracy for imbalanced multi-class problems such as this one

Future Improvements

  • Hyperparameter tuning for attention heads and embedding size
  • Integration of pretrained word embeddings
  • Learning rate scheduling and regularization
  • Fine-tuning pretrained language models such as BERT

Author

Aadithya K L

This project focuses on understanding model behavior and architectural trade-offs rather than treating deep learning as a black box.