# Deep Learning-based Text Classification Report

This notebook is the written report for a study based on the paper “Deep Learning–based Text Classification: A Comprehensive Review”. It includes implementations of three model families on the AG News dataset.

## Abstract
Text classification is a core task in natural language processing (NLP). This notebook summarizes the techniques discussed in the paper “Deep Learning–based Text Classification: A Comprehensive Review” and presents practical implementations of RNN, CNN, and Transformer models on the AG News dataset. We demonstrate how each model performs and discuss insights and future directions.

## I. Introduction
Text classification, or categorization, aims to assign text units to predefined categories. Applications include sentiment analysis, spam detection, news classification, and customer feedback interpretation. The growth of text data has driven the need for automated classification, which deep learning models address effectively.

## II. Methods Reviewed
The referenced paper groups deep learning-based models by architecture: feed-forward networks, RNNs, CNNs, attention-based models, transformers, and graph neural networks. Each family has a distinct strength: RNNs process tokens in order and so capture sequential dependencies; CNNs detect local patterns such as informative n-grams; transformers use self-attention to relate any pair of tokens directly, which helps with long-range context.

## III. Dataset and Preprocessing
The AG News dataset comprises four news categories: World, Sports, Business, and Sci/Tech. Preprocessing consists of tokenizing with TensorFlow's TextVectorization layer (vocabulary capped at 20,000 tokens), truncating or padding each sequence to 200 tokens, and batching the data (batch size 32) for training and evaluation.
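
To make the truncate-or-pad step concrete, here is a minimal plain-Python sketch of fixed-length vectorization. It is only an approximation of what `TextVectorization` does by default (lowercasing, punctuation stripping, whitespace splitting, id 0 for padding, id 1 for unknown tokens). The helpers `tokenize`, `build_vocab`, and `vectorize` and the toy corpus are hypothetical names introduced for illustration, and tie-breaking among equally frequent tokens may differ from the real layer.

```python
# Plain-Python sketch of fixed-length vectorization (illustrative only;
# the notebook itself uses tf.keras.layers.TextVectorization).
import re

PAD_ID = 0  # id reserved for padding
OOV_ID = 1  # id reserved for out-of-vocabulary tokens


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to integer ids, starting at 2."""
    counts = {}
    for text in texts:
        for tok in tokenize(text):
            counts[tok] = counts.get(tok, 0) + 1
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i + 2 for i, tok in enumerate(ranked[: max_tokens - 2])}


def vectorize(text, vocab, seq_len):
    """Convert text to exactly `seq_len` integer ids."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)]
    ids = ids[:seq_len]                            # truncate long inputs
    return ids + [PAD_ID] * (seq_len - len(ids))   # pad short inputs


corpus = ["Stocks rally as oil falls.", "Oil prices rally again"]
vocab = build_vocab(corpus, max_tokens=10)
print(vectorize("Oil stocks surge", vocab, seq_len=5))  # → [2, 8, 1, 0, 0]
```

In the toy output, "surge" is unseen and maps to the unknown id 1, and the two trailing zeros are padding. The notebook applies the same idea with a sequence length of 200.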

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models
import tensorflow_datasets as tfds

(train_ds, test_ds), ds_info = tfds.load(
    'ag_news_subset',
    split=['train', 'test'],
    as_supervised=True,
    with_info=True
)

tokenizer = tf.keras.layers.TextVectorization(max_tokens=20000, output_sequence_length=200)
train_text = train_ds.map(lambda x, y: x)
tokenizer.adapt(train_text)

def preprocess(text, label):
    text = tokenizer(text)
    return text, label

train_ds = train_ds.map(preprocess).shuffle(10000).batch(32).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)



Dataset ag_news_subset downloaded and prepared to /root/tensorflow_datasets/ag_news_subset/1.0.0. Subsequent calls will reuse this data.


## IV. Model Implementation

In [3]:
# RNN Model
model_rnn = models.Sequential([
    layers.Embedding(20000, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation='relu'),
    layers.Dense(4, activation='softmax')
])

model_rnn.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_rnn.fit(train_ds, validation_data=test_ds, epochs=10)

Epoch 1/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 1080s 286ms/step - accuracy: 0.8401 - loss: 0.4386 - val_accuracy: 0.9063 - val_loss: 0.2766
Epoch 2/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 1148s 298ms/step - accuracy: 0.9262 - loss: 0.2120 - val_accuracy: 0.9093 - val_loss: 0.2713
Epoch 3/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 1155s 297ms/step - accuracy: 0.9466 - loss: 0.1512 - val_accuracy: 0.9099 - val_loss: 0.3040
Epoch 4/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 1155s 295ms/step - accuracy: 0.9639 - loss: 0.1014 - val_accuracy: 0.9029 - val_loss: 0.3431
Epoch 5/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 1096s 292ms/step - accuracy: 0.9762 - loss: 0.0659 - val_accuracy: 0.9030 - val_loss: 0.4359
Epoch 6/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 1096s 292ms/step - accuracy: 0.9845 - loss: 0.0446 - val_accuracy: 0.8967 - val

<keras.src.callbacks.history.History at 0x79ede1c611d0>

In [4]:
# CNN Model
model_cnn = models.Sequential([
    layers.Embedding(20000, 128),
    layers.Conv1D(128, 5, activation='relu'),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation='relu'),
    layers.Dense(4, activation='softmax')
])

model_cnn.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_cnn.fit(train_ds, validation_data=test_ds, epochs=10)

Epoch 1/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 450s 119ms/step - accuracy: 0.8165 - loss: 0.4817 - val_accuracy: 0.9083 - val_loss: 0.2739
Epoch 2/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 500s 118ms/step - accuracy: 0.9328 - loss: 0.1954 - val_accuracy: 0.9064 - val_loss: 0.3086
Epoch 3/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 443s 118ms/step - accuracy: 0.9620 - loss: 0.1083 - val_accuracy: 0.9053 - val_loss: 0.3599
Epoch 4/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 449s 119ms/step - accuracy: 0.9817 - loss: 0.0558 - val_accuracy: 0.9032 - val_loss: 0.4693
Epoch 5/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 444s 118ms/step - accuracy: 0.9886 - loss: 0.0349 - val_accuracy: 0.9000 - val_loss: 0.5299
Epoch 6/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 493s 116ms/step - accuracy: 0.9910 - loss: 0.0269 - val_accuracy: 0.8996 - val_loss:

<keras.src.callbacks.history.History at 0x79ede1f7b390>
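
As a rough illustration of why a `Conv1D` layer followed by `GlobalMaxPooling1D` behaves like an n-gram detector, the sketch below slides one hand-written width-2 filter over a short sentence and keeps only the strongest response. Everything here is an assumption made for illustration: the 2-dimensional embeddings in `emb`, the filter weights in `bigram_filter`, and the helper names are hypothetical, whereas the notebook's CNN learns 128 filters of width 5.

```python
# Toy 1D convolution followed by global max pooling, in plain Python.
# Illustrative only: the filter weights are hand-picked, not learned.

def conv1d_valid(seq, kernel):
    """Slide `kernel` over `seq` (lists of equal-width vectors), no padding."""
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Dot product between the window and the kernel, followed by ReLU.
        total = sum(w * x
                    for wv, xv in zip(kernel, window)
                    for w, x in zip(wv, xv))
        out.append(max(0.0, total))
    return out


def global_max_pool(feature_map):
    """Keep only the strongest filter response, wherever it occurred."""
    return max(feature_map)


# Hypothetical 2-d embeddings for a tiny vocabulary.
emb = {
    "stock": [1.0, 0.0],
    "market": [0.0, 1.0],
    "the": [0.0, 0.0],
    "rose": [0.5, 0.5],
}

# A width-2 filter that responds most strongly to the bigram "stock market".
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]


def pooled_score(sentence):
    seq = [emb[w] for w in sentence.split()]
    return global_max_pool(conv1d_valid(seq, bigram_filter))


print(pooled_score("the stock market rose"))  # → 2.0
print(pooled_score("rose the stock market"))  # → 2.0
print(pooled_score("market the stock rose"))  # → 1.5
```

The first two sentences contain the bigram at different positions and get the same pooled score, while the third, where the two words are not adjacent, scores lower. This position independence is what global max pooling contributes.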

In [2]:
# Transformer Model
class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            layers.Dense(ff_dim, activation='relu'),
            layers.Dense(embed_dim)
        ])
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training=False):
        attn_output = self.att(inputs, inputs)
        out1 = self.layernorm1(inputs + self.dropout1(attn_output, training=training))
        ffn_output = self.ffn(out1)
        return self.layernorm2(out1 + self.dropout2(ffn_output, training=training))

embed_dim = 128
num_heads = 4
ff_dim = 128

inputs = layers.Input(shape=(None,))
x = layers.Embedding(20000, embed_dim)(inputs)
x = TransformerBlock(embed_dim, num_heads, ff_dim)(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(4, activation='softmax')(x)

model_transformer = tf.keras.Model(inputs=inputs, outputs=outputs)
model_transformer.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_transformer.fit(train_ds, validation_data=test_ds, epochs=10)

Epoch 1/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 2514s 669ms/step - accuracy: 0.7860 - loss: 0.5168 - val_accuracy: 0.9055 - val_loss: 0.2821
Epoch 2/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 2549s 671ms/step - accuracy: 0.9260 - loss: 0.2192 - val_accuracy: 0.9042 - val_loss: 0.3014
Epoch 3/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 2517s 664ms/step - accuracy: 0.9407 - loss: 0.1701 - val_accuracy: 0.8959 - val_loss: 0.3757
Epoch 4/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 2481s 661ms/step - accuracy: 0.9538 - loss: 0.1256 - val_accuracy: 0.8880 - val_loss: 0.4353
Epoch 5/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 2572s 685ms/step - accuracy: 0.9645 - loss: 0.0954 - val_accuracy: 0.8875 - val_loss: 0.5354
Epoch 6/10
3750/3750 ━━━━━━━━━━━━━━━━━━━━ 2551s 680ms/step - accuracy: 0.9706 - loss: 0.0774 - val_accuracy: 0.8859 - val

<keras.src.callbacks.history.History at 0x7d309b1376d0>
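
One property of the Transformer cell above is worth noting: it embeds tokens but adds no positional embedding, and self-attention, the residual connections, layer normalization, and the feed-forward layers all treat positions symmetrically. After average pooling, the representation therefore should not depend on word order. The plain-Python sketch below demonstrates this for a simplified case. It assumes a single attention head with identity query/key/value projections, which is a simplification rather than a reproduction of `MultiHeadAttention`, and the names `self_attention`, `mean_pool`, and the toy vectors are hypothetical.

```python
# Self-attention without positional information is order-agnostic after
# mean pooling. Simplified: one head, identity projections.
import math


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token (query) to every token (key).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the token vectors (values).
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def mean_pool(rows):
    """Average over the sequence axis, like GlobalAveragePooling1D."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
shuffled = [tokens[2], tokens[0], tokens[1]]  # same tokens, different order

a = mean_pool(self_attention(tokens))
b = mean_pool(self_attention(shuffled))
print(all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(a, b)))  # → True
```

Adding a learned or sinusoidal positional embedding to the token embeddings is the usual way to give such a model access to word order.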

## V. Observations
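- The bidirectional LSTM took about 1,080 to 1,155 s per epoch and reached the highest validation accuracy of the three models (0.9099 at epoch 3).
- The CNN trained fastest (about 443 to 500 s per epoch) and peaked at 0.9083 validation accuracy after the first epoch, consistent with convolutional filters picking up informative n-grams.
- The Transformer was the slowest (about 2,481 to 2,572 s per epoch) and peaked at 0.9055 validation accuracy after the first epoch; its validation accuracy then fell to 0.8859 by epoch 6.
- In all three runs, training accuracy kept rising while validation loss increased after the first one or two epochs, which indicates overfitting. Note that validation here is measured on the test split.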
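
The validation losses logged in Section IV rise after the first or second epoch for all three models, which suggests that most of the later epochs were unnecessary. The sketch below replays patience-based early stopping on those logged values in plain Python. It uses only the first five epochs of each run, since the sixth `val_loss` is cut off in the output. The function `early_stop` is a hypothetical helper written for illustration; in Keras the same idea is available as `tf.keras.callbacks.EarlyStopping` (for example with `monitor='val_loss'`, `patience=2`, `restore_best_weights=True`).

```python
# Replay patience-based early stopping on the val_loss values logged above.

def early_stop(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs have passed without
    improving on the best validation loss seen so far.
    """
    best_epoch, best_loss, wait = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# val_loss for epochs 1 to 5, copied from the training logs.
logged = {
    "rnn": [0.2766, 0.2713, 0.3040, 0.3431, 0.4359],
    "cnn": [0.2739, 0.3086, 0.3599, 0.4693, 0.5299],
    "transformer": [0.2821, 0.3014, 0.3757, 0.4353, 0.5354],
}

for name, losses in logged.items():
    print(name, early_stop(losses, patience=2))
# → rnn (2, 4)
# → cnn (1, 3)
# → transformer (1, 3)
```

With a patience of 2, each run would have stopped after three or four epochs and kept the weights from epoch 1 or 2, which would save most of the training time reported above.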

## VI. Future Work
Future efforts include leveraging pre-trained models like BERT, handling multilingual data, enhancing model robustness, and adding interpretability through attention visualization.

## VII. Conclusion
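Deep learning models are effective for text classification: all three architectures reached roughly 90 to 91% validation accuracy on AG News within the first few epochs. In these from-scratch experiments the single-block Transformer did not outperform the RNN or CNN, so the advantages of transformer-based approaches highlighted in the review paper were not reproduced here and may depend on pre-trained models such as BERT, which are left to future work.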