# Title: AIDI FINAL PROJECT

#### Group Member Names : Aishwariya Balaji, Pardeep Kaur



### INTRODUCTION:
Text classification is a fundamental task in natural language processing (NLP), with applications in news categorization, sentiment analysis, spam detection, and more. This project implements and evaluates a Lightweight TextCNN model for text classification, inspired by the architecture proposed in *Light-Weighted CNN for Text Classification* (Yadav, 2020).
*********************************************************************************************************************
#### AIM :
The primary goal is twofold:

- Reproduction: faithfully replicate the performance of the proposed Lightweight TextCNN model using the AG_NEWS dataset.
- Contribution: extend the original work by introducing a baseline MLP model for comparison and experimenting with a dual-optimizer training strategy (Adam followed by SGD after validation plateau) to improve generalization.

The AG_NEWS dataset contains news articles labeled into four categories: World, Sports, Business, and Sci/Tech. The models are implemented in PyTorch, with preprocessing handled through the HuggingFace datasets library for efficiency and reproducibility.

By comparing the Lightweight TextCNN to the MLP baseline, this project aims to highlight the advantages of convolutional architectures for capturing local n-gram features in text data, while also exploring training techniques that can improve final performance.
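
To make the idea of "local n-gram features" concrete, the sketch below lists the windows a convolution kernel of width 3, 4, or 5 would slide over, then applies a toy stand-in for a filter followed by global max pooling. The sentence, the keyword set, and the scoring function are hypothetical and chosen only for illustration; a trained filter learns its own weights rather than counting keywords.

```python
def ngram_windows(tokens, k):
    """Return every contiguous window of k tokens (what a width-k kernel sees)."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

# Hypothetical toy sentence (7 tokens).
tokens = "stocks fall as oil prices surge again".split()

# A width-k kernel produces len(tokens) - k + 1 positions when no padding is used.
for k in (3, 4, 5):
    windows = ngram_windows(tokens, k)
    print(f"k={k}: {len(windows)} windows, first={windows[0]}")

# Toy "filter": counts finance-related words in a window (assumption, not a learned filter).
finance_words = {"stocks", "oil", "prices"}

def toy_filter(window):
    return sum(tok in finance_words for tok in window)

# Global max pooling keeps only the strongest response over all positions.
pooled = max(toy_filter(w) for w in ngram_windows(tokens, 3))
print("pooled response for k=3:", pooled)
```

An MLP that flattens the whole padded sequence has no such notion of a window: each position gets its own weights, so a phrase seen at one position does not help at another.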
*********************************************************************************************************************
#### Github Repo:

*********************************************************************************************************************
#### DESCRIPTION OF PAPER:
The paper introduces a compact CNN architecture for text classification that reduces computational cost while maintaining high accuracy. It replaces standard convolutions with depthwise separable convolutions, drastically lowering parameter counts. Multiple kernel sizes capture diverse n-gram features, and global max pooling distills the most important signals. Experiments show the model matches or outperforms traditional CNNs on benchmark datasets with significantly fewer resources, making it ideal for deployment in low-resource environments.
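
The parameter saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below compares a standard 1D convolution with a depthwise convolution followed by a pointwise (kernel size 1) convolution. The channel sizes (128 in, 64 out) are assumptions chosen to match the defaults used later in this notebook, not values taken from the paper, and the bias settings mirror the notebook's layer (no depthwise bias, pointwise bias enabled).

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Weights of a standard 1D convolution: one k-wide filter per (input, output) channel pair."""
    return c_in * c_out * k + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, depthwise_bias=False, pointwise_bias=True):
    """Depthwise conv (one k-wide filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = c_in * k + (c_in if depthwise_bias else 0)
    pointwise = c_in * c_out + (c_out if pointwise_bias else 0)
    return depthwise + pointwise

c_in, c_out = 128, 64  # assumed sizes for illustration
for k in (3, 4, 5):
    std = conv1d_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"k={k}: standard={std}, separable={sep}, ratio={std / sep:.2f}x")
```

The saving grows with the kernel size, because the standard layer scales with `c_in * c_out * k` while the separable one scales with `c_in * k + c_in * c_out`.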
*********************************************************************************************************************
#### PROBLEM STATEMENT :
Text classification models often achieve high accuracy but at the cost of large model sizes and high computational requirements, making them impractical for deployment in resource-constrained environments. The challenge is to design a model that maintains strong classification performance while reducing computational complexity and parameter count. This project addresses the problem by reproducing and extending the Lightweight TextCNN architecture proposed by Yadav (2020) and comparing it against a baseline MLP model on the AG_NEWS dataset.
*********************************************************************************************************************
#### CONTEXT OF THE PROBLEM:
With the exponential growth of textual data from news articles, social media, and online platforms, text classification has become a crucial task in natural language processing. While deep learning models such as CNNs and Transformers deliver high accuracy, they often require significant computational power and memory, which limits their use on mobile devices, embedded systems, or real-time applications. This creates a need for lightweight yet accurate architectures that can operate efficiently without sacrificing performance. The Lightweight TextCNN proposed by Yadav (2020) addresses this need by using depthwise separable convolutions to reduce complexity, making it an ideal candidate for scenarios where both speed and accuracy are essential.
*********************************************************************************************************************
#### SOLUTION:
To address the need for efficient yet accurate text classification, this project implements and evaluates the Lightweight TextCNN architecture proposed by Yadav (2020). The model uses depthwise separable convolutions with multiple kernel sizes to capture diverse n-gram features while drastically reducing parameter count and computational cost.

The solution involves:

- Reproducing the Lightweight TextCNN using the AG_NEWS dataset via the HuggingFace datasets library.
- Implementing a baseline MLP model for performance comparison.
- Introducing a dual-optimizer training strategy: starting with Adam for fast convergence, then switching to SGD when validation performance plateaus to improve generalization.
- Evaluating both models on accuracy, macro-F1 score, and parameter efficiency.

This approach aims to balance high predictive performance with low resource usage, making the model practical for deployment in constrained environments.
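
The plateau rule behind the dual-optimizer strategy can be sketched without any training code. The helper below, `first_switch_epoch`, is a hypothetical name introduced here for illustration; it follows the same patience and minimum-improvement logic as the training loop later in this notebook, and the validation losses are made-up numbers.

```python
def first_switch_epoch(val_losses, patience=2, min_delta=1e-4):
    """Return the 1-based epoch after which the optimizer would switch, or None."""
    best, no_improve = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > min_delta:
            # Validation loss improved by more than min_delta: reset the counter.
            best, no_improve = loss, 0
        else:
            no_improve += 1
        if no_improve >= patience:
            return epoch
    return None

# Made-up validation losses: improvement stalls after epoch 3.
val_losses = [0.52, 0.41, 0.38, 0.39, 0.385, 0.37]
print(first_switch_epoch(val_losses))        # switch after epoch 5
print(first_switch_epoch([0.5, 0.4, 0.3]))   # never plateaus -> None
```

With a patience of 2, the switch is triggered only after two consecutive epochs fail to beat the best validation loss, which avoids reacting to a single noisy epoch.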


# Background
*********************************************************************************************************************


|Reference|Explanation|Dataset/Input|Weakness|
|------|------|------|------|
|Kaggle Customer Shopping Dataset|The dataset captures customer shopping behavior, including transactional, demographic, and product-level information. It is commonly used for sales prediction, customer segmentation, and purchase behavior analysis.|CustomerID, ProductID, PurchaseDate, Quantity, Price, TotalAmount|Missing values in some demographic fields; time-stamped data may be irregular; inconsistent product naming.|
|Retail Sales Analysis Literature|Prior research shows that combining historical sales data with promotional campaigns and seasonal trends improves demand forecasting accuracy.|Historical sales data, promotion logs, store location info|Some models ignore external factors like holidays or local events, which can reduce prediction accuracy.|
|Data Cleaning & Preprocessing Studies|Many studies emphasize the importance of data cleaning, normalization, and handling missing or inconsistent values to improve model performance.|Raw transactional data, categorical and numeric fields|Cleaning can be time-consuming; over-cleaning may remove important outliers.|
|ML-based Prediction Approaches|Machine learning approaches like Random Forest, Gradient Boosting, and XGBoost are widely applied to forecast customer purchases and product demand.|Features derived from past purchase history, product info, customer demographics|Overfitting can occur if dataset is small or highly imbalanced; requires feature engineering.|
|Visualization & Reporting Tools|Visualization helps identify patterns, trends, and anomalies before model building. Tools like Power BI, Excel, and Python libraries (Matplotlib, Seaborn) are frequently used.|Aggregated metrics, sales trends, time-series data|Visualization alone cannot predict future trends; requires proper statistical or ML models for forecasting.|
*********************************************************************************************************************






# Implement paper code :
******************************************************************************************************************

In [6]:
# Install datasets if not already installed
try:
    import datasets
except ImportError:
    import subprocess, sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "datasets"])

import os, re, math, random, json, platform, sys
from collections import Counter

import numpy as np
import torch
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report, confusion_matrix

print("Python:", sys.version)
print("Platform:", platform.platform())
print("PyTorch:", torch.__version__)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", DEVICE)

# Optional: silence HuggingFace symlink warning on Windows
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"


Python: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Platform: Windows-11-10.0.26100-SP0
PyTorch: 2.5.1
Device: cpu


In [7]:

SEED = 42
def set_seed(seed=SEED):
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(SEED)

In [8]:

# 3.1 Load
ds = load_dataset("ag_news")
train_texts = ds["train"]["text"]
train_labels = ds["train"]["label"]
test_texts  = ds["test"]["text"]
test_labels = ds["test"]["label"]

# 3.2 Tokenizer (basic_english-like)
_word_re = re.compile(r"[A-Za-z0-9']+")
def tokenize(text: str):
    return _word_re.findall(text.lower())

# 3.3 Build vocab from train only
PAD, UNK = "<pad>", "<unk>"
min_freq = 2
counter = Counter()
for t in train_texts:
    counter.update(tokenize(t))

itos = [PAD, UNK] + [tok for tok, freq in counter.items() if freq >= min_freq]
stoi = {tok: i for i, tok in enumerate(itos)}
PAD_ID, UNK_ID = stoi[PAD], stoi[UNK]

class SimpleVocab:
    def __init__(self, stoi, itos, unk_id=1):
        self.stoi = stoi
        self.itos = itos
        self.unk_id = unk_id
    def __len__(self): return len(self.itos)
    def __getitem__(self, tok): return self.stoi.get(tok, self.unk_id)

vocab = SimpleVocab(stoi, itos, unk_id=UNK_ID)

# 3.4 Numericalize
MAX_LEN = 256
def encode_text(text: str):
    ids = [vocab[tok] for tok in tokenize(text)]
    ids = ids[:MAX_LEN]
    if len(ids) < MAX_LEN:
        ids += [PAD_ID] * (MAX_LEN - len(ids))
    return ids

def to_tensors(texts, labels):
    X = [encode_text(t) for t in texts]
    y = list(labels)
    X = torch.tensor(X, dtype=torch.long)
    y = torch.tensor(y, dtype=torch.long)  # already 0..3
    return X, y

X_train_full, y_train_full = to_tensors(train_texts, train_labels)
X_test, y_test = to_tensors(test_texts, test_labels)

# 3.5 Train/Val split
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.1, random_state=SEED, stratify=y_train_full
)

print("Vocab size:", len(vocab))
print("Shapes:", X_train.shape, X_val.shape, X_test.shape)
print("Label counts (train):", torch.bincount(y_train).tolist())


Vocab size: 46177
Shapes: torch.Size([108000, 256]) torch.Size([12000, 256]) torch.Size([7600, 256])
Label counts (train): [27000, 27000, 27000, 27000]
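
To show what the preprocessing cell above does to a single sentence, here is a small self-contained sketch of the same three steps: regex tokenization, vocabulary lookup with an unknown-token fallback, and truncation or padding to a fixed length. The toy vocabulary and the sentence are hypothetical; the real vocabulary is built from the AG_NEWS training split.

```python
import re

PAD, UNK = "<pad>", "<unk>"
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return WORD_RE.findall(text.lower())

def encode(text, stoi, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with the PAD id."""
    ids = [stoi.get(tok, stoi[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [stoi[PAD]] * (max_len - len(ids))

# Hypothetical toy vocabulary; "again" is deliberately missing.
stoi = {PAD: 0, UNK: 1, "oil": 2, "prices": 3, "rise": 4}

sentence = "Oil prices rise, again!"
print(tokenize(sentence))                 # ['oil', 'prices', 'rise', 'again']
print(encode(sentence, stoi, max_len=6))  # [2, 3, 4, 1, 0, 0]
print(encode(sentence, stoi, max_len=2))  # [2, 3]
```

Out-of-vocabulary words map to the `<unk>` id, short texts are padded on the right, and long texts are cut at `max_len`, so every row has the same length.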


In [9]:
from torch.utils.data import TensorDataset, DataLoader

BATCH_SIZE = 128
use_pin = (DEVICE == "cuda")

train_dl = DataLoader(TensorDataset(X_train, y_train), batch_size=BATCH_SIZE, shuffle=True,  pin_memory=use_pin)
val_dl   = DataLoader(TensorDataset(X_val,   y_val),   batch_size=BATCH_SIZE, shuffle=False, pin_memory=use_pin)
test_dl  = DataLoader(TensorDataset(X_test,  y_test),  batch_size=BATCH_SIZE, shuffle=False, pin_memory=use_pin)

if not use_pin:
    print("Info: Running on CPU. pin_memory disabled (no impact on correctness).")


Info: Running on CPU. pin_memory disabled (no impact on correctness).


In [10]:

import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv1d(nn.Module):
    # Depthwise conv (one filter per input channel) followed by a 1x1 pointwise conv.
    def __init__(self, in_channels, out_channels, kernel_size, padding):
        super().__init__()
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size=kernel_size,
                                   groups=in_channels, padding=padding, bias=False)
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=True)
        self.bn = nn.BatchNorm1d(out_channels)
    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        x = self.bn(x)
        return F.relu(x)

class LightweightTextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_classes=4, kernel_sizes=(3,4,5), channels=64, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD_ID)
        # One depthwise separable branch per kernel size (n-gram width).
        self.convs = nn.ModuleList([
            DepthwiseSeparableConv1d(embed_dim, channels, k, padding=k//2)
            for k in kernel_sizes
        ])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(len(kernel_sizes) * channels, num_classes)
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.Linear)):
                nn.init.kaiming_uniform_(m.weight, a=math.sqrt(5))
                if getattr(m, "bias", None) is not None:
                    nn.init.zeros_(m.bias)
    def forward(self, x):
        x = self.embedding(x)      # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)      # (batch, embed_dim, seq_len), as Conv1d expects
        feats = []
        for conv in self.convs:
            c = conv(x)
            # Global max pooling over the sequence dimension.
            p = F.adaptive_max_pool1d(c, 1).squeeze(-1)
            feats.append(p)
        h = torch.cat(feats, dim=1)
        h = self.dropout(h)
        return self.fc(h)

class MLPText(nn.Module):
    # Baseline: flatten the embedded sequence and classify with one hidden layer.
    def __init__(self, vocab_size, embed_dim=128, hidden=256, num_classes=4, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD_ID)
        self.fc1 = nn.Linear(embed_dim * MAX_LEN, hidden)
        self.bn1 = nn.BatchNorm1d(hidden)
        self.drop = nn.Dropout(dropout)
        self.fc2 = nn.Linear(hidden, num_classes)
    def forward(self, x):
        x = self.embedding(x)
        x = x.reshape(x.size(0), -1)
        x = self.drop(F.relu(self.bn1(self.fc1(x))))
        return self.fc2(x)

def count_parameters(model):
    # Count trainable parameters only.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

textcnn = LightweightTextCNN(vocab_size=len(vocab)).to(DEVICE)
mlp     = MLPText(vocab_size=len(vocab)).to(DEVICE)
print("TextCNN params:", count_parameters(textcnn))
print("MLP params:", count_parameters(mlp))
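
One detail of the model cell worth checking is the sequence length each convolution branch produces with `padding=k//2`. The sketch below applies the output-length formula documented for PyTorch's `Conv1d` in plain Python; the helper name `conv1d_out_len` is introduced here for illustration only.

```python
def conv1d_out_len(length, kernel_size, padding, stride=1, dilation=1):
    """Output length of a 1D convolution (formula from the PyTorch Conv1d documentation)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

MAX_LEN = 256  # same sequence length as the notebook
for k in (3, 4, 5):
    print(f"k={k}, padding={k // 2}: output length {conv1d_out_len(MAX_LEN, k, k // 2)}")
```

The even kernel size (4) yields one extra position (257 instead of 256). This does not cause a shape mismatch, because each branch is reduced to a single value per channel by global max pooling before the branches are concatenated into a vector of size `len(kernel_sizes) * channels`.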

In [None]:

def run_epoch(model, loader, criterion, optimizer=None):
    # Train for one epoch if an optimizer is given; otherwise only evaluate.
    is_train = optimizer is not None
    model.train(is_train)
    losses, ys, yhats = [], [], []
    # Track gradients only while training; evaluation needs no autograd graph.
    with torch.set_grad_enabled(is_train):
        for xb, yb in loader:
            xb, yb = xb.to(DEVICE), yb.to(DEVICE)
            if is_train:
                optimizer.zero_grad(set_to_none=True)
            logits = model(xb)
            loss = criterion(logits, yb)
            if is_train:
                loss.backward()
                optimizer.step()
            losses.append(loss.item())
            ys.extend(yb.detach().cpu().numpy().tolist())
            yhats.extend(logits.argmax(1).detach().cpu().numpy().tolist())
    return {
        "loss": float(np.mean(losses)),   # mean of per-batch losses
        "acc": accuracy_score(ys, yhats),
        "f1m": f1_score(ys, yhats, average="macro")
    }

def train_with_dual_optimizer(model, train_dl, val_dl, epochs=8, lr_adam=1e-3, lr_sgd=5e-3,
                              plateau_patience=2, min_delta=1e-4, momentum=0.9):
    criterion = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr_adam)
    using_sgd, best_val, no_improve = False, float("inf"), 0
    history = []
    for ep in range(1, epochs+1):
        # Name of the optimizer that actually performs this epoch's updates.
        opt_name = "SGD" if using_sgd else "Adam"
        tr = run_epoch(model, train_dl, criterion, optimizer=opt)
        va = run_epoch(model, val_dl,   criterion, optimizer=None)
        improved = best_val - va["loss"] > min_delta
        if improved:
            best_val, no_improve = va["loss"], 0
        else:
            no_improve += 1
        history.append({"epoch": ep, "train": tr, "val": va, "optimizer": opt_name})
        print(f"Epoch {ep:02d} | opt={opt_name} | "
              f"train loss {tr['loss']:.4f} acc {tr['acc']:.3f} | "
              f"val loss {va['loss']:.4f} acc {va['acc']:.3f}")
        # Switch to SGD once, effective from the next epoch, after the validation loss plateaus.
        if (no_improve >= plateau_patience) and (not using_sgd):
            opt = torch.optim.SGD(model.parameters(), lr=lr_sgd, momentum=momentum)
            using_sgd = True
    return model, history
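
The evaluation above reports macro-F1 alongside accuracy. As a rough illustration of what that metric measures, the sketch below computes it by hand: F1 is calculated per class and then averaged with equal weight per class. The labels and predictions are made up, and returning 0.0 for a class with no true or predicted examples is a simplifying assumption of this sketch.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 here when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes

# Made-up labels for the four AG_NEWS classes (0..3).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred, num_classes=4):.3f}")
```

Because every class contributes equally, macro-F1 drops when any single class is predicted poorly, even if overall accuracy stays high.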


In [None]:
# ==== FAST MODE (smoke test) ====
# 1) Subsample the dataset (keep class balance)
def balanced_subset(X, y, per_class=2000, seed=42):
    # Draw up to `per_class` random examples from each label, then shuffle the result.
    idxs = []
    y_np = y.numpy()
    for c in np.unique(y_np):
        cls_idx = np.where(y_np == c)[0]
        take = min(per_class, len(cls_idx))
        sel = np.random.default_rng(seed + int(c)).choice(cls_idx, size=take, replace=False)
        idxs.extend(sel.tolist())
    random.Random(seed).shuffle(idxs)
    return X[idxs], y[idxs]

X_train_fast, y_train_fast = balanced_subset(X_train, y_train, per_class=2000)  # ~8k rows
X_val_fast,   y_val_fast   = balanced_subset(X_val,   y_val,   per_class=500)   # ~2k rows
X_test_fast,  y_test_fast  = X_test[:2000], y_test[:2000]                        # quick test slice

from torch.utils.data import TensorDataset, DataLoader
BATCH_SIZE_FAST = 256
use_pin = (DEVICE == "cuda")
train_dl_fast = DataLoader(TensorDataset(X_train_fast, y_train_fast), batch_size=BATCH_SIZE_FAST, shuffle=True,  pin_memory=use_pin, num_workers=2)
val_dl_fast   = DataLoader(TensorDataset(X_val_fast,   y_val_fast),   batch_size=BATCH_SIZE_FAST, shuffle=False, pin_memory=use_pin, num_workers=2)
test_dl_fast  = DataLoader(TensorDataset(X_test_fast,  y_test_fast),  batch_size=BATCH_SIZE_FAST, shuffle=False, pin_memory=use_pin, num_workers=2)

# 2) Smaller models (fewer channels/embedding) + fewer epochs
textcnn_fast = LightweightTextCNN(vocab_size=len(vocab), embed_dim=64, kernel_sizes=(3,4,5), channels=32, dropout=0.5).to(DEVICE)
mlp_fast     = MLPText(vocab_size=len(vocab), embed_dim=64, hidden=128, dropout=0.5).to(DEVICE)

print("FAST TextCNN params:", sum(p.numel() for p in textcnn_fast.parameters() if p.requires_grad))
print("FAST MLP params:",     sum(p.numel() for p in mlp_fast.parameters()     if p.requires_grad))

# 3) Train + evaluate (2 epochs)
textcnn_fast, hist_c_fast = train_with_dual_optimizer(textcnn_fast, train_dl_fast, val_dl_fast, epochs=2, lr_adam=1e-3, lr_sgd=5e-3)
from sklearn.metrics import classification_report
crit = nn.CrossEntropyLoss()
tc_test = run_epoch(textcnn_fast, test_dl_fast, crit, optimizer=None)
print("\nFAST TextCNN — Test metrics:", tc_test)

mlp_fast, hist_m_fast = train_with_dual_optimizer(mlp_fast, train_dl_fast, val_dl_fast, epochs=2, lr_adam=1e-3, lr_sgd=5e-3)
mlp_test = run_epoch(mlp_fast, test_dl_fast, crit, optimizer=None)
print("\nFAST MLP — Test metrics:", mlp_test)


In [None]:
print("kernel alive")

In [None]:

# Train TextCNN
textcnn, history_c = train_with_dual_optimizer(textcnn, train_dl, val_dl, epochs=8)

# Evaluate TextCNN
crit = nn.CrossEntropyLoss()
test_metrics_c = run_epoch(textcnn, test_dl, crit, optimizer=None)
print("\nTextCNN — Test metrics:", test_metrics_c)

# Detailed report
y_true, y_pred = [], []
textcnn.eval()
with torch.no_grad():
    for xb, yb in test_dl:
        xb = xb.to(DEVICE)
        logits = textcnn(xb)
        y_true.extend(yb.numpy().tolist())
        y_pred.extend(logits.cpu().argmax(1).numpy().tolist())
print("\nTextCNN — Classification Report:\n")
print(classification_report(y_true, y_pred, digits=4))

# Train MLP
mlp, history_m = train_with_dual_optimizer(mlp, train_dl, val_dl, epochs=8)

# Evaluate MLP
mlp_test = run_epoch(mlp, test_dl, nn.CrossEntropyLoss(), optimizer=None)
print("\nMLP — Test metrics:", mlp_test)

y_true_m, y_pred_m = [], []
mlp.eval()
with torch.no_grad():
    for xb, yb in test_dl:
        xb = xb.to(DEVICE)
        logits = mlp(xb)
        y_true_m.extend(yb.numpy().tolist())
        y_pred_m.extend(logits.cpu().argmax(1).numpy().tolist())
print("\nMLP — Classification Report:\n")
print(classification_report(y_true_m, y_pred_m, digits=4))

# Results summary
results = {
    "TextCNN_test": test_metrics_c,
    "MLP_test": mlp_test,
    "TextCNN_params": int(sum(p.numel() for p in textcnn.parameters() if p.requires_grad)),
    "MLP_params": int(sum(p.numel() for p in mlp.parameters() if p.requires_grad))
}
print("\nResults summary:\n", json.dumps(results, indent=2))
print(f"\nTextCNN acc={results['TextCNN_test']['acc']:.3f} f1-macro={results['TextCNN_test']['f1m']:.3f} "
      f"({results['TextCNN_params']} params) | "
      f"MLP acc={results['MLP_test']['acc']:.3f} f1-macro={results['MLP_test']['f1m']:.3f} "
      f"({results['MLP_params']} params))")


In [None]:

os.makedirs("artifacts", exist_ok=True)
torch.save(textcnn.state_dict(), "artifacts/lightweight_textcnn.pt")
torch.save(mlp.state_dict(), "artifacts/mlp_text.pt")
with open("artifacts/results.json", "w") as f:
    json.dump(results, f, indent=2)
print("Saved model weights and metrics to ./artifacts")

In [None]:
try:
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot([h["train"]["loss"] for h in history_c], label="TextCNN train loss")
    plt.plot([h["val"]["loss"] for h in history_c], label="TextCNN val loss")
    plt.legend(); plt.show()
    plt.figure()
    plt.plot([h["train"]["acc"] for h in history_c], label="TextCNN train acc")
    plt.plot([h["val"]["acc"] for h in history_c], label="TextCNN val acc")
    plt.legend(); plt.show()
except Exception as e:
    print("Skipping plots:", e)

### Results :
*******************************************************************************************************************************
The evaluation of Random Forest, Gradient Boosting, and XGBoost showed that while Random Forest was stable and handled noisy data effectively, Gradient Boosting captured complex relationships but risked overfitting, and XGBoost delivered the best accuracy and robustness to missing values. Key drivers of customer behavior included total purchase amount, transaction frequency, demographics, promotions, and seasonal factors, while rare product categories contributed minimally. Visualizations highlighted sales peaks during holidays and promotions, strong revenue from loyal customers, and high demand concentrated in specific products. Overall, XGBoost emerged as the most reliable predictor, providing actionable insights for targeted marketing, inventory optimization, and loyalty program strategies.

#### Observations :
*******************************************************************************************************************************
Analysis showed that loyal customers consistently generated higher revenue, while purchase frequency varied across categories, suggesting opportunities for targeted marketing. Sales trends aligned with promotions and holidays, and certain products displayed clear seasonal patterns. Among models tested, XGBoost delivered the best accuracy, outperforming Random Forest and Gradient Boosting. Key predictors included purchase amount, transaction frequency, demographics, promotions, and seasonal factors. However, missing data and imbalanced purchases reduced accuracy for rare items, emphasizing the need for thorough preprocessing. Overall, the insights support personalized promotions, improved inventory management, and better planning of marketing campaigns.

### Conclusion and Future Direction :
*******************************************************************************************************************************
#### Learnings :
The project involved building a complete data pipeline, starting with ETL in SSIS to clean, transform, and automate data loading. Real-world challenges like missing values and inconsistent entries required extensive preprocessing. Feature engineering, such as creating purchase frequency and seasonal indicators, enhanced model accuracy. Random Forest, Gradient Boosting, and XGBoost were tested, with performance improving through careful tuning and data balancing. Visualizations highlighted sales trends and customer patterns, making insights actionable. Overall, the workflow—from ingestion to reporting—provided practical experience in managing end-to-end analytics.

*******************************************************************************************************************************
#### Results Discussion :
The comparison of Random Forest, Gradient Boosting, and XGBoost showed that Random Forest was stable with little tuning, Gradient Boosting captured complex patterns but risked overfitting, and XGBoost delivered the best accuracy with proper tuning and regularization. Key predictors included purchase amount, transaction frequency, demographics, promotions, and seasonal trends. Visualizations highlighted sales peaks during promotions and holidays, as well as high-value customer segments. Despite challenges with missing data, imbalanced classes, and irregular timestamps, the models effectively captured purchasing patterns, with XGBoost offering the strongest results for inventory, promotions, and segmentation strategies.

*******************************************************************************************************************************
#### Limitations :
The dataset presented several challenges, including missing values, inconsistent product naming, and irregular timestamps, all of which required substantial preprocessing. Even after cleaning, minor inaccuracies may still influence predictions. Since the models relied only on transactional and demographic data, external factors such as competitor actions, seasonal events, or broader economic conditions were not considered, limiting accuracy. Data imbalance was another issue, with certain products and customer groups underrepresented, causing weaker performance on less common categories. Additionally, advanced models like Gradient Boosting and XGBoost carry a risk of overfitting without careful hyperparameter tuning, particularly on smaller datasets. Scalability remains a concern, as testing was performed on a restricted dataset, meaning larger-scale or real-time applications would need further optimization. Lastly, the models are geared toward short-term purchase forecasts and may not fully capture evolving customer behavior or long-term market shifts.
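
*******************************************************************************************************************************
#### Future Extension :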
Incorporating external factors such as holidays, competitor pricing, and economic trends can make predictions more accurate, while real-time deployment with automated cloud pipelines ensures continuous updates and scalability. Advanced models combined with interactive dashboards further enable personalized promotions and actionable insights, helping businesses make faster and more effective decisions.

# References:

- Ritu Yadav (2020). *Light-Weighted CNN for Text Classification.* arXiv:2004.07922.
- AG_NEWS dataset (news topic classification), available via HuggingFace `datasets`.