This repository is a PyTorch reading-and-reimplementation project based on the paper:
- Paper: InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
- arXiv: https://arxiv.org/abs/2411.09852
The repository follows the same engineering layout as OneTrans_Pytorch, but replaces the backbone with an InterFormer-style architecture that combines:
- non-sequence interaction learning
- sequence modeling with personalized FFN
- cross-modal summarization between the two branches
There are two layers in this codebase:
- The backbone port: the PyTorch implementation of the InterFormer building blocks.
- The training wrapper: adds dataset loading, feature tensorization, metrics, mixed precision, checkpointing, and CLI training support.
This repo is therefore a runnable local training scaffold around an InterFormer-style backbone, not just a shape demo.
The model design follows the paper-level architecture:
- non-sequence feature preprocessing into per-field tokens
- sequence preprocessing through a MaskNet-like block
- Interaction Arch for behavior-aware non-sequence learning
- Sequence Arch for context-aware sequence modeling
- Cross Arch for low-dimensional information exchange
The default Interaction Arch follows the paper's reported benchmark setting: DHEN with DOT and DCN as the ensembled interaction modules (see the sketch below).
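For intuition, here is a minimal, self-contained sketch of such a DHEN-style layer that ensembles a dot-product (DOT) module and a cross-network (DCN) module. All class names and shapes here are illustrative assumptions, not this repo's actual API:

```python
import torch
import torch.nn as nn

class DotInteraction(nn.Module):
    """DOT: pairwise dot-product interaction over field tokens (sketch)."""
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.num_fields, self.dim = num_fields, dim
        # Project the flattened field-by-field similarity matrix back to token shape.
        self.proj = nn.Linear(num_fields * num_fields, num_fields * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sims = torch.bmm(x, x.transpose(1, 2))      # [B, F, F] similarities
        out = self.proj(sims.flatten(1))            # [B, F * dim]
        return out.view(-1, self.num_fields, self.dim)

class CrossInteraction(nn.Module):
    """DCN: one DCN-v2-style cross layer over the flattened tokens (sketch)."""
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.w = nn.Linear(num_fields * dim, num_fields * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.flatten(1)                         # [B, F * dim]
        crossed = flat * self.w(flat) + flat        # x0 * (W x + b) + x
        return crossed.view_as(x)

class DHENLayer(nn.Module):
    """DHEN-style ensemble: run both modules, sum with a residual, normalize."""
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [DotInteraction(num_fields, dim), CrossInteraction(num_fields, dim)]
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, num_fields, dim] field tokens.
        return self.norm(sum(b(x) for b in self.branches) + x)
```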
Because the runnable demo script reuses the local TAAC tensorization pipeline from OneTrans_Pytorch, the data interface is adapted to the available numeric tensors:
- non-sequence features are treated as numeric fields and tokenized field-wise
- sequence features are treated as per-step numeric channels and unified by SequenceMaskNet
This keeps the project runnable while preserving the main architecture of the paper; a rough sketch of the two preprocessing steps follows.
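The sketch below assumes simplified constructor signatures; the real classes of the same names (listed in the backbone file below) may differ:

```python
import torch
import torch.nn as nn

class NumericFieldTokenizer(nn.Module):
    """Sketch: lift each scalar field to a dim-sized token with a per-field affine map."""
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_fields, dim) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_fields, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, num_fields] -> tokens: [batch, num_fields, dim]
        return x.unsqueeze(-1) * self.weight + self.bias

class SequenceMaskNet(nn.Module):
    """Sketch: project per-step numeric channels to dim with a MaskNet-style gate."""
    def __init__(self, in_channels: int, dim: int):
        super().__init__()
        self.proj = nn.Linear(in_channels, dim)
        self.gate = nn.Sequential(nn.Linear(in_channels, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: [batch, seq_len, in_channels] -> [batch, seq_len, dim]
        return self.norm(self.proj(seq) * self.gate(seq))
```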
This file is the backbone reference implementation. It contains:
- NumericFieldTokenizer
- SequenceMaskNet
- LinearCompressedEmbedding
- SelfGating
- DotProductInteraction
- DCNInteraction
- DHENInteraction
- PersonalizedFFN
- RotaryMultiHeadAttention
- InterFormerBlock
- InterFormerEncoder
This is the closest file to the core paper architecture.
Task-level model wrappers live here.
Current file: models/taac_interformer.py
This wraps the backbone into a trainable classifier (a condensed sketch follows this list):
- tokenizes non-sequence features into field tokens
- applies MaskNet-style sequence preprocessing
- stacks InterFormer blocks
- summarizes the final non-sequence and sequence states
- applies a classification head
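The sketch below assumes the backbone returns updated non-sequence and sequence states, and uses mean pooling as a stand-in for the summary step; names other than the backbone classes are hypothetical:

```python
import torch
import torch.nn as nn

class InterFormerClassifier(nn.Module):
    """Hypothetical wrapper sketch: tokenize, preprocess, encode, summarize, classify."""
    def __init__(self, tokenizer, seq_masknet, backbone, d_model: int, num_classes: int = 2):
        super().__init__()
        self.tokenizer = tokenizer        # NumericFieldTokenizer-like module
        self.seq_masknet = seq_masknet    # SequenceMaskNet-like module
        self.backbone = backbone          # stacked InterFormer blocks
        self.head = nn.Linear(2 * d_model, num_classes)

    def forward(self, non_seq: torch.Tensor, seq: torch.Tensor) -> torch.Tensor:
        field_tokens = self.tokenizer(non_seq)   # [B, F, D]
        seq_tokens = self.seq_masknet(seq)       # [B, T, D]
        ns_state, seq_state = self.backbone(field_tokens, seq_tokens)  # assumed return
        # Summarize both branches (mean pooling as a stand-in) and classify.
        summary = torch.cat([ns_state.mean(dim=1), seq_state.mean(dim=1)], dim=-1)
        return self.head(summary)                # [B, num_classes]
```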
Reusable non-model logic lives here.
- utils/common.py: general helpers such as seed setup and split generation.
- utils/metrics.py: accuracy and AUC computation.
- utils/taac_data.py: dataset loading, schema handling, feature conversion, and tensor construction.
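For reference, a typical seed-setup helper along the lines of what utils/common.py provides (the exact function name and signature here are assumptions):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs for reproducible runs (sketch)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
```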
Runnable entrypoints live here.
Current file: scripts/run_taac2026_sample.py
This script is the main training entrypoint. It handles:
- dataset download or local parquet reading
- feature tensorization
- model construction
- AMP setup (see the sketch after this list)
- training and validation loops
- checkpoint save and resume
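The AMP setup and training step typically follow this standard torch.amp pattern; the tiny model here is a placeholder, not the repo's actual code:

```python
import torch
import torch.nn as nn

model = nn.Linear(80, 1)                      # placeholder for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One mixed-precision step: forward in autocast, scaled backward, step."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast("cuda", enabled=use_cuda):
        loss = criterion(model(features).squeeze(-1), labels.float())
    scaler.scale(loss).backward()   # scale loss so fp16 grads don't underflow
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()
    return loss.item()
```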
At a high level, the model flow is:
- Map non-sequence features into field tokens.
- Map sequence channels into dense sequence tokens through SequenceMaskNet.
- Compute an initial non-sequence summary and prepend it as CLS tokens.
- Repeat L InterFormer blocks (see the sketch after this list):
  - summarize non-sequence tokens
  - summarize sequence tokens through CLS, PMA, and recent tokens
  - update non-sequence tokens through the Interaction Arch
  - update sequence tokens through PFFN and rotary attention
- Recompute final non-sequence and sequence summaries.
- Flatten and concatenate both summaries for downstream prediction.
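A minimal sketch of one such block iteration, assuming sub-module names like summarize_non_seq, pma, interaction_arch, and sequence_arch (all hypothetical; only the overall flow follows the list above):

```python
import torch

def interformer_block_step(ns_tokens, seq_tokens, block, cls_len=1, recent=8):
    """One block iteration (sketch); `block` stands in for an InterFormerBlock."""
    # Cross Arch: low-dimensional summaries exchanged between the two branches.
    ns_summary = block.summarize_non_seq(ns_tokens)
    cls = seq_tokens[:, :cls_len]                       # prepended CLS tokens
    pma = block.pma(seq_tokens)                         # pooled-by-attention summary
    seq_summary = torch.cat([cls, pma, seq_tokens[:, -recent:]], dim=1)
    # Update each branch conditioned on the other branch's summary.
    ns_tokens = block.interaction_arch(ns_tokens, seq_summary)   # Interaction Arch
    seq_tokens = block.sequence_arch(seq_tokens, ns_summary)     # PFFN + rotary attention
    return ns_tokens, seq_tokens
```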
Expected input shapes:
- non-sequential input: [batch_size, non_seq_dim]
- sequential input: [batch_size, seq_len, seq_feature_dim]
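For example, with illustrative sizes:

```python
import torch

batch_size, non_seq_dim = 32, 64          # illustrative, not the repo's defaults
seq_len, seq_feature_dim = 50, 16

non_seq = torch.randn(batch_size, non_seq_dim)           # [batch_size, non_seq_dim]
seq = torch.randn(batch_size, seq_len, seq_feature_dim)  # [batch_size, seq_len, seq_feature_dim]
```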
Typical run commands:

python main_pytorch.py
python scripts/run_taac2026_sample.py --epochs 5 --batch-size 32
python scripts/run_taac2026_sample.py --epochs 5 --batch-size 32 --interaction-backbone dhen
python scripts/run_taac2026_sample.py --epochs 10 --batch-size 32 --resume best_model_20260416_120000.pt --save-checkpoint

This repository is not an official release from the paper authors. It is:
- a local PyTorch reimplementation of the published InterFormer architecture
- adapted to the engineering conventions used in OneTrans_Pytorch
- wrapped in a runnable training scaffold over the same local TAAC demo pipeline