This repository provides the reference implementation for ARHQ, a low-rank-assisted quantization method designed to reduce the propagation of activation quantization error through linear layers.
ARHQ decomposes each linear weight into a quantized residual branch and a full-precision low-rank branch:
W = W_res + L, L = B A^T
Y_hat = Q_x(X) Q_w(W_res)^T + X L^T
Unlike a standard low-rank reconstruction objective, ARHQ chooses L according to the activation quantization residual:
E_x = X - Q_x(X)
min_{rank(L) <= r} || E_x (W - L)^T ||_F^2
With G_x = E_x^T E_x / N, this becomes a weighted low-rank decomposition under the activation residual Hessian metric. The resulting LoRA branch is kept in floating point, while the residual branch is evaluated with simulated nvfp4 quantization.
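The weighted problem above has a closed-form solution: a Cholesky factor R of G_x (G_x = R^T R) reduces it to a plain truncated SVD of R W^T. A minimal numpy sketch, assuming dense tensors and a small damping term for numerical stability (the function name `arhq_decompose` and the `damp` parameter are illustrative, not the repository's actual API):

```python
import numpy as np

def arhq_decompose(W, E_x, rank, damp=1e-6):
    """Rank-r minimizer of || E_x (W - L)^T ||_F^2 (illustrative sketch).

    W   : (d_out, d_in) full-precision weight
    E_x : (N, d_in) activation quantization residual X - Q_x(X)
    """
    N, d_in = E_x.shape
    # Damped activation residual Hessian G_x = E_x^T E_x / N
    G = E_x.T @ E_x / N + damp * np.eye(d_in)
    # With G = R^T R, || E_x (W - L)^T ||_F^2 is proportional to
    # || R (W - L)^T ||_F^2, so the optimum is the truncated SVD of R W^T.
    R = np.linalg.cholesky(G).T
    U, s, Vt = np.linalg.svd(R @ W.T, full_matrices=False)
    RWt_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    L = np.linalg.solve(R, RWt_r).T  # L^T = R^{-1} (R W^T)_r
    return L, W - L                  # LoRA branch, residual weight
```

The returned pair corresponds to the two branches above: L (kept in floating point, factorable as B A^T) and W_res = W - L (quantized).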
The codebase is organized as a minimal research implementation:
arhq/
calibration.py # Build calibration tensors from calibration data
decompose.py # Extract ARHQ LoRA factors and residual weights
eval_quantized.py # Simulated nvfp4 + LoRA inference
lowrank.py # ARHQ objective, decomposition, and SNR utilities
quant.py # nvfp4 quantization simulation
transforms.py # Optional smoothing transforms
data.py # Lightweight evaluation data loader
eval_utils.py # Generation result helpers
scripts/
01_build_calibration.sh
02_extract_lora.sh
03_eval_nvfp4_lora.sh
04_eval_single.sh
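The nvfp4 simulation in quant.py can be pictured as per-block fake quantization onto the 4-bit e2m1 value grid. The sketch below shows the general idea only; the actual block size, scale encoding (the real format uses fp8 block scales), and function names in quant.py may differ, and `fake_quant_nvfp4` is a hypothetical name:

```python
import numpy as np

# Non-negative magnitudes representable in fp4 e2m1
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_nvfp4(x, block=16):
    """Round-trip x through a simulated nvfp4-style format (sketch).

    Assumes x.size is divisible by `block`; each block is scaled so its
    max magnitude maps to 6.0, the largest e2m1 value, then rounded to
    the nearest grid point and rescaled back.
    """
    flat = x.reshape(-1, block)
    amax = np.abs(flat).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = flat / scale
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]
    return (q * scale).reshape(x.shape)
```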
The implementation also includes a reproduced SVDQuant-style baseline for comparison experiments. ARHQ remains the main method in this repository.
Place the calibration data source under:
data/calib_data/
Then collect layer-wise activation and weight tensors:
bash scripts/01_build_calibration.sh cuda:0 0-35 all 0-127 30000

The extracted calibration tensors are saved to:
data/calib_tensor/
Run ARHQ decomposition with rank 128:
bash scripts/02_extract_lora.sh cuda:0 0-35 128 all

The decomposition artifacts are saved under:
results/layer_results/
Each layer projection stores the LoRA factors, the residual weight, and optional smoothing metadata. The same entry point can also run the reproduced SVDQuant baseline for side-by-side SNR comparison.
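The SNR comparison mentioned above typically uses the output signal-to-noise ratio in dB between the reference and quantized layer outputs. A sketch of that metric, assuming this standard definition (the actual utility in lowrank.py may be named and guarded differently):

```python
import numpy as np

def output_snr_db(y_ref, y_hat):
    """SNR in dB between reference and approximated layer outputs (sketch).

    Returns inf when the approximation is exact.
    """
    err = np.linalg.norm(y_ref - y_hat)
    if err == 0.0:
        return np.inf
    return 10.0 * np.log10(np.linalg.norm(y_ref) ** 2 / err ** 2)
```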
Run evaluation with the extracted ARHQ factors. In our current experiments, we evaluate ARHQ on ZebraLogic:
bash scripts/03_eval_nvfp4_lora.sh arhq smoothing 128 ZebraLogic cuda:0 all

For a single prompt:
QUESTION="What is 2+2? Put the final answer in \\boxed{}." \
bash scripts/04_eval_single.sh arhq smoothing 128 cuda:0 all

The shell scripts are thin wrappers around the Python modules:
python -m arhq.calibration \
--model_path /path/to/model \
--result_dir data/calib_data \
--output_dir data/calib_tensor \
--layers 0-35 \
--module_set all

python -m arhq.decompose \
--calib_dir data/calib_tensor \
--output_dir results/layer_results \
--layers 0-35 \
--module_set all \
--rank 128 \
--configs arhq:raw,arhq:smoothing,svdquant:smoothing

python -m arhq.eval_quantized \
--model_path /path/to/model \
--decomp_dir results/layer_results \
--method arhq \
--setting smoothing \
--rank 128 \
--module_set all \
--datasets ZebraLogic

method=arhq is the proposed method. method=r_only is kept only as a backward-compatible alias for older experiment artifacts.
method=svdquant is a reproduced comparison baseline.
The current inference path is a simulation of nvfp4 quantization rather than a fused hardware kernel implementation.
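Concretely, the simulated path for one linear layer combines the fake-quantized residual branch with the floating-point LoRA branch, matching Y_hat = Q_x(X) Q_w(W_res)^T + X L^T with L = B A^T. A sketch, where `fake_quant` stands in for the nvfp4 simulation (the function name `arhq_forward` is illustrative):

```python
import numpy as np

def arhq_forward(x, W_res, B, A, fake_quant):
    """Y_hat = Q_x(X) Q_w(W_res)^T + X L^T with L = B A^T (sketch).

    x: (N, d_in), W_res: (d_out, d_in), B: (d_out, r), A: (d_in, r).
    """
    y_quant = fake_quant(x) @ fake_quant(W_res).T  # quantized residual branch
    y_lora = (x @ A) @ B.T                         # full-precision LoRA branch
    return y_quant + y_lora
```

With an identity `fake_quant`, the two branches recombine exactly to x W^T, which is a useful sanity check on the decomposition.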