Official implementation of Model Evolution Under Zeroth-Order Optimization: A Neural Tangent Kernel Perspective.
Paper (OpenReview): https://openreview.net/forum?id=PCJGU7DEEX
Venue: Workshop on Scientific Methods for Understanding Deep Learning (ICLR 2026)
Code repository: https://github.com/BellChingH/NZK
This code implements Zeroth-Order Gradient Descent (ZOGD) and First-Order Gradient Descent (FOGD) in function space using the Neural Tangent Kernel (NTK) and Neural Zeroth-order Kernel (NZK). Each experiment script writes a single MATLAB-compatible file `data.mat` under an output subfolder; you can load it in Python (`scipy.io.loadmat`), MATLAB, or Julia to analyze or plot results.
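For instance, a saved run can be inspected from Python. A minimal sketch (the `savemat` call below creates a hypothetical stand-in file; a real `data.mat` is written by the experiment scripts and contains the variables documented in the tables further down):

```python
import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical stand-in for a real run's output file.
savemat("data.mat", {"loss_list_fo": np.linspace(1.0, 0.1, 5)})

results = loadmat("data.mat")              # dict: variable name -> ndarray
loss_fo = results["loss_list_fo"].ravel()  # loadmat yields 2-D arrays; flatten
print(loss_fo.shape, float(loss_fo[-1]))
```

Note that `loadmat` always returns at-least-2-D arrays, so per-iteration curves usually need a `.ravel()` before plotting.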
```
NZK/
├── zo_ntk/                              # Core library
│   ├── kernels.py                       # NTK and NZK construction
│   ├── training.py                      # FOGD and ZOGD loops in function space
│   ├── models.py                        # Linear target, FFN
│   ├── data.py                          # Synthetic data, MNIST/CIFAR/ImageNet loaders, FFN linearization
│   └── utils.py                         # Saving results to data.mat
├── experiments/                         # Reproducible experiment scripts (recommended)
│   ├── run_linear_ntk.py                # Linear synthetic (degree 2 or 50)
│   ├── run_linear_ntk_distribution.py   # Linear, NZK with different z distributions
│   ├── run_linear_ntk_variance.py       # Linear, NZK with different Gaussian variances
│   ├── run_ffn_ntk.py                   # FFN on synthetic sphere data
│   ├── run_ffn_mnist.py                 # FFN on MNIST (binary classes)
│   ├── run_ffn_cifar.py                 # FFN on CIFAR-10 (binary classes)
│   └── run_ffn_imagenet.py              # FFN on Tiny ImageNet (binary classes)
├── notebooks/                           # Original Jupyter experiments
├── requirements.txt
├── requirements-notebooks.txt           # Optional: Jupyter + matplotlib for notebooks/
└── README.md
```
Clone the official repository and enter the project root (the folder that contains zo_ntk/ and experiments/):
```
git clone https://github.com/BellChingH/NZK.git
cd NZK
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Image experiments need `datasets`, `opencv-python-headless`, and `Pillow`; these are listed in requirements.txt. First-time runs will download the corresponding Hugging Face datasets.
The original .ipynb experiments live under notebooks/, grouped by topic (same names as the legacy folders: linear_zo_fo_ntk, FFN_zo_fo_mnist, etc.). Install extra dependencies and see usage notes:
```
pip install -r requirements-notebooks.txt
```

- Working directory: Always run commands from inside the repository root (NZK/) so `python -m experiments.<script>` resolves the package correctly.
- List options for any script:

  ```
  python -m experiments.run_linear_ntk --help
  python -m experiments.run_ffn_mnist --help
  ```

- Minimal examples:

  ```
  cd NZK

  # Linear synthetic: NTK + NZK (one Gaussian NZK), degree 2
  python -m experiments.run_linear_ntk --degree 2 --out_dir output_linear_ntk

  # FFN on MNIST (e.g. digits 3 vs 5, 8×8 inputs)
  python -m experiments.run_ffn_mnist --digit_neg 3 --digit_pos 5 --out_dir output_ffn_mnist
  ```

- Output location: Each script creates a subfolder under `--out_dir` (see table below) and writes `data.mat` there. The terminal prints `Saved to <path>` when finished.
| Goal | Command (from repo root NZK/) |
|---|---|
| Linear synthetic, one NZK (normal) | `python -m experiments.run_linear_ntk --degree 2 --out_dir output_linear_ntk` |
| Linear synthetic, degree 50 | `python -m experiments.run_linear_ntk --degree 50 --out_dir output_linear_ntk` |
| Linear, multiple z distributions | `python -m experiments.run_linear_ntk_distribution --out_dir output_linear_ntk_distribution` |
| Linear, multiple Gaussian variances | `python -m experiments.run_linear_ntk_variance --variances 0.5 1.0 1.5 --out_dir output_linear_ntk_variance` |
| FFN synthetic | `python -m experiments.run_ffn_ntk --degree 2 --out_dir output_ffn_ntk` |
| FFN MNIST binary | `python -m experiments.run_ffn_mnist --digit_neg 3 --digit_pos 5 --out_dir output_ffn_mnist` |
| FFN CIFAR-10 binary | `python -m experiments.run_ffn_cifar --digit_neg 2 --digit_pos 9 --out_dir output_ffn_cifar` |
| FFN Tiny ImageNet binary | `python -m experiments.run_ffn_imagenet --class_neg 0 --class_pos 1 --out_dir output_ffn_imagenet` |
Default subfolder names under `--out_dir`: `degree_<d>` (linear / FFN synthetic), `distribution_mixture_degree_<d>` (distribution), `variance_mixture_degree_<d>` (variance), `mnist_<res>_<neg>_<pos>`, `cifar_<res>_<neg>_<pos>`, `imagenet_<res>_<neg>_<pos>`.
All arrays are saved as NumPy types inside the `.mat` file (load with `scipy.io.loadmat`). Variable names use the historical spelling `kernal` (not `kernel`) for the NTK/NZK matrices.
| Variable | Meaning |
|---|---|
| `loss_list_fo` | Per-iteration MSE loss `(1/2n)·||f - y||^2` for first-order (NTK) dynamics; length = `iteration_total`. |
| `loss_list_zo` | Same for zeroth-order (NZK) dynamics (when a single ZO run is saved). |
| `f_fo_list` | Function values f along the FO trajectory, concatenated at the iterations listed in `function_record_iteration_list` (each snapshot has shape `(sample_num, 1)`; the overall first dimension is `len(function_record_iteration_list) * sample_num`). |
| `f_zo_list` | Same for the ZO trajectory (single-run scripts). |
| `f_diff_list_fo` | Per-iteration update `f_{t+1} - f_t` for FO; concatenated along the first axis (length `iteration_total * sample_num` in the flattened layout stored). |
| `f_diff_list_zo` | Same for ZO (single-run scripts). |
| `function_record_iteration_list` | 0-based iteration indices at which the `f_*_list` snapshots were stored. |
| `NTK_kernal_fo` | First-order kernel matrix `K_NTK = X X^T` in feature space (samples × samples). |
| `NZK_kernal_zo` | Monte Carlo estimate of the NZK for the chosen z law (samples × samples). |
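Because the `f_*_list` arrays stack their snapshots along the first axis, individual snapshots can be recovered with a single reshape. A sketch of that layout (the synthetic arrays below are stand-ins for ones loaded from a real `data.mat`):

```python
import numpy as np

# Hypothetical stand-ins for arrays loaded from data.mat.
sample_num = 4
function_record_iteration_list = np.array([0, 100, 200])  # 0-based snapshot iterations
f_fo_list = np.arange(len(function_record_iteration_list) * sample_num,
                      dtype=float).reshape(-1, 1)         # stacked (snapshots * samples, 1)

# Recover one (sample_num, 1) snapshot per recorded iteration.
snapshots = f_fo_list.reshape(len(function_record_iteration_list), sample_num, 1)
print(snapshots.shape)  # (3, 4, 1)
```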
Also includes: `theta_gt`, `data_sample` (unit-norm rows), `target_gt`, `noise_value`.
- One FO run: the same `theta_gt`, `data_sample`, `target_gt`, `noise_value`, `NTK_kernal_fo`, `loss_list_fo`, `f_fo_list`, etc.
- Several ZO runs, stored with suffixes on the variable names, for example: `loss_list_zo_normal_1.0`, `f_zo_list_t_10.0`, `NZK_kernal_zo_laplace_0.5`, `f_diff_list_zo_t_1000.0` (the exact keys depend on `dist` and `scale`, or on variance strings like `0.5`).
- `run_linear_ntk_variance` also stores `variances` (the list used).
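One way to collect the suffixed ZO curves is to match keys by prefix. A sketch over a plain dict standing in for the `loadmat` result (the key names follow the pattern above; the values are illustrative):

```python
import numpy as np

# Stand-in for scipy.io.loadmat(...) output of a multi-run linear script.
results = {
    "loss_list_fo": np.ones(3),
    "loss_list_zo_normal_1.0": np.ones(3) * 0.5,
    "loss_list_zo_laplace_0.5": np.ones(3) * 0.7,
    "NZK_kernal_zo_normal_1.0": np.eye(2),
}

# Group every ZO loss curve by its distribution/scale suffix.
prefix = "loss_list_zo_"
zo_losses = {k[len(prefix):]: v for k, v in results.items() if k.startswith(prefix)}
print(sorted(zo_losses))  # ['laplace_0.5', 'normal_1.0']
```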
Includes `data_sample`, `target_gt`, and the common FO/ZO fields above (no `theta_gt` in the same form as the linear case; targets come from the synthetic linear teacher before the FFN features).
Same FO/ZO fields as above, plus:
| Variable | Meaning |
|---|---|
| `standard_loss_list_zo` | Per-iteration loss for standard zeroth-order optimization in parameter space (finite differences along random directions), comparable to the "ZO-parametric" curves in the paper/notebooks. |
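For orientation, "standard zeroth-order optimization in parameter space" refers to the classic two-point finite-difference gradient estimator. A minimal sketch on a linear least-squares toy problem (the names, step sizes, and problem here are illustrative, not the scripts' actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
y = X @ rng.standard_normal(5)

def loss(theta):
    r = X @ theta - y
    return 0.5 * (r @ r) / len(y)

theta = np.zeros(5)
mu, eta = 1e-4, 1e-2                 # smoothing radius and step size
standard_loss_list_zo = []
for _ in range(2000):
    z = rng.standard_normal(5)       # random perturbation direction
    # Two-point finite-difference estimate of the directional gradient.
    g = (loss(theta + mu * z) - loss(theta)) / mu * z
    theta -= eta * g
    standard_loss_list_zo.append(loss(theta))
print(standard_loss_list_zo[0], standard_loss_list_zo[-1])
```

In expectation the update direction `z zᵀ∇L` equals the true gradient, which is what makes the parameter-space ZO curves comparable to the FO ones.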
Plots: This repository does not generate PDF figures automatically; use the arrays in data.mat to reproduce loss curves, kernel heatmaps, and function snapshots as in the original notebooks.
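A loss-curve plot can be produced from those arrays with matplotlib, for example (a sketch; the synthetic curves below are stand-ins for `loss_list_fo` / `loss_list_zo` loaded via `scipy.io.loadmat`):

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")                # headless backend; writes files only
import matplotlib.pyplot as plt

# Stand-ins for loss curves loaded from data.mat.
iters = np.arange(1000)
loss_fo = np.exp(-iters / 200.0)
loss_zo = np.exp(-iters / 400.0)

plt.figure()
plt.semilogy(iters, loss_fo, label="FOGD (NTK)")
plt.semilogy(iters, loss_zo, label="ZOGD (NZK)")
plt.xlabel("iteration")
plt.ylabel("MSE loss")
plt.legend()
plt.savefig("loss_curves.png", dpi=150)
print(os.path.exists("loss_curves.png"))
```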
| Flag | Role | Default (typical) |
|---|---|---|
| `--eta_fo`, `--eta_zo` | Step sizes for FOGD and ZOGD | `1e-3` |
| `--iteration_total` | Number of iterations | `16000` |
| `--nzk_samples` | Monte Carlo samples to estimate the NZK | `10000` |
| `--seed` | Random seed | script-dependent |
| `--device` | `cpu` or `cuda` | auto (`cuda` if available) |
Linear scripts also expose `--degree` and `--sample_num`; image scripts use `--resolution`, class indices, and `--sample_per_class`. See `--help` on each script.
| Original folder | Refactored script / API |
|---|---|
| linear_zo_fo_ntk | run_linear_ntk.py, build_ntk, build_nzk_linear, train_fo_loop_with_f0, train_zo_loop_with_f0 |
| linear_zo_fo_ntk_distribution | run_linear_ntk_distribution.py |
| linear_zo_fo_ntk_variance | run_linear_ntk_variance.py |
| FFN_zo_fo_ntk | run_ffn_ntk.py, FFN, linearize_ffn, build_nzk |
| FFN_zo_fo_mnist | run_ffn_mnist.py, train_zo_standard_loop |
| FFN_zo_fo_cifar | run_ffn_cifar.py |
| FFN_zo_fo_imagenet | run_ffn_imagenet.py |
Kernels: the NTK is `K_NTK = X X^T`. The NZK is an expectation `E_z[·]` over perturbation directions with the same functional form as in the paper; linear scripts support normal / Student-t / Laplace distributions and scales, as in `build_nzk_linear`.
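For the linear case with Gaussian perturbations this is easy to check numerically: if z ~ N(0, σ²I) then E[z zᵀ] = σ²I, so a kernel of the form E_z[(x_iᵀz)(x_jᵀz)] reduces to σ²·X Xᵀ, i.e. a scaled NTK. A Monte Carlo sketch in plain NumPy (this is an illustrative simplification of the linear NZK, not the repository's `build_nzk_linear`; see the paper for the general form):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, M = 6, 3, 200_000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows, as in data_sample

ntk = X @ X.T                                  # K_NTK = X X^T

sigma = 1.0
Z = sigma * rng.standard_normal((M, d))        # Monte Carlo draws of z
P = X @ Z.T                                    # (n, M) projections x_i^T z
# K_NZK[i, j] ≈ E_z[(x_i^T z)(x_j^T z)] = x_i^T E[z z^T] x_j = sigma^2 * K_NTK[i, j]
nzk = (P @ P.T) / M

print(np.abs(nzk - sigma**2 * ntk).max())      # Monte Carlo error, shrinks as 1/sqrt(M)
```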
If you use this code, please cite:
```bibtex
@inproceedings{zhang2026model,
  title     = {Model Evolution Under Zeroth-Order Optimization: A Neural Tangent Kernel Perspective},
  author    = {Chen Zhang and Yuxin Cheng and Chenchen Ding and Shuqi Wang and Jingreng Lei and Runsheng Yu and Yik-Chung Wu and Ngai Wong},
  booktitle = {Workshop on Scientific Methods for Understanding Deep Learning},
  year      = {2026},
  url       = {https://openreview.net/forum?id=PCJGU7DEEX},
}
```
[Specify your license, e.g. MIT, Apache-2.0.]