Jiaming An andraiming

Jiaming An

"The interesting thing about a model is not what it gets right — it's what it confidently gets wrong."

MSc student at Harbin Institute of Technology. I spend most of my time trying to figure out why multimodal models hallucinate, why speech tokenizers throw away the wrong bits, and why my vision-language data pipeline is always somehow the bottleneck.

Mostly PyTorch. Occasionally CUDA when I have to. Lots of YAML.

What I'm poking at right now

📐 Probing compositional reasoning in MLLMs — turns out "to the left of" is harder than it looks
🎙️ Comparing discrete speech tokenizers on downstream tasks (and discovering bitrate isn't everything)
🧹 Web-scale image-text data curation — 80% of the work, 20% of the credit
📝 Slowly writing a thesis. Slowly.

Open-source things I maintain

mm-reason-bench: A lightweight benchmark suite for multimodal reasoning. VQA, charts, spatial, compositional.
speech-tokenizer-arena: Drop in a tokenizer, get a leaderboard. EnCodec, HuBERT-units, DAC, SpeechTokenizer side-by-side.
vl-data-engine: Scalable preprocessing & filtering pipeline for VL pretraining data. CLIP filtering, perceptual dedup, language filters, webdataset shards.

🧰 Things I reach for

Also: torchaudio, open_clip, webdataset, vLLM, DeepSpeed, slurm, way too many wandb tabs.

Stats

📍 Harbin · ☕ probably awake · 📬 reach me via issues on any of the repos above
