  • Harbin Institute of Technology
  • Harbin, China

andraiming/README.md

Jiaming An

"The interesting thing about a model is not what it gets right — it's what it confidently gets wrong."

MSc student at Harbin Institute of Technology. I spend most of my time trying to figure out why multimodal models hallucinate, why speech tokenizers throw away the wrong bits, and why my vision-language data pipeline is always somehow the bottleneck.

Mostly PyTorch. Occasionally CUDA when I have to. Lots of YAML.


What I'm poking at right now

  • 📐 Probing compositional reasoning in MLLMs — turns out "to the left of" is harder than it looks
  • 🎙️ Comparing discrete speech tokenizers on downstream tasks (and discovering bitrate isn't everything)
  • 🧹 Web-scale image-text data curation — 80% of the work, 20% of the credit
  • 📝 Slowly writing a thesis. Slowly.

Open-source things I maintain

mm-reason-bench
A lightweight benchmark suite for multimodal reasoning. VQA, charts, spatial, compositional.

speech-tokenizer-arena
Drop in a tokenizer, get a leaderboard. EnCodec, HuBERT-units, DAC, SpeechTokenizer side-by-side.

vl-data-engine
Scalable preprocessing & filtering pipeline for VL pretraining data. CLIP filtering, perceptual dedup, language filters, webdataset shards.

🧰 Things I reach for

Python PyTorch HuggingFace CUDA Ray Linux LaTeX tmux

Also: torchaudio, open_clip, webdataset, vLLM, DeepSpeed, slurm, way too many wandb tabs.


📍 Harbin · ☕ probably awake · 📬 reach me via issues on any of the repos above

Popular repositories

  1. speech-tokenizer-arena (Public)

    A side-by-side benchmarking playground for discrete speech tokenizers (EnCodec, HuBERT-units, SpeechTokenizer, etc.).

    Python 54

  2. andraiming (Public)

  3. mm-reason-bench (Public)

    A lightweight benchmark suite for evaluating multimodal LLMs on visual reasoning tasks.

    Python

  4. vl-data-engine (Public)

    Scalable preprocessing & filtering pipeline for vision-language pretraining datasets (CC, LAION-style, web-scraped image-text).

    Python