
andraiming/andraiming


Jiaming An

"The interesting thing about a model is not what it gets right; it's what it confidently gets wrong."

MSc student at Harbin Institute of Technology. I spend most of my time trying to figure out why multimodal models hallucinate, why speech tokenizers throw away the wrong bits, and why my vision-language data pipeline is always somehow the bottleneck.

Mostly PyTorch. Occasionally CUDA when I have to. Lots of YAML.


What I'm poking at right now

  • πŸ“ Probing compositional reasoning in MLLMs β€” turns out "to the left of" is harder than it looks
  • πŸŽ™οΈ Comparing discrete speech tokenizers on downstream tasks (and discovering bitrate isn't everything)
  • 🧹 Web-scale image-text data curation β€” 80% of the work, 20% of the credit
  • πŸ“ Slowly writing a thesis. Slowly.

Open-source things I maintain

mm-reason-bench
A lightweight benchmark suite for multimodal reasoning. VQA, charts, spatial, compositional.

speech-tokenizer-arena
Drop in a tokenizer, get a leaderboard. EnCodec, HuBERT-units, DAC, SpeechTokenizer side-by-side.
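
For context on what "bitrate" means when comparing discrete tokenizers: the nominal figure is just frame rate times number of codebooks times bits per code. The sketch below is illustrative only. It is not code from speech-tokenizer-arena, the function name is hypothetical, and the example numbers are an assumed residual-VQ style configuration rather than a measured result.

```python
import math


def tokenizer_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Nominal bitrate (bits per second) of a discrete tokenizer.

    Hypothetical helper for illustration: each frame emits one code per
    codebook, and each code carries log2(codebook_size) bits.
    """
    bits_per_code = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_code


# Assumed example config: 75 frames/s, 8 codebooks, 1024 entries each.
print(tokenizer_bitrate_bps(75, 8, 1024))  # 6000.0 bits/s, i.e. 6 kbps
```

Two tokenizers with the same nominal bitrate can still behave very differently downstream, which is the reason to compare them on tasks rather than on this number alone.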

vl-data-engine
Scalable preprocessing & filtering pipeline for VL pretraining data. CLIP filtering, perceptual dedup, language filters, webdataset shards.
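
To give a rough idea of what "perceptual dedup" involves, here is a minimal pure-Python sketch using an average hash and Hamming distance. It is not the vl-data-engine implementation: the function names, the `max_distance` threshold, and the assumption that images have already been shrunk to tiny grayscale thumbnails upstream are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# A small grayscale thumbnail as rows of 0-255 values (assumed to be
# produced by an upstream resize step that is not shown here).
Gray = Sequence[Sequence[int]]


def average_hash(thumb: Gray) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedup(thumbs: Dict[str, Gray], max_distance: int = 1) -> List[str]:
    """Keep the first image of each near-duplicate group, in input order."""
    kept: List[str] = []
    kept_hashes: List[int] = []
    for key, thumb in thumbs.items():
        h = average_hash(thumb)
        # Keep only if it is far enough from every hash kept so far.
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(key)
            kept_hashes.append(h)
    return kept


a = [[10, 200], [200, 10]]
b = [[12, 198], [201, 9]]   # slightly noisy copy of a
c = [[200, 10], [10, 200]]  # inverted pattern, a different image
print(dedup({"a": a, "b": b, "c": c}))  # ['a', 'c']
```

The pairwise comparison here is quadratic in the number of kept images, so at web scale the hashes would need to be bucketed or indexed first; this sketch only shows the hashing idea.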

🧰 Things I reach for

Python · PyTorch · HuggingFace · CUDA · Ray · Linux · LaTeX · tmux

Also: torchaudio, open_clip, webdataset, vLLM, DeepSpeed, slurm, way too many wandb tabs.


πŸ“ Harbin Β· β˜• probably awake Β· πŸ“¬ reach me via issues on any of the repos above
