Postdoctoral Computer Vision Research Scientist at Lawrence Livermore National Laboratory. Personal GitHub account.
Interested in transformers for video processing, two-way attention, multimodal LLMs, and parameter-efficient fine-tuning.
Multimodal Large Language Models:
- Uploading code soon for adapting any image transformer and transformer language model into a multimodal LLM (MLLM)
- Train a custom adapter to link the latent representations of the two token sequences (adapter sketch after this list)
- Potentially fine-tune with parameter-efficient fine-tuning (PEFT), e.g. LoRA (see the LoRA sketch below)
- Custom training pipeline that bootstraps text-image pairs into `<text, image, text>`, `<text, image>`, and `<image, text>` sequences as an augmentation (sketched below)
- Hosting as a Gradio or Hugging Face Space demo
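
A minimal sketch of the adapter idea in PyTorch. Everything here (`VisionToTextAdapter`, the dimensions, the MLP shape) is an illustrative assumption, not the code being uploaded:

```python
import torch
import torch.nn as nn

class VisionToTextAdapter(nn.Module):
    """Projects image-encoder tokens into the language model's embedding space."""
    def __init__(self, vision_dim: int, text_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, text_dim),
        )

    def forward(self, vision_tokens: torch.Tensor) -> torch.Tensor:
        # vision_tokens: (batch, n_image_tokens, vision_dim)
        return self.proj(vision_tokens)

# Splice the projected image tokens ahead of the text embeddings, then feed
# the combined sequence to the (typically frozen) language model.
batch, n_img, vision_dim, text_dim = 2, 196, 768, 4096
adapter = VisionToTextAdapter(vision_dim, text_dim)
image_tokens = torch.randn(batch, n_img, vision_dim)  # from any image transformer
text_embeds = torch.randn(batch, 32, text_dim)        # from any LM embedding layer
mllm_inputs = torch.cat([adapter(image_tokens), text_embeds], dim=1)
```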
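For the PEFT step, a from-scratch sketch of a LoRA layer; `LoRALinear` is a hypothetical name, and in practice the `peft` library could be used instead:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # B is zero-initialized so the update starts as a no-op, per the LoRA paper
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x ; only A and B receive gradients
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Example: wrap an attention projection of the language model
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
```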
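And a toy version of the bootstrapping augmentation, assuming plain (caption, image) pairs; `bootstrap_pair` and the mid-caption split are illustrative choices, not the actual pipeline:

```python
import random

def bootstrap_pair(caption: str, image, rng=random):
    """Expand one text-image pair into an interleaved multimodal sequence,
    sampled from <text, image, text>, <text, image>, or <image, text>."""
    template = rng.choice(["t-i-t", "t-i", "i-t"])
    if template == "t-i-t":
        # Split the caption in half so text appears on both sides of the image
        words = caption.split()
        mid = max(1, len(words) // 2)
        return [" ".join(words[:mid]), image, " ".join(words[mid:])]
    if template == "t-i":
        return [caption, image]
    return [image, caption]
```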
Chemical Sensing:
- Novel architectures for multitask learning and early classification of time series (sketch after this list)
- Optimized preprocessing
- Learning from Samples Worth Learning
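
A rough sketch of what early classification can look like, not the project's architecture: a GRU emits predictions at every step, and inference halts once the class head clears a confidence threshold. `EarlyMultitaskClassifier` and both task heads are hypothetical:

```python
import torch
import torch.nn as nn

class EarlyMultitaskClassifier(nn.Module):
    """GRU encoder with a shared trunk and two task heads; at inference the
    class head can fire early once its confidence clears a threshold."""
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.class_head = nn.Linear(hidden, n_classes)  # e.g. analyte identity
        self.reg_head = nn.Linear(hidden, 1)            # e.g. concentration

    def forward(self, x: torch.Tensor, threshold: float = 0.9):
        # x: (batch, time, n_features); during training, read out the full sequence
        outputs, _ = self.gru(x)
        for t in range(outputs.size(1)):
            logits = self.class_head(outputs[:, t])
            probs = logits.softmax(dim=-1)
            if bool((probs.max(dim=-1).values > threshold).all()):
                break  # stop early: every series in the batch is confident
        return logits, self.reg_head(outputs[:, t]), t + 1  # steps consumed
```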
Molecular Representations:
- VICReg over molecular images for augmentation-invariant embeddings (loss sketched after this list)
- Graph Transformers over 3D molecular structure for unsupervised property embeddings
- Fine-tuning LLMs for multimodal inputs such as images, video, or domain-specific data types
- Graph contrastive representation learning
- Sequential representations of time series
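
The VICReg objective itself is standard (Bardes et al., 2022); a compact PyTorch version with the paper's default loss weights, independent of any project code here:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg: invariance + variance + covariance terms.
    z1, z2: (batch, dim) embeddings of two augmented views of the same image."""
    n, d = z1.shape
    inv = F.mse_loss(z1, z2)  # invariance: pull views together
    # variance: hinge each dimension's std above 1 to prevent collapse
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()
    # covariance: penalize off-diagonal covariance to decorrelate dimensions
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov1 = (z1c.T @ z1c) / (n - 1)
    cov2 = (z2c.T @ z2c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = off_diag(cov1).pow(2).sum() / d + off_diag(cov2).pow(2).sum() / d
    return sim_w * inv + var_w * var + cov_w * cov
```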
Publications and some public projects are available on my page.