Langbridge: Interpreting Image as a Combination of Language Embeddings

3 Updated Mar 25, 2025
16 Updated Mar 27, 2025

[CVPR 2025] Official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/

Python 87 7 Updated Mar 27, 2025

A suite of image and video neural tokenizers

Jupyter Notebook 1,589 74 Updated Feb 11, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,841 503 Updated Mar 28, 2025

Fast Diffusion Models with Transformers

Python 811 108 Updated Oct 25, 2024

TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

73 Updated Mar 26, 2025

[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 280 14 Updated Dec 22, 2024

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,410 1,443 Updated Mar 10, 2025

The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Process" (arxiv 2407.20311) and "Physics of Language Models Part 2…

Python 40 2 Updated Jan 12, 2025

WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

Jupyter Notebook 69 2 Updated Mar 19, 2025

A Token-level Text Image Foundation Model for Document Understanding

Python 78 6 Updated Mar 25, 2025

This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning ca…

Python 404 9 Updated Mar 24, 2025

Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"

Jupyter Notebook 179 6 Updated Mar 21, 2025

Official code for the paper "Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models" (ICLR 2025 Oral)

3 Updated Feb 7, 2025

[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'

Python 97 4 Updated Mar 25, 2025

UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inputs, making it easy to integrate both visual understanding and image gener…

Python 35 2 Updated Feb 21, 2025

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

Python 79 6 Updated Oct 10, 2024

MMR1: Advancing the Frontiers of Multimodal Reasoning

148 4 Updated Mar 17, 2025
Python 55 3 Updated Mar 22, 2025
Jupyter Notebook 16 1 Updated Mar 29, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 10,824 1,496 Updated Mar 28, 2025

🔥CVPR 2025 Multimodal Large Language Models Paper List

123 3 Updated Mar 12, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.

Python 94 5 Updated Mar 10, 2025

Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains papers, codes, datasets, evaluations, and analyses.

171 4 Updated Mar 26, 2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

Python 54 Updated Mar 19, 2025

[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"

Python 139 9 Updated Mar 4, 2025

MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Python 458 16 Updated Mar 29, 2025