Microsoft Research
- https://hypjudy.github.io/website/
Stars
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Open-Sora: Democratizing Efficient Video Production for All
A fork to add multimodal model training to open-r1
Fully open reproduction of DeepSeek-R1
Investigating CoT Reasoning in Autoregressive Image Generation
A curated list of recent diffusion models for video generation, editing, and various other applications.
A high-throughput and memory-efficient inference and serving engine for LLMs
✨ Light and Fast AI Assistant. Supports: Web | iOS | macOS | Android | Linux | Windows
A generative world for general-purpose robotics & embodied AI learning.
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Playing Pokemon Red with Reinforcement Learning
A suite of image and video neural tokenizers
Get your documents ready for gen AI
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Deezer source separation library including pretrained models.
Ego4d dataset repository. Download the dataset, visualize it, extract features, and see example usage of the dataset
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Vary-tiny codebase built upon LAVIS (for training from scratch) and PDF image-text pair data (about 600k, including English/Chinese)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Tools to download and cleanup Common Crawl data
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838