Stars
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Agentless🐱: an agentless approach to automatically solve software development problems
✨First Open-Source R1-like Video-LLM [2025/02/18]
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Train your AI self, amplify you, bridge the world
NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
Latest Advances on System-2 Reasoning
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
No fortress, purely open ground. OpenManus is Coming.
The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Ola: Pushing the Frontiers of Omni-Modal Language Model
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…
Applying the ideas of DeepSeek R1 to computer use
Implementation of a Transformer, but completely in Triton
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
Witness the aha moment of VLM with less than $3.
Magic to turn Cursor/Windsurf into 90% of Devin
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.