Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook 1,356 67 Updated Mar 28, 2025

Agentless🐱: an agentless approach to automatically solve software development problems

Python 1,600 168 Updated Dec 22, 2024

✨First Open-Source R1-like Video-LLM [2025/02/18]

Python 301 11 Updated Feb 23, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,396 1,442 Updated Mar 10, 2025

TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

72 Updated Mar 26, 2025

Train your AI self, amplify you, bridge the world

Python 6,220 408 Updated Mar 28, 2025

TTS Towards Human-Sounding Speech

Python 3,111 222 Updated Mar 27, 2025

NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills.

Jupyter Notebook 2,902 345 Updated Mar 24, 2025

🤗 R1-AQA Model: mispeech/r1-aqa

Python 209 17 Updated Mar 28, 2025

Latest Advances on System-2 Reasoning

Python 859 33 Updated Mar 27, 2025

TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

Python 367 47 Updated Mar 26, 2025

MMR1: Advancing the Frontiers of Multimodal Reasoning

148 4 Updated Mar 17, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 839 61 Updated Mar 28, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 40,466 6,770 Updated Mar 27, 2025

The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.

Python 202 19 Updated Mar 17, 2025

Accelerate CosyVoice2 inference with vLLM

Jupyter Notebook 122 16 Updated Mar 25, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,102 535 Updated Mar 28, 2025

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 320 14 Updated Feb 28, 2025

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 527 29 Updated Nov 19, 2024

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

Python 1,871 68 Updated Mar 26, 2025

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits for the end of the source utterance to start translating--- H…

Rust 937 73 Updated Feb 9, 2025

Applying the ideas of Deepseek R1 to computer use

Python 210 8 Updated Feb 6, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,233 147 Updated Mar 20, 2025

Quickly extract audio and video content and organize it into structured Markdown notes

Python 1,569 226 Updated Jul 26, 2024

Implementation of a Transformer, but completely in Triton

Python 261 16 Updated Apr 5, 2022

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 828 61 Updated Mar 27, 2025

Witness the aha moment of VLM with less than $3.

Python 3,421 266 Updated Mar 1, 2025

Magic to turn Cursor/Windsurf into 90% of Devin

Python 5,211 689 Updated Mar 22, 2025

Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.

C 70 2 Updated Feb 2, 2025