Starred repositories
Web-based EPUB and PDF text-to-speech document reader. Read documents in real time with high-quality TTS, or extract audiobooks. Use your own Kokoro TTS API or OpenAI API endpoint.
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
Run Orpheus 3B Locally With LM Studio
A high-throughput and memory-efficient inference and serving engine for LLMs
FlashInfer: Kernel Library for LLM Serving
The official Soundwave repository
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Transcribe audio and video files with speaker diarization and logically grouped timestamps
OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
An invisible desktop application to help you pass your technical interviews.
💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
The first Large Audio Language Model that enables native in-depth thinking, trained on large-scale audio Chain-of-Thought data.
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
The Python library for real-time communication
Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
Using Whisper and Spleeter to sync lyrics with the song timeline
A fast inference library for running LLMs locally on modern consumer-class GPUs
🧠+🎧 Build your music algorithms and AI models with the next-gen DAW 🔥