Stars
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.
The official Python SDK for Model Context Protocol servers and clients
Make websites accessible for AI agents
Fully open reproduction of DeepSeek-R1
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning'
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
A programming framework for agentic AI 🤖 PyPI: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
Efficient Triton Kernels for LLM Training
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
Speech To Speech: an effort toward an open-source, modular GPT-4o
Real-time face swap and one-click video deepfake from a single image
A Comprehensive Toolkit for High-Quality PDF Content Extraction
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high …
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis
A high-throughput and memory-efficient inference and serving engine for LLMs
WisdoMentor Series: an LLM for undergraduates | 博导智言 (assisting undergraduates with their studies)
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Repository for DreaMoving-Phantom (https://www.modelscope.cn/studios/vigen/DreaMoving_Phantom/summary). DreaMoving-Phantom is a general, automatic image enhancement and super-resolution framework.
Open-Sora: Democratizing Efficient Video Production for All
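Xiaohongshu notes | comments crawler, Douyin videos | comments crawler, Kuaishou videos | comments crawler, Bilibili videos | comments crawler, Weibo posts | comments crawler, Baidu Tieba posts | Baidu Tieba comment-reply crawler, Zhihu Q&A and articles | comments crawler
Easily train a good voice conversion (VC) model with 10 minutes or less of voice data!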