Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Avatar, Animation, Visual, Video, Audio, Music, Singing Voice, Speech, and Analytics. 🔥
- Tool (AI LLM)
- Game (Agent)
- Code
- Writer
- Image
- Texture
- Shader
- 3D Model
- Avatar
- Animation
- Visual
- Video
- Audio
- Music
- Singing Voice
- Speech
- Analytics

Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AgentGPT | 🤖 Assemble, configure, and deploy autonomous AI Agents in your browser. | | | Tool |
AICommand | ChatGPT integration with Unity Editor. | | Unity | Tool |
AIOS | LLM Agent Operating System. | | | Tool |
AI Scientist | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. | arXiv | | Tool |
Assistant CLI | A comfortable CLI tool for using the ChatGPT service. 🔥 | | | Tool |
Auto-GPT | An experimental open-source attempt to make GPT-4 fully autonomous. | | | Tool |
BabyAGI | This Python script is an example of an AI-powered task management system. | | | Tool |
👶🤖🖥️ BabyAGI UI | BabyAGI UI is designed to make it easier to run and develop with babyagi in a web app, like ChatGPT. | | | Tool |
baichuan-7B | A large-scale 7B pretrained language model developed by Baichuan. | | | Tool |
Baichuan-13B | A 13B large language model developed by Baichuan Intelligent Technology. | | | Tool |
Baichuan 2 | A series of large language models developed by Baichuan Intelligent Technology. | | | Tool |
Bisheng | Bisheng is an open LLM devops platform for next-generation AI applications. | | | Tool |
Character-LLM | A Trainable Agent for Role-Playing. | arXiv | | Tool |
ChatDev | Communicative Agents for Software Development. | arXiv | | Tool |
ChatGPT-API-unity | Binds ChatGPT chat completion API to pure C# on Unity. | | Unity | Tool |
ChatGPTForUnity | ChatGPT for Unity. | | Unity | Tool |
ChatRWKV | ChatRWKV is like ChatGPT, but powered by the RWKV (100% RNN) language model and open source. | | | Tool |
ChatYuan | Large Language Model for Dialogue in Chinese and English. | | | Tool |
Chinese-LLaMA-Alpaca-3 | Chinese Llama-3 LLMs developed from Meta Llama 3. | | | Tool |
Chrome-GPT | An AutoGPT agent that controls Chrome on your desktop. | | | Tool |
CogVLM | CogVLM is a powerful open-source visual language foundation model. | arXiv | | Tool |
CoreNet | A library for training deep neural networks. | | | Tool |
DBRX | DBRX is a large language model trained by Databricks. | | | Tool |
DCLM | DataComp for Language Models. | arXiv | | Tool |
DemoGPT | Auto Gen-AI App Generator with the Power of Llama 2. | | | Tool |
Design2Code | Automating Front-End Engineering. | | | Tool |
Devika | Devika is an Agentic AI Software Engineer. | | | Tool |
Devon | An open-source pair programmer. | | | Tool |
Dora | Generating powerful websites, one prompt at a time. | | | Tool |
Flowise | Drag & drop UI to build your customized LLM flow using LangchainJS. | | | Tool |
Gemini | Gemini is built from the ground up for multimodality, reasoning seamlessly across text, images, video, audio, and code. | | | Tool |
Gemma | Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create Google Gemini models. | | | Tool |
gemma.cpp | Lightweight, standalone C++ inference engine for Google's Gemma models. | | | Tool |
GLM-4 | GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. | | | Tool |
GPT4All | A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue. | | | Tool |
GPT-4o | GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | | | Tool |
GPTScript | Develop LLM Apps in Natural Language. | | | Tool |
Grok-1 | Weights and architecture of xAI's 314-billion-parameter Mixture-of-Experts model, Grok-1. | | | Tool |
HuggingChat | Making the community's best AI chat models available to everyone. | | | Tool |
Hugging Face API Unity Integration | This Unity package provides an easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. | | Unity | Tool |
ImageBind | ImageBind: One Embedding Space to Bind Them All. | arXiv | | Tool |
Index-1.9B | A SOTA lightweight multilingual LLM. | | | Tool |
InteractML-Unity | InteractML, an Interactive Machine Learning Visual Scripting framework for Unity3D. | | Unity | Tool |
InteractML-Unreal Engine | Bringing Machine Learning to Unreal Engine. | | Unreal Engine | Tool |
InternLM | InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios, and the training system. | arXiv | | Tool |
InternLM-XComposer | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | | Tool |
Jan | Bring AI to your Desktop. | | | Tool |
Lamini | Lamini allows any engineering team to outperform general-purpose LLMs through RLHF and fine-tuning on their own data. | | | Tool |
LaMini-LM | LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. | | | Tool |
LangChain | LangChain is a framework for developing applications powered by language models. | | | Tool |
LangFlow | ⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Tool |
LaVague | Automate automation with a Large Action Model framework. | | | Tool |
Lemur | Open Foundation Models for Language Agents. | | | Tool |
Lepton AI | A Pythonic framework to simplify AI service building. | | | Tool |
Lit-LLaMA | Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. | | | Tool |
llama2-webui | Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). | | | Tool |
Llama 3 | The official Meta Llama 3 GitHub site. | | | Tool |
Llama 3.1 | Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. | | | Tool |
LLaSM | Large Language and Speech Model. | | | Tool |
LLM Answer Engine | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
llm.c | LLM training in simple, raw C/CUDA. | | | Tool |
LLMUnity | Create characters in Unity with LLMs! | | Unity | Tool |
LLocalSearch | LLocalSearch is a completely locally running search engine using LLM Agents. | | | Tool |
LogicGamesSolver | A Python tool to solve logic games with AI, Deep Learning and Computer Vision. | | | Tool |
LongWriter | LongWriter: Unleashing 10,000+ Word Generation From Long Context LLMs. | arXiv | | Tool |
Large World Model (LWM) | Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model. | arXiv | | Tool |
Lumina-T2X | Lumina-T2X is a unified framework for Text to Any Modality Generation. | arXiv | | Tool |
MetaGPT | The Multi-Agent Framework. | | | Tool |
MiniCPM-2B | An end-side LLM that outperforms Llama2-13B. | | | Tool |
MiniGPT-4 | Enhancing Vision-language Understanding with Advanced Large Language Models. | arXiv | | Tool |
MiniGPT-5 | Interleaved Vision-and-Language Generation via Generative Vokens. | arXiv | | Tool |
Mixtral 8x7B | A high-quality Sparse Mixture-of-Experts model. | arXiv | | Tool |
Mistral 7B | The best 7B model to date, released under Apache 2.0. | | | Tool |
Mistral Large | Mistral Large is a new cutting-edge text generation model. It reaches top-tier reasoning capabilities. | | | Tool |
MLC LLM | Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. | | | Tool |
MobiLlama | Towards Accurate and Lightweight Fully Transparent GPT. | arXiv | | Tool |
MoE-LLaVA | Mixture of Experts for Large Vision-Language Models. | arXiv | | Tool |
Moshi | Moshi is an experimental conversational AI. | | | Tool |
Moshi | Moshi: a speech-text foundation model for real-time dialogue. | | | Tool |
MOSS | An open-source tool-augmented conversational language model from Fudan University. | | | Tool |
mPLUG-Owl🦉 | Modularization Empowers Large Language Models with Multimodality. | arXiv | | Tool |
Nemotron-4 | A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. | arXiv | | Tool |
NExT-GPT | Any-to-Any Multimodal Large Language Model. | | | Tool |
OLMo | Open Language Model. | arXiv | | Tool |
OmniLMM | Large multi-modal models for strong performance and efficient deployment. | | | Tool |
OneLLM | One Framework to Align All Modalities with Language. | arXiv | | Tool |
Open-Assistant | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | | | Tool |
OpenDevin | An autonomous AI software engineer. | | | Tool |
Orion-14B | Orion-14B is a family of models that includes a 14B foundation LLM and a series of models. | arXiv | | Tool |
Panda | An overseas Chinese open-source large language model, continually pre-trained in the Chinese domain on top of Llama-7B, -13B, -33B, and -65B. | | | Tool |
Perplexica | An AI-powered search engine. | | | Tool |
Pi | AI chatbot designed for personal assistance and emotional support. | | | Tool |
Qwen1.5 | Qwen1.5 is the improved version of Qwen. | | | Tool |
Qwen2 | Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud. | | | Tool |
Qwen-7B | The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. | | | Tool |
RepoAgent | RepoAgent is an open-source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. | arXiv | | Tool |
Sanity AI Engine | Sanity AI Engine for the Unity Game Development Tool. | | Unity | Tool |
SearchGPT | 🌳 Connecting ChatGPT with the Internet. | | | Tool |
ShareGPT4V | Improving Large Multi-Modal Models with Better Captions. | | | Tool |
Skywork | Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. | | | Tool |
StableLM | Stability AI Language Models. | arXiv | | Tool |
Stanford Alpaca | An Instruction-following LLaMA Model. | | | Tool |
Text generation web UI | A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA. | | | Tool |
TinyChatEngine | On-Device LLM Inference Library. | | | Tool |
ToolBench | An open platform for training, serving, and evaluating large language models for tool learning. | | | Tool |
Unity ChatGPT | Unity ChatGPT Experiments. | | Unity | Tool |
Unity OpenAI-API Integration | Integrate the OpenAI GPT-3 language model and ChatGPT API into a Unity project. | | Unity | Tool |
Unreal Engine 5 Llama LoRA | A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools. | | Unreal Engine | Tool |
UnrealGPT | A collection of Unreal Engine 5 Editor Utility widgets powered by GPT-3/4. | | Unreal Engine | Tool |
Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | | Tool |
WebGPT | Run GPT models in the browser with WebGPU. | | | Tool |
Web3-GPT | Deploy smart contracts with AI. | | | Tool |
WordGPT | 🤖 Bring the power of ChatGPT to Microsoft Word. | | | Tool |
XAgent | An Autonomous LLM Agent for Complex Task Solving. | | | Tool |
Yi | A series of large language models trained from scratch by 01.AI. | | | Tool |
01 Project | The open-source language model computer. | | | Tool |

Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AgentBench | A Comprehensive Benchmark to Evaluate LLMs as Agents. | arXiv | | Agent |
Agent Group Chat | An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior. | arXiv | | Agent |
Agent K | An autoagentic AGI that is self-evolving and modular. | | | Agent |
AgentScope | Start building LLM-empowered multi-agent applications in an easier way. | arXiv | | Agent |
AgentSims | An Open-Source Sandbox for Large Language Model Evaluation. | | | Agent |
AI Town | AI Town is a virtual town where AI characters live, chat and socialize. | | | Agent |
anime.gf | Local & Open Source Alternative to CharacterAI. | | | Game |
Astrocade | Create games with AI. | | | Game |
Atomic Agents | The Atomic Agents framework is designed to be modular, extensible, and easy to use. | | | Agent |
AutoAgents | A Framework for Automatic Agent Generation. | | | Agent |
AutoGen | Enable Next-Gen Large Language Model Applications. | arXiv | | Agent |
behaviac | Behaviac is a framework for game AI development. | | | Framework |
Biomes | Biomes is an open source sandbox MMORPG built for the web using web technologies such as Next.js, Typescript, React and WebAssembly. | | | Game |
Buffer of Thoughts | Thought-Augmented Reasoning with Large Language Models. | arXiv | | Agent |
Byzer-Agent | Easy, fast, and distributed agent framework for everyone. | | | Agent |
Cat Town | A C(h)atGPT-powered simulation with cats. | | | Agent |
CharacterGLM | Customizing Chinese Conversational AI Characters with Large Language Models. | arXiv | | Agent |
ChatDev | Communicative Agents for Software Development. | arXiv | | Agent |
CogAgent | CogAgent is an open-source visual language model improved based on CogVLM. | arXiv | | Agent |
Cradle | Towards General Computer Control. | | | Agent |
crewAI | Framework for orchestrating role-playing, autonomous AI agents. | | | Agent |
Dify | Dify is an open-source LLM app building platform. | | | Agent |
Digital Life Project | Autonomous 3D Characters with Social Intelligence. | arXiv | | Agent |
everything-ai | Your fully proficient, AI-powered and local chatbot assistant 🤖. | | | Agent |
fabric | fabric is an open-source framework for augmenting humans using AI. | | | Agent |
FastGPT | FastGPT is a knowledge-based platform built on the LLM. | | | Agent |
fastRAG | Efficient Retrieval Augmentation and Generation Framework. | | | Agent |
GameAISDK | Image-based game AI automation framework. | | | Framework |
GameNGen | Diffusion Models Are Real-Time Game Engines. | arXiv | | Game |
GameGen-O | GameGen-O: Open-world Video Game Generation. | | | Game |
GenAgent | GenAgent: Build Collaborative AI Systems with Automated Workflow Generation - Case Studies on ComfyUI. | arXiv | | Agent |
Generative Agents | Interactive Simulacra of Human Behavior. | arXiv | | Agent |
Genie | Generative Interactive Environments. | | | Game |
gigax | Runtime, LLM-powered NPCs. | | | Game |
HippoRAG | Neurobiologically Inspired Long-Term Memory for Large Language Models. | arXiv | | Agent |
Interactive LLM Powered NPCs | Interactive LLM Powered NPCs is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
IoA | An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
KwaiAgents | A generalized information-seeking agent system with Large Language Models (LLMs). | arXiv | | Agent |
LangChain | Get your LLM application from prototype to production. | | | Agent |
Langflow | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
LangGraph Studio | LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. | | | Agent |
LARP | Language-Agent Role Play for open-world games. | arXiv | | Agent |
LLama Agentic System | Agentic components of the Llama Stack APIs. | | | Agent |
LlamaIndex | LlamaIndex is a data framework for your LLM application. | | | Agent |
MindSearch | 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). | | | Agent |
Mixture of Agents (MoA) | Mixture-of-Agents Enhances Large Language Model Capabilities. | arXiv | | Agent |
MMRole | MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents. | arXiv | | Agent |
Moonlander.ai | Start building 3D games without any coding using generative AI. | | | Framework |
MuG Diffusion | MuG Diffusion is a charting AI for rhythm games based on Stable Diffusion (one of the most powerful AIGC models) with a large modification to incorporate audio waves. | | | Game |
Oasis | Oasis is an interactive world model developed by Decart and Etched. Based on diffusion transformers, Oasis takes in user keyboard input and generates gameplay in an autoregressive manner. | | | Game |
OmAgent | A multimodal agent framework for solving complex tasks. | | | Agent |
OpenAgents | An Open Platform for Language Agents in the Wild. | | | Agent |
Opus | An AI app that turns text into a video game. | | | Game |
Pipecat | Open-source framework for voice and multimodal conversational AI. | | | Agent |
Qwen-Agent | Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. | | | Agent |
Ragas | Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. | | | Agent |
RPBench-Auto | An automated pipeline for evaluating LLMs for role-playing. | | | Game |
SIMA | A generalist AI agent for 3D virtual environments. | | | Agent |
StoryGames.ai | AI for Dreamers Make Games. | | | Game |
SWE-agent | Agent Computer Interfaces Enable Software Engineering Language Models. | arXiv | | Agent |
TaskGen | A task-based agentic framework building on StrictJSON outputs by LLM agents. | | | Agent |
TEN Agent | TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG capabilities. | | | Agent |
Translation Agent | Agentic translation using reflection workflow. | | | Agent |
Twitter Personality | Twitter Personality is a web application that analyzes your Twitter handle to create a personalized personality profile using a Wordware AI Agent. | | | Agent |
Unbounded | Unbounded: A Generative Infinite Game of Character Life Simulation. | arXiv | | Game |
Video2Game | Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. | arXiv | | Game |
V-IRL | Grounding Virtual Intelligence in Real Life. | arXiv | | Agent |
WebDesignAgent | An agent for web design. | | | Agent |
XAgent | An Autonomous LLM Agent for Complex Task Solving. | | | Agent |

Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AI Code Translator | Use AI to translate code from one language to another. | | | Code |
aiXcoder-7B | aiXcoder-7B Code Large Language Model. | | | Code |
bloop | bloop is a fast code search engine written in Rust. | | | Code |
Chapyter | ChatGPT Code Interpreter in Jupyter Notebooks. | | | Code |
CodeGeeX | An Open Multilingual Code Generation Model. | arXiv | | Code |
CodeGeeX2 | A More Powerful Multilingual Code Generation Model. | | | Code |
CodeGeeX4 | CodeGeeX4: Open Multilingual Code Generation Model. | | | Code |
CodeGen | CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. | arXiv | | Code |
CodeGen2 | CodeGen2 models for program synthesis. | arXiv | | Code |
Code Llama | Code Llama is a family of large language models for code based on Llama 2. | | | Code |
CodeTF | One-stop Transformer Library for State-of-the-art Code LLM. | | | Code |
CodeT5 | Open Code LLMs for Code Understanding and Generation. | | | Code |
Cursor | Write, edit, and chat about your code with GPT-4 in a new type of editor. | | | Code |
DeepSeek Coder | DeepSeek Coder: Let the Code Write Itself. | arXiv | | Code |
OpenAI Codex | OpenAI Codex is a descendant of GPT-3. | | | Code |
PandasAI | Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational. | | | Code |
RobloxScripterAI | RobloxScripterAI is an AI-powered code generation tool for Roblox. | | Roblox | Code |
Scikit-LLM | Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. | | | Code |
SoTaNa | The Open-Source Software Development Assistant. | arXiv | | Code |
Stable Code 3B | Coding on the Edge. | | | Code |
StarCoder | 💫 StarCoder is a language model (LM) trained on source code and natural language text. | arXiv | | Code |
StarCoder 2 | StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, Arxiv, and GitHub issues. | arXiv | | Code |
UnityGen AI | UnityGen AI is an AI-powered code generation plugin for Unity. | | Unity | Code |
Void | Void is an open source Cursor alternative. Write code with the best AI tools, retain full control over your data, and access powerful AI features. | | | Code |

Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AI-Writer | A Chinese pre-trained generative model that writes novels and generates fantasy and romance web fiction. | | | Writer |
Notebook.ai | Notebook.ai is a set of tools for writers, game designers, and roleplayers to create magnificent universes – and everything within them. | | | Writer |
Novel | Notion-style WYSIWYG editor with AI-powered autocompletions. | | | Writer |
NovelAI | Driven by AI, painlessly construct unique stories, thrilling tales, seductive romances, or just fool around. | | | Writer |

Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AnyDoor | Zero-shot Object-level Image Customization. | arXiv | | Image |
AnyText | Multilingual Visual Text Generation And Editing. | arXiv | | Image |
AutoStudio | Crafting Consistent Subjects in Multi-turn Interactive Image Generation. | arXiv | | Image |
Blender-ControlNet | Using ControlNet right in Blender. | | Blender | Image |
BriVL | Bridging Vision and Language Model. | arXiv | | Image |
CatVTON | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models. | arXiv | | Image |
CLIPasso | A method for converting an image of an object to a sketch, allowing for varying levels of abstraction. | arXiv | | Image |
ClipDrop | Create stunning visuals in seconds. | | | Image |
ComfyUI | A powerful and modular stable diffusion GUI with a graph/nodes interface. | | | Image |
ConceptLab | Creative Generation using Diffusion Prior Constraints. | arXiv | | Image |
ControlNet | ControlNet is a neural network structure to control diffusion models by adding extra conditions. | arXiv | | Image |
CSGO | CSGO: Content-Style Composition in Text-to-Image Generation. | arXiv | | Image |
DALL·E 2 | DALL·E 2 is an AI system that can create realistic images and art from a description in natural language. | | | Image |
Dashtoon Studio | Dashtoon Studio is an AI-powered comic creation platform. | | | Comic |
DeepAI | DeepAI offers a suite of tools that use AI to enhance your creativity. | | | Image |
DeepFloyd IF | IF by DeepFloyd Lab at StabilityAI. | | | Image |
Depth Anything V2 | Depth Anything V2 | arXiv | | Image |
Depth map library and poser | Depth map library for use with the ControlNet extension for AUTOMATIC1111/stable-diffusion-webui. | | | Image |
Diffuse to Choose | Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All. | arXiv | | Image |
Disco Diffusion | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations. | | | Image |
DragGAN | Interactive Point-based Manipulation on the Generative Image Manifold. | arXiv | | Image |
Draw Things | AI-assisted image generation in your pocket. | | | Image |
DWPose | Effective Whole-body Pose Estimation with Two-stages Distillation. | arXiv | | Image |
EasyPhoto | Your Smart AI Photo Generator. | | | Image |
Flux | This repo contains minimal inference code to run text-to-image and image-to-image with Black Forest Labs' Flux latent rectified flow transformers. | | | Image |
Follow-Your-Click | Open-domain Regional Image Animation via Short Prompts. | arXiv | | Image |
Fooocus | Focus on prompting and generating. | | | Image |
GIFfusion | Create GIFs and Videos using Stable Diffusion. | | | Image |
Grounded-Segment-Anything | Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs. | arXiv | | Image |
HivisionIDPhotos | HivisionIDPhotos: a lightweight and efficient AI ID photo tool. | | | Image |
Hua | Hua is an AI image editor with Stable Diffusion (and more). | | | Image |
Hunyuan-DiT | A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. | arXiv | | Image |
IC-Light | IC-Light is a project to manipulate the illumination of images. | | | Image |
Ideogram | Helping people become more creative. | | | Image |
Imagen | Imagen is an AI system that creates photorealistic images from input text. | | | Image |
img2img-turbo | One-Step Image-to-Image with SD-Turbo. | | | Image |
Img2Prompt | Get prompts from stable diffusion generated images. | | | Image |
InstantID | Zero-shot Identity-Preserving Generation in Seconds. | arXiv | | Image |
InternLM-XComposer2 | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | | Image |
KOALA | Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. | | | Image |
Kolors | Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. | | | Image |
KREA | Generate images and videos with a delightful AI-powered design tool. | | | Image |
LaVi-Bridge | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. | arXiv | | Image |
LayerDiffusion | Transparent Image Layer Diffusion using Latent Transparency. | arXiv | | Image |
Lexica | A Stable Diffusion prompts search engine. | | | Image |
LlamaGen | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation. | arXiv | | Image |
Lumina-mGPT | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. | arXiv | | Image |
MetaShoot | MetaShoot is a digital twin of a photo studio, developed as a plugin for Unreal Engine that gives any creator the ability to produce highly realistic renders in the easiest and quickest way. | | Unreal Engine | Image |
Midjourney | Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. | | | Image |
MIGC | MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis. | arXiv | | Image |
MimicBrush | Zero-shot Image Editing with Reference Imitation. | arXiv | | Image |
OmniGen | OmniGen: Unified Image Generation. | arXiv | | Image |
Omost | Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability. | | | Image |
Openpose Editor | Openpose Editor for AUTOMATIC1111's stable-diffusion-webui. | | | Image |
Outfit Anyone | Ultra-high quality virtual try-on for Any Clothing and Any Person. | | | Image |
PaintsUndo | PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings. | | | Image |
PhotoMaker | Customizing Realistic Human Photos via Stacked ID Embedding. | arXiv | | Image |
Photoroom | AI Background Generator. | | | Image |
Plask | AI image generation in the cloud. | | | Image |
Prompt.Art | The Generators Hub. | | | Image |
PuLID | Pure and Lightning ID Customization via Contrastive Alignment. | arXiv | | Image |
Rich-Text-to-Image | Expressive Text-to-Image Generation with Rich Text. | arXiv | | Image |
RPG-DiffusionMaster | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). | | | Image |
SEED-Story | SEED-Story: Multimodal Long Story Generation with Large Language Model. | arXiv | | Image |
Segment Anything | Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object, in any image, with a single click. | arXiv | | Image |
Segment Anything Model 2 (SAM 2) | SAM 2: Segment Anything in Images and Videos. | arXiv | | Image |
sd-webui-controlnet | WebUI extension for ControlNet. | | | Image |
SDXL-Lightning | Progressive Adversarial Diffusion Distillation. | arXiv | | Image |
SDXS | Real-Time One-Step Latent Diffusion Models with Image Conditions. | | | Image |
Stable.art | Photoshop plugin for Stable Diffusion with Automatic1111 as backend (locally or with Google Colab). | | | Image |
Stable Cascade | Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade for generating images, hence the name "Stable Cascade". | | | Image |
Stable Diffusion | A latent text-to-image diffusion model. | | | Image |
stable-diffusion.cpp | Stable Diffusion in pure C/C++. | | | Image |
Stable Diffusion web UI | A browser interface based on the Gradio library for Stable Diffusion. | | | Image |
Stable Diffusion web UI | Web-based UI for Stable Diffusion. | | | Image |
Stable Diffusion WebUI Chinese | Chinese version of stable-diffusion-webui. | | | Image |
Stable Diffusion XL | Generate images from text. | arXiv | | Image |
Stable Diffusion XL Turbo | Real-Time Text-to-Image Generation. | | | Image |
Stable Diffusion 3.5 | Stable Diffusion 3.5 open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. | | | Image |
Stable Doodle | Stable Doodle is a sketch-to-image tool that converts a simple drawing into a dynamic image. | | | Image |
StableStudio | StableStudio by Stability AI. | | | Image |
StoryMaker | StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation. | arXiv | | Image |
StreamDiffusion | A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
StyleDrop | Text-To-Image Generation in Any Style. | arXiv | | Image |
SyncDreamer | Generating Multiview-consistent Images from a Single-view Image. | arXiv | | Image |
UltraEdit | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale. | arXiv | | Image |
UltraPixel | UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks. | arXiv | | Image |
Unity ML Stable Diffusion | Core ML Stable Diffusion on Unity. | | Unity | Image |
Vispunk Visions | Text-to-Image generation platform. | | | Image |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
CRM | Single Image to 3D Textured Mesh with Convolutional Reconstruction Model. | arXiv | Texture | |
DreamMat | High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. | arXiv | Texture | |
DreamSpace | Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. | Texture | ||
Dream Textures | Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | Blender | Texture | |
InstructHumans | Editing Animated 3D Human Textures with Instructions. | arXiv | Texture | |
InteX | Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. | arXiv | Texture | |
MaterialSeg3D | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. | arXiv | Texture | |
MeshAnything | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. | arXiv | Mesh | |
Neuralangelo | High-Fidelity Neural Surface Reconstruction. | arXiv | Texture | |
Paint-it | Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | Texture | ||
Polycam | Create your own 3D textures just by typing. | Texture | ||
TexFusion | Synthesizing 3D Textures with Text-Guided Image Diffusion Models. | arXiv | Texture | |
Text2Tex | Text-driven texture Synthesis via Diffusion Models. | arXiv | Texture | |
Texture Lab | AI-generated textures. You can generate your own with a text prompt. | Texture | ||
With Poly | Create Textures With Poly. Generate 3D materials with AI in a free online editor, or search our growing community library. | Texture | ||
X-Mesh | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. | arXiv | Texture |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AI Shader | ChatGPT-powered shader generator for Unity. | Unity | Shader |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
Animate3D | Animate3D: Animating Any 3D Model with Multi-view Video Diffusion. | arXiv | 3D | |
Anything-3D | Segment-Anything + 3D. Let's lift anything to 3D. | arXiv | Model | |
Any2Point | Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding. | arXiv | 3D | |
BlenderGPT | Use commands in English to control Blender with OpenAI's GPT-4. | Blender | Model | |
Blender-GPT | An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | Blender | Model | |
Blockade Labs | Digital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | Model | ||
CF-3DGS | COLMAP-Free 3D Gaussian Splatting. | arXiv | 3D | |
CharacterGen | CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. | arXiv | 3D | |
chatGPT-maya | Simple Maya tool that utilizes open AI to perform basic tasks based on descriptive instructions. | Maya | Model | |
CityDreamer | Compositional Generative Model of Unbounded 3D Cities. | arXiv | 3D | |
CSM | Generate 3D worlds from images and videos. | 3D | ||
Dash | Your Copilot for World Building in Unreal Engine. | Unreal Engine | 3D | |
DreamCatalyst | DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. | arXiv | 3D | |
DreamGaussian4D | Generative 4D Gaussian Splatting. | arXiv | 4D | |
DUSt3R | Geometric 3D Vision Made Easy. | arXiv | 3D | |
GALA3D | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. | arXiv | 3D | |
GaussCtrl | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. | arXiv | 3D | |
GaussianCube | A Structured and Explicit Radiance Representation for 3D Generative Modeling. | arXiv | 3D | |
GaussianDreamer | Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors. | arXiv | 3D | |
GenieLabs | Empower your game with AI-UGC. | 3D | ||
HiFA | High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance. | Model | ||
HoloDreamer | HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions. | arXiv | 3D | |
Infinigen | Infinite Photorealistic Worlds using Procedural Generation. | arXiv | 3D | |
Instruct-NeRF2NeRF | Editing 3D Scenes with Instructions. | arXiv | Model | |
Interactive3D | Create What You Want by Interactive 3D Generation. | arXiv | 3D | |
Isotropic3D | Image-to-3D Generation Based on a Single CLIP Embedding. | 3D | ||
LATTE3D | Large-scale Amortized Text-To-Enhanced3D Synthesis. | arXiv | 3D | |
LION | Latent Point Diffusion Models for 3D Shape Generation. | arXiv | Model | |
Luma AI | Capture in lifelike 3D. Unmatched photorealism, reflections, and details. The future of VFX is now, for everyone! | Model | ||
lumine AI | AI-Powered Creativity. | 3D | ||
Make-It-3D | High-Fidelity 3D Creation from A Single Image with Diffusion Prior. | arXiv | Model | |
Meshy | Create Stunning 3D Game Assets with AI. | 3D | ||
Mootion | Magical 3D AI Animation Maker. | 3D | ||
MVDream | Multi-view Diffusion for 3D Generation. | arXiv | 3D | |
NVIDIA Instant NeRF | Instant neural graphics primitives: lightning fast NeRF and more. | Model | ||
One-2-3-45 | Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. | arXiv | Model | |
Paint3D | Paint Anything 3D with Lighting-Less Texture Diffusion Models. | arXiv | 3D | |
PAniC-3D | Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. | arXiv | Model | |
Point·E | Point cloud diffusion for 3D model synthesis. | Model | ||
ProlificDreamer | High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. | arXiv | Model | |
SF3D | SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. | arXiv | 3D | |
Shap-E | Generate 3D objects conditioned on text or images. | arXiv | Model | |
Sloyd | 3D modelling has never been easier. | Model | ||
Spline AI | The power of AI is coming to the 3rd dimension. Generate objects, animations, and textures using prompts. | Model | ||
Stable Dreamfusion | A PyTorch implementation of the text-to-3D model DreamFusion, powered by the Stable Diffusion text-to-2D model. | Model | ||
SV3D | Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. | arXiv | 3D | |
Tafi | AI text to 3D character engine. | Model | ||
3D-GPT | Procedural 3D Modeling with Large Language Models. | arXiv | 3D | |
3D-LLM | Injecting the 3D World into Large Language Models. | arXiv | 3D | |
3Dpresso | Extract a 3D model of an object, captured on a video. | Model | ||
3DTopia | Text-to-3D Generation within 5 Minutes. | arXiv | 3D | |
3DTopia-XL | 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion. | arXiv | 3D | |
threestudio | A unified framework for 3D content generation. | Model | ||
TripoSR | A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. | arXiv | Model | |
Unique3D | High-Quality and Efficient 3D Mesh Generation from a Single Image. | arXiv | 3D | |
UnityGaussianSplatting | Toy Gaussian Splatting visualization in Unity. | Unity | 3D | |
ViVid-1-to-3 | Novel View Synthesis with Video Diffusion Models. | arXiv | 3D | |
Voxcraft | Crafting Ready-to-Use 3D Models with AI. | 3D | ||
Wonder3D | Single Image to 3D using Cross-Domain Diffusion. | arXiv | 3D | |
Zero-1-to-3 | Zero-shot One Image to 3D Object. | arXiv | Model |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AniPortrait | Audio-Driven Synthesis of Photorealistic Portrait Animations. | arXiv | Avatar | |
CALM | Conditional Adversarial Latent Models for Directable Virtual Characters. | arXiv | Avatar | |
ChatAvatar | Progressive Generation of Animatable 3D Faces Under Text Guidance. | Avatar | ||
ChatdollKit | ChatdollKit enables you to make your 3D model into a chatbot. | Unity | Avatar | |
DreamTalk | When Expressive Talking Head Generation Meets Diffusion Probabilistic Models. | arXiv | Avatar | |
Duix | Duix - Silicon-Based Digital Human SDK 🌐🤖 | Avatar | ||
EchoMimic | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. | arXiv | Avatar | |
EMOPortraits | Emotion-enhanced Multimodal One-shot Head Avatars. | Avatar | ||
E3 Gen | Efficient, Expressive and Editable Avatars Generation. | arXiv | Avatar | |
ExAvatar | ExAvatar - Expressive Whole-Body 3D Gaussian Avatar. | arXiv | Avatar | |
GeneAvatar | Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. | arXiv | Avatar | |
GeneFace++ | Generalized and Stable Real-Time 3D Talking Face Generation. | Avatar | ||
Hallo | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. | arXiv | Avatar | |
Hallo2 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation. | arXiv | Avatar | |
HeadSculpt | Crafting 3D Head Avatars with Text. | arXiv | Avatar | |
IntrinsicAvatar | IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing. | arXiv | Avatar | |
Linly-Talker | Digital Avatar Conversational System. | Avatar | ||
LivePortrait | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. | arXiv | Avatar | |
MotionGPT | Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. | arXiv | Avatar | |
MusePose | MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation. | Avatar | ||
MuseTalk | Real-Time High-Quality Lip Synchronization with Latent Space Inpainting. | Avatar | ||
MuseV | Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | Avatar | ||
Portrait4D | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. | arXiv | Avatar | |
Ready Player Me | Integrate customizable avatars into your game or app in days. | Avatar | ||
RodinHD | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models. | arXiv | Avatar | |
StyleAvatar3D | Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. | arXiv | Avatar | |
Text2Control3D | Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. | arXiv | Avatar | |
Topo4D | Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. | arXiv | Avatar | |
UnityAIWithChatGPT | A Unity-based voice-interactive demo that combines ChatGPT with the UnityChan character. | Unity | Avatar | |
Vid2Avatar | 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. | arXiv | Avatar | |
VLOGGER | Multimodal Diffusion for Embodied Avatar Synthesis. | Avatar | ||
Wild2Avatar | Rendering Humans Behind Occlusions. | arXiv | Avatar |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
Animate Anyone | Consistent and Controllable Image-to-Video Synthesis for Character Animation. | arXiv | Animation | |
AnimateAnything | Fine-Grained Open Domain Image Animation with Motion Guidance. | arXiv | Animation | |
AnimateDiff | Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. | arXiv | Animation | |
AnimateLCM | Let's Accelerate the Video Generation within 4 Steps! | arXiv | Animation | |
AnimateZero | Video Diffusion Models are Zero-Shot Image Animators. | arXiv | Animation | |
AnimationGPT | An AIGC tool for generating game combat motion assets. | Animation | ||
Deforum | Deforum leverages Stable Diffusion to generate evolving AI visuals. | Animation | ||
DrawingSpinUp | DrawingSpinUp: 3D Animation from Single Character Drawings. | arXiv | Animation | |
DreaMoving | A Human Video Generation Framework based on Diffusion Models. | arXiv | Animation | |
FaceFusion | Next generation face swapper and enhancer. | Animation | ||
FreeInit | Bridging Initialization Gap in Video Diffusion Models. | arXiv | Animation | |
GeneFace | Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. | arXiv | Animation | |
ID-Animator | Zero-Shot Identity-Preserving Human Video Generation. | arXiv | Animation | |
MagicAnimate | Temporally Consistent Human Image Animation using Diffusion Model. | arXiv | Animation | |
NUWA | DragNUWA is an open-domain, diffusion-based video generation model that takes text, image, and trajectory controls as inputs to achieve controllable video generation. | arXiv | Animation | |
NUWA-Infinity | NUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input. | Animation | ||
NUWA-XL | A novel Diffusion over Diffusion architecture for eXtremely Long video generation. | Animation | ||
Omni Animation | AI Generated High Fidelity Animations. | Animation | ||
PIA | Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models. | arXiv | Animation | |
SadTalker | Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. | arXiv | Animation | |
SadTalker-Video-Lip-Sync | Video lip synthesis with Wav2Lip, built on SadTalker. | Animation | ||
Stable Animation | A powerful text-to-animation tool for developers. | Animation | ||
TaleCrafter | An interactive story visualization tool that supports multiple characters. | arXiv | Animation | |
ToonCrafter | ToonCrafter: Generative Cartoon Interpolation. | arXiv | Animation | |
Wav2Lip | Accurately Lip-syncing Videos In The Wild. | arXiv | Animation | |
Wonder Studio | An AI tool that automatically animates, lights and composes CG characters into a live-action scene. | Animation |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
Cambrian-1 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. | arXiv | Multimodal LLMs | |
CogVLM2 | GPT-4V-level open-source multimodal model based on Llama3-8B. | Visual | ||
CoTracker | It is Better to Track Together. | arXiv | Visual | |
EVF-SAM | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model. | arXiv | Visual | |
FaceHi | It is Better to Track Together. | Visual | ||
InternLM-XComposer2 | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | Visual | |
Kangaroo | Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input. | Visual | ||
LGVI | Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | Visual | ||
LLaVA++ | Extending Visual Capabilities with LLaMA-3 and Phi-3. | Visual | ||
LLaVA-OneVision | LLaVA-OneVision: Easy Visual Task Transfer. | arXiv | Visual | |
LongVA | Long Context Transfer from Language to Vision. | arXiv | Visual | |
MaskViT | Masked Visual Pre-Training for Video Prediction. | arXiv | Visual | |
MiniCPM-Llama3-V 2.5 | A GPT-4V Level MLLM on Your Phone. | Visual | ||
MoE-LLaVA | Mixture of Experts for Large Vision-Language Models. | arXiv | Visual | |
MotionLLM | Understanding Human Behaviors from Human Motions and Videos. | arXiv | Visual | |
PLLaVA | Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. | arXiv | Visual | |
Qwen-VL | A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. | arXiv | Visual | |
Sapiens | Sapiens: Foundation for Human Vision Models. | arXiv | Visual | |
ShareGPT4V | Improving Large Multi-modal Models with Better Captions. | arXiv | Visual | |
SOLO | SOLO: A Single Transformer for Scalable Vision-Language Modeling. | arXiv | Visual | |
Video-CCAM | Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. | Visual | ||
Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | Visual | |
VideoLLaMA 2 | Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. | arXiv | Visual | |
Video-MME | The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. | arXiv | Visual | |
Vitron | A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | Visual | ||
VILA | VILA: On Pre-training for Visual Language Models. | arXiv | Visual |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
360DVD | Controllable Panorama Video Generation with 360-Degree Video Diffusion Model. | arXiv | Video | |
Animate-A-Story | Retrieval-Augmented Video Generation for Telling a Story. | arXiv | Video | |
Anything in Any Scene | Photorealistic Video Object Insertion. | Video | ||
ART•V | Auto-Regressive Text-to-Video Generation with Diffusion Models. | arXiv | Video | |
Assistive | Meet the generative video platform that brings your ideas to life. | Video | ||
AtomoVideo | High Fidelity Image-to-Video Generation. | arXiv | Video | |
BackgroundRemover | Background Remover lets you remove backgrounds from images and videos using AI through a simple, free, and open-source command-line interface. | Video | ||
Boximator | Generating Rich and Controllable Motions for Video Synthesis. | arXiv | Video | |
CoDeF | Content Deformation Fields for Temporally Consistent Video Processing. | arXiv | Video | |
CogVideo | Generate Videos from Text Descriptions. | Video | ||
CogVideoX | CogVideoX is an open-source video generation model from the same family as Zhipu's Qingying (清影). | Video | ||
CogVLM | CogVLM is a powerful open-source visual language model (VLM). | Visual | ||
CoNR | Generate vivid dancing videos from hand-drawn anime character sheets (ACS). | arXiv | Video | |
Decohere | Create what can't be filmed. | Video | ||
Descript | Descript is the simple, powerful, and fun way to edit. | Video | ||
Diffutoon | High-Resolution Editable Toon Shading via Diffusion Models. | arXiv | Video | |
dolphin | General video interaction platform based on LLMs. | Video | ||
DomoAI | Amplify Your Creativity with DomoAI. | Video | ||
DreamCinema | DreamCinema: Cinematic Transfer with Free Camera and 3D Character. | arXiv | Video | |
DynamiCrafter | Animating Open-domain Images with Video Diffusion Priors. | arXiv | Video | |
EDGE | We introduce EDGE, a powerful method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to arbitrary input music. | arXiv | Video | |
EMO | Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. | arXiv | Video | |
Emu Video | Factorizing Text-to-Video Generation by Explicit Image Conditioning. | Video | ||
Etna | Etna can generate corresponding video content based on short text descriptions. | Video | ||
Fairy | Fast Parallelized Instruction-Guided Video-to-Video Synthesis. | Video | ||
Follow-Your-Canvas | Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. | arXiv | Video | |
Follow Your Pose | Pose-Guided Text-to-Video Generation using Pose-Free Videos. | arXiv | Video | |
FullJourney | Your complete suite of AI Creation tools at your fingertips. | Video | ||
Gen-2 | A multi-modal AI system that can generate novel videos with text, images, or video clips. | Video | ||
Generative Dynamics | Generative Image Dynamics. | Video | ||
Genie | Generative Interactive Environments. | arXiv | Video | |
Genmo | Magically make videos with AI. | Video | ||
GenTron | Diffusion Transformers for Image and Video Generation. | Video | ||
HiGen | Hierarchical Spatio-temporal Decoupling for Text-to-Video generation. | Video | ||
Hotshot-XL | Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. | Video | ||
Imagen Video | Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. | Video | ||
InstructVideo | Instructing Video Diffusion Models with Human Feedback. | arXiv | Video | |
I2VGen-XL | High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. | arXiv | Video | |
LaVie | High-Quality Video Generation with Cascaded Latent Diffusion Models. | arXiv | Video | |
LTX Studio | LTX Studio is a holistic, AI-driven filmmaking platform for creators, marketers, filmmakers and studios. | Video | ||
Lumiere | A Space-Time Diffusion Model for Video Generation. | arXiv | Video | |
LVDM | Latent Video Diffusion Models for High-Fidelity Long Video Generation. | arXiv | Video | |
MagicVideo | Efficient Video Generation With Latent Diffusion Models. | arXiv | Video | |
MagicVideo-V2 | Multi-Stage High-Aesthetic Video Generation. | arXiv | Video | |
Magic Hour | AI Video for Creators made simple. | Video | ||
MAGVIT-v2 | Tokenizer is key to visual generation. | Video | ||
MAGVIT | Masked Generative Video Transformer. | Video | ||
Make-A-Video | Make-A-Video is a state-of-the-art AI system that generates videos from text. | arXiv | Video | |
Make Pixels Dance | High-Dynamic Video Generation. | arXiv | Video | |
Make-Your-Video | Customized Video Generation Using Textual and Structural Guidance. | arXiv | Video | |
MicroCinema | A Divide-and-Conquer Approach for Text-to-Video Generation. | arXiv | Video | |
MIMO | MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling. | arXiv | Video | |
Mini-Gemini | Mining the Potential of Multi-modality Vision Language Models. | Vision | ||
MobileVidFactory | Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. | Video | ||
Mochi 1 | Mochi 1 is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. | Video | ||
MOFA-Video | Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. | arXiv | Video | |
MoneyPrinterTurbo | Use large models to generate short videos with one click. | Video | ||
Moonvalley | Moonvalley is a groundbreaking new text-to-video generative AI model. | Video | ||
Mora | More like Sora for Generalist Video Generation. | arXiv | Video | |
Morph Studio | With our Text-to-Video AI Magic, manifest your creativity through your prompt. | Video | ||
MotionClone | MotionClone: Training-Free Motion Cloning for Controllable Video Generation. | arXiv | Video | |
MotionCtrl | A Unified and Flexible Motion Controller for Video Generation. | arXiv | Video | |
MotionDirector | Motion Customization of Text-to-Video Diffusion Models. | arXiv | Video | |
Motionshop | An application of replacing the characters in video with 3D avatars. | Video | ||
Mov2mov | Mov2mov plugin for Automatic1111/stable-diffusion-webui. | Video | ||
MovieFactory | Automatic Movie Creation from Text using Large Generative Models for Language and Images. | arXiv | Video | |
Neural Frames | Discover the synthesizer for the visual world. | Video | ||
NeverEnds | Create your world. | Video | ||
Open-Sora | Democratizing Efficient Video Production for All. | Video | ||
Open-Sora Plan | An open-source project that aims to reproduce OpenAI's Sora text-to-video model. | Video | ||
Phenaki | A model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes. | arXiv | Video | |
Pika Labs | Pika Labs is revolutionizing video-making experience with AI. | Video | ||
Pixeling | Pixeling empowers our customers to create highly precise, ultra-realistic, and extremely controllable visual content including images, videos and 3D models. | Video | ||
PixVerse | Create breath-taking videos with AI. | Video | ||
Pollinations | Creating gets easy, fast, and fun. | Video | ||
Reuse and Diffuse | Iterative Denoising for Text-to-Video Generation. | arXiv | Video | |
ShortGPT | An experimental AI framework for automated short/video content creation. | Video | ||
Show-1 | Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. | arXiv | Video | |
Snap Video | Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. | arXiv | Video | |
Sora | Creating video from text. | Video | ||
SoraWebui | SoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | Video | ||
StableVideo | Text-driven Consistency-aware Diffusion Video Editing. | Video | ||
Stable Video Diffusion | Stable Video Diffusion (SVD) Image-to-Video. | Video | ||
StoryDiffusion | Consistent Self-Attention for Long-Range Image and Video Generation. | arXiv | Video | |
StreamingT2V | Consistent, Dynamic, and Extendable Long Video Generation from Text. | arXiv | Video | |
StyleCrafter | Enhancing Stylized Text-to-Video Generation with Style Adapter. | arXiv | Video | |
TATS | Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. | Video | ||
Text2Video-Zero | Text-to-Image Diffusion Models are Zero-Shot Video Generators. | arXiv | Video | |
TF-T2V | A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. | arXiv | Video | |
Tora | Tora: Trajectory-oriented Diffusion Transformer for Video Generation. | arXiv | Video | |
Track-Anything | Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. | arXiv | Video | |
Tune-A-Video | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. | arXiv | Video | |
TwelveLabs | Multimodal AI that understands videos like humans. | Video | ||
UniVG | Towards UNIfied-modal Video Generation. | Video | ||
Vchitect-2.0 | Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. | Video | ||
VGen | A holistic video generation ecosystem for video generation building on diffusion models. | arXiv | Video | |
ViewCrafter | ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis. | arXiv | Video | |
Video-ChatGPT | Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. | arXiv | Video | |
VideoComposer | Compositional Video Synthesis with Motion Controllability. | arXiv | Video | |
VideoCrafter1 | Open Diffusion Models for High-Quality Video Generation. | arXiv | Video | |
VideoCrafter2 | Overcoming Data Limitations for High-Quality Video Diffusion Models. | arXiv | Video | |
VideoDrafter | Content-Consistent Multi-Scene Video Generation with LLM. | arXiv | Video | |
VideoElevator | Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. | arXiv | Video | |
VideoFactory | Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. | Video | ||
VideoGen | A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. | arXiv | Video | |
VideoLCM | Video Latent Consistency Model. | arXiv | Video | |
Video LDMs | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. | arXiv | Video | |
Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | Video | |
VideoMamba | State Space Model for Efficient Video Understanding. | arXiv | Video | |
Video-of-Thought | Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. | Video | ||
VideoPoet | A large language model for zero-shot video generation. | arXiv | Video | |
Vispunk Motion | Create realistic videos using just text. | Video | ||
VisualRWKV | VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | Visual | ||
V-JEPA | Video Joint Embedding Predictive Architecture. | arXiv | Video | |
W.A.L.T | Photorealistic Video Generation with Diffusion Models. | arXiv | Video | |
Zeroscope | Zeroscope Text-to-Video. | Video |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AcademiCodec | An Open Source Audio Codec Model for Academic Research. | Audio | ||
Amphion | An Open-Source Audio, Music, and Speech Generation Toolkit. | arXiv | Audio | |
ArchiSound | Audio generation using diffusion models, in PyTorch. | Audio | ||
Audiobox | Unified Audio Generation with Natural Language Prompts. | Audio | ||
AudioEditing | Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. | arXiv | Audio | |
Audiogen Codec | A low-compression 48 kHz stereo neural audio codec for general audio, optimized for audio fidelity 🎵. | Audio | ||
AudioGPT | Understanding and Generating Speech, Music, Sound, and Talking Head. | arXiv | Audio | |
AudioLCM | Text-to-Audio Generation with Latent Consistency Models. | arXiv | Audio | |
AudioLDM | Text-to-Audio Generation with Latent Diffusion Models. | arXiv | Audio | |
AudioLDM 2 | Learning Holistic Audio Generation with Self-supervised Pretraining. | arXiv | Audio | |
Auffusion | Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. | arXiv | Audio | |
CTAG | Creative Text-to-Audio Generation via Synthesizer Programming. | Audio | ||
FoleyCrafter | FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. | arXiv | Audio | |
MAGNeT | Masked Audio Generation using a Single Non-Autoregressive Transformer. | Audio | ||
Make-An-Audio | Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. | arXiv | Audio | |
Make-An-Audio 3 | Transforming Text into Audio via Flow-based Large Diffusion Transformers. | arXiv | Audio | |
NeuralSound | Learning-based Modal Sound Synthesis with Acoustic Transfer. | arXiv | Audio | |
OptimizerAI | Sounds for Creators, Game makers, Artists, Video makers. | Audio | ||
Qwen2-Audio | Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. | arXiv | Audio | |
SEE-2-SOUND | Zero-Shot Spatial Environment-to-Spatial Sound. | arXiv | Audio | |
SoundStorm | Efficient Parallel Audio Generation. | arXiv | Audio | |
Stable Audio | Fast Timing-Conditioned Latent Audio Diffusion. | Audio | ||
Stable Audio Open | Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | Audio | ||
SyncFusion | SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. | arXiv | Audio | |
TANGO | Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | Audio | ||
VTA-LDM | Video-to-Audio Generation with Hidden Alignment. | arXiv | Audio | |
WavJourney | Compositional Audio Creation with Large Language Models. | arXiv | Audio |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
AIVA | The Artificial Intelligence composing emotional soundtrack music. | Music | ||
Amper Music | Custom music generation technology powered by Amper. | Music | ||
Boomy | Create generative music. Share it with the world. | Music | ||
ChatMusician | Fostering Intrinsic Musical Abilities Into LLM. | Music | ||
Chord2Melody | Automatic Music Generation AI. | Music | ||
Diff-BGM | A Diffusion Model for Video Background Music Generation. | arXiv | Music | |
FluxMusic | FluxMusic: Text-to-Music Generation with Rectified Flow Transformer. | arXiv | Music | |
GPTAbleton | Draft script that processes GPT responses and sends MIDI notes into Ableton clips via AbletonOSC and python-osc. | Music | ||
HeyMusic.AI | AI Music Generator. | Music | ||
Image to Music | AI Image to Music Generator is a tool that uses artificial intelligence to convert images into music. | Music | ||
JEN-1 | Text-Guided Universal Music Generation with Omnidirectional Diffusion Models. | Music | ||
Jukebox | A Generative Model for Music. | arXiv | Music | |
Magenta | Magenta is a research project exploring the role of machine learning in the process of creating art and music. | Music | ||
MeLoDy | Efficient Neural Music Generation. | Music | ||
Mubert | AI Generative Music. | Music | ||
MuseNet | A deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. | Music | ||
MusicGen | Simple and Controllable Music Generation. | arXiv | Music | |
MusicLDM | Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies. | arXiv | Music | |
MusicLM | Generating Music From Text. | arXiv | Music | |
Riffusion App | Riffusion is an app for real-time music generation with stable diffusion. | Music | ||
Sonauto | Sonauto is an AI music editor that turns prompts, lyrics, or melodies into full songs in any style. | Music | ||
SoundRaw | AI music generator for creators. | Music | ||
Soundry AI | Generative AI tools including text-to-sound and infinite sample packs. | Music |
Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
DiffSinger | Singing Voice Synthesis via Shallow Diffusion Mechanism. | arXiv | Singing Voice | |
Retrieval-based-Voice-Conversion-WebUI | An easy-to-use SVC framework based on VITS. | Singing Voice | ||
so-vits-svc | SoftVC VITS Singing Voice Conversion. | Singing Voice | ||
VI-SVS | Singing voice synthesis built with VITS and Opencpop; differs from VISinger. | Singing Voice |

Source | Description | Paper | Game Engine | Type |
---|---|---|---|---|
Applio | Ultimate voice cloning tool, meticulously optimized for unrivaled power, modularity, and user-friendly experience. | | | Speech |
Audyo | Text in. Audio out. | | | Speech |
Bark | Text-Prompted Generative Audio Model. | | | Speech |
Bert-VITS2 | VITS2 backbone with multilingual BERT. | | | Speech |
ChatTTS | ChatTTS is a generative speech model for daily dialogue. | | | Speech |
CLAPSpeech | Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. | arXiv | | Speech |
CosyVoice | Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. | | | Speech |
DEX-TTS | Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. | arXiv | | Speech |
EmotiVoice | A Multi-Voice and Prompt-Controlled TTS Engine. | | | Speech |
Fliki | Turn text into videos with AI voices. | | | Speech |
GLM-4-Voice | GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. It can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. | | | Speech |
Glow-TTS | A Generative Flow for Text-to-Speech via Monotonic Alignment Search. | arXiv | | Speech |
GPT-SoVITS | A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI. | | | Speech |
LOVO | LOVO is the go-to AI Voice Generator & Text to Speech platform for thousands of creators. | | | Speech |
MahaTTS | An Open-Source Large Speech Generation Model. | | | Speech |
Matcha-TTS | A fast TTS architecture with conditional flow matching. | arXiv | | Speech |
MeloTTS | High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. | | | Speech |
MetaVoice-1B | AI for human-level speech intelligence. | | | Speech |
Mini-Omni | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation. | arXiv | | Speech |
Narakeet | Easily Create Voiceovers Using Realistic Text to Speech. | | | Speech |
One-Shot-Voice-Cloning | One-shot voice cloning based on Unet-TTS. | | | Speech |
OpenVoice | Instant voice cloning by MyShell. | | | Speech |
OverFlow | Putting flows on top of neural transducers for better TTS. | | | Speech |
RealtimeTTS | RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
SenseVoice | SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). | | | Speech |
SpeechGPT | Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. | arXiv | | Speech |
speech-to-text-gpt3-unity | A repo that uses OpenAI's Whisper and ChatGPT APIs in Unity. | | Unity | Speech |
Stable Speech | Stability AI's Text-to-Speech model. | | | Speech |
StableTTS | Next-generation TTS model using flow matching and DiT, inspired by Stable Diffusion 3. | | | Speech |
StyleTTS 2 | Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | arXiv | | Speech |
tortoise.cpp | GGML implementation of tortoise-tts. | | | Speech |
TorToiSe-TTS | A multi-voice TTS system trained with an emphasis on quality. | | | Speech |
TTS Generation WebUI | TTS Generation WebUI (Bark, MusicGen, Tortoise, RVC, Vocos, Demucs). | | | Speech |
VALL-E | Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. | arXiv | | Speech |
VALL-E X | Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling. | arXiv | | Speech |
Vocode | Vocode is an open-source library for building voice-based LLM applications. | | | Speech |
Voicebox | Text-Guided Multilingual Universal Speech Generation at Scale. | arXiv | | Speech |
VoiceCraft | Zero-Shot Speech Editing and Text-to-Speech in the Wild. | | | Speech |
Whisper | Whisper is a general-purpose speech recognition model. | | | Speech |
WhisperSpeech | An open-source text-to-speech system built by inverting Whisper. | | | Speech |
X-E-Speech | Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. | | | Speech |
XTTS | XTTS is a library for advanced Text-to-Speech generation. | | | Speech |
YourTTS | Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. | arXiv | | Speech |
ZMM-TTS | Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. | arXiv | | Speech |

Source | Description | Game Engine | Type |
---|---|---|---|
Ludo.ai | Assistant for game research and design. | | Analytics |