Music generation merges artistic expression with complex musical theory, making it a captivating endeavor. While many methods exist, most are confined to fixed genres, rules, and templates, which limits compositional diversity. Our project aims to break these molds with a system that turns varied text inputs, from simple keywords to detailed phrases, into musical compositions. We use Stable Diffusion XL (SDXL), a diffusion model known for text-to-image generation, to produce detailed spectrogram images, fine-tuning it on the LAION-Audio-630K dataset, which spans a broad range of audio-text pairs. Our pipeline synthesizes spectrogram images from text prompts with the fine-tuned SDXL and then converts them to audio for playback. To assess the quality and originality of the generated music, we use cyanite.ai for music tagging and similarity analysis, and the Audio Quality Platform for detailed auditory evaluations. This is not just a music-generation exploration; it is a journey into a space where text prompts fuel unique musical creations.
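The final step above, turning a generated spectrogram back into playable audio, is typically done with Griffin-Lim phase reconstruction, as in the original Riffusion. The following is a minimal NumPy sketch of that idea; the STFT parameters, function names, and iteration count are illustrative assumptions, not the project's actual settings, and a real pipeline would use a tuned library implementation (e.g. `librosa` or `torchaudio`).

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform: (freq_bins, frames) complex array."""
    window = np.hanning(n_fft)
    frames = [np.fft.rfft(window * x[s:s + n_fft])
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames).T

def istft(S, n_fft=256, hop=64):
    """Inverse STFT via weighted overlap-add."""
    window = np.hanning(n_fft)
    n_frames = S.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    win_sum = np.zeros_like(out)
    for i in range(n_frames):
        frame = np.fft.irfft(S[:, i], n=n_fft)
        out[i * hop:i * hop + n_fft] += window * frame
        win_sum[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(win_sum, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=256, hop=64, seed=0):
    """Recover a waveform from a magnitude spectrogram.

    Starts from random phase, then alternates between the time domain
    and the STFT domain, re-imposing the target magnitude each pass.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        audio = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(audio, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

In the full pipeline, `mag` would come from decoding the SDXL-generated spectrogram image (pixel intensities mapped back from dB to linear magnitude) rather than from an STFT of real audio; Griffin-Lim then supplies the phase information the image does not carry.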
DavesEmployee/RiffusionXL
About
Riffusion using Stable Diffusion XL