CoverGAN

CoverGAN is a set of tools and machine learning models designed to generate good-looking album covers based on users' audio tracks and emotions. Resulting covers are generated in vector graphics format (SVG).

Available emotions:

Anger
Comfortable
Fear
Funny
Happy
Inspirational
Joy
Lonely
Nostalgic
Passionate
Quiet
Relaxed
Romantic
Sadness
Serious
Soulful
Surprise
Sweet
Wary
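
The emotion names above can be checked before they are passed to the tools. The following is a minimal sketch, not part of the repository: the helper names `normalize_emotions` and `emotions_argument` are hypothetical, and the lowercase, comma-separated spelling is an assumption based only on the `--emotions=joy,relaxed` example in the Local testing section.

```python
# Emotion names copied from the list above.
AVAILABLE_EMOTIONS = [
    "Anger", "Comfortable", "Fear", "Funny", "Happy", "Inspirational",
    "Joy", "Lonely", "Nostalgic", "Passionate", "Quiet", "Relaxed",
    "Romantic", "Sadness", "Serious", "Soulful", "Surprise", "Sweet", "Wary",
]


def normalize_emotions(names):
    """Return lowercase emotion names, rejecting any that are not listed above."""
    known = {emotion.lower() for emotion in AVAILABLE_EMOTIONS}
    result = []
    for name in names:
        key = name.strip().lower()
        if key not in known:
            raise ValueError(f"Unknown emotion: {name!r}")
        if key not in result:  # drop duplicates while keeping the order
            result.append(key)
    return result


def emotions_argument(names):
    """Join emotions with commas (assumed form, as in --emotions=joy,relaxed)."""
    return ",".join(normalize_emotions(names))
```

For example, `emotions_argument(["Joy", "Relaxed"])` produces the string used in the example command.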

The service is available at http://81.3.154.178:5001/covergan.

Service functionality

Generation of music covers by analyzing music and emotions
Several GAN models
SVG format
Possibility of rasterization
Insertion of readable captions
A large number of different fonts
Insertion of different color filters
SVG editor
Convenient change of colors
Style transfer from provided image
Saving images in any resolution

Weights

The pretrained weights can be downloaded from here.
Place them in the ./weights folder.

Training

See this README for training details.

Testing using Docker

Two types of generator are available in this service:

The first creates covers with abstract lines.
The second draws closed forms.

Captions can be placed on the cover using one of two algorithms:

The first uses the captioner model.
The second is a deterministic algorithm that searches for a suitable location.

The service uses pretrained weights. See the Weights section.

Building

Specify the PyTorch version to install in the Dockerfile.
Build the image by running the docker_build_covergan_service.sh script.

Running

Start the container by running the docker_run_covergan_service.sh script.

Testing

Go to http://localhost:5001 in the browser and enjoy!
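
Before opening the browser, it can help to confirm that the container is answering. The following is a minimal sketch using only the Python standard library; the function name `service_is_up` is hypothetical, and it only checks that a plain GET to the address above succeeds, not that cover generation works.

```python
import urllib.error
import urllib.request


def service_is_up(url="http://localhost:5001", timeout=5.0):
    """Return True if an HTTP GET to `url` succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


# Usage: print("service reachable:", service_is_up())
```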

Local testing

Install dependencies

Install a suitable PyTorch version, for example: pip install torch torchvision torchaudio
Install DiffVG
Install dependencies from this file

Running

Run

python3 ./eval.py \
  --audio_file="test.mp3" \
  --emotions=joy,relaxed \
  --track_artist="Cool Band" \
  --track_name="New Song"

By default, the resulting .svg covers are saved to the ./gen_samples folder.
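
The same command can be driven from Python, for instance to process several tracks in a loop. This is a minimal sketch under stated assumptions: it is run from the repository root, it uses only the flags shown in the command above, and it assumes the generated files sit directly inside ./gen_samples with an .svg extension. The function names `build_eval_command` and `generate_covers` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(audio_file, emotions, track_artist, track_name):
    """Build the argument list for eval.py using only the flags shown above."""
    return [
        sys.executable, "./eval.py",
        f"--audio_file={audio_file}",
        f"--emotions={','.join(emotions)}",
        f"--track_artist={track_artist}",
        f"--track_name={track_name}",
    ]


def generate_covers(audio_file, emotions, track_artist, track_name,
                    output_dir="./gen_samples"):
    """Run eval.py, then list the .svg files found in the output folder."""
    command = build_eval_command(audio_file, emotions, track_artist, track_name)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    # Assumption: covers are written directly into output_dir as *.svg.
    return sorted(Path(output_dir).glob("*.svg"))
```

Passing the arguments as a list avoids shell quoting problems with artist or track names that contain spaces.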

Examples of generated covers

See this examples folder.

Contents

captions/: a network that predicts aesthetically matching colors and positions for the captions (artist and track names).
colorer/: a network that predicts palettes for music covers.
docs/: folder with instructions on how to start training or testing models.
examples/: folder with simple music tracks, their generated covers, and with examples of original and clean datasets.
fonts/: folder with fonts downloaded from Google Fonts.
outer/: the primary GAN that generates vector graphics descriptions from audio files and user-specified emotions.
utils/: independent pieces of functionality, separated out for convenient reuse.
weights/: folder where the best models are saved.
captioner_train.py: an entry point to trigger the Captioner network training.
covergan_train.py: an entry point to trigger the CoverGAN training.
eval.py: an entry point to trigger the primary flow as a command line tool.
service.py: the primary code flow for album cover generation.

Default structure of dataset folder:

audio/: default folder with music tracks (.flac or .mp3 format) for CoverGAN training.
checkpoint/: default folder where checkpoints and other intermediate files are stored while training the CoverGAN and Captioner networks.
clean_covers/: default folder with covers from which the captions were removed.
original_covers/: default folder with original covers.
plots/: folder where intermediate plots are saved during training.
emotions.json: file with emotion markup for the training dataset.
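
The layout above can be verified before training starts. This is a minimal sketch, not part of the repository: the function name `check_dataset_layout` is hypothetical, and treating every listed entry as required up front is an assumption, since checkpoint/ and plots/ may well be created by the training scripts themselves.

```python
from pathlib import Path

# Entries copied from the default dataset layout listed above.
EXPECTED_DIRS = ["audio", "checkpoint", "clean_covers", "original_covers", "plots"]
EXPECTED_FILES = ["emotions.json"]
AUDIO_SUFFIXES = {".flac", ".mp3"}


def check_dataset_layout(root):
    """Return a list of readable problems; an empty list means none were found."""
    root = Path(root)
    problems = []
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}/")
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    audio_dir = root / "audio"
    if audio_dir.is_dir():
        # Only .flac and .mp3 tracks are mentioned for CoverGAN training.
        tracks = [p for p in audio_dir.iterdir() if p.suffix.lower() in AUDIO_SUFFIXES]
        if not tracks:
            problems.append("audio/ contains no .flac or .mp3 tracks")
    return problems
```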

Dependencies

The machine learning models rely on the popular PyTorch framework.
Differentiable vector graphics rendering is provided by diffvg, which needs to be built from source.
Audio feature extraction is based on Essentia; prebuilt pip packages are available.
Other Python library dependencies include Pillow, Matplotlib, SciPy, and Kornia.
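
A quick way to see which of these dependencies are still missing from the current environment is sketched below. The import names are assumptions: diffvg is commonly imported as `pydiffvg` and Pillow as `PIL`, while the others are assumed to match their package names. The function name `missing_modules` is hypothetical.

```python
import importlib.util

# Assumed import names for the dependencies listed above.
REQUIRED_MODULES = [
    "torch", "pydiffvg", "essentia", "PIL", "matplotlib", "scipy", "kornia",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Usage: print("missing:", missing_modules())
```

The check only locates the modules without importing them, so it does not confirm that diffvg was built correctly.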

Dataset

The full dataset consists of:

Audio tracks
Original covers
Cleaned covers
Fonts
Marked up emotions
Marked up rectangles for captioner model training

The dataset can be downloaded from here

Training using Docker with GPU

Build the image by running docker_build.sh.
See these docs for more details about the options available when training the networks.
Specify the training command in covergan_training_command.sh.
Start the container by running docker_run.sh.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

IzhanVarsky/CoverGAN
