The hybrid architecture is based on the idea that we could simply apply a GAN method (GANSpace) to another GAN model (GANSynth).

GANSpaceSynth

GANSpaceSynth is a hybrid architecture for audio synthesis with deep neural networks that applies the GANSpace method to the GANSynth model. Using dimensionality reduction (PCA), we organise the latent space of trained GANSynth models, obtaining controls for exploring the range of sounds that can be synthesised.

To perform GANSpace on GANSynth, we feed a large number of random latent vectors to the model and record the activations on an early convolutional layer of the network. We then use incremental PCA to compute the most significant directions in the activation space, as well as the global mean and standard deviation along each direction. When synthesizing, we give a coefficient for each direction, denoting how far to move along that direction (scaled by the standard deviation) starting from the global mean.
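The procedure above can be sketched in a few lines with scikit-learn's `IncrementalPCA`. This is a minimal illustration, not the actual implementation: `conv0_activations` is a hypothetical stand-in (a fixed random linear map) for running latent vectors through the GANSynth generator and capturing the `conv0` output, and all shapes are assumptions.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the GANSynth generator up to conv0:
# a fixed random linear map from a 256-d latent to a 512-d activation.
W = rng.standard_normal((256, 512)) / np.sqrt(256)

def conv0_activations(z):
    return z @ W

# Feed batches of random latent vectors and fit PCA incrementally.
pca = IncrementalPCA(n_components=8)
for _ in range(64):
    z = rng.standard_normal((256, 256))  # batch of random latents
    pca.partial_fit(conv0_activations(z))

directions = pca.components_             # principal directions, shape (8, 512)
mean = pca.mean_                         # global mean activation, shape (512,)
stds = np.sqrt(pca.explained_variance_)  # std deviation along each direction

# Synthesis-time edit: start from the global mean and move along each
# direction by coefficient * std; the forward pass would then continue
# from this edited conv0 activation.
coeffs = np.array([1.0, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
edited = mean + (coeffs * stds) @ directions
```

The incremental variant matters here because the activation set (millions of samples) is far too large to hold in memory for a single batch PCA fit.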

Examples

The diagram below shows spectrograms for sounds synthesized by a GANSpaceSynth model trained on recordings of "KET conversations" by Thomas Bjelkeborn and Koray Tahiroğlu. GANSpace was applied on the first convolutional layer conv0 with 4,194,304 random input samples. The coordinates indicate the position along the first two normalized GANSpace coordinates at which synthesis was performed.

Spectrograms

Below are links to the corresponding audio files, along with brief descriptions of the perceived audio characteristics.

| second \ first | -1 | -0.5 | 0 | 0.5 | 1 |
| --- | --- | --- | --- | --- | --- |
| -1 | muffled texture | | bright texture | | airy texture |
| -0.5 | | | | | |
| 0 | quiet muffled texture | | quiet bright texture | | quiet airy texture with whistling |
| 0.5 | | | | | |
| 1 | very quiet muffled texture | | very quiet bright texture | | near silence with quiet whistling |

Performance

Generation performance is essentially identical to GANSynth's. Computing the linear combination of PCA directions adds some overhead, but it is marginal compared to generation itself.

Using our test PC with

  • Intel Core i7-10170U CPU,
  • NVIDIA GTX 1080 Ti GPU (connected via Thunderbolt 3),

we are able to generate a batch of eight 4-second, 16000 Hz samples in

  • 1.2 seconds on CPU (generation rate 107 kHz),
  • 65 milliseconds on GPU (generation rate 2000 kHz).

Hallu

As an example application of GANSpaceSynth, we include the Hallu composition tool (hallu.pd). It allows composing pieces by defining paths through the latent space, interpolating between them, and stitching together audio samples synthesized at each point.

The image below shows how Hallu provides a controlled structure for interpolating between points in latent space.
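Hallu itself is a Pure Data patch (hallu.pd), but the path idea it builds on can be sketched in Python. The function below is a hypothetical illustration of the interpolation step: given a list of latent-space points, it produces the sequence of intermediate points at which audio would be synthesized and then stitched together.

```python
import numpy as np

def interpolate_path(points, steps_per_segment):
    """Linearly interpolate a path through a list of latent-space points."""
    points = [np.asarray(p, dtype=float) for p in points]
    path = []
    for a, b in zip(points[:-1], points[1:]):
        # endpoint=False avoids duplicating each segment's end point,
        # which is the next segment's start point.
        for t in np.linspace(0.0, 1.0, steps_per_segment, endpoint=False):
            path.append((1 - t) * a + t * b)
    path.append(points[-1])
    return np.stack(path)

# Example: a path through three points in a 2-D GANSpace coordinate plane.
path = interpolate_path([(-1, -1), (0, 0), (1, 1)], steps_per_segment=4)
# Each row of `path` would be synthesized in turn, and the resulting
# audio clips concatenated to form the composition.
```

In practice the points would live in the higher-dimensional GANSpace coordinate system rather than a 2-D plane; the stitching of synthesized clips is handled by the patch.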

Hallu

An example Hallu composition, created using a model trained by Oskar Koli on a dataset of ambient music, can be found here.

Organising the Latent Space Using Dimensionality Reduction

Another application of GANSpaceSynth is the AI-terity autonomous musical instrument (http://sopi.aalto.fi/alterity/) in which particular jump points in latent space are organised autonomously for real-time audio synthesis. The video linked here demonstrates the latent space exploration with GANSpaceSynth.

Installation

We have detailed tutorials for setting up GANSpaceSynth in a Conda environment. After finishing the tutorial, you can simply open the .pd files in this repository.

To run GANSpaceSynth, you will need:

Citation

@inproceedings{koray_tahiroglu_2021_5137902,
  author       = {Koray Tahiroğlu and
                  Miranda Kastemaa and
                  Oskar Koli},
  title        = {GANSpaceSynth: A Hybrid Generative Adversarial Network Architecture for Organising the Latent Space using a Dimensionality Reduction for Real-Time Audio Synthesis},
  booktitle    = {{Proceedings of the 2nd Joint Conference on AI 
                   Music Creativity}},
  year         = 2021,
  pages        = 10,
  publisher    = {AIMC},
  address      = {Online},
  month        = jul,
  venue        = {Online},
  doi          = {10.5281/zenodo.5137902},
  url          = {https://doi.org/10.5281/zenodo.5137902}
}

License

GANSpaceSynth, a hybrid GAN architecture using dimensionality reduction (PCA) to organise the latent space of trained GANSynth models, is provided by the SOPI research group, Aalto University School of Arts, Design and Architecture, 2021, under the GNU General Public License version 2. We make use of the following open source projects:

  • pyext/flext (GPL v2)
  • Magenta (Apache 2.0)

Copyright (C) 2021 SOPI Research Group, Aalto University


This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. For further information, please contact Koray Tahiroğlu; email: koray.tahiroglu@aalto.fi, mail: Aalto University School of Arts, Design and Architecture, Department of Media, room J101, Väre, Otaniementie 14, 02150 Espoo, Finland.

