# **GANSynth**

GANSynth is a state-of-the-art method for synthesizing high-fidelity and locally coherent audio using Generative Adversarial Networks (GANs). Hence the name GANSynth (GAN used for audio Synthesis).

Autoregressive models such as WaveNet generate audio sequentially, one sample at a time. In contrast, GANSynth generates the whole sequence in parallel, so on a GPU it synthesizes audio much faster than real time. It generates the entire audio clip from a single latent vector, which makes it easier to disentangle global features such as pitch and timbre (tone quality). It uses a progressive GAN architecture. Earlier GANs already offered global latent conditioning and efficient parallel sampling, but they struggled to synthesize locally coherent audio waveforms; GANSynth addresses this drawback.

Are you interested in understanding the detailed workings of GANSynth? Refer to this page before proceeding!

https://analyticsindiamag.com/hands-on-guide-to-gansynth-an-adversarial-neural-audio-synthesis-technique/

## **Practical Implementation of GANSynth**

Here’s a demonstration of how GANSynth learns to produce the musical notes of individual instruments contained in the NSynth dataset (a large, high-quality dataset of annotated notes). The GAN learns to use its latent space to represent different instrument timbres. The demo synthesizes audio from MIDI files and interpolates between different instruments.

Step-wise explanation of the code is as follows:

Set up the working directories, download the default MIDI files and install Magenta (an open-source Python library powered by TensorFlow)

In [None]:

# Set up working directories and default MIDI files, then install Magenta
print('Copying data from GCS...')
!rm -r /content/gansynth &>/dev/null
!mkdir /content/gansynth
!mkdir /content/gansynth/midi
!mkdir /content/gansynth/samples

# Get default MIDI (Bach Prelude)
# curl's -o option saves the downloaded file under the given local path.
!curl -o /content/gansynth/midi/bach.mid http://www.jsbach.net/midi/cs1-1pre.mid
MIDI_SONG_DEFAULT = '/content/gansynth/midi/bach.mid'
!curl -o /content/gansynth/midi/riff-default.mid http://storage.googleapis.com/magentadata/papers/gansynth/midi/arp.mid
MIDI_RIFF_DEFAULT = '/content/gansynth/midi/riff-default.mid'

!pip install -q -U magenta

Import required libraries and classes 

In [None]:

import os
import librosa
from magenta.models.nsynth.utils import load_audio
from magenta.models.gansynth.lib import flags as lib_flags
from magenta.models.gansynth.lib import generate_util as gu
from magenta.models.gansynth.lib import model as lib_model
from magenta.models.gansynth.lib import util
import matplotlib.pyplot as plt
import note_seq
from note_seq.notebook_utils import colab_play as play
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

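Define a function for uploading a MIDI file

In [None]:
# File IO
from google.colab import files  # Colab-only helper for browser uploads and downloads
#download = files.download

def upload():
  '''Upload a MIDI file and save it under /content/gansynth/midi.'''
  filemap = files.upload()
  file_list = []
  for key, value in filemap.items():  # dict.iteritems() exists only in Python 2
    fname = os.path.join('/content/gansynth/midi', key)
    with open(fname, 'wb') as f:  # uploaded content is bytes, so write in binary mode
      f.write(value)
      print('Writing {}'.format(fname))
    file_list.append(fname)
  return file_list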

Define global variables

In [None]:
# GLOBALS
CKPT_DIR = 'gs://magentadata/models/gansynth/acoustic_only'
output_dir = '/content/gansynth/samples'
BATCH_SIZE = 16
SR = 16000

Create an output directory if it does not exist

In [None]:
# Make an output directory if it doesn't exist
OUTPUT_DIR = util.expand_path(output_dir)
if not tf.gfile.Exists(OUTPUT_DIR):
  tf.gfile.MakeDirs(OUTPUT_DIR)
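
For a purely local path, the same "create it if missing" step can be sketched with the standard library alone. This is a minimal sketch, not Magenta's implementation: `ensure_dir` is a hypothetical helper, and unlike `tf.gfile` it does not understand `gs://` paths.

```python
import os

def ensure_dir(path):
    """Expand '~' and environment variables, then create the directory if needed."""
    expanded = os.path.expanduser(os.path.expandvars(path))
    os.makedirs(expanded, exist_ok=True)  # no error if the directory already exists
    return expanded
```

Calling `ensure_dir('/content/gansynth/samples')` twice in a row is safe, because `exist_ok=True` makes the second call a no-op.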

Load the model

In [None]:
# Load the model
tf.reset_default_graph()
flags = lib_flags.Flags({
    'batch_size_schedule': [BATCH_SIZE],
    'tfds_data_dir': "gs://tfds-data/datasets",
})
model = lib_model.Model.load_from_path(CKPT_DIR, flags)

Define a function for loading a MIDI file as a NoteSequence, keeping only the notes within a given pitch range

In [None]:

# Helper functions
def load_midi(midi_path, min_pitch=36, max_pitch=84):
  """Load midi as a notesequence."""
  midi_path = util.expand_path(midi_path)
  ns = note_seq.midi_file_to_sequence_proto(midi_path)
  pitches = np.array([n.pitch for n in ns.notes])
  velocities = np.array([n.velocity for n in ns.notes])
  start_times = np.array([n.start_time for n in ns.notes])
  end_times = np.array([n.end_time for n in ns.notes])
  valid = np.logical_and(pitches >= min_pitch, pitches <= max_pitch)
  notes = {'pitches': pitches[valid],
           'velocities': velocities[valid],
           'start_times': start_times[valid],
           'end_times': end_times[valid]}
  return ns, notes
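
The pitch filter in `load_midi` keeps only the notes whose MIDI pitch lies in `[min_pitch, max_pitch]` and applies the same mask to every per-note array, so the arrays stay aligned. A minimal pure-Python sketch of that idea follows; the note values are made up and `filter_by_pitch` is a hypothetical helper used only for illustration.

```python
def filter_by_pitch(notes, min_pitch=36, max_pitch=84):
    """Keep only the notes whose pitch is inside [min_pitch, max_pitch]."""
    keep = [min_pitch <= p <= max_pitch for p in notes['pitches']]
    # Apply the same mask to every field so the lists stay index-aligned.
    return {name: [v for v, k in zip(values, keep) if k]
            for name, values in notes.items()}

# Made-up example: pitch 30 is below the range and pitch 90 is above it.
example = {'pitches': [30, 60, 90, 72],
           'velocities': [100, 80, 64, 127],
           'start_times': [0.0, 0.5, 1.0, 1.5],
           'end_times': [0.5, 1.0, 1.5, 2.0]}
filtered = filter_by_pitch(example)
print(filtered['pitches'])  # [60, 72]
```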

Create an attack, sustain and release amplitude envelope (these are the stages of an envelope generator)

‘Attack’ is the part of the envelope that represents the time the amplitude takes to reach its peak. ‘Sustain’ is the duration for which the sound is held before it fades out. ‘Release’ is the final reduction in amplitude over time.

In [None]:
def get_envelope(t_note_length, t_attack=0.010, t_release=0.3, sr=16000):
  """Create an attack sustain release amplitude envelope."""
  t_note_length = min(t_note_length, 3.0)
  i_attack = int(sr * t_attack)
  i_sustain = int(sr * t_note_length)
  i_release = int(sr * t_release)
  i_tot = i_sustain + i_release  # attack envelope doesn't add to sound length
  envelope = np.ones(i_tot)
  # Linear attack
  envelope[:i_attack] = np.linspace(0.0, 1.0, i_attack)
  # Linear release
  envelope[i_sustain:i_tot] = np.linspace(1.0, 0.0, i_release)
  return envelope
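
To make the stage lengths concrete, the sketch below repeats only the sample-count arithmetic of `get_envelope` for a half-second note at 16 kHz. `envelope_sample_counts` is a hypothetical helper written for this illustration; it does not build the envelope itself.

```python
def envelope_sample_counts(t_note_length, t_attack=0.010, t_release=0.3, sr=16000):
    """Return the number of samples in each envelope stage (mirrors get_envelope)."""
    t_note_length = min(t_note_length, 3.0)  # notes are capped at 3 seconds
    i_attack = int(sr * t_attack)
    i_sustain = int(sr * t_note_length)
    i_release = int(sr * t_release)
    # The attack ramp overlaps the start of the sustain, so it adds no length.
    return {'attack': i_attack, 'sustain': i_sustain, 'release': i_release,
            'total': i_sustain + i_release}

print(envelope_sample_counts(0.5))
# {'attack': 160, 'sustain': 8000, 'release': 4800, 'total': 12800}
```

A 10-second note is capped at 3 seconds, so its envelope is 48000 + 4800 = 52800 samples long.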

Define a function to combine multiple notes into a single audio clip.

In [None]:
def combine_notes(audio_notes, start_times, end_times, velocities, sr=16000):
  """Combine audio from multiple notes into a single audio clip.

  Args:
    audio_notes: Array of audio [n_notes, audio_samples].
    start_times: Array of note starts in seconds [n_notes].
    end_times: Array of note ends in seconds [n_notes].
    velocities: Array of MIDI note velocities (0-127) [n_notes].
    sr: Integer, sample rate.

  Returns:
    audio_clip: Array of combined audio clip [audio_samples]
  """
  n_notes = len(audio_notes)
  clip_length = end_times.max() + 3.0
  audio_clip = np.zeros(int(clip_length) * sr)

  for t_start, t_end, vel, i in zip(start_times, end_times, velocities, range(n_notes)):
    # Generate an amplitude envelope
    t_note_length = t_end - t_start
    envelope = get_envelope(t_note_length, sr=sr)  # use the same sample rate as the clip
    length = len(envelope)
    audio_note = audio_notes[i, :length] * envelope
    # Normalize
    audio_note /= audio_note.max()
    audio_note *= (vel / 127.0)
    # Add to clip buffer
    clip_start = int(t_start * sr)
    clip_end = clip_start + length
    audio_clip[clip_start:clip_end] += audio_note

  # Normalize
  audio_clip /= audio_clip.max()
  audio_clip /= 2.0
  return audio_clip
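
The core of `combine_notes` is an overlap-add: each note is normalized, scaled by its MIDI velocity and summed into the clip buffer at the sample index matching its start time. The pure-Python sketch below illustrates just that step on a tiny made-up example. `place_note` is a hypothetical helper, and two details are assumptions that differ from the function above: it normalizes by the absolute peak and it guards against silent notes.

```python
def place_note(buffer, note, t_start, velocity, sr=16000):
    """Add one velocity-scaled note into buffer, starting at t_start seconds."""
    peak = max(abs(x) for x in note) or 1.0  # assumption: avoid dividing by zero
    gain = velocity / 127.0                  # MIDI velocity 0..127 -> gain 0..1
    start = int(t_start * sr)
    for k, x in enumerate(note):
        if start + k < len(buffer):          # drop samples past the end of the clip
            buffer[start + k] += x / peak * gain
    return buffer

# Tiny made-up example at sr=4 so the indices are easy to follow.
clip = [0.0] * 8
place_note(clip, [0.5, 1.0], t_start=0.5, velocity=127, sr=4)   # lands at indices 2..3
place_note(clip, [1.0, 1.0], t_start=0.75, velocity=127, sr=4)  # lands at indices 3..4
print(clip)  # [0.0, 0.0, 0.5, 2.0, 1.0, 0.0, 0.0, 0.0]
```

Index 3 receives samples from both notes and exceeds 1.0, which is why the full function normalizes the whole clip once more at the end.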

Define a function to plot spectrogram

In [None]:
# Plotting tools
def specplot(audio_clip):
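  p_min = 36  # lowest MIDI pitch shown (matches load_midi's default min_pitch)
  p_max = 84  # highest MIDI pitch shown (matches load_midi's default max_pitch)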
  f_min = librosa.midi_to_hz(p_min)
  f_max = 2 * librosa.midi_to_hz(p_max)
  octaves = int(np.ceil(np.log2(f_max) - np.log2(f_min)))
  bins_per_octave = 36
  n_bins = int(bins_per_octave * octaves)
  C = librosa.cqt(audio_clip, sr=SR, hop_length=2048, fmin=f_min, n_bins=n_bins, bins_per_octave=bins_per_octave)
  power = 10 * np.log10(np.abs(C)**2 + 1e-6)
  plt.matshow(power[::-1, 2:-2], aspect='auto', cmap=plt.cm.magma)
  plt.yticks([])
  plt.xticks([])

print('And...... Done!')
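
The frequency range in `specplot` follows from the MIDI pitch limits. The sketch below redoes that arithmetic with the standard equal-temperament formula (A4 = MIDI 69 = 440 Hz), which is assumed here to match what `librosa.midi_to_hz` computes; the local `midi_to_hz` is a stand-in written for this illustration.

```python
import math

def midi_to_hz(pitch):
    """Standard equal-temperament conversion: A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

p_min, p_max = 36, 84
f_min = midi_to_hz(p_min)        # about 65.4 Hz
f_max = 2 * midi_to_hz(p_max)    # one octave above the highest pitch, about 2093 Hz
octave_span = math.log2(f_max / f_min)
bins_per_octave = 36

print(round(f_min, 1), round(f_max, 1), round(octave_span, 6))
print(bins_per_octave * round(octave_span))  # nominal number of CQT bins
```

Pitches 36 to 84 span four octaves, and doubling the top frequency adds a fifth, so the nominal bin count is 36 × 5 = 180. Because `specplot` applies `np.ceil` to a difference of two logarithms, floating-point rounding could in principle push the result up by one octave.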

Choose the Interpolation

These cells allow you to choose two latent vectors and interpolate between them over a MIDI clip.

Choose the MIDI file

Choose the default MIDI file downloaded earlier, or upload a file of your own, as follows:

In [None]:
midi_file = "Arpeggio (Default)"

midi_path = MIDI_RIFF_DEFAULT
if midi_file == "Upload your own":
  try:
    file_list = upload()
    midi_path = file_list[0]
    ns, notes_2 = load_midi(midi_path)
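  except Exception as e:
    # Fall back to the default file so that `ns` and `notes_2` are always defined
    print('Upload cancelled ({}); using the default MIDI file'.format(e))
    midi_path = MIDI_RIFF_DEFAULT
    ns, notes_2 = load_midi(midi_path)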
else:
  # Load Default, but slow it down 30%
  ns, notes_2 = load_midi(midi_path)
  notes_2['start_times'] *= 1.3
  notes_2['end_times'] *= 1.3


print('Loaded {}'.format(midi_path))
note_seq.plot_sequence(ns)

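Choose some random instruments to interpolate between.

Here, ‘interpolation’ means blending smoothly between the latent vectors of different instruments, so that the timbre morphs from one instrument to the next over the course of the clip. The cell below samples random latent vectors and plays each one at a single preview pitch, so you can pick the instruments you want to use.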

In [None]:
number_of_random_instruments = 10
pitch_preview = 60
n_preview = number_of_random_instruments

pitches_preview = [pitch_preview] * n_preview
z_preview = model.generate_z(n_preview)

audio_notes = model.generate_samples_from_z(z_preview, pitches_preview)
for i, audio_note in enumerate(audio_notes):
  print("Instrument: {}".format(i))
  play(audio_note, sample_rate=16000)


Create a list of instruments to interpolate between

In [None]:
instruments = [0, 2, 4, 0]

Place each instrument at a relative point in time (from 0 to 1.0, as a fraction of the clip length)

In [None]:
times = [0, 0.3, 0.6, 1.0]

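Force the first and last instruments to the start and end of the clip (the first time is set slightly below 0 so that a note starting at exactly 0 falls after the first instrument)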

In [None]:
# Force endpoints
times[0] = -0.001
times[-1] = 1.0

Latent vectors of selected instruments

In [None]:
z_instruments = np.array([z_preview[i] for i in instruments])

Convert the relative instrument times into seconds by scaling them with the end time of the last note

In [None]:
t_instruments = np.array([notes_2['end_times'][-1] * t for t in times])
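
As a worked example of this scaling, the sketch below uses the relative times from the earlier cells and a made-up end time of 8 seconds for the last note; `last_note_end` is a hypothetical stand-in for `notes_2['end_times'][-1]`.

```python
times = [-0.001, 0.3, 0.6, 1.0]   # relative positions, endpoints already forced
last_note_end = 8.0               # made-up end time of the last note, in seconds

# Each instrument is anchored at a fraction of the clip's duration.
t_instruments = [last_note_end * t for t in times]
print(t_instruments)
```

The anchors come out at roughly -0.008 s, 2.4 s, 4.8 s and 8.0 s, so the second instrument is reached 2.4 seconds into the clip.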

Get interpolated latent vectors for each note

In [None]:
z_notes = gu.get_z_notes(notes_2['start_times'], z_instruments, t_instruments)
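
`gu.get_z_notes` is Magenta's helper and its internals are not shown here. The general idea, though, is piecewise interpolation: for each note, find the two instrument anchors that surround its start time and blend their latent vectors according to how far the note sits between them. The sketch below illustrates this with plain linear interpolation for readability; `interpolate_latents` is a hypothetical function, and Magenta's helper may use a different scheme, such as spherical interpolation.

```python
from bisect import bisect_left

def interpolate_latents(note_time, anchor_times, anchor_vectors):
    """Linearly blend the two anchor vectors that surround note_time."""
    # Index of the anchor just before note_time, clamped to a valid segment.
    idx = bisect_left(anchor_times, note_time) - 1
    idx = max(0, min(idx, len(anchor_times) - 2))
    t_left, t_right = anchor_times[idx], anchor_times[idx + 1]
    frac = (note_time - t_left) / (t_right - t_left)  # 0 at left anchor, 1 at right
    left, right = anchor_vectors[idx], anchor_vectors[idx + 1]
    return [(1.0 - frac) * a + frac * b for a, b in zip(left, right)]

# Made-up 2-D "latent vectors" for three instruments anchored at 0, 2 and 4 seconds.
anchor_times = [0.0, 2.0, 4.0]
anchor_vectors = [[0.0, 0.0], [2.0, 4.0], [4.0, 0.0]]
print(interpolate_latents(1.0, anchor_times, anchor_vectors))  # [1.0, 2.0]
print(interpolate_latents(3.0, anchor_times, anchor_vectors))  # [3.0, 2.0]
```

A note halfway between two anchors gets an equal mix of both instruments, which is what produces the gradual change in timbre across the clip.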


Generate audio for each note

In [None]:
# Generate audio for each note
print('Generating {} samples...'.format(len(z_notes)))
audio_notes = model.generate_samples_from_z(z_notes, notes_2['pitches'])

Combine the audio of all notes into a single audio clip

In [None]:
# Make a single audio clip
audio_clip = combine_notes(audio_notes,
                           notes_2['start_times'],
                           notes_2['end_times'],
                           notes_2['velocities'])

Play the synthesized audio

In [None]:
# Play the audio
print('\nAudio:')
play(audio_clip, sample_rate=SR)

Plot the spectrogram using specplot()

In [None]:
print('CQT Spectrogram:')
specplot(audio_clip) 

# **Related Articles:**

> * [GANSynth](https://analyticsindiamag.com/hands-on-guide-to-gansynth-an-adversarial-neural-audio-synthesis-technique/)

> * [Audio Visualization](https://analyticsindiamag.com/step-by-step-guide-to-audio-visualization-in-python/)

> * [VGG Sound Datasets](https://analyticsindiamag.com/guide-to-vgg-sound-datasets-for-visual-audio-recognition/)

> * [Voxceleb Datasets](https://analyticsindiamag.com/guide-to-voxceleb-datasets-for-visual-audio-of-human-speech/)

> * [FreeSound Datasets](https://analyticsindiamag.com/datasets-freesound-pytorch-research/)