# GAN Zeppelin
### This project uses a Generative Adversarial Network (GAN) to generate new five-second chunks of Led Zeppelin songs.
The discriminator was trained on Led Zeppelin's entire discography, split into five-second chunks. Unfortunately, I cannot include the data here, but you can rebuild the dataset if you already own the music, or create one from any other collection of songs.

Like many GANs, this project is meant to be fun and experimental. Most GANs that produce faithful recreations of their targets are trained on datasets with a large number of samples of relatively low complexity (for example, tens of thousands of 96x96 images). This dataset is far more complex: one second of audio at librosa's default sampling rate is a one-dimensional array of 22,050 floating-point values, so a single five-second chunk holds 110,250 values. To make the most of the data, the songs were split into smaller chunks, which increases the number of samples and reduces the size of each one. This still falls well short of what high-quality generation would likely require, but it is an insightful experiment, and the code can be reused to build GANs for other sound data projects.
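The chunking described above can be sketched in NumPy. This is a minimal illustration, not the project's loader: it assumes librosa's default 22,050 Hz sampling rate, discards the short tail of each song, and drops chunks in which 25% or more of the samples are exact zeros (the threshold used in the data-loading cell below). The function name `split_into_chunks` and the synthetic "song" are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 22050                        # librosa.load's default sampling rate
CLIP_SECONDS = 5
clip_length = SAMPLE_RATE * CLIP_SECONDS   # 110,250 samples per chunk

def split_into_chunks(signal, clip_length, max_zero_frac=0.25):
    """Cut a 1-D signal into full-length chunks and drop mostly silent ones."""
    n_full = len(signal) // clip_length                   # the short tail is discarded
    chunks = signal[: n_full * clip_length].reshape(n_full, clip_length)
    zero_frac = np.mean(chunks == 0, axis=1)              # fraction of exact zeros per chunk
    return chunks[zero_frac < max_zero_frac]

# Hypothetical 12-second "song": 2 s of digital silence, then 10 s of a 440 Hz tone
t = np.arange(10 * SAMPLE_RATE) / SAMPLE_RATE
song = np.concatenate([np.zeros(2 * SAMPLE_RATE), 0.5 * np.sin(2 * np.pi * 440 * t)])

chunks = split_into_chunks(song, clip_length)
print(chunks.shape)  # (1, 110250): the first chunk is 40% silence, the 2 s tail is too short
```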

Resources used to create this project:
- Keras DCGAN implementation example: https://www.tensorflow.org/tutorials/generative/dcgan
- jeffheaton GAN tutorial on GitHub: https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_07_2_Keras_gan.ipynb

```python
# Imports
import os
import time
import csv

import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras.layers import Reshape, Dropout, Dense, Flatten, BatchNormalization, Activation
from tensorflow.keras.layers import LeakyReLU, Conv1D
from tensorflow.keras.models import Sequential
```

```python
# Load the processed sound data if it already exists.
# Otherwise, read a folder of Led Zeppelin .wav files, load each song as a mono array,
# and split the songs into five-second chunks.

dest_file = 'Zeppelin_Chunks.csv'
clip_length = 22050 * 5  # librosa.load resamples to 22,050 Hz by default

try:
    with open(dest_file, newline='') as dest_f:
        reader = csv.reader(dest_f)
        X_train = np.array([row for row in reader], dtype=np.float32)

except FileNotFoundError:
    print(f"{dest_file} not found; building chunks from the .wav files")
    X_train_dir = 'C:/Users/Andy/Led_Zeppelin_WAV/'
    X_train = []

    def chunks(lst, n, dest):
        """Append successive n-sized chunks of lst to dest."""
        for i in range(0, len(lst), n):
            dest.append(list(lst[i:i + n]))

    for song in os.listdir(X_train_dir):
        if not song.lower().endswith('.wav'):
            continue  # skip anything that is not a .wav file
        songname = os.path.join(X_train_dir, song)
        # Only the first 360 seconds (6 minutes) of each song are loaded
        sample, sr = librosa.load(songname, mono=True, duration=360)
        chunks(sample, clip_length, X_train)

    # Drop the short final chunk of each song and chunks that are 25% or more silence (exact zeros).
    # Building a new list avoids deleting items from the list being iterated over.
    X_train = [c for c in X_train
               if len(c) == clip_length and c.count(0) / clip_length < 0.25]

    X_train = np.array(X_train, dtype=np.float32)
    # Samples lie in [-1, 1], so they must be saved as floats, not integers
    np.savetxt(dest_file, X_train, delimiter=',', fmt='%.6f')

np.random.shuffle(X_train)
```
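A text CSV with 110,250 columns per row is slow to parse and keeps only as much precision as its format string allows. One possible alternative for caching the processed chunks, not what this project does, is NumPy's binary `.npy` format, which stores the exact float32 values and the array shape (about 441 KB per five-second chunk). The helper names and file name below are hypothetical.

```python
import os
import tempfile

import numpy as np

def save_chunks(path, chunks):
    # .npy stores raw float32 values plus the array shape, so nothing is rounded
    np.save(path, chunks.astype(np.float32))

def load_chunks(path):
    return np.load(path)

# Four random stand-in chunks in [-1, 1], the same range as librosa audio
chunks = np.random.default_rng(0).uniform(-1, 1, size=(4, 110250)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Zeppelin_Chunks.npy")  # hypothetical cache file name
    save_chunks(path, chunks)
    restored = load_chunks(path)

print(restored.shape, np.array_equal(restored, chunks))  # (4, 110250) True
```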


```python
SEED_SIZE = 100  # length of the random noise vector (seed) fed to the generator

DATA_PATH = 'C:/Users/Andy/Documents/'  # where the trained generator and sample .wav are saved
EPOCHS = 100
BATCH_SIZE = 64
```

```python
# Format elapsed seconds as H:MM:SS.ss for the training log
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)
```

```python
# Define the generator

def build_generator(seed_size):  # Accepts the random seed as input
    model = Sequential()

    model.add(Dense(2*2*128, activation="relu", input_dim=seed_size))  # 512 units
    model.add(Reshape((1, 512)))  # a sequence of length 1 with 512 channels

    model.add(Conv1D(256, kernel_size=3, padding='same'))  # Conv1D is used because sound data is a 1D array
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())

    model.add(Conv1D(128, kernel_size=3, padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())

    model.add(Flatten())
    model.add(Dense(X_train[0].shape[0]))  # one output per audio sample in a chunk
    model.add(Activation("tanh"))          # keeps outputs in [-1, 1], the range of librosa audio

    return model
```
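In `build_generator`, `Reshape((1, 512))` produces a sequence of length 1, so each `kernel_size=3` Conv1D with `padding='same'` pads one zero on each side and only the kernel's center tap ever multiplies real data. The NumPy sketch below uses a hand-written, hypothetical `conv1d_same` that follows the cross-correlation Keras' Conv1D computes (no TensorFlow involved) to check that such a convolution equals a plain matrix product with the center slice of the kernel.

```python
import numpy as np

def conv1d_same(x, w, b):
    """Stride-1 cross-correlation with 'same' padding, as in Keras Conv1D.

    x: (length, in_channels), w: (kernel, in_channels, out_channels), b: (out_channels,)
    """
    k = w.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    x_pad = np.pad(x, ((pad_left, pad_right), (0, 0)))  # zero padding along time
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        window = x_pad[t:t + k]                            # (kernel, in_channels)
        out[t] = np.einsum('kc,kco->o', window, w) + b
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 512))        # the generator's (1, 512) feature map
w = rng.normal(size=(3, 512, 256))   # kernel_size=3, 512 -> 256 channels
b = rng.normal(size=(256,))

conv_out = conv1d_same(x, w, b)
dense_out = x @ w[1] + b             # only the center tap touches real data

print(conv_out.shape, np.allclose(conv_out, dense_out))  # (1, 256) True
```

Because the two outer taps only ever see padding zeros, their weights also receive zero gradient during training.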

```python
# Define the discriminator

def build_discriminator(clip_length):  # Accepts real Zeppelin chunks and generated samples
    model = Sequential()

    model.add(Dense(32, activation='relu', input_dim=clip_length))
    model.add(Reshape((1, 32)))

    model.add(Dropout(0.25))
    model.add(Conv1D(64, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))

    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))  # probability that the clip is real

    return model
```

```python
# Build the generator and check that its summary looks right
generator = build_generator(SEED_SIZE)

generator.summary()

noise = tf.random.normal([1, SEED_SIZE])

generated_clip = generator(noise, training=False)

clip_length = X_train[0].shape[0]
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 512)               51712     
_________________________________________________________________
reshape (Reshape)            (None, 1, 512)            0         
_________________________________________________________________
conv1d (Conv1D)              (None, 1, 256)            393472    
_________________________________________________________________
batch_normalization (BatchNo (None, 1, 256)            1024      
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 1, 256)            0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 1, 128)            98432     
_________________________________________________________________
batch_normalization_1 (Batch (None, 1, 128)            5
```

```python
# Build the discriminator
discriminator = build_discriminator(clip_length)

discriminator.summary()

decision = discriminator(generated_clip)

# build_discriminator ends in a sigmoid, so its outputs are probabilities, not logits
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False)
```

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 32)                3528032   
_________________________________________________________________
reshape_1 (Reshape)          (None, 1, 32)             0         
_________________________________________________________________
dropout (Dropout)            (None, 1, 32)             0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1, 64)             6208      
_________________________________________________________________
batch_normalization_2 (Batch (None, 1, 64)             256       
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 1, 64)             0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 1, 64)            
```
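The summaries above are cut off, but the visible rows can be checked by hand. This sketch applies the standard parameter-count formulas for Dense, Conv1D and BatchNormalization layers (assuming the Keras default `use_bias=True`, which matches the code) to the layer sizes in `build_generator` and `build_discriminator`. It shows that the two dense layers connected to the 110,250-sample clips dominate both models. The helper functions are hypothetical.

```python
# Standard parameter-count formulas for layers with use_bias=True
def dense_params(n_in, n_out):
    # one weight per input/output pair plus one bias per output unit
    return n_in * n_out + n_out

def conv1d_params(kernel_size, c_in, c_out):
    # kernel_size * c_in weights for each output filter, plus one bias per filter
    return kernel_size * c_in * c_out + c_out

def batchnorm_params(channels):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable)
    return 4 * channels

SEED_SIZE = 100
clip_length = 22050 * 5  # 110,250 samples per five-second chunk

generator_rows = {
    "dense": dense_params(SEED_SIZE, 2 * 2 * 128),   # 51,712 in the summary
    "conv1d": conv1d_params(3, 512, 256),            # 393,472
    "batch_normalization": batchnorm_params(256),    # 1,024
    "conv1d_1": conv1d_params(3, 256, 128),          # 98,432
    # Output layer: not visible in the truncated summary, computed from the code
    "output_dense": dense_params(128, clip_length),
}
discriminator_rows = {
    "dense_2": dense_params(clip_length, 32),        # 3,528,032
    "conv1d_2": conv1d_params(3, 32, 64),            # 6,208
    "batch_normalization_2": batchnorm_params(64),   # 256
}

for name, count in {**generator_rows, **discriminator_rows}.items():
    print(f"{name:>22}: {count:,}")
```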

```python
# The loss functions and optimizers for the generator and discriminator must be defined separately
def discriminator_loss(real_output, fake_output):
    # Real clips should be classified as 1 and generated clips as 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    # The generator improves when the discriminator classifies its clips as real (1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-5)  # lower learning rate for the discriminator
```
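Both loss functions compare discriminator scores with targets of 1 (real) and 0 (fake) using binary cross-entropy, BCE(y, p) = -[y log p + (1 - y) log(1 - p)]. The plain-Python sketch below works through one real and one fake example, assuming the discriminator outputs probabilities as `build_discriminator`'s final sigmoid does. It ignores the batch averaging and small epsilon clipping that Keras' `BinaryCrossentropy` applies, and the helper names are hypothetical. It also shows why `from_logits` must match the last layer: a sigmoid output read as a logit can never drive the fake loss below log 2.

```python
import math

def bce(label, prob):
    # binary cross-entropy for one prediction given as a probability
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def bce_from_logits(label, logit):
    # the same loss when the model output is treated as an unbounded logit
    prob = 1 / (1 + math.exp(-logit))
    return bce(label, prob)

# discriminator_loss: a real clip scored 0.9 and a generated clip scored 0.1
real_p, fake_p = 0.9, 0.1
d_loss = bce(1, real_p) + bce(0, fake_p)
# generator_loss: the generator wants its clip scored as real
g_loss = bce(1, fake_p)
print(round(d_loss, 4), round(g_loss, 4))  # 0.2107 2.3026

# A perfect sigmoid "fake" score of 0.0 read as a logit means probability 0.5
print(round(bce_from_logits(0, 0.0), 4))   # 0.6931, i.e. log(2)
```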

```python
# The train step and training loop are adapted from the TensorFlow DCGAN tutorial linked above.
@tf.function
def train_step(clips):
    # Accept a single clip or a batch of clips, and generate one fake clip per real clip
    clips = tf.reshape(clips, (-1, clip_length))
    seed = tf.random.normal([tf.shape(clips)[0], SEED_SIZE])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_clips = generator(seed, training=True)

        real_output = discriminator(clips, training=True)
        fake_output = discriminator(generated_clips, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    return gen_loss, disc_loss

def train(dataset, epochs):
    start = time.time()

    for epoch in range(epochs):
        epoch_start = time.time()

        gen_loss_list = []
        disc_loss_list = []

        # Step through the (already shuffled) clips in batches of BATCH_SIZE
        for i in range(0, len(dataset), BATCH_SIZE):
            clip_batch = tf.convert_to_tensor(dataset[i:i + BATCH_SIZE], dtype=tf.float32)
            gen_loss, disc_loss = train_step(clip_batch)
            gen_loss_list.append(float(gen_loss))
            disc_loss_list.append(float(disc_loss))

        g_loss = sum(gen_loss_list) / len(gen_loss_list)
        d_loss = sum(disc_loss_list) / len(disc_loss_list)

        epoch_elapsed = hms_string(time.time() - epoch_start)
        print(f'Epoch {epoch+1}, gen loss={g_loss:.4f}, disc loss={d_loss:.4f}, {epoch_elapsed}')

    elapsed = time.time() - start
    print(f'Training time: {hms_string(elapsed)}')
```
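GAN training commonly pairs each batch of real clips with an equally sized batch of generated clips, so the real and fake terms of `discriminator_loss` are computed over the same number of examples. A minimal NumPy sketch of shuffled mini-batching follows; the helper `iterate_minibatches` and the toy data are hypothetical, not part of this project.

```python
import numpy as np

def iterate_minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches of rows from data; the last batch may be smaller."""
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[order[start:start + batch_size]]

# 130 toy "clips" of 8 samples each, with distinct values so rows can be tracked
clips = np.arange(130 * 8, dtype=np.float32).reshape(130, 8)
rng = np.random.default_rng(42)

sizes = [batch.shape[0] for batch in iterate_minibatches(clips, 64, rng)]
print(sizes)  # [64, 64, 2]
```

In TensorFlow the same effect is usually obtained with `tf.data.Dataset.from_tensor_slices(X_train).shuffle(len(X_train)).batch(BATCH_SIZE)`.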

```python
# Train the model, save it, and generate a sample clip
import soundfile as sf  # installed with librosa; librosa.output was removed in librosa 0.8

train(X_train, EPOCHS)

generator.save(os.path.join(DATA_PATH, "lz_generator.h5"))

example_output = generator.predict(tf.random.normal([1, SEED_SIZE]))
example_output = example_output.reshape(-1)  # flatten to a 1-D mono signal
srate = 22050  # must match the sampling rate used to load the training audio
sf.write(os.path.join(DATA_PATH, 'lz_gen.wav'), example_output, srate)
```
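The generator ends in `tanh`, so every generated sample already lies in [-1, 1], the same range as audio loaded by librosa. If you would rather not depend on an audio library for output, the standard library's `wave` module can write the clip as 16-bit PCM. This is a hedged sketch: the helper `write_wav_pcm16` and the file name are hypothetical, and scaling by 32767 is one common float-to-PCM convention rather than the project's method.

```python
import os
import tempfile
import wave

import numpy as np

def write_wav_pcm16(path, audio, sample_rate=22050):
    """Write a mono float signal in [-1, 1] (e.g. tanh output) as 16-bit PCM."""
    audio = np.clip(np.asarray(audio, dtype=np.float32).reshape(-1), -1.0, 1.0)
    pcm = (audio * 32767).astype('<i2')   # little-endian 16-bit integers
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)               # mono, like librosa.load(mono=True)
        wav.setsampwidth(2)               # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())

# Stand-in for generator.predict(...): shape (1, 110250), values in (-1, 1)
fake_clip = np.tanh(np.random.default_rng(1).normal(size=(1, 22050 * 5)))

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'lz_gen_pcm16.wav')  # hypothetical file name
    write_wav_pcm16(out_path, fake_clip)
    with wave.open(out_path, 'rb') as check:
        n_frames, rate = check.getnframes(), check.getframerate()

print(n_frames, rate)  # 110250 22050
```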

I have included some results in the GitHub repository. They are very staticky, but if you turn up your speakers, you can hear guitar notes and some ethereal vocal "whoah" sounds. A successful GAN on sound data would need far more samples, and more similar ones, but this code can serve as a starting point for other GANs or classifiers on sound data.