HOW TO RUN:

First, you'll need a Gmail account because the file management uses Google Drive. You'll also need a Google Colab Pro account ($10/month), which gives you access to Google's high-RAM machines in the cloud. You can get to Colab Pro from the Google account settings icon.

Once you have a Colab Pro account, you'll need to set your preference for high-RAM machines. Go to the Runtime menu, select "Change runtime type," then select "High-RAM" from the "Runtime shape" menu.

Run all the code-block cells top-to-bottom (except as noted).

Some cells will execute immediately. Others will take a few minutes, and some will take hours. The whole process will take about a day, but it doesn't require continuous attention.

If you get a memory error or other crash, just restart: from the Runtime menu, choose "Factory reset runtime," then click the run arrows again from the top. Hopefully you'll be assigned a better machine.



In [None]:
!nvidia-smi -L

Mount Google Drive. You'll be asked to enter an authorization code.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Prepare the environment.

In [None]:
!pip install git+https://github.com/openai/jukebox.git

In [None]:
import jukebox
import torch as t
import librosa
import os
from IPython.display import Audio
from jukebox.make_models import make_vqvae, make_prior, MODELS, make_model
from jukebox.hparams import Hyperparams, setup_hparams
from jukebox.sample import sample_single_window, _sample, \
                           sample_partial_window, upsample, \
                           load_prompts
from jukebox.utils.dist_utils import setup_dist_from_mpi
from jukebox.utils.torch_utils import empty_cache
rank, local_rank, device = setup_dist_from_mpi()

The line ```hps.name = '/content/gdrive/My Drive/samples'``` below determines where the algorithm will save the final rendered audio files. If you use the default setting, a folder for the renders called "samples" will be created at the root level of your Google Drive. If you want to harvest the final files from a different folder on your Drive, alter the file path.

If you are going to do multiple runs of this notebook, then you'll need to specify separate output folders for each run. Otherwise, the files from the runs will get intermingled.


In [None]:
model = '5b_lyrics' # or '5b' or '1b_lyrics'
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model in ('5b', '5b_lyrics') else 8
# Specifies the directory to save the sample in.
# We set this to the Google Drive mount point.
hps.name = '/content/gdrive/My Drive/samples'
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
max_batch_size = 3 if model in ('5b', '5b_lyrics') else 16
hps.levels = 3
hps.hop_fraction = [.5,.5,.125]

vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)
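If you plan several runs, one way to keep their outputs apart is to pick an unused folder name before setting ```hps.name```. The helper below is a minimal sketch, not part of Jukebox: ```fresh_output_dir``` is a hypothetical name, and the ```_1```, ```_2``` suffix scheme is an assumption made for illustration.

```python
import os

def fresh_output_dir(base, max_runs=1000):
    """Return `base` if it does not exist yet, otherwise the first unused
    `base_1`, `base_2`, ... (hypothetical naming scheme)."""
    if not os.path.exists(base):
        return base
    for i in range(1, max_runs):
        candidate = f"{base}_{i}"
        if not os.path.exists(candidate):
            return candidate
    raise RuntimeError(f"No unused output folder found after {max_runs} tries")
```

For a new run you could then write ```hps.name = fresh_output_dir('/content/gdrive/My Drive/samples')```. When restoring after a crash, set ```hps.name``` back to the folder the interrupted run was using instead of calling the helper again, since the restore cell looks for saved data under ```hps.name```.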

The ```audio_file = '/content/gdrive/My Drive/primer.wav'``` line determines where the algorithm looks for the input loop. If you want to use the default setting, name your input loop ```primer.wav``` and put it in the root level of your Google Drive. Otherwise, you can use a custom directory or file name by changing the file path.

Replace the ```12``` in ```prompt_length_in_seconds=12``` with the exact length of the input loop in seconds, to as many decimal places as possible. This will ensure that any loops the algorithm makes are in sync with each other.

In [None]:
# Prime song creation using an arbitrary audio sample.
mode = 'primed'
codes_file=None
# Specify an audio file here.
audio_file = '/content/gdrive/My Drive/primer.wav'
# Specify how many seconds of audio to prime on.
prompt_length_in_seconds=12
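If you don't know the loop's exact length, you can compute it from the file rather than typing it by hand. This is a minimal sketch using Python's standard ```wave``` module; ```wav_length_in_seconds``` is a hypothetical helper name, and the sketch assumes the loop is an uncompressed PCM WAV file (the ```wave``` module raises an error on other encodings).

```python
import wave

def wav_length_in_seconds(path):
    """Return the duration of a PCM WAV file: frame count / frames per second."""
    with wave.open(path, 'rb') as wav:
        return wav.getnframes() / wav.getframerate()
```

You could then set ```prompt_length_in_seconds = wav_length_in_seconds(audio_file)``` after Google Drive is mounted.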

<font color="red">Only run the cell below in the event of a memory error or other crash.</font> This will restore the process from where you left off.

In [None]:
if os.path.exists(hps.name):
  # Identify the lowest level generated and continue from there.
  for level in [1, 2]:
    data = f"{hps.name}/level_{level}/data.pth.tar"
    if os.path.isfile(data):
      mode = 'upsample'
      codes_file = data
      print('Upsampling from level '+str(level))
      break
print('mode is now '+mode)

Set hyperparameters.

In [None]:
sample_hps = Hyperparams(dict(mode=mode, codes_file=codes_file, audio_file=audio_file, prompt_length_in_seconds=prompt_length_in_seconds))

Replace the ```50```  in ```sample_length_in_seconds = 50``` with the exact desired length of your final renders, to as many decimal places as possible. This number should be an integer multiple of the input loop's length in order to generate complete loops, and less than about 90 seconds to make sure the process finishes in a day. 




In [None]:
sample_length_in_seconds = 50          # Full length of musical sample to generate - we find songs in the 1 to 4 minute
                                       # range work well, with generation time proportional to sample length.  
                                       # This total length affects how quickly the model 
                                       # progresses through lyrics (model also generates differently
                                       # depending on if it thinks it's in the beginning, middle, or end of sample)
hps.sample_length = (int(sample_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
assert hps.sample_length >= top_prior.n_ctx*top_prior.raw_to_tokens, 'Please choose a larger sample length'
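The arithmetic behind choosing ```sample_length_in_seconds``` can be sketched as follows. Both function names are hypothetical, the 90-second cap comes from the guidance above, and the value ```128``` passed as ```raw_to_tokens``` is an assumed example; in the notebook the real value is read from ```top_prior.raw_to_tokens```.

```python
import math

def whole_loop_length(loop_seconds, max_seconds=90.0):
    """Longest length in seconds that is a whole number of loops
    and does not exceed max_seconds."""
    if loop_seconds <= 0:
        raise ValueError("loop_seconds must be positive")
    n_loops = math.floor(max_seconds / loop_seconds)
    if n_loops < 1:
        raise ValueError("the loop is longer than max_seconds")
    return n_loops * loop_seconds

def round_down_to_tokens(seconds, sample_rate, raw_to_tokens):
    """Same rounding as the cell above: truncate to whole samples,
    then down to a multiple of raw_to_tokens."""
    return (int(seconds * sample_rate) // raw_to_tokens) * raw_to_tokens

print(whole_loop_length(12))                 # 12-second loop, 90-second cap
print(round_down_to_tokens(50, 44100, 128))  # 128 is an assumed raw_to_tokens
```

For a 12-second loop this picks 7 loops, or 84 seconds. The second function shows why ```hps.sample_length``` can come out slightly shorter than the number of seconds you entered: the sample count is rounded down to a whole number of top-level tokens.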

The cell below controls the artist model, the genre model, and lyrics. Replace ```Rick Astley``` with an <a href="https://github.com/openai/jukebox/blob/master/jukebox/data/ids/v2_artist_ids.txt" target="_blank">artist</a> whose style you want the algorithm to model, and replace ```Pop``` with the <a href="https://github.com/openai/jukebox/blob/master/jukebox/data/ids/v2_genre_ids.txt" target="_blank">genre</a> you want the algorithm to model. Substitute the quoted text after ```lyrics =``` with your lyrics.

In [None]:
# Note: Metas can contain different prompts per sample.
# By default, all samples use the same prompt.
metas = [dict(artist = "Rick Astley",
            genre = "Pop",
            total_length = hps.sample_length,
            offset = 0,
            lyrics = """We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy

I just wanna tell you how I'm feeling
Gotta make you understand

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
""",
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

Optionally adjust the ```sampling_temperature``` (best if kept within 0.97-1.0).

In [None]:
sampling_temperature = .98

lower_batch_size = 16
max_batch_size = 3 if model in ('5b', '5b_lyrics') else 16
lower_level_chunk_size = 32
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
sampling_kwargs = [dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                    dict(temp=0.99, fp16=True, max_batch_size=lower_batch_size,
                         chunk_size=lower_level_chunk_size),
                    dict(temp=sampling_temperature, fp16=True, 
                         max_batch_size=max_batch_size, chunk_size=chunk_size)]

Now we're ready to sample from the model. We'll generate the top level (level 2) first, followed by the first upsampling (level 1), and the second upsampling (level 0). After each level, we decode to raw audio and save the audio files.

This next cell will take a while (approximately 10 minutes per 20 seconds of music sample).

In [None]:
if sample_hps.mode == 'ancestral':
  zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cuda') for _ in range(len(priors))]
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
elif sample_hps.mode == 'upsample':
  assert sample_hps.codes_file is not None
  # Load codes.
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cuda() for z in data['zs']]
  assert zs[-1].shape[0] == hps.n_samples, f"Expected bs = {hps.n_samples}, got {zs[-1].shape[0]}"
  del data
  print('Falling through to the upsample step later in the notebook.')
elif sample_hps.mode == 'primed':
  assert sample_hps.audio_file is not None
  audio_files = sample_hps.audio_file.split(',')
  duration = (int(sample_hps.prompt_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
  x = load_prompts(audio_files, duration, hps)
  zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
else:
  raise ValueError(f'Unknown sample mode {sample_hps.mode}.')
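As a rough planning aid, the "10 minutes per 20 seconds" figure quoted above can be turned into an estimate for the top-level sampling cell. This is only a sketch of that arithmetic: the function name is hypothetical, and actual times depend on the machine you are assigned.

```python
def estimated_top_level_minutes(sample_length_in_seconds, minutes_per_20_seconds=10):
    """Scale the quoted rate (about 10 minutes per 20 seconds of audio)
    to the requested sample length."""
    return sample_length_in_seconds / 20 * minutes_per_20_seconds

print(estimated_top_level_minutes(50))  # estimate for the default 50-second sample
```

The estimate covers only the top-level sampling cell, not the upsampling step that follows.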

We are now done with the large top_prior model, and instead load the upsamplers.

In [None]:
# Set this False if you are on a local machine that has enough memory (this allows you to do the
# lyrics alignment visualization during the upsampling stage). For a hosted runtime, 
# we'll need to go ahead and delete the top_prior if you are using the 5b_lyrics model.
if True:
  del top_prior
  empty_cache()
  top_prior=None
upsamplers = [make_prior(setup_hparams(prior, dict()), vqvae, 'cpu') for prior in priors[:-1]]
labels[:2] = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in upsamplers]

Please note: this next upsampling step will take several hours. As the upsampling is completed, samples will appear in the Files tab (accessible via the folder icon at the left of the notebook), in whatever output folder was set above. Level 1 is the partially upsampled version, and Level 0 is the fully completed version.

In [None]:
zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

Now you can harvest the final rendered files in Google Drive (or listen below). There should be three .wav files in the ```level_0``` folder, all of which are variations on the input loop. Don't harvest the files from the ```level_1``` or ```level_2``` folders. They are just lower resolution versions of the ```level_0``` files.

In [None]:
#Play render 1

Audio(f'{hps.name}/level_0/item_0.wav')

In [None]:
#Play render 2

Audio(f'{hps.name}/level_0/item_1.wav')

In [None]:
#Play render 3

Audio(f'{hps.name}/level_0/item_2.wav')
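To confirm that every render is present before harvesting, you can list the .wav files in the ```level_0``` folder. This is a minimal sketch: ```final_renders``` is a hypothetical helper, and the ```item_*.wav``` pattern is assumed from the file names used in the playback cells above.

```python
from pathlib import Path

def final_renders(output_dir):
    """Return the sorted .wav renders found in <output_dir>/level_0."""
    return sorted(Path(output_dir, 'level_0').glob('item_*.wav'))
```

In this notebook you would call ```final_renders(hps.name)```; with the default settings for the 5b models the list should contain three files.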

In [None]:
# Clean up cache

del upsamplers
empty_cache()