IMPORTANT NOTE ON SYSTEM REQUIREMENTS:

Old text: [If you are connecting to a hosted runtime, make sure it has a P100 GPU (optionally run !nvidia-smi to confirm). Go to Edit>Notebook Settings to set this.

Colab may first assign you a lower-memory machine if you are using a hosted runtime. If so, the first time you try to load the 5B model, it will run out of memory, and you'll be prompted to restart with more memory (then return to the top of this Colab). If you continue to have memory issues after this (or run into issues on your own setup), switch to the 1B model.

If you are using a local GPU, we recommend a V100 or P100 with 16GB of GPU memory for best performance. For GPUs with less memory, we recommend using the 1B model and a smaller batch size throughout.]

Edit: If you're a free Colab user, make sure you're assigned a T4 or P100 GPU. K80 will not work.
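
As a rough aid for the check described above, the sketch below parses the text printed by `nvidia-smi -L` and reports which listed GPUs match the models named in this note. The helper name `suitable_gpus` and the `SUITABLE_GPUS` tuple are illustrative assumptions, not part of Jukebox or Colab; a GPU missing from the tuple is simply unverified here.

```python
import re

# GPU models named as suitable in the note above (assumption: matching by
# substring of the name reported by nvidia-smi is good enough for a quick check).
SUITABLE_GPUS = ("T4", "P100", "V100")


def suitable_gpus(listing: str) -> list:
    """Return GPU names from `nvidia-smi -L` output that match SUITABLE_GPUS."""
    names = re.findall(r"^GPU \d+: (.+?) \(UUID", listing, flags=re.MULTILINE)
    return [name for name in names if any(model in name for model in SUITABLE_GPUS)]


# Example listing in the format printed by `nvidia-smi -L` (UUID is made up).
example = "GPU 0: Tesla T4 (UUID: GPU-00000000-0000-0000-0000-000000000000)"
print(suitable_gpus(example))  # ['Tesla T4']
```

On a live runtime the listing could be captured with `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`; an empty result suggests restarting the runtime to try for a different GPU.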



In [None]:
#@title Connect!
!nvidia-smi -L

Mount Google Drive to save sample levels as they are generated.

In [None]:
#@title Connect Google Drive
from google.colab import drive
drive.mount('/content/gdrive')

Prepare the environment.

In [None]:
#@title Prepare the environment
!pip install git+https://github.com/craftmine1000/jukebox-opt.git
###### autosave start
import os
from glob import glob

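# Locate the installed jukebox/sample.py without importing it, so the patch
# does not depend on a hard-coded Python version in the path.
import importlib.util
filex = os.path.join(importlib.util.find_spec("jukebox").submodule_search_locations[0], "sample.py")
with open(filex, "rt") as fin:
    data = fin.read()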

newtext = '''import fire

from termcolor import colored
from datetime import datetime'''
data = data.replace('import fire',newtext)

newtext = '''starts = get_starts(total_length, prior.n_ctx, hop_length)
        counterr = 0
        for start in starts:'''
data = data.replace('for start in get_starts(total_length, prior.n_ctx, hop_length):',newtext)

newtext = '''counterr += 1
            datea = datetime.now()		
            zs = sample_single_window(zs, labels, sampling_kwargs, level, prior, start, hps)			
            dateb = datetime.now()
            timex = ((dateb-datea).total_seconds()/60.0)*(len(starts)-counterr)
            print(f"Step " + colored(counterr,'blue') + "/" + colored( len(starts),'red') + " ~ estimated remaining minutes: " + (colored('???','yellow'), colored(timex,'magenta'))[counterr >1])
            x = prior.decode(zs[level:], start_level=level, bs_chunks=zs[level].shape[0])
            logdir = f"{hps.name}/level_{level}"
            if not os.path.exists(logdir):
                os.makedirs(logdir)
            t.save(dict(zs=zs, labels=labels, sampling_kwargs=sampling_kwargs, x=x), f"{logdir}/data.pth.tar")
            save_wav(logdir, x, hps.sr)'''
data = data.replace('zs = sample_single_window(zs, labels, sampling_kwargs, level, prior, start, hps)',newtext)

# Write the patched source back in place.
with open(filex, "wt") as fin:
    fin.write(data)
###### autosave end
import jukebox
import torch as t
import librosa
import os
from IPython.display import Audio
from jukebox.make_models import make_vqvae, make_prior, MODELS, make_model
from jukebox.hparams import Hyperparams, setup_hparams
from jukebox.sample import sample_single_window, _sample, \
                           sample_partial_window, upsample, \
                           load_prompts
from jukebox.utils.dist_utils import setup_dist_from_mpi
from jukebox.utils.torch_utils import empty_cache
rank, local_rank, device = setup_dist_from_mpi()

Customize the cell below if needed.
The current settings will generate one sample at 32000 Hz.

In [None]:
model = '5b_lyrics' # or '5b' or '1b_lyrics'
hps = Hyperparams()
hps.sr = 32000
hps.n_samples = 1 if model in ('5b', '5b_lyrics') else 8
# We set this to the Google Drive mount point.
hps.name = '/content/gdrive/My Drive/samplestest444'  # Don't edit this cell!
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
max_batch_size = 2 if model in ('5b', '5b_lyrics') else 16
hps.levels = 3
hps.hop_fraction = [.5,.5,.125]


In [None]:
#@title Do some more things
#@markdown **It'll prime off of the primer.wav file in your Google Drive**
vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)
# Prime song creation using an arbitrary audio sample.
mode = 'primed'
codes_file=None
# Specify an audio file here.
audio_file = '/content/gdrive/My Drive/primer.wav'
# Specify how many seconds of audio to prime on.
prompt_length_in_seconds=6
sample_hps = Hyperparams(dict(mode=mode, codes_file=codes_file, audio_file=audio_file, prompt_length_in_seconds=prompt_length_in_seconds))

Specify your choice of artist, genre, lyrics, and length of musical sample. 

IMPORTANT: The sample length largely determines how long your sample takes to generate. Generating a shorter sample takes less time. You are limited to 12 hours on the Google Colab free tier. A 50 second sample should be short enough to fully generate within 12 hours of processing.
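
The next cell rounds the requested duration down to a whole number of top-level tokens and asserts that it covers at least one context window. The sketch below reproduces that arithmetic on its own. The values `RAW_TO_TOKENS = 128` and `N_CTX = 8192` are assumptions for illustration (the notebook reads them from `top_prior`), and `aligned_sample_length` is a hypothetical helper name.

```python
def aligned_sample_length(seconds: float, sr: int, raw_to_tokens: int) -> int:
    """Round a duration down to a whole number of tokens, measured in raw samples."""
    return (int(seconds * sr) // raw_to_tokens) * raw_to_tokens


# Assumed values for illustration only; the notebook takes them from top_prior.
RAW_TO_TOKENS = 128  # raw audio samples per top-level token (assumption)
N_CTX = 8192         # tokens in one top-level context window (assumption)
SR = 32000           # matches hps.sr in this notebook

# Shortest length that passes the assert in the next cell, under these assumptions.
min_length = N_CTX * RAW_TO_TOKENS

print(aligned_sample_length(36, SR, RAW_TO_TOKENS))  # 1152000
print(min_length)                                    # 1048576
print(min_length / SR)                               # 32.768
```

Under these assumptions the minimum is about 32.8 seconds, which would be consistent with the slider starting at 36 seconds and with the `sample_length = 1048576` passed to `make_vqvae` earlier in the notebook.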

In [None]:
#@title Set the sample seconds
sample_length_in_seconds = 36 #@param {type:"slider", min:36, max:120, step:1}          
                                       # Full length of musical sample to generate - we find songs in the 1 to 4 minute
                                       # range work well, with generation time proportional to sample length.  
                                       # This total length affects how quickly the model 
                                       # progresses through lyrics (model also generates differently
                                       # depending on if it thinks it's in the beginning, middle, or end of sample)
hps.sample_length = (int(sample_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
assert hps.sample_length >= top_prior.n_ctx*top_prior.raw_to_tokens, 'Please choose a larger sample length'

Edit the metadata in the next cell.
[Use this link](https://github.com/openai/jukebox/tree/master/jukebox/data/ids) to view all the artists and genres. (This notebook uses v2.)
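
The next cell repeats a single dict with `[...] * hps.n_samples`, so every sample shares the same prompt (and the same dict object). If different lyrics per sample are wanted, one possible approach is to build an independent dict for each sample. The helper `build_metas` below is hypothetical; the keys mirror those used in the next cell.

```python
def build_metas(lyrics_per_sample, total_length, artist="Unknown", genre="Unknown"):
    """Build one independent metadata dict per sample (hypothetical helper)."""
    return [
        dict(artist=artist, genre=genre, total_length=total_length,
             offset=0, lyrics=lyrics)
        for lyrics in lyrics_per_sample
    ]


# Two samples with different lyrics; total_length would be hps.sample_length
# in the notebook (a literal is used here so the sketch runs on its own).
example_metas = build_metas(["First verse\n", "Second verse\n"], total_length=1152000)
print(len(example_metas))          # 2
print(example_metas[1]["lyrics"])  # Second verse
```

The list length should equal `hps.n_samples`, since the labeller receives one entry per sample.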

In [None]:
# Note: Metas can contain different prompts per sample.
# By default, all samples use the same prompt.
metas = [dict(artist = "Unknown",
            genre = "Unknown",
            total_length = hps.sample_length,
            offset = 0,
            lyrics = """Tonight we dance around the flame
Then we get to play the spirit game
Spirit names we shout out loud
Shake the thunder from the spirit cloud
""",
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

In [None]:
#@title Other Things
sampling_temperature = .98

lower_batch_size = 16
max_batch_size = 3 if model in ('5b', '5b_lyrics') else 16
lower_level_chunk_size = 32
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
sampling_kwargs = [dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                   dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                   dict(temp=sampling_temperature, fp16=True,
                        max_batch_size=max_batch_size, chunk_size=chunk_size)]

Level 2 is the top level: the coarsest, lowest-fidelity version of the audio. Generate this one first.

In [None]:
#@title Generate Raw Audio (Level 2)
if sample_hps.mode == 'ancestral':
  zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cpu') for _ in range(len(priors))]
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
elif sample_hps.mode == 'upsample':
  assert sample_hps.codes_file is not None
  # Load codes.
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cuda() for z in data['zs']]
  assert zs[-1].shape[0] == hps.n_samples, f"Expected bs = {hps.n_samples}, got {zs[-1].shape[0]}"
  del data
  print('Falling through to the upsample step later in the notebook.')
elif sample_hps.mode == 'primed':
  assert sample_hps.audio_file is not None
  audio_files = sample_hps.audio_file.split(',')
  duration = (int(sample_hps.prompt_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
  x = load_prompts(audio_files, duration, hps)
  zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
else:
  raise ValueError(f'Unknown sample mode {sample_hps.mode}.')

Please note: this next upsampling step will take several hours. At the free tier, Google Colab lets you run for 12 hours. As the upsampling is completed, samples will appear in the folder set by hps.name (you can browse it from the Files tab at the left of the Colab). Level 1 is the partially upsampled version, and Level 0 is the fully upsampled result.
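
Sampling proceeds over overlapping windows, and the progress line added in the environment cell estimates the remaining time as the duration of the last window multiplied by the number of windows left. The sketch below illustrates both ideas. `window_starts` is a simplified stand-in written for this example, not the library's `get_starts`, and the assumption that the hop length is a fraction of the context length (as `hps.hop_fraction` suggests) is not verified here.

```python
def window_starts(total_length: int, n_ctx: int, hop_length: int) -> list:
    """Start offsets of overlapping windows covering total_length tokens.

    Simplified stand-in for illustration: the final window is shifted back so
    that it still spans a full n_ctx tokens.
    """
    starts = []
    for start in range(0, total_length - n_ctx + hop_length, hop_length):
        if start + n_ctx >= total_length:
            start = total_length - n_ctx
        starts.append(start)
    return starts


def remaining_minutes(last_window_seconds: float, total_windows: int, done: int) -> float:
    """Estimate used by the progress line: last window time times windows left."""
    return (last_window_seconds / 60.0) * (total_windows - done)


# Toy numbers: 18 tokens, windows of 8 tokens, hop of 4 tokens (half a window).
starts = window_starts(total_length=18, n_ctx=8, hop_length=4)
print(starts)                                    # [0, 4, 8, 10]
print(remaining_minutes(90.0, len(starts), 1))   # 4.5
```

Because the estimate extrapolates from a single window, it can swing noticeably between steps; it is best read as a rough guide rather than a countdown.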

In [None]:
#@title Upsample Audio (Level 1 to 0)
# Set this False if you are on a local machine that has enough memory (this allows you to do the
# lyrics alignment visualization during the upsampling stage). For a hosted runtime, 
# we'll need to go ahead and delete the top_prior if you are using the 5b_lyrics model.
if True:
  del top_prior
  empty_cache()
  top_prior=None
upsamplers = [make_prior(setup_hparams(prior, dict()), vqvae, 'cpu') for prior in priors[:-1]]
labels[:2] = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in upsamplers]

zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

Listen to your final samples!

In [None]:
#@title View Final Sample
del upsamplers
empty_cache()
Audio(f'{hps.name}/level_0/item_0.wav')
