HOW TO RUN:

First, you'll need a Gmail account because the file management uses Google Drive. You'll also need a Google Colab Pro account ($10/month), which gives you access to Google's high-RAM machines in the cloud. You can get to Colab Pro from the Google account settings icon.

Once you have a Colab Pro account, you'll need to set your preference for high-RAM machines. Go to the Runtime menu, select "Change runtime type," then select "High-RAM" from the "Runtime shape" menu.
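
To confirm that the runtime you were assigned really is a high-RAM machine, a quick check like the one below may help. It is only a sketch: it reads total memory through POSIX `sysconf`, which works on Linux runtimes such as Colab, and the 20 GiB threshold is an assumption chosen for illustration, not an official Colab figure.

```python
import os

def total_ram_gib():
    """Return total physical memory in GiB via POSIX sysconf (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    n_pages = os.sysconf("SC_PHYS_PAGES")    # number of physical pages
    return page_size * n_pages / 1024**3

ram = total_ram_gib()
print(f"Runtime has {ram:.1f} GiB of RAM")

# Assumed threshold for illustration only.
if ram < 20:
    print("This may be a standard runtime; check the 'Runtime shape' setting.")
```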

Run all the code cells from top to bottom (except as noted). A good way to tell whether a cell has finished executing is the browser tab icon: it turns yellow when the cell is done and stays grey while the cell is still busy.

Some cells will execute immediately. Others will take a few minutes, and some will take hours. The whole process will take about a day, but it doesn't require continuous attention.

If you get a memory error or other crash, just restart: choose "Factory reset runtime" from the Runtime menu, then run the cells again from the top. With luck, you'll be assigned a better machine.



In [1]:
!nvidia-smi -L

GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-6c142ef6-cf38-80bf-c705-7bf79473b4ed)


Mount Google Drive. You'll be asked to enter an authorization code.

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


Prepare the environment.

In [3]:
!pip install git+https://github.com/openai/jukebox.git

Collecting git+https://github.com/openai/jukebox.git
  Cloning https://github.com/openai/jukebox.git to /tmp/pip-req-build-85zcwpg_
  Running command git clone -q https://github.com/openai/jukebox.git /tmp/pip-req-build-85zcwpg_
Collecting fire==0.1.3
  Downloading fire-0.1.3.tar.gz (33 kB)
Collecting tqdm==4.45.0
  Downloading tqdm-4.45.0-py2.py3-none-any.whl (60 kB)
Collecting unidecode==1.1.1
  Downloading Unidecode-1.1.1-py2.py3-none-any.whl (238 kB)
Collecting numba==0.48.0
  Downloading numba-0.48.0-1-cp37-cp37m-manylinux2014_x86_64.whl (3.5 MB)
Collecting librosa==0.7.2
  Downloading librosa-0.7.2.tar.gz (1.6 MB)
Collecting mpi4py>=3.0.0
  Downloading mpi4py-3.1.1.tar.gz (2.4 MB)

In [4]:
import jukebox
import torch as t
import librosa
import os
from IPython.display import Audio
from jukebox.make_models import make_vqvae, make_prior, MODELS, make_model
from jukebox.hparams import Hyperparams, setup_hparams
from jukebox.sample import sample_single_window, _sample, \
                           sample_partial_window, upsample, \
                           load_prompts
from jukebox.utils.dist_utils import setup_dist_from_mpi
from jukebox.utils.torch_utils import empty_cache
rank, local_rank, device = setup_dist_from_mpi()

Using cuda True


By default, a folder for the final rendered audio files called "samples" will be created at the root level of your Google Drive. If you want to collect the final files from a different folder on your Drive, change the **OUTPUT_PATH** field below. You can find the path of your target folder by clicking the folder icon on the left, navigating to the folder, and selecting "Copy path."

<b>Note:</b> If you are going to do multiple runs of this notebook, then you'll need to specify separate output folders for each run. Otherwise, the files from the runs will get intermingled.
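
One way to keep runs apart is to build the output path from a label plus a timestamp. The helper below is a hypothetical sketch (the function name and folder naming scheme are not part of this notebook or of Jukebox); its result could be pasted into **OUTPUT_PATH**.

```python
import os
from datetime import datetime

def make_run_output_path(base_dir, run_label, now=None):
    """Return a per-run folder path such as <base_dir>/<run_label>_YYYYMMDD-HHMMSS."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return os.path.join(base_dir, f"{run_label}_{stamp}")

# Example: a fresh folder name for a new run.
print(make_run_output_path("/content/gdrive/MyDrive", "samples"))
```

If you are resuming after a crash, keep the same path as the interrupted run rather than generating a new one, since the recovery cell looks for saved data under the existing output folder.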


In [5]:
OUTPUT_PATH = '/content/gdrive/MyDrive/******test' #@param {type: "string"}

model = '5b_lyrics' # or '5b' or '1b_lyrics'
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model in ('5b', '5b_lyrics') else 8
# Specifies the directory to save the sample in.
# We set this to the Google Drive mount point.
hps.name = OUTPUT_PATH
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
max_batch_size = 3 if model in ('5b', '5b_lyrics') else 16
hps.levels = 3
hps.hop_fraction = [.5,.5,.125]

vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)

Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/vqvae.pth.tar https://openaipublic.azureedge.net/jukebox/models/5b/vqvae.pth.tar
Restored from /root/.cache/jukebox/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:1048576
0: Converting to fp16 params
Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b_lyrics/prior_level_2.pth.tar https://openaipublic.azureedge.net/jukebox/models/5b_lyrics/prior_level_2.pth.tar
Restored from /root/.cache/jukebox/models/5b_lyrics/prior_level_2.pth.tar
0: Loading prior in eval mode


By default, the input loop should be named "primer.wav" and placed at the root level of your Google Drive. To use a different folder or file name, change the **INPUT_PATH** field below, copying the target path as described above.

Enter the exact value of the input loop's length (in seconds) in the **LOOP_LENGTH** field, to as many decimal places as possible. This will ensure that any loops the algorithm makes are in sync with each other.
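
If you don't know the loop's exact length, you can compute it from the file itself as frame count divided by sample rate. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only; the helper names and the silent demo file are illustrative assumptions.

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file as frames / sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def write_silent_wav(path, seconds, sample_rate=44100):
    """Write a mono 16-bit silent WAV file (used here only as a demo input)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))

# Demo on a temporary file; in the notebook you would pass INPUT_PATH instead.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_loop.wav")
write_silent_wav(demo_path, 1.5)
print(wav_duration_seconds(demo_path))
```

In the notebook, the value printed for your own file is what you would enter as **LOOP_LENGTH**.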

In [6]:
INPUT_PATH = '/content/gdrive/My Drive/primer.wav' #@param {type: "string"}
LOOP_LENGTH = 6 #@param {type:"number"}

# Prime song creation using an arbitrary audio sample.
mode = 'primed'
codes_file=None
# Specify an audio file here.
audio_file = INPUT_PATH
# Specify how many seconds of audio to prime on.
prompt_length_in_seconds=LOOP_LENGTH

<font color="red">Only run the cell below in the event of a memory error or other crash.</font> This will restore the process from where you left off.

In [None]:
if os.path.exists(hps.name):
  # Identify the lowest level generated and continue from there.
  for level in [1, 2]:
    data = f"{hps.name}/level_{level}/data.pth.tar"
    if os.path.isfile(data):
      mode = 'upsample'
      codes_file = data
      print('Upsampling from level '+str(level))
      break
print('mode is now '+mode)

Set hyperparameters.

In [7]:
sample_hps = Hyperparams(dict(mode=mode, codes_file=codes_file, audio_file=audio_file, prompt_length_in_seconds=prompt_length_in_seconds))

Enter the exact desired length of your final renders, to as many decimal places as possible, in the **RENDER_LENGTH** field below. This number should be an integer multiple of the input loop's length in order to generate complete loops, and less than about 90 seconds to make sure the generation process finishes in a day. 
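
The two constraints in this paragraph can be checked before committing to a long run. The helper below is a hypothetical sketch: the function name, the tolerance, and the 90-second default simply restate the guidance above and are not enforced by Jukebox.

```python
def check_render_length(render_length, loop_length, max_seconds=90.0, tol=1e-6):
    """Return (is_whole_number_of_loops, nearest_loop_count, is_under_limit)."""
    n_loops = round(render_length / loop_length)
    is_multiple = n_loops >= 1 and abs(n_loops * loop_length - render_length) <= tol
    return is_multiple, n_loops, render_length < max_seconds

print(check_render_length(48, 6))   # (True, 8, True)
print(check_render_length(50, 6))   # (False, 8, True)
```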




In [8]:
RENDER_LENGTH = 48 #@param {type:"number"}

# Full length of musical sample to generate - we find songs in the 1 to 4 minute
# range work well, with generation time proportional to sample length.
# This total length affects how quickly the model progresses through lyrics
# (model also generates differently depending on if it thinks it's in the
# beginning, middle, or end of sample).
sample_length_in_seconds = RENDER_LENGTH
# Round the length down to a whole number of top-level tokens.
hps.sample_length = (int(sample_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
assert hps.sample_length >= top_prior.n_ctx*top_prior.raw_to_tokens, 'Please choose a larger sample length'
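
The arithmetic in the cell above can be worked through without loading the model. This sketch assumes the values shown in this notebook's logs: a 44100 Hz sample rate, 128 raw samples per top-level token ("Raw to tokens:128"), and a context of 8192 tokens (inferred from "Sampling 8192 tokens"). Other models may use different values.

```python
SR = 44100            # hps.sr in this notebook
RAW_TO_TOKENS = 128   # assumed from the "Level:2 ... Raw to tokens:128" log line
N_CTX = 8192          # assumed from the "Sampling 8192 tokens" log lines

def aligned_sample_length(seconds, sr=SR, raw_to_tokens=RAW_TO_TOKENS):
    """Round a length in seconds down to a whole number of top-level tokens."""
    return (int(seconds * sr) // raw_to_tokens) * raw_to_tokens

min_samples = N_CTX * RAW_TO_TOKENS          # shortest length the assert accepts
print(aligned_sample_length(48))             # 2116736
print(min_samples)                           # 1048576
print(round(min_samples / SR, 2))            # 23.78 (seconds)
```

Under these assumptions, a **RENDER_LENGTH** shorter than roughly 23.8 seconds would trip the assertion.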

Enter an <a href="https://github.com/openai/jukebox/blob/master/jukebox/data/ids/v2_artist_ids.txt" target="_blank">artist</a> whose style you want the algorithm to model, and a <a href="https://github.com/openai/jukebox/blob/master/jukebox/data/ids/v2_genre_ids.txt" target="_blank">genre</a> you want the algorithm to model. Be sure to copy the artist name from the OpenAI GitHub list exactly; some of the spellings are a bit unusual. Enter any lyrics for Jukebox to sing.
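
Because the spelling has to match exactly, it can help to look the name up programmatically and get suggestions for near misses. The sketch below assumes each line of the ID file has the form `name;id`; that format, the helper names, and the sample entries (with made-up IDs) are assumptions for illustration, so check them against the actual file.

```python
import difflib

def load_names(lines):
    """Extract lower-cased names from lines assumed to look like 'name;id'."""
    names = []
    for line in lines:
        line = line.strip()
        if line:
            names.append(line.split(";")[0].lower())
    return names

def suggest_name(query, names, n=3):
    """Return [query] on an exact match, otherwise up to n close spellings."""
    query = query.lower()
    if query in names:
        return [query]
    return difflib.get_close_matches(query, names, n=n, cutoff=0.6)

# Made-up sample entries; in Colab you would read the real file instead.
sample_lines = ["unknown;0", "james_taylor;1", "james_brown;2"]
names = load_names(sample_lines)
print(suggest_name("james_tailor", names))
```

On the Colab runtime, the log output earlier in this notebook shows where the installed copies of the ID files live, so `open(...)` on that path could supply the lines.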

In [9]:
ARTIST = "james_taylor" #@param {type: "string"}
GENRE = "alternative" #@param {type: "string"}
LYRICS = "blah blah" #@param {type: "string"}


# Note: Metas can contain different prompts per sample.
# By default, all samples use the same prompt.
metas = [dict(artist = ARTIST,
            genre = GENRE,
            total_length = hps.sample_length,
            offset = 0,
            lyrics = LYRICS,
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

sampling_temperature = .98

lower_batch_size = 16
max_batch_size = 3 if model in ('5b', '5b_lyrics') else 16
lower_level_chunk_size = 32
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
sampling_kwargs = [dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                    dict(temp=0.99, fp16=True, max_batch_size=lower_batch_size,
                         chunk_size=lower_level_chunk_size),
                    dict(temp=sampling_temperature, fp16=True, 
                         max_batch_size=max_batch_size, chunk_size=chunk_size)]

Now we're ready to sample from the model. We'll generate the top level (2) first, followed by the first upsampling (level 1), and then the second upsampling (level 0). After each level, we decode to raw audio and save the audio files.

This next cell will take a while (approximately 10 minutes per 20 seconds of generated music).
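
As a rough planning aid, the rule of thumb above can be turned into a one-line estimate. This is only a sketch of that rule: actual times depend on the GPU you are assigned, and the estimate covers the top-level cell only, not the upsampling steps later on.

```python
def estimate_top_level_minutes(render_seconds, minutes_per_20s=10.0):
    """Estimate top-level sampling time from the 10-minutes-per-20-seconds rule of thumb."""
    return render_seconds * minutes_per_20s / 20.0

print(estimate_top_level_minutes(48))  # 24.0
```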

In [10]:
if sample_hps.mode == 'ancestral':
  zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cuda') for _ in range(len(priors))]
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
elif sample_hps.mode == 'upsample':
  assert sample_hps.codes_file is not None
  # Load codes.
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cuda() for z in data['zs']]
  assert zs[-1].shape[0] == hps.n_samples, f"Expected bs = {hps.n_samples}, got {zs[-1].shape[0]}"
  del data
  print('Falling through to the upsample step later in the notebook.')
elif sample_hps.mode == 'primed':
  assert sample_hps.audio_file is not None
  audio_files = sample_hps.audio_file.split(',')
  duration = (int(sample_hps.prompt_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
  x = load_prompts(audio_files, duration, hps)
  zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
else:
  raise ValueError(f'Unknown sample mode {sample_hps.mode}.')

Sampling level 2
Sampling 8192 tokens for [0,8192]. Conditioning on 2067 tokens
Primed sampling 3 samples with temp=0.98, top_k=0, top_p=0.0
130/130 [00:23<00:00,  5.50it/s]
6125/6125 [08:40<00:00, 11.78it/s]
Sampling 8192 tokens for [1024,9216]. Conditioning on 7168 tokens
Primed sampling 3 samples with temp=0.98, top_k=0, top_p=0.0
448/448 [01:48<00:00,  4.14it/s]
1024/1024 [01:43<00:00,  9.87it/s]
Sampling 8192 tokens for [2048,10240]. Conditioning on 7168 tokens
Primed sampling 3 samples with temp=0.98, top_k=0, top_p=0.0
448/448 [01:48<00:00,  4.14it/s]
1024/1024 [01:43<00:00,  9.87it/s]
Sampling 8192 tokens for [3072,11264]. Conditioning on 7168 tokens
Primed sampling 3 samples with temp=0.98, top_k=0, top_p=0.0
448/448 [01:48<00:00,  4.14it/s]
1024/1024 [01:43<00:00,  9.88it/s]
Sampling 8192 tokens for [4096,12288]. Conditioning on 7168 tokens
Primed sampling 3 samples with temp=0.98, top_k=0, top_p=0.0
448/448 [01:48<00:00,  4.14it/s]
1024/1024 [01:43<00:00,  9.88it/s]
Sampling
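
The `[start,end]` ranges in the log above reflect overlapping sampling windows: each window is 8192 tokens long and the top level advances by `hop_fraction` × context = 0.125 × 8192 = 1024 tokens. The sketch below reproduces those ranges. It is modeled on my reading of the windowing in Jukebox's sampling code rather than copied from it, so treat the function as an approximation.

```python
def window_starts(total_tokens, n_ctx, hop_length):
    """Return start offsets of overlapping windows that cover total_tokens."""
    starts = []
    for start in range(0, total_tokens - n_ctx + hop_length, hop_length):
        if start + n_ctx >= total_tokens:
            # Pull the last window back so it ends exactly at the final token.
            start = total_tokens - n_ctx
        starts.append(start)
    return starts

N_CTX = 8192                       # tokens per window, from the log
HOP = int(0.125 * N_CTX)           # top-level hop_fraction set earlier in the notebook
TOTAL_TOKENS = 2116736 // 128      # a 48-second render at 128 samples per token

starts = window_starts(TOTAL_TOKENS, N_CTX, HOP)
for s in starts[:3]:
    print(f"[{s},{s + N_CTX}]")    # [0,8192] then [1024,9216] then [2048,10240]
```

The same function with a hop of 0.5 × 8192 = 4096 gives the `[0,8192]`, `[4096,12288]` pattern seen in the level 1 log.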

We are now done with the large top_prior model, and instead load the upsamplers.

In [11]:
# Set this False if you are on a local machine that has enough memory (this allows you to do the
# lyrics alignment visualization during the upsampling stage). For a hosted runtime, 
# we'll need to go ahead and delete the top_prior if you are using the 5b_lyrics model.
if True:
  del top_prior
  empty_cache()
  top_prior=None
upsamplers = [make_prior(setup_hparams(prior, dict()), vqvae, 'cpu') for prior in priors[:-1]]
labels[:2] = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in upsamplers]

Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/prior_level_0.pth.tar https://openaipublic.azureedge.net/jukebox/models/5b/prior_level_0.pth.tar
Restored from /root/.cache/jukebox/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from azure
Running  wget -O /root/.cache/jukebox/models

**Note:** this next upsampling step will take several hours.

In [None]:
zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

Sampling level 1
Sampling 8192 tokens for [0,8192]. Conditioning on 8192 tokens
Sampling 8192 tokens for [4096,12288]. Conditioning on 4172 tokens
Primed sampling 3 samples with temp=0.99, top_k=0, top_p=0.0
131/131 [00:10<00:00, 12.06it/s]
4020/4020 [03:29<00:00, 19.22it/s]
Sampling 8192 tokens for [8192,16384]. Conditioning on 4096 tokens
Primed sampling 3 samples with temp=0.99, top_k=0, top_p=0.0
128/128 [00:10<00:00, 12.48it/s]
438/4096 [00:21<03:09, 19.34it/s]

Now you can collect the final rendered files from Google Drive (or listen below). There should be three .wav files in the `level_0` folder, all of which are variations on the input loop. Don't take the files from the `level_1` or `level_2` folders; they are just lower-resolution versions of the `level_0` files.
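
To confirm that all the renders were written before you go looking for them in Drive, you can list the files in the `level_0` folder. This is a minimal sketch: the helper name is hypothetical, and the `item_*.wav` pattern is taken from the playback cells below. The demo uses a temporary folder; in the notebook you would pass `hps.name`.

```python
import tempfile
from pathlib import Path

def list_final_renders(output_path):
    """Return the sorted level_0 .wav files found under the output folder."""
    return sorted(Path(output_path).glob("level_0/item_*.wav"))

# Demo with a temporary folder standing in for the real output folder.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "level_0").mkdir()
for i in range(3):
    (demo_root / "level_0" / f"item_{i}.wav").touch()

for wav_path in list_final_renders(demo_root):
    print(wav_path.name)
```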

In [None]:
#Play render 1

Audio(f'{hps.name}/level_0/item_0.wav')

In [None]:
#Play render 2

Audio(f'{hps.name}/level_0/item_1.wav')

In [None]:
#Play render 3

Audio(f'{hps.name}/level_0/item_2.wav')

In [None]:
# Clean up cache

del upsamplers
empty_cache()