# Video Killed The Radio Star ...Diffusion.

Notebook by David Marx ([@DigThatData](https://twitter.com/digthatdata))

Shared under MIT license


## FAQ

**What is this?**

Point this notebook at a YouTube URL and it'll make a music video for you.

**How does this animation technique work?**

For each text prompt you provide, the notebook will...

1. Generate an image based on that text prompt (using stable diffusion)
2. Use the generated image as the `init_image` to recombine with the text prompt to generate variations similar to the first image. This produces a sequence of extremely similar images based on the original text prompt
3. Images are then intelligently reordered to find the smoothest animation sequence of those frames
4. This image sequence is then repeated to pad out the animation duration as needed
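
Steps 3 and 4 can be illustrated with a small sketch. This is not the notebook's implementation (the notebook delegates ordering to the `vktrs.tsp` helpers); it swaps in a greedy nearest-neighbor ordering as a simplified stand-in, and the names `greedy_order` and `pad_sequence` are hypothetical. Frames are represented as flat lists of pixel values purely for illustration.

```python
from itertools import cycle, islice
from math import dist


def greedy_order(frames):
    """Return frame indices so each frame is followed by its nearest unvisited neighbor.

    Assumption: a greedy stand-in for a TSP-style ordering, not the notebook's solver.
    """
    if not frames:
        return []
    order = [0]  # start from the first frame (the anchor image)
    remaining = set(range(1, len(frames)))
    while remaining:
        last = frames[order[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda i: dist(frames[i], last))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def pad_sequence(items, n_frames):
    """Repeat `items` cyclically until exactly `n_frames` entries are produced."""
    return list(islice(cycle(items), n_frames))


# Toy "images": four flat pixel lists whose brightness stands in for visual similarity.
toy_frames = [[0.0] * 4, [0.9] * 4, [0.1] * 4, [0.5] * 4]
order = greedy_order(toy_frames)
sequence = pad_sequence(order, 10)
print(order)     # [0, 2, 3, 1]
print(sequence)  # [0, 2, 3, 1, 0, 2, 3, 1, 0, 2]
```

At 12 frames per second, a lyric that lasts 2.5 seconds needs 30 frames, so five variations would each be shown six times.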

The technique demonstrated in this notebook was inspired by a [video](https://www.youtube.com/watch?v=WJaxFbdjm8c) created by Ben Gillin.

**How are lyrics transcribed?**

This notebook uses OpenAI's recently released `whisper` model for automatic speech recognition.
OpenAI released several sizes of this model, each with its own pros and cons.
This notebook uses the largest whisper model to transcribe the actual lyrics and the
smallest model to perform the lyric segmentation; the two transcripts are then aligned
token by token. Neither model is perfect, but the results so far seem pretty decent.
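
To make the "one model for words, another for timing" idea concrete, here is a minimal sketch of aligning two transcripts and carrying timing across the alignment. It is an illustration under assumptions, not the notebook's code: it uses `difflib.SequenceMatcher` from the standard library rather than the `tokenizations` package the notebook imports, the function names are hypothetical, and the per-token start times are made-up toy data.

```python
from difflib import SequenceMatcher
import string


def normalize(text):
    """Lowercase, strip punctuation, and split on whitespace."""
    table = str.maketrans('', '', string.punctuation)
    return text.lower().translate(table).split()


def align_tokens(small_tokens, large_tokens):
    """Map each index in `large_tokens` to a matching index in `small_tokens`, or None."""
    large2small = [None] * len(large_tokens)
    matcher = SequenceMatcher(a=small_tokens, b=large_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            large2small[block.b + offset] = block.a + offset
    return large2small


def transfer_start_times(small_starts, large2small):
    """Give each large-model token the start time of its aligned small-model token.

    Unaligned tokens inherit the most recent known time.
    """
    times, last = [], 0.0
    for idx in large2small:
        if idx is not None:
            last = small_starts[idx]
        times.append(last)
    return times


# Toy data: the small model mishears the last word but supplies the timing.
small_tokens = normalize("video killed the radio store")
small_starts = [0.0, 0.4, 0.9, 1.1, 1.5]  # hypothetical start times in seconds
large_tokens = normalize("Video killed the radio star.")

large2small = align_tokens(small_tokens, large_tokens)
large_starts = transfer_start_times(small_starts, large2small)
print(large2small)   # [0, 1, 2, 3, None]
print(large_starts)  # [0.0, 0.4, 0.9, 1.1, 1.1]
```

The better transcript keeps its wording ("star") while borrowing timing from the transcript used for segmentation.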

The first draft of this notebook relied on subtitles from YouTube videos to determine timing, which was
then aligned with user-provided lyrics. YouTube's automated captions are powerful and I'll update the
notebook shortly to leverage those again, but for the time being we're just using whisper for everything
and not referencing user-provided captions at all.

**Something didn't work quite right in the transcription process. How do I fix the timing or the actual lyrics?**

The notebook is divided into several steps. Between each step, a "storyboard" file is updated. If you want to
make modifications, you can edit this file directly and those edits should be reflected when you next load the
file. Depending on what you changed and what step you run next, your changes may be ignored or even overwritten.
Still playing with different solutions here.
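
Because later steps may overwrite manual edits, it can help to keep a copy of the storyboard before changing it. The following is a minimal sketch using only the standard library; `backup_storyboard` is a hypothetical helper, not part of `vktrs`, and the demo writes a throwaway file to a temporary directory rather than touching a real project.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_storyboard(storyboard_path):
    """Copy the storyboard next to itself with a timestamp and return the backup path."""
    src = Path(storyboard_path)
    stamp = time.strftime('%Y%m%d-%H%M%S')
    # e.g. storyboard.yaml -> storyboard.20220929-120000.bak.yaml
    dst = src.with_name(f"{src.stem}.{stamp}.bak{src.suffix}")
    shutil.copy2(src, dst)
    return dst


# Demo against a temporary file standing in for <project_root>/storyboard.yaml
with tempfile.TemporaryDirectory() as tmp:
    storyboard_path = Path(tmp) / 'storyboard.yaml'
    storyboard_path.write_text("params:\n  fps: 12\n")
    backup_path = backup_storyboard(storyboard_path)
    backup_text = backup_path.read_text()
    original_still_exists = storyboard_path.exists()
    print(backup_path.name)
```

If a later cell clobbers your edits, you can copy the backup over `storyboard.yaml` and re-run from the step that reads it.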

**Can I provide my own images to 'bring to life' and associate with certain lyrics/sequences?**

Yes, you can! As described above: you just need to modify the storyboard. Will describe this functionality in
greater detail after the implementation stabilizes a bit more.

**This gave me an idea and I'd like to use just a part of your process here. What's the best way to reuse just some of the machinery you've developed here?**

Most of the functionality in this notebook has been offloaded to a library I published to PyPI called `vktrs`. I strongly encourage you to import anything you need
from there rather than cutting and pasting functions into a notebook. Similarly, if you have ideas for improvements, please don't hesitate to submit a PR!

**How can I support your work or work like it?**

This notebook was made possible thanks to ongoing support from [stability.ai](https://stability.ai/). The best way to support my work is to share it with your friends, [report bugs](https://github.com/dmarx/video-killed-the-radio-star/issues/new), [suggest features](https://github.com/dmarx/video-killed-the-radio-star/discussions) or to donate to open source non-profits :) 

In [None]:
%%capture
# @title # 0. 🛠️ Setup
!pip install vktrs[api,hf]

!pip install git+https://github.com/openai/whisper

# these are only needed for hf
!pip install "ipywidgets>=7,<8"
!sudo apt -qq install git-lfs
!git config --global credential.helper store

!pip install panel

In [None]:
# @markdown # 📊 Check GPU Status

from vktrs.utils import gpu_info

gpu_info()

In [None]:
# @title # 1. 🔑 Provide your API Key
# @markdown Running this cell will prompt you to enter your API Key below. 

# @markdown To get your API key, visit https://beta.dreamstudio.ai/membership

# @markdown ---

# @markdown A note on security best practices: **don't publish your API key.**

# @markdown We're using a form field designed for sensitive data like passwords.
# @markdown This notebook does not save your API key in the notebook itself,
# @markdown but instead loads your API Key into the colab environment. This way,
# @markdown you can make changes to this notebook and share it without concern
# @markdown that you might accidentally share your API Key. 
# @markdown 

use_stability_api = False # @param {type:'boolean'}
mount_gdrive = False # @param {type:'boolean'}

import os
from pathlib import Path

os.environ['XDG_CACHE_HOME'] = os.environ.get(
    'XDG_CACHE_HOME',
    str(Path('~/.cache').expanduser())
)
if mount_gdrive:
    from google.colab import drive
    drive.mount('/content/drive')
    Path('/content/drive/MyDrive/AI/models/.cache/').mkdir(parents=True, exist_ok=True) 
    # This rm+ln solution is not great. Be careful not to run this locally. 
    # Low risk, but could be annoying    
    !rm -rf /root/.cache
    !ln -sf /content/drive/MyDrive/AI/models/.cache/ /root/
    # Following line will be sufficient pending merge of https://github.com/openai/whisper/pull/257
    os.environ['XDG_CACHE_HOME']='/content/drive/MyDrive/AI/models/.cache'

if use_stability_api:
    import getpass  # os is already imported above
    os.environ['STABILITY_KEY'] = getpass.getpass('Enter your API Key')
else:
    try:
        from google.colab import output
        output.enable_custom_widget_manager()
    except ImportError:
        # assume local use
        pass
    
    from huggingface_hub import notebook_login

    # to do: if gdrive mounted, check for API token... somewhere on drive?
    # looks like we should be able to find the token through an environment variable
    notebook_login()

from omegaconf import OmegaConf
from pathlib import Path

model_dir_str=str(Path(os.environ['XDG_CACHE_HOME']))
proj_root_str = '${active_project}'
if mount_gdrive:
    proj_root_str = '/content/drive/MyDrive/AI/VideoKilledTheRadioStar/${active_project}'
    #model_dir_str = '/content/drive/MyDrive/AI/models' # hf prob needs addl subfolder

# notebook config
cfg = OmegaConf.create({
    'active_project':'default_workspace',
    'project_root':proj_root_str,
    'gdrive_mounted':mount_gdrive,
    'use_stability_api':use_stability_api,
    'model_dir':model_dir_str,
    'output_dir':'${active_project}/frames'
})

#Path(cfg.project_root).mkdir(parents=True, exist_ok=True)
#with open(Path(cfg.project_root) / 'config.yaml','w') as fp:
# This is just the WORKSPACE config. Better to just leave it in cwd
with open('config.yaml','w') as fp:
    OmegaConf.save(config=cfg, f=fp.name)

In [None]:
# @title # 2. 📋 Audio processing parameters

from omegaconf import OmegaConf
from pathlib import Path

workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
use_stability_api = workspace.use_stability_api
model_dir = workspace.model_dir

root = workspace.project_root
root = Path(root)
root.mkdir(parents=True, exist_ok=True)

# this needs to not be in the same cell as the login.
# some sort of stupid race condition.
if not use_stability_api:
    # init hf here to download models
    from vktrs.hf import HfHelper
    try:
        hf_helper = HfHelper(
            download=False,
            model_path=str(Path(model_dir) / 'huggingface' / 'diffusers')
        )
    except Exception:
        # model not found locally; fall back to downloading it
        hf_helper = HfHelper(
            download=True,
            model_path=str(Path(model_dir) / 'huggingface' / 'diffusers')
        )

import datetime as dt
from itertools import chain, cycle
import json
import os

import re
import string
from subprocess import Popen, PIPE
import textwrap
import time
import warnings

from IPython.display import display
import numpy as np
from tqdm.autonotebook import tqdm

import tokenizations
import webvtt


# to do: use project name to name file
# to do: separate global params defined here from the storyboard object.
#        users will not anticipate that updates here will destroy their work
storyboard = OmegaConf.create()

storyboard.params = dict(

    # all this does is make it so each of the following lines can be preceded with a comma
    # otw the first parameter would be offset from the other in the colab form
    _=""

    , video_url = 'https://www.youtube.com/watch?v=REojIUxX4rw' # @param {type:'string'}
    , audio_fpath = '' # @param {type:'string'}
    , whisper_seg = True # @param {type:'boolean'}

    #, use_stability_api = use_stability_api
)


if not storyboard.params.audio_fpath:
    storyboard.params.audio_fpath = None


# @markdown `video_url` - URL of a youtube video to download as a source for audio and potentially for text transcription as well.

# @markdown `whisper_seg` - Whether or not to use openai's whisper model for lyric segmentation. This is currently the only option, but that will change in a few days.


##################
# markdown `audio_fpath` - Optionally provide an audio file instead of relying on a youtube download. Name it something other than 'audio.mp3', 
# markdown                 otherwise it might get overwritten accidentally.
##################


storyboard_fname = root / 'storyboard.yaml'
with open(storyboard_fname,'wb') as fp:
    OmegaConf.save(config=storyboard, f=fp.name)



In [None]:
%%capture

# @title # 3. 📥 Download audio from youtube

from vktrs.utils import get_audio_duration_seconds
from vktrs.youtube import (
    YoutubeHelper,
    parse_timestamp,
    vtt_to_token_timestamps,
    srv2_to_token_timestamps,
)

from omegaconf import OmegaConf
from pathlib import Path

workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
root = Path(workspace.project_root)

storyboard_fname = root / 'storyboard.yaml'
storyboard = OmegaConf.load(storyboard_fname)

video_url = storyboard.params.video_url

if video_url:
    # check if user provided an audio filepath (or we already have one from youtube) before attempting to download
    if storyboard.params.get('audio_fpath') is None:
        helper = YoutubeHelper(
            video_url,
            ydl_opts = {
                'outtmpl':{'default':str( root / f"ytdlp_content.%(ext)s" )},
                'writeautomaticsub':True,
                'subtitlesformat':'srv2/vtt'
                },
        )

        # estimate video end
        video_duration = dt.timedelta(seconds=helper.info['duration'])
        storyboard.params['video_duration'] = video_duration.total_seconds()

        audio_fpath = str( root / 'audio.mp3' )
        input_audio = helper.info['requested_downloads'][-1]['filepath']
        !ffmpeg -y -i "{input_audio}" -acodec libmp3lame {audio_fpath}

        # to do: write audio and subtitle paths/meta to storyboard
        storyboard.params.audio_fpath = audio_fpath

        if False:
            subtitle_format = helper.info['requested_subtitles']['en']['ext']
            subtitle_fpath = helper.info['requested_subtitles']['en']['filepath']

            if subtitle_format == 'srv2':
                with open(subtitle_fpath, 'r') as f:
                    srv2_xml = f.read() 
                token_start_times = srv2_to_token_timestamps(srv2_xml)
                # to do: handle timedeltas...
                #storyboard.params.token_start_times = token_start_times

            elif subtitle_format == 'vtt':
                captions = webvtt.read(subtitle_fpath)
                token_start_times = vtt_to_token_timestamps(captions)
                # to do: handle timedeltas...
                #storyboard.params.token_start_times = token_start_times

            # If unable to download supported subtitles, force use whisper
            else:
                storyboard.params.whisper_seg = True


# estimate video end
if storyboard.params.get('video_duration') is None:
    # estimate duration from audio file
    audio_fpath = storyboard.params['audio_fpath']
    storyboard.params['video_duration'] = get_audio_duration_seconds(audio_fpath)

if storyboard.params.get('video_duration') is None:
    raise RuntimeError('unable to determine audio duration. was a video url or path to a file supplied?')

# force use
storyboard.params.whisper_seg = True

with open(storyboard_fname,'wb') as fp:
    OmegaConf.save(config=storyboard, f=fp.name)

whisper_seg = storyboard.params.whisper_seg

In [None]:
# @title # 4. 💬 Transcribe and segment speech using whisper

import gc
from omegaconf import OmegaConf
from pathlib import Path
import time

import tokenizations
from vktrs.utils import remove_punctuation
import whisper

workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
root = Path(workspace.project_root)

storyboard_fname = root / 'storyboard.yaml'
storyboard = OmegaConf.load(storyboard_fname)

whisper_seg = storyboard.params.whisper_seg

if whisper_seg:
    from vktrs.asr import (
        #whisper_lyrics,
        #whisper_transcribe,
        #whisper_align,
        whisper_transmit_meta_across_alignment,
        whisper_segment_transcription,
    )

    #prompt_starts = whisper_lyrics(audio_fpath=storyboard.params.audio_fpath)

    audio_fpath = storyboard.params.audio_fpath
    #whispers = whisper_transcribe(audio_fpath)

    segmentation_model = 'tiny'
    transcription_model = 'large'

    storyboard.params.whisper = dict(
        segmentation_model = segmentation_model
        ,transcription_model = transcription_model
    )

    whispers = {
        #'tiny':None, # 5.83 s
        #'large':None # 3.73 s
    }
    # accelerated runtime required for whisper
    # to do: pypi package for whisper

    # to do: use transcripts we've already built if we have them
    #scripts = storyboard.params.whisper.get('transcriptions')
    
    for k in set([segmentation_model, transcription_model]):
        #if k in scripts:

        options = whisper.DecodingOptions(
            language='en',
        )
        # to do: be more proactive about cleaning up these models when we're done with them
        model = whisper.load_model(k).to('cuda')
        start = time.time()
        print(f"Transcribing audio with whisper-{k}")
        
        # to do: calling transcribe like this unnecessarily re-processes audio each time.
        whispers[k] = model.transcribe(audio_fpath) # re-processes audio each time, ~10s overhead?
        print(f"elapsed: {time.time()-start}")
        del model
        gc.collect()
    
    #######################
    # save transcriptions #
    #######################

    transcriptions = {}
    transcription_root = root / 'whispers'
    transcription_root.mkdir(parents=True, exist_ok=True)
    for k in whispers:
        outpath = str( transcription_root / f"{k}.vtt" )
        transcriptions[k] = outpath
        with open(outpath,'w') as f:
            # to do: upstream PR to control verbosity
            whisper.utils.write_vtt(
                whispers[k]["segments"], # ...really?
                file=f
            )
    storyboard.params.whisper.transcriptions = transcriptions

    #tiny2large, large2tiny, whispers_tokens = whisper_align(whispers)
    # sanitize and tokenize
    whispers_tokens = {}
    for k in whispers:
        whispers_tokens[k] = [
        remove_punctuation(tok) for tok in whispers[k]['text'].split()
        ]

    # align sequences
    tiny2large, large2tiny = tokenizations.get_alignments(
        whispers_tokens[segmentation_model], #whispers_tokens['tiny'],
        whispers_tokens[transcription_model] #whispers_tokens['large']
    )
    #return tiny2large, large2tiny, whispers_tokens

    token_large_index_segmentations = whisper_transmit_meta_across_alignment(
        whispers,
        large2tiny,
        whispers_tokens,
    )
    prompt_starts = whisper_segment_transcription(
        token_large_index_segmentations,
    )


    #return prompt_starts
    #storyboard.prompt_starts = prompt_starts
    # to do: deal with these td objects
    # OmegaConf.save accepts a path directly; no need to open the file first
    OmegaConf.save(config=storyboard, f=str(storyboard_fname))

######################################################

# title # 4.b (optional) Review/Modify transcription

# markdown Run this cell for an opportunity to review and modify the
# markdown transcription.

import pandas as pd
import panel as pn

# https://panel.holoviz.org/reference/widgets/Tabulator.html
pn.extension('tabulator') # I don't know that specifying 'tabulator' here is even necessary...

tabulator_formatters = {
    #'float': {'type': 'progress', 'max': 10},
    'bool': {'type': 'tickCross'}
}

df = pd.DataFrame(prompt_starts).rename(
    columns={
        'ts':'Timestamp (sec)',
        'prompt':'Lyric',
    }
)

if 'td' in df:
  del df['td']

import copy
df_pre = copy.deepcopy(df)
pn.widgets.Tabulator(df, formatters=tabulator_formatters)

In [None]:
# @title # 5. 🧮 Math

# update prompt_starts if any changes were made above
import numpy as np
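if not np.all(df_pre.values == df.values):
    df_pre = copy.deepcopy(df)
    for i, rec in enumerate(prompt_starts):
        # pull the i-th row's values rather than assigning the whole column
        rec['ts'] = float(df['Timestamp (sec)'].iloc[i])
        # timestamps are in seconds; a bare positional argument would be read as days
        rec['td'] = dt.timedelta(seconds=rec['ts'])
        rec['prompt'] = df['Lyric'].iloc[i]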

############################################

workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
root = Path(workspace.project_root)

storyboard_fname = root / 'storyboard.yaml'
storyboard = OmegaConf.load(storyboard_fname)

### This cell computes how many frames are needed for each segment
### based on the start times for each prompt

import datetime as dt
#fps = storyboard.params.fps


# @markdown `fps` - Frames-per-second of generated animations

fps = 12 # @param {type:'integer'}
storyboard.params.fps = fps

ifps = dt.timedelta(seconds=1/fps)

# estimate video end
video_duration = storyboard.params['video_duration']

# dummy prompt for last scene duration
prompt_starts.append({'td':dt.timedelta(seconds=video_duration)})

# make sure we respect the duration of the previous phrase
frame_start=dt.timedelta(seconds=0)
prompt_starts[0]['anim_start']=frame_start
for i, rec in enumerate(prompt_starts[1:], start=1):
  rec_prev = prompt_starts[i-1]
  k=0
  while (rec_prev['anim_start'] + k*ifps) < rec['td']:
    k+=1
  k-=1
  rec_prev['frames'] = k
  rec_prev['anim_duration'] = k*ifps
  frame_start+=k*ifps
  rec['anim_start']=frame_start

# make sure we respect the duration of the previous phrase
# to do: push end time into a timedelta and consider it... somewhere near here
# pair each record with its successor; the dummy record supplies the final end time
for rec0, rec1 in zip(prompt_starts, prompt_starts[1:]):
    rec0['duration'] = rec1['td'] - rec0['td']

# drop the dummy frame
prompt_starts = prompt_starts[:-1]

# to do: given a 0 duration prompt, assume its duration is captured in the next prompt 
#        and guesstimate a corrected prompt start time and duration 


### checkpoint the processing work we've done to this point

import copy

prompt_starts_copy = copy.deepcopy(prompt_starts)

for rec in prompt_starts_copy:
    for k,v in list(rec.items()):
        if isinstance(v, dt.timedelta):
            rec[k] = v.total_seconds()

        # flush image objects if they're there, they anger omegaconf
        if k in ('frame0','variations','images', 'images_raw'):
            rec.pop(k)

storyboard.prompt_starts = prompt_starts_copy

# to do: deal with these td objects
#storyboard_fname = 'storyboard.yaml'
# OmegaConf.save accepts a path directly; no need to open the file first
OmegaConf.save(config=storyboard, f=str(storyboard_fname))

In [None]:
# @title # 6. 🎨 Generate init images

import copy
import datetime as dt
from omegaconf import OmegaConf
from pathlib import Path
import random
import string
from tqdm.autonotebook import tqdm

import PIL

from vktrs.tsp import (
    tsp_permute_frames,
    batched_tsp_permute_frames,
)

from vktrs.utils import (
    add_caption2image,
    save_frame,
    remove_punctuation,
)


workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
root = Path(workspace.project_root)

storyboard_fname = root / 'storyboard.yaml'
storyboard = OmegaConf.load(storyboard_fname)

prompt_starts = storyboard.prompt_starts
use_stability_api = workspace.use_stability_api
model_dir = workspace.model_dir

if use_stability_api:
    from vktrs.api import get_image_for_prompt
else:
    from vktrs.hf import HfHelper
    helper = HfHelper(
        download=False,
        model_path=str(Path(model_dir) / 'huggingface' / 'diffusers')
        )

    # I give up.
    def get_image_for_prompt(*args, **kargs):
        result = helper.get_image_for_prompt(*args, **kargs)
        return result.images
    """
    def get_image_for_prompt(*args, **kargs):
        nsfw_regens = storyboard.params.nsfw_regens
        if nsfw_regens < 0:
            return helper.get_image_for_prompt(*args, **kargs).images
        while nsfw_regens > 0:
            result = helper.get_image_for_prompt(*args, **kargs)
            #if not any(result.nsfw_content_detected):
            if hasattr(result, 'nsfw_content_detected') is None:
                return result.images
            print("NSFW filter triggered. Attempting to regenerate images...")
            nsfw_regens -= 1
        raise RuntimeError(
            "Regenerations maxed out. Halting progress."
            "Please modify your prompt or disable the nsfw classifier."
            "The classifier can be disabled by setting the regens "
            "parameter to a negative value."
        )
        """


def get_variations_w_init(prompt, init_image, **kargs):
    return list(get_image_for_prompt(prompt=prompt, init_image=init_image, **kargs))

def get_close_variations_from_prompt(prompt, n_variations=2, image_consistency=.7):
    """
    prompt: a text prompt
    n_variations: total number of images to return
    image_consistency: float in [0,1], controls similarity between images generated by the prompt.
                        you can think of this as controlling how much "visual vibration" there will be.
                        - 0 = ignore the init image (each image is regenerated independently)
                        - 1 = stay as true as possible to the init image
    """
    images = list(get_image_for_prompt(prompt))
    for _ in range(n_variations - 1):
        img = get_variations_w_init(prompt, images[0], start_schedule=(1-image_consistency))[0]
        images.append(img)
    return images


d_ = dict(
    _=''
    , theme_prompt = "extremely detailed, painted by ralph steadman and radiohead, beautiful, wow" # @param {type:'string'}

    , height = 512 # @param {type:'integer'}
    , width = 512 # @param {type:'integer'}
    , display_frames_as_we_get_them = True # @param {type:'boolean'}

    #, nsfw_regens = 3 # @param {type:'integer'}
)


# @markdown `theme_prompt` - Text that will be appended to the end of each lyric, useful for e.g. applying a consistent aesthetic style

# @markdown `display_frames_as_we_get_them` - Displaying frames will make the notebook slightly slower


regenerate_init_images = False
if d_['theme_prompt'] != storyboard.params.get('theme_prompt'):
    regenerate_init_images = True

storyboard.params.update(d_)

if regenerate_init_images:
    for rec in prompt_starts:
        rec['frame0_fpath'] = None
        rec['variations_fpaths'] = None
        rec['images_fpaths'] = None

theme_prompt = storyboard.params.theme_prompt
display_frames_as_we_get_them = storyboard.params.display_frames_as_we_get_them
height = storyboard.params.height
width = storyboard.params.width


# to do: move this up to run params
#proj_name = 'test'
proj_name = workspace.active_project

print("Ensuring each prompt has an associated image")
for idx, rec in enumerate(prompt_starts):
    print(
        f"[{rec['anim_start']} | {rec['ts']}] [{rec['duration']} | {rec['anim_duration']}] - {rec['frames']} - {rec['prompt']}"
    )
    lyric = rec['prompt']
    prompt = f"{lyric}, {theme_prompt}"
    if rec.get('frame0_fpath') is None:
        init_image = list(get_image_for_prompt(
              prompt, 
              height=height,
              width=width,
              )
          )[0]
        rec['frame0_fpath'] = save_frame(
            init_image,
            idx,
            #root_path=Path('./frames') / proj_name,
            #name=proj_name, ## to do.... uh... i dunno
            root_path = root / 'frames', # to do: this field should accept a string as well
            name='anchor',
            )

        if display_frames_as_we_get_them:
            print(lyric)
            display(init_image)

########################
# update config

prompt_starts_copy = copy.deepcopy(prompt_starts)

for rec in prompt_starts_copy:
    for k,v in list(rec.items()):
        if isinstance(v, dt.timedelta):
            rec[k] = v.total_seconds()
        # flush images for now
        if k in ('frame0','variations','images', 'images_raw'):
            rec.pop(k)

storyboard.prompt_starts = prompt_starts_copy

# to do: deal with these td objects
#storyboard_fname = 'storyboard.yaml'
# OmegaConf.save accepts a path directly; no need to open the file first
OmegaConf.save(config=storyboard, f=str(storyboard_fname))

In [None]:
# @title # 7. 🚀 Generate animation frames

from omegaconf import OmegaConf
from pathlib import Path
from PIL import Image

import copy
import datetime as dt
from itertools import cycle

# reload config
workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
root = Path(workspace.project_root)

storyboard_fname = root / 'storyboard.yaml'
storyboard = OmegaConf.load(storyboard_fname)

prompt_starts = OmegaConf.to_container(storyboard.prompt_starts, resolve=True)


# `nsfw_regens` - Max number of times to attempt regenerating an image after triggering the NSFW classifier (huggingface only, see [Open RAIL-M restrictions](https://huggingface.co/spaces/CompVis/stable-diffusion-license))

# @markdown `n_variations` - How many unique variations to generate for a given text prompt. This determines the frequency of the visual "pulsing" effect

# @markdown `image_consistency` - controls similarity between images generated by the prompt.
# @markdown - 0: ignore the init image
# @markdown - 1: true as possible to the init image

# @markdown `add_caption` - Whether or not to overlay the prompt text on the image

# @markdown `optimal_ordering` - Intelligently permutes animation frames to provide a smoother animation.

# @markdown `max_video_duration_in_seconds` - Early stopping if you don't want to generate a video the full duration of the provided audio. Default = 5min.


d_ = dict(
    _=''

    , n_variations=5 # @param {type:'integer'}
    , image_consistency=0.8 # @param {type:"slider", min:0, max:1, step:0.01}  
    , add_caption = False # @param {type:'boolean'}
    , optimal_ordering = True # @param {type:'boolean'}
    , max_video_duration_in_seconds = 300 # @param {type:'integer'}

    # this parameter is currently not exposed in the form
    , max_variations_per_opt_pass = 15
)

storyboard.params.update(d_)
storyboard.params.max_frames = storyboard.params.fps * storyboard.params.max_video_duration_in_seconds

print(f"Max total frames: {storyboard.params.max_frames}")
#print(f"Max API requests: {int(max_frames/repeat)}")

if storyboard.params.optimal_ordering:

    opt_batch_size = storyboard.params.n_variations
    while opt_batch_size > storyboard.params.max_variations_per_opt_pass:
        opt_batch_size //= 2  # integer division keeps the batch size an int
    print(f"Frames per re-ordering batch: {opt_batch_size}")
    storyboard.params.opt_batch_size = opt_batch_size


add_caption = storyboard.params.get('add_caption')
optimal_ordering = storyboard.params.optimal_ordering
display_frames_as_we_get_them = storyboard.params.display_frames_as_we_get_them
image_consistency = storyboard.params.image_consistency
max_frames = storyboard.params.max_frames
max_variations_per_opt_pass = storyboard.params.max_variations_per_opt_pass
n_variations = storyboard.params.n_variations
theme_prompt = storyboard.params.get('theme_prompt')


# load init_images and generate variations as needed
# to do: use SDK args to request multiple images in single request...
frames = []
print("Fetching variations")
for idx, rec in enumerate(prompt_starts):
    images = []
    images_fpaths = rec.get('images_fpaths')
    curr_variation_count = 0 if images_fpaths is None else len(images_fpaths)
    if curr_variation_count < n_variations:
        lyric = rec['prompt']
        prompt = f"{lyric}, {theme_prompt}"

        init_image = Image.open(rec['frame0_fpath'])
        # use record-local counts so the global n_variations isn't overwritten between records
        n_target = rec.get('n_variations', storyboard.params.n_variations)
        n_target = min(n_target, rec['frames']) # don't generate variations we won't use
        n_needed = n_target - curr_variation_count  # only generate variations we need
        for _ in range(n_needed - 1):
            img = get_variations_w_init(prompt, init_image, start_schedule=(1-image_consistency))[0]
            images.append(img)

        # to do: collect images in a separate object to facilitate storyboard updates
        rec['variations'] = images
        images = [init_image] + images

        rec['variations_fpaths'] = [
            save_frame(
                img,
                idx,
                root_path= root / 'frames', #Path('./frames') / proj_name,
                #name=proj_name, ## need to make sure each image gets a unique name
            ) for j, img in enumerate(rec['variations'])
        ]

        # to do: persist the ordering in the storyboard
        if optimal_ordering:
            images = batched_tsp_permute_frames(
                images,
                max_variations_per_opt_pass
            )
        rec['images'] = rec['images_raw'] = images

        if add_caption:
            rec['images'] = [add_caption2image(im, rec['prompt']) for im in rec['images']]
        
        rec['images_fpaths'] = [
            save_frame(
                img,
                idx,
                root_path=Path('./frames') / proj_name,
                #name=proj_name, ## need to make sure each image gets a unique name
            ) for j, img in enumerate(rec['images'])
        ]
    else:
        # load frames if we've already generated them
        for im_fpath in rec['images_fpaths']:
            im = Image.open(im_fpath)
            images.append(im)
        rec['images'] = images

    if display_frames_as_we_get_them:
        print(rec['prompt'])
        for im in rec['images']:
            display(im)

    #images *= repeat
    sequence = []
    frame_factory = cycle(rec['images'])
    while len(sequence) < rec['frames']:
        sequence.append(next(frame_factory))
    frames.extend(sequence)
    if len(frames) >= max_frames:
        break

########################
# update config

prompt_starts_copy = copy.deepcopy(prompt_starts)

for rec in prompt_starts_copy:
    for k,v in list(rec.items()):
        if isinstance(v, dt.timedelta):
            rec[k] = v.total_seconds()
        # flush images for now
        if k in ('frame0','variations','images', 'images_raw'):
            rec.pop(k)

storyboard.prompt_starts = prompt_starts_copy

# to do: deal with these td objects
#storyboard_fname = 'storyboard.yaml'
# OmegaConf.save accepts a path directly; no need to open the file first
OmegaConf.save(config=storyboard, f=str(storyboard_fname))

In [None]:
# @title # 8. 🎥 Compile your video!

from pathlib import Path
from subprocess import Popen, PIPE

from omegaconf import OmegaConf
from tqdm.autonotebook import tqdm

# reload config
workspace = OmegaConf.load('config.yaml')
OmegaConf.resolve(workspace)
root = Path(workspace.project_root)

storyboard_fname = root / 'storyboard.yaml'
storyboard = OmegaConf.load(storyboard_fname)

fps = storyboard.params.fps
input_audio = storyboard.params.audio_fpath

output_filename = 'output.mp4' # @param {type:'string'}
output_filename = str( root / output_filename )
storyboard.params.output_filename = output_filename

# to do: read frames and variations back into memory. This should be the last cell that gets run, so we need to 
# update state wrt any user interventions in the storyboard object. actually, should probably do the text overlay step here


cmd_in = ['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps), '-i', '-']
cmd_out = ['-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p', '-crf', '1', '-preset', 'veryslow', '-shortest', output_filename]

if input_audio:
  cmd_in += ['-i', str(input_audio)]

cmd = cmd_in + cmd_out

p = Popen(cmd, stdin=PIPE)
#for im in tqdm(chain(frames)):
for im in tqdm(frames):
  im.save(p.stdin, 'PNG')
p.stdin.close()

print("Encoding video...")
p.wait()
print("Video complete.")
print(f"Video saved to: {storyboard.params.output_filename}")

In [None]:
# @title # 9. 📺 Enjoy your animation!

output_filename = storyboard.params.output_filename

download_video = False # @param {type:'boolean'}
compress_video = False # @param {type:'boolean'}

# @markdown Compressing to `*.tar.gz` format can reduce filesize, which in turn reduces
# @markdown your download time. You may need to install additional software
# @markdown to "decompress" the file after downloading to view your video.


#  NB: only embed short videos
embed_video_in_notebook = False

if compress_video:
    uncompressed_fname = output_filename
    output_filename = f"{output_filename}.tar.gz"
    print(f"Compressing to: {output_filename}")
    !tar -czvf {output_filename} {uncompressed_fname}

if download_video:
    from google.colab import files
    files.download(output_filename)

if embed_video_in_notebook:
    from IPython.display import display, Video
    display(Video(output_filename, embed=True))

## ⚖️ I put on my robe and lawyer hat

### Notebook license

This notebook and the accompanying [git repository](https://github.com/dmarx/video-killed-the-radio-star/) and its contents are shared under the MIT license.

<!-- Note to self: lawyers should really be forced to use some sort of markup or pseudocode to eliminate ambiguity 

...oh shit, if laws were actually described in code, we could just run queries against it
-->

```
MIT License

Copyright (c) 2022 David Marx

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

### DreamStudio API TOS

When `use_stability_api` is enabled, this notebook uses the [DreamStudio](https://beta.dreamstudio.ai/) API to generate images. Users of the DreamStudio API are subject to the DreamStudio usage terms: https://beta.dreamstudio.ai/terms-of-service

### Stable Diffusion

As of the date of this writing (2022-09-29), all publicly available model checkpoints are subject to the restrictions of the Open RAIL license: https://huggingface.co/spaces/CompVis/stable-diffusion-license. 

