# Script Demo

This is an early demo of UCSD's markdown-to-lecture pipeline through HeyGen. The pipeline takes you from a text document, or script, all the way to a finished video. To use it, edit a $\verb|.md|$ file in the $\verb|notebook_scripts|$ folder and pass its path to the second code cell in this notebook. An example and a discussion of the syntax can be found in $\verb|examples/example.md|$ or $\verb|syntax.md|$. Lists of voice and avatar IDs are available in $\verb|avatar_options/voices.txt|$ and $\verb|avatar_options/avatars.txt|$.

Supported syntax:

    - Avatar: Sets characteristics of the avatar: id, voice_id, position, scale, style, cbc and bc. Invoked with ()

    - Composition: sets the composition type, either picture-in-picture or avatar-only. Invoked with []
        
    - Transition: any transition including concatenation, any duration. Invoked with {}

        Ex: {0.5, wipeleft} This will invoke a wipeleft transition that lasts 0.5 seconds
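
As a rough illustration of the {} syntax, here is a minimal sketch of how a transition command could be read. The function name `parse_transition` is hypothetical and this is not necessarily how the pipeline's own parser works. It assumes a command holds at most one transition name and one duration in seconds, in either order, and returns `None` for whichever part is missing.

```python
import re


def parse_transition(command):
    """Parse a '{...}' transition command into (name, duration_seconds).

    Hypothetical helper for illustration only. Either part may be missing,
    in which case None is returned in its place.
    """
    match = re.fullmatch(r"\s*\{(.*)\}\s*", command)
    if match is None:
        raise ValueError(f"not a transition command: {command!r}")

    name, duration = None, None
    for part in match.group(1).split(","):
        part = part.strip()
        if not part:
            continue
        try:
            # Anything that reads as a number is treated as the duration.
            duration = float(part)
        except ValueError:
            # Everything else is treated as the transition name.
            name = part
    return name, duration
```

With this sketch, `{0.5, wipeleft}` and `{wipeleft, 0.5}` give the same result, and an empty `{}` yields `(None, None)`, leaving the caller to pick defaults.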

To invoke a transition simply introduce a new line or use a {} command. [] or () commands in the middle of the line will trigger a midline cut like so:
        
    {hlwind, 3.0} [style:circle, background:#F5E3A2, position:(0.75;0.75)]This is a different video. If everything worked, it's been stitched together with the previous video with a visual and audio cross fade. The only clip composition that is currently supported is the picture in picture made by HeyGen with a default setting of placing the avatar at half scale in the bottom right of the slide. [type: avatar-only] (style:closeUp, position:0.25;0.75) Slides advance in order with each paragraph. Future versions of this program will expand on these functionalities. I will figure out how to import a PDF instead of a series of images. I will support more complex clip compositions such as side by side, avatar-only, slide-only and [type:pip] (style:normal, position:0.75;0.75)imported media as well as more advanced transitions like sudden changes mid-sentence or wipes and slides or cross dissolves. The markdown syntax will obviously evolve concurrently.

    The [type: avatar-only] (style:closeUp, position:0.25;0.75) command is invoked without a {} and without a new line. In this case the script takes the text before the mid-sentence (or mid-clip) cut and adds it to the text of the new clip. Here there are three videos in one paragraph. They all have different compositions but exactly the same text, so they produce the same words at the same times, which means we can use the caption files to choose when to cut the clips. Right now I use the caption files made by HeyGen, which don't have word-level precision. The script tries to find the best match and cuts there. For example, the first cut will try to find the closest timestamp to the phrase "Slides advance in order with" in the captions and cut there. The videos are then trimmed and concatenated together. This gives the video a more organic feel, I think. I do not currently support switching between voices in a midline cut. It does not work without word-level timestamp information, and my best solution so far is to load up a heavy OpenAI model, which isn't fast, efficient or cost-effective for demo purposes.
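
To make the caption-matching idea concrete, here is a minimal sketch of one way to estimate a cut time from captions that only have cue-level timestamps. It is not the pipeline's actual matching code. The cue layout `(start_seconds, end_seconds, text)`, the name `find_cut_time`, and the constant-speaking-rate interpolation inside a cue are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_cut_time(cues, phrase):
    """Estimate the time at which `phrase` has just been spoken.

    cues: list of (start_seconds, end_seconds, text) tuples (assumed layout).
    Returns None when no cue shares any text with the phrase.
    """
    target = phrase.lower()
    best = None  # (matched_length, estimated_time)
    for start, end, text in cues:
        lowered = text.lower()
        if not lowered:
            continue
        matcher = SequenceMatcher(None, lowered, target, autojunk=False)
        m = matcher.find_longest_match(0, len(lowered), 0, len(target))
        if m.size == 0:
            continue
        # Assumption: speech is evenly paced within a cue, so the character
        # offset of the end of the match maps linearly onto the cue's time span.
        fraction = (m.a + m.size) / len(lowered)
        estimate = start + fraction * (end - start)
        if best is None or m.size > best[0]:
            best = (m.size, estimate)
    return None if best is None else best[1]


# Example with made-up cue timings.
cues = [
    (0.0, 4.0, "This is a different video."),
    (4.0, 8.0, "Slides advance in order with each paragraph."),
]
cut = find_cut_time(cues, "Slides advance in order with")
```

Here the cut lands inside the second cue, a bit past its midpoint. The estimate is only as good as the even-pacing assumption, which is why word-level timestamps would be the better long-term fix.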


Please note that HeyGen seems to have caught on to my strategy of generating an arbitrary number of free-trial API keys by using Apple's Hide My Email feature. Calls made with those keys from a Jupyter notebook seem to take forever (about 20 minutes), whereas on my laptop they take about 2 minutes. I need a non-trial API key to keep doing this at scale. Keep that in mind when using the tool.

## Important!

Please run the cell below before doing anything with ffmpeg. This is how you install ffmpeg on a Binder notebook, thanks to [Stack Overflow](https://stackoverflow.com/questions/72217039/ffmpeg-and-jupyter-notebooks).

In [None]:
exist = !which ffmpeg
if not exist:
  !curl https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz -o ffmpeg.tar.xz \
     && tar -xf ffmpeg.tar.xz && rm ffmpeg.tar.xz
  ffmdir = !find . -iname ffmpeg-*-static
  path = %env PATH
  path = path + ':' + ffmdir[0]
  %env PATH $path
print('')
!which ffmpeg
print('Done!')

Ok, we're ready to get started. Enter the name of the file you wish to use below. I recommend using one of the files in the notebook_scripts folder. I will not be using those API keys, so they are less likely to have hit their 5-clips-a-day limit. You can also use any file with your own API key if you have one.

In [None]:
filepath = "./notebook_scripts/notebook_script5.md" # "./notebook_scripts/notebook_script1.md"

In [None]:
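# clear outputs from earlier runs (the stray "." argument is dropped; rm refuses to remove it anyway)
! rm -rf *.mp4 *.jpg *.ass
# -p keeps this from failing when assets/ already exists
! mkdir -p assets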
from parse import parse_from_file
from upload import upload_script, parse_upload_response, get_slides, get_avatar_clips
from compose import compose_scenes
import sys
import urllib.request
from transition import transitions
import ffmpeg
import time

script = parse_from_file(filepath)
if script:
    responses = upload_script(script)

    # parse the response content into the scenes - literally just the avatar video ids
    script = parse_upload_response(responses, script)

    # get the slides 
    script = get_slides(script, "./assets/")

    # then go get the links from the videos and download the clips. hopefully they've rendered by now
    #time.sleep(1500)
    script = get_avatar_clips(script, "./assets/")
    print(script)

    # compose the scenes
    script = compose_scenes(script)
    # transitions
    (script, v, a, v_d, a_d) = transitions(script)
    # output video
    ffmpeg.output(v,a, script[0]["Lecture Name"]+".mp4", vcodec="h264", pix_fmt='yuv420p', crf=18, preset="veryslow", **{'b:a': '192k'}).run(overwrite_output=True)
    
    # presumably response has the URL of the pending video. for each of the clips get the url. for each one, download it.
    # can't do this section without higher API limit yet
    #print(responses)
else:
    print(script)
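
The cell above simply hopes the clips have rendered by the time it downloads them. As one possible alternative to a fixed sleep, here is a minimal polling sketch. The name `wait_until_ready` and the `is_ready` callable are hypothetical: `is_ready` stands in for whatever check reports that a clip has finished rendering, and no particular HeyGen endpoint is assumed.

```python
import time


def wait_until_ready(is_ready, timeout=1500.0, interval=15.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call is_ready() every `interval` seconds until it returns True.

    Returns True as soon as is_ready() succeeds, or False once `timeout`
    seconds have passed. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The default timeout mirrors the commented-out `time.sleep(1500)` above, but the loop returns as soon as the clips are ready, which matters when render times range from about 2 to about 20 minutes.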