{
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {
        "colab": {
            "name": "maua-stylegan2-audioreactive.ipynb",
            "private_outputs": true,
            "provenance": [],
            "collapsed_sections": [
                "MuglreF3FyOL",
                "Nfo33A6k0Mmg",
                "B7-Fzxc-50LQ",
                "-awYvRYAO2IM"
            ],
            "toc_visible": true,
            "machine_shape": "hm",
            "authorship_tag": "ABX9TyPNfCWcsaxpIfrkXrIdSb+j",
            "include_colab_link": true
        },
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3"
        },
        "language_info": {
            "name": "python"
        },
        "accelerator": "GPU"
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "view-in-github",
                "colab_type": "text"
            },
            "source": [
                "<a href=\"https://colab.research.google.com/github/dvschultz/ml-art-colabs/blob/master/maua_stylegan2_audioreactive.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "wnJVFJR-uiI5"
            },
            "source": [
                "# Audioreactive video with StyleGAN\n",
                "\n",
                "This notebook will walk through the various tools and techniques in Hans Brouwer’s **Audio-reactive Latent Interpolations with StyleGAN**.\n",
                "\n",
                "Hans has a number of resources that were hugely helpful in learning about what’s below: [Research Paper](https://wavefunk.xyz/assets/audio-reactive-stylegan/paper.pdf) | [Blog Post](https://wavefunk.xyz/audio-reactive-stylegan) | [GitHub repo](https://github.com/JCBrouwer/maua-stylegan2/)\n",
                "\n",
                "Hans already has a [really nice notebook](https://colab.research.google.com/drive/1Ig1EXfmBC01qik11Q32P0ZffFtNipiBR#scrollTo=fOde375CrLZ0) but I wanted to walk through my process and tools in my own way.\n",
                "\n",
                "\n",
                "\n",
                "---\n",
                "\n",
                "**The best way to understand this notebook is to follow along with the video tutorial created in conjunction with it.**\n",
                "\n",
                "This notebook was created for my Advanced StyleGAN class taught in the summer of 2021. If you find this notebook and content useful, please consider signing up for my [Patreon](https://www.patreon.com/bustbright) or [YouTube channel](https://www.youtube.com/channel/UCaZuPdmZ380SFUMKHVsv_AA/join). You can also send me a one-time payment on [Venmo](https://venmo.com/Derrick-Schultz).\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "-pXimik0u2IX"
            },
            "source": [
                "Let’s start. Did we get a V100? Hopefully! But a P100 will work too. Anything else and we might run into issues. (Yes, you should just pay for Colab Pro, it’s $10/month.)"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "fDWD-zaj4--f"
            },
            "source": [
                "!nvidia-smi -L"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "mNCpXVBuu7Cz"
            },
            "source": [
                "Install Hans Brouwer’s repo and related libraries"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "gz5C38_v1HaC"
            },
            "source": [
                "!git clone https://github.com/dvschultz/maua-stylegan2\n",
                "%cd maua-stylegan2/\n",
                "!pip install ninja madmom kornia ffmpeg-python cython"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "yV6PWMSz5WLR"
            },
            "source": [
                "## Upload model and audio files\n",
                "\n",
                "Your model file must be in the format of a Rosinality `.pt` model. If you trained your model with a TensorFlow repo or the newer ADA PyTorch repo, you can convert it to this format in [this notebook](https://colab.research.google.com/github/dvschultz/stylegan2-ada-pytorch/blob/main/SG2_ADA_PT_to_Rosinality.ipynb).\n",
                "\n",
                "For audio, I recommend uploading `.wav` files. You may also want to look at a tool like Spleeter or Demucs to separate layers of a song. A short demo of Demucs follows this section.\n",
                "\n",
                "If you don’t have a model to use, you can download a version of my model trained using the art of [Frea Buckler](https://www.instagram.com/freabuckler/). **Please do not use this model for any commercial work.**"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "bMFrXeq_pYQt"
            },
            "source": [
                "# Derrick Schultz's Frea Buckler network (from awesome-pretrained-stylegan2)\n",
                "!gdown https://drive.google.com/u/0/uc?id=1YzZemZAp7BVW701_BZ7uabJWJJaS2g7v"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "Jk9C6ixEmOLj"
            },
            "source": [
                "!gdown --id 1huJHdsDlj6x50j_uI1wvsIY8zW6O4lVb -O /content/freagan.pt\n"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "iEaOnTThzMTd"
            },
            "source": [
                "The audio files used in this demo were made available as a part of a remix contest. I don’t believe they are available any longer, sorry."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "WU8jRnX4UT88"
            },
            "source": [
                "### Demucs\n",
                "\n",
                "Demucs is a machine learning model from Facebook that does audio source separation. That means it will take an audio file and split it into separate files for drums, bass, melody, and vocals. I find the results fairly hit or miss, but try it out.\n",
                "\n",
                "A better option would be to find music that is available as \"stems\": recordings that are separated during the recording process and have much higher fidelity. You can often find stems for electronic music on Beatport, Splice, or other remix contest sites.\n",
                "\n",
                "Demucs is pretty straightforward to use so I’ll include it here. "
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "-lJj-20iUiVQ"
            },
            "source": [
                "!pip install demucs musdb museval"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "TUchmZKXUr85"
            },
            "source": [
                "!python -m demucs.separate -h"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "xqvJy4nbUfiY"
            },
            "source": [
                "!python -m demucs.separate /content/IKnowU_all-60s.wav -n demucs48_hq --shifts 10"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "pVH4mGID_HWz"
            },
            "source": [
                "## Visualize Audio\n",
                "\n",
                "Let’s first load our audio using a library called librosa."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "MuglreF3FyOL"
            },
            "source": [
                "### Aside: Shorten Audio\n",
                "\n",
                "The longer your audio clip, the longer it takes to render the video. While experimenting with different settings you may want to use a shorter clip of audio. We can use ffmpeg to create a shorter section.\n",
                "\n",
                "Don’t make your audio too short. Something 30-60 seconds is usually good."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "N1demhrkGFRr"
            },
            "source": [
                "audio_path = \"/content/IKnowU_all.wav\"\n",
                "output_path = \"/content/IKnowU_all-30s.wav\"\n",
                "start_seconds = 30\n",
                "end_seconds = 60\n",
                "!ffmpeg -i {audio_path} -af \"atrim={start_seconds}:{end_seconds}\" {output_path}"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "Nfo33A6k0Mmg"
            },
            "source": [
                "### Visualize Chromagraphs and Onsets"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "Tk-VLJtEeFnT"
            },
            "source": [
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "import numpy as np\n",
                "\n",
                "audiofile = \"/content/IKnowU_all.wav\"\n",
                "melody_audiofile = \"/content/IKnowU_melody.wav\"\n",
                "drums_audiofile = \"/content/IKnowU_drums.wav\"\n",
                "bass_audiofile = \"/content/IKnowU_bass.wav\"\n",
                "# assumed filename: point this at your own vocals stem\n",
                "vox_audiofile = \"/content/IKnowU_vocals.wav\"\n",
                "duration = 120 # in seconds\n",
                "fps = 30\n",
                "n = int(round(duration*fps)) # total number of video frames\n",
                "\n",
                "# librosa's `duration` is in seconds, so pass `duration` here rather than the frame count `n`\n",
                "audio, sr = rosa.load(audiofile, offset=0, duration=duration)\n",
                "melody, sr = rosa.load(melody_audiofile, offset=0, duration=duration)\n",
                "drums, sr = rosa.load(drums_audiofile, offset=0, duration=duration)\n",
                "bass, sr = rosa.load(bass_audiofile, offset=0, duration=duration)\n",
                "vox, sr = rosa.load(vox_audiofile, offset=0, duration=duration)"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "ShCvd32jfub5"
            },
            "source": [
                "The first thing we might want to do is look at the waveform of the audio. This might provide us with something like \"volume\" over time."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "MBy4e8RmeDWo"
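            },
            "source": [
                "%matplotlib inline\n",
                "import matplotlib.pyplot as plt\n",
                "import librosa.display # explicit import; older librosa versions do not load the display submodule automatically\n",
                "\n",
                "plt.figure(figsize=(14, 5))\n",
                "# plot the sampled signal\n",
                "# (waveplot was removed in librosa 0.10; use rosa.display.waveshow there)\n",
                "rosa.display.waveplot(audio, sr=sr)"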
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "-lzvo2I7NuYb"
            },
            "source": [
                "plt.figure(figsize=(14, 5))\n",
                "rosa.display.waveplot(melody, sr=sr)"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "dEJfMuitf71t"
            },
            "source": [
                "Hans’s article, however, states there are really two main ways we’ll want to use the audio track: chromagraph and onsets.\n",
                "\n",
                "A chromagraph (more commonly called a chromagram) shows how much energy falls into each of the 12 pitch classes of the Western chromatic scale over time."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "9eYcxUieACqY"
            },
            "source": [
                "chroma = ar.chroma(audio, 22050, n)\n",
                "print(\"chroma, all:\")\n",
                "ar.plot_spectra([chroma], chroma=True)\n",
                "\n",
                "melody_chroma = ar.chroma(melody, 22050, n)\n",
                "print(\"chroma, melody:\")\n",
                "ar.plot_spectra([melody_chroma], chroma=True)"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "T5NXTfgGG5pX"
            },
            "source": [
                "Onsets will show us the rhythms/spikes in audio changes. Onsets can also look at specific frequency ranges."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "qq3aikwCHLkT"
            },
            "source": [
                "hi_onsets = ar.onsets(audio, 22050, n, fmin=150, smooth=3)\n",
                "hi_onsets_sm = ar.onsets(audio, 22050, n, fmin=150, smooth=10)\n",
                "lo_onsets = ar.onsets(audio, 22050, n, fmax=150, smooth=3)\n",
                "lo_onsets_sm = ar.onsets(audio, 22050, n, fmax=150, smooth=20)\n",
                "\n",
                "\n",
                "print(\"onsets:\")\n",
                "ar.plot_signals([hi_onsets, hi_onsets_sm, lo_onsets, lo_onsets_sm])"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "1XxCLtRlPC4U"
            },
            "source": [
                "hi_onsets = ar.onsets(drums, 22050, n, fmin=1000, smooth=3)\n",
                "hi_onsets_sm = ar.onsets(drums, 22050, n, fmin=1000, smooth=10)\n",
                "lo_onsets = ar.onsets(drums, 22050, n, fmax=250, smooth=3)\n",
                "lo_onsets_sm = ar.onsets(drums, 22050, n, fmax=250, smooth=10)\n",
                "\n",
                "bass_scaled = lo_onsets * 0.2\n",
                "\n",
                "print(\"onsets:\")\n",
                "# ar.plot_signals([hi_onsets, hi_onsets_sm, lo_onsets, lo_onsets_sm])\n",
                "ar.plot_signals([lo_onsets, bass_scaled])"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "tdNqDblu9XIv"
            },
            "source": [
                "bass_onsets = ar.onsets(bass, 22050, n, fmax=500, smooth=3)\n",
                "bass_onsets_sm = ar.onsets(bass, 22050, n, fmax=500, smooth=20)\n",
                "\n",
                "print(\"onsets:\")\n",
                "ar.plot_signals([bass_onsets, bass_onsets_sm])"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "bh3gQvv4Mgu4"
            },
            "source": [
                "vox_onsets = ar.onsets(vox, 22050, n, fmin=500, smooth=3)\n",
                "vox_onsets_sm = ar.onsets(vox, 22050, n, fmin=500, smooth=20)\n",
                "\n",
                "print(\"onsets:\")\n",
                "ar.plot_signals([vox_onsets, vox_onsets_sm])\n",
                "print(vox_onsets.shape)"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "B7-Fzxc-50LQ"
            },
            "source": [
                "## Basic example\n",
                "\n",
                "Let’s start by generating a video using the defaults in the notebook. This assumes you have a single audio track.\n",
                "\n",
                "Let’s run `--help` to see what options we have."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "Tn6VAYZD52Kw"
            },
            "source": [
                "!python generate_audiovisual.py --help"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "kiP-t-9xz3W8"
            },
            "source": [
                "We can run a pretty basic video generation script below (swap out the path to your audio file after the `--audio_file` argument):"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "eyzC47Qb1akU"
            },
            "source": [
                "!python generate_audiovisual.py --ckpt \"/content/freagan.pt\" --audio_file \"/content/IKnowU_all.wav\" --output_dir '/content/output'"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "juhA_EsJj0yP"
            },
            "source": [
                "Here’s what my output looked like. You’ll notice it sometimes picks up the correct drum beats, but it’s somewhat inconsistent. This is likely due to the chromagraph being processed over a mixed audio file rather than separating out the individual instruments and processing them separately."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "-awYvRYAO2IM"
            },
            "source": [
                "## Customize the output\n",
                "\n",
                "This creates a pretty decent video, but we can do better. Now let’s look at customizing a couple of these functions."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "kwXqe9tFPFWv"
            },
            "source": [
                "!mkdir -p /content/maua-stylegan2/custom/\n",
                "!cp /content/maua-stylegan2/audioreactive/examples/default.py /content/maua-stylegan2/custom/custom.py"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "cn3if3aEjQRQ"
            },
            "source": [
                "Hans recommends running garbage collection every time we redo a visualization."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "zr-YiDZHjV96"
            },
            "source": [
                "print(\"Time                     GPU        Used      Free\")\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader\n",
                "import gc\n",
                "import torch\n",
                "gc.collect()\n",
                "torch.cuda.empty_cache()\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "b9E8t22-hFLO"
            },
            "source": [
                "We’ll start by just building up our chromagraph so we have nice smooth latents that correspond to tones. Hans’s examples also add a layer of onsets. I made [an example video without (left) and with (right) onsets](https://drive.google.com/file/d/1_1G0y-hgHkX8zfsMxhtJmoV8sVUU8WZy/view?usp=sharing) so you can see the difference. Onsets help pick up small details in the track, so we’ll include them below."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "DbKxjidv5MNw"
            },
            "source": [
                "%%writefile /content/maua-stylegan2/custom/custom.py\n",
                "import torch as th\n",
                "\n",
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "\n",
                "def initialize(args):\n",
                "    # Use just the melody file so the vocals, bass, and drums don't interfere\n",
                "    melody_audiofile = \"/content/IKnowU_melody-60s.wav\"\n",
                "    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)\n",
                "\n",
                "    # melody onsets\n",
                "    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)\n",
                "\n",
                "    return args\n",
                "\n",
                "\n",
                "def get_latents(selection, args):\n",
                "    chroma = ar.chroma(args.melody, args.sr, args.n_frames)\n",
                "    chroma_latents = ar.chroma_weight_latents(chroma, selection)\n",
                "    latents = ar.gaussian_filter(chroma_latents, 4)\n",
                "\n",
                "    # we can use onsets to capture quick changes/stabs in the melody\n",
                "    lo_onsets = args.mel_lo_onsets[:, None, None]\n",
                "    hi_onsets = args.mel_hi_onsets[:, None, None]\n",
                "\n",
                "    latents = hi_onsets * selection[[-6]] + (1 - hi_onsets) * latents\n",
                "    latents = lo_onsets * selection[[-10]] + (1 - lo_onsets) * latents\n",
                "\n",
                "    latents = ar.gaussian_filter(latents, 2, causal=0.2)\n",
                "\n",
                "    return latents\n",
                "\n",
                "\n",
                "def get_noise(height, width, scale, num_scales, args):\n",
                "    # we'll look at noise later\n",
                "\n",
                "    return None\n",
                "\n",
                "def get_bends(args):\n",
                "    bends = []\n",
                "    return bends\n",
                "\n",
                "def get_rewrites(args):\n",
                "    rewrites = {}\n",
                "    return rewrites\n",
                "\n",
                "def get_truncation(args):\n",
                "    #fixed truncation\n",
                "    truncation = 0.7\n",
                "    return truncation\n"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "1weelZ5kwdDr"
            },
            "source": [
                "Let’s run this. Using `--manual_seed` lets us make minor changes to our code while still getting the same vectors (or we can change that value and get totally new vectors)."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "0CvaiQJWPmSQ"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_melody-60s.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/chroma-melody-v4-onsets-deep.mp4\" \\\n",
                "--out_size 1024 \\\n",
                "--manual_seed 0"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "ay8mPwKsBYWG"
            },
            "source": [
                "[This video](https://drive.google.com/file/d/15-sM_QhTdxRvZVN1tF3VD7A91UV1j0yT/view?usp=sharing) will show the difference between our default example and choosing custom vectors"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "lYsHKI735fed"
            },
            "source": [
                "print(\"Time                     GPU        Used      Free\")\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader\n",
                "import gc\n",
                "import torch\n",
                "gc.collect()\n",
                "torch.cuda.empty_cache()\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "NJB_p8A6wK_j"
            },
            "source": [
                "## Using custom vectors\n",
                "This is nice, but what if we wanted to pick our own vectors?\n",
                "\n",
                "(*See the bottom of this notebook for a way to generate images from seeds to make your choices*)"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "IeisQKuGbTFS"
            },
            "source": [
                "%%writefile /content/maua-stylegan2/custom/custom.py\n",
                "import torch as th\n",
                "\n",
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "from models.stylegan2 import Generator\n",
                "\n",
                "def initialize(args):\n",
                "    melody_audiofile = \"/content/IKnowU_melody-60s.wav\"\n",
                "    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)\n",
                "\n",
                "    # melody onsets\n",
                "    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)\n",
                "\n",
                "    return args\n",
                "\n",
                "\n",
                "def get_latents(selection, args):\n",
                "    chroma = ar.chroma(args.melody, args.sr, args.n_frames)\n",
                "    # selection gives us the randomized vectors. lets skip this and make our own\n",
                "    # chroma_latents = ar.chroma_weight_latents(chroma, selection)\n",
                "\n",
                "    generator = Generator(\n",
                "        1024, 512, 8, channel_multiplier=2, constant_input=True, checkpoint='/content/freagan.pt',\n",
                "    ).cuda()\n",
                "\n",
                "    # create custom vectors: one z vector per seed, one seed per chroma tone (12 total)\n",
                "    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]\n",
                "    saved = []\n",
                "    for seed in seeds:\n",
                "        th.manual_seed(seed)\n",
                "        zs = th.randn((1, 512), device=\"cuda\")\n",
                "        saved.append(zs)\n",
                "    zs = th.cat(saved, 0)\n",
                "    # convert zs to ws\n",
                "    custom_vectors = generator(zs, map_latents=True).cpu()\n",
                "\n",
                "    #back to our regularly scheduled program...\n",
                "    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) \n",
                "\n",
                "    latents = ar.gaussian_filter(chroma_latents, 4)\n",
                "\n",
                "    lo_onsets = args.mel_lo_onsets[:, None, None]\n",
                "    hi_onsets = args.mel_hi_onsets[:, None, None]\n",
                "\n",
                "    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents\n",
                "    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents\n",
                "\n",
                "    latents = ar.gaussian_filter(latents, 2, causal=0.2)\n",
                "\n",
                "    return latents\n",
                "\n",
                "\n",
                "def get_noise(height, width, scale, num_scales, args):\n",
                "    # we'll look at noise later\n",
                "    return None\n",
                "\n",
                "def get_bends(args):\n",
                "    bends = []\n",
                "    return bends\n",
                "\n",
                "def get_rewrites(args):\n",
                "    rewrites = {}\n",
                "    return rewrites\n",
                "\n",
                "def get_truncation(args):\n",
                "    #fixed truncation\n",
                "    truncation = 0.7\n",
                "    return truncation\n"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "wFdyTBH0oApl"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_melody-60s.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/custom-vectors.mp4\" \\\n",
                "--out_size 1024 \\\n",
                "--manual_seed 0 #this is technically unnecessary now"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "BPGQGycxChHF"
            },
            "source": [
                "### Example\n",
                "\n",
                "[Here’s where we stand.](https://drive.google.com/file/d/1-W9p6E2VSiLRONSEQLCtWyvZIIYUr4ze/view?usp=sharing) The default vectors are on the left; the vectors customized to my choices are on the right."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "ky9C0orl32na"
            },
            "source": [
                "### Aside: an alternative to hard-coding this directly into the custom script\n",
                "Alternatively, you can pass the script a `.npy` file containing a list of vectors to define the vectors you want to use. In fact, a nice little feature of this tool is that your last used vectors are saved in a file at `/content/maua-stylegan2/workspace/last-latents.npy` (well, not the custom vectors above, because we’re hacking around it).\n",
                "\n",
                "(Note: if you run this, make sure you remove or comment out the code in the cell above that generates custom vectors.)"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "938NbtQU4n4g"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_melody-60s.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/custom-vectors.mp4\" \\\n",
                "--out_size 1024 \\\n",
                "--latent_file '/content/maua-stylegan2/workspace/last-latents.npy' # or another custom file with 12 vectors"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "M2yaVSIj5iVW"
            },
            "source": [
                "print(\"Time                     GPU        Used      Total\")\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader\n",
                "import gc\n",
                "import torch\n",
                "gc.collect()\n",
                "torch.cuda.empty_cache()\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "fh63S_zc5jWH"
            },
            "source": [
                "\n",
                "## I want to feel/see the bass \n",
                "\n",
                "Ok, we’ve got a nice little melody/interpolation going here now. But we can do a lot more. My song has a lot of bass in it, so I want to emphasize that. I think about when that bass drops, and I think about truncation. So let’s match the onsets of some bass to truncation levels "
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "MnQQZ1AQoC-p"
            },
            "source": [
                "%%writefile /content/maua-stylegan2/custom/custom.py\n",
                "import torch as th\n",
                "\n",
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "from models.stylegan2 import Generator\n",
                "\n",
                "def initialize(args):\n",
                "    melody_audiofile = \"/content/IKnowU_melody-60s.wav\"\n",
                "    bass_audiofile = \"/content/IKnowU_bass-60s.wav\"\n",
                "    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)\n",
                "\n",
                "    # melody onsets\n",
                "    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)\n",
                "\n",
                "    # bass onsets\n",
                "    # I only really need one here\n",
                "    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)\n",
                "\n",
                "    return args\n",
                "\n",
                "\n",
                "def get_latents(selection, args):\n",
                "    chroma = ar.chroma(args.melody, args.sr, args.n_frames)\n",
                "    # selection gives us the randomized vectors. lets skip this and make our own\n",
                "    # chroma_latents = ar.chroma_weight_latents(chroma, selection)\n",
                "\n",
                "    generator = Generator(\n",
                "        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',\n",
                "    ).cuda()\n",
                "\n",
                "    # create custom vectors\n",
                "    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]\n",
                "    saved = []\n",
                "    for idx, i in enumerate(seeds):\n",
                "        th.manual_seed(i)\n",
                "        zs = th.randn((1, 512), device=\"cuda\")\n",
                "        saved.append(zs)\n",
                "    zs = th.cat((saved),0)\n",
                "    # convert zs to ws\n",
                "    custom_vectors = generator(zs, map_latents=True).cpu()\n",
                "\n",
                "    #back to our regularly scheduled program...\n",
                "    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) \n",
                "\n",
                "    latents = ar.gaussian_filter(chroma_latents, 4)\n",
                "\n",
                "    lo_onsets = args.mel_lo_onsets[:, None, None]\n",
                "    hi_onsets = args.mel_hi_onsets[:, None, None]\n",
                "\n",
                "    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents\n",
                "    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents\n",
                "\n",
                "    latents = ar.gaussian_filter(latents, 2, causal=0.2)\n",
                "\n",
                "    return latents\n",
                "\n",
                "\n",
                "def get_noise(height, width, scale, num_scales, args):\n",
                "    # we'll look at noise later\n",
                "    # if width > 256:\n",
                "    #     return None\n",
                "\n",
                "    # lo_onsets = args.lo_onsets[:, None, None, None].cuda()\n",
                "    # hi_onsets = args.hi_onsets[:, None, None, None].cuda()\n",
                "\n",
                "    # noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 5)\n",
                "    # noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 128)\n",
                "\n",
                "    # if width < 128:\n",
                "    #     noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise\n",
                "    # if width > 32:\n",
                "    #     noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise\n",
                "\n",
                "    # noise /= noise.std() * 2.5\n",
                "\n",
                "    return None\n",
                "\n",
                "def get_bends(args):\n",
                "    bends = []\n",
                "    return bends\n",
                "\n",
                "def get_rewrites(args):\n",
                "    rewrites = {}\n",
                "    return rewrites\n",
                "\n",
                "def get_truncation(args):\n",
                "    # lets use truncation to emphasize bass\n",
                "    # onsets range from 0.0 to 1.0, so lets map our truncation from 0.7 to 2.2\n",
                "    truncation = args.bass_onsets*1.5 + 0.7\n",
                "    return truncation\n"
            ],
            "execution_count": null,
            "outputs": []
        },
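        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The `get_truncation` function above is just a linear rescale of the onset envelope. Here is a minimal NumPy sketch of that arithmetic. The `onsets` values are made up for illustration, and `map_range` is a hypothetical helper, not part of `audioreactive`.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "def map_range(envelope, low, high):\n",
                "    # linearly rescale values in [0, 1] to [low, high]\n",
                "    return envelope * (high - low) + low\n",
                "\n",
                "# made-up onset envelope: silence, a soft hit, a full-strength bass drop, a medium hit\n",
                "onsets = np.array([0.0, 0.2, 1.0, 0.5])\n",
                "\n",
                "# same result as onsets * 1.5 + 0.7 in get_truncation\n",
                "truncation = map_range(onsets, 0.7, 2.2)\n",
                "print(truncation)\n",
                "```\n",
                "\n",
                "Lowering `high` should keep loud hits closer to the model’s average output, while raising it pushes them further out. The right range depends on your model, so treat 0.7 and 2.2 as starting points."
            ]
        },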
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "rhvPTbp87ngM"
            },
            "source": [
                "Let’s run this and check out our results.\n",
                "\n",
                "(Another note: I’ve combined my melody and bass tracks into one in ffmpeg. So I’m referencing that in `--audio_file` so its used in the final render, but I’m using the individual tracks to generate my chromagraph and onsets)"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "lxgerXyQXwa_"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_bass+melody-60s.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/bass-truncation.mp4\" \\\n",
                "--out_size 1024"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "99dzec8qG1mR"
            },
            "source": [
                "### Example\n",
                "\n",
                "[Here’s a video](https://drive.google.com/file/d/1tq3tKkV-FAlE2vl6-s0vFgaHvCYMhGwb/view?usp=sharing) showing the bass influencing the truncation. It’s fairly subtle, but if you watch toward the end you will see some differences."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "WPo5nTtjdt46"
            },
            "source": [
                "print(\"Time                     GPU        Used      Total\")\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader\n",
                "import gc\n",
                "import torch\n",
                "gc.collect()\n",
                "torch.cuda.empty_cache()\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "dhidx4vW-6pI"
            },
            "source": [
                "## Mapping Drums\n",
                "Now we have all of our melodic reactions in, let’s look at how to use drums. You could do a lot with all the various drum pieces, but I’ll do two things. Let’s start by using them to influence the noise.\n",
                "\n",
                "Noise in StyleGAN can change a lot model to model. In most cases it changes the grain of images (or the hair textures in something like FFHQ) but sometimes it can affect more important details. You might want to test it a bit on your model (one trick is to isolate the latent vectors and truncation and only apply noise to your video to see what happens)\n",
                "\n",
                "[This video](https://drive.google.com/file/d/1sQksgVtgN7bJX8cjaYMCPamM8eVLpocb/view?usp=sharing) shows what audioreactive noise looks like when it is isolated without any other effects."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "8x-ooawhMngO"
            },
            "source": [
                "%%writefile /content/maua-stylegan2/custom/custom.py\n",
                "import torch as th\n",
                "\n",
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "from models.stylegan2 import Generator\n",
                "\n",
                "def initialize(args):\n",
                "    melody_audiofile = \"/content/IKnowU_melody-60s.wav\"\n",
                "    bass_audiofile = \"/content/IKnowU_bass-60s.wav\"\n",
                "    drums_audiofile = \"/content/IKnowU_drums-60s.wav\"\n",
                "    all_audiofile = \"/content/IKnowU_all-60s.wav\"\n",
                "    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.drums, sr = rosa.load(drums_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.all, sr = rosa.load(all_audiofile, offset=0, duration=args.n_frames)\n",
                "\n",
                "    # melody onsets\n",
                "    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)\n",
                "\n",
                "    # bass onsets\n",
                "    # I only really need one here\n",
                "    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)\n",
                "\n",
                "    # drums onsets\n",
                "    args.drums_hi_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "    args.drums_lo_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "\n",
                "    # all onsets\n",
                "    args.all_hi_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "    args.all_lo_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "\n",
                "    return args\n",
                "\n",
                "\n",
                "def get_latents(selection, args):\n",
                "    # to isolate noise, uncomment the following line and comment everything else in this function out (except the return line)\n",
                "    # latents = selection[0].repeat(args.n_frames,1,1)\n",
                "\n",
                "    chroma = ar.chroma(args.melody, args.sr, args.n_frames)\n",
                "    generator = Generator(\n",
                "        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',\n",
                "    ).cuda()\n",
                "\n",
                "    # create custom vectors\n",
                "    seeds = [13,21,7,10,42,176,44,53,60,76,92,140]\n",
                "    saved = []\n",
                "    for idx, i in enumerate(seeds):\n",
                "        th.manual_seed(i)\n",
                "        zs = th.randn((1, 512), device=\"cuda\")\n",
                "        saved.append(zs)\n",
                "    zs = th.cat((saved),0)\n",
                "    # convert zs to ws\n",
                "    custom_vectors = generator(zs, map_latents=True).cpu()\n",
                "    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) \n",
                "\n",
                "    latents = ar.gaussian_filter(chroma_latents, 4)\n",
                "\n",
                "    lo_onsets = args.mel_lo_onsets[:, None, None]\n",
                "    hi_onsets = args.mel_hi_onsets[:, None, None]\n",
                "\n",
                "    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents\n",
                "    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents\n",
                "\n",
                "    latents = ar.gaussian_filter(latents, 2, causal=0.2)\n",
                "\n",
                "    return latents\n",
                "\n",
                "\n",
                "def get_noise(height, width, scale, num_scales, args):\n",
                "    if width > 512:\n",
                "        return None\n",
                "\n",
                "    lo_onsets = args.drums_lo_onsets[:, None, None, None].cuda()\n",
                "    hi_onsets = args.drums_hi_onsets[:, None, None, None].cuda()\n",
                "\n",
                "    noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 5)\n",
                "    noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 128)\n",
                "\n",
                "    if width < 128:\n",
                "        noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise\n",
                "    if width > 32:\n",
                "        noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise\n",
                "\n",
                "    noise /= noise.std() * 2.5\n",
                "\n",
                "    return noise.cpu()\n",
                "\n",
                "def get_bends(args):\n",
                "    bends = []\n",
                "    return bends\n",
                "\n",
                "def get_rewrites(args):\n",
                "    rewrites = {}\n",
                "    return rewrites\n",
                "\n",
                "def get_truncation(args):\n",
                "    # to isolate noise, uncomment the following line and comment everything else out (except the return line)\n",
                "    # truncation = 0.7\n",
                "    \n",
                "    # onsets range from 0.0 to 1.0, so lets map our truncation from 0.7 to 2.2\n",
                "    truncation = args.bass_onsets*1.5 + 0.7\n",
                "    return truncation\n"
            ],
            "execution_count": null,
            "outputs": []
        },
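        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To make the blending in `get_noise` above more concrete, here is a small CPU-only sketch of the same idea in NumPy: an onset envelope crossfades between quickly changing noise and slowly drifting noise. The helper `smooth_over_time`, the tiny 4x4 size, and the hand-placed drum hits are all assumptions for illustration. This is not how `ar.gaussian_filter` is implemented.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "def smooth_over_time(x, sigma):\n",
                "    # Gaussian blur along axis 0 (time); edges are padded by repeating the end frames\n",
                "    radius = int(3 * sigma)\n",
                "    offsets = np.arange(-radius, radius + 1)\n",
                "    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)\n",
                "    kernel /= kernel.sum()\n",
                "    pad = [(radius, radius)] + [(0, 0)] * (x.ndim - 1)\n",
                "    padded = np.pad(x, pad, mode='edge')\n",
                "    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='valid'), 0, padded)\n",
                "\n",
                "rng = np.random.default_rng(0)\n",
                "n_frames, height, width = 48, 4, 4\n",
                "\n",
                "# made-up drum envelope: two hits, softened and scaled so the peaks reach 1.0\n",
                "onsets = np.zeros(n_frames)\n",
                "onsets[[12, 36]] = 1.0\n",
                "onsets = smooth_over_time(onsets, 1.0)\n",
                "onsets /= onsets.max()\n",
                "\n",
                "# lightly smoothed noise flickers quickly, heavily smoothed noise drifts slowly\n",
                "fast = smooth_over_time(rng.standard_normal((n_frames, 1, height, width)), 1.0)\n",
                "slow = smooth_over_time(rng.standard_normal((n_frames, 1, height, width)), 8.0)\n",
                "\n",
                "# on a hit the envelope is near 1 and the fast noise takes over; otherwise the slow noise shows\n",
                "envelope = onsets[:, None, None, None]\n",
                "blended = envelope * fast + (1 - envelope) * slow\n",
                "\n",
                "# same rescaling as get_noise: the result has a standard deviation of 1 / 2.5 = 0.4\n",
                "noise = blended / (blended.std() * 2.5)\n",
                "print(noise.shape, noise.std())\n",
                "```\n",
                "\n",
                "The two smoothing amounts play the same role as the `5` and `128` in `get_noise`: the bigger the gap between them, the more visible the difference between a drum hit and the quiet frames around it."
            ]
        },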
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "ig8p58HCBow1"
            },
            "source": [
                "I think at this point we can add a section the whole song in to see what it looks like."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "9y44KhkMNji4"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_all-60s.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/plus-noise.mp4\" \\\n",
                "--out_size 1024"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "VKOO1xOdAiha"
            },
            "source": [
                "### Example\n",
                "\n",
                "[This video](https://drive.google.com/file/d/15-sM_QhTdxRvZVN1tF3VD7A91UV1j0yT/view?usp=sharing) will compare default noise (left) with audioreactive noise (right). You’ll notice there is still noise in the default video—that’s because StyleGAN slways includes some noise in its model. But in the custom video we can control it and use it when desired."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "kOatwBVMNmbO"
            },
            "source": [
                "print(\"Time                     GPU        Used      Total\")\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader\n",
                "import gc\n",
                "import torch\n",
                "gc.collect()\n",
                "torch.cuda.empty_cache()\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "F-sqUzNmM_8K"
            },
            "source": [
                "## Network Bends\n",
                "I like the drums, but I kinda want them do a little something more. Let’s look at making the kick and snare do a little zoom into the frame using a network bend.\n",
                "\n",
                "Hans has three built-in bend options: Zoom, Translate, and Rotate. They’re a little finicky, so I’ve only implemented Zoom here."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "jzEXEcWtMB06"
            },
            "source": [
                "%%writefile /content/maua-stylegan2/custom/custom.py\n",
                "from functools import partial\n",
                "\n",
                "import torch as th\n",
                "\n",
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "from models.stylegan2 import Generator\n",
                "\n",
                "def initialize(args):\n",
                "    melody_audiofile = \"/content/IKnowU_melody.wav\"\n",
                "    bass_audiofile = \"/content/IKnowU_bass.wav\"\n",
                "    drums_audiofile = \"/content/IKnowU_drums.wav\"\n",
                "    all_audiofile = \"/content/IKnowU_all.wav\"\n",
                "    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.drums, sr = rosa.load(drums_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.all, sr = rosa.load(all_audiofile, offset=0, duration=args.n_frames)\n",
                "\n",
                "    # melody onsets\n",
                "    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)\n",
                "\n",
                "    # bass onsets\n",
                "    # I only really need one here\n",
                "    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)\n",
                "\n",
                "    # drums onsets\n",
                "    args.drums_hi_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "    args.drums_lo_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "\n",
                "    # all onsets\n",
                "    args.all_hi_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "    args.all_lo_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "\n",
                "    return args\n",
                "\n",
                "\n",
                "def get_latents(selection, args):\n",
                "\n",
                "    chroma = ar.chroma(args.melody, args.sr, args.n_frames)\n",
                "    generator = Generator(\n",
                "        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',\n",
                "    ).cuda()\n",
                "\n",
                "    # create custom vectors\n",
                "    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]\n",
                "    saved = []\n",
                "    for idx, i in enumerate(seeds):\n",
                "        th.manual_seed(i)\n",
                "        zs = th.randn((1, 512), device=\"cuda\")\n",
                "        saved.append(zs)\n",
                "    zs = th.cat((saved),0)\n",
                "    # convert zs to ws\n",
                "    custom_vectors = generator(zs, map_latents=True).cpu()\n",
                "    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) \n",
                "\n",
                "    latents = ar.gaussian_filter(chroma_latents, 4)\n",
                "\n",
                "    lo_onsets = args.mel_lo_onsets[:, None, None]\n",
                "    hi_onsets = args.mel_hi_onsets[:, None, None]\n",
                "\n",
                "    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents\n",
                "    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents\n",
                "\n",
                "    latents = ar.gaussian_filter(latents, 2, causal=0.2)\n",
                "\n",
                "    return latents\n",
                "\n",
                "\n",
                "def get_noise(height, width, scale, num_scales, args):\n",
                "    if width > 256:\n",
                "        return None\n",
                "\n",
                "    lo_onsets = args.drums_lo_onsets[:, None, None, None].cuda()\n",
                "    hi_onsets = args.drums_hi_onsets[:, None, None, None].cuda()\n",
                "\n",
                "    noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 5)\n",
                "    noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 128)\n",
                "\n",
                "    if width < 128:\n",
                "        noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise\n",
                "    if width > 32:\n",
                "        noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise\n",
                "\n",
                "    noise /= noise.std() * 2.5\n",
                "\n",
                "    return noise.cpu()\n",
                "\n",
                "def get_bends(args):\n",
                "\n",
                "    # transform = th.nn.Sequential(\n",
                "    #     th.nn.ReplicationPad2d((2, 2, 2, 2)), ar.AddNoise(0.025 * th.randn(size=(1, 1, 4, 8), device=\"cuda\")),\n",
                "    # )\n",
                "    # bends = [{\"layer\": 0, \"transform\": transform}]\n",
                "\n",
                "    # we'll use just the lo onset so its the kick drum isolated only\n",
                "    hi_onsets = args.drums_lo_onsets\n",
                "    lo_onsets = args.drums_lo_onsets\n",
                "    scaled_onsets = (lo_onsets * 0.75) + 1.0 # switch from range of (0.0, 1.0) to (1.0,1.75) \n",
                "    scaled_onsets += (hi_onsets * 0.75) # we'll make the snares a little more punchy\n",
                "\n",
                "    # don't worry about this too much but we want to apply this bend to a low layer of the stylegan model\n",
                "    # apply network bending to second layer in StyleGAN\n",
                "    # lower layer network bends have more fluid outcomes\n",
                "    tl = 4\n",
                "    h = 2 ** tl\n",
                "    w = h\n",
                "\n",
                "    translation = scaled_onsets.unsqueeze(1)\n",
                "    transform = lambda batch: partial(ar.Zoom, h=h, w=w)(batch)\n",
                "    bends = [{\"layer\": tl, \"transform\": transform, \"modulation\": translation}]  # add network bend to list dict\n",
                "\n",
                "    return bends\n",
                "\n",
                "def get_rewrites(args):\n",
                "    rewrites = {}\n",
                "    return rewrites\n",
                "\n",
                "def get_truncation(args):    \n",
                "    # onsets range from 0.0 to 1.0, so lets map our truncation from 0.7 to 2.2\n",
                "    truncation = (args.bass_onsets * 1.5) + 0.7\n",
                "    return truncation\n"
            ],
            "execution_count": null,
            "outputs": []
        },
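        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "If it helps to picture what a zoom bend does to a layer’s feature map, here is a rough NumPy sketch: each frame’s zoom factor comes from the drum envelope, and a bigger factor samples a smaller central region of the map and stretches it back to full size. `zoom_center` is a hypothetical nearest-neighbour stand-in written for this example. It is not how `ar.Zoom` is implemented, and the envelope values are made up.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "def zoom_center(fmap, factor):\n",
                "    # nearest-neighbour zoom around the centre of the last two axes;\n",
                "    # factor 1.0 leaves the map unchanged, larger factors zoom in\n",
                "    h, w = fmap.shape[-2:]\n",
                "    cy, cx = (h - 1) / 2, (w - 1) / 2\n",
                "    ys = np.clip(np.round((np.arange(h) - cy) / factor + cy), 0, h - 1).astype(int)\n",
                "    xs = np.clip(np.round((np.arange(w) - cx) / factor + cx), 0, w - 1).astype(int)\n",
                "    return fmap[..., ys[:, None], xs[None, :]]\n",
                "\n",
                "# a 16x16 map (2 ** 4, like h = 2 ** tl above) whose values encode their own position\n",
                "fmap = np.arange(16 * 16).reshape(16, 16)\n",
                "\n",
                "# made-up kick envelope mapped to zoom factors the same way get_bends does\n",
                "kick = np.array([0.0, 1.0, 0.4])\n",
                "factors = kick * 0.75 + 1.0 + kick * 0.75  # 1.0 at rest, 2.5 on a full hit\n",
                "frames = [zoom_center(fmap, f) for f in factors]\n",
                "\n",
                "# at rest the map is untouched; on a hit only the central region is sampled\n",
                "print(np.array_equal(frames[0], fmap), frames[1].min(), frames[1].max())\n",
                "```\n",
                "\n",
                "Because the bend is applied to a small, early feature map, the rest of the generator still runs on top of it, which is presumably why the result reads as a fluid push into the image rather than a hard crop."
            ]
        },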
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "IM1kJjb4-kjS"
            },
            "source": [
                "At this point I’m going to render the full song. You may find that the section of audio you rendered previously looks slightly different now—that’s an indication that the rest of the audio has additional peaks not accounted for previously when you process the onsets on the longer source. You may need to go back and edit some of your previous settings if so."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "gF0tizYFPAgZ"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_all.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/add-bends.mp4\" \\\n",
                "--out_size 1024"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "ekhGH3GjEzIe"
            },
            "source": [
                "### Example\n",
                "[Video example is here](https://drive.google.com/file/d/1guHgynifzYeLmu-FkHxICtDdp-tvUEiC/view?usp=sharing). One interesting effect of using the whole video is that the bass kick early in the song don’t have as much affect as they do later."
            ]
        },
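        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "One way to see why an excerpt and the full song can behave differently: if an onset envelope is scaled relative to the loudest peak in the clip (an assumption about the processing, but consistent with what we see here), then the same kick gets a smaller value once a louder peak shows up elsewhere in the track. A minimal NumPy sketch with made-up numbers:\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "def normalize(envelope):\n",
                "    # scale so the loudest peak in this clip becomes 1.0\n",
                "    return envelope / envelope.max()\n",
                "\n",
                "# made-up raw onset strengths: a quiet intro followed by a louder section\n",
                "intro = np.array([0.0, 2.0, 0.0, 2.0])\n",
                "later = np.array([0.0, 4.0, 0.0, 4.0])\n",
                "\n",
                "# rendered on its own, the intro kicks reach the top of the range\n",
                "excerpt_only = normalize(intro)\n",
                "\n",
                "# rendered as part of the full song, the same kicks only reach half of it\n",
                "full_song = normalize(np.concatenate([intro, later]))\n",
                "\n",
                "print(excerpt_only.max(), full_song[:4].max())\n",
                "```\n",
                "\n",
                "If the early hits feel too weak in the full render, raising the multipliers in `get_bends` or `get_truncation` is one way to compensate."
            ]
        },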
        {
            "cell_type": "code",
            "metadata": {
                "id": "zeG0G-J4Pq23"
            },
            "source": [
                "print(\"Time                     GPU        Used      Total\")\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader\n",
                "import gc\n",
                "import torch\n",
                "gc.collect()\n",
                "torch.cuda.empty_cache()\n",
                "!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "JyKxMqlDPjx0"
            },
            "source": [
                "## Using Feature Vectors\n",
                "\n",
                "Ok this is looking really good! We have this vocal file, so we might as well use it. We’ve previously covered feature vectors, so I’m going to have the vocals activate a single feature vector.\n",
                "\n",
                "(*See more in Appendix 2 at the bottom about trying out feature vectors*)\n"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "y4G-CSDIPoDL"
            },
            "source": [
                "%%writefile /content/maua-stylegan2/custom/custom.py\n",
                "from functools import partial\n",
                "\n",
                "import torch as th\n",
                "\n",
                "import librosa as rosa\n",
                "import audioreactive as ar\n",
                "from models.stylegan2 import Generator\n",
                "\n",
                "def initialize(args):\n",
                "    melody_audiofile = \"/content/IKnowU_melody.wav\"\n",
                "    bass_audiofile = \"/content/IKnowU_bass.wav\"\n",
                "    drums_audiofile = \"/content/IKnowU_drums.wav\"\n",
                "    all_audiofile = \"/content/IKnowU_all.wav\"\n",
                "    vox_audiofile = \"/content/IKnowU_vox.wav\"\n",
                "    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.drums, sr = rosa.load(drums_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.all, sr = rosa.load(all_audiofile, offset=0, duration=args.n_frames)\n",
                "    args.vox, sr = rosa.load(vox_audiofile, offset=0, duration=args.n_frames)\n",
                "\n",
                "    # melody onsets\n",
                "    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)\n",
                "\n",
                "    # bass onsets\n",
                "    # I only really need one here\n",
                "    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)\n",
                "    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)\n",
                "\n",
                "    # drums onsets\n",
                "    args.drums_hi_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "    args.drums_lo_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "\n",
                "    # all onsets\n",
                "    args.all_hi_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "    args.all_lo_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)\n",
                "\n",
                "    # vox onsets\n",
                "    # lets skip the bass onsets\n",
                "    args.vox_hi_onsets = ar.onsets(args.vox, 22050, args.n_frames, fmin=250, smooth=3)\n",
                "    print(args.vox_hi_onsets.shape)\n",
                "\n",
                "    return args\n",
                "\n",
                "\n",
                "def get_latents(selection, args):\n",
                "\n",
                "    chroma = ar.chroma(args.melody, args.sr, args.n_frames)\n",
                "    generator = Generator(\n",
                "        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',\n",
                "    ).cuda()\n",
                "\n",
                "    # create custom vectors\n",
                "    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]\n",
                "    saved = []\n",
                "    for idx, i in enumerate(seeds):\n",
                "        th.manual_seed(i)\n",
                "        zs = th.randn((1, 512), device=\"cuda\")\n",
                "        saved.append(zs)\n",
                "    zs = th.cat((saved),0)\n",
                "    # convert zs to ws\n",
                "    custom_vectors = generator(zs, map_latents=True).cpu()\n",
                "    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) \n",
                "\n",
                "    latents = ar.gaussian_filter(chroma_latents, 4)\n",
                "\n",
                "    lo_onsets = args.mel_lo_onsets[:, None, None]\n",
                "    hi_onsets = args.mel_hi_onsets[:, None, None]\n",
                "\n",
                "    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents\n",
                "    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents\n",
                "\n",
                "    latents = ar.gaussian_filter(latents, 2, causal=0.2)\n",
                "\n",
                "    # read in feature vector file\n",
                "    eigvec = th.load('/content/freagan-factor.pt')[\"eigvec\"].to(\"cuda\")\n",
                "    # I chose vector 17 after looking thru some feature vectors\n",
                "    direction = eigvec[:, 17].unsqueeze(0)\n",
                "\n",
                "    # convert direction to w vector\n",
                "    direction_w = generator(direction, map_latents=True).repeat(args.n_frames,1,1).cpu() # [1,18,512] to [n_frames,18,512]\n",
                "    \n",
                "    # 10 is the strength of the effect the vector will have, you can make it smaller or larger\n",
                "    vox_hi_onsets = args.vox_hi_onsets[:, None, None]\n",
                "    print(vox_hi_onsets.shape)\n",
                "    dist = 10 * vox_hi_onsets * direction_w \n",
                "\n",
                "    latents += dist\n",
                "\n",
                "    return latents\n",
                "\n",
                "\n",
                "def get_noise(height, width, scale, num_scales, args):\n",
                "    if width > 256:\n",
                "        return None\n",
                "\n",
                "    lo_onsets = args.drums_lo_onsets[:, None, None, None].cuda()\n",
                "    hi_onsets = args.drums_hi_onsets[:, None, None, None].cuda()\n",
                "\n",
                "    noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 5)\n",
                "    noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device=\"cuda\"), 128)\n",
                "\n",
                "    if width < 128:\n",
                "        noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise\n",
                "    if width > 32:\n",
                "        noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise\n",
                "\n",
                "    noise /= noise.std() * 2.5\n",
                "\n",
                "    return noise.cpu()\n",
                "\n",
                "def get_bends(args):\n",
                "\n",
                "    # transform = th.nn.Sequential(\n",
                "    #     th.nn.ReplicationPad2d((2, 2, 2, 2)), ar.AddNoise(0.025 * th.randn(size=(1, 1, 4, 8), device=\"cuda\")),\n",
                "    # )\n",
                "    # bends = [{\"layer\": 0, \"transform\": transform}]\n",
                "\n",
                "    # we'll use just the lo onset so its the kick drum isolated only\n",
                "    hi_onsets = args.drums_lo_onsets\n",
                "    lo_onsets = args.drums_lo_onsets\n",
                "    scaled_onsets = (lo_onsets * 0.75) + 1.0 # switch from range of (0.0, 1.0) to (1.0,1.75) \n",
                "    scaled_onsets += (hi_onsets * 0.75) # we'll make the snares a little more punchy\n",
                "\n",
                "    # don't worry about this too much but we want to apply this bend to a low layer of the stylegan model\n",
                "    # apply network bending to second layer in StyleGAN\n",
                "    # lower layer network bends have more fluid outcomes\n",
                "    tl = 4\n",
                "    h = 2 ** tl\n",
                "    w = h\n",
                "\n",
                "    translation = scaled_onsets.unsqueeze(1)\n",
                "    transform = lambda batch: partial(ar.Zoom, h=h, w=w)(batch)\n",
                "    bends = [{\"layer\": tl, \"transform\": transform, \"modulation\": translation}]  # add network bend to list dict\n",
                "    \n",
                "    return bends\n",
                "\n",
                "def get_rewrites(args):\n",
                "    rewrites = {}\n",
                "    return rewrites\n",
                "\n",
                "def get_truncation(args):    \n",
                "    # onsets range from 0.0 to 1.0, so lets map our truncation from 0.7 to 2.2\n",
                "    truncation = (args.bass_onsets * 1.5) + 0.7\n",
                "\n",
                "    return truncation\n"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "7uApB45okRLX"
            },
            "source": [
                "## Final Render\n",
                "Let’s do the final render. If you know this is your final video, I recommend using `--ffmpeg_preset veryslow`. This will take a little longer to render but will lead to better quality video and a smaller filesize."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "OwNYodxyPxj8"
            },
            "source": [
                "!python generate_audiovisual.py \\\n",
                "--ckpt \"/content/freagan.pt\" \\\n",
                "--audio_file \"/content/IKnowU_all.wav\" \\\n",
                "--audioreactive_file \"custom/custom.py\" \\\n",
                "--output_dir '/content/output' \\\n",
                "--output_file \"/content/output/add-feature-vectors.mp4\" \\\n",
                "--ffmpeg_preset veryslow \\\n",
                "--out_size 1024"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "S2FauqXXB0LU"
            },
            "source": [
                "### Example\n",
                "\n",
                "[Here’s the last step of our video.](https://drive.google.com/file/d/1BwHWovSi6V_MPe55JHOm8HVG6goT_XDz/view?usp=sharing)\n",
                "\n",
                "### Final Comparison\n",
                "\n",
                "If you want to see the difference from the default vidoe to the custom version you can [see the final output on my Vimeo channel](https://vimeo.com/560688589)."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "qpyxSWf1edJV"
            },
            "source": [
                "# Appendix\n",
                "\n",
                "Some tools not completely related to the audioreactive tool, but helpful for various components of it.\n",
                "\n",
                "## Appendix 1: Generating Seed Images for Testing\n",
                "\n",
                "It’s possible that a seed chosen in the ADA repo will generate the same image in this repo. But it’s probably safer to test them with this repo entirely. The code below will generate images based on seeds for you to use elsewhere in this notebook."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "Yr6ZssQhPIfh"
            },
            "source": [
                "# set seeds here\n",
                "seeds = range(0,200)\n",
                "truncation = 0.5\n",
                "size = 1024 # edit this if your model is smaller than 1024\n",
                "\n",
                "#you probably don't need to edit anything below here\n",
                "import os\n",
                "import torch as th\n",
                "from torchvision import utils\n",
                "from models.stylegan2 import Generator\n",
                "\n",
                "os.makedirs('/content/seeds',exist_ok=True)\n",
                "\n",
                "generator = Generator(\n",
                "    size, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',\n",
                ").cuda()\n",
                "\n",
                "if truncation < 1:\n",
                "    with th.no_grad():\n",
                "        mean_latent = generator.mean_latent(4096)\n",
                "else:\n",
                "    mean_latent = None\n",
                "\n",
                "with th.no_grad():\n",
                "    generator.eval()\n",
                "    for idx, i in enumerate(seeds):\n",
                "        th.manual_seed(i)\n",
                "        sample_z = th.randn((1,512), device=\"cuda\")\n",
                "\n",
                "        sample, _ = generator(\n",
                "            [sample_z], truncation=truncation, truncation_latent=mean_latent\n",
                "        )\n",
                "\n",
                "        utils.save_image(\n",
                "            sample,\n",
                "            f\"/content/seeds/{str(i).zfill(6)}.jpg\",\n",
                "            nrow=1,\n",
                "            normalize=True,\n",
                "            value_range=(-1, 1),\n",
                "        )"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "jZi-4R18uWA7"
            },
            "source": [
                "Best thing to do is probably zip up your images and download them."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "qEPXXT-cjOc1"
            },
            "source": [
                "!zip -r /content/seeds.zip /content/seeds/"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "_XqrznJquZXs"
            },
            "source": [
                "If you want to redo this with different seeds, run the last line to trash the previous folder."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "B4CLF3UMjrYW"
            },
            "source": [
                "!rm -r /content/seeds"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "BLF063f9PUxu"
            },
            "source": [
                "## Appendix 2: Generating Feature Vectors\n",
                "You can generate feature vectors directly from the rosinality repo. \n",
                "\n",
                "Below we’ll install it."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "qIyJhDSw2LRk"
            },
            "source": [
                "%cd /content/\n",
                "!git clone https://github.com/dvschultz/stylegan2-pytorch\n",
                "%cd /content/stylegan2-pytorch"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "5uAR0Q9Z4ehd"
            },
            "source": [
                "Now we can generate the feature vectors with the following script:"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "MuWUcWk32ZbM"
            },
            "source": [
                "!python closed_form_factorization.py /content/freagan.pt --out /content/freagan-factor.pt"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "3HO-JY0H3WwI"
            },
            "source": [
                "Now we can generate our feature vectors. Because features can operate differently on different images, I highly recommend you choose your seeds using the Appendix 1 before doing this step. And then use those seeds in the first variable below.\n",
                "\n",
                "Then set your feature vectors in `vecs`. You could choose a range of `range(0,511)` to get all of them, but be warned that’s going to be a lot of images to sift through."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "OUnIDrk22d1_"
            },
            "source": [
                "# set seeds, vectors, and truncation\n",
                "seeds = [176,44,53,60,76,92,140,13,21,7,10,42]\n",
                "vecs = range(0,50)\n",
                "truncation = 0.5\n",
                "model_file = '/content/freagan.pt'\n",
                "feature_file = '/content/freagan-factor.pt'\n",
                "\n",
                "# you probably don't need to edit anything below here\n",
                "import os\n",
                "import torch\n",
                "from torchvision import utils\n",
                "from model import Generator\n",
                "\n",
                "os.makedirs('/content/vectors',exist_ok=True)\n",
                "\n",
                "\n",
                "def line_interpolate(zs, steps):\n",
                "   out = []\n",
                "   for i in range(len(zs)-1):\n",
                "    for index in range(steps):\n",
                "     fraction = index/float(steps) \n",
                "     out.append(zs[i+1]*fraction + zs[i]*(1-fraction))\n",
                "   return out\n",
                "\n",
                "eigvec = torch.load(feature_file)[\"eigvec\"].to(\"cuda\")\n",
                "ckpt = torch.load(model_file)\n",
                "g = Generator(1024, 512, 8, channel_multiplier=2).to(\"cuda\")\n",
                "g.load_state_dict(ckpt[\"g_ema\"], strict=False)\n",
                "\n",
                "trunc = g.mean_latent(4096)\n",
                "\n",
                "for idx, i in enumerate(seeds):\n",
                "    torch.manual_seed(i)\n",
                "\n",
                "    latent = torch.randn(1, 512, device=\"cuda\")\n",
                "    latent = g.get_latent(latent)\n",
                "\n",
                "    for v_idx, v in enumerate(vecs):\n",
                "        direction = 10 * eigvec[:, v].unsqueeze(0)\n",
                "\n",
                "        img, _ = g(\n",
                "            [latent],\n",
                "            truncation=truncation,\n",
                "            truncation_latent=trunc,\n",
                "            input_is_latent=True,\n",
                "        )\n",
                "        img1, _ = g(\n",
                "            [latent + direction],\n",
                "            truncation=truncation,\n",
                "            truncation_latent=trunc,\n",
                "            input_is_latent=True,\n",
                "        )\n",
                "        img2, _ = g(\n",
                "            [latent - direction],\n",
                "            truncation=truncation,\n",
                "            truncation_latent=trunc,\n",
                "            input_is_latent=True,\n",
                "        )\n",
                "\n",
                "        grid = utils.save_image(\n",
                "            torch.cat([img1, img, img2], 0),\n",
                "            f\"/content/vectors/index_{v}-degree_10-seed_{i}.jpg\",\n",
                "            normalize=True,\n",
                "            value_range=(-1, 1),\n",
                "            # nrow=args.n_sample,\n",
                "        )\n"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "Yis1zmQV3Bnx"
            },
            "source": [
                "I recommend zipping and downloading these to look at on your desktop."
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "G59e2jXr2wzO"
            },
            "source": [
                "!zip -r /content/vectors.zip /content/vectors/"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "geNkKuaW3yFk"
            },
            "source": [
                "From here I recommend you look thru the images and find a single vector (`index_X`) that you like for all of the seeds. Lower indexes will make drastic modifications, while higher indexes will often exhibit much more subtle changes.\n",
                "\n",
                "If you want to return to the maua repo, make sure you `cd` back to it:"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "3S_FFOEn4aPH"
            },
            "source": [
                "%cd /content/maua-stylegan2/"
            ],
            "execution_count": null,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "utotVm8H288s"
            },
            "source": [
                "Cleanup if you want to run it again with different settings:"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "sz91fcD824n_"
            },
            "source": [
                "!rm -r /content/vectors\n",
                "!rm /content/vectors.zip"
            ],
            "execution_count": null,
            "outputs": []
        }
    ]
}
