# yohane

GitHub project: https://github.com/Japan7/yohane

---

Click the badge below to open the latest version of the notebook:

<a target="_blank" href="https://colab.research.google.com/github/Japan7/yohane/blob/main/notebook/yohane.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

---

**⚠️ Before proceeding, change your runtime type to T4 GPU: Toolbar > Runtime > Change runtime type > T4 GPU**
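
To confirm that the runtime change took effect, a quick check along these lines can help. This is a minimal sketch, not part of yohane: `gpu_names` is a hypothetical helper that shells out to `nvidia-smi` and returns an empty list when no GPU is visible.

```python
import shutil
import subprocess


def gpu_names() -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if none is visible."""
    # Hypothetical helper: without nvidia-smi on PATH, assume no GPU runtime.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
print(names if names else "No GPU detected: change the runtime type first.")
```

On a T4 runtime the printed list should contain a name mentioning T4.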


## Install

Execute the next cells to install yohane.
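
For reference, the install cell resolves the newest release tag with `git ls-remote`, sorted by version. The sketch below mirrors that selection in Python on made-up sample output instead of the network. The hashes, tag names, and helper names are assumptions for illustration, and skipping peeled `^{}` refs is an addition that the shell pipeline does not make.

```python
import re

# Sample `git ls-remote --tags` output (made-up hashes and tags).
SAMPLE = """\
1111111\trefs/tags/v0.2.0
2222222\trefs/tags/v0.10.0
3333333\trefs/tags/v0.9.1
"""


def version_key(tag: str) -> tuple[int, ...]:
    """Compare tags by their numeric components, so v0.10.0 > v0.9.1."""
    return tuple(int(n) for n in re.findall(r"\d+", tag))


def latest_tag(ls_remote_output: str) -> str:
    tags = [
        line.split("/")[2]  # third '/'-separated field, like `cut --fields=3`
        for line in ls_remote_output.splitlines()
        if "refs/tags/" in line and not line.endswith("^{}")
    ]
    return max(tags, key=version_key)


print(latest_tag(SAMPLE))  # v0.10.0
```

A plain string sort would rank `v0.9.1` above `v0.10.0`, which is why a version-aware sort matters here.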


In [None]:
!python3 --version

In [None]:
%%bash
REPO_URL=https://github.com/Japan7/yohane.git
LATEST_TAG=$(git ls-remote --tags --sort -v:refname $REPO_URL | head -n1 | cut --delimiter='/' --fields=3)

pip3 install uv
uv pip install --system git+$REPO_URL@$LATEST_TAG

**⚠️ Restart your runtime to apply the torch downgrade (or `VocalRemoverSeparator` will not work): Toolbar > Runtime > Restart session**
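
After restarting, you can check which versions are actually installed. This is a minimal sketch using only the standard library: `installed_version` is a hypothetical helper, and the package list is an assumption about what is worth checking, not something yohane documents.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumed list of interesting packages; adjust as needed.
for name in ("torch", "torchaudio", "yohane"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```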

In [None]:
!pip3 show yohane

## Parameters

The next cells will set the parameters for the yohane pipeline.

You can either upload your own song file (_Song (upload)_ cell) or download it with yt-dlp (_Song (yt-dlp)_ cell).


In [None]:
# @title Song (upload)
# @markdown Run this cell to **upload your song** using the form below.
#
# @markdown Accepted formats: audio or video files.
#
# @markdown **Note**: If the upload fails, try using a different browser or upload the file manually in the Files section on the left sidebar.

from google.colab import files

# files.upload() returns a dict keyed by filename; keep the first entry.
upload = files.upload()
for file in upload:
    song_filename = file
    break

In [None]:
# @markdown **If you uploaded the file manually, enter the song filename here.**

song_filename_override = "" # @param {type:"string"}
song_filename = song_filename_override or song_filename

In [None]:
# @title Song (yt-dlp)
# @markdown **Enter** the remote URL below, **then execute** this cell to download the song.

!uv pip install -q --system "yt-dlp[default]"

from yt_dlp import YoutubeDL

song_url = "" # @param {type:"string"}

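# Prefer formats up to 1080p with H.264 video and AAC audio.
with YoutubeDL({"format_sort": ["res:1080", "vcodec:h264", "acodec:aac"]}) as ydl:
    # extract_info() also downloads the media (download=True is the default).
    info = ydl.extract_info(song_url)
    # Local path yt-dlp derives for the downloaded file.
    song_filename = ydl.prepare_filename(info)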

In [None]:
# @title Lyrics { display-mode: "form", run: "auto" }
# @markdown Run this cell, then **paste your lyrics** in the box below.

from IPython.display import display
from ipywidgets import Layout, Textarea

lyrics_area = Textarea(layout=Layout(width="100%", height="200px"))
display(lyrics_area)

In [None]:
# @title Source Separator { display-mode: "form", run: "auto" }
# @markdown Run this cell and select a **Source Separator**:
# @markdown - **VocalRemoverSeparator**: Based on the [`vocal-remover`](https://github.com/tsurumeso/vocal-remover) library. Choose this if you're unsure.
# @markdown - **HybridDemucsSeparator**: Uses `torchaudio`'s [Hybrid Demucs model](https://pytorch.org/audio/2.1.0/tutorials/hybrid_demucs_tutorial.html), which is faster but less aggressive.
# @markdown - **None**: Skips the vocal extraction step if it's not needed.

from yohane.audio import VocalRemoverSeparator, HybridDemucsSeparator

separator_class = VocalRemoverSeparator # @param ["VocalRemoverSeparator", "HybridDemucsSeparator", "None"] {type:"raw"}
separator = separator_class() if separator_class is not None else None

## Run

When ready, execute the next cells to run the pipeline.
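
Before starting the pipeline, it can save time to confirm that the inputs are in place. The sketch below is a hypothetical pre-flight check and not part of yohane. It assumes the song is identified by a filename string and the lyrics by a plain string, as in the parameter cells above.

```python
from pathlib import Path


def preflight(song_filename: str, lyrics: str) -> list[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if not song_filename:
        problems.append("no song selected")
    elif not Path(song_filename).is_file():
        problems.append(f"song file not found: {song_filename}")
    if not lyrics.strip():
        problems.append("lyrics are empty")
    return problems


# Placeholder values; in the notebook you would pass
# song_filename and lyrics_area.value instead.
print(preflight("missing-song.mp4", "   "))
```

An empty list means both inputs look usable.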


In [None]:
# @title Generate

import logging
from pathlib import Path
from yohane import Yohane

logging.basicConfig(level="INFO", force=True)

song_path = Path(song_filename)

yohane = Yohane(separator)
yohane.load_song(song_path)
yohane.load_lyrics(lyrics_area.value)
yohane.extract_vocals()
yohane.force_align()
subs = yohane.make_subs()

In [None]:
# @title Save and download karaoke file

from google.colab import files

subs_path = song_path.with_suffix('.ass')
subs.save(subs_path)
files.download(subs_path)

In [None]:
# @title Download song file

files.download(song_path)

The karaoke file should download automatically. If it does not, open Files in the left sidebar and look for the `.ass` file.

**Next recommended steps in Aegisub:**

1. Load the .ass and the video
2. Replace the _Default_ style with your own
3. Normalization lowercases the lines and strips special characters: use the original lines, kept as comments, to fix the timed lines
4. Subtitle > Select Lines… > check _Comments_ and _Set selection_ > OK, then delete the selected lines
5. Listen to each line and fix its End time
6. Add a 1s karaoke lead-in to every line
7. Iterate over each line in karaoke mode and merge/fix syllable timings
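
To illustrate what the lead-in in step 6 does to a line, here is a minimal sketch. It is independent of Aegisub and yohane: the `Line` class and `add_lead_in` helper are hypothetical stand-ins. The only format fact it relies on is that ASS `\k` durations are expressed in centiseconds.

```python
from dataclasses import dataclass

LEAD_IN_MS = 1000  # 1 s lead-in, as recommended in step 6


@dataclass
class Line:
    """Hypothetical stand-in for one timed karaoke line."""

    start_ms: int
    end_ms: int
    text: str  # ASS text with {\k<centiseconds>} syllable tags


def add_lead_in(line: Line, lead_in_ms: int = LEAD_IN_MS) -> Line:
    # Never move the start before 0; shorten the lead-in instead.
    shift_ms = min(lead_in_ms, line.start_ms)
    # Prepend an empty syllable so the original timings stay aligned.
    lead_tag = f"{{\\k{shift_ms // 10}}}"
    return Line(line.start_ms - shift_ms, line.end_ms, lead_tag + line.text)


line = Line(5000, 8000, r"{\k50}ko{\k30}n{\k40}ni{\k60}chi{\k50}wa")
print(add_lead_in(line))
```

The line starts one second earlier, and the extra `{\k100}` tag absorbs that second so the first syllable still highlights at its original time.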

**Happy editing!**

<img src="https://github.com/user-attachments/assets/614cd8ca-d471-447c-8596-4ac800d690cf" width="25%" >
