
2. Editing the audio and transcript

Davide edited this page Dec 9, 2025 · 1 revision

Since I had more text lines than audio lines, I needed to find a way to delete all the lines that I did not have in my audio footage.

1. Strategy

I still had hacky ideas. For example, I could transcribe all my audio footage and cross-reference it with my transcript, removing the lines that were not present. But that would likely still be flawed, suboptimal and ultimately unreliable. So, for once, I did the job by hand.

Yes, not super intelligent of me, but I wanted to make sure that I would have consistent, high-quality data.

2. Finding dialogues in the audio

I had over 9 hours of footage and over 2k lines of dialogue. Listening to all that footage carefully and removing the missing lines from my transcript was going to be at least a 9-hour job, but likely much more.

So, was there a shortcut?

After just 1 hour, I started to realize that the audio waveform was much louder during fights, much quieter during world exploration, and somewhere in between during dialogues. So I focused on that in-between level, and I was quickly able to identify a dialogue just by looking at the waveform, without having to listen to it all.
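I did this by eye in an audio editor, but the same idea can be sketched in code. The snippet below is only an illustration, not something I used: it computes the RMS loudness of fixed-size windows and reports the runs of windows that fall inside a "dialogue" loudness band. The function names, the thresholds and the toy signal are all assumptions made up for the example; real footage would need thresholds tuned by hand.

```python
import math


def window_rms(samples, window_size):
    """Return the RMS loudness of each consecutive window of samples."""
    rms = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        rms.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return rms


def find_dialogue_windows(rms_values, low, high):
    """Return (first, last) window-index pairs whose RMS stays within [low, high]."""
    spans = []
    start = None
    for i, value in enumerate(rms_values):
        in_band = low <= value <= high
        if in_band and start is None:
            start = i  # a new in-band run begins here
        elif not in_band and start is not None:
            spans.append((start, i - 1))  # the run ended on the previous window
            start = None
    if start is not None:
        spans.append((start, len(rms_values) - 1))  # run reaches the end
    return spans


# Toy signal (made up): quiet exploration, mid-level dialogue, loud fight.
samples = [0.01] * 4 + [0.3] * 4 + [0.9] * 4
rms = window_rms(samples, window_size=2)

# Hypothetical thresholds for the "in-between" dialogue band.
print(find_dialogue_windows(rms, low=0.1, high=0.5))
```

A band like this would still produce false positives (for example, a quiet fight or loud ambient music), so it could only narrow down where to look, not replace checking the result.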

In doing so, I also cut away a lot of dialogue-less audio, which saved not only disk space, but also time.

So, it was definitely not quick, but it took less time than I had feared.

3. Deleting excess lines from the transcript

Now that I had found a way to identify the dialogues within the audio waveform, I needed to come up with an automated solution to delete all the "extra" lines in the transcript.

The solution was to identify a starting line and an end line, then all the lines in between would get deleted automatically. Naturally, this would need to be done more than once per chapter.

Thus, I manually wrote a fairly simple JSON file with all the line ranges to delete, for each chapter.

This file would then be fed to a script that would loop over each chapter and delete every range of rows as it was specified.
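As a minimal sketch of that loop: the JSON layout below (chapter name mapped to a list of `[start, end]` pairs), the chapter name and the `delete_ranges` helper are assumptions for illustration, not the actual file format or script. It also assumes the ranges are 1-based and inclusive of both the starting and the end line.

```python
import json


def delete_ranges(lines, ranges):
    """Return lines without the 1-based, inclusive [start, end] ranges."""
    doomed = set()
    for start, end in ranges:
        doomed.update(range(start, end + 1))
    # Filter against the original numbering, so earlier deletions
    # never shift the line numbers of later ranges.
    return [line for number, line in enumerate(lines, start=1)
            if number not in doomed]


# Hypothetical layout: chapter name -> list of [start, end] line ranges.
ranges_json = '{"chapter_01": [[2, 3], [6, 6]]}'
ranges_by_chapter = json.loads(ranges_json)

# Toy transcript standing in for a real chapter.
transcripts = {"chapter_01": ["a", "b", "c", "d", "e", "f"]}

cleaned = {
    chapter: delete_ranges(transcripts[chapter], ranges)
    for chapter, ranges in ranges_by_chapter.items()
}
print(cleaned["chapter_01"])  # ['a', 'd', 'e']
```

Collecting all doomed line numbers first means every range can be written against the untouched transcript, which makes the JSON file much easier to fill in by hand.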

4. Removing narrator and prefixing gibberish

After deleting all the lines that weren't in my gameplay, I could only think of a couple more data preparation steps:

  • delete the narrator lines automatically, since they are not voiced
  • prefix all the lines from characters that speak in gibberish, i.e. an unintelligible language that the LLM would not be able to associate with natural language

These two steps are enabled by default, but they can be disabled through the command line arguments of the main.py script.
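The two steps could look roughly like the sketch below. Everything specific in it is hypothetical: the `Narrator` label, the gibberish speaker name, the prefix text, the `(speaker, text)` line format and the two flag names are placeholders, and the real main.py arguments may be named differently.

```python
import argparse

NARRATOR = "Narrator"              # hypothetical speaker label
GIBBERISH_SPEAKERS = {"Goblin"}    # hypothetical gibberish-speaking character
GIBBERISH_PREFIX = "[gibberish] "  # hypothetical prefix text


def prepare(lines, drop_narrator=True, prefix_gibberish=True):
    """Clean a list of (speaker, text) tuples; both steps are on by default."""
    result = []
    for speaker, text in lines:
        if drop_narrator and speaker == NARRATOR:
            continue  # narrator lines are not voiced, so skip them
        if prefix_gibberish and speaker in GIBBERISH_SPEAKERS:
            text = GIBBERISH_PREFIX + text
        result.append((speaker, text))
    return result


def build_parser():
    """Build a parser with opt-out flags (flag names are assumptions)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--keep-narrator", action="store_true",
                        help="do not delete narrator lines")
    parser.add_argument("--no-gibberish-prefix", action="store_true",
                        help="do not prefix gibberish lines")
    return parser


# An empty argument list simulates running with the defaults.
args = build_parser().parse_args([])
lines = [("Narrator", "Night fell."), ("Hero", "Who goes there?"),
         ("Goblin", "Grak!")]
print(prepare(lines,
              drop_narrator=not args.keep_narrator,
              prefix_gibberish=not args.no_gibberish_prefix))
```

Using opt-out flags with `store_true` keeps both steps active when no arguments are passed, which matches the "enabled by default" behaviour described above.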
