Jon Evans edited this page Jul 13, 2022 · 8 revisions

Welcome to the UCS Voice Naming Tool wiki!

On this Wiki, you'll see posts about the project status, challenges, research, and much more.

Check out the Dev Logs to see the most recent status of the tool, or browse through the Source Code.

Speech-to-text technology leaves plenty of room for errors, but I hope this tool can provide rough labeling of files and/or metadata before the editing process.

Goal

Provide audio recordists and sound designers with a tool that speeds up the labeling of recorded files by analyzing the voice slate, parsing known terms, cross-referencing those terms against the Universal Category System (UCS), and labeling files according to the UCS standard.
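As a rough sketch of that flow (not the tool's actual code), the example below takes a slate transcript that has already been produced by some speech-to-text engine, looks for a known keyword, and builds a file name. The keyword table, the CatID values, the function names, and the `CatID_FXName_CreatorID_SourceID` layout are all assumptions made for illustration; check them against the official UCS documentation before relying on them.

```python
# Placeholder category data for illustration only; not copied from the official UCS list.
# Maps a spoken keyword to a (hypothetical) CatID.
SAMPLE_CAT_IDS = {
    "door": "DOORWood",
    "rain": "RAIN",
}


def find_cat_id(transcript, cat_ids):
    """Return the CatID for the first known keyword in the transcript, or None."""
    for word in transcript.lower().split():
        if word in cat_ids:
            return cat_ids[word]
    return None


def build_filename(transcript, cat_ids, creator_id, source_id, ext=".wav"):
    """Build a rough UCS-style name, or return None if no category keyword was heard."""
    cat_id = find_cat_id(transcript, cat_ids)
    if cat_id is None:
        return None
    # Assumed layout: CatID_FXName_CreatorID_SourceID
    fx_name = transcript.strip().title()
    return f"{cat_id}_{fx_name}_{creator_id}_{source_id}{ext}"


print(build_filename("wooden door slam", SAMPLE_CAT_IDS, "JE", "FieldRec"))
# prints: DOORWood_Wooden Door Slam_JE_FieldRec.wav
```

A transcript with no recognized keyword returns `None`, which is the kind of case that would need flagging for manual input.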

What this tool DOES

This tool provides ROUGH naming of files, and some metadata, with the use of proper, descriptive, clearly spoken slate recordings.

What this tool DOES NOT do

It cannot perfectly name all files and metadata. There will most likely be errors in the naming. This is due to a multitude of factors including speech-to-text engines selecting phonetically similar words, microphone handling noise, ambient environmental noise, mumbling, mispronunciations, false starts in the slate, off-axis voice recordings, and many more.

How some of these issues will be... lessened

We will attempt to add features in the future that help clear up some of the errors mentioned above, particularly by:

  • Adding conflict flagging, so the user is notified of issues that arise during analysis and prompted to resolve them.
  • Adding conflict resolution, so flagged conflicts can be resolved manually or, depending on settings, auto-resolved.
  • Adding a 'Phonetic Equivalent Terms' feature to conflict resolution, so the different ways the speech-to-text engine interprets a word can be auto-resolved via pre-set 'phonetics and term' pairs (e.g. the label 'Rode NTG-2' could be interpreted as 'road n t g too', 'wrote in tea G two', 'rod an tee gee to' and many others).
  • Adding default-case categorizing, so a sound whose category and subcategory are not found goes into the MISC subcategory. If there is no MISC subcategory, the sound is flagged and the user is prompted for manual input.
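To make the last two ideas more concrete, here is a minimal sketch, assuming the 'phonetics and term' pairs live in a simple dictionary and the known categories in a mapping of category to subcategories. The names `PHONETIC_EQUIVALENTS`, `apply_equivalents`, and `categorize` are hypothetical, and the fallback rule is only one possible reading of the default case described above.

```python
import re

# Hypothetical 'phonetics and term' pairs, using the misheard variants from the list above.
PHONETIC_EQUIVALENTS = {
    "road n t g too": "Rode NTG-2",
    "wrote in tea g two": "Rode NTG-2",
    "rod an tee gee to": "Rode NTG-2",
}


def apply_equivalents(transcript, equivalents):
    """Replace known misheard phrases in a transcript with their intended term."""
    text = transcript.lower()
    # Try longer variants first so a short variant cannot clobber part of a longer one.
    for variant in sorted(equivalents, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        # A function replacement avoids any backslash handling in the term itself.
        text = re.sub(pattern, lambda _m, term=equivalents[variant]: term, text)
    return text


def categorize(category, subcategory, known):
    """Return (category, subcategory, flagged).

    known maps each category to the set of its subcategories. An unknown
    subcategory falls back to MISC when the category has one; otherwise the
    sound is flagged as needing manual input.
    """
    subs = known.get(category, set())
    if subcategory in subs:
        return category, subcategory, False
    if "MISC" in subs:
        return category, "MISC", False
    return category, subcategory, True


# Placeholder category data for illustration only.
known = {"DOORS": {"WOOD", "MISC"}, "RAIN": {"GENERAL"}}
print(apply_equivalents("recorded with a Road N T G too", PHONETIC_EQUIVALENTS))
print(categorize("DOORS", "GLASS", known))
print(categorize("RAIN", "HEAVY", known))
```

Exact phrase matching like this only covers variants that have been seen before; anything else would still need to go through conflict flagging.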

Other Known Issues to Overcome

  • Speed of voice analysis. Analyzing files can be time-consuming.
  • Cross-platform support for both Mac and Windows
  • Multi-language support

About

This project is starting out of pure laziness when naming. As a sound designer, I find myself wanting to quickly record a few things and get to designing! I don't want to name, edit and add sounds to my library when I'm in the designing mode. I just want to record, and create. I like to leave the naming and editing for when I have set time to do so.

But working this way has its downsides. Things don't get named! They become unusable, forever lost in some session, labeled as '12-FX 1-220713_1202.wav'.

It gets difficult when a sound has been recorded and used in a project/session, and I then have to backtrack through all my sessions to find the raw recording because I didn't take the time to label it and add it to my library editing folder (a folder of sounds that need to be cleaned up before they are added to the completed library).

I hope that we can reach a point where we no longer have to do any heavy labeling. Hopefully, with this tool, in conjunction with the other great UCS tools people have made and library software (e.g. SoundMiner), we can do some serious naming.

-Jon