Added support for WavTokenizer and SimVQ! Both are single-level codecs that share the same architecture but differ in their VQ strategy. WavTokenizer comes in 40 Hz and 75 Hz variants, each with a vocabulary size of 4096. The SimVQ variants all run at a 75 Hz frame rate, with vocabulary sizes ranging from 4096 to 262144 codes. SimVQ also features a causal encoder and a partially causal decoder, making it suitable for streaming use cases.
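To put the frame rates and vocabulary sizes above in perspective, here is a rough sketch of the token rate and nominal bitrate they imply. It assumes a single codebook emits one code per frame, so each frame carries log2(vocabulary size) bits. The helper name `token_bitrate` is made up for this illustration and is not part of codec_bpe, and only the smallest and largest SimVQ vocabulary sizes are shown.

```python
import math


def token_bitrate(frame_rate_hz: float, vocab_size: int) -> float:
    """Nominal bits per second for a single-codebook codec.

    Assumes one code per frame, each carrying log2(vocab_size) bits.
    """
    return frame_rate_hz * math.log2(vocab_size)


# (frame rate in Hz, vocabulary size) pairs taken from the notes above.
configs = {
    "WavTokenizer, 40 Hz, 4096 codes": (40, 4096),
    "WavTokenizer, 75 Hz, 4096 codes": (75, 4096),
    "SimVQ, 75 Hz, 4096 codes": (75, 4096),
    "SimVQ, 75 Hz, 262144 codes": (75, 262144),
}

for name, (rate, vocab) in configs.items():
    bits_per_code = math.log2(vocab)
    print(
        f"{name}: {rate} tokens/s, "
        f"{bits_per_code:.0f} bits/code, "
        f"{token_bitrate(rate, vocab):.0f} bits/s"
    )
```

Because these are single-level codecs, one second of audio becomes only 40 or 75 tokens, which keeps sequences short for downstream language modeling; a larger vocabulary adds bits per token without lengthening the sequence.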
Use --codec_model WavTokenizer-large-320-24k-4096 (or any other name from the Model column in this table) with codec_bpe.audio_to_codes to encode audio using WavTokenizer.
Use --codec_model simvq_4k (or any other name from the Model column in this table) with codec_bpe.audio_to_codes to encode audio using SimVQ.