A pipeline library for batch encode → decode round-trips through neural audio codec models. Feed it a folder of WAV files, pick a codec, get reconstructed WAVs — useful for building codec-distorted datasets, evaluating codec quality, or preprocessing audio for TTS/ASR training.
| ID | Name | Sample Rate | Install | Output |
|---|---|---|---|---|
| 1 | snac_24khz | 24 kHz | `pip install snac` | mono |
| 2 | snac_32khz | 32 kHz | `pip install snac` | mono |
| 3 | snac_44khz | 44 kHz | `pip install snac` | mono |
| 4 | dac_16khz | 16 kHz | `pip install descript-audio-codec` | mono |
| 5 | dac_24khz | 24 kHz | `pip install descript-audio-codec` | mono |
| 6 | dac_44khz | 44 kHz | `pip install descript-audio-codec` | mono |
| 7 | encodec_24khz | 24 kHz | `pip install transformers encodec` | mono |
| 8 | encodec_48khz | 48 kHz | `pip install transformers encodec` | stereo |
| 9 | soundstream_16khz | 16 kHz | `pip install soundstream` | mono |
| 10 | speechtokenizer | 16 kHz | pip + manual checkpoint | mono |
| — | FunCodec | 16 kHz | external repo | — |
| — | AudioDec | 24 / 48 kHz | external repo | — |
⚠️ SoundStream (`soundstream==0.0.1`) pins `numpy<2.0` and `huggingface-hub<0.16`. After installing it, run `pip install --upgrade huggingface-hub` to keep EnCodec working. For a fully clean setup, use a dedicated virtual environment for SoundStream.
All model weights download automatically from HuggingFace on first use (except SpeechTokenizer — see below).
```
Neural-Codecs/
├── audio_codec/
│   ├── config.py            ← AUTO_INSTALL_DEPS flag lives here
│   ├── registry.py          ← codec metadata (packages, import checks, hub names)
│   ├── installer.py         ← dep-check, auto-install, setup commands
│   ├── cli.py               ← neural-codec CLI entry point
│   └── codecs/
│       ├── snac.py
│       ├── dac.py
│       ├── encodec24.py
│       ├── encodec48.py
│       ├── soundstream.py
│       └── speechtokenizer.py
├── requirements/
│   ├── base.txt             ← torch, torchaudio, soundfile, numpy, tqdm
│   ├── snac.txt             ← IDs 1–3
│   ├── dac.txt              ← IDs 4–6
│   ├── encodec.txt          ← IDs 7–8
│   ├── soundstream.txt      ← ID 9
│   └── speechtokenizer.txt  ← ID 10
├── config/
│   └── config.json          ← SpeechTokenizer model config
├── checkpoints/             ← place SpeechTokenizer.pt here
├── audio_sample/            ← put your input WAV files here
└── pyproject.toml
```
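To illustrate the registry idea, codec metadata is keyed by string ID and resolvable by name. The field names below are assumptions for the sketch (the real schema lives in `audio_codec/registry.py`):

```python
# Hypothetical sketch of the registry shape; actual field names may differ.
CODEC_REGISTRY = {
    "1": {"name": "snac_24khz", "sample_rate": 24000,
          "packages": ["snac"], "import_check": "snac"},
    "7": {"name": "encodec_24khz", "sample_rate": 24000,
          "packages": ["transformers", "encodec"], "import_check": "transformers"},
}

def find_codec(name_or_id: str) -> dict:
    """Resolve a codec entry by numeric ID string ("7") or by name ("encodec_24khz")."""
    if name_or_id in CODEC_REGISTRY:
        return CODEC_REGISTRY[name_or_id]
    return next(c for c in CODEC_REGISTRY.values() if c["name"] == name_or_id)
```

This dual lookup is what lets every CLI and API call accept either `--codec 7` or `--codec encodec_24khz`.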
```
git clone https://github.com/CodeVault-girish/NeuralCodecDecoder.git
cd NeuralCodecDecoder
pip install -e .
```

This registers the `neural-codec` CLI command. Base dependencies (torch, torchaudio, soundfile, tqdm) are installed automatically. Per-codec packages are installed on demand.
Missing codec dependencies are installed automatically the first time you run a codec.
Controlled by one flag in `audio_codec/config.py`:

```python
# audio_codec/config.py
AUTO_INSTALL_DEPS = True   # auto-install missing packages before decoding (default)
AUTO_INSTALL_DEPS = False  # print the install command and exit instead
```

| Value | Behaviour |
|---|---|
| `True` | First `decode_folder()` call installs any missing packages, then runs. No manual setup needed. |
| `False` | Prints the missing packages and the exact `pip install` / `neural-codec setup` command, then exits cleanly. |
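A minimal sketch of how such a gate can work. This is illustrative only (the real logic lives in `audio_codec/installer.py` and may differ):

```python
import importlib.util
import subprocess
import sys

AUTO_INSTALL_DEPS = True  # mirrors the flag in audio_codec/config.py

def ensure_packages(packages: dict, auto_install: bool = AUTO_INSTALL_DEPS) -> list:
    """Check and optionally install deps.

    `packages` maps pip package names to their importable module names,
    e.g. {"descript-audio-codec": "dac"}. Returns the missing pip names.
    """
    missing = [pip_name for pip_name, module in packages.items()
               if importlib.util.find_spec(module) is None]
    if missing and auto_install:
        # install everything that failed the import check, then continue
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    elif missing:
        print("Missing packages. Run: pip install " + " ".join(missing))
    return missing
```

The import check is side-effect free, so `neural-codec list` can report install status without triggering any installs.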
```
# 1. See all codecs with live install status
neural-codec list

# 2. Decode a folder — auto-installs deps on first run (AUTO_INSTALL_DEPS=True)
neural-codec decode --codec snac_24khz --input ./audio_sample --output ./out

# 3. Use a different codec
neural-codec decode --codec dac_16khz --input ./audio_sample --output ./out
neural-codec decode --codec encodec_24khz --input ./audio_sample --output ./out

# 4. Use GPU
neural-codec decode --codec snac_24khz --input ./audio_sample --output ./out --device cuda

# 5. Use codec by numeric ID instead of name
neural-codec decode --codec 7 --input ./audio_sample --output ./out

# 6. Pre-install deps without decoding
neural-codec setup --codec snac_24khz
neural-codec setup --all
```

Shows every codec — ID, name, sample rate, install status, and required packages. Also shows whether `AUTO_INSTALL_DEPS` is currently enabled.

```
neural-codec list
```
Install dependencies for a codec without running it.

```
neural-codec setup --codec snac_24khz   # install by name
neural-codec setup --codec 1            # install by ID
neural-codec setup --all                # install all pip-installable codecs

# External codecs — prints manual setup steps
neural-codec setup --codec funcodec
neural-codec setup --codec audiodec
```

Batch encode/decode all WAV files in a folder (recursive).
```
neural-codec decode --codec <NAME_OR_ID> --input <DIR> --output <DIR> [--device cpu|cuda]
```

| Flag | Required | Description |
|---|---|---|
| `--codec` | yes | Codec name (`snac_24khz`) or numeric ID (`1`) |
| `--input` | yes | Folder with `.wav` files (searched recursively) |
| `--output` | yes | Folder where decoded files are written |
| `--device` | no | `cpu` (default) or `cuda` |

Output files are named `<original_stem>_<codec_name>.wav`.
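For scripting around the outputs, the naming rule is easy to reproduce. A small illustrative helper (not part of the library):

```python
from pathlib import Path

def output_path(input_wav: str, output_dir: str, codec_name: str) -> Path:
    """Mirror the rule <original_stem>_<codec_name>.wav for a decoded file."""
    stem = Path(input_wav).stem  # filename without extension
    return Path(output_dir) / f"{stem}_{codec_name}.wav"

# e.g. output_path("audio_sample/a/clip.wav", "out", "snac_24khz")
# yields out/clip_snac_24khz.wav
```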
```python
from audio_codec import decode_folder, decoder_list, setup_codec, setup_all

# show all codecs and their install status
decoder_list()

# decode a folder (auto-installs deps if AUTO_INSTALL_DEPS=True)
decode_folder("1", "audio_sample/", "out/", "cpu")            # by ID
decode_folder("snac_24khz", "audio_sample/", "out/", "cuda")  # by name

# explicitly install before decoding
setup_codec("snac_24khz")
setup_all()

# check if a codec's deps are satisfied (no side effects)
from audio_codec import deps_satisfied
from audio_codec.registry import CODEC_REGISTRY
print(deps_satisfied(CODEC_REGISTRY["7"]))  # True / False
```

Models: snac_24khz · snac_32khz · snac_44khz
Weights: auto-download from HuggingFace (~30–80 MB each)
```
# requirements file
pip install -r requirements/snac.txt

# or via CLI
neural-codec setup --codec snac_24khz
```

| Package | Notes |
|---|---|
| `snac` | SNAC model |
| `torchaudio` | resampling |
| `soundfile` | WAV I/O |

```
neural-codec decode --codec snac_24khz --input ./wavs --output ./out
neural-codec decode --codec snac_32khz --input ./wavs --output ./out
neural-codec decode --codec snac_44khz --input ./wavs --output ./out
```

Models: dac_16khz · dac_24khz · dac_44khz
Weights: auto-download via dac.utils.download() (~75 MB each)
```
pip install -r requirements/dac.txt
neural-codec setup --codec dac_16khz
```

| Package | Notes |
|---|---|
| `descript-audio-codec` | DAC model + bundles audiotools |
| `soundfile` | WAV I/O |

```
neural-codec decode --codec dac_16khz --input ./wavs --output ./out
neural-codec decode --codec dac_24khz --input ./wavs --output ./out
neural-codec decode --codec dac_44khz --input ./wavs --output ./out
```

Models: encodec_24khz (mono) · encodec_48khz (stereo)
Weights: auto-download from HuggingFace
```
pip install -r requirements/encodec.txt
neural-codec setup --codec encodec_24khz
```

| Package | Notes |
|---|---|
| `transformers` | HuggingFace model loader |
| `encodec` | EnCodec core |
| `soundfile` | WAV I/O |

```
neural-codec decode --codec encodec_24khz --input ./wavs --output ./out  # mono, 24 kHz
neural-codec decode --codec encodec_48khz --input ./wavs --output ./out  # stereo, 48 kHz
```

Model: soundstream_16khz
Weights: auto-download from HuggingFace (naturalspeech2.pt, ~143 MB)
```
pip install -r requirements/soundstream.txt

# After installing, restore a newer huggingface-hub so EnCodec still works:
pip install --upgrade huggingface-hub

neural-codec setup --codec soundstream_16khz
```

| Package | Version | Notes |
|---|---|---|
| `soundstream` | 0.0.1 | pins `numpy<2.0` and `huggingface-hub<0.16` |
| `soundfile` | latest | WAV I/O |

Dependency conflict with EnCodec:
`soundstream==0.0.1` forces `huggingface-hub<0.16`, which breaks `transformers`. Fix: after installing soundstream, run `pip install --upgrade huggingface-hub`. Both codecs then work in the same environment. For a fully isolated setup, use a dedicated virtual environment:
```
python -m venv venv_soundstream
venv_soundstream\Scripts\activate     # Windows
source venv_soundstream/bin/activate  # Linux / Mac
pip install -r requirements/soundstream.txt

neural-codec decode --codec soundstream_16khz --input ./wavs --output ./out
```

Model: speechtokenizer (16 kHz)
Weights: manual download required
```
pip install -r requirements/speechtokenizer.txt
```

| Package | Notes |
|---|---|
| `speechtokenizer` | model loader |
| `beartype` | required runtime dependency of speechtokenizer |
| `soundfile` | WAV I/O |
The pip packages alone are not enough — you must also download the checkpoint manually.
Step 1 — Install packages:

```
neural-codec setup --codec speechtokenizer
```

Step 2 — Download both files from HuggingFace and place them at these exact paths:

```
Neural-Codecs/
  checkpoints/
    SpeechTokenizer.pt   ← download from HuggingFace
  config/
    config.json          ← download from HuggingFace
```

Step 3 — Decode:

```
neural-codec decode --codec speechtokenizer --input ./wavs --output ./out
```

If the checkpoint is missing, the CLI prints the exact download URL and expected path — no silent failures.
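The fail-fast check is simple to replicate in your own scripts. A sketch (the library's actual error message and URLs come from its own code):

```python
from pathlib import Path

def missing_speechtokenizer_files(root: str = ".") -> list:
    """Return the required SpeechTokenizer files that are not yet in place."""
    required = [
        Path(root, "checkpoints", "SpeechTokenizer.pt"),
        Path(root, "config", "config.json"),
    ]
    return [p for p in required if not p.exists()]

# e.g. abort early in a batch script:
# if missing_speechtokenizer_files():
#     raise SystemExit("Download the SpeechTokenizer checkpoint first")
```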
Requires its own virtual environment due to dependency conflicts.
```
# Print full step-by-step instructions
neural-codec setup --codec funcodec
```

Manual summary:

```
python -m venv funcodec
funcodec\Scripts\activate      # Windows
source funcodec/bin/activate   # Linux / Mac

git clone https://github.com/alibaba-damo-academy/FunCodec.git
cd FunCodec && pip install -e .
pip install torch torchaudio numpy soundfile

cd egs/LibriTTS/codec && mkdir -p exp
git lfs install
git clone https://huggingface.co/alibaba-damo/audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch \
  exp/audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch
```

Build the input list:

```
find /path/to/wavs -name "*.wav" \
  | awk -F/ '{printf "%s %s\n", $(NF-1)"_"$NF, $0}' > input.scp
```

Encode:
```
model=audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch
bash encoding_decoding.sh \
  --stage 1 --batch_size 1 --num_workers 1 --gpu_devices 0 \
  --model_dir exp/${model} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp input.scp --out_dir outputs/codecs
```

Decode:

```
bash encoding_decoding.sh \
  --stage 2 --batch_size 1 --num_workers 1 --gpu_devices 0 \
  --model_dir exp/${model} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp outputs/codecs/codecs.txt --out_dir outputs/recon_wavs
```

```
# Print full step-by-step instructions
neural-codec setup --codec audiodec
```

Manual summary:
```
git clone https://github.com/facebookresearch/AudioDec.git
cd AudioDec && pip install -r requirements.txt
```

Download `exp.zip`, extract it into `AudioDec/`, then copy `AudioDec.py` from this repo into `AudioDec/`.

```
# Encode + decode
python AudioDec.py --model libritts_v1 -i input/ -o output/  # 24 kHz
python AudioDec.py --model vctk_v1 -i input/ -o output/      # 48 kHz
```

| Model | Sample Rate |
|---|---|
| `libritts_v1` | 24 kHz |
| `vctk_v1` | 48 kHz |