CounterFlow

CounterFlow is a research repository for CounterFlow-specific code, experiment wrappers, evaluation scripts, and project documentation. Third-party repositories are kept out of the project code path and are installed under external/.

Repository Layout

counterflow/      CounterFlow wrappers and project-owned code
external/         Cloned third-party repositories, ignored by Git
baselines/        Baseline repositories such as CAFA and ReWaS, ignored by Git
datasets/         Local datasets and feature caches, ignored by Git
pretrained/       Local model checkpoints, ignored by Git
experiments/      Experiment entry points
demos/            Demo scripts
evaluation/       Evaluation code only
results/          Generated experiment and evaluation outputs, ignored by Git
scripts/          Setup and utility scripts
patches/          Reproducible patches for external repositories

Setup

Install the CounterFlow-level dependencies from requirements.txt if needed, then clone the external repositories. Backend-specific dependencies are managed by the external repositories and their Conda environments.

pip install -r requirements.txt
bash scripts/setup_external_repos.sh

requirements.txt covers only CounterFlow-level dependencies. MMAudio and av-benchmark may require additional dependencies; refer to each external repository for backend-specific setup.

External Repositories

External repositories live under:

external/MMAudio/
external/av-benchmark/

Use:

bash scripts/setup_external_repos.sh

The script clones missing repositories without overwriting existing directories. To apply the CounterFlow network changes required by the MMAudio backbone, run:

cd external/MMAudio
git apply ../../patches/mmaudio_networks_counterflow.patch
cd ../..

CounterFlow backend entry scripts are stored under counterflow/mmaudio/. The setup script copies them into MMAudio when the corresponding external files are missing or outdated.
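The copy rule above can be sketched in Python as follows. This is an illustration only, not the logic in scripts/setup_external_repos.sh: it assumes that "outdated" means the file contents differ, that the entry scripts are top-level *.py files, and that they are copied into the root of external/MMAudio. The helper name sync_entry_scripts is hypothetical.

```python
# Hypothetical sketch of "copy when missing or outdated"; not the actual setup script.
from __future__ import annotations

import filecmp
import shutil
from pathlib import Path


def sync_entry_scripts(src_dir: Path, dst_dir: Path) -> list[str]:
    """Copy each *.py file from src_dir into dst_dir when it is missing there
    or its contents differ. Returns the names of the files that were copied."""
    copied = []
    for src in sorted(src_dir.glob("*.py")):
        dst = dst_dir / src.name
        # shallow=False compares file contents, not only os.stat() signatures.
        if not dst.exists() or not filecmp.cmp(src, dst, shallow=False):
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(src.name)
    return copied
```

For example, sync_entry_scripts(Path("counterflow/mmaudio"), Path("external/MMAudio")) returns the names of the files it copied; calling it again returns an empty list until a source script changes.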

Conda Environments

Use the MMAudio Conda environment for MMAudio-based CounterFlow experiments and MMAudio inference:

conda activate MMAudio

To create the MMAudio environment from scratch:

conda create -n MMAudio python=3.10 -y
conda activate MMAudio

# Recommended on shared machines so user-site packages under ~/.local do not
# shadow the Conda environment.
export PYTHONNOUSERSITE=1

python -m pip install --upgrade pip setuptools wheel

# Install the PyTorch wheel that matches your CUDA driver.
# The 2.6.0/cu118 stack works with MMAudio, av-benchmark, and optional OpenFLAM.
python -m pip install \
  torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
  --index-url https://download.pytorch.org/whl/cu118

# ffmpeg is used when composing generated audio back into videos.
conda install -c conda-forge ffmpeg -y

# Clone external repositories if you have not done so yet.
bash scripts/setup_external_repos.sh

# Apply the CounterFlow network changes needed by the MMAudio backbone.
cd external/MMAudio
git apply ../../patches/mmaudio_networks_counterflow.patch
cd ../..

# Install CounterFlow-level helpers and MMAudio dependencies.
python -m pip install -r requirements.txt
python -m pip install -e external/MMAudio

Optional evaluation dependencies:

python -m pip install -e external/av-benchmark

# Needed only for evaluation/eval_flam_metric.py.
python -m pip install openflam

If git apply fails because the MMAudio patch is already applied, continue with the next step. Running git apply --reverse --check ../../patches/mmaudio_networks_counterflow.patch inside external/MMAudio succeeds only when the patch is already in place.

Install OpenFLAM only after the PyTorch 2.6.0 stack above is in place; otherwise pip may upgrade the torch packages to a mismatched set. If packages installed under ~/.local still leak into the Conda environment, keep PYTHONNOUSERSITE=1 set while running inference and evaluation commands.
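A quick way to confirm that the environment ended up with the pinned stack, and that ~/.local is not leaking into it, is a small check like the one below. It is a hypothetical helper, not part of the repository; it assumes the cu118 wheels report versions such as 2.6.0+cu118 and strips that local suffix before comparing.

```python
# Hypothetical environment check; not a script shipped with CounterFlow.
from __future__ import annotations

import site
import sys
from importlib import metadata

# Pinned stack from the install commands above.
EXPECTED = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def installed_versions() -> dict[str, str | None]:
    """Look up installed versions; None means the package is not installed."""
    versions = {}
    for name in EXPECTED:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_mismatches(installed: dict[str, str | None]) -> list[str]:
    """Compare versions (local tags like '+cu118' stripped) against EXPECTED."""
    problems = []
    for name, want in EXPECTED.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif have.split("+")[0] != want:
            problems.append(f"{name}: found {have}, expected {want}")
    return problems


def user_site_leaks() -> list[str]:
    """sys.path entries inside the user site-packages directory under ~/.local."""
    user_site = site.getusersitepackages()
    return [p for p in sys.path if p.startswith(user_site)]
```

Inside the activated MMAudio environment, version_mismatches(installed_versions()) should return an empty list, and user_site_leaks() should normally be empty when PYTHONNOUSERSITE=1 is set.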

Pretrained Models

Do not commit pretrained model files. MMAudio checkpoints are downloaded automatically by the MMAudio backend on first run. For CLAP and DeSync evaluation, prepare av-benchmark checkpoints with:

bash scripts/download_pretrained_models.sh

This downloads:

external/av-benchmark/weights/music_speech_audioset_epoch_15_esc_89.98.pt
external/av-benchmark/weights/synchformer_state_dict.pth
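To verify that the download finished, a hypothetical helper can check that both files exist and are non-empty. The relative paths are taken from the list above; the helper itself is not part of the repository.

```python
# Hypothetical check for the av-benchmark checkpoints listed above.
from __future__ import annotations

from pathlib import Path

AV_BENCHMARK_WEIGHTS = (
    # CLAP checkpoint
    "external/av-benchmark/weights/music_speech_audioset_epoch_15_esc_89.98.pt",
    # Synchformer checkpoint used for DeSync
    "external/av-benchmark/weights/synchformer_state_dict.pth",
)


def missing_checkpoints(repo_root: Path) -> list[str]:
    """Return the expected checkpoints that are absent or empty
    (an empty file usually means an interrupted download)."""
    missing = []
    for rel in AV_BENCHMARK_WEIGHTS:
        path = repo_root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```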

Local or manually downloaded model files can also be placed under:

pretrained/mmaudio/
pretrained/etc/

Checkpoint-like files such as *.ckpt, *.pt, *.pth, *.safetensors, and *.bin are ignored by Git.
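To see which local files these patterns cover, for example before adding a new directory to Git, a sketch like the following lists matching files under a directory. The pattern list only mirrors the extensions named above; the actual .gitignore may contain more or different rules, so git check-ignore remains the authoritative check.

```python
# Hypothetical scan for checkpoint-like files; mirrors only the patterns named above.
from __future__ import annotations

import fnmatch
from pathlib import Path

CHECKPOINT_PATTERNS = ("*.ckpt", "*.pt", "*.pth", "*.safetensors", "*.bin")


def find_checkpoint_like(root: Path) -> list[str]:
    """List files under root (relative, POSIX-style paths) whose names match
    one of CHECKPOINT_PATTERNS."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and any(fnmatch.fnmatch(path.name, pat) for pat in CHECKPOINT_PATTERNS):
            hits.append(path.relative_to(root).as_posix())
    return hits
```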

Datasets

Do not commit datasets. Put local datasets and feature caches under:

datasets/VGGSound-Sparse/

Small metadata files used for local experiments can live there as well, but the dataset directory is ignored by Git.

Baselines

Baseline repositories are kept under:

baselines/CAFA/
baselines/ReWaS/

These directories are ignored because they may contain downloaded code, checkpoints, generated outputs, and datasets. Add reproducible setup scripts later if baseline setup needs to be automated.

Experiments

conda activate MMAudio
python experiments/exp_vggsound_sparse.py \
  --exp-name 2026-05-05_counterflow-mmaudio-default \
  --csv_path datasets/VGGSound-Sparse/vggsound_sparse.csv \
  --clean_csv_path datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --video_root /path/to/vggsound/video

Use --dry-run to print the backend command without launching inference.

The default CounterFlow-MMAudio experiment config matches the public demo: cfg_text=5.0, transition_step=17, Phase 1 ODE, Phase 2 ODE, sigma=0.0, and seed=42.
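For reference, the same defaults written as a frozen Python dataclass. The field names, the string "ode" for the Phase 1 and Phase 2 samplers, and the meanings in the comments are assumptions for illustration, not the attribute names or semantics used by experiments/exp_vggsound_sparse.py.

```python
# Hypothetical representation of the default config; field names are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterFlowConfig:
    """Default CounterFlow-MMAudio settings quoted from this README."""
    cfg_text: float = 5.0        # assumed: guidance scale applied to the text prompt
    transition_step: int = 17    # assumed: sampling step where Phase 1 hands over to Phase 2
    phase1_sampler: str = "ode"  # Phase 1 ODE
    phase2_sampler: str = "ode"  # Phase 2 ODE
    sigma: float = 0.0
    seed: int = 42
```

Because the dataclass is frozen, a variant such as dataclasses.replace(CounterFlowConfig(), transition_step=20) creates a new config instead of mutating the defaults.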

For a quick clean-subset smoke test, use --pilot --pilot_n 3 with --subset clean:

conda activate MMAudio
CUDA_VISIBLE_DEVICES=0 PYTHONNOUSERSITE=1 python experiments/exp_vggsound_sparse.py \
  --exp-name smoke_mmaudio_clean3 \
  --subset clean \
  --pilot \
  --pilot_n 3 \
  --gpu 0 \
  --csv_path datasets/VGGSound-Sparse/vggsound_sparse.csv \
  --clean_csv_path datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --video_root /path/to/vggsound/video

Demos

Run the public CounterFlow prompt-switch demo with the tracked videos:

conda activate MMAudio
export PYTHONNOUSERSITE=1
CUDA_VISIBLE_DEVICES=0 python run_counterflow_demo.py --gpu 0

This demo uses:

datasets/demo_videos/cat.mp4: source prompt "cat meowing" -> target prompt "horse neighing"
datasets/demo_videos/dog.mp4: source prompt "dog barking" -> target prompt "bear growling"

Outputs are written under:

results/demo/counterflow-cat-dog/

The default demo config is cfg_text=5.0, transition_step=17, Phase 1 ODE, Phase 2 ODE, sigma=0.0, and seed=42.

Run CounterFlow-MMAudio on a small local video manifest:

conda activate MMAudio
export PYTHONNOUSERSITE=1
CUDA_VISIBLE_DEVICES=0 python demos/demo_mmaudio_local_videos.py \
  --manifest datasets/demo_vggsound_sparse_2.csv \
  --output-dir results/demo/mmaudio-vggsound-sparse-2 \
  --gpu 0

The manifest must be a CSV file with video_path and prompt columns. Paths may be absolute or relative to the repository root. Example:

video_path,prompt
datasets/demo_videos/example_000001.mp4,people eating crisps
datasets/demo_videos/example_000002.mp4,striking bowling
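A minimal loader for this manifest format, assuming relative paths are resolved against the repository root as described above. load_manifest is a hypothetical helper, not the parser used by demos/demo_mmaudio_local_videos.py.

```python
# Hypothetical manifest loader for the video_path,prompt format.
from __future__ import annotations

import csv
from pathlib import Path

REQUIRED_COLUMNS = {"video_path", "prompt"}


def load_manifest(manifest: Path, repo_root: Path) -> list[tuple[Path, str]]:
    """Read (video, prompt) pairs; relative video paths are joined to repo_root."""
    with manifest.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{manifest}: missing columns {sorted(missing)}")
        rows = []
        for row in reader:
            video = Path(row["video_path"].strip())
            if not video.is_absolute():
                video = repo_root / video
            rows.append((video, row["prompt"].strip()))
    return rows
```

Because the file is parsed as standard CSV, a prompt that contains a comma must be quoted.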

The demos/demo_CounterFlow.py script can also be run with --dry-run:

conda activate MMAudio
python demos/demo_CounterFlow.py --dry-run

Evaluation

Quantitative VGGSound-Sparse evaluation should run in the MMAudio environment:

conda activate MMAudio
export PYTHONNOUSERSITE=1
python evaluation/eval_vggsound_sparse_metrics.py \
  --output_dir results/evaluation/VGGSound-Sparse/qualitative/2026-05-05_counterflow-mmaudio-default \
  --filter_csv datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --gpu 0

The CLAP and DeSync metrics require the av-benchmark checkpoints downloaded by scripts/download_pretrained_models.sh. FAD is computed when datasets/VGGSound-Sparse/gt_fad_stats.pt exists or when a path is provided with --gt_fad_stats. DeSync uses datasets/VGGSound-Sparse/test_video_features.pt when present; otherwise it extracts video features from the generated mp4 files.
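The fallback rules in the previous paragraph can be summarized in a short sketch. plan_metrics is hypothetical; it assumes that a relative --gt_fad_stats path is resolved against the repository root and that an explicit path pointing to a missing file simply disables FAD (the real script may raise an error instead).

```python
# Hypothetical summary of the FAD / DeSync input fallbacks; not the evaluation script.
from __future__ import annotations

from pathlib import Path

DEFAULT_FAD_STATS = Path("datasets/VGGSound-Sparse/gt_fad_stats.pt")
DEFAULT_VIDEO_FEATURES = Path("datasets/VGGSound-Sparse/test_video_features.pt")


def plan_metrics(repo_root: Path, gt_fad_stats: str | None = None) -> dict[str, object]:
    """Decide which FAD stats file to use (None = skip FAD) and where DeSync
    video features come from."""
    # Joining an absolute path onto repo_root yields the absolute path unchanged.
    fad_path = (repo_root / gt_fad_stats) if gt_fad_stats else (repo_root / DEFAULT_FAD_STATS)
    features = repo_root / DEFAULT_VIDEO_FEATURES
    return {
        "fad_stats": fad_path if fad_path.is_file() else None,
        "desync_video_features": (
            "precomputed" if features.is_file() else "extract from generated mp4 files"
        ),
    }
```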

FLAM metric evaluation:

conda activate MMAudio
export PYTHONNOUSERSITE=1
python evaluation/eval_flam_metric.py \
  --exp_dir results/evaluation/VGGSound-Sparse/qualitative/2026-05-05_counterflow-mmaudio-default \
  --filter_csv datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --gpu_id 0

Qualitative analysis notebook:

jupyter notebook evaluation/qualitative_analysis.ipynb

Results Convention

Store quantitative outputs under:

results/evaluation/VGGSound-Sparse/quantitative/YYYY-MM-DD_model-dataset-setting/

Store qualitative samples, figures, and inspection artifacts under:

results/evaluation/VGGSound-Sparse/qualitative/YYYY-MM-DD_model-dataset-setting/

Generated results are ignored by Git. Keep only results/README.md tracked.
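A hypothetical helper for building run directories that follow this convention. The check that the model-dataset-setting part is lowercase and hyphen-separated is an assumption based on the example name 2026-05-05_counterflow-mmaudio-default, not a stated rule.

```python
# Hypothetical helper for the results directory convention.
from __future__ import annotations

import datetime as dt
import re
from pathlib import Path

RESULTS_ROOT = Path("results/evaluation/VGGSound-Sparse")
# YYYY-MM-DD_ followed by an assumed lowercase, hyphen-separated slug.
RUN_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}_[a-z0-9]+(?:-[a-z0-9]+)*$")


def run_dir(kind: str, slug: str, date: dt.date | None = None) -> Path:
    """Build results/evaluation/VGGSound-Sparse/<kind>/YYYY-MM-DD_<slug>."""
    if kind not in ("quantitative", "qualitative"):
        raise ValueError(f"kind must be 'quantitative' or 'qualitative', got {kind!r}")
    name = f"{(date or dt.date.today()).isoformat()}_{slug}"
    if not RUN_NAME.match(name):
        raise ValueError(f"run name {name!r} does not follow YYYY-MM-DD_model-dataset-setting")
    return RESULTS_ROOT / kind / name
```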

Git-Ignored Files

The .gitignore excludes Python caches, local environments, logs, checkpoints, datasets, pretrained models, generated samples, evaluation outputs, large media files, baseline repositories, and cloned external repositories.
