Gyubin-Lee/CounterFlow

CounterFlow

CounterFlow is a research repository for CounterFlow-specific code, experiment wrappers, evaluation scripts, and project documentation. Third-party repositories are kept out of the project code path and are installed under external/.

Repository Layout

counterflow/      CounterFlow wrappers and project-owned code
external/         Cloned third-party repositories, ignored by Git
baselines/        Baseline repositories such as CAFA and ReWaS, ignored by Git
datasets/         Local datasets and feature caches, ignored by Git
pretrained/       Local model checkpoints, ignored by Git
experiments/      Experiment entry points
evaluation/       Evaluation code only
results/          Generated experiment and evaluation outputs, ignored by Git
scripts/          Setup and utility scripts
patches/          Reproducible patches for external repositories

Setup

Install the CounterFlow-level dependencies from requirements.txt if needed. Backend-specific dependencies are managed by the external repositories and their Conda environments.

pip install -r requirements.txt
bash scripts/setup_external_repos.sh

This repository contains only CounterFlow-level requirements. MMAudio and av-benchmark may require their own dependencies. Please refer to each external repository for backend-specific setup.

External Repositories

External repositories live under:

external/MMAudio/
external/av-benchmark/

Clone them with:

bash scripts/setup_external_repos.sh

The script clones missing repositories without overwriting existing directories. If MMAudio needs the CounterFlow network changes, apply:

cd external/MMAudio
git apply ../../patches/mmaudio_networks_counterflow.patch
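
When it is unclear whether the patch is already in place, its state can be probed before applying it. The following is a minimal sketch, not part of this repository: `patch_state` is a hypothetical helper that relies only on `git apply --check` and its `--reverse` flag, neither of which modifies the working tree.

```python
import subprocess


def patch_state(patch_path, repo_dir, run=subprocess.run):
    """Classify a patch as 'applicable', 'already-applied', or 'conflict'.

    Hypothetical helper for illustration; `run` is injectable for testing.
    """

    def check(*extra_flags):
        # `git apply --check` validates the patch without touching any files.
        result = run(
            ["git", "apply", "--check", *extra_flags, patch_path],
            cwd=repo_dir,
            capture_output=True,
        )
        return result.returncode == 0

    if check():
        return "applicable"
    # If the reversed patch applies cleanly, the changes are already present.
    if check("--reverse"):
        return "already-applied"
    return "conflict"
```

Called as `patch_state("../../patches/mmaudio_networks_counterflow.patch", "external/MMAudio")`, the patch path is resolved relative to `repo_dir`, mirroring the shell commands above.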

CounterFlow backend entry scripts are stored under counterflow/mmaudio/. The setup script copies them into MMAudio when the corresponding external files are missing or outdated.
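
The clone-if-missing behaviour described in this section can be illustrated as follows. This is a sketch of the idea only, not the contents of scripts/setup_external_repos.sh; `plan_clones` is a hypothetical name and the URLs are placeholders, since the actual remotes are defined by the setup script.

```python
from pathlib import Path

# Placeholder URLs (assumption): the real remotes live in the setup script.
REPOS = {
    "MMAudio": "https://example.com/MMAudio.git",
    "av-benchmark": "https://example.com/av-benchmark.git",
}


def plan_clones(external_dir, repos, exists=Path.exists):
    """Return the `git clone` commands needed for repositories not yet present.

    Existing directories are skipped, so nothing is ever overwritten.
    """
    commands = []
    for name, url in repos.items():
        target = Path(external_dir) / name
        if exists(target):
            continue  # already cloned: leave the directory untouched
        commands.append(["git", "clone", url, str(target)])
    return commands
```

Returning the commands instead of running them keeps the sketch side-effect free; each entry could be passed to `subprocess.run(cmd, check=True)`.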

Conda Environments

Use the MMAudio Conda environment for MMAudio-based CounterFlow experiments and MMAudio inference:

conda activate MMAudio

To create the MMAudio environment from scratch:

conda create -n MMAudio python=3.10 -y
conda activate MMAudio

# Recommended on shared machines so user-site packages under ~/.local do not
# shadow the Conda environment.
export PYTHONNOUSERSITE=1

python -m pip install --upgrade pip setuptools wheel

# Install the PyTorch wheel that matches your CUDA driver.
# The 2.6.0/cu118 stack works with MMAudio, av-benchmark, and optional OpenFLAM.
python -m pip install \
  torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
  --index-url https://download.pytorch.org/whl/cu118

# ffmpeg is used when composing generated audio back into videos.
conda install -c conda-forge ffmpeg -y

# Clone external repositories if you have not done so yet.
bash scripts/setup_external_repos.sh

# Apply the CounterFlow network changes needed by the MMAudio backbone.
cd external/MMAudio
git apply ../../patches/mmaudio_networks_counterflow.patch
cd ../..

# Install CounterFlow-level helpers and MMAudio dependencies.
python -m pip install -r requirements.txt
python -m pip install -e external/MMAudio

Optional evaluation dependencies:

python -m pip install -e external/av-benchmark

# Needed only for evaluation/eval_flam_metric.py.
python -m pip install openflam

If git apply reports that the MMAudio patch is already applied, keep going. Avoid installing OpenFLAM before the PyTorch 2.6.0 stack above is in place; otherwise pip may upgrade the torch packages to a mismatched set. If packages installed in ~/.local still leak into your Conda environment, keep PYTHONNOUSERSITE=1 set while running inference and evaluation commands.
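
To confirm that a later install did not move the torch packages off the pinned versions, the installed versions can be compared against the pins from the commands above. This is a minimal sketch using the standard-library `importlib.metadata`; `find_mismatches` and `base_version` are hypothetical helpers, and the comparison assumes that wheels from the cu118 index report versions with a local tag such as `2.6.0+cu118`.

```python
from importlib import metadata

# Pins taken from the install command in this section.
EXPECTED = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def base_version(version):
    """Drop a local build tag such as '+cu118' before comparing."""
    return version.split("+", 1)[0]


def find_mismatches(expected, lookup=metadata.version):
    """Map each package whose installed version differs from its pin.

    The value is the installed version string, or None if not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or base_version(installed) != wanted:
            mismatches[name] = installed
    return mismatches
```

Running `find_mismatches(EXPECTED)` inside the MMAudio environment should return an empty dict when the stack is intact.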

Pretrained Models

Do not commit pretrained model files. MMAudio checkpoints are downloaded automatically by the MMAudio backend on first run. For CLAP and DeSync evaluation, prepare av-benchmark checkpoints with:

bash scripts/download_pretrained_models.sh

This downloads:

external/av-benchmark/weights/music_speech_audioset_epoch_15_esc_89.98.pt
external/av-benchmark/weights/synchformer_state_dict.pth

Local or manually downloaded model files can also be placed under:

pretrained/mmaudio/
pretrained/etc/

Checkpoint-like files such as *.ckpt, *.pt, *.pth, *.safetensors, and *.bin are ignored by Git.
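
As a rough illustration of which file names these patterns cover, the list can be checked with shell-style matching. This is a sketch only: `is_checkpoint_like` is a hypothetical helper, and `fnmatchcase` approximates rather than reproduces Git's ignore semantics.

```python
from fnmatch import fnmatchcase

# Patterns listed in this section.
CHECKPOINT_PATTERNS = ("*.ckpt", "*.pt", "*.pth", "*.safetensors", "*.bin")


def is_checkpoint_like(filename):
    """Return True if the file name matches any checkpoint pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in CHECKPOINT_PATTERNS)
```

For an authoritative answer about a specific path, `git check-ignore -v <path>` reports the matching .gitignore rule.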

Datasets

Do not commit datasets. Put local datasets and feature caches under:

datasets/VGGSound-Sparse/

Small metadata files used for local experiments can live there as well, but the dataset directory is ignored by Git.

Baselines

Baseline repositories are kept under:

baselines/CAFA/
baselines/ReWaS/

These directories are ignored because they may contain downloaded code, checkpoints, generated outputs, and datasets. Add reproducible setup scripts later if baseline setup needs to be automated.

Experiments

conda activate MMAudio
python experiments/exp_vggsound_sparse.py \
  --exp-name 2026-05-05_counterflow-mmaudio-default \
  --csv_path datasets/VGGSound-Sparse/vggsound_sparse.csv \
  --clean_csv_path datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --video_root /path/to/vggsound/video

Use --dry-run to print the backend command without launching inference.

The default CounterFlow-MMAudio experiment config matches the public demo: cfg_text=5.0, transition_step=17, Phase 1 ODE, Phase 2 ODE, sigma=0.0, and seed=42.
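
The defaults and the dry-run behaviour described here could be organised along the following lines. This is a sketch under stated assumptions, not the code in experiments/exp_vggsound_sparse.py: the backend script name and the `--cfg_text`-style backend flags are hypothetical, and only the default values come from this README.

```python
import shlex
import subprocess
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CounterFlowDefaults:
    # Default values as documented in this README.
    cfg_text: float = 5.0
    transition_step: int = 17
    sigma: float = 0.0
    seed: int = 42


def build_command(script, cfg):
    """Build a backend command line; flag names here are hypothetical."""
    cmd = ["python", script]
    for key, value in asdict(cfg).items():
        cmd += [f"--{key}", str(value)]
    return cmd


def launch(cmd, dry_run, runner=subprocess.run):
    """Print the command when dry_run is set; otherwise execute it."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    return runner(cmd, check=True)
```

Keeping command construction separate from execution means the dry-run path prints exactly what a real run would execute.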

For a quick clean-subset smoke test, use --pilot --pilot_n 3 with --subset clean:

conda activate MMAudio
CUDA_VISIBLE_DEVICES=0 PYTHONNOUSERSITE=1 python experiments/exp_vggsound_sparse.py \
  --exp-name smoke_mmaudio_clean3 \
  --subset clean \
  --pilot \
  --pilot_n 3 \
  --gpu 0 \
  --csv_path datasets/VGGSound-Sparse/vggsound_sparse.csv \
  --clean_csv_path datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --video_root /path/to/vggsound/video

Demos

Run the public CounterFlow prompt-switch demo with the tracked videos:

conda activate MMAudio
export PYTHONNOUSERSITE=1
CUDA_VISIBLE_DEVICES=0 python run_counterflow_demo.py --gpu 0

This demo uses:

datasets/demo_videos/cat.mp4: source prompt "cat meowing" -> target prompt "horse neighing"
datasets/demo_videos/dog.mp4: source prompt "dog barking" -> target prompt "bear growling"

Outputs are written under:

results/demo/counterflow-cat-dog/

The default demo config is cfg_text=5.0, transition_step=17, Phase 1 ODE, Phase 2 ODE, sigma=0.0, and seed=42.

Run CounterFlow-MMAudio on a small local video manifest:

conda activate MMAudio
export PYTHONNOUSERSITE=1
CUDA_VISIBLE_DEVICES=0 python demos/demo_mmaudio_local_videos.py \
  --manifest datasets/demo_vggsound_sparse_2.csv \
  --output-dir results/demo/mmaudio-vggsound-sparse-2 \
  --gpu 0

The manifest must be a CSV with video_path,prompt columns. Paths may be absolute or relative to the repository root. Example:

video_path,prompt
datasets/demo_videos/example_000001.mp4,people eating crisps
datasets/demo_videos/example_000002.mp4,striking bowling

Dry-run the CounterFlow demo script:

conda activate MMAudio
python demos/demo_CounterFlow.py --dry-run
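
The manifest format described above (a CSV with video_path and prompt columns, with paths either absolute or relative to the repository root) can be checked before launching a run. The following is a minimal sketch; `load_manifest` is a hypothetical helper and not the parser used by demos/demo_mmaudio_local_videos.py.

```python
import csv
import io
from pathlib import Path

REQUIRED_COLUMNS = ("video_path", "prompt")


def load_manifest(text, repo_root):
    """Parse manifest CSV text into (resolved_path, prompt) pairs.

    Relative paths are resolved against repo_root; absolute paths are kept.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    rows = []
    for row in reader:
        path = Path(row["video_path"])
        if not path.is_absolute():
            path = Path(repo_root) / path
        rows.append((path, row["prompt"]))
    return rows
```

A file on disk can be checked with `load_manifest(Path(manifest).read_text(), repo_root)`; verifying that each resolved path exists is a natural next step.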

Evaluation

Quantitative VGGSound-Sparse evaluation should run in the MMAudio environment:

conda activate MMAudio
export PYTHONNOUSERSITE=1
python evaluation/eval_vggsound_sparse_metrics.py \
  --output_dir results/evaluation/VGGSound-Sparse/qualitative/2026-05-05_counterflow-mmaudio-default \
  --filter_csv datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --gpu 0

The CLAP and DeSync metrics require the av-benchmark checkpoints downloaded by scripts/download_pretrained_models.sh. FAD is computed when datasets/VGGSound-Sparse/gt_fad_stats.pt exists or when a path is provided with --gt_fad_stats. DeSync uses datasets/VGGSound-Sparse/test_video_features.pt when present; otherwise it extracts video features from the generated mp4 files.
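
The input resolution described above can be sketched as follows. This illustrates the documented fallbacks only and is not the code in evaluation/eval_vggsound_sparse_metrics.py; the function names are hypothetical, and giving an explicit --gt_fad_stats path precedence over the default file is an assumption.

```python
from pathlib import Path

DATASET_DIR = Path("datasets/VGGSound-Sparse")


def resolve_fad_stats(cli_path, dataset_dir=DATASET_DIR, exists=Path.exists):
    """Return the FAD statistics file to use, or None to skip FAD.

    Assumption: an explicit --gt_fad_stats path wins over the default file.
    """
    if cli_path is not None:
        return Path(cli_path)
    default = dataset_dir / "gt_fad_stats.pt"
    return default if exists(default) else None


def desync_feature_source(dataset_dir=DATASET_DIR, exists=Path.exists):
    """Return the cached feature file, or None to extract from generated mp4s."""
    cached = dataset_dir / "test_video_features.pt"
    return cached if exists(cached) else None
```

A `None` from `resolve_fad_stats` means FAD is skipped, while a `None` from `desync_feature_source` means video features are extracted from the generated mp4 files.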

FLAM metric evaluation:

conda activate MMAudio
export PYTHONNOUSERSITE=1
python evaluation/eval_flam_metric.py \
  --exp_dir results/evaluation/VGGSound-Sparse/qualitative/2026-05-05_counterflow-mmaudio-default \
  --filter_csv datasets/VGGSound-Sparse/vggsound_sparse_clean_fixed_offsets.csv \
  --gpu_id 0

Qualitative analysis notebook:

jupyter notebook evaluation/qualitative_analysis.ipynb

Results Convention

Store quantitative outputs under:

results/evaluation/VGGSound-Sparse/quantitative/YYYY-MM-DD_model-dataset-setting/

Store qualitative samples, figures, and inspection artifacts under:

results/evaluation/VGGSound-Sparse/qualitative/YYYY-MM-DD_model-dataset-setting/

Generated results are ignored by Git. Keep only results/README.md tracked.
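
A small helper can keep run directories consistent with this convention. The following is a sketch; `run_dir` is a hypothetical name, and the model, dataset, and setting labels are free-form strings chosen by the user.

```python
from datetime import date
from pathlib import Path

RESULTS_ROOT = Path("results/evaluation/VGGSound-Sparse")
KINDS = ("quantitative", "qualitative")


def run_dir(kind, day, model, dataset, setting):
    """Build <root>/<kind>/YYYY-MM-DD_model-dataset-setting."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    # date.isoformat() yields the YYYY-MM-DD prefix used by the convention.
    name = f"{day.isoformat()}_{model}-{dataset}-{setting}"
    return RESULTS_ROOT / kind / name
```

Passing `date.today()` as `day` stamps a new run with the current date.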

Git-Ignored Files

The .gitignore excludes Python caches, local environments, logs, checkpoints, datasets, pretrained models, generated samples, evaluation outputs, large media files, baseline repositories, and cloned external repositories.
