HZXXXC/funscript-ai

funscript-ai

Generate .funscript motion files from an audio track using a hybrid of pretrained neural networks and acoustic signal processing.

The headline feature: it distinguishes someone talking from physical action, so a stroker device (e.g. The Handy, or anything Buttplug.io-compatible) is held still during dialogue and intro lines instead of flailing around.

18+ / adults only. This is a signal-processing tool intended for use with legally obtained content you own. It does not bundle, host, download, or distribute any audio. Use responsibly and in accordance with your local laws.


Why this exists

Naive audio-to-motion converters map "loud = move", so they twitch during speech, moaning, and ambient noise. Hand-written spectral heuristics improve on this but hit a ceiling — they cannot really tell what a sound is.

funscript-ai solves this by combining three signal sources and letting each do what it is best at:

| Source | Model / method | Good at |
| --- | --- | --- |
| Speech / voice content | PANNs CNN14 (AudioSet, 527 classes) | Recognising Speech / Narration / Whispering / Moan / Pant / Breathing |
| Precise speech timing | Silero VAD (ONNX) | Exact "this is a person speaking" timestamps |
| Impact / rhythm | Multi-band STFT + RMS-dB silence gate | Detecting rhythmic impact sounds |
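The RMS-dB silence gate in the last row can be sketched in a few lines of standard-library Python. This is a minimal illustration, not funscript-ai's own code: the frame size, hop size and the -40 dBFS default are assumptions (the -40 value only echoes the --silence-db example under Tuning tips), and the input is assumed to be mono samples scaled to [-1, 1].

```python
import math


def frame_rms_db(samples, frame_len=1024, hop=512, eps=1e-10):
    """Return the RMS level (dBFS) of each frame of a mono signal scaled to [-1, 1]."""
    levels = []
    if not samples:
        return levels
    last_start = max(len(samples) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # eps keeps log10 finite for digital silence (about -200 dBFS).
        levels.append(20.0 * math.log10(rms + eps))
    return levels


def silence_mask(samples, silence_db=-40.0, frame_len=1024, hop=512):
    """True for every frame whose level is below silence_db, i.e. treated as silent."""
    return [level < silence_db for level in frame_rms_db(samples, frame_len, hop)]


# 2048 samples of silence followed by 2048 samples of a 440 Hz tone at 16 kHz.
sr = 16000
quiet = [0.0] * 2048
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(2048)]
print(silence_mask(quiet + tone))  # [True, True, True, False, False, False, False]
```

A frame that straddles the boundary counts as non-silent here, because half a frame of tone is still far above -40 dBFS.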

A joint decision tree labels every segment as holding, gentle, intense, or climax; a motion generator then produces physically feasible stroke points and applies the device's speed and interval limits.
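To make the joint decision concrete, here is a hypothetical version of a per-segment rule. The SegmentFeatures fields, label_segment, and every threshold are invented for illustration. Only the four labels and the ordering follow this README: speech and silence come first (VAD supplies the timing, PANNs the class evidence), then impact strength and density.

```python
from dataclasses import dataclass


@dataclass
class SegmentFeatures:
    """Per-segment evidence from the three sources (field names are hypothetical)."""
    speech_prob: float      # e.g. strongest PANNs score among speech-like classes
    vad_speech: bool        # Silero VAD reported speech overlapping the segment
    silent: bool            # RMS-dB gate classified the segment as silent
    impact_strength: float  # normalised 0..1 impact energy from the multi-band STFT
    impact_density: float   # detected impacts per second


def label_segment(f, speech_threshold=0.5, intense_strength=0.4,
                  climax_strength=0.7, climax_density=3.0):
    """Map one segment to holding / gentle / intense / climax (all thresholds assumed)."""
    # Speech and silence win over everything else, so the device holds still.
    if f.silent or (f.vad_speech and f.speech_prob >= speech_threshold):
        return "holding"
    # Dense and strong impacts.
    if f.impact_strength >= climax_strength and f.impact_density >= climax_density:
        return "climax"
    # Clear rhythmic impact.
    if f.impact_strength >= intense_strength:
        return "intense"
    # Anything else: breathing, light activity, weak impact.
    return "gentle"


print(label_segment(SegmentFeatures(0.92, True, False, 0.8, 4.0)))  # holding
```

In this sketch, checking for holding first means confirmed dialogue wins even when the impact features happen to be high.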

See docs/DESIGN.md for the full story of how this evolved from a naive heuristic (v1) to the current hybrid AI approach (v9).


Quick start

Option A — one-click GUI (recommended)

Windows: double-click start.bat
macOS / Linux: bash start.sh

The launcher checks/installs dependencies, downloads model weights on first run, and opens a browser UI at http://127.0.0.1:7860. Drop in an audio file, click Generate, download the .funscript.

Option B — command line

pip install -r requirements.txt
python scripts/download_models.py        # one-time, ~312 MB
python -m funscript_ai.cli input.wav

Useful flags:

python -m funscript_ai.cli input.mp3 -o out.funscript --debug
python -m funscript_ai.cli input.wav --max-speed 400 --gentle-high 40 --invert
python -m funscript_ai.cli input.wav --climax-sensitivity 0.10   # more climax
python -m funscript_ai.cli input.wav --device cuda               # GPU
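--max-speed (and handy_max_speed in the library) caps how fast the device is asked to move. The sketch below shows one way such a cap can be enforced on a time-sorted list of (at_ms, pos) points. It assumes speed is measured in position units per second and that points closer together than a minimum interval are dropped. limit_speed, min_interval_ms and the defaults are hypothetical and not taken from funscript-ai.

```python
def limit_speed(actions, max_speed=400.0, min_interval_ms=60):
    """Enforce device limits on a list of (at_ms, pos) points sorted by time.

    Speed is measured in position units per second; max_speed and
    min_interval_ms are illustrative defaults, not funscript-ai's values.
    """
    if not actions:
        return []
    out = [actions[0]]
    for at, pos in actions[1:]:
        prev_at, prev_pos = out[-1]
        dt = at - prev_at
        if dt < min_interval_ms:
            continue  # closer together than the device can follow: drop the point
        max_delta = int(max_speed * dt / 1000.0)  # furthest reachable move, rounded down
        delta = pos - prev_pos
        if abs(delta) > max_delta:
            # Travel only as far as the limit allows, in the requested direction.
            pos = prev_pos + (max_delta if delta > 0 else -max_delta)
        out.append((at, pos))
    return out


print(limit_speed([(0, 0), (100, 100), (130, 0), (400, 0)]))
# [(0, 0), (100, 40), (400, 0)]
```

Clamping toward the requested direction (rather than dropping the point) keeps the rhythm intact while shortening strokes the device could not physically complete.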

Option C — as a library

from funscript_ai import generate_funscript, Config

cfg = Config(handy_max_speed=470, pos_gentle_high=45)
result = generate_funscript("input.wav", "out.funscript", config=cfg)
print(result["classes"])   # {'holding': 91, 'intense': 34, 'climax': 13, 'gentle': 3}
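The written file follows the common .funscript JSON layout: an "actions" array of objects with "at" (milliseconds) and "pos" (0 to 100). Any additional top-level fields funscript-ai writes are not documented here, so the hypothetical helpers below read only "actions". held_fraction reports how much of the scripted time is parked at pos=100, a quick way to sanity-check that dialogue was held.

```python
import json


def load_actions(path):
    """Read a .funscript file and return its actions as (at_ms, pos) tuples sorted by time."""
    with open(path, encoding="utf-8") as fh:
        script = json.load(fh)
    return sorted((a["at"], a["pos"]) for a in script["actions"])


def held_fraction(actions, held_pos=100):
    """Fraction of the scripted time where both ends of an interval sit at held_pos."""
    if len(actions) < 2:
        return 0.0
    total = actions[-1][0] - actions[0][0]
    if total <= 0:
        return 0.0
    held = sum(t1 - t0 for (t0, p0), (t1, p1) in zip(actions, actions[1:])
               if p0 == held_pos and p1 == held_pos)
    return held / total
```

For example, held_fraction(load_actions("out.funscript")) on the file produced above should be high for a dialogue-heavy track.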

How segments map to motion

| Label | When | Motion |
| --- | --- | --- |
| holding | Dialogue, narration, silence (confirmed by VAD/PANNs) | Stays at pos=100 (held), no movement |
| gentle | Light activity / breathing, weak impact | Irregular shallow strokes (pos 0–45) |
| intense | Clear rhythmic impact | Irregular full-range strokes (pos 0–100) |
| climax | Dense, strong impact | Regular full-range strokes (~220 ms per cycle) |

Position convention: pos=100 = deepest / held, pos=0 = withdrawn. Holding at 100 (rather than a mid-point) mirrors how professional human-made scripts behave during pauses — see the design doc.
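The label-to-motion table can be turned into points with a small generator per label. This is a hypothetical sketch: segment_actions and the timing and depth ranges for gentle and intense are assumptions. Only the hold at pos=100, the shallow range capped at the gentle-high value (45, as in the pos_gentle_high=45 example above), the full range for intense/climax, and the ~220 ms climax cycle come from this README. A seeded random.Random keeps the irregular patterns reproducible.

```python
import random


def segment_actions(label, start_ms, end_ms, gentle_high=45, rng=None):
    """Hypothetical helper: (at_ms, pos) points for one labelled segment.

    pos=100 is deepest/held and pos=0 is withdrawn, as in the convention above.
    """
    if label == "holding":
        # Park at 100 for the whole segment: no movement.
        return [(start_ms, 100), (end_ms, 100)]
    if label not in ("gentle", "intense", "climax"):
        raise ValueError(f"unknown label: {label!r}")
    if rng is None:
        rng = random.Random(0)  # fixed seed so output is reproducible
    actions, t, deep = [], start_ms, True
    while t <= end_ms:
        if label == "climax":
            half_cycle = 110  # 2 x 110 ms = ~220 ms per full cycle
            pos = 100 if deep else 0
        elif label == "intense":
            half_cycle = rng.randint(150, 350)  # assumed irregular timing
            # assumed jitter near the two ends of the full range
            pos = rng.randint(85, 100) if deep else rng.randint(0, 15)
        else:  # gentle: shallow strokes kept inside 0..gentle_high
            half_cycle = rng.randint(300, 700)  # assumed irregular timing
            if deep:
                pos = rng.randint(max(0, gentle_high - 10), gentle_high)
            else:
                pos = rng.randint(0, min(10, gentle_high))
        actions.append((t, pos))
        t += half_cycle
        deep = not deep
    return actions
```

In a full pipeline the output of each segment would still pass through the device speed and interval limits before being written out.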


Requirements

  • Python 3.10+
  • ~1.5 GB disk for dependencies (PyTorch) + ~312 MB for the PANNs model
  • CPU is fine (~15 s for a 12-minute track); CUDA optional

Model weights are not committed to the repo; they download automatically on first run to ~/panns_data/.


Tuning tips

  • Toy moves during talking → raise --climax-sensitivity slightly, or lower --silence-db (e.g. -40) to gate more aggressively.
  • Not enough climax detected → lower --climax-sensitivity (e.g. 0.10).
  • Strokes too deep/shallow for your anatomy/device → adjust --gentle-high and device limits.
  • Run with --debug to get *.debug.txt (per-segment reasons) and *.features.csv (probability time-series you can plot).
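If you prefer numbers to a plot, a few lines of csv code can list the stretches where one probability series stays at or above a threshold, for example to check whether talking was detected where you expected. The column names time and speech used below are hypothetical; check the header row of your own *.features.csv and pass the real names.

```python
import csv


def high_intervals(csv_path, column, threshold=0.5, time_column="time"):
    """Return (start, end) spans where `column` stays at or above `threshold`.

    `end` is the first timestamp that drops below the threshold, or the last
    timestamp in the file if the run continues to the end.
    """
    spans, start, last_t = [], None, None
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            t, value = float(row[time_column]), float(row[column])
            if value >= threshold and start is None:
                start = t  # a run above the threshold begins
            elif value < threshold and start is not None:
                spans.append((start, t))  # the run ends here
                start = None
            last_t = t
    if start is not None:
        spans.append((start, last_t))
    return spans
```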

Credits

  • PANNs — Q. Kong et al., PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition (2020).
  • Silero VAD — Silero Team.
  • librosa — audio analysis.

License

MIT.

Disclaimer

Generated scripts are heuristic and may be imperfect — always review before use. The authors take no responsibility for how the output is used. No copyrighted or personal media is included in this repository.
