AuViMi

A close-up of my cup with a dragon on it, imagined by deep-daze via my webcam:

image

A quick self-portrait, imagined using big-sleep via my webcam:

image

AuViMi stands for audio-visual mirror. The idea is to have CLIP generate its interpretation of what your webcam sees, combined with the words that are spoken.

This implementation assumes that you are working on a laptop without a GPU, but have a fast connection to a more powerful GPU server.

See it in action (with deep-daze as the backbone) as it reinterprets some art:

art.mp4

And here's a beautiful self-portrait of NotNANtoN with big-sleep as a backbone:

03_12_21_15.20.27_mirror.mp4

At the moment, only reading in webcam pictures is supported, optionally combined with a single sentence read from the CLI.

Usage

Install

Install the dependencies with python3 -m pip install -r requirements.txt. If you want an .mp4 video of the interpretation, also install ffmpeg on the host server via sudo apt-get install ffmpeg.

Note

If you use a remote GPU host to do the heavy computation, we assume that ssh is set up. Furthermore, we assume that an SSH key, rather than a password, is used to connect to the remote server.

Commands:

To run on your GPU laptop or desktop with a webcam, the following command should work. python_path defaults to /usr/bin/python3, so you only need to set it if you are using a venv or if the Python path on the remote host differs from the default:

python3 app.py --run_local 1 --python_path YOUR_PYTHON_PATH

You need to set --host, --user, and --python_path if you run remotely!

host could be university_X.edu.com and user would be your username on that host, e.g. student_Y. To find out what to insert for python_path, connect to your host and run which python3. For example:

python3 app.py --user student_Y --host university_X.edu.com --python_path /usr/bin/python3
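Under the hood, launching a program on a remote host boils down to assembling an ssh invocation from the user, host, and Python path. The sketch below is illustrative only; build_remote_cmd and the script name are hypothetical, not the project's actual API:

```python
import shlex

def build_remote_cmd(user, host, python_path, script="app.py"):
    # Quote the remote command so unusual paths survive the shell on the host.
    # 'script' is a placeholder name, not necessarily the project's entry point.
    remote = f"{shlex.quote(python_path)} {shlex.quote(script)}"
    return ["ssh", f"{user}@{host}", remote]

print(build_remote_cmd("student_Y", "university_X.edu.com", "/usr/bin/python3"))
```

The returned list can be handed to subprocess.run; ssh then executes the quoted command on the remote GPU host.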

Specifying the operating mode: if pic is set as the operating mode, the user can press p to set a new optimization goal; in stream mode, the optimization goal is automatically set to the newest picture from the webcam feed:

python3 app.py --mode stream
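The difference between the two modes can be sketched as a small goal-update rule. This is an illustrative sketch of the behaviour described above, not the project's actual code:

```python
def next_goal(mode, current_goal, newest_frame, key_pressed=None):
    # 'stream': the newest webcam frame always becomes the optimization goal.
    # 'pic': the goal only changes when the user presses 'p'.
    if mode == "stream" or (mode == "pic" and key_pressed == "p"):
        return newest_frame
    return current_goal
```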

Specifying the backbone, image size (smaller sizes yield a higher FPS but look less nice), batch size (a smaller batch size reduces the amount of VRAM needed on the GPU), whether meta-learning should be used, and the meta-learning learning rate:

python3 app.py --gen_backbone deepdaze --size 256 --batch_size 32 --mode stream --meta 1 --meta_lr 0.2

Add text using --text and set its weight with --text_weight. Setting the weight to 1.0 will ignore the webcam and only visualize the text:

python3 app.py --text "A funky human." --text_weight 0.5
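Conceptually, the text weight interpolates between matching the spoken/typed text and matching the webcam image. A minimal sketch of how such a weighting could work (combined_loss is hypothetical, not the project's actual loss function):

```python
def combined_loss(text_sim, img_sim, text_weight):
    # Interpolate between matching the text prompt and matching the webcam image;
    # a text_weight of 1.0 makes the image term vanish entirely.
    return text_weight * (1.0 - text_sim) + (1.0 - text_weight) * (1.0 - img_sim)
```

With text_weight set to 1.0, the image similarity has no influence on the loss, which is why the webcam is effectively ignored.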
