Matamata (an acronym for "Matamata attempts to animate mouths, at times accurately") is a tool to automatically create lip-synced animations.
Windows Tutorial: https://youtu.be/mLtNEUG9Hc4
Mac:
- Install FFmpeg to your PATH. The easiest way to do this is to install Homebrew:
```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install ffmpeg
```
- Download the Gentle executable
- Download the latest Mac (Darwin) binary from the releases page
Windows:
- Install Docker Desktop
- Install Gentle:
```
docker pull lowerquality/gentle
```
- Install FFmpeg:
```
winget install ffmpeg
```
- Download the latest Windows binary from the releases page
Linux:
- Install Docker:
```
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
- Install Gentle:
```
docker pull lowerquality/gentle
```
- Install FFmpeg:
```
sudo apt update
sudo apt install ffmpeg
```
- Download the latest Linux binary from the releases page
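Whichever platform you're on, it's worth sanity-checking the prerequisites before moving on. These commands only print version and image info (the Docker check applies to Windows and Linux, where Gentle runs in a container):
```
ffmpeg -version
docker image ls lowerquality/gentle
```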
See `/defaults/SampleCharacter` for an example of how to create your character folder.
You can automatically generate your `character.json` file using https://matamata-animator.github.io/Character-Creator/.
```
{
"schema": 5,
"poses": {
"defaultMouthScale": 2.0,
"defaultPose": "purple",
"purple": {
"image": "purple.png",
"x": 640,
"y": 400,
"facingLeft": false,
"mouthScale": 1.0
},
"green": {
"image": "green.png",
"x": 640,
"y": 400,
"facingLeft": false,
"mouthScale": 1.0
},
"blue": {
"image": "blue.png",
"x": 640,
"y": 400,
"facingLeft": false,
"mouthScale": 1.0
}
},
"eyes": {
"scale": 0.8,
"x": 640,
"y": 300,
"images": {
"angry": "angry.png",
"normal": "normal.png",
"sad": "sad.png"
}
}
}
```
`poses` contains the poses the character can take. `poses.defaultMouthScale` says how much the mouth should be scaled up or down; `poses.POSE.mouthScale` is the same thing for a specific pose. The two values are multiplied together, so with `defaultMouthScale: 2.0` and `mouthScale: 1.0`, the mouth is drawn at 2x its original size. `x` and `y` are the coordinates on the pose image where the mouth should be placed. `facingLeft` indicates whether to mirror the mouth image.

`eyes` specifies a "placeable part". The sample character's pose images don't have eyes, as these are supplied by placeable parts. Although this example has placeable eyes, you can have placeable pins, objects in the background, or even hats. `scale` specifies how much the placeable part image should be scaled up or down. `x` and `y` specify the location on the pose where the part should be placed. `images` contains key-value pairs where the key is the name of the part and the value is the image file name; this section shows angry, normal, and sad eye selections.
All pose images are in `CharacterFolder/poses`. All placeable part images are in `CharacterFolder/PART_NAME`.
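For reference, a character folder matching the example `character.json` above might be laid out like this. The image names mirror the JSON; the folder name and the assumption that `character.json` sits at the folder root are illustrative:
```
MyCharacter/
├── character.json
├── poses/
│   ├── purple.png
│   ├── green.png
│   └── blue.png
└── eyes/
    ├── angry.png
    ├── normal.png
    └── sad.png
```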
The timestamps file is a list of pose changes, each paired with how many milliseconds into the animation the change should occur. For instance, if you wanted to swap to the `happy` pose after 3.5 seconds, the timestamps file would look like:
```
3500 happy
```
Additionally, you can change a placeable part by adding its type afterwards:
```
0 angry eyes
```
You can also remove a placeable part by using the name `None`:
```
5000 None eyes
```
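Putting these together, a timestamps file that shows angry eyes immediately, switches to the `green` pose at 3.5 seconds, and removes the eyes at 5 seconds would look like (pose and part names are the ones from the sample `character.json` above):
```
0 angry eyes
3500 green
5000 None eyes
```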
Flag | Description | Default |
---|---|---|
-a | The path to the audio file being animated (REQUIRED) | |
-k | OpenAI API key (REQUIRED IF whisper_url IS NOT SET) | |
-c | Path to the character folder | The sample character |
-o | Path of the output file | output.mov |
-t | Path to the timestamps file | "defaults/characters.json" |
-v | How verbose to be (0 to 3) | 1 |
-transcriber | Which transcriber to use (Whisper/LocalAI/Vosk). Vosk doesn't require any setup; however, it is the slowest. | Whisper |
-whisper_url | URL for the transcription API. By default, this points to OpenAI's Whisper; you can also point it at a LocalAI instance. | https://api.openai.com/v1/ |
-vosk_url | URL if using the Vosk transcriber | https://matamata.org/web-vosk-transcriber/ |
-aligner_url | URL for the Gentle aligner server | http://localhost:8765/transcriptions?async=false |
-phonemes | Path to a custom phonemes JSON | The sample phonemes |
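As a fuller example, a run that supplies a custom character, a timestamps file, and an explicit output path might look like this (all file paths here are placeholders):
```
./matamata -a audio.wav -k YOUR_OPENAI_API_KEY -c MyCharacter -o demo.mov -t timestamps.txt
```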
After running the program once, a `defaultArguments.json` file is created in your cache directory (the path to this file is printed when the program starts). I recommend storing your API key there.
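The exact contents of `defaultArguments.json` depend on your setup. As a rough sketch only, with the field name assumed to mirror the `-k` flag (check the generated file for the real structure), storing the key might look like:
```
{
  "k": "YOUR_OPENAI_API_KEY"
}
```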
- Launch Gentle
  - Mac: Launch the prebuilt app
  - Linux: `sudo docker run --name gentle -p 8765:8765 lowerquality/gentle`
  - Windows: `docker run --name gentle -p 8765:8765 lowerquality/gentle`
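Because `--name gentle` gives the container a fixed name, running the same `docker run` command a second time will fail with a name conflict. On later launches, restart the existing container instead (prefix with `sudo` on Linux):
```
docker start -a gentle
```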
- Run Matamata:
```
./matamata -a audio.wav [optional arguments]
```
Do you use this project and want to see a new feature added? Open an issue with the `feature request` tag and describe what you want.
Want to try your hand at writing code? Create a fork, upload your code, and make a pull request. Anything from fixing formatting/typos to entirely new features is welcome!
Don't know what to work on? Take a look at the issues page to see what improvements people want. Anything marked `good first issue` should be great for newcomers!