Index

Notes by Joe Penna
Setup
Captions
Textual Inversion vs. Dreambooth
Using the Generated Model
Debugging Your Results
Hugging Face Diffusers

The Repo Formerly Known As "Dreambooth"

Notes by Joe Penna

INTRODUCTIONS!

Hi! My name is Joe Penna.

You might have seen a few YouTube videos of mine under MysteryGuitarMan. I'm now a feature film director. You might have seen ARCTIC or STOWAWAY.

For my movies, I need to be able to train specific actors, props, locations, etc. So, I did a bunch of changes to @XavierXiao's repo in order to train people's faces.

I can't release all the tests for the movie I'm working on, but when I test with my own face, I release those on my Twitter page - @MysteryGuitarM.

Lots of these tests were done with a buddy of mine -- Niko from CorridorDigital. It might be how you found this repo!

I'm not really a coder. I'm just stubborn, and I'm not afraid of googling. So, eventually, some really smart folks joined in and have been contributing. In this repo, specifically: @djbielejeski @gammagec @MrSaad –– but so many others in our Discord!

This is no longer my repo. This is the people-who-wanna-see-Dreambooth-on-SD-working-well's repo!

Now, if you wanna try to do this... please read the warnings below first:

WARNING!

Let's respect the hard work and creativity of people who have spent years honing their skills.
- This iteration of Dreambooth was specifically designed for digital artists to train their own characters and styles into a Stable Diffusion model, as well as for people to train their own likenesses. My main goal is to make a tool for filmmakers to interact with concept artists that they've hired -- to generate the seed of an initial idea, so that they can then communicate visually. Meant to be used by filmmakers, concept artists, comic book designers, etc.
- One day, there'll be a Stable Diffussion trained on perfect datasets. In the meantime, for moral / ethical / potentially legal reasons, I strongly discourage training someone else's art into these model (unless you've obtained explicit permission, or they've made a public statement about this technology). For similar reasons, I recommend against using artists' names in your prompts. Don't put the people who made this possible out of the job!
Onto the technical side:
- You can now run this on a GPU with 24GB of VRAM (e.g. 3090). Training will be slower, and you'll need to be sure this is the only program running.
- If, like myself, you don't happen to own one of those, I'm including a Jupyter notebook here to help you run it on a rented cloud computing platform.
- It's currently tailored to runpod.io and vast.ai
- We do support a colab notebook as well:
This implementation does not fully implement Google's ideas on how to preserve the latent space.
- Most images that are similar to what you're training will be shifted towards that.
- e.g. If you're training a person, all people will look like you. If you're training an object, anything in that class will look like your object.
There doesn't seem to be an easy way to train two subjects consecutively. You will end up with an 11-12GB file before pruning.
- The provided notebook has a pruner that crunches it down to ~2gb
Best practice is to change the token to a celebrity name (note: token, not class -- so your prompt would be something like: Chris Evans person). Here's my wife trained with the exact same settings, except for the token

Setup

Easy RunPod Instructions

Note (1/7/23) Runpod recently upgraded their base Docker image which breaks this repo by default. None of the Youtube videos are up to date, yet. Follow along the typical Runpod Youtube videos/tutorials, with the following changes:

From within the My Pods page,

Click the menu button (to the left of the purple play button)
Click Edit Pod
Update "Docker Image Name" to say runpod/pytorch. It shouldn't have any numbers or letters after it.
Click Save.
Restart your pod

Carry on with the rest of the guide:

Sign up for RunPod. Feel free to use my referral link here, so that I don't have to pay for it (but you do).
After logging in, select either SECURE CLOUD or COMMUNITY CLOUD
Make sure you find a "High" interent speed so you're not wasting time and money on slow downloads
Select something with at least 24gb VRAM like RTX 3090 or RTX A5000
Follow these video instructions below:

Vast.AI Instructions

Sign up for Vast.AI (Referral Links by David Bielejeski)
Add some funds (I typically add them in $10 increments)
Navigate to the Client - Create page
- Select pytorch/pytorch as your docker image, and the buttons "Use Jupyter Lab Interface" and "Jupyter direct HTTPS"
You will want to increase your disk space, and filter on GPU RAM (2GB checkpoint files + 2-8GB model file + regularization images + other stuff adds up fast)
- I typically allocate 150GB
- Also good to check the Upload/Download speed for enough bandwidth so you don't spend all your money waiting for things to download.
Select the instance you want, and click Rent, then head over to your Instances page and click Open
- You will get an unsafe certificate warning. Click past the warning or install the Vast cert.
Click Notebook -> Python 3 (You can do this next step a number of ways, but I typically do this)
Clone Joe's repo with this command
- !git clone https://github.com/JoePenna/Dreambooth-Stable-Diffusion.git
- Click run
Navigate into the new Dreambooth-Stable-Diffusion directory on the left and open either the dreambooth_simple_joepenna.ipynb or dreambooth_runpod_joepenna.ipynb file
Follow the instructions in the workbook and start training

Running Locally Instructions

Setup - Virtual Environment

Pre-Requisites

Git
Python 3.10
Open cmd
Clone the repository
1. C:\>git clone https://github.com/JoePenna/Dreambooth-Stable-Diffusion
Navigate into the repository
1. C:\>cd Dreambooth-Stable-Diffusion

Install Dependencies and Activate Environment

cmd> python -m venv dreambooth_joepenna
cmd> dreambooth_joepenna\Scripts\activate.bat
cmd> pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
cmd> pip install -r requirements.txt

Run

cmd> python "main.py" --project_name "ProjectName" --training_model "C:\v1-5-pruned-emaonly-pruned.ckpt" --regularization_images "C:\regularization_images" --training_images "C:\training_images" --max_training_steps 2000 --class_word "person" --token "zwx" --flip_p 0 --learning_rate 1.0e-06 --save_every_x_steps 250

Cleanup

cmd> deactivate

Setup - Conda

Pre-Requisites

Git
Python 3.10
miniconda3
Open Anaconda Prompt (miniconda3)
Clone the repository
1. (base) C:\>git clone https://github.com/JoePenna/Dreambooth-Stable-Diffusion
Navigate into the repository
1. (base) C:\>cd Dreambooth-Stable-Diffusion

Install Dependencies and Activate Environment

(base) C:\Dreambooth-Stable-Diffusion> conda env create -f environment.yaml
(base) C:\Dreambooth-Stable-Diffusion> conda activate dreambooth_joepenna

Run

cmd> python "main.py" --project_name "ProjectName" --training_model "C:\v1-5-pruned-emaonly-pruned.ckpt" --regularization_images "C:\regularization_images" --training_images "C:\training_images" --max_training_steps 2000 --class_word "person" --token "zwx" --flip_p 0 --learning_rate 1.0e-06 --save_every_x_steps 250

Cleanup

cmd> conda deactivate

Captions

Captions are supported. Here is the guide on how we implemented them.

Let's say that your token is effy and your class is person, your data root is /train then:

training_images/img-001.jpg is captioned with effy person

You can customize the captioning by adding it after a @ symbol in the filename.

/training_images/img-001@a photo of effy => a photo of effy

You can use two tokens in your captions S - uppercase S - and C - uppercase C - to indicate subject and class.

/training_images/img-001@S being a good C.jpg => effy being a good person

To create a new subject you just need to create a folder for it. So:

/training_images/bingo/img-001.jpg => bingo person

The class stays the same, but now the subject has changed.

Again - the token S is now bingo:

/training_images/bingo/img-001@S is being silly.jpg => bingo is being silly

One folder deeper and you can change the class: /training_images/bingo/dog/img-001@S being a good C.jpg => bingo being a good dog

No comes the kicker: one level deeper and you can caption group of images: /training_images/effy/person/a picture of/img-001.jpg => a picture of effy person

Textual Inversion vs. Dreambooth

The majority of the code in this repo was written by Rinon Gal et. al, the authors of the Textual Inversion research paper. Though a few ideas about regularization images and prior loss preservation (ideas from "Dreambooth") were added in, out of respect to both the MIT team and the Google researchers, I'm renaming this fork to: "The Repo Formerly Known As "Dreambooth"".

For an alternate implementation , please see "Alternate Option" below.

Using the generated model

The ground truth (real picture, caution: very beautiful woman)

Same prompt for all of these images below:

`sks person`	`woman person`	`Natalie Portman person`	`Kate Mara person`

Debugging your results

❗❗ THE NUMBER ONE MISTAKE PEOPLE MAKE ❗❗

Prompting with just your token. ie "joepenna" instead of "joepenna person"

If you trained with joepenna under the class person, the model should only know your face as:

joepenna person

Example Prompts:

🚫 Incorrect (missing person following joepenna)

portrait photograph of joepenna 35mm film vintage glass

✅ This is right (person is included after joepenna)

portrait photograph of joepenna person 35mm film vintage glass

You might sometimes get someone who kinda looks like you with joepenna (especially if you trained for too many steps), but that's only because this current iteration of Dreambooth overtrains that token so much that it bleeds into that token.

☢ Be careful with the types of images you train

While training, Stable doesn't know that you're a person. It's just going to mimic what it sees.

So, if these are your training images look like this:

You're only going to get generations of you outside next to a spiky tree, wearing a white-and-gray shirt, in the style of... well, selfie photograph.

Instead, this training set is much better:

The only thing that is consistent between images is the subject. So, Stable will look through the images and learn only your face, which will make "editing" it into other styles possible.

Oh no! You're not getting good generations!

OPTION 1: They're not looking like you at all! (Train longer, or get better training images)

Are you sure you're prompting it right?

It should be <token> <class>, not just <token>. For example:

JoePenna person, portrait photograph, 85mm medium format photo

If it still doesn't look like you, you didn't train long enough.

OPTION 2: They're looking like you, but are all looking like your training images. (Train for less steps, get better training images, fix with prompting)

Okay, a few reasons why: you might have trained too long... or your images were too similar... or you didn't train with enough images.

No problem. We can fix that with the prompt. Stable Diffusion puts a LOT of merit to whatever you type first. So save it for later:

an exquisite portrait photograph, 85mm medium format photo of JoePenna person with a classic haircut

OPTION 3: They're looking like you, but not when you try different styles. (Train longer, get better training images)

You didn't train long enough...

No problem. We can fix that with the prompt:

JoePenna person in a portrait photograph, JoePenna person in a 85mm medium format photo of JoePenna person

More tips and help here: Stable Diffusion Dreambooth Discord

Hugging Face Diffusers - Alternate Option

Dreambooth is now supported in HuggingFace Diffusers for training with Stable Diffusion.

Try it out here:

Name		Name	Last commit message	Last commit date
Latest commit History 280 Commits
JupyterNotebookHelpers		JupyterNotebookHelpers
assets		assets
configs		configs
data		data
dreambooth_helpers		dreambooth_helpers
evaluation		evaluation
ldm		ldm
models		models
readme-images		readme-images
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
dreambooth_google_colab_joepenna.ipynb		dreambooth_google_colab_joepenna.ipynb
dreambooth_joepenna.ipynb		dreambooth_joepenna.ipynb
dreambooth_runpod_vastai_joepenna_obsolete.ipynb		dreambooth_runpod_vastai_joepenna_obsolete.ipynb
environment.yaml		environment.yaml
gdrive		gdrive
main.py		main.py
merge_embeddings.py		merge_embeddings.py
prune_ckpt.py		prune_ckpt.py
requirements.txt		requirements.txt
setup.py		setup.py

License

djbielejeski/Dreambooth-Stable-Diffusion

Folders and files

Latest commit

History

Repository files navigation

Index

The Repo Formerly Known As "Dreambooth"

Notes by Joe Penna

INTRODUCTIONS!

WARNING!

Setup

Easy RunPod Instructions

Carry on with the rest of the guide:

Vast.AI Instructions

Running Locally Instructions

Setup - Virtual Environment

Pre-Requisites

Install Dependencies and Activate Environment

Run

Cleanup

Setup - Conda

Pre-Requisites

Install Dependencies and Activate Environment

Run

Cleanup

Captions

Textual Inversion vs. Dreambooth

Using the generated model

Debugging your results

❗❗ THE NUMBER ONE MISTAKE PEOPLE MAKE ❗❗

☢ Be careful with the types of images you train

Oh no! You're not getting good generations!

OPTION 1: They're not looking like you at all! (Train longer, or get better training images)

OPTION 2: They're looking like you, but are all looking like your training images. (Train for less steps, get better training images, fix with prompting)

OPTION 3: They're looking like you, but not when you try different styles. (Train longer, get better training images)

More tips and help here: Stable Diffusion Dreambooth Discord

Hugging Face Diffusers - Alternate Option

About

Resources

License

Stars

Watchers

Forks

Languages