
Add MediaPipe Face Control #688

Merged
merged 15 commits into Mikubill:main on Apr 17, 2023

Conversation

josephcatrambone-crucible
Contributor

@josephcatrambone-crucible josephcatrambone-crucible commented Mar 31, 2023

We've trained a ControlNet model on Stable Diffusion 2.1 Base with MediaPipe Face constraints. We added keypoints for the pupils, which allows for better gaze control than existing alternatives.

The full changes, dataset, and process are described here: https://github.com/crucible-ai/ControlNet/blob/laion_dataset/README_laion_face.md

And the models (ckpt / safetensor) are on huggingface here: https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace
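For context, here is a minimal sketch of the underlying MediaPipe Face Mesh call (not the exact annotator code in this PR; the file names are placeholders). refine_landmarks=True is what exposes the iris landmarks used for gaze:

import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
mp_drawing = mp.solutions.drawing_utils

image = cv2.imread("face.jpg")  # placeholder input path
with mp_face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,
    refine_landmarks=True,  # adds iris landmarks on top of the 468-point mesh
    min_detection_confidence=0.5,
) as face_mesh:
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# draw the detected mesh onto the image as a rough stand-in for the annotation
if results.multi_face_landmarks:
    for landmarks in results.multi_face_landmarks:
        mp_drawing.draw_landmarks(
            image=image,
            landmark_list=landmarks,
            connections=mp_face_mesh.FACEMESH_TESSELATION,
        )
cv2.imwrite("annotation.png", image)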

One advantage over other approaches is that there are significantly fewer dependencies and less code to add. A disadvantage is that this currently only works with SD2.1. (We have an SD1.5 model brewing right now and will update when it's finished training.)

Notable concern: we added mediapipe to requirements.txt. The file seems oddly empty, and we don't want to make this a hard requirement for others, but there isn't another obvious place for it.

The readme above has some cherry-picked results, but here's an output of the UI:

[Screenshot: UI output, 2023-03-31]

@josephcatrambone-crucible josephcatrambone-crucible marked this pull request as ready for review March 31, 2023 19:32
@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Mar 31, 2023

Important reminder to set the config adapter to models/cldm_v21.yaml!

EDIT: This is no longer necessary with the updates we've made to the model.

@lllyasviel
Collaborator

We are also considering MediaPipe. Perhaps we need to take a look at whether installation of the MediaPipe dependency is robust across different platforms. It seems that MediaPipe does not require compiling, but we need to make sure of that.

@chris-crucible

We are also considering MediaPipe. Perhaps we need to take a look at whether installation of the MediaPipe dependency is robust across different platforms. It seems that MediaPipe does not require compiling, but we need to make sure of that.

Thanks for raising the point. We haven't seen issues on Windows or Linux yet, but we haven't tested extensively on multiple systems. Let us know what we can do to help!

@Datou

Datou commented Apr 3, 2023

[Screenshot]
It doesn't work :(

@ostap667inbox

ostap667inbox commented Apr 3, 2023

It doesn't work :(

How about using 768x768 resolution for SD v2.1_768?
Also, is the correct cldm_v21.yaml configuration file set in 'Settings'?

@lllyasviel
Collaborator

This webui plugin can read per-model override configs. Please consider copying cldm_v21.yaml to "controlnet_sd21_laion_face_v2_prund.yaml" and putting it in the same folder as the model, so that users do not need to change settings manually.

Also, please consider using the name "mediapipe_face", e.g. controlnet_sd21_mediapipe_face_X.
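A minimal sketch of that suggestion (the paths are assumptions about a typical webui install, not taken from this PR):

import shutil

# copy the generic SD2.1 config so it sits next to the model with the same
# basename; the extension then picks it up without any manual settings.
shutil.copy(
    "models/cldm_v21.yaml",
    "models/controlnet_sd21_laion_face_v2_prund.yaml",
)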

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 3, 2023

... It doesn't work :(

SD v2.1_768 has a hard time making 512x512 images. It may be better to try SD v2.1 BASE.

SD v2.1_768 is not good at making 512x512 images. We use https://huggingface.co/stabilityai/stable-diffusion-2-1-base .

This webui plugin can read per-model override configs. Please consider copying cldm_v21.yaml to "controlnet_sd21_laion_face_v2_prund.yaml" and putting it in the same folder as the model, so that users do not need to change settings manually.

Also, please consider using the name "mediapipe_face", e.g. controlnet_sd21_mediapipe_face_X.

Noted! I think we can make that change. Thank you.

@killporter

Why is there no annotator to use?

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 3, 2023

I've updated our model repo (https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace) to be consistent with ControlNet's naming scheme.

The model is now called "control_mediapipe_face_sd21_v2.safetensor|pt|yaml". We've also added the yaml file to the remote side.

Are there additional recommended changes on this branch?

@FurkanGozukara

Can this be used on custom models? Is there an SD 1.5 version? A model we trained ourselves? @josephcatrambone-crucible I am going to test it now.

@josephcatrambone-crucible
Contributor Author

Can this be used on custom models? Is there an SD 1.5 version? A model we trained ourselves? @josephcatrambone-crucible I am going to test it now.

We have an SD1.5 version training. It's a few hundred hours in, but it's not ready yet. When that's ready we'll push it up to HuggingFace. The SD2.1 model works pretty well. It's based on SD2.1 512-base.

Some folks have reported success using custom SD2.1-based models. I've not tried them.

@FurkanGozukara

Can this be used on custom models? Is there an SD 1.5 version? A model we trained ourselves? @josephcatrambone-crucible I am going to test it now.

We have an SD1.5 version training. It's a few hundred hours in, but it's not ready yet. When that's ready we'll push it up to HuggingFace. The SD2.1 model works pretty well. It's based on SD2.1 512-base.

Some folks have reported success using custom SD2.1-based models. I've not tried them.

Looking forward to SD 1.5. So once SD 1.5 is released, will we be able to use it on our custom-trained models?

By the way, SD 2.1 works perfectly.

@chris-crucible

Looking forward to SD 1.5. So once SD 1.5 is released, will we be able to use it on our custom-trained models?

Yep, any custom models should work as long as they match the base model we trained on. So custom 2.1 models should work with the one we already released, and custom 1.5 models should work with our 1.5 model when we release it.

@ostap667inbox

This webui plugin can read per-model override configs. Please consider copying cldm_v21.yaml to "controlnet_sd21_laion_face_v2_prund.yaml" and putting it in the same folder as the model, so that users do not need to change settings manually.

I have two questions.

  1. Did I understand correctly that it is enough for the ControlNet plugin if the folder with the models contains one yaml file per model, named the same as the model file? And that this is why it is possible to use models with different configuration files in Multi-ControlNet at the same time, e.g. t2iadapter_keypose and t2iadapter_style?

  2. Is it necessary to leave these two text fields blank in 'Settings' in this case? If these two text boxes are filled in, but the configuration files are renamed to match the model names, which has the highest priority?

[Screenshot: the two config fields in Settings]

@chris-crucible

I have two questions.

  1. Did I understand correctly that it is enough for the ControlNet plugin if the folder with the models contains one yaml file per model, named the same as the model file? And that this is why it is possible to use models with different configuration files in Multi-ControlNet at the same time, e.g. t2iadapter_keypose and t2iadapter_style?
  2. Is it necessary to leave these two text fields blank in 'Settings' in this case? If these two text boxes are filled in, but the configuration files are renamed to match the model names, which has the highest priority?

Yes, that's correct. As long as each model has a corresponding yaml with the same name, that will be used rather than the default in the settings. You can leave the defaults in the settings alone.
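A minimal sketch of that lookup rule (an assumption about the behavior described above, not the extension's actual code):

import os

def config_for_model(model_path: str, default_config: str) -> str:
    # prefer a yaml named after the model file; otherwise fall back to the
    # default configured in Settings.
    candidate = os.path.splitext(model_path)[0] + ".yaml"
    return candidate if os.path.exists(candidate) else default_config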

@josephcatrambone-crucible
Contributor Author

We just pushed the SD 1.5 version of this model to HuggingFace Hub. 🥳

Let us know if there's anything we can do to help this PR get merged.

@lllyasviel
Collaborator

I think the official ControlNet 1.1 will also come out in the coming days, hopefully next week.
That should be a good time to merge a bunch of new annotators.
Wait for us a bit; I believe there is also a pull request about Zoe depth.

@nemilya

nemilya commented Apr 7, 2023

Thanks for the amazing feature!

Just successfully tested it (based on SD1.5) on a MacBook Pro with Apple M2 (macOS Monterey 12.6.3, 8 GB). The only issue was installing mediapipe (after starting the webui):

stderr: ERROR: Could not find a version that satisfies the requirement mediapipe==0.9.1.0 (from versions: none)
ERROR: No matching distribution found for mediapipe==0.9.1.0

This was solved by manually replacing mediapipe==0.9.1.0 with mediapipe-silicon==0.9.1 in requirements.txt.

P.S.: There is also a log message after starting generation (with ControlNet enabled and the mediapipe_face preprocessor):

...
ControlNet model control_mediapipe_face_sd15_v2 [9c7784a9] loaded.
Loading preprocessor: mediapipe_face
objc[26962]: Class CaptureDelegate is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d25a0) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_videoio.3.4.16.dylib (0x2944e0860). One of the two will be used. Which one is undefined.
objc[26962]: Class CVWindow is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d25f0) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_highgui.3.4.16.dylib (0x2906fca68). One of the two will be used. Which one is undefined.
objc[26962]: Class CVView is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d2618) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_highgui.3.4.16.dylib (0x2906fca90). One of the two will be used. Which one is undefined.
objc[26962]: Class CVSlider is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d2640) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_highgui.3.4.16.dylib (0x2906fcab8). One of the two will be used. Which one is undefined.
...

but this does not cause any issues.

@josephcatrambone-crucible
Contributor Author

It looks like mediapipe has a completely separate package for Apple silicon because it was set up by a third-party maintainer. My reading is that they're looking to merge the automatic build into the normal mediapipe flow. We can either change the requirement to this:

mediapipe>=0.8.9; platform_system != "Darwin" or platform_machine != 'arm64'
mediapipe-silicon>=0.8.9; platform_system == "Darwin" and platform_machine == 'arm64'

which might break because the setup script just iterates over the lines and does a split, or we can wait for Google to update their build pipeline. Here's the tracking issue: google-ai-edge/mediapipe#3277
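For reference, a sketch of marker-aware parsing with the packaging library (an illustration of the concern above, not the webui launcher's actual code):

from packaging.requirements import Requirement

def requirement_applies(line: str) -> bool:
    # Requirement() understands PEP 508 markers such as platform_system and
    # platform_machine; evaluate() checks them against the current platform.
    req = Requirement(line)
    return req.marker is None or req.marker.evaluate()

print(requirement_applies('mediapipe>=0.8.9; platform_system != "Darwin"'))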

@enternalsaga

enternalsaga commented Apr 11, 2023

Hello @josephcatrambone-crucible, for some reason mediapipe only works with an output size of 512x768 or 768x512. I tried other sizes but always get RuntimeError: The expanded size of the tensor (2560) must match the existing size (4608) at non-singleton dimension 1. Target sizes: [2, 2560, 320]. Tensor sizes: [1, 4608, 1], or ZeroDivisionError: division by zero.
https://imgur.com/a/JIjvOow
https://imgur.com/a/5erZVko

@josephcatrambone-crucible
Contributor Author

Hello @josephcatrambone-crucible, for some reason mediapipe only works with an output size of 512x768 or 768x512. I tried other sizes but always get RuntimeError: The expanded size of the tensor (2560) must match the existing size (4608) at non-singleton dimension 1. Target sizes: [2, 2560, 320]. Tensor sizes: [1, 4608, 1], or ZeroDivisionError: division by zero. https://imgur.com/a/JIjvOow https://imgur.com/a/5erZVko

Thanks for the report. 🤔 We tried some intermediate sizes and didn't see this. Did you change the annotator resolution or the input resolution? Perhaps we should force mediapipe to use a 512x512 image.

Quick questions: are you on Apple Silicon? Are you using the SD1.5 or SD2.1-base model?

@sabbih-shah

sabbih-shah commented Apr 12, 2023

@josephcatrambone-crucible, so the size issue is due to the resize function resizing based on a single target dimension:

Line 237 in scripts/processor.py:
img = resize_image(HWC3(img), res)

Given an example input, the resize function produces the following:

Input_shape: (3865, 2576, 3)
resized_shape: (768, 512, 3)

You can replace the resize_image function with this one:

import cv2

def resize_image(image, target_width, target_height):
    # Get the dimensions of the original image
    height, width, channels = image.shape

    # Calculate the aspect ratio
    aspect_ratio = width / height

    # Calculate the new dimensions based on the aspect ratio and the desired size
    if target_width / target_height > aspect_ratio:
        new_width = int(target_height * aspect_ratio)
        new_height = target_height
    else:
        new_width = target_width
        new_height = int(target_width / aspect_ratio)

    # Resize the image while maintaining the aspect ratio
    resized_image = cv2.resize(image, (new_width, new_height))

    # Add padding or crop the image to the desired size
    if target_width / target_height > aspect_ratio:
        padding = int((target_width - new_width) / 2)
        resized_image = cv2.copyMakeBorder(resized_image, 0, 0, padding, target_width - new_width - padding, cv2.BORDER_CONSTANT)
    else:
        padding = int((target_height - new_height) / 2)
        resized_image = cv2.copyMakeBorder(resized_image, padding, target_height - new_height - padding, 0, 0, cv2.BORDER_CONSTANT)

    return resized_image

This should handle different height and width combos while maintaining the aspect ratio and not stretching the mask. For example:

image = resize_image(HWC3(image), target_width=512, target_height=512)
Input_shape: (3865, 2576, 3)
resized_shape: (512, 512, 3)
image = resize_image(HWC3(image), target_width=512, target_height=768)
Input_shape: (3865, 2576, 3)
resized_shape: (768, 512, 3)

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 12, 2023

Nice find. There are a LOT of other processors using resize_image (canny, simple_scribble, hed, mlsd, midas, leres, openpose, uniformer, pidinet, clip, and binary). I feel we should do a separate PR to resolve the scaling issue, because this will affect more than just us.

As an aside, I'm surprised to hear that resize_image only does one axis. From the implementation, it looks like it handles aspect ratio and all that:

import cv2
import numpy as np

def resize_image(input_image, resolution):
    # scale so the short side equals `resolution`, then round both axes to
    # multiples of 64; the input aspect ratio is approximately preserved.
    H, W, C = input_image.shape
    H = float(H)
    W = float(W)
    k = float(resolution) / min(H, W)
    H *= k
    W *= k
    H = int(np.round(H / 64.0)) * 64
    W = int(np.round(W / 64.0)) * 64
    img = cv2.resize(input_image, (W, H), interpolation=cv2.INTER_LANCZOS4 if k > 1 else cv2.INTER_AREA)
    return img
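A quick worked trace of that implementation for the input reported above (values computed by hand):

import numpy as np

H, W = 3865.0, 2576.0
k = 512.0 / min(H, W)                      # ~0.1988
H_out = int(np.round(H * k / 64.0)) * 64   # 768
W_out = int(np.round(W * k / 64.0)) * 64   # 512
assert (H_out, W_out) == (768, 512)
# both axes are scaled by the same k, so the aspect ratio is preserved; the
# output just follows the input's aspect ratio rather than a target WxH.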

@sabbih-shah

sabbih-shah commented Apr 12, 2023

Hmm, we could use a separate resize function for mediapipe, but I don't believe that would be good practice. A separate PR does make more sense here. I found the scaling issue while implementing a preprocessor class to use with diffusers. Hopefully this gets fixed. And thanks for the nice work on expressions.

@lllyasviel
Collaborator

Hello, we recommend following the naming standard of ControlNet 1.1. We will begin to merge models in the coming days.

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 13, 2023

Congratulations! That's a big release. We'll work on getting those model names updated. There are a few applications already using the model under its current name in the repo, but I think we can find a solution.

EDIT: Do we need to make any changes to this PR besides the name update to be compatible with the ControlNet 1.1 release?

@josephcatrambone-crucible
Contributor Author

I've renamed our models to be consistent with the ControlNet 1.1 scheme: https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace/commit/6948da26359817bd4f366a9549fb094091560623

@lllyasviel
Collaborator

Hello, for models marked as [p] (production-ready), we will test them with a few cases, e.g. a few non-cherry-picked random batches with seed 12345. [e/u] models do not need this. This may take one or two working days.

@lllyasviel
Collaborator

Hello, we have confirmed that this can be merged.
Please resolve the conflicts and we will merge as soon as possible once they are resolved.

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 17, 2023

On it!

@josephcatrambone-crucible
Contributor Author

Changes are made and sanity-checked on the local system. The SD 1.5 and SD 2.1 models are both behaving okay from what I can tell.

@lllyasviel lllyasviel merged commit b47172d into Mikubill:main Apr 17, 2023