
Add MediaPipe Face Control #688

Merged
merged 15 commits into Mikubill:main on Apr 17, 2023

Conversation

josephcatrambone-crucible
Contributor

@josephcatrambone-crucible josephcatrambone-crucible commented Mar 31, 2023

We've trained a ControlNet model on Stable Diffusion 2.1 Base with MediaPipe Face constraints. We added keypoints for the pupils, which allows for better gaze control than existing alternatives.

The full changes, dataset, and process are described here: https://github.com/crucible-ai/ControlNet/blob/laion_dataset/README_laion_face.md

And the models (ckpt / safetensor) are on huggingface here: https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace
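For context, here is a minimal sketch of the underlying MediaPipe Face Mesh call (not the exact annotator code in this PR; the file names are placeholders). refine_landmarks=True is what exposes the iris landmarks used for gaze:

import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
mp_drawing = mp.solutions.drawing_utils

image = cv2.imread("face.jpg")  # placeholder input path
with mp_face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,
    refine_landmarks=True,  # adds iris landmarks on top of the 468-point mesh
    min_detection_confidence=0.5,
) as face_mesh:
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# draw the detected mesh onto the image as a rough stand-in for the annotation
if results.multi_face_landmarks:
    for landmarks in results.multi_face_landmarks:
        mp_drawing.draw_landmarks(
            image=image,
            landmark_list=landmarks,
            connections=mp_face_mesh.FACEMESH_TESSELATION,
        )
cv2.imwrite("annotation.png", image)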

One advantage over other approaches is that there are significantly fewer dependencies and less code to add. A disadvantage is that this currently only works with SD2.1. (We have an SD1.5 model brewing right now and will update when it's finished training.)

Notable concern: we added mediapipe to requirements.txt. The file seems oddly empty, and we don't want to make this a hard requirement for others, but there isn't another obvious place for it.

The readme above has some cherry-picked results, but here's an output of the UI:

[Screenshot: UI output, 2023-03-31]

@josephcatrambone-crucible josephcatrambone-crucible marked this pull request as ready for review March 31, 2023 19:32
@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Mar 31, 2023

Important reminder to set the config adapter to models/cldm_v21.yaml!

EDIT: This is no longer necessary with the updates we've made to the model.

@lllyasviel
Collaborator

We are also considering MediaPipe. Perhaps we need to take a look at whether installation of the MediaPipe dependency is robust across different platforms. It seems that MediaPipe does not require compiling, but we need to make sure of that.

@chris-crucible

We are also considering MediaPipe. Perhaps we need to take a look at whether installation of the MediaPipe dependency is robust across different platforms. It seems that MediaPipe does not require compiling, but we need to make sure of that.

Thanks for raising the point. We haven't seen issues on Windows or Linux yet, but we haven't tested extensively on multiple systems. Let us know what we can do to help!

@Datou

Datou commented Apr 3, 2023

[Screenshot]
It doesn't work :(

@ostap667inbox

ostap667inbox commented Apr 3, 2023

It doesn't work :(

How about using 768x768 resolution for SD v2.1_768?
Also, is the correct cldm_v21.yaml configuration file set in 'Settings'?

@lllyasviel
Collaborator

This webui plugin can read per-model override configs. Please consider copying cldm_v21.yaml to "controlnet_sd21_laion_face_v2_prund.yaml" and putting it in the same folder as the model, so that users do not need to change settings manually.

Also, please consider using the name "mediapipe_face", e.g. controlnet_sd21_mediapipe_face_X.
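A minimal sketch of that suggestion (the paths are assumptions about a typical webui install, not taken from this PR):

import shutil

# copy the generic SD2.1 config so it sits next to the model with the same
# basename; the extension then picks it up without any manual settings.
shutil.copy(
    "models/cldm_v21.yaml",
    "models/controlnet_sd21_laion_face_v2_prund.yaml",
)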

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 3, 2023

... It doesn't work :(

SD v2.1_768 has a hard time making 512x512 images. It may be better to try SD v2.1 BASE.

SD v2.1_768 is not good at making 512x512 images. We use https://huggingface.co/stabilityai/stable-diffusion-2-1-base .

This webui plugin can read per-model override configs. Please consider copying cldm_v21.yaml to "controlnet_sd21_laion_face_v2_prund.yaml" and putting it in the same folder as the model, so that users do not need to change settings manually.

Also, please consider using the name "mediapipe_face", e.g. controlnet_sd21_mediapipe_face_X.

Noted! I think we can make that change. Thank you.

@killporter

Why is there no annotator to use?

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 3, 2023

I've updated our model repo (https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace) to be consistent with ControlNet's naming scheme.

The model is now called "control_mediapipe_face_sd21_v2.safetensor|pt|yaml". We've also added the yaml file to the remote side.

Are there additional recommended changes on this branch?

@FurkanGozukara

Can this be used on custom models? Is there an SD 1.5 version? A model we trained ourselves? @josephcatrambone-crucible I am going to test it now.

@josephcatrambone-crucible
Contributor Author

Can this be used on custom models? Is there an SD 1.5 version? A model we trained ourselves? @josephcatrambone-crucible I am going to test it now.

We have an SD1.5 version training. It's a few hundred hours in, but it's not ready yet. When that's ready we'll push it up to HuggingFace. The SD2.1 model works pretty well. It's based on SD2.1 512-base.

Some folks have reported success using custom SD2.1-based models. I've not tried them.

@FurkanGozukara

Can this be used on custom models? Is there an SD 1.5 version? A model we trained ourselves? @josephcatrambone-crucible I am going to test it now.

We have an SD1.5 version training. It's a few hundred hours in, but it's not ready yet. When that's ready we'll push it up to HuggingFace. The SD2.1 model works pretty well. It's based on SD2.1 512-base.

Some folks have reported success using custom SD2.1-based models. I've not tried them.

Looking forward to SD 1.5. So once SD 1.5 is released, will we be able to use it on our custom-trained models?

By the way, SD 2.1 works perfectly.

@chris-crucible

Looking forward to SD 1.5. So once SD 1.5 is released, will we be able to use it on our custom-trained models?

Yep, any custom models should work as long as they match the base model we trained on. So custom 2.1 models should work with the one we already released, and custom 1.5 models should work with our 1.5 model when we release it.

@ostap667inbox

This webui plugin can read per-model override configs. Please consider copying cldm_v21.yaml to "controlnet_sd21_laion_face_v2_prund.yaml" and putting it in the same folder as the model, so that users do not need to change settings manually.

I have two questions.

  1. Did I understand correctly that it is enough for the ControlNet plugin if the folder with the models contains one yaml file per model, named the same as the model file? And that this is why it is possible to use models with different configuration files in Multi-ControlNet at the same time, e.g. t2iadapter_keypose and t2iadapter_style?

  2. Is it necessary to leave these two text fields blank in 'Settings' in this case? If these two text boxes are filled in, but the configuration files are renamed to match the model names, which has the highest priority?

[Screenshot: the two config fields in Settings]

@chris-crucible

I have two questions.

  1. Did I understand correctly that it is enough for the ControlNet plugin if the folder with the models contains one yaml file per model, named the same as the model file? And that this is why it is possible to use models with different configuration files in Multi-ControlNet at the same time, e.g. t2iadapter_keypose and t2iadapter_style?
  2. Is it necessary to leave these two text fields blank in 'Settings' in this case? If these two text boxes are filled in, but the configuration files are renamed to match the model names, which has the highest priority?

Yes, that's correct. As long as each model has a corresponding yaml with the same name, that will be used rather than the default in the settings. You can leave the defaults in the settings alone.
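A minimal sketch of that lookup rule (an assumption about the behavior described above, not the extension's actual code):

import os

def config_for_model(model_path: str, default_config: str) -> str:
    # prefer a yaml named after the model file; otherwise fall back to the
    # default configured in Settings.
    candidate = os.path.splitext(model_path)[0] + ".yaml"
    return candidate if os.path.exists(candidate) else default_config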

@josephcatrambone-crucible
Contributor Author

We just pushed the SD 1.5 version of this model to HuggingFace Hub. 🥳

Let us know if there's anything we can do to help this PR get merged.

@lllyasviel
Collaborator

I think the official ControlNet 1.1 will also come out in the coming days, hopefully next week.
That should be a good time to merge a bunch of new annotators.
Wait for us a bit; I believe there is also a pull request about Zoe depth.

@nemilya

nemilya commented Apr 7, 2023

Thanks for the amazing feature!

Just successfully tested it (based on SD1.5) on a MacBook Pro with Apple M2 (macOS Monterey 12.6.3, 8 GB). The only issue was installing mediapipe (after starting the webui):

stderr: ERROR: Could not find a version that satisfies the requirement mediapipe==0.9.1.0 (from versions: none)
ERROR: No matching distribution found for mediapipe==0.9.1.0

This was solved by manually replacing mediapipe==0.9.1.0 with mediapipe-silicon==0.9.1 in requirements.txt.

P.S.: There is also a log message after starting generation (with ControlNet enabled and the mediapipe_face preprocessor):

...
ControlNet model control_mediapipe_face_sd15_v2 [9c7784a9] loaded.
Loading preprocessor: mediapipe_face
objc[26962]: Class CaptureDelegate is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d25a0) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_videoio.3.4.16.dylib (0x2944e0860). One of the two will be used. Which one is undefined.
objc[26962]: Class CVWindow is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d25f0) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_highgui.3.4.16.dylib (0x2906fca68). One of the two will be used. Which one is undefined.
objc[26962]: Class CVView is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d2618) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_highgui.3.4.16.dylib (0x2906fca90). One of the two will be used. Which one is undefined.
objc[26962]: Class CVSlider is implemented in both /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x2877d2640) and /Users/nemilya/stable-diffusion-webui/venv/lib/python3.10/site-packages/mediapipe/.dylibs/libopencv_highgui.3.4.16.dylib (0x2906fcab8). One of the two will be used. Which one is undefined.
...

but this does not cause any issues.

@josephcatrambone-crucible
Contributor Author

It looks like mediapipe has a completely separate package for Apple silicon because it was set up by a third-party maintainer. My reading is that they're looking to merge the automatic build into the normal mediapipe flow. We can either change the requirement to this:

mediapipe>=0.8.9; platform_system != "Darwin" or platform_machine != 'arm64'
mediapipe-silicon>=0.8.9; platform_system == "Darwin" and platform_machine == 'arm64'

which might break because the setup script just iterates over the lines and does a split, or we can wait for Google to update their build pipeline. Here's the tracking issue: google-ai-edge/mediapipe#3277
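For reference, a sketch of marker-aware parsing with the packaging library (an illustration of the concern above, not the webui launcher's actual code):

from packaging.requirements import Requirement

def requirement_applies(line: str) -> bool:
    # Requirement() understands PEP 508 markers such as platform_system and
    # platform_machine; evaluate() checks them against the current platform.
    req = Requirement(line)
    return req.marker is None or req.marker.evaluate()

print(requirement_applies('mediapipe>=0.8.9; platform_system != "Darwin"'))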

@enternalsaga

enternalsaga commented Apr 11, 2023

Hello @josephcatrambone-crucible, for some reason mediapipe only works with an output size of 512x768 or 768x512. I tried other sizes but always get RuntimeError: The expanded size of the tensor (2560) must match the existing size (4608) at non-singleton dimension 1. Target sizes: [2, 2560, 320]. Tensor sizes: [1, 4608, 1], or ZeroDivisionError: division by zero.
https://imgur.com/a/JIjvOow
https://imgur.com/a/5erZVko

@josephcatrambone-crucible
Contributor Author

Hello @josephcatrambone-crucible, for some reason mediapipe only works with an output size of 512x768 or 768x512. I tried other sizes but always get RuntimeError: The expanded size of the tensor (2560) must match the existing size (4608) at non-singleton dimension 1. Target sizes: [2, 2560, 320]. Tensor sizes: [1, 4608, 1], or ZeroDivisionError: division by zero. https://imgur.com/a/JIjvOow https://imgur.com/a/5erZVko

Thanks for the report. 🤔 We tried some intermediate sizes and didn't see this. Did you change the annotator resolution or the input resolution? Perhaps we should force mediapipe to use a 512x512 image.

Quick questions: are you on Apple Silicon? Are you using the SD1.5 or SD2.1-base model?

@sabbih-shah

sabbih-shah commented Apr 12, 2023

@josephcatrambone-crucible, so the size issue is due to the resize function resizing based on a single target dimension:

Line 237 in scripts/processor.py:
img = resize_image(HWC3(img), res)

Given an example input, the resize function produces the following:

Input_shape: (3865, 2576, 3)
resized_shape: (768, 512, 3)

You can replace the resize_image function with this one:

import cv2

def resize_image(image, target_width, target_height):
    # Get the dimensions of the original image
    height, width, channels = image.shape

    # Calculate the aspect ratio
    aspect_ratio = width / height

    # Calculate the new dimensions based on the aspect ratio and the desired size
    if target_width / target_height > aspect_ratio:
        new_width = int(target_height * aspect_ratio)
        new_height = target_height
    else:
        new_width = target_width
        new_height = int(target_width / aspect_ratio)

    # Resize the image while maintaining the aspect ratio
    resized_image = cv2.resize(image, (new_width, new_height))

    # Add padding or crop the image to the desired size
    if target_width / target_height > aspect_ratio:
        padding = int((target_width - new_width) / 2)
        resized_image = cv2.copyMakeBorder(resized_image, 0, 0, padding, target_width - new_width - padding, cv2.BORDER_CONSTANT)
    else:
        padding = int((target_height - new_height) / 2)
        resized_image = cv2.copyMakeBorder(resized_image, padding, target_height - new_height - padding, 0, 0, cv2.BORDER_CONSTANT)

    return resized_image

This should handle different height and width combos while maintaining the aspect ratio and not stretching the mask. For example:

image = resize_image(HWC3(image), target_width=512, target_height=512)
Input_shape: (3865, 2576, 3)
resized_shape: (512, 512, 3)
image = resize_image(HWC3(image), target_width=512, target_height=768)
Input_shape: (3865, 2576, 3)
resized_shape: (768, 512, 3)

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 12, 2023

Nice find. There are a LOT of other processors using resize_image (canny, simple_scribble, hed, mlsd, midas, leres, openpose, uniformer, pidinet, clip, and binary). I feel we should do a separate PR to resolve the scaling issue, because this will affect more than just us.

As an aside, I'm surprised to hear that resize_image only does one axis. From the implementation, it looks like it handles aspect ratio and all that:

import cv2
import numpy as np

def resize_image(input_image, resolution):
    # scale so the short side equals `resolution`, then round both axes to
    # multiples of 64; the input aspect ratio is approximately preserved.
    H, W, C = input_image.shape
    H = float(H)
    W = float(W)
    k = float(resolution) / min(H, W)
    H *= k
    W *= k
    H = int(np.round(H / 64.0)) * 64
    W = int(np.round(W / 64.0)) * 64
    img = cv2.resize(input_image, (W, H), interpolation=cv2.INTER_LANCZOS4 if k > 1 else cv2.INTER_AREA)
    return img
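A quick worked trace of that implementation for the input reported above (values computed by hand):

import numpy as np

H, W = 3865.0, 2576.0
k = 512.0 / min(H, W)                      # ~0.1988
H_out = int(np.round(H * k / 64.0)) * 64   # 768
W_out = int(np.round(W * k / 64.0)) * 64   # 512
assert (H_out, W_out) == (768, 512)
# both axes are scaled by the same k, so the aspect ratio is preserved; the
# output just follows the input's aspect ratio rather than a target WxH.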

@sabbih-shah

sabbih-shah commented Apr 12, 2023

Hmm, we could use a separate resize function for mediapipe, but I don't believe that would be good practice. A separate PR does make more sense here. I found the scaling issue while implementing a preprocessor class to use with diffusers. Hopefully this gets fixed. And thanks for the nice work on expressions.

@lllyasviel
Collaborator

Hello, we recommend following the naming standard of ControlNet 1.1. We will begin to merge models in the coming days.

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 13, 2023

Congratulations! That's a big release. We'll work on getting those model names updated. There are a few applications already using the model under its current name in the repo, but I think we can find a solution.

EDIT: Do we need to make any changes to this PR besides the name update to be compatible with the ControlNet 1.1 release?

@josephcatrambone-crucible
Contributor Author

I've renamed our models to be consistent with the ControlNet 1.1 scheme: https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace/commit/6948da26359817bd4f366a9549fb094091560623

@lllyasviel
Collaborator

Hello, for models marked as [p] (production-ready), we will test them with a few cases, e.g. a few non-cherry-picked random batches with seed 12345. [e/u] models do not need this. This may take one or two working days.

@lllyasviel
Collaborator

Hello, we have confirmed that this can be merged.
Please resolve the conflicts and we will merge as soon as possible once they are resolved.

@josephcatrambone-crucible
Contributor Author

josephcatrambone-crucible commented Apr 17, 2023

On it!

@josephcatrambone-crucible
Contributor Author

Changes are made and sanity-checked on the local system. The SD 1.5 and SD 2.1 models are both behaving okay from what I can tell.

@lllyasviel lllyasviel merged commit b47172d into Mikubill:main Apr 17, 2023