[Feature Request]: Please add my extension to the Extensions Index :) #5037

Closed

Extraltodeus opened this issue Nov 24, 2022 · 37 comments

@Extraltodeus
Contributor

Extraltodeus commented Nov 24, 2022

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

I made this extension and would like to have it added to the index so people can install it automatically :)

It is a depth-aware extension that helps create multiple complex subjects in a single image.

It generates a background, then multiple foreground subjects, cuts out their backgrounds after a depth analysis, pastes them onto the background, and finally does an img2img pass for a clean finish (rough sketch at the end of this post).

You can use it to make more complex images or simply full-blown waifu harems. Your choice.
Example:
Distracted boyfriend meme:

image

KITTENS deprived of oxygen:

image

Multiple characters with various colors or features:

image

It does not add a new tab, so I guess just the "script" tag is necessary.
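
For the curious, here is roughly the flow in plain Python/PIL terms. This is only a sketch of the compositing step with made-up inputs, not the extension's actual code:

from PIL import Image

def compose_scene(background, subjects):
    # `background` is an RGB PIL image; each entry of `subjects` is an RGBA PIL
    # image whose background was already made transparent from its depth map
    # (see the cut_depth_mask snippet further down in this thread).
    canvas = background.convert("RGB")
    step = canvas.width // (len(subjects) + 1)
    for i, subject in enumerate(subjects, start=1):
        x = i * step - subject.width // 2       # spread subjects along the width, centered on their slot
        y = canvas.height - subject.height      # sit them on the bottom edge
        canvas.paste(subject, (x, y), subject)  # the alpha channel acts as the paste mask
    return canvas                               # a final img2img pass then cleans up the seams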

@bbecausereasonss

SUPER useful!

@rlayne

rlayne commented Nov 24, 2022

It would be great to decouple the background/foreground separator and extractor so that it can be run as a standalone thing, if that's not already possible.

@Echolink50

Wow. Very cool. All this time I have been keying out blue/green backgrounds. I do wonder, though, is this the same as the new depth2img in 2.0?

@Extraltodeus
Contributor Author

Extraltodeus commented Nov 24, 2022

No, but it could be implemented. The depth-to-image feature seems to use a depth map as a mask for img2img. Replacing the depth with transparency probably does the trick.

What I do here is cut the background out of the image using the depth mask. It could be seen as some sort of outpainting, I guess.

@ClashSAN
Collaborator

ClashSAN commented Nov 24, 2022

Wow! What a feature! By cropping and pasting the depth map, it could move a single subject slightly to the right, am I correct? It has the potential to fully edit, resize, invert, shift, etc.?

Future tech: I could imagine that, when paired with a language model, you could say:
- move the subject 20 px right
- make the subject smaller
- a little bigger
- flip the subject
- all done

@Extraltodeus
Contributor Author

Thanks! Yes indeed! So far my script divides the different foregrounds along the width of the background, using the center as a reference.

Then yeah, for sure I could do simpler operations on these foreground subjects. I was also thinking about creating a mask generator that uses the depth maps as img2img masks, which would give us the same depth-aware img2img feature as 2.0.

It should actually be pretty fucking easy as long as the mask feature takes the transparency into account (I haven't used it much, though).

@Extraltodeus
Contributor Author

-move the subject 20 px right -make the subject smaller -a little bigger -flip the subject -all done

I would rather use sliders for that and a few Pillow functions :D
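
Something in that spirit, with the values coming from sliders (just a Pillow sketch, not what the script actually does):

from PIL import Image, ImageOps

def adjust_subject(subject, scale=1.0, flip=False):
    # `subject` is an RGBA foreground cutout; returns the adjusted cutout
    if flip:
        subject = ImageOps.mirror(subject)  # flip the subject horizontally
    if scale != 1.0:
        new_size = (max(1, int(subject.width * scale)), max(1, int(subject.height * scale)))
        subject = subject.resize(new_size, Image.LANCZOS)  # smaller / bigger
    return subject

# "move the subject 20 px right" would just be an offset on the paste position:
# background.paste(subject, (x + 20, y), subject)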

@ClashSAN
Collaborator

Well, take your time, the features you have now are already awesome!

@SuperFurias

The last image seems more like "actual paradise obtainable"

@Extraltodeus
Contributor Author

Hey, can someone tell me where the mask image hides in p during img2img? Somehow p.mask gives me NoneType if I do a print(type(p.mask)), even while having a mask set.

@Extraltodeus
Contributor Author

00087-2723791100-DPM++ 2M-42-12 5-ac07d41f-20221124201126

index

I can already make a transparent mask related to the depth but can't find how to assign it before processing.

@Extraltodeus
Contributor Author

Ok, found it: p.image_mask.
image

@Extraltodeus
Contributor Author

So basically, if you run this using the "custom code" custom script, you can already get depth-aware img2img inpainting:

from modules.shared import opts, cmd_opts, state
from modules.processing import Processed, StableDiffusionProcessingImg2Img, process_images, images
from PIL import Image
import importlib.util
import copy


def run(p):
    def module_from_file(module_name, file_path):
        # load a python file as a module without it being on the import path
        spec = importlib.util.spec_from_file_location(module_name, file_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    def cut_depth_mask(img):
        # invert the depth map's brightness into the alpha channel:
        # bright (near) pixels end up transparent, dark (far) pixels stay opaque
        img = copy.deepcopy(img.convert("RGBA"))
        mask_img = copy.deepcopy(img.convert("L"))
        mask_datas = mask_img.getdata()
        datas = img.getdata()
        newData = []
        for i in range(len(mask_datas)):
            newData.append((datas[i][0],datas[i][1],datas[i][2],255-mask_datas[i]))
        img.putdata(newData)
        return img
    initial_CLIP = opts.data["CLIP_stop_at_last_layers"]
    # reuse the depth map generator shipped with the multi-subject-render extension
    sdmg = module_from_file("simple_depthmap",'extensions/multi-subject-render/scripts/simple_depthmap.py')
    sdmg = sdmg.SimpleDepthMapGenerator() #import midas
    d_m = sdmg.calculate_depth_map_for_waifus(p.init_images[0])
    d_m = cut_depth_mask(d_m)
    p.image_mask = d_m  # assign the depth-based mask before processing
    proc = process_images(p)
    return proc
run(p)

You need my extension to be installed beforehand because this refers to the simple_depthmap script; it's just using the same function as my extension.

@Extraltodeus
Contributor Author

Note: the depth precision depends on which MiDaS model is used. Right now I'm using the small and fast one, which is mostly good at guessing the foreground/background. The same level of precision as SD 2.0 can be reached with a few more tweaks.

@AugmentedRealityCat

I've played with 4 different MiDaS models and so far the best was almost always DPT-Hybrid. Try it if you haven't already. It works very well with the depthmap script over here https://github.com/thygate/stable-diffusion-webui-depthmap-script

@Extraltodeus
Contributor Author

I took the code for the depth analysis from there after asking thygate :)

I only implemented the small model because it suited the needs of my idea, but that is indeed the next part of the plan.

I don't remember how that model compares to the big one.

@AugmentedRealityCat

Even though DPT-Hybrid gave me the most usable results 90% of the time, there was still 1 case out of 10 that was better served by one of the 3 alternatives, so it's clearly better to give the user the ability to select any of those 4 models.
I wonder if there are others?

Another project you should look at if you are interested in this depthmap generation procedure is this one, which takes it a step further and extracts a 3D model from the depthmap (with color info recorded as vertex colors) and then creates simple videos to demonstrate the 3D effect.

https://github.com/donlinglok/3d-photo-inpainting/commits/master

I helped donlinglok debug the branch he adapted to make this work locally on Windows, and it now works perfectly. I used to have to use a Colab to get these kinds of results, and I could barely believe my eyes when I finally got it to work on my machine (8 GB of VRAM).

@Extraltodeus
Contributor Author

Extraltodeus commented Nov 24, 2022

I'm interested in testing a Colab with that!

I initially got inspired by this repository, which I had in my bookmarks for something like 2 years and only tested a few months ago when Stable Diffusion came out! My first idea was to try to integrate that into SD or find some similar way to use MiDaS! I will definitely take a look at it!

So far I've made pixels move based on their depth, which is way below the level of anything that I've seen yet.

so it's clearly better to give the user the ability to select any of those 4 models

Indeed. I can't add such a feature without giving that option.

Edit: oh lol, that's the same repository!

@Echolink50

While the experts are in here, is it possible to get an alpha channel in the PNG with any of this?

@AugmentedRealityCat

What do you want to do exactly? In most cases, it's better to have the alpha channel in a separate image.

@Echolink50

What do you want to do exactly? In most cases, it's better to have the alpha channel in a separate image.
I just mean getting the subject of the image without the background, for compositing later.

@AugmentedRealityCat

Can't you do that with the extension proposed in this thread? (I haven't tested it yet, so I may be wrong, but it already does so much more than that - it's nothing less than an image compositing extension!)

One way I've been doing it manually is by using a depthmap and then turning it into a mask by adjusting the levels to make the area I want to keep white and the rest black. But you need to do that in another application - something like Photoshop, GIMP, or Krita.

Since a depthmap starts with white for objects very close to the camera and gradually darkens to black as objects are positioned further and further away, you can key out the background by making everything darker than a given shade pure black, and all pixels brighter than that threshold pure white. This is your depth-based alpha channel. You can then import it as a mask for inpainting, or use it to composite images together in some other app.
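
If you wanted to script that instead of doing it in an image editor, the thresholding part would look roughly like this with PIL (a sketch of the manual process I just described, not something taken from the extension):

from PIL import Image

def depth_to_mask(depthmap, threshold=128):
    # pixels brighter than `threshold` (close to the camera) become white (kept),
    # everything darker (further away) becomes black (keyed out)
    depth = depthmap.convert("L")
    return depth.point(lambda v: 255 if v > threshold else 0)

# usage: mask = depth_to_mask(Image.open("depthmap.png"), threshold=140)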

@Echolink50

Thanks. I was using it for video. Maybe it can be done in those ways, but just having an image sequence with alpha seems a lot easier.

@Extraltodeus
Contributor Author

I'm having a bit of an issue with the hybrid model so far.

I made this so I can test it fully from the custom code feature without needing to reload.

So with this:

from modules.shared import opts, cmd_opts, state
from modules.processing import Processed, StableDiffusionProcessingImg2Img, process_images, images, shared
from PIL import Image
import importlib.util
import copy
import torch
import cv2
import requests
import os.path
import contextlib
from torchvision.transforms import Compose
from repositories.midas.midas.dpt_depth import DPTDepthModel
from repositories.midas.midas.midas_net import MidasNet
from repositories.midas.midas.midas_net_custom import MidasNet_small
from repositories.midas.midas.transforms import Resize, NormalizeImage, PrepareForNet
import numpy as np

def calculate_depth_maps(image,img_x,img_y):
      try:
          def download_file(filename, url):
              print("Downloading midas model weights to %s" % filename)
              with open(filename, 'wb') as fout:
                  response = requests.get(url, stream=True)
                  response.raise_for_status()
                  # Write response data to file
                  for block in response.iter_content(4096):
                      fout.write(block)

          # model path and name
          model_dir = "./models/midas"
          # create path to model if not present
          os.makedirs(model_dir, exist_ok=True)
          print("Loading midas model weights ..")
          model_path = f"{model_dir}/dpt_hybrid-midas-501f0c75.pt"
          print(model_path)
          if not os.path.exists(model_path):
              download_file(model_path,"https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid-midas-501f0c75.pt")
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          print("device: %s" % device)
          model = DPTDepthModel(
              path=model_path,
              backbone="vitb_rn50_384",
              non_negative=True,
          )
          net_w, net_h = 384, 384
          resize_mode="minimal"
          normalization = NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
          # init transform
          transform = Compose(
              [
                  Resize(
                      384,
                      384,
                      resize_target=None,
                      keep_aspect_ratio=True,
                      ensure_multiple_of=32,
                      resize_method=resize_mode,
                      image_interpolation_method=cv2.INTER_CUBIC,
                  ),
                  normalization,
                  PrepareForNet(),
              ]
          )
          model.eval()
          # optimize
          if device == torch.device("cuda"):
              model = model.to(memory_format=torch.channels_last)
              if not cmd_opts.no_half:
                  model = model.half()
          model.to(device)
          img = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB) / 255.0
          img_input = transform({"image": img})["image"]

          precision_scope = torch.autocast if shared.cmd_opts.precision == "autocast" and device == torch.device("cuda") else contextlib.nullcontext
          # compute
          with torch.no_grad(), precision_scope("cuda"):
              sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
              if device == torch.device("cuda"):
                  sample = sample.to(memory_format=torch.channels_last)
                  if not cmd_opts.no_half:
                      sample = sample.half()
              prediction = model.forward(sample)
              prediction = (
                  torch.nn.functional.interpolate(
                      prediction.unsqueeze(1),
                      size=img.shape[:2],
                      mode="bicubic",
                      align_corners=False,
                  )
                  .squeeze()
                  .cpu()
                  .numpy()
              )

          # output
          depth = prediction
          numbytes=2
          depth_min = depth.min()
          depth_max = depth.max()
          max_val = (2**(8*numbytes))-1

          # check output before normalizing and mapping to 16 bit
          if depth_max - depth_min > np.finfo("float").eps:
              out = max_val * (depth - depth_min) / (depth_max - depth_min)
          else:
              out = np.zeros(depth.shape)
          # single channel, 16 bit image
          img_output = out.astype("uint16")

          # # invert depth map
          # img_output = cv2.bitwise_not(img_output)

          # three channel, 8 bits per channel image
          img_output2 = np.zeros_like(image)
          img_output2[:,:,0] = img_output / 256.0
          img_output2[:,:,1] = img_output / 256.0
          img_output2[:,:,2] = img_output / 256.0
          img = Image.fromarray(img_output2)
          return img
      except Exception:
          raise
      finally:
          del model

def run(p):
    def module_from_file(module_name, file_path):
        spec = importlib.util.spec_from_file_location(module_name, file_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    def cut_depth_mask(img):
        img = copy.deepcopy(img.convert("RGBA"))
        mask_img = copy.deepcopy(img.convert("L"))
        mask_datas = mask_img.getdata()
        datas = img.getdata()
        newData = []
        for i in range(len(mask_datas)):
            newData.append((datas[i][0],datas[i][1],datas[i][2],255-mask_datas[i]))
        img.putdata(newData)
        return img
    d_m = calculate_depth_maps(p.init_images[0],p.width,p.width)
    d_m = cut_depth_mask(d_m)
    p.image_mask = d_m
    proc = process_images(p)
    return proc

run(p)

I get this:

Traceback (most recent call last):
  File "/kaggle/working/stable-diffusion-webui/modules/ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "/kaggle/working/stable-diffusion-webui/webui.py", line 57, in f
    res = func(*args, **kwargs)
  File "/kaggle/working/stable-diffusion-webui/modules/img2img.py", line 137, in img2img
    processed = modules.scripts.scripts_img2img.run(p, *args)
  File "/kaggle/working/stable-diffusion-webui/modules/scripts.py", line 317, in run
    processed = script.run(p, *script_args)
  File "/kaggle/working/stable-diffusion-webui/scripts/custom_code.py", line 38, in run
    exec(compiled, module.__dict__)
  File "", line 147, in <module>
  File "", line 141, in run
  File "", line 82, in calculate_depth_maps
  File "/kaggle/working/stable-diffusion-webui/repositories/midas/midas/dpt_depth.py", line 108, in forward
    return super().forward(x).squeeze(dim=1)
  File "/kaggle/working/stable-diffusion-webui/repositories/midas/midas/dpt_depth.py", line 71, in forward
    layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)
  File "/kaggle/working/stable-diffusion-webui/repositories/midas/midas/vit.py", line 59, in forward_vit
    glob = pretrained.model.forward_flex(x)
  File "/kaggle/working/stable-diffusion-webui/repositories/midas/midas/vit.py", line 127, in forward_flex
    x = self.patch_embed.backbone(x)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/timm/models/resnetv2.py", line 409, in forward
    x = self.forward_features(x)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/timm/models/resnetv2.py", line 403, in forward_features
    x = self.stem(x)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/p310/lib/python3.10/site-packages/timm/models/layers/std_conv.py", line 70, in forward
    self.weight.view(1, self.out_channels, -1), None, None,
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Which seems to happen around here:

          with torch.no_grad(), precision_scope("cuda"):
              sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
              if device == torch.device("cuda"):
                  sample = sample.to(memory_format=torch.channels_last)
                  if not cmd_opts.no_half:
                      sample = sample.half()
              prediction = model.forward(sample)
              prediction = (
                  torch.nn.functional.interpolate(
                      prediction.unsqueeze(1),
                      size=img.shape[:2],
                      mode="bicubic",
                      align_corners=False,
                  )
                  .squeeze()
                  .cpu()
                  .numpy()
              )
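
(Side note: my guess is that moving the model to channels_last makes the conv weights non-contiguous, which is what the .view() call inside timm's std_conv is complaining about. Skipping that conversion for the hybrid model might avoid it - same "optimize" block as above, with one line commented out, completely untested:)

          # untested guess: keep the default memory format for the DPT-Hybrid model
          if device == torch.device("cuda"):
              # model = model.to(memory_format=torch.channels_last)  # suspected culprit of the .view() error
              if not cmd_opts.no_half:
                  model = model.half()
          model.to(device)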

Other than that, the other models give me results, but the depthmaps are not super "deep".

Also, it's not about transparency for the mask but simply the grayscale.

image

image

image

@Extraltodeus
Contributor Author

Also, for some reason, "inpaint at full resolution" combined with the depth maps gives me a weird transparency effect, while not using it acts more like a normal mask.

See DJ fluff with a gas mask:

image

image

it's not purrfect

@AugmentedRealityCat

AugmentedRealityCat commented Nov 25, 2022

Thanks. I was using it for video. Maybe it can be done in those ways but just having an image sequence with alpha seems a lot easier.

I see, and I understand what you mean. It should be possible to automate the process I was describing of turning a depthmap into a mask, and then assembling that channel with the RGB channels to create a new PNG.
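
A rough sketch of what that automation could look like with PIL (paths and threshold are just placeholders):

from PIL import Image

def add_depth_alpha(rgb_path, depthmap_path, out_path, threshold=128):
    # assemble an RGB frame and a depth-derived mask into a single RGBA PNG
    rgb = Image.open(rgb_path).convert("RGB")
    depth = Image.open(depthmap_path).convert("L").resize(rgb.size)
    alpha = depth.point(lambda v: 255 if v > threshold else 0)  # near = opaque, far = transparent
    rgba = rgb.copy()
    rgba.putalpha(alpha)  # attach the mask as the alpha channel
    rgba.save(out_path)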

The big limit caused by storing an alpha channel alongside the RGB channels in a single PNG file is that the color information can be truncated in the parts of your image that are fully transparent. Sometimes that's what you want - it's a compression scheme to remove pixels that would not be seen, and it does make the file size smaller. But if, for example, you have an airplane in the foreground and clouds in the background, you won't be able to keep the cloud information of your image if your alpha channel is made to keep only the airplane foreground visible. It will be kept for semi-transparent pixels, but for completely transparent pixels the image data is just replaced by (usually) pure white. At least in apps like Photoshop - there might be ways around this limit that I'm not aware of.

If you have your alpha channel on a separate image, then you can keep both the plane and its cloudy background together in the same image, and you also gain the ability to adjust the mask without having to re-encode your RGB image.

This is a limit of the PNG format, but other formats that do support alpha channels, such as TIFF, allow you to keep the full RGB information in the parts of your image that would be indexed as transparent.

So yes, for image sequences it's more convenient to have everything assembled together (RGB+A) in a single PNG, but if you do that, make sure you keep your original sequence somewhere in case you need to rework your alpha channel; otherwise you might be stuck with pixels that have been erased from your PNG sequence because you thought they were going to be transparent.

What could help, though, is an option to add frame numbers to filenames when rendering sequences, and to let the user define the frame numbers, which would allow us to re-render only part of an animation, or to append to it later, without having to re-edit the sequences further down the production pipeline.

@a-l-e-x-d-s-9

Can this extension do something similar to the subject select feature in Photoshop?

@Extraltodeus
Contributor Author

Can this extension do something similar to subject select feature in Photoshop?

I am sorry, I do not know what you mean, as I do not use Photoshop.

@Extraltodeus
Contributor Author

Extraltodeus commented Nov 25, 2022

@AugmentedRealityCat would you know why I'm getting that error with the hybrid model?

@AugmentedRealityCat

@AugmentedRealityCat would you know why I'm getting that error with the hybrid model?

I do not. Just to make sure, do you have this problem even with 512x512 as the resolution?

@Extraltodeus
Contributor Author

Extraltodeus commented Nov 25, 2022

Yes, I do. But in the end I don't think this is a big issue. I noticed that the big model gives quite high-contrast depth maps (so low depth differences) and the small one has more "depth" to its depthmaps.

I also added a function to stretch out the depthmaps if really needed.
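
Roughly this kind of thing - just the idea, not the exact code from the repo:

from PIL import Image, ImageOps

def stretch_depthmap(depth):
    # remap a low-contrast grayscale depth map so it spans the full 0-255 range
    return ImageOps.autocontrast(depth.convert("L"))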

I'm currently creating a new repository, and I removed the hybrid model.

@Extraltodeus
Contributor Author

OK, it's done:
https://github.com/Extraltodeus/depthmap2mask

@Extraltodeus Extraltodeus changed the title [Feature Request]: Please add my extension to the Extensions Index :) [Feature Request]: Please add my extensionS to the Extensions Index :) Nov 25, 2022
@Extraltodeus Extraltodeus changed the title [Feature Request]: Please add my extensionS to the Extensions Index :) [Feature Request]: Please add my extension to the Extensions Index :) Nov 25, 2022
@Extraltodeus
Contributor Author

I see that it was added a little while ago, so I'm closing this request :)

@a-l-e-x-d-s-9

a-l-e-x-d-s-9 commented Nov 25, 2022

@Extraltodeus

I am sorry, I do not know what you mean as I do not use Photoshop.

Basically, this feature allows the creation of a mask for whatever the subject of the image is and separates it pretty well from the background. It's an essential feature for asset generation and character generation with Stable Diffusion.
Examples:
image
image

@Extraltodeus
Contributor Author

@a-l-e-x-d-s-9

My other extension does exactly that.

@a-l-e-x-d-s-9

@Extraltodeus
Excellent, I'm definitely going to try it.

@donlinglok1

donlinglok1 commented Nov 30, 2022

I'm interested in testing a Colab with that!

I initially got inspired by this repository, which I had in my bookmarks for something like 2 years and only tested a few months ago when Stable Diffusion came out! My first idea was to try to integrate that into SD or find some similar way to use MiDaS! I will definitely take a look at it!

So far I've made pixels move based on their depth, which is way below the level of anything that I've seen yet.

so it's clearly better to give the user the ability to select any of those 4 models

Indeed. I can't add such a feature without giving that option.

Edit: oh lol, that's the same repository!

@Extraltodeus Here is the Colab of the latest 3d-photo-inpainting with LeReS:
https://colab.research.google.com/drive/1DEySGgVdZtpdy3aP0xP8Sa4yvYTxxV_y?usp=sharing

Thanks to @AugmentedRealityCat for the help!

And here is how I replaced MiDaS with LeReS:
https://github.com/donlinglok/3d-photo-inpainting/pull/1/files
