RuntimeError: Given groups=1, weight of size [128, 3, 3, 3], expected input[1, 1, 512, 512] to have 3 channels, but got 1 channels instead #10813

Dmitr15 · 2025-02-17T18:22:52Z

I have a problem with AI models.
My code:

import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
import cv2
import os
from dotenv import load_dotenv
from segment_anything import sam_model_registry, SamPredictor

load_dotenv()
REV_ANIMATED_MODEL_PATH = os.getenv('REV_ANIMATED_MODEL_PATH')
KANDINSKY_MODEL_PATH = os.getenv('KANDINSKY_MODEL_PATH')
VAE_MODEL_PATH = os.getenv('VAE_MODEL_PATH')
SAM_MODEL_PATH = os.getenv("SAM_MODEL_PATH")
MODEL_TYPE = "vit_b"
CHECKPOINT_PATH = 'sam_vit_b_01ec64.pth'
SDV5_MODEL_PATH = os.getenv('SDV5_MODEL_PATH')
img = 'inpaint-example.png'

# mask generation function
def mask_generator(img):
    image = cv2.imread(img)
    # cv2.imread returns BGR; SAM expects RGB
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH)
    sam.to(device='cuda')
    mask_predictor = SamPredictor(sam)
    mask_predictor.set_image(image_rgb)
    input_point = np.array([[250, 250]])
    input_label = np.array([1])
    masks, scores, logits = mask_predictor.predict(
        point_coords=input_point,
        point_labels=input_label,
        multimask_output=False,
    )
    mask = masks.astype(float) * 255
    mask = np.transpose(mask, (1, 2, 0))
    _, bw_image = cv2.threshold(mask, 100, 255, cv2.THRESH_BINARY)
    cv2.imwrite('mask.png', bw_image)

# inpainting function
def inpaint(init_img, mask):
    init_image = Image.open(init_img)
    mask_image = Image.open(mask)
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        SDV5_MODEL_PATH,
        use_safetensors=True,
        torch_dtype=torch.float32
    ).to('cpu')
    negative_prompt = 'ugly'
    prompt = "a grey cat sitting on a bench, high resolution"
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=init_image,
        mask_image=mask_image
    ).images[0]
    image.save('output.png')

mask_generator(img)
inpaint(img, 'mask.png')

And I see this error in the terminal: RuntimeError: Given groups=1, weight of size [128, 3, 3, 3], expected input[1, 1, 512, 512] to have 3 channels, but got 1 channels instead

Can anyone help?

asomoza · 2025-02-18T02:34:38Z

Maintainer

Hi, the problem seems to be with the images you pass in, most likely the mask. The error is pretty self-explanatory: a layer that expects a 3-channel (RGB) input is receiving a 1-channel (grayscale) one. The dimensions may also be wrong.
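
For reference, the shapes in the message can be read mechanically. A 2D convolution weight in PyTorch is laid out as (out_channels, in_channels / groups, kernel_h, kernel_w), and its input as (batch, channels, height, width). The helper below is a hypothetical illustration (`explain_conv_mismatch` is not part of PyTorch or diffusers); it only does the arithmetic on the two shapes quoted in the error.

```python
def explain_conv_mismatch(weight_shape, input_shape, groups=1):
    """Compare a Conv2d weight shape with the shape of the tensor fed to it.

    weight_shape: (out_channels, in_channels // groups, kernel_h, kernel_w)
    input_shape:  (batch, channels, height, width)
    """
    out_channels, in_per_group, kernel_h, kernel_w = weight_shape
    batch, channels, height, width = input_shape
    expected = in_per_group * groups  # channels the layer was built for
    return {
        "expected_channels": expected,
        "got_channels": channels,
        "matches": channels == expected,
    }


# Shapes copied from the error message in this thread.
report = explain_conv_mismatch((128, 3, 3, 3), (1, 1, 512, 512), groups=1)
print(report)
# {'expected_channels': 3, 'got_channels': 1, 'matches': False}
```

So the layer was built for 3 channels and received a single-channel 512x512 tensor. The message alone does not say which of the two input images that tensor came from, which is why checking both files is worthwhile.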

You can start by looking at mask.png to check that it's really a good mask image (a grayscale image with the same dimensions as init_image).
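
A quick way to run that check is with Pillow, which the script in the question already imports. This is a minimal sketch: `find_input_problems`, `inspect_files` and `load_normalized` are hypothetical helper names, and the rule applied here (init image in `RGB` mode, mask in single-channel `L` mode, identical sizes) is an assumption based on the advice above, not a statement of what the pipeline enforces internally.

```python
def find_input_problems(init_mode, init_size, mask_mode, mask_size):
    """Return a list of human-readable problems; an empty list means the pair looks fine."""
    problems = []
    if init_mode != "RGB":
        problems.append(f"init image mode is {init_mode!r}, expected 'RGB'")
    if mask_mode != "L":
        problems.append(f"mask mode is {mask_mode!r}, expected 'L'")
    if init_size != mask_size:
        problems.append(f"size mismatch: init {init_size} vs mask {mask_size}")
    return problems


def inspect_files(init_path, mask_path):
    """Open both files and check their PIL modes and (width, height) sizes."""
    from PIL import Image  # imported lazily so the pure check above has no dependencies

    with Image.open(init_path) as init_image, Image.open(mask_path) as mask_image:
        return find_input_problems(
            init_image.mode, init_image.size, mask_image.mode, mask_image.size
        )


def load_normalized(init_path, mask_path):
    """Load both files, forcing RGB for the image and L for the mask (assumed convention)."""
    from PIL import Image

    init_image = Image.open(init_path).convert("RGB")
    mask_image = Image.open(mask_path).convert("L")
    if mask_image.size != init_image.size:
        # Nearest-neighbour resampling keeps the mask strictly black and white.
        mask_image = mask_image.resize(init_image.size, Image.NEAREST)
    return init_image, mask_image
```

With the files from the question, `inspect_files('inpaint-example.png', 'mask.png')` returns the list of problems found, if any. If something is reported, passing the two images returned by `load_normalized(...)` as `image` and `mask_image` is one thing to try.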
