Add CogView4 model support #7770

Draft
wants to merge 38 commits into
base: main

Conversation

RyanJDick
Contributor

@RyanJDick RyanJDick commented Mar 12, 2025

Summary

Add support for the CogView4 model in nodes.

Example workflows:

Example

Expanded prompt:

A whimsical stuffed gnome sits on a golden sandy beach, its plush fabric slightly textured and well-worn. The gnome has a round, cheerful face with a fluffy white beard, a bulbous nose, and a tall, slightly floppy red hat with a few decorative stitching details. It wears a tiny blue vest over a soft, earthy-toned tunic, and its stubby arms grasp a ripe yellow banana with a few brown speckles. The ocean waves gently roll onto the shore in the background, with turquoise water reflecting the warm glow of the late afternoon sun. A few scattered seashells and driftwood pieces are near the gnome, while a colorful beach umbrella and footprints in the sand hint at a lively beach scene. The sky is a soft pastel blend of pink, orange, and light blue, with wispy clouds stretching across the horizon.

Result:
image

Follow-up work

Related Issues / Discussions

N/A

QA Instructions

  • Regression test FLUX and SD3 inpainting (since the inpainting extension code was consolidated)
  • Smoke test text-to-image with progress images for all base model types (since the progress image code was refactored a bit)
  • Install CogView4 model via Starter Models tab
  • CogView4 appears in the model list
  • Test CogView4 text-to-image workflow
  • Test CogView4 image-to-image workflow
  • Test CogView4 inpainting workflow
  • Test that CogView4 denoising can be cancelled mid-execution
  • Test CogView4 progress images (they work, but it looks like there's room for improvement with further tuning)
  • Test CogView4 partial loading (constrain available VRAM to ~6GB)
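One way to approximate the ~6GB constraint in the last item on an NVIDIA card is to cap PyTorch's per-process CUDA allocator. This is a minimal sketch under assumptions: a CUDA device is present, and the helper names `vram_fraction` and `cap_vram` are hypothetical and not part of InvokeAI. InvokeAI's own model cache settings may be the more appropriate lever for this test.

```python
def vram_fraction(target_gb: float, total_bytes: int) -> float:
    """Return the share of total device memory equal to target_gb, capped at 1.0."""
    if target_gb <= 0 or total_bytes <= 0:
        raise ValueError("target_gb and total_bytes must be positive")
    return min(1.0, target_gb * 1024**3 / total_bytes)


def cap_vram(target_gb: float = 6.0, device: int = 0) -> float:
    """Limit this process's CUDA allocations on `device` to roughly target_gb."""
    import torch  # imported lazily so vram_fraction works without torch installed

    total = torch.cuda.get_device_properties(device).total_memory
    fraction = vram_fraction(target_gb, total)
    # Only affects PyTorch's caching allocator in the current process.
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return fraction
```

Calling `cap_vram(6.0)` before any model is loaded would make allocations beyond the cap fail with an out-of-memory error, which is the condition partial loading is meant to avoid.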

Merge Plan

  • This PR pins an arbitrary diffusers commit to get access to the CogView4 model. We must decide if we are ok with this, or want to wait for an official release.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added the python, Root, invocations, backend, services, frontend, and python-deps labels Mar 12, 2025
@@ -38,7 +38,7 @@ dependencies = [
 "clip_anytorch==2.6.0", # replacing "clip @ https://github.com/openai/CLIP/archive/eaa22acb90a5876642d0507623e859909230a52d.zip",
 "compel==2.0.2",
 "controlnet-aux==0.0.7",
-"diffusers[torch]==0.31.0",
+"diffusers[torch] @ git+https://github.com/huggingface/diffusers.git@fbf6b856cc61fd22ad8635547bff4aafe05723f3", # We are pinning to a commit to get access to CogView4, which hasn't been released yet.
Contributor Author

We need to decide if we are comfortable with this, or want to wait for the next diffusers release.
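Whichever way that decision goes, it can help to confirm at runtime that the installed diffusers build actually exposes the CogView4 pipeline. The following is a sketch only: `module_exposes` is a hypothetical helper, and it assumes the pipeline class is exported from the top-level package as `CogView4Pipeline`.

```python
import importlib
import importlib.util


def module_exposes(module_name: str, attr: str) -> bool:
    """Return True if module_name is installed and exposes attr at top level."""
    if importlib.util.find_spec(module_name) is None:
        return False  # package is not installed at all
    module = importlib.import_module(module_name)
    return hasattr(module, attr)


# Example usage (assumed class name):
#   module_exposes("diffusers", "CogView4Pipeline")
```

A check like this could guard the CogView4 nodes, so that an older diffusers install produces a clear message instead of an import error.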

@psychedelicious
Collaborator

psychedelicious commented Mar 18, 2025

While reviewing the CogView4 HF repo, I noticed this inference restriction:

Resolution: Width and height must be between 512px and 2048px, divisible by 32, and ensure the maximum number of pixels does not exceed 2^21 px.

See: https://huggingface.co/THUDM/CogView4-6B#inference-requirements-and-model-introduction

This introduces a new type of constraint. You'd expect the max dimensions to be 2048 x 2048, but that is 4,194,304 pixels, which exceeds the max pixel count of 2^21 = 2,097,152. So we may need to make some changes to dimension constraints to support CogView4.

Also, image sizes must be divisible by 32. This needs to be handled in a number of areas.
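To make the documented restriction concrete, here is a minimal sketch of a validator for the three rules quoted from the model card. The function name `cogview4_dimension_errors` and the `enforce_pixel_cap` flag are hypothetical and are not taken from this PR. The pixel cap sits behind a flag because it is not yet clear whether the model enforces it in practice.

```python
MIN_SIDE = 512
MAX_SIDE = 2048
MULTIPLE = 32
MAX_PIXELS = 2**21  # 2,097,152


def cogview4_dimension_errors(
    width: int, height: int, enforce_pixel_cap: bool = True
) -> list[str]:
    """Return human-readable violations of the documented constraints."""
    errors = []
    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            errors.append(f"{name}={value} is outside [{MIN_SIDE}, {MAX_SIDE}]")
        if value % MULTIPLE != 0:
            errors.append(f"{name}={value} is not divisible by {MULTIPLE}")
    # "Does not exceed" means a total of exactly 2^21 pixels is still allowed.
    if enforce_pixel_cap and width * height > MAX_PIXELS:
        errors.append(f"{width}x{height} exceeds {MAX_PIXELS} pixels")
    return errors
```

Under these rules 1024x2048 passes (exactly 2^21 pixels), while 2048x2048 satisfies the per-side limits but fails the pixel cap.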

Note: I'm still downloading the model - slow internet today - so I haven't actually tested yet. Maybe this is a non-issue. Just reviewing the model docs and taking notes.

@psychedelicious
Collaborator

The max number of pixels requirement seems to be fake news. I can generate images larger than 1024x2048, though I OOM with 24GB VRAM around 1700x2000 on VAE decode.

I've added checks for the dimensions.
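The checks themselves are not shown in this thread. As an illustration only, and not the implementation in this PR, a dimension could be snapped to the nearest multiple of 32 and clamped to the documented 512 to 2048 range like this (`snap_dimension` is a hypothetical name):

```python
def snap_dimension(value: int, multiple: int = 32, lo: int = 512, hi: int = 2048) -> int:
    """Round value to the nearest multiple (halves round up), then clamp to [lo, hi].

    Assumes lo and hi are themselves multiples of `multiple`, so clamping
    never breaks divisibility.
    """
    snapped = (value + multiple // 2) // multiple * multiple
    return max(lo, min(hi, snapped))


print(snap_dimension(1700), snap_dimension(2000))  # prints: 1696 2016
```

Integer arithmetic is used instead of `round()` so that values exactly halfway between two multiples always round up, rather than following Python's round-half-to-even behavior.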

@psychedelicious psychedelicious marked this pull request as draft March 18, 2025 04:28
This doesn't make sense to have as a default workflow given the trickiness of producing alpha masks.
@psychedelicious
Collaborator

  • Tested text to image, image to image, inpainting/outpainting - all working well.
  • Added CogView4 Text to Image to default workflows.

This PR has diffusers pinned to a pre-release commit. We are on diffusers==0.31.0 right now - about 5 months old.

Feels risky to merge this and release w/ an unreleased, potentially unstable diffusers dep. Let's wait for the next stable diffusers release and do thorough testing before merging this PR.

Marked as draft to prevent premature merge.

@a-r-r-o-w

Great to see InvokeAI support for CogView4! We'll try to do a diffusers release asap to unblock this (hopefully next week) 🤗

4 participants