
Enable gemma3-4b-it in VLM Pipeline #2340


Merged (116 commits, Aug 12, 2025)

Conversation

yangsu2022
Collaborator

Ticket: CVS-166388

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) no-match-files labels Jun 12, 2025
@sammysun0711 sammysun0711 requested a review from Copilot June 12, 2025 10:13

@Wovchena Wovchena requested a review from yatarkan June 12, 2025 10:47
@rkazants rkazants requested review from Wovchena and rkazants June 12, 2025 17:44
@yangsu2022 yangsu2022 force-pushed the ys/vlm-gemma3-4b-it-25.3 branch from e524647 to f2676ed Compare June 13, 2025 05:43



@rkazants rkazants added this to the 2025.3 milestone Jun 13, 2025
yangsu2022 and others added 2 commits June 16, 2025 19:48
- Add virtual destructor

- Add random test model

- Refactor Gemma3 tokenization for image markers (extra_special_tokens):

  - boi_token: <start_of_image>

  - eoi_token: <end_of_image>
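The boi/eoi marker handling above can be sketched as follows. This is a simplified illustration, not the PR's code: the `<image_soft_token>` placeholder string and the per-image placeholder count are assumptions (real Gemma3-style processors insert a model-specific number of placeholders, e.g. 256, between the markers).

```python
# Hypothetical sketch of expanding a user-facing image tag into the
# boi/eoi-delimited placeholder sequence used by Gemma3-style processors.
# Token strings and TOKENS_PER_IMAGE are illustrative assumptions.
BOI_TOKEN = "<start_of_image>"
EOI_TOKEN = "<end_of_image>"
IMAGE_PLACEHOLDER = "<image_soft_token>"
TOKENS_PER_IMAGE = 4  # real models use far more; kept small for clarity

def expand_image_tags(prompt: str) -> str:
    """Replace each bare image tag with boi + placeholders + eoi."""
    expanded = BOI_TOKEN + IMAGE_PLACEHOLDER * TOKENS_PER_IMAGE + EOI_TOKEN
    return prompt.replace(BOI_TOKEN, expanded)

print(expand_image_tags("Describe <start_of_image> briefly."))
```

The tokenizer then maps each placeholder to the image token id, which is what the token-type-ID machinery elsewhere in this PR keys on.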

@Wovchena Wovchena requested a review from Copilot August 11, 2025 03:22

@Wovchena Wovchena requested a review from Copilot August 11, 2025 06:08
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enables support for the Gemma3-4b-it model in the Visual Language Model (VLM) pipeline. The implementation adds comprehensive Gemma3 support, including configuration classes, input embedders, and a vision encoder, plus the infrastructure for token type IDs, which are required to distinguish text tokens from image tokens in the multimodal Gemma3 model.

  • Addition of Gemma3 model type and configuration parameters for image processing and token handling
  • Implementation of specialized embedding classes for Gemma3 with token type ID support
  • Integration of Gemma3 image tag <start_of_image> in documentation and API references
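The token-type-ID support listed above can be illustrated with a minimal sketch. The placeholder token id below is an assumption for illustration, not a value taken from the PR; the idea is simply that positions holding image placeholder tokens are marked 1 and text positions 0 before the tensor is forwarded through the pipeline.

```python
# Hypothetical sketch: building a token_type_ids vector that marks which
# positions in the input hold image placeholder tokens (1) vs text (0).
IMAGE_TOKEN_ID = 262144  # assumed placeholder id, illustrative only

def token_type_ids(input_ids: list[int]) -> list[int]:
    """Return 1 for image placeholder positions, 0 for text positions."""
    return [1 if tok == IMAGE_TOKEN_ID else 0 for tok in input_ids]

ids = [2, 10, 262144, 262144, 11, 3]
print(token_type_ids(ids))  # [0, 0, 1, 1, 0, 0]
```

In the actual C++ pipeline this vector travels alongside `input_ids` through the sequence groups, model runner, and decoding paths touched by this PR.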

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.

Summary per file:

tests/python_tests/test_vlm_pipeline.py: Adds test cases for tiny-random-gemma3 model with SDPA backend requirement
src/python/py_vlm_pipeline.cpp: Updates documentation to include gemma-3-4b-it image tag
src/python/openvino_genai/py_openvino_genai.pyi: Adds type stub documentation for Gemma3 image tag
src/cpp/src/visual_language/vlm_config.hpp: Defines GEMMA3 model type and configuration parameters
src/cpp/src/visual_language/vlm_config.cpp: Registers gemma3 model type mapping
src/cpp/src/visual_language/vision_encoder.cpp: Adds VisionEncoderGemma3 creation logic
src/cpp/src/visual_language/processor_config.hpp: Adds Gemma3-specific image size configuration
src/cpp/src/visual_language/processor_config.cpp: Implements Gemma3 size parameter parsing
src/cpp/src/visual_language/pipeline.cpp: Updates attention backend requirements for Gemma3
src/cpp/src/visual_language/llava_next/classes.cpp: Moves merge function to utils namespace
src/cpp/src/visual_language/llava/classes.cpp: Refactors to use shared merge utility function
src/cpp/src/visual_language/inputs_embedder.hpp: Adds token type ID support interface
src/cpp/src/visual_language/inputs_embedder.cpp: Implements token type ID functionality and Gemma3 embedder creation
src/cpp/src/visual_language/gemma3/classes.hpp: Defines Gemma3 vision encoder and input embedder classes
src/cpp/src/visual_language/gemma3/classes.cpp: Implements Gemma3-specific image processing and embedding logic
src/cpp/src/utils.hpp: Declares shared merge function for text/image embeddings
src/cpp/src/utils.cpp: Implements shared merge utility function
src/cpp/src/tokenizer/chat_template_fallback_map.hpp: Adds Gemma3 chat template without trim filter
src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp: Updates interface to support token type IDs
src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp: Implements token type ID passing for speculative decoding
src/cpp/src/sequence_group.hpp: Adds token type ID storage to sequence groups
src/cpp/src/prompt_lookup/prompt_lookup_impl.hpp: Updates interface for token type ID support
src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp: Implements token type ID forwarding
src/cpp/src/lm_encoding.hpp: Updates function signature for token type ID parameter
src/cpp/src/lm_encoding.cpp: Implements token type ID tensor handling in language model
src/cpp/src/llm/pipeline_stateful.cpp: Updates call to pass token type IDs
src/cpp/src/continuous_batching/pipeline_impl.hpp: Adds token type ID support to pipeline interface
src/cpp/src/continuous_batching/pipeline_impl.cpp: Implements token type ID handling in batching pipeline
src/cpp/src/continuous_batching/pipeline_base.hpp: Updates base interface for token type ID support
src/cpp/src/continuous_batching/pipeline_base.cpp: Implements token type ID processing in VLM generation
src/cpp/src/continuous_batching/model_runner.hpp: Adds token type ID tensor and processing logic
src/cpp/include/openvino/genai/visual_language/pipeline.hpp: Updates API documentation with Gemma3 image tag
site/docs/use-cases/image-processing/_sections/_usage_options/index.mdx: Documents Gemma3 image tag usage
site/docs/supported-models/_components/vlm-models-table/models.ts: Adds Gemma3 to supported models table
samples/cpp/visual_language_chat/benchmark_vlm.cpp: Updates comment about continuous batching limitations
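The shared merge utility mentioned for utils.cpp and the llava/gemma3 embedders can be sketched roughly as below. This is an illustrative NumPy analogue under assumed names and shapes, not the PR's C++ code (which operates on OpenVINO tensors): text embeddings at image-placeholder positions are overwritten, in order, with the vision encoder's output.

```python
import numpy as np

# Rough sketch of a shared text/image embedding merge. Names, shapes,
# and the boolean-mask interface are illustrative assumptions.
def merge_text_and_image_embeddings(
    text_embeds: np.ndarray,   # (seq_len, hidden)
    image_embeds: np.ndarray,  # (num_image_tokens, hidden)
    image_mask: np.ndarray,    # (seq_len,) bool, True at placeholder slots
) -> np.ndarray:
    """Overwrite placeholder rows of text_embeds with image_embeds rows."""
    assert image_mask.sum() == image_embeds.shape[0]
    merged = text_embeds.copy()
    merged[image_mask] = image_embeds  # boolean indexing assigns in order
    return merged

text = np.zeros((5, 2))
vision = np.ones((2, 2))
mask = np.array([False, True, True, False, False])
print(merge_text_and_image_embeddings(text, vision, mask))
```

The mask here plays the same role as the token type IDs threaded through the pipeline: it records which sequence positions belong to the image.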

Collaborator

@Wovchena Wovchena left a comment


Thank you!

@github-actions github-actions bot added the category: GGUF GGUF file reader label Aug 11, 2025
@yangsu2022 yangsu2022 added this pull request to the merge queue Aug 12, 2025
Merged via the queue into openvinotoolkit:master with commit 1cf1608 Aug 12, 2025
106 of 108 checks passed
Labels
category: continuous batching Continuous batching category: CPP API Changes in GenAI C++ public headers category: GGUF GGUF file reader category: GH Pages Docs Github Pages documentation category: LLM LLM pipeline (stateful, static) category: prompt lookup Prompt look-up decoding category: Python API Python API for GenAI category: speculative decoding Speculative decoding category: tokenizers Tokenizer class or submodule update category: visual language Visual language pipeline category: VLM samples GenAI VLM samples Code Freeze no-match-files
Development

Successfully merging this pull request may close these issues.

gemma 3 support (1b text, 4b multi-modal)
7 participants