
Enable gemma3-4b-it in VLM Pipeline #2340


Merged (116 commits, Aug 12, 2025)

Conversation

yangsu2022
Collaborator

Ticket: CVS-166388

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) no-match-files labels Jun 12, 2025
@sammysun0711 sammysun0711 requested a review from Copilot June 12, 2025 10:13

@Wovchena Wovchena requested a review from yatarkan June 12, 2025 10:47
@rkazants rkazants requested review from Wovchena and rkazants June 12, 2025 17:44
@yangsu2022 yangsu2022 force-pushed the ys/vlm-gemma3-4b-it-25.3 branch from e524647 to f2676ed Compare June 13, 2025 05:43



@rkazants rkazants added this to the 2025.3 milestone Jun 13, 2025
yangsu2022 and others added 2 commits June 16, 2025 19:48
- Add virtual destructor

- Add random test model

- Refactor Gemma3 tokenization for image markers (extra_special_tokens):

  - boi_token: <start_of_image>

  - eoi_token: <end_of_image>
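The boi/eoi marker handling above can be sketched as follows. This is a simplified illustration, not the PR's code: the `<image_soft_token>` placeholder string and the per-image placeholder count are assumptions (real Gemma3-style processors insert a model-specific number of placeholders, e.g. 256, between the markers).

```python
# Hypothetical sketch of expanding a user-facing image tag into the
# boi/eoi-delimited placeholder sequence used by Gemma3-style processors.
# Token strings and TOKENS_PER_IMAGE are illustrative assumptions.
BOI_TOKEN = "<start_of_image>"
EOI_TOKEN = "<end_of_image>"
IMAGE_PLACEHOLDER = "<image_soft_token>"
TOKENS_PER_IMAGE = 4  # real models use far more; kept small for clarity

def expand_image_tags(prompt: str) -> str:
    """Replace each bare image tag with boi + placeholders + eoi."""
    expanded = BOI_TOKEN + IMAGE_PLACEHOLDER * TOKENS_PER_IMAGE + EOI_TOKEN
    return prompt.replace(BOI_TOKEN, expanded)

print(expand_image_tags("Describe <start_of_image> briefly."))
```

The tokenizer then maps each placeholder to the image token id, which is what the token-type-ID machinery elsewhere in this PR keys on.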

@Wovchena Wovchena requested a review from Copilot August 11, 2025 03:22

@Wovchena Wovchena requested a review from Copilot August 11, 2025 06:08
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enables support for the Gemma3-4b-it model in the Visual Language Model (VLM) pipeline. The implementation adds comprehensive Gemma3 support, including configuration classes, input embedders, and a vision encoder, plus the infrastructure for token type IDs, which are required to distinguish text tokens from image tokens in the multimodal Gemma3 model.

  • Addition of Gemma3 model type and configuration parameters for image processing and token handling
  • Implementation of specialized embedding classes for Gemma3 with token type ID support
  • Integration of Gemma3 image tag <start_of_image> in documentation and API references
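The token-type-ID support listed above can be illustrated with a minimal sketch. The placeholder token id below is an assumption for illustration, not a value taken from the PR; the idea is simply that positions holding image placeholder tokens are marked 1 and text positions 0 before the tensor is forwarded through the pipeline.

```python
# Hypothetical sketch: building a token_type_ids vector that marks which
# positions in the input hold image placeholder tokens (1) vs text (0).
IMAGE_TOKEN_ID = 262144  # assumed placeholder id, illustrative only

def token_type_ids(input_ids: list[int]) -> list[int]:
    """Return 1 for image placeholder positions, 0 for text positions."""
    return [1 if tok == IMAGE_TOKEN_ID else 0 for tok in input_ids]

ids = [2, 10, 262144, 262144, 11, 3]
print(token_type_ids(ids))  # [0, 0, 1, 1, 0, 0]
```

In the actual C++ pipeline this vector travels alongside `input_ids` through the sequence groups, model runner, and decoding paths touched by this PR.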

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.

Summary per file:

tests/python_tests/test_vlm_pipeline.py: Adds test cases for tiny-random-gemma3 model with SDPA backend requirement
src/python/py_vlm_pipeline.cpp: Updates documentation to include gemma-3-4b-it image tag
src/python/openvino_genai/py_openvino_genai.pyi: Adds type stub documentation for Gemma3 image tag
src/cpp/src/visual_language/vlm_config.hpp: Defines GEMMA3 model type and configuration parameters
src/cpp/src/visual_language/vlm_config.cpp: Registers gemma3 model type mapping
src/cpp/src/visual_language/vision_encoder.cpp: Adds VisionEncoderGemma3 creation logic
src/cpp/src/visual_language/processor_config.hpp: Adds Gemma3-specific image size configuration
src/cpp/src/visual_language/processor_config.cpp: Implements Gemma3 size parameter parsing
src/cpp/src/visual_language/pipeline.cpp: Updates attention backend requirements for Gemma3
src/cpp/src/visual_language/llava_next/classes.cpp: Moves merge function to utils namespace
src/cpp/src/visual_language/llava/classes.cpp: Refactors to use shared merge utility function
src/cpp/src/visual_language/inputs_embedder.hpp: Adds token type ID support interface
src/cpp/src/visual_language/inputs_embedder.cpp: Implements token type ID functionality and Gemma3 embedder creation
src/cpp/src/visual_language/gemma3/classes.hpp: Defines Gemma3 vision encoder and input embedder classes
src/cpp/src/visual_language/gemma3/classes.cpp: Implements Gemma3-specific image processing and embedding logic
src/cpp/src/utils.hpp: Declares shared merge function for text/image embeddings
src/cpp/src/utils.cpp: Implements shared merge utility function
src/cpp/src/tokenizer/chat_template_fallback_map.hpp: Adds Gemma3 chat template without trim filter
src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp: Updates interface to support token type IDs
src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp: Implements token type ID passing for speculative decoding
src/cpp/src/sequence_group.hpp: Adds token type ID storage to sequence groups
src/cpp/src/prompt_lookup/prompt_lookup_impl.hpp: Updates interface for token type ID support
src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp: Implements token type ID forwarding
src/cpp/src/lm_encoding.hpp: Updates function signature for token type ID parameter
src/cpp/src/lm_encoding.cpp: Implements token type ID tensor handling in language model
src/cpp/src/llm/pipeline_stateful.cpp: Updates call to pass token type IDs
src/cpp/src/continuous_batching/pipeline_impl.hpp: Adds token type ID support to pipeline interface
src/cpp/src/continuous_batching/pipeline_impl.cpp: Implements token type ID handling in batching pipeline
src/cpp/src/continuous_batching/pipeline_base.hpp: Updates base interface for token type ID support
src/cpp/src/continuous_batching/pipeline_base.cpp: Implements token type ID processing in VLM generation
src/cpp/src/continuous_batching/model_runner.hpp: Adds token type ID tensor and processing logic
src/cpp/include/openvino/genai/visual_language/pipeline.hpp: Updates API documentation with Gemma3 image tag
site/docs/use-cases/image-processing/_sections/_usage_options/index.mdx: Documents Gemma3 image tag usage
site/docs/supported-models/_components/vlm-models-table/models.ts: Adds Gemma3 to supported models table
samples/cpp/visual_language_chat/benchmark_vlm.cpp: Updates comment about continuous batching limitations
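The shared merge utility mentioned for utils.cpp and the llava/gemma3 embedders can be sketched roughly as below. This is an illustrative NumPy analogue under assumed names and shapes, not the PR's C++ code (which operates on OpenVINO tensors): text embeddings at image-placeholder positions are overwritten, in order, with the vision encoder's output.

```python
import numpy as np

# Rough sketch of a shared text/image embedding merge. Names, shapes,
# and the boolean-mask interface are illustrative assumptions.
def merge_text_and_image_embeddings(
    text_embeds: np.ndarray,   # (seq_len, hidden)
    image_embeds: np.ndarray,  # (num_image_tokens, hidden)
    image_mask: np.ndarray,    # (seq_len,) bool, True at placeholder slots
) -> np.ndarray:
    """Overwrite placeholder rows of text_embeds with image_embeds rows."""
    assert image_mask.sum() == image_embeds.shape[0]
    merged = text_embeds.copy()
    merged[image_mask] = image_embeds  # boolean indexing assigns in order
    return merged

text = np.zeros((5, 2))
vision = np.ones((2, 2))
mask = np.array([False, True, True, False, False])
print(merge_text_and_image_embeddings(text, vision, mask))
```

The mask here plays the same role as the token type IDs threaded through the pipeline: it records which sequence positions belong to the image.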

Collaborator

@Wovchena Wovchena left a comment


Thank you!

@github-actions github-actions bot added the category: GGUF GGUF file reader label Aug 11, 2025
@yangsu2022 yangsu2022 added this pull request to the merge queue Aug 12, 2025
Merged via the queue into openvinotoolkit:master with commit 1cf1608 Aug 12, 2025
106 of 108 checks passed
Labels
category: continuous batching Continuous batching category: CPP API Changes in GenAI C++ public headers category: GGUF GGUF file reader category: GH Pages Docs Github Pages documentation category: LLM LLM pipeline (stateful, static) category: prompt lookup Prompt look-up decoding category: Python API Python API for GenAI category: speculative decoding Speculative decoding category: tokenizers Tokenizer class or submodule update category: visual language Visual language pipeline category: VLM samples GenAI VLM samples Code Freeze no-match-files
Development

Successfully merging this pull request may close these issues.

gemma 3 support (1b text, 4b multi-modal)
7 participants