Enable gemma3-4b-it in VLM Pipeline #2340
e524647 to f2676ed
- Add virtual destructor
- Add random test model
- Refactor Gemma3 tokenization for image markers (extra_special_tokens):
  - boi_token: <start_of_image>
  - eoi_token: <end_of_image>
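The image-marker handling named in the commit message can be illustrated with a minimal sketch. It assumes that each begin-of-image marker in the prompt is expanded into the marker, a fixed number of per-image placeholder tokens, and the end-of-image marker. The placeholder string, the function name, and the token count are hypothetical and are not taken from this PR.

```python
# Hypothetical sketch: the helper name and the placeholder token are
# illustrative assumptions, not code from the PR.
BOI_TOKEN = "<start_of_image>"  # boi_token from the commit message
EOI_TOKEN = "<end_of_image>"    # eoi_token from the commit message
IMAGE_PLACEHOLDER = "<image_soft_token>"  # assumed per-position placeholder


def expand_image_markers(prompt: str, tokens_per_image: int) -> str:
    """Replace each boi marker with boi + N placeholders + eoi."""
    expanded = BOI_TOKEN + IMAGE_PLACEHOLDER * tokens_per_image + EOI_TOKEN
    # str.replace does not rescan inserted text, so the boi marker inside
    # the expansion is not expanded a second time.
    return prompt.replace(BOI_TOKEN, expanded)


prompt = "<start_of_image>Describe the picture."
expanded_prompt = expand_image_markers(prompt, tokens_per_image=2)
print(expanded_prompt)
```

In a real pipeline the number of placeholders per image would come from the model or processor configuration rather than a literal argument.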
Pull Request Overview
This PR enables support for the Gemma3-4b-it model in the Visual Language Model (VLM) pipeline. The implementation adds Gemma3 configuration classes, input embedders, and vision encoders, along with the infrastructure for token type IDs, which the multimodal Gemma3 model requires to distinguish text tokens from image tokens.
- Addition of Gemma3 model type and configuration parameters for image processing and token handling
- Implementation of specialized embedding classes for Gemma3 with token type ID support
- Integration of Gemma3 image tag <start_of_image> in documentation and API references
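To make the role of token type IDs concrete, here is a minimal sketch of how such a mask could be derived from the input IDs. It assumes a 0/1 encoding (1 for image-token positions, 0 for text) and a hypothetical image token ID. The actual ID and the tensor layout used by the pipeline are not shown in this overview.

```python
from typing import List

# Hypothetical id for illustration; the real value would come from the
# model configuration.
IMAGE_TOKEN_ID = 99


def build_token_type_ids(
    input_ids: List[int], image_token_id: int = IMAGE_TOKEN_ID
) -> List[int]:
    """Return 1 at image-token positions and 0 at text positions (assumed encoding)."""
    return [1 if token_id == image_token_id else 0 for token_id in input_ids]


input_ids = [2, 7, 99, 99, 99, 8, 15]
token_type_ids = build_token_type_ids(input_ids)
print(token_type_ids)  # [0, 0, 1, 1, 1, 0, 0]
```

A mask like this has the same length as the prompt, which is why the file summary shows it being threaded through every layer that carries input IDs.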
Reviewed Changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tests/python_tests/test_vlm_pipeline.py | Adds test cases for tiny-random-gemma3 model with SDPA backend requirement |
src/python/py_vlm_pipeline.cpp | Updates documentation to include gemma-3-4b-it image tag |
src/python/openvino_genai/py_openvino_genai.pyi | Adds type stub documentation for Gemma3 image tag |
src/cpp/src/visual_language/vlm_config.hpp | Defines GEMMA3 model type and configuration parameters |
src/cpp/src/visual_language/vlm_config.cpp | Registers gemma3 model type mapping |
src/cpp/src/visual_language/vision_encoder.cpp | Adds VisionEncoderGemma3 creation logic |
src/cpp/src/visual_language/processor_config.hpp | Adds Gemma3-specific image size configuration |
src/cpp/src/visual_language/processor_config.cpp | Implements Gemma3 size parameter parsing |
src/cpp/src/visual_language/pipeline.cpp | Updates attention backend requirements for Gemma3 |
src/cpp/src/visual_language/llava_next/classes.cpp | Moves merge function to utils namespace |
src/cpp/src/visual_language/llava/classes.cpp | Refactors to use shared merge utility function |
src/cpp/src/visual_language/inputs_embedder.hpp | Adds token type ID support interface |
src/cpp/src/visual_language/inputs_embedder.cpp | Implements token type ID functionality and Gemma3 embedder creation |
src/cpp/src/visual_language/gemma3/classes.hpp | Defines Gemma3 vision encoder and input embedder classes |
src/cpp/src/visual_language/gemma3/classes.cpp | Implements Gemma3-specific image processing and embedding logic |
src/cpp/src/utils.hpp | Declares shared merge function for text/image embeddings |
src/cpp/src/utils.cpp | Implements shared merge utility function |
src/cpp/src/tokenizer/chat_template_fallback_map.hpp | Adds Gemma3 chat template without trim filter |
src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp | Updates interface to support token type IDs |
src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp | Implements token type ID passing for speculative decoding |
src/cpp/src/sequence_group.hpp | Adds token type ID storage to sequence groups |
src/cpp/src/prompt_lookup/prompt_lookup_impl.hpp | Updates interface for token type ID support |
src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp | Implements token type ID forwarding |
src/cpp/src/lm_encoding.hpp | Updates function signature for token type ID parameter |
src/cpp/src/lm_encoding.cpp | Implements token type ID tensor handling in language model |
src/cpp/src/llm/pipeline_stateful.cpp | Updates call to pass token type IDs |
src/cpp/src/continuous_batching/pipeline_impl.hpp | Adds token type ID support to pipeline interface |
src/cpp/src/continuous_batching/pipeline_impl.cpp | Implements token type ID handling in batching pipeline |
src/cpp/src/continuous_batching/pipeline_base.hpp | Updates base interface for token type ID support |
src/cpp/src/continuous_batching/pipeline_base.cpp | Implements token type ID processing in VLM generation |
src/cpp/src/continuous_batching/model_runner.hpp | Adds token type ID tensor and processing logic |
src/cpp/include/openvino/genai/visual_language/pipeline.hpp | Updates API documentation with Gemma3 image tag |
site/docs/use-cases/image-processing/_sections/_usage_options/index.mdx | Documents Gemma3 image tag usage |
site/docs/supported-models/_components/vlm-models-table/models.ts | Adds Gemma3 to supported models table |
samples/cpp/visual_language_chat/benchmark_vlm.cpp | Updates comment about continuous batching limitations |
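The shared merge utility listed for utils.hpp and utils.cpp combines text and image embeddings. The sketch below shows one plausible shape of that operation, assuming image embeddings overwrite the rows at image-token positions in order. The function name, signature, and use of plain lists are hypothetical; the C++ implementation in the PR may differ.

```python
from typing import List

Vector = List[float]


def merge_text_and_image_embeddings(
    input_ids: List[int],
    text_embeds: List[Vector],
    image_embeds: List[Vector],
    image_token_id: int,
) -> List[Vector]:
    """Copy the text embeddings and overwrite image-token rows, in order."""
    image_positions = [
        index for index, token_id in enumerate(input_ids) if token_id == image_token_id
    ]
    if len(image_positions) != len(image_embeds):
        raise ValueError(
            f"expected {len(image_positions)} image embeddings, got {len(image_embeds)}"
        )
    merged = [list(row) for row in text_embeds]  # leave the inputs unmodified
    for position, image_row in zip(image_positions, image_embeds):
        merged[position] = list(image_row)
    return merged


# Hypothetical image token id 99 and 2-dimensional embeddings for illustration.
ids = [5, 99, 99, 6]
text = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
image = [[1.0, 1.0], [2.0, 2.0]]
merged = merge_text_and_image_embeddings(ids, text, image, image_token_id=99)
print(merged)
```

Raising on a count mismatch surfaces prompt and image misalignment early instead of silently producing a partially merged sequence.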
Thank you!
…022/openvino.genai into ys/vlm-gemma3-4b-it-25.3
1cf1608
Ticket: CVS-166388