
@szeyu (Contributor) commented Sep 10, 2024

Support OpenVINO vision model #32

This feature request aims to integrate support for vision language models into our existing framework. Currently, the framework supports only non-vision (text-only) models. This change extends support to vision models, which must be loaded and processed differently.

To achieve this, I have implemented the following:

  1. Separated Initialization for Vision and Non-Vision Models:

    • Vision models are initialized using the ov_phi3_vision.py script provided by OpenVINO.
    • Non-vision models continue to be initialized using the existing methods.
  2. Quantized Vision Model Support:

    • We have added support for the quantized vision model Phi-3.5-vision-instruct-int4-ov.
    • The model can be found at the following link: Phi-3.5-vision-instruct-int4-ov
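The split initialization described above amounts to a dispatch on model type. The sketch below is a minimal illustration under stated assumptions and is not the PR's code. `EngineSketch`, the `vision` flag, and the injected `load_vision`/`load_text` callables are hypothetical names. How the real `OpenVinoEngine` decides that a model is a vision model is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical loader signature: takes a model directory, returns a model object.
Loader = Callable[[str], Any]


@dataclass
class EngineSketch:
    """Simplified stand-in for OpenVinoEngine (hypothetical, not the real class)."""

    model_path: str
    vision: bool             # assumed flag marking a vision model
    load_vision: Loader      # e.g. a wrapper around OvPhi3Vision (assumed)
    load_text: Loader        # the existing non-vision loading path (assumed)

    def __post_init__(self) -> None:
        # Route to a separate initialization path depending on the model type,
        # leaving the non-vision path unchanged for backward compatibility.
        if self.vision:
            self.model = self.load_vision(self.model_path)
        else:
            self.model = self.load_text(self.model_path)
```

Injecting the loaders keeps the dispatch testable without OpenVINO installed. In the real engine, the vision branch would construct the model through `ov_phi3_vision.py`.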

Implementation Details

  • Vision Model Initialization:

    • The OpenVinoEngine class now checks if the model is a vision model during initialization.
    • If the model is a vision model, it uses the ov_phi3_vision.py script to load and initialize the model.
    • Inputs for the vision model (text prompt and images) are then prepared using the AutoProcessor class from the transformers library.
  • Non-Vision Model Initialization:

    • Non-vision models are initialized using the existing methods, ensuring backward compatibility.
  • Streamlined Generation Process:

    • Both vision and non-vision models support streaming output, with timing information logged for performance analysis.
    • The generate_vision method has been updated to log prompt length, new tokens generated, time to first token, prompt tokens per second, and new tokens per second.
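Streaming output with the logged timing statistics can be sketched as a generator wrapper around any token stream. `stream_with_stats` is a hypothetical helper and does not come from the PR. The metric definitions are one reasonable choice, not necessarily the ones `generate_vision` uses. Time to first token is treated as prompt-processing time, and new tokens per second counts the tokens after the first over the time after the first token.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_stats(
    prompt_len: int,
    token_stream: Iterable[str],
    stats: dict,
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[str]:
    """Yield tokens from token_stream and record timing statistics in stats.

    prompt_len is the number of prompt tokens (assumed known from the processor).
    stats is filled in once the stream is exhausted.
    """
    start = clock()
    first_token_time = None
    new_tokens = 0
    for token in token_stream:
        if first_token_time is None:
            first_token_time = clock()  # moment the first token arrives
        new_tokens += 1
        yield token
    end = clock()

    # Time to first token approximates prompt processing (prefill) time.
    ttft = (first_token_time - start) if first_token_time is not None else 0.0
    # Time spent producing the tokens after the first one.
    decode_time = (end - first_token_time) if first_token_time is not None else 0.0
    stats.update(
        prompt_len=prompt_len,
        new_tokens=new_tokens,
        time_to_first_token=ttft,
        prompt_tokens_per_sec=prompt_len / ttft if ttft > 0 else float("inf"),
        new_tokens_per_sec=(new_tokens - 1) / decode_time if decode_time > 0 else float("inf"),
    )
```

The `clock` parameter makes the timing deterministic in tests. In normal use, the default `time.perf_counter` is appropriate for measuring elapsed time.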

References

@szeyu szeyu added the type: enhancement / feature New feature or request label Sep 10, 2024
@tjtanaa tjtanaa self-requested a review September 25, 2024 21:04
self.model_path = snapshot_path

# it is case-sensitive: only all-uppercase characters are accepted
self.model = OvPhi3Vision(
Member

  • Can you help me understand the behaviour: what happens if you pass in a model that is not a Phi3Vision model? (Not limited to what error it throws.)
  • Add a try/except block. If loading fails, print a message telling the user that the embeddedllm engine only supports Phi3Vision models, then exit the program gracefully.

Contributor Author

If I pass in a model that is not a Phi3Vision model, it fails with an error saying that language_model.xml was not found. So yes, a try/except block is needed there.
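Following the review, the graceful-exit behaviour could be sketched as follows. `load_phi3_vision_or_exit` and the injected `load` callable are hypothetical. The callable stands in for the actual `OvPhi3Vision(...)` construction, whose arguments are not shown here. The exact exception type raised when `language_model.xml` is missing is also an assumption, so the sketch catches `Exception` broadly.

```python
import sys
from typing import Any, Callable


def load_phi3_vision_or_exit(load: Callable[[str], Any], model_path: str) -> Any:
    """Load a vision model, or exit with a clear message if it is unsupported.

    `load` is a stand-in for the real OvPhi3Vision construction (assumed signature).
    """
    try:
        return load(model_path)
    except Exception as exc:  # e.g. language_model.xml not found for a non-Phi3Vision model
        print(
            f"Failed to load vision model from {model_path!r}: {exc}\n"
            "The embeddedllm engine currently only supports Phi3Vision models.",
            file=sys.stderr,
        )
        sys.exit(1)  # exit with a non-zero status instead of a raw traceback
```

Catching a narrower exception type, once the actual error is known, would avoid masking unrelated failures during model loading.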

@tjtanaa tjtanaa linked an issue Sep 25, 2024 that may be closed by this pull request
@szeyu szeyu merged commit aeca16a into main Sep 26, 2024

Labels

type: enhancement / feature New feature or request

Development

Successfully merging this pull request may close these issues.

[FEAT] Support OpenVINO vision model

3 participants