Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include GPT-4 V model to be able to search for images and embedding images. #323

Open
9 of 13 tasks
ross-p-smith opened this issue Feb 22, 2024 · 4 comments
Open
9 of 13 tasks
Labels
enhancement New feature or request epic Large scope with many subtasks

Comments

@ross-p-smith
Copy link
Collaborator

ross-p-smith commented Feb 22, 2024

Motivation

Company data often comprises various types of images, including screenshots, maps, and diagrams. By enabling the chat admin app to ingest and process these images, it can provide more accurate and relevant responses to user queries that involve visual data. This ensures that the chat app can fully utilise all available company data to deliver an improved user experience.

Note: Image processing is only available using GPT-4 https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-preview

How would you feel if this feature request was implemented?

gif

Requirements

  • Ensure existing application works correctly with GPT-4
  • Allow images to be uploaded via the Admin application
  • When "Reprocess all" is click via the Admin app, reprocess the images
  • When a question is asked, image data should be searched and passed to gpt-4-vision to generate a response
  • Citations should link to the image stored in blob storage
  • Stretch: Fallback to OCR/document intelligence if image of a document detected
  • Stretch: Allow images to be uploaded when chatting

Tasks

Bugs

@ross-p-smith ross-p-smith added the enhancement New feature or request label Feb 27, 2024
@adamdougal adamdougal removed their assignment Mar 12, 2024
@ross-p-smith
Copy link
Collaborator Author

Reference here: - Azure-Samples/azure-search-openai-demo#1056

@adamdougal
Copy link
Collaborator

adamdougal commented Apr 22, 2024

Update 22nd April:

After spiking possible technology choices, I believe the best way forward is to:

  • Use Azure Computer Vision to generate embeddings of the image
  • Use GPT-4-vision to generate a description of the image and text-embeddings-ada-002 embed the description
  • Store both embedding vectors in the Azure AI Search index

Then when querying, generate embeddings of the question using both Azure Computer Vision and text-embeddings-ada-002.

Note: this does require us to change the index to allow for an additional imageEmbeddings field.

I was initially going to create an ADR deciding on which tools would be best to use, but given my research, spike and investigation on how this is implemented in Azure-Samples/azure-search-openai-demo#1056, I now believe using both appoaches combined will give the best results.

Next steps are to now start building this into CWYDSA

@adamdougal
Copy link
Collaborator

Update 23rd April:

  • The computer vision and gpt-4-vision model deployment resources are now being provisioned
  • This is applied if USE_GPT4_VISION=true
  • Unfortunately, gpt-4-vision does not support function calling, so this is an additional deployment alongside another model
  • Next steps are to allow images to be uploaded via the admin app
  • It looks like some images are already able to be uploaded and parsed, but computer vision supports additional file types that need to be handled

@cecheta
Copy link
Collaborator

cecheta commented May 28, 2024

Update: 28th May

The core tasks relating to this story have been completed, namely uploading images with advanced image processing, and querying data based on these images, passing these to the LLM.

There exist some outstanding tasks regarding updating the prompts to match include the images that are passed to the LLM, and also getting it to work with integrated vectorisation. However, it may be better to move these into their own issues, so this main epic can be closed.

@ross-p-smith @adamdougal @superhindupur

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request epic Large scope with many subtasks
Projects
None yet
Development

No branches or pull requests

5 participants