Include GPT-4 V model to be able to search for images and embedding images. #323

ross-p-smith · 2024-02-22T22:54:27Z

Motivation

Company data often comprises various types of images, including screenshots, maps, and diagrams. By enabling the chat admin app to ingest and process these images, it can provide more accurate and relevant responses to user queries that involve visual data. This ensures that the chat app can fully utilise all available company data to deliver an improved user experience.

Note: Image processing is only available using GPT-4 https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-preview

How would you feel if this feature request was implemented?

Requirements

Ensure existing application works correctly with GPT-4
Allow images to be uploaded via the Admin application
When "Reprocess all" is clicked via the Admin app, reprocess the images
When a question is asked, image data should be searched and passed to gpt-4-vision to generate a response
Citations should link to the image stored in blob storage
Stretch: Fallback to OCR/document intelligence if image of a document detected
Stretch: Allow images to be uploaded when chatting

Tasks

Bugs

Unable to view images in Explore Data tab in Admin app #929


ross-p-smith · 2024-03-25T20:53:42Z

Reference here: Azure-Samples/azure-search-openai-demo#1056

adamdougal · 2024-04-22T07:56:22Z

Update 22nd April:

After spiking possible technology choices, I believe the best way forward is to:

Use Azure Computer Vision to generate embeddings of the image
Use GPT-4-vision to generate a description of the image and text-embedding-ada-002 to embed the description
Store both embedding vectors in the Azure AI Search index

Then, when querying, generate embeddings of the question using both Azure Computer Vision and text-embedding-ada-002.

Note: this does require us to change the index to allow for an additional imageEmbeddings field.
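To make the two-vector idea concrete, here is a minimal in-memory sketch in Python. It is not the CWYDSA implementation: the field names `content_vector` and `image_vector`, the helper names, and the two embedder callables are all hypothetical stand-ins, and the two per-field rankings are merged with reciprocal rank fusion purely as an illustrative way to combine them.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_field(docs: List[dict], field: str, query_vector: Vector) -> List[str]:
    """Return document ids ordered by similarity of one vector field."""
    ordered = sorted(docs, key=lambda d: cosine(d[field], query_vector), reverse=True)
    return [d["id"] for d in ordered]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several rankings; each appearance adds 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def search(
    docs: List[dict],
    question: str,
    embed_text: Callable[[str], Vector],         # stand-in for the text embedding model
    embed_image_space: Callable[[str], Vector],  # stand-in for the Computer Vision embedder
) -> List[str]:
    """Embed the question twice and query both (hypothetical) vector fields."""
    text_ranking = rank_by_field(docs, "content_vector", embed_text(question))
    image_ranking = rank_by_field(docs, "image_vector", embed_image_space(question))
    return reciprocal_rank_fusion([text_ranking, image_ranking])
```

In a real index, each image document would carry both vectors (one from the image itself, one from its generated description), and the search service would run the two vector queries rather than this toy loop.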

I was initially going to create an ADR deciding on which tools would be best to use, but given my research, spike and investigation on how this is implemented in Azure-Samples/azure-search-openai-demo#1056, I now believe using both approaches combined will give the best results.

Next steps are to now start building this into CWYDSA

adamdougal · 2024-04-23T08:26:01Z

Update 23rd April:

The computer vision and gpt-4-vision model deployment resources are now being provisioned
This is applied if USE_GPT4_VISION=true
Unfortunately, gpt-4-vision does not support function calling, so this is an additional deployment alongside another model
Next steps are to allow images to be uploaded via the admin app
It looks like some images are already able to be uploaded and parsed, but computer vision supports additional file types that need to be handled
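As a rough illustration of the flag and the separate deployment described above, the following Python sketch shows one way the choice could be made at request time. Only `USE_GPT4_VISION` comes from this thread; the `VISION_DEPLOYMENT` and `CHAT_DEPLOYMENT` variable names, their defaults, and the function names are assumptions made for the example.

```python
import os
from typing import Mapping


def vision_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when USE_GPT4_VISION is set to 'true' (case-insensitive)."""
    return env.get("USE_GPT4_VISION", "false").strip().lower() == "true"


def pick_deployment(has_images: bool, env: Mapping[str, str] = os.environ) -> str:
    """Choose which model deployment should answer a request.

    The vision model is used only for requests that carry images, because it
    does not support function calling; everything else goes to the regular
    chat deployment. Both variable names below are hypothetical.
    """
    if has_images and vision_enabled(env):
        return env.get("VISION_DEPLOYMENT", "gpt-4-vision")
    return env.get("CHAT_DEPLOYMENT", "gpt-4")
```

Passing the environment in as a mapping keeps the logic easy to test without touching real process variables.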

cecheta · 2024-05-28T10:08:18Z

Update: 28th May

The core tasks relating to this story have been completed, namely uploading images with advanced image processing, and querying data based on these images, passing these to the LLM.

There are some outstanding tasks: updating the prompts to include the images that are passed to the LLM, and getting this to work with integrated vectorisation. However, it may be better to move these into their own issues so that this main epic can be closed.

@ross-p-smith @adamdougal @superhindupur

ross-p-smith assigned adamdougal Feb 23, 2024

ross-p-smith added the enhancement New feature or request label Feb 27, 2024

adamdougal removed their assignment Mar 12, 2024

adamdougal self-assigned this Apr 18, 2024

adamdougal mentioned this issue Apr 19, 2024

Spike possible image data usage implementations #713

Closed

1 task

This was referenced Apr 22, 2024

Provision AI services for image data #715

Closed

Add spike docs for using computer vision #714

Merged

Allow images to be uploaded via push model #728

Closed

adamdougal added the epic Large scope with many subtasks label May 7, 2024

cecheta mentioned this issue May 7, 2024

Semantic Kernel plugin #320

Closed

5 tasks

adamdougal assigned cecheta and superhindupur and unassigned adamdougal May 9, 2024

adamdougal assigned frtibble and unassigned superhindupur May 20, 2024

cecheta mentioned this issue May 22, 2024

Include image citations in prompt/response #964

Open

liammoat mentioned this issue May 28, 2024

Update post answering prompt to include images #993

Open

1 task

cecheta unassigned frtibble and cecheta May 28, 2024
