GPT-4 Vision supports functions now #19

simonw · 2024-04-09T19:07:54Z

https://twitter.com/OpenAIDevs/status/1777769463258988634

GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling.
https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4

simonw · 2024-04-09T19:08:16Z

This means I can get rid of this horrible hack:

datasette-extract/datasette_extract/__init__.py

Lines 284 to 318 in 7429965

    
    async def ocr_image(image_bytes):
        base64_image = base64.b64encode(image_bytes).decode("utf-8")
        messages = [
            {
                "role": "system",
                "content": "Run OCR and return all of the text in this image, with newlines where appropriate",
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    }
                ],
            },
        ]
        response = await async_client.chat.completions.create(
            model="gpt-4-vision-preview", messages=messages, max_tokens=400
        )
        return response.choices[0].message.content

    try:
        messages = []
        if instructions:
            messages.append({"role": "system", "content": instructions})
        if content:
            messages.append({"role": "user", "content": content})
        if image_is_provided(image):
            # Run a separate thing to OCR the image first, because gpt-4-vision can't handle tools yet
            image_content = await ocr_image(await image.read())
            if image_content:
                messages.append({"role": "user", "content": image_content})
            else:
                raise ValueError("Could not extract text from image")
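
With function calling available on vision requests, the separate OCR pass should no longer be needed: the image can go into the same request that carries the tool definition. A minimal sketch of how that request might be assembled, not necessarily what the fix in this repository does. The helper name `build_vision_tool_request`, the tool name `extract_data`, and the model name `gpt-4-turbo` are assumptions for illustration.

```python
import base64


def build_vision_tool_request(image_bytes, instructions=None, content=None, schema=None):
    """Build keyword arguments for chat.completions.create (hypothetical helper).

    The image is attached directly as a data URL next to the tool definition,
    so no intermediate OCR call is made.
    """
    base64_image = base64.b64encode(image_bytes).decode("utf-8")

    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    if content:
        messages.append({"role": "user", "content": content})
    # Same image_url message shape as the OCR hack above
    messages.append(
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                }
            ],
        }
    )

    tool = {
        "type": "function",
        "function": {
            "name": "extract_data",  # assumed name, not taken from the repository
            "description": "Extract structured records from the provided input",
            # JSON Schema describing the records to extract
            "parameters": schema or {"type": "object", "properties": {}},
        },
    }

    return {
        "model": "gpt-4-turbo",  # assumed model name
        "messages": messages,
        "tools": [tool],
        # Ask the model to call the extraction function rather than reply in prose
        "tool_choice": {"type": "function", "function": {"name": "extract_data"}},
    }
```

The returned dictionary could then be passed along the lines of `await async_client.chat.completions.create(**request)`, reusing the `async_client` from the excerpt above.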

simonw · 2024-04-09T19:10:37Z

It worked against my test image:

[
  {
    "event_title": "Coastside Comedy Luau",
    "event_description": "Comedy event featuring Laurie Kilmartin, Ryan Goodcase, and Phil Griffiths, hosted by Marcus D. Includes Hawaiian buffet and welcome cocktail. Proceeds benefit Wilkinson School and Coastside Hope.",
    "event_date": "2022-05-06",
    "start_time": "18:00",
    "end_time": "22:00"
  }
]
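
A quick sanity check on output like this can catch malformed dates or times before they are stored. The sketch below assumes the field names and formats seen in the sample above (`YYYY-MM-DD` dates, `HH:MM` times); `check_event` is a hypothetical helper, not part of datasette-extract.

```python
from datetime import datetime

# Expected formats, inferred from the sample output above (an assumption)
EXPECTED_FORMATS = {
    "event_date": "%Y-%m-%d",
    "start_time": "%H:%M",
    "end_time": "%H:%M",
}


def check_event(record):
    """Return a list of problems found in one extracted record."""
    problems = []
    for key, fmt in EXPECTED_FORMATS.items():
        value = record.get(key)
        try:
            datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            # TypeError covers missing or non-string values,
            # ValueError covers strings that do not parse
            problems.append(f"{key}: {value!r} does not parse as {fmt}")
    return problems
```

A record that returns an empty list parsed cleanly; anything else is worth inspecting by hand.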

simonw · 2024-04-09T22:12:27Z

Video demo: https://www.youtube.com/watch?v=g3NtJatmQR0

brianjking · 2024-04-10T01:55:56Z

Absolutely killing it, thank you, @simonw. I've noticed some weird date issues with GPT-4 Vision; I wonder if that's what you saw in your demo video.

simonw added the enhancement label Apr 9, 2024

simonw closed this as completed in bf3a67e Apr 9, 2024

simonw added a commit that referenced this issue Apr 9, 2024

Release 0.1a4

81e3819

Refs #10, #19, #20

simonw added a commit that referenced this issue Apr 9, 2024

Updated README to reflect function support for Vision

d26e022

Refs #19
