GPT-4 Vision supports functions now #19

Closed
simonw opened this issue Apr 9, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@simonw
Collaborator

simonw commented Apr 9, 2024

https://twitter.com/OpenAIDevs/status/1777769463258988634

GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling.
https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4

simonw added the enhancement label on Apr 9, 2024
@simonw
Collaborator Author

simonw commented Apr 9, 2024

This means I can get rid of this horrible hack:

async def ocr_image(image_bytes):
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    messages = [
        {
            "role": "system",
            "content": "Run OCR and return all of the text in this image, with newlines where appropriate",
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                }
            ],
        },
    ]
    response = await async_client.chat.completions.create(
        model="gpt-4-vision-preview", messages=messages, max_tokens=400
    )
    return response.choices[0].message.content

try:
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    if content:
        messages.append({"role": "user", "content": content})
    if image_is_provided(image):
        # Run a separate thing to OCR the image first, because gpt-4-vision can't handle tools yet
        image_content = await ocr_image(await image.read())
        if image_content:
            messages.append({"role": "user", "content": image_content})
        else:
            raise ValueError("Could not extract text from image")
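
For comparison, here is a rough sketch of what a single-request version could look like once vision requests accept tools. This is not the code from the commit that closed this issue. `build_messages` and `extract_with_tools` are hypothetical names, `async_client` is assumed to be an `openai.AsyncOpenAI` instance, and `tools` is assumed to be a list of tool definitions in the Chat Completions format.

```python
import base64


def build_messages(instructions=None, content=None, image_bytes=None,
                   mime_type="image/jpeg"):
    """Build one chat request that carries the text and the image together."""
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    user_parts = []
    if content:
        user_parts.append({"type": "text", "text": content})
    if image_bytes:
        # Same data-URL encoding as the OCR hack, but attached to the main request
        b64 = base64.b64encode(image_bytes).decode("utf-8")
        user_parts.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{b64}"},
            }
        )
    if user_parts:
        messages.append({"role": "user", "content": user_parts})
    return messages


async def extract_with_tools(async_client, tools, **kwargs):
    # Assumption: async_client is an openai.AsyncOpenAI instance and the
    # "gpt-4-turbo" model accepts images and tools in the same request.
    response = await async_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=build_messages(**kwargs),
        tools=tools,
        max_tokens=400,
    )
    return response.choices[0].message
```

With this shape there is no separate OCR round trip, so the model sees the original image when it fills in the tool arguments rather than a text transcription of it.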

@simonw
Collaborator Author

simonw commented Apr 9, 2024

It worked against my test image:

[Image: comedy-luau]

[
  {
    "event_title": "Coastside Comedy Luau",
    "event_description": "Comedy event featuring Laurie Kilmartin, Ryan Goodcase, and Phil Griffiths, hosted by Marcus D. Includes Hawaiian buffet and welcome cocktail. Proceeds benefit Wilkinson School and Coastside Hope.",
    "event_date": "2022-05-06",
    "start_time": "18:00",
    "end_time": "22:00"
  }
]
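
A tool definition that could yield output in this shape might look like the sketch below. Only the five field names are taken from the JSON above. The tool name `save_events`, the wrapping `events` array, the descriptions, and the `parse_events` helper are all assumptions made for illustration, not the project's actual schema. The helper also checks the date and time formats, since the model is not guaranteed to follow the format hints.

```python
import json
from datetime import datetime

# Hypothetical tool definition; only the five field names come from the output above.
EVENTS_TOOL = {
    "type": "function",
    "function": {
        "name": "save_events",  # hypothetical name
        "description": "Record the events found in the text or image",
        "parameters": {
            "type": "object",
            "properties": {
                "events": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "event_title": {"type": "string"},
                            "event_description": {"type": "string"},
                            "event_date": {"type": "string", "description": "YYYY-MM-DD"},
                            "start_time": {"type": "string", "description": "HH:MM, 24-hour"},
                            "end_time": {"type": "string", "description": "HH:MM, 24-hour"},
                        },
                        "required": ["event_title", "event_date"],
                    },
                }
            },
            "required": ["events"],
        },
    },
}


def parse_events(arguments_json):
    """Decode tool-call arguments and reject malformed dates or times."""
    events = json.loads(arguments_json)["events"]
    for event in events:
        # strptime raises ValueError if the string does not match the format
        datetime.strptime(event["event_date"], "%Y-%m-%d")
        for key in ("start_time", "end_time"):
            if event.get(key):
                datetime.strptime(event[key], "%H:%M")
    return events
```

The string passed to `parse_events` would be the `arguments` field of a returned tool call. A format check like this catches malformed values, but it cannot tell whether a well-formed date matches what is printed in the image.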

[Screenshot: CleanShot 2024-04-09 at 12 08 25@2x]

simonw closed this as completed in bf3a67e on Apr 9, 2024
simonw added a commit that referenced this issue Apr 9, 2024
@simonw
Collaborator Author

simonw commented Apr 9, 2024

@brianjking

Absolutely killing it, thank you, @simonw -- I've noticed some weird date issues with GPT-4 vision, wonder if that's what you saw in your demo video.

2 participants