This is the repository for the LinkedIn Learning course OpenAI API: Multimodal development with GPT-4o. The full course is available from LinkedIn Learning.
In this hands-on course, you'll use the OpenAI API to leverage the multimodal capabilities of GPT-4o and function calling to extract text from images, conform the data to JSON, and call functions to save the extracted data to a spreadsheet.
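The core of that workflow is a multimodal chat request: a text instruction plus an inline image, with the model asked to respond in JSON. Here is a minimal sketch using the OpenAI Python SDK; the function names, prompt wording, and receipt filename are illustrative, not taken from the exercise files.

```python
import base64
from pathlib import Path

def encode_image(path):
    """Base64-encode a receipt image so it can be sent as a data URL."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")

def build_messages(image_b64, prompt="Extract the vendor, date, and total from this receipt as JSON."):
    """Build a multimodal chat message: text instruction plus inline image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }]

# The request itself requires an OPENAI_API_KEY, so it is shown but not run here:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_messages(encode_image("data/receipt-01.jpg")),  # hypothetical filename
#     response_format={"type": "json_object"},  # ask the model for JSON output
# )
# print(response.choices[0].message.content)
```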
See the readme file in the main branch for updated instructions and information.
This repository holds example data and two Jupyter Notebooks:
- `data/` holds a collection of images of random receipts and one wild-card.
- `expenses.csv` is the target CSV. At init, the CSV only holds column headings.
- `gp4o-setup.py` demonstrates how to access GPT-4o for multimodal prompting.
- `modular-process.py` and the module files in `utils/` demonstrate a comprehensive process of ingesting and interpreting multiple receipts and sending the data to a CSV file.
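The last step of that process, appending an extracted record to the target CSV, can be sketched with the standard library. The column names below are illustrative assumptions; match them to the actual headings in `expenses.csv`.

```python
import csv

def append_expense(row, csv_path="expenses.csv"):
    """Append one extracted receipt record to the target CSV.

    Assumes the CSV already has its heading row, as described above,
    and that `row` is a dict keyed by those column names.
    """
    fieldnames = ["vendor", "date", "total"]  # hypothetical headings
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writerow(row)
```

Opening the file in append mode (`"a"`) means each processed receipt adds one row without disturbing the headings or earlier records.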
The first time you run a block in a Jupyter Notebook, you will be asked to pick an environment. Follow the instructions and pick the first available Python environment.
NOTE: The first code block may take a while to run because the environment has to load first.
It is recommended you run these exercise files in GitHub Codespaces. This gives you a pre-configured Python environment in which the Jupyter Notebooks can run. To use the exercise files, follow these steps:
- In the root folder, rename the file `env-template` to `.env`.
- Go to https://platform.openai.com/api-keys.
- Generate a new key and copy it to your clipboard.
- In `.env`, add the key without quotes or parentheses.
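The exercise files can then read the key from `.env` at runtime. Projects typically use the third-party `python-dotenv` package for this; as a dependency-free sketch, a minimal loader for simple `KEY=value` lines (the variable name `OPENAI_API_KEY` is the conventional one the OpenAI SDK looks for) might look like this:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: reads KEY=value lines, skipping blanks and comments.

    A simplified stand-in for python-dotenv's load_dotenv(); it does not
    handle quoting or variable expansion.
    """
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

# Usage sketch: make the key visible to the OpenAI SDK via the environment.
# os.environ.update(load_env())
# from openai import OpenAI
# client = OpenAI()  # picks up OPENAI_API_KEY automatically
```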