A simple notebook that uses the OCR capabilities of gpt-4o for natural-language retrieval of flowcharts and code blocks from a PDF.
- utils.py: contains all the modules for using the app.
- app.ipynb: the main notebook for running the modules.
- extracted_data: folder containing the code and images inside the PDF.
- image_desc_vectordb: chromadb vectorstore for descriptions of images generated by markitdown.
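
The idea behind retrieving an image or code block from its description can be sketched in a few lines of Python. This is only an illustrative stand-in, not the code in utils.py: it replaces the chromadb embedding search with a plain bag-of-words cosine similarity from the standard library, and the file names and descriptions below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, descriptions, top_k=1):
    """Return up to top_k item names whose description best matches the query.

    Stand-in for a vector store query: a real setup would compare
    embeddings instead of raw word counts.
    """
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(desc))), name)
        for name, desc in descriptions.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical descriptions of items extracted from a PDF.
descriptions = {
    "extracted_data/flowchart_1.png": "Flowchart showing the user login and authentication steps",
    "extracted_data/code_1.py": "Python code block that parses a CSV file into rows",
}

print(search("login flowchart", descriptions))
```

With a real vector store, word counts would be replaced by embeddings, so a query can match a description even when the two share no exact words.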