A lightweight Python script that uses a local multimodal AI model through Ollama to extract table data from screenshots or table images and save the result as a .csv file.
This project is useful when you have:
- screenshots of spreadsheets or reports
- table data trapped in images
- a need to convert visual table data into CSV quickly
- a preference for local processing instead of cloud OCR tools
The script:
- Accepts an image file path as input
- Sends the image to a local Ollama vision model
- Prompts the model to return only raw CSV output
- Cleans up markdown code blocks if the model adds them anyway
- Saves the extracted result as a
.csvfile using the same base filename as the image
If you run the script against:
sales-report.png
it will create:
sales-report.csv
The script is currently configured to use:
qwen2.5vl:7bYou can change that in the script by editing the MODEL variable inside extract_csv_from_image().
- Python 3.9 or newer recommended
- Ollama installed and running locally
- A local vision-capable model pulled into Ollama
This script requires:
ollama
git clone https://github.com/DeanAmiridis/VisionCSV-Local_AI.git
cd VisionCSV-Local_AIpython3 -m venv .venv
source .venv/bin/activatepython -m venv .venv
.\.venv\Scripts\Activate.ps1pip install ollamaDownload and install Ollama from its official site.
Then verify it is available:
ollama --versionollama pull qwen2.5vl:7bRun the script by passing an image file as the argument:
python main.py /path/to/your/screenshot.pngpython main.py ./examples/table.pngIf successful, the script will create a CSV file in the current working directory.
The script is intended for image files containing tables, such as:
.png.jpg.jpeg- other image formats supported by your Ollama model setup
Best results usually come from:
- clean screenshots
- clear column headers
- minimal blur or compression
- tables with visible row and column separation
- Output filename is based on the image filename
- The CSV file is written to the current working directory
- The model is instructed to return raw CSV only
- The script strips markdown CSV code fences if the model ignores instructions
The extraction flow is simple:
ollama.chat()sends the image and prompt to the selected local model- the prompt asks for a strict 1:1 CSV representation of the table
- the response is cleaned with a regex to remove ```csv code fences
- the cleaned content is saved directly to a
.csvfile
This is an AI-based extraction workflow, not a strict parser. That means:
- accuracy depends on image quality
- merged cells or unusual table layouts may not convert perfectly
- very dense or low-resolution tables may require manual cleanup afterward
- model choice matters a lot
For more reliable output, use:
- high-resolution screenshots
- cropped images containing only the table
- simple, well-structured tables
The script currently prints a basic error message if something fails, such as:
- Ollama not running
- the model not being installed
- invalid image path
- unsupported or unreadable file
The script's built-in usage message currently says:
python main.py <screenshot.png>
But the actual uploaded filename is:
main.py
.
├── main.py
└── README.md