Skip to content

DeanAmiridis/VisionCSV-Local_AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VisionCSV (Local Vision Model via Ollama)

A lightweight Python script that uses a local multimodal AI model through Ollama to extract table data from screenshots or table images and save the result as a .csv file.

This project is useful when you have:

  • screenshots of spreadsheets or reports
  • table data trapped in images
  • a need to convert visual table data into CSV quickly
  • a preference for local processing instead of cloud OCR tools

What the script does

The script:

  1. Accepts an image file path as input
  2. Sends the image to a local Ollama vision model
  3. Prompts the model to return only raw CSV output
  4. Cleans up markdown code blocks if the model adds them anyway
  5. Saves the extracted result as a .csv file using the same base filename as the image

Example

If you run the script against:

sales-report.png

it will create:

sales-report.csv

Current model

The script is currently configured to use:

qwen2.5vl:7b

You can change that in the script by editing the MODEL variable inside extract_csv_from_image().

Requirements

System requirements

  • Python 3.9 or newer recommended
  • Ollama installed and running locally
  • A local vision-capable model pulled into Ollama

Python package

This script requires:

  • ollama

Installation

1. Clone the repository

git clone https://github.com/DeanAmiridis/VisionCSV-Local_AI.git
cd VisionCSV-Local_AI

2. Create and activate a virtual environment (recommended)

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate

Windows PowerShell

python -m venv .venv
.\.venv\Scripts\Activate.ps1

3. Install the Python dependency

pip install ollama

4. Install Ollama

Download and install Ollama from its official site.

Then verify it is available:

ollama --version

5. Pull the required model

ollama pull qwen2.5vl:7b

Usage

Run the script by passing an image file as the argument:

python main.py /path/to/your/screenshot.png

Example

python main.py ./examples/table.png

If successful, the script will create a CSV file in the current working directory.

Supported input

The script is intended for image files containing tables, such as:

  • .png
  • .jpg
  • .jpeg
  • other image formats supported by your Ollama model setup

Best results usually come from:

  • clean screenshots
  • clear column headers
  • minimal blur or compression
  • tables with visible row and column separation

Output behavior

  • Output filename is based on the image filename
  • The CSV file is written to the current working directory
  • The model is instructed to return raw CSV only
  • The script strips markdown CSV code fences if the model ignores instructions

How it works

The extraction flow is simple:

  • ollama.chat() sends the image and prompt to the selected local model
  • the prompt asks for a strict 1:1 CSV representation of the table
  • the response is cleaned with a regex to remove ```csv code fences
  • the cleaned content is saved directly to a .csv file

Notes and limitations

This is an AI-based extraction workflow, not a strict parser. That means:

  • accuracy depends on image quality
  • merged cells or unusual table layouts may not convert perfectly
  • very dense or low-resolution tables may require manual cleanup afterward
  • model choice matters a lot

For more reliable output, use:

  • high-resolution screenshots
  • cropped images containing only the table
  • simple, well-structured tables

Error handling

The script currently prints a basic error message if something fails, such as:

  • Ollama not running
  • the model not being installed
  • invalid image path
  • unsupported or unreadable file

Known issue

The script's built-in usage message currently says:

python main.py <screenshot.png>

But the actual uploaded filename is:

main.py

Project structure

.
├── main.py
└── README.md

About

Local AI engine to convert screenshots into 1:1 CSV files. Uses Qwen2.5-VL via Ollama for high-precision table reconstruction—no cloud, no data leaks, no manual entry.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages