rows_vision
is an open-source API service that extracts structured data from visual content such as charts, receipts, and screenshots using vision-based classifiers and LLMs. It is built for fast local deployment and runs entirely in memory; no cloud storage is required.

Images are classified into one of the following types:
- 1: Line chart (single line)
- 2: Line chart (multiple lines)
- 3: Bar/column chart
- 4: Scatter plot
- 5: Pie or doughnut chart
- 6: Table
- 7: Receipt/Invoice
- 8: Other (e.g., infographic with extractable data)
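If client code needs to turn one of these numeric classes into a readable label, a simple lookup is enough. The mapping below is copied from the list above; the dictionary name is just an illustration, not part of the API.

```python
# Illustrative mapping of classifier IDs to labels (copied from the list above).
IMAGE_CLASSES = {
    1: "Line chart (single line)",
    2: "Line chart (multiple lines)",
    3: "Bar/column chart",
    4: "Scatter plot",
    5: "Pie or doughnut chart",
    6: "Table",
    7: "Receipt/Invoice",
    8: "Other (e.g., infographic with extractable data)",
}

print(IMAGE_CLASSES[3])  # Bar/column chart
```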
- Multi-Model AI Support: Choose from Anthropic Claude, OpenAI GPT-4, Google Gemini, or Groq models
- Chart Analysis: Extract data from line charts, bar charts, scatter plots, pie charts
- Table & Receipt Processing: Parse structured data from tables and receipts
- Flexible Input: Process images from URLs or local files
- In-Memory Processing: No cloud storage required - everything runs locally
- Docker Ready: Easy deployment with Docker containers
- Production Ready: Built-in health checks, logging, and error handling
- Performance Metrics: Optional timing information for monitoring
Upload the URL of a chart screenshot and receive structured JSON like:
{
  "result": [
    {"X": 130, "Y": "2000"},
    {"X": 90, "Y": "2005"},
    {"X": 30, "Y": "2010"},
    {"X": 25, "Y": "2015"},
    {"X": 60, "Y": "2020"},
    {"X": 110, "Y": "2025"}
  ]
}
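Each object in result is one data point read off the chart. In a client you might flatten the response into rows before loading it into a spreadsheet or CSV; a minimal sketch, assuming the response shape shown above:

```python
import json

# 'raw' stands in for the HTTP response body shown above.
raw = '{"result": [{"X": 130, "Y": "2000"}, {"X": 90, "Y": "2005"}]}'
data = json.loads(raw)

# Flatten the point objects into (Y, X) pairs, e.g. (year, value).
rows = [(point["Y"], point["X"]) for point in data["result"]]
print(rows)  # [('2000', 130), ('2005', 90)]
```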
# 1. Clone and setup
git clone https://github.com/rows/rows_vision.git
cd rows_vision
# 2. Run setup script
chmod +x setup.sh
./setup.sh
# 3. Add your API keys to .env
nano .env # Add at least one API key
# 4. Build and run with Docker
docker build -t rows-vision .
docker run -d --name rows-vision-api -p 8080:8080 --env-file .env rows-vision
# 5. Test the API (wait 30 seconds for startup)
sleep 30
curl http://localhost:8080/health
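The fixed sleep 30 is only a convenience; you can instead poll the /health endpoint until the container answers. A small sketch (endpoint and port from the quick start above; the retry count is arbitrary):

```python
import time
import urllib.request

# Poll /health until the container responds instead of sleeping a fixed 30 seconds.
for _ in range(30):
    try:
        with urllib.request.urlopen("http://localhost:8080/health", timeout=2) as resp:
            if resp.status == 200:
                print("API is up")
                break
    except OSError:
        pass  # connection refused / timeout while the container is still starting
    time.sleep(1)
else:
    raise RuntimeError("API did not become healthy in time")
```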
Linux/macOS:
# 1. Clone and setup
git clone https://github.com/rows/rows_vision.git
cd rows_vision
chmod +x setup.sh && ./setup.sh
# 2. Create virtual environment
python3.11 -m venv venv
source venv/bin/activate
# 3. Install dependencies and run
pip install -r requirements.txt
nano .env # Add API keys
python main.py
Windows (PowerShell):
# 1. Clone repository
git clone https://github.com/rows/rows_vision.git
cd rows_vision
# 2. Copy environment template
Copy-Item ".env.example" ".env"
# 3. Edit .env file with your API keys
notepad .env
# 4. Create and activate virtual environment
python -m venv venv
venv\Scripts\Activate.ps1
# 5. Install dependencies and run
pip install -r requirements.txt
python main.py
Windows (Git Bash - Alternative):
# If you have Git Bash installed, you can use the Linux/macOS commands:
chmod +x setup.sh
./setup.sh
# Then follow the Linux/macOS steps above
Why Docker? | Docker | Local Python |
---|---|---|
Setup Time | 5 minutes | 10-15 minutes |
Dependencies | Automatic | Manual |
Consistency | Same everywhere | "Works on my machine" |
Production Ready | Yes | Needs additional setup |
Create a .env file with your API credentials:
# Required: At least one AI API key
API_KEY_ANTHROPIC=sk-ant-your-key-here
API_KEY_OPENAI=sk-your-key-here
API_KEY_GEMINI=AIzaSy-your-key-here
API_KEY_GROQ=gsk_your-key-here
# Optional: Model Configuration
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
OPENAI_MODEL=gpt-4o
GEMINI_MODEL=gemini-2.0-flash
GROQ_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
# Optional: Server Settings
HOST=0.0.0.0
PORT=8080
DEBUG=false
LOG_LEVEL=INFO
MAX_FILE_SIZE=10485760 # 10MB
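The server reads these values from the environment at startup. To sanity-check which provider keys are actually set before launching, a quick script like this works (variable names taken from the template above; python-dotenv is an assumption here and is not necessarily part of the project's dependencies):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

# Load .env from the current directory and report which provider keys are present.
load_dotenv()
for name in ("API_KEY_ANTHROPIC", "API_KEY_OPENAI", "API_KEY_GEMINI", "API_KEY_GROQ"):
    print(f"{name}: {'set' if os.getenv(name) else 'missing'}")
```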
Model | Classification | Extraction | Notes |
---|---|---|---|
anthropic | ✅ | ✅ | Claude Sonnet, high accuracy |
openai | ✅ | ✅ | GPT-4o, good performance |
google | ✅ | ✅ | Gemini Flash, fast processing |
groq | ✅ | ✅ | Llama, cost-effective |
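Because classification and extraction are configured separately in each request, the two stages can in principle use different providers. A hedged sketch mixing a fast classifier with a stronger extractor (provider names from the table above; the URL is a placeholder):

```python
import requests

payload = {
    "image_url": "https://example.com/chart.png",  # placeholder URL
    # Provider names from the table above; the two stages are set independently.
    "model_classification": "groq",
    "model_extraction": "anthropic",
}
response = requests.post("http://localhost:8080/api/run", json=payload, timeout=120)
print(response.json())
```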
Process an image from a URL.
Request:
curl -X POST 'http://localhost:8080/api/run' \
--header 'Content-Type: application/json' \
--data '{
"image_url": "https://pbs.twimg.com/media/GoCeF4wbwAE24ln?format=jpg&name=large",
"model_classification": "anthropic",
"model_extraction": "anthropic",
"time_outputs": true
}'
Python Example:
import requests
url = "http://localhost:8080/api/run"
payload = {
    "image_url": "https://pbs.twimg.com/media/GoCeF4wbwAE24ln?format=jpg&name=large",
    "model_classification": "anthropic",
    "model_extraction": "anthropic",
    "time_outputs": True
}
response = requests.post(url, json=payload)
print(response.json())
Response:
{
  "result": [
    {"Month": "January", "Sales": 1000, "Profit": 200}
  ],
  "metrics": {
    "total_time": 2.345
  }
}
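When time_outputs is true, the response carries a metrics object alongside result, so a client can log timing separately from the data. A minimal sketch, assuming the shape shown above:

```python
# 'body' stands in for response.json() from the Python example above.
body = {
    "result": [{"Month": "January", "Sales": 1000, "Profit": 200}],
    "metrics": {"total_time": 2.345},
}

rows = body["result"]
elapsed = body.get("metrics", {}).get("total_time")
print(f"extracted {len(rows)} row(s) in {elapsed:.3f}s")
```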
Process an image from a URL or a local file path.
{
  "image_url": "https://example.com/chart.png",
  // OR
  "file_path": "/path/to/local/image.jpg",
  "model_classification": "anthropic",
  "model_extraction": "anthropic"
}
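A minimal Python sketch for the file_path variant, assuming the request shape above and that the path is readable by the server process (when running in Docker, the file has to be mounted into the container):

```python
import requests

payload = {
    # The path must be visible to the server process, not just to the client.
    "file_path": "/path/to/local/image.jpg",
    "model_classification": "anthropic",
    "model_extraction": "anthropic",
}
response = requests.post("http://localhost:8080/api/run", json=payload, timeout=120)
response.raise_for_status()
print(response.json())
```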
Create a docker-compose.yml file:
version: '3.8'
services:
  rows-vision:
    build: .
    container_name: rows-vision-api
    ports:
      - "8080:8080"
    env_file:
      - .env
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
docker-compose up -d
Google Cloud Run:
gcloud run deploy rows-vision --source . --platform managed --allow-unauthenticated
AWS ECS / Digital Ocean / Others: Use the Docker image built above with your preferred container orchestration platform.
# Using Gunicorn (production WSGI server)
pip install gunicorn
gunicorn --bind 0.0.0.0:8080 --workers 4 --timeout 120 main:app
# Health check
curl http://localhost:8080/health
# Docker container status
docker ps
docker logs rows-vision-api --tail 50 -f
# Resource monitoring
docker stats rows-vision-api
Supported Formats: PNG, JPG, JPEG, GIF, WEBP, HEIC
Chart Types: Line, Bar, Scatter, Pie, Tables, Receipts
Processing: In-memory, no file storage required
Architecture: Flask API + AI model backends
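A client can pre-check that a file's extension is one of the supported formats before sending it; the allowed set below is copied from the list above.

```python
from pathlib import Path

# Supported formats, as listed above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".heic"}

def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a format the API accepts."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_image("chart.webp"))   # True
print(is_supported_image("report.docx"))  # False
```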
rows_vision/
├── src/                    # Source code
│   ├── main.py             # Flask application
│   ├── config.py           # Configuration
│   ├── image_analyzer.py   # Data extraction
│   ├── image_classifier.py # Image classification
│   └── rows_vision.py      # Main orchestrator
├── prompts/                # AI prompt templates
├── main.py                 # Application entry point
├── requirements.txt        # Dependencies
├── Dockerfile              # Container definition
├── setup.sh                # Automated setup script
└── .env.example            # Environment template
- Support user prompt for finer operations
- Improve error handling ✅ Done
- Docker deployment ✅ Done
- Production-ready logging ✅ Done
- Support for batch processing
- PDF processing improvements
Missing API Keys:
# Check if keys are loaded
docker run --env-file .env rows-vision python -c "import os; print('Keys loaded:', bool(os.getenv('API_KEY_ANTHROPIC')))"
Container Issues:
# Check logs
docker logs rows-vision-api
# Debug mode
docker run -it --env-file .env -e DEBUG=true rows-vision
Port Conflicts:
# Use different port
docker run -d -p 8081:8080 --env-file .env rows-vision
This project is licensed under the MIT License.
PRs and issues are welcome. Please fork the repo and submit changes via pull request.
Created by @asamagaio at Rows.