Local HTTP microservices for OCR and open-vocabulary UI detection.
computer-lab exposes EasyOCR and Grounding DINO through small Python servers, so automation systems can call vision models over HTTP instead of embedding those dependencies directly.
| Service | Default port | Endpoint | Returns |
|---|---|---|---|
| OCR | 9003 | `POST /ocr` | Text regions with `text`, `box`, `center`, `confidence` |
| Detection | 9004 | `POST /detect` | Grounded boxes with `label`, `box`, `center`, `confidence` |
Useful for:
- screenshot-to-text pipelines
- UI and desktop automation
- agent systems that need OCR or visual grounding behind a stable local API
Python 3.10+ is required. A GPU is recommended for usable detection latency; OCR can run on CPU.
```bash
git clone https://github.com/belarusian/computer-lab.git
cd computer-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip wheel
pip install -e ".[gpu]"
```

For a CPU-oriented install:

```bash
pip install -e ".[cpu]"
```

Run the services in separate terminals:

```bash
python servers/ocr_server.py 9003
python servers/detect_server.py 9004
```

Health checks:

```bash
curl -s http://127.0.0.1:9003/health
curl -s http://127.0.0.1:9004/health
```

The repo includes manual integration scripts rather than a full automated test suite.
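The same health check can be done from Python before issuing requests. A minimal sketch using only the standard library; `is_healthy` is a hypothetical helper, not part of the repo:

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the service at base_url answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```

Calling this against each service (for example `is_healthy("http://127.0.0.1:9003")`) before sending images avoids confusing errors while the models are still loading.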
```bash
export OCR_URL=http://127.0.0.1:9003/ocr
python test_ocr_locate.py

export DETECT_URL=http://127.0.0.1:9004/detect
python test_detect_locate.py "close button"
```

Both scripts capture the current screen with pyautogui, so they require a local desktop session.
- `POST /ocr`: request body is raw PNG bytes. Response: JSON list of `{text, box, center, confidence}`.
- `POST /detect`: request body is JSON `{ "image": "<base64 PNG>", "query": "..." }`. If `query` is omitted, the server runs a generic open detection pass.
- `POST /detect_raw`: request body is raw PNG bytes. Optional header: `X-Query`.
- `GET /health`: returns a simple JSON health response from each server.
- `EASYOCR_GPU=0` forces EasyOCR to stay on CPU.
- The detection server uses CUDA automatically when available and falls back to CPU otherwise.
MIT. See LICENSE.