A hub for various industry-specific schemas to be used with VLMs.
Updated Mar 16, 2025 - Python
Yet another self-hosted AI voice assistant: GlaDOS' blazing-fast pipeline with a more realistic Kokoro TTS voice and vision.
IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.
DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval Augmented Generation) workflows.