A Python-based OCR tool leveraging the Llama 3.2-Vision model for highly accurate text recognition from images, preserving original formatting and structure.
- 🚀 High Accuracy: Text recognition powered by the Llama 3.2-Vision model.
- 📝 Preserves Formatting: Maintains the original structure and layout of the recognized text.
- 🖼️ Wide Format Support: Works with image formats such as
.jpg,.jpeg, and.png. - ⚡️ Customizable Output: Returns results in either Markdown or JSON format.
- 💪 Robust Error Handling: Ensures smooth processing with clear error messages for unsupported formats or invalid configurations.
- Python 3.8 or higher
- Ollama Server running locally
- Llama 3.2-Vision model installed
- Ensure the Ollama server is running before using the tool.
- Download and configure the Llama 3.2-Vision model for OCR tasks.
ollama pull llama3.2-vision
pip install ollamaocr-python
Basic Usage
from ollamaocr_python.ollamaocr import OllamaOCR
# Initialize the OCR tool
ocr = OllamaOCR()
# Perform OCR in Markdown format
markdown_result = ocr.perform_ocr("path/to/image.jpg", output_format="markdown")
print(markdown_result)
# Perform OCR in JSON format
json_result = ocr.perform_ocr("path/to/image.jpg", output_format="json")
print(json_result)
The class provides comprehensive error handling for unsupported formats or invalid configurations:
from ollamaocr_python.ollamaocr import OllamaOCR
ocr = OllamaOCR()
try:
result = ocr.perform_ocr("invalid_file.bmp", output_format="markdown")
except ValueError as e:
print(f"Error: {e}")
Modify the prompts used for OCR to suit specific requirements:
- Markdown Prompt: Preserves formatting in Markdown structure.
- JSON Prompt: Outputs results in JSON format.
Currently supports only .jpg, .jpeg, and .png image formats. Requires the Ollama server to be running locally with the Llama 3.2-Vision model installed.
This project is licensed under the MIT License.
