Ollama OCR

A Python-based OCR tool leveraging the Llama 3.2-Vision model for highly accurate text recognition from images, preserving original formatting and structure.

Features

🚀 High Accuracy: Text recognition powered by the Llama 3.2-Vision model.
📝 Preserves Formatting: Maintains the original structure and layout of the recognized text.
🖼️ Wide Format Support: Works with image formats such as .jpg, .jpeg, and .png.
⚡️ Customizable Output: Returns results in either Markdown or JSON format.
💪 Robust Error Handling: Ensures smooth processing with clear error messages for unsupported formats or invalid configurations.

System Requirements

Python 3.8 or higher
Ollama Server running locally
Llama 3.2-Vision model installed

Prerequisites

Ensure the Ollama server is running before using the tool.
Download the Llama 3.2-Vision model used for OCR tasks:
```
ollama pull llama3.2-vision
```
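
Both prerequisites can also be checked from Python before running OCR. The following is a minimal sketch, not part of this package: it assumes Ollama listens on its default address (`http://localhost:11434`) and uses the server's `/api/tags` endpoint, which lists installed models. The helper names `model_installed` and `check_ollama` are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434"


def model_installed(tags_payload, model="llama3.2-vision"):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Installed names carry a tag suffix such as ":latest", so compare base names.
    return any(name.split(":")[0] == model for name in names)


def check_ollama(base_url=OLLAMA_URL, model="llama3.2-vision", timeout=5):
    """Return (ok, message) describing whether the server and model are ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return False, f"Ollama server not reachable at {base_url}: {exc}"
    if not model_installed(payload, model):
        return False, f"Model '{model}' not found; run: ollama pull {model}"
    return True, "Ollama server and model are ready"
```

Calling `check_ollama()` returns a `(bool, str)` pair, so a script can print the message and exit early when the first element is `False`.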

Installation

pip install ollamaocr-python

Usage

Basic Usage

from ollamaocr_python.ollamaocr import OllamaOCR

# Initialize the OCR tool
ocr = OllamaOCR()

# Perform OCR in Markdown format
markdown_result = ocr.perform_ocr("path/to/image.jpg", output_format="markdown")
print(markdown_result)

# Perform OCR in JSON format
json_result = ocr.perform_ocr("path/to/image.jpg", output_format="json")
print(json_result)
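
To process a whole folder rather than a single file, the `perform_ocr` call shown above can be wrapped in a loop. This is a sketch under assumptions: `ocr_directory` is a hypothetical helper, not part of the package, and it filters on the three extensions this README lists as supported.

```python
from pathlib import Path

# Formats listed as supported in this README.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def ocr_directory(ocr, folder, output_format="markdown"):
    """Run OCR on every supported image in `folder`; return {filename: result}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files with unsupported extensions.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            results[path.name] = ocr.perform_ocr(str(path), output_format=output_format)
    return results
```

With an `OllamaOCR` instance named `ocr`, `ocr_directory(ocr, "scans/", "json")` would return one result per supported image, keyed by file name.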

Error Handling

The class raises an exception for unsupported image formats or invalid configurations. The example below catches the ValueError raised for an unsupported .bmp file:

from ollamaocr_python.ollamaocr import OllamaOCR

ocr = OllamaOCR()

try:
    result = ocr.perform_ocr("invalid_file.bmp", output_format="markdown")
except ValueError as e:
    print(f"Error: {e}")
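
Some problems can be caught before the model is called at all. The sketch below mirrors the constraints documented in this README (three image extensions, two output formats). `validate_request` is a hypothetical helper, and the package's own checks and error messages may differ.

```python
from pathlib import Path

# Constraints taken from this README; the package may enforce them differently.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
SUPPORTED_OUTPUTS = {"markdown", "json"}


def validate_request(image_path, output_format):
    """Raise ValueError or FileNotFoundError if the request cannot succeed."""
    path = Path(image_path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported image format '{suffix}'; "
            f"expected one of {sorted(SUPPORTED_EXTENSIONS)}"
        )
    if output_format not in SUPPORTED_OUTPUTS:
        raise ValueError(
            f"Unsupported output format '{output_format}'; "
            f"expected one of {sorted(SUPPORTED_OUTPUTS)}"
        )
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
```

Running this check first keeps cheap input mistakes separate from failures that come from the Ollama server or the model.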

Customizable Prompts

Modify the prompts used for OCR to suit specific requirements:

Markdown Prompt: Preserves formatting in Markdown structure.
JSON Prompt: Outputs results in JSON format.
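
To illustrate what a format-specific prompt might look like, the sketch below builds a request body for Ollama's `/api/generate` endpoint, which accepts a prompt together with base64-encoded images. The prompt texts and the `build_generate_payload` helper are assumptions for illustration only; they do not show how this package stores or exposes its prompts.

```python
import base64

# Hypothetical prompt texts; the package's real prompts may be worded differently.
PROMPTS = {
    "markdown": (
        "Extract all text from this image and return it as Markdown, "
        "preserving headings, lists, and tables."
    ),
    "json": (
        "Extract all text from this image and return it as a JSON object "
        "that reflects the document's structure."
    ),
}


def build_generate_payload(image_bytes, output_format="markdown",
                           model="llama3.2-vision", prompt=None):
    """Build a request body for Ollama's /api/generate endpoint."""
    if prompt is None:
        if output_format not in PROMPTS:
            raise ValueError(f"No prompt defined for output format '{output_format}'")
        prompt = PROMPTS[output_format]
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete response instead of a stream of chunks.
        "stream": False,
    }
```

Passing a custom `prompt` overrides the default for that call, for example to request only the text of a table or to ask for a specific JSON schema.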

Limitations

Currently supports only .jpg, .jpeg, and .png image formats. Requires the Ollama server to be running locally with the Llama 3.2-Vision model installed.

License

This project is licensed under the MIT License.
