ankitprasad81/python_ocr


🔍 Python-OCR: AI Vision-Powered Text to Markdown Converter

Python 3.8+ | Groq | License: MIT | PRs Welcome

🚀 Turn images of text into formatted markdown in seconds with Groq-hosted vision models!

Why Python-OCR? 🤔

  • 📸 Instant Text Extraction: Convert any image containing text into clean markdown
  • 🎯 High Accuracy: Powered by Llama 3.2 vision models served through Groq
  • 🎨 Format Preservation: Maintains original styling, lists, and tables
  • 🌐 Simple Web Interface: User-friendly Gradio UI, no coding needed
  • 🔒 Secure: Your API key, your control - no data storage

🚀 Quick Start

Prerequisites

# Requirements
Python 3.8+
Groq API Key (Get yours at https://console.groq.com/)

⚡ One-Line Installation

pip install -r requirements.txt

🏃‍♂️ Run It

python run.py

Then open http://localhost:7860 in your browser!

🎮 How to Use

  1. 🔑 Get Your API Key

  2. 🖼️ Convert Images

    • Paste your API key
    • Upload any image with text
    • Choose your model:
      • Fast Mode: llama-3.2-11b-vision-preview
      • Accurate Mode: llama-3.2-90b-vision-preview
    • Click "✨ Convert Now!"
  3. 📝 Get Results

    # Your markdown appears here!
    - With perfect formatting
    - And structure preserved

🐍 Python API

The python_ocr package exposes its API through the OCRProcessor class in python_ocr.processor.

Basic Usage

from python_ocr.processor import OCRProcessor

# Initialize with your Groq API key
processor = OCRProcessor(api_key="your_key")

# Convert image to markdown
markdown_text = processor.convert_to_markdown("path/to/image.jpg")
print(markdown_text)

Advanced Configuration

# Use the more accurate 90B model
processor = OCRProcessor(
    api_key="your_key",
    model_name="llama-3.2-90b-vision-preview"
)

API Response Format

The convert_to_markdown method returns a string with two sections:

# Raw Text
[Exact text from the image, preserving formatting]

# Content Analysis
[Detailed analysis of layout and structure]
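If you only need one of the two sections, the returned string can be split on its headings. The following is a minimal sketch, assuming the headings appear on their own lines exactly as `# Raw Text` and `# Content Analysis`; `split_response` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List, Optional

# Heading strings taken from the response format described above.
RAW_HEADING = "# Raw Text"
ANALYSIS_HEADING = "# Content Analysis"


def split_response(markdown_text: str) -> Dict[str, str]:
    """Split the returned string into its raw-text and analysis parts.

    Hypothetical helper: lines before the first known heading are ignored,
    and a missing section yields an empty string.
    """
    raw_lines: List[str] = []
    analysis_lines: List[str] = []
    target: Optional[List[str]] = None

    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped == RAW_HEADING:
            target = raw_lines
        elif stripped == ANALYSIS_HEADING:
            target = analysis_lines
        elif target is not None:
            target.append(line)

    return {
        "raw_text": "\n".join(raw_lines).strip(),
        "analysis": "\n".join(analysis_lines).strip(),
    }
```

Matching only the two known headings, rather than every line starting with `# `, keeps headings that appear inside the extracted text from being mistaken for section boundaries.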

Error Handling

try:
    markdown_text = processor.convert_to_markdown("image.jpg")
except ValueError as e:
    print(f"Configuration error: {e}")  # Invalid API key or model
except Exception as e:
    print(f"Processing error: {e}")  # Network or API errors

Supported Image Formats

  • JPEG/JPG
  • PNG
  • BMP
  • TIFF
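To reject unsupported files before spending an API call, a simple extension check may be enough. This is a sketch under assumptions: `is_supported_image` is a hypothetical helper, the `.tif` and `.jpeg` spellings are included as assumed aliases, and the check looks only at the file name, not the file contents.

```python
from pathlib import Path

# Extensions for the formats listed above; ".tif" is an assumed alias for TIFF.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```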

Model Options

  1. llama-3.2-11b-vision-preview

    • Faster processing
    • Good for simple text
    • Default choice
  2. llama-3.2-90b-vision-preview

    • Higher accuracy
    • Better for complex layouts
    • Recommended for handwriting
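The two options can be wrapped in a small lookup so calling code refers to a mode rather than a full model name. This is an illustrative sketch: `MODELS` and `pick_model` are hypothetical names, and the "fast" and "accurate" keys mirror the mode labels used in the web interface steps.

```python
# Model names copied from the list above; the mode keys are illustrative.
MODELS = {
    "fast": "llama-3.2-11b-vision-preview",
    "accurate": "llama-3.2-90b-vision-preview",
}


def pick_model(mode: str = "fast") -> str:
    """Map a mode label to a model name, raising ValueError for unknown modes."""
    try:
        return MODELS[mode.lower()]
    except KeyError:
        valid = ", ".join(sorted(MODELS))
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {valid}") from None


# Usage with the documented constructor (requires the package and a real key):
# processor = OCRProcessor(api_key="your_key", model_name=pick_model("accurate"))
```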

Best Practices

  • Keep your API key secure (use environment variables)
  • Use appropriate model for your use case
  • Ensure good image quality for better results
  • Handle API rate limits in production
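Two of these practices, reading the key from the environment and backing off when the API is busy, can be sketched in a few lines. Everything here is an assumption for illustration: the `GROQ_API_KEY` variable name, and the helpers `load_api_key` and `with_retries`, are not part of the package. Following the error-handling convention above, `ValueError` is treated as a configuration problem and is not retried; every other exception is retried, which is cruder than checking specifically for a rate-limit error.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Read the API key from an environment variable (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"Set the {env_var} environment variable")
    return key


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff on non-configuration errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ValueError:
            raise  # configuration errors will not be fixed by retrying
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x, ... base_delay
    raise RuntimeError("unreachable")


# Usage with the documented API (requires the package and a real key):
# processor = OCRProcessor(api_key=load_api_key())
# markdown_text = with_retries(lambda: processor.convert_to_markdown("image.jpg"))
```

The `sleep` parameter exists so the backoff can be exercised in tests without real delays.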

🎯 Perfect For

  • 📚 Documentation Teams: Convert handwritten notes to digital docs
  • 🎓 Students: Transform textbook pages into study notes
  • 💼 Developers: Extract code snippets from screenshots
  • 📊 Analysts: Convert tables from images to markdown
  • 📝 Content Creators: Streamline content migration

🔧 Technical Details

Architecture

Python-OCR/
├── python_ocr/
│   ├── __init__.py     # Package initialization
│   ├── processor.py    # Core OCR & API logic
│   └── interface.py    # Gradio UI
└── run.py              # Entry point

Dependencies

  • 🔄 groq: Vision API integration
  • 🌐 httpx: Modern HTTP client
  • 🎨 gradio: Interactive UI
  • 📸 Pillow: Image processing
  • ✨ markdown: Text formatting

🤝 Contributing

We love your input! Want to help? Check out our Contributing Guide.

Quick ways to contribute:

  • 🌟 Star this repo
  • 🐛 Report bugs
  • 💡 Suggest features
  • 🔧 Submit PRs

📈 Roadmap

  • Multi-language support
  • Batch processing
  • Custom markdown templates
  • API rate-limit handling
  • Enhanced error recovery

💬 Community & Support

📜 License

MIT © Ankit Kumar

Made with ❤️ in India 🇮🇳

⭐ Star us on GitHub: it motivates me a lot!
