Transform images into beautifully formatted markdown in seconds using Groq's state-of-the-art Vision AI!

- Instant Text Extraction: Convert any image containing text into clean markdown
- High Accuracy: Powered by Groq's advanced LLaMA vision models
- Format Preservation: Maintains original styling, lists, and tables
- Simple Web Interface: User-friendly Gradio UI, no coding needed
- Secure: Your API key, your control - no data storage
# Requirements
- Python 3.8+
- Groq API Key (get yours at https://console.groq.com/)

Install the dependencies and start the app:

    pip install -r requirements.txt
    python run.py

Then open http://localhost:7860 in your browser!
- Get Your API Key
  - Sign up at Groq Console
  - Create an API key
  - Keep it secure!
- Convert Images
  - Paste your API key
  - Upload any image with text
  - Choose your model:
    - Fast Mode: llama-3.2-11b-vision-preview
    - Accurate Mode: llama-3.2-90b-vision-preview
  - Click "✨ Convert Now!"
- Get Results

      # Your markdown appears here!
      - With perfect formatting
      - And structure preserved
The Python-OCR package provides a simple yet powerful API through the OCRProcessor class.

    from python_ocr.processor import OCRProcessor

    # Initialize with your Groq API key
    processor = OCRProcessor(api_key="your_key")

    # Convert image to markdown
    markdown_text = processor.convert_to_markdown("path/to/image.jpg")
    print(markdown_text)

    # Use the more accurate 90B model
    processor = OCRProcessor(
        api_key="your_key",
        model_name="llama-3.2-90b-vision-preview"
    )

The convert_to_markdown method returns a string with two sections:

    # Raw Text
    [Exact text from the image, preserving formatting]

    # Content Analysis
    [Detailed analysis of layout and structure]

Errors raised during conversion can be handled like this:

    try:
        markdown_text = processor.convert_to_markdown("image.jpg")
    except ValueError as e:
        print(f"Configuration error: {e}")  # Invalid API key or model
    except Exception as e:
        print(f"Processing error: {e}")  # Network or API errors

Supported image formats:

- JPEG/JPG
- PNG
- BMP
- TIFF
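
The sectioned return value and the list of supported formats both lend themselves to small helpers around `convert_to_markdown`. The sketch below is not part of Python-OCR: `is_supported_image` and `split_sections` are hypothetical names, `.tif` is assumed to be an acceptable alias for TIFF, and the splitter assumes the headings come back exactly as `# Raw Text` and `# Content Analysis`.

```python
from pathlib import Path

# Extensions for the formats listed above; ".tif" is an assumed alias for TIFF.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}

# Heading lines assumed to delimit the two sections of the returned string.
SECTION_HEADINGS = {
    "# Raw Text": "raw_text",
    "# Content Analysis": "content_analysis",
}


def is_supported_image(path):
    """Return True if the file extension matches a listed image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_sections(markdown_text):
    """Split the converter output into its two sections.

    Lines before the first known heading are ignored; a missing section
    comes back as an empty string.
    """
    buffers = {name: [] for name in SECTION_HEADINGS.values()}
    current = None
    for line in markdown_text.splitlines():
        name = SECTION_HEADINGS.get(line.strip())
        if name is not None:
            current = name  # start collecting lines for this section
        elif current is not None:
            buffers[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in buffers.items()}
```

One limitation of this approach: if the extracted text itself contains a line that reads exactly `# Content Analysis`, the split will happen too early. Checking the extension only filters obvious mismatches; it does not verify the file contents.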

Available models:

- llama-3.2-11b-vision-preview
  - Faster processing
  - Good for simple text
  - Default choice
- llama-3.2-90b-vision-preview
  - Higher accuracy
  - Better for complex layouts
  - Recommended for handwriting
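
The web UI's "Fast" and "Accurate" modes correspond to the two model identifiers above. A minimal sketch of the same mapping for scripted use follows; `MODELS` and `pick_model` are hypothetical names introduced here for illustration, not part of the package.

```python
# Model identifiers taken from the list above, keyed by the UI's mode names.
MODELS = {
    "fast": "llama-3.2-11b-vision-preview",
    "accurate": "llama-3.2-90b-vision-preview",
}


def pick_model(mode="fast"):
    """Map a mode name ("fast" or "accurate") to a model identifier."""
    try:
        return MODELS[mode.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(MODELS)}"
        ) from None
```

The returned string can then be passed as the `model_name` argument of `OCRProcessor`, for example `model_name=pick_model("accurate")` for handwriting or complex layouts.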

Best practices:

- Keep your API key secure (use environment variables)
- Use the appropriate model for your use case
- Ensure good image quality for better results
- Handle API rate limits in production
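
Two of these practices, keeping the key in an environment variable and coping with rate limits, can be sketched with the standard library alone. This is an illustration under assumptions, not the package's own mechanism: `load_api_key` and `call_with_backoff` are hypothetical helpers, the variable name `GROQ_API_KEY` is an assumed convention, and because the exception type raised on a rate limit is not documented here, the helper retries on any error other than `ValueError`.

```python
import os
import time


def load_api_key(env_var="GROQ_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def call_with_backoff(func, *args, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff on failure.

    ValueError is re-raised immediately, since the error-handling example
    treats it as a configuration problem that a retry cannot fix.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except ValueError:
            raise
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
```

With a processor created as in the usage example, a conversion could be wrapped as `call_with_backoff(processor.convert_to_markdown, "image.jpg")`. The `sleep` parameter exists so the delay can be replaced in tests.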

Use cases:

- Documentation Teams: Convert handwritten notes to digital docs
- Students: Transform textbook pages into study notes
- Developers: Extract code snippets from screenshots
- Analysts: Convert tables from images to markdown
- Content Creators: Streamline content migration

Project structure:

    Python-OCR/
    ├── python_ocr/
    │   ├── __init__.py    # Package initialization
    │   ├── processor.py   # Core OCR & API logic
    │   └── interface.py   # Gradio UI
    └── run.py             # Entry point

Dependencies:

- groq: Vision API integration
- httpx: Modern HTTP client
- gradio: Interactive UI
- Pillow: Image processing
- markdown: Text formatting

We love your input! Want to help? Check out our Contributing Guide.
Quick ways to contribute:
- Star this repo
- Report bugs
- Suggest features
- Submit PRs

Planned features:

- Multi-language support
- Batch processing
- Custom markdown templates
- API rate limiting handling
- Enhanced error recovery

Support:

- Found a bug? Open an issue
- Have an idea? Start a discussion
- Need help? Contact us

MIT © Ankit Kumar

Made with ❤️ in India 🇮🇳

⭐ Star us on GitHub - it motivates me a lot!