[Feature Request] Recognize text styles including size, font type, color, boldness, italics #47

Open
atlury opened this issue Mar 5, 2024 · 2 comments

atlury commented Mar 5, 2024

It would be great if you could add style recognition and transfer. Together with layout preservation, this would be a valuable addition to the OCR pipeline.

atlury commented Mar 5, 2024

atlury changed the title from "[Feature Request] Recognize text styles including size, font type, color, bold" to "[Feature Request] Recognize text styles including size, font type, color, boldness, italics" on Mar 5, 2024

atlury commented Mar 5, 2024

Kosmos-2.5, a multimodal literate model for text-intensive image understanding, looks interesting and may be worth exploring.

To quote:
"Kosmos-2.5 excels in: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. The model can be adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning."
