It would be great if you could look into adding style recognition and transfer. Together with layout preservation, this would be a great asset to the OCR pipeline.
atlury changed the title from "[Feature Request] Recognize text styles including size, font type, color, bold" to "[Feature Request] Recognize text styles including size, font type, color, boldness, italics" on Mar 5, 2024
Kosmos-2.5, a multimodal literate model for text-intensive image understanding, looks interesting and may be worth exploring.
To quote:
"Kosmos-2.5 excels in: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. The model can be adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning."