Description
Requested feature
Docling recently added support for structured data extraction using a predefined schema (see here), but it currently supports only PDF and image files. Adding support for Excel/spreadsheet files would be extremely useful for me and, I imagine, for many others. I understand this feature is still in beta, but if the Docling team supports the idea, I am willing to try to implement Excel/spreadsheet support myself and open a PR.
Alternatives
Several other libraries already do this, most of them direct competitors of Docling (LlamaIndex, Unstructured, etc.). However, most of these competitors offer only limited open-source or free options; their most effective solutions typically require their paid API.
Conclusion
I am a firm believer in open-source software, and I believe adding this feature would benefit Docling tremendously and encourage many users to choose it over closed-source competitors. As mentioned above, I am willing to try to implement this myself and open a PR if the Docling team supports the idea.
If anyone has suggestions on how to implement this, or any feedback or questions, please let me know.
Thank you!