
Add PDF to supported formats; summarize content and extract tags using LLM #90

@jqnatividad

The legacy DataPusher supported PDFs because messytables could extract tables from them using pdftables.

That functionality has since been removed, along with Excel support.

We re-enabled Excel support in DP+ using qsv.

We should re-enable PDF support. For now the goal is not to extract tables (though tabula-rs is an option for that), but to use an LLM to summarize the content for the Description field and to suggest tags.
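A minimal sketch of the summarize-and-tag step, for discussion only. It assumes the PDF text has already been extracted upstream (for example with a PDF library or a tool such as pdftotext). It also assumes that the `llm` callable is a hypothetical stand-in for whatever LLM client DP+ ends up using, taking a prompt string and returning the model's reply. The prompt wording, the JSON reply format, and the function names (`build_prompt`, `normalize_tags`, `suggest_metadata`) are illustrative assumptions, not an existing DP+ API.

```python
import json
from typing import Callable, List, TypedDict


class PdfMetadataSuggestion(TypedDict):
    description: str
    tags: List[str]


# Hypothetical prompt; the wording and JSON shape are illustrative only.
PROMPT_TEMPLATE = (
    "Summarize the following document in at most {max_sentences} sentences "
    "for a dataset Description field, and suggest up to {max_tags} short tags.\n"
    'Reply with JSON only: {{"description": "...", "tags": ["...", "..."]}}\n\n'
    "Document text:\n{text}"
)


def build_prompt(text: str, max_chars: int = 8000,
                 max_sentences: int = 3, max_tags: int = 5) -> str:
    # Truncate long PDFs so the prompt stays within the model's context window.
    snippet = text[:max_chars]
    return PROMPT_TEMPLATE.format(
        max_sentences=max_sentences, max_tags=max_tags, text=snippet
    )


def normalize_tags(raw_tags: list, max_tags: int = 5) -> List[str]:
    # Lowercase, trim and de-duplicate tags, preserving the model's order.
    seen: List[str] = []
    for tag in raw_tags:
        if not isinstance(tag, str):
            continue
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return seen[:max_tags]


def suggest_metadata(text: str, llm: Callable[[str], str],
                     max_tags: int = 5) -> PdfMetadataSuggestion:
    reply = llm(build_prompt(text, max_tags=max_tags))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Fallback: use the raw reply as the description and suggest no tags.
        return {"description": reply.strip(), "tags": []}
    raw_tags = data.get("tags", [])
    if not isinstance(raw_tags, list):
        raw_tags = []
    return {
        "description": str(data.get("description", "")).strip(),
        "tags": normalize_tags(raw_tags, max_tags),
    }


if __name__ == "__main__":
    # Stand-in for a real LLM client, returning a canned JSON reply.
    def fake_llm(prompt: str) -> str:
        return ('{"description": "Annual budget report.", '
                '"tags": ["Budget", " finance ", "budget"]}')

    print(suggest_metadata("...extracted PDF text...", fake_llm))
    # → {'description': 'Annual budget report.', 'tags': ['budget', 'finance']}
```

Keeping the LLM behind a plain callable means the parsing and tag clean-up can be tested without network access. The fallback path means a malformed reply still yields a usable draft description instead of failing the upload.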

Labels: 1.x (will be done in DP+ 1.x - DP+ running as CKAN extension), enhancement (New feature or request)