FEATURE: Ingest API endpoint should accept Document properties #3272

Open
tillprochaska opened this issue Aug 8, 2023 · 2 comments
Labels
backend Issues related to Aleph’s backend, API, CLI etc. data-desk feature-request Requests for new features or enhancements of existing features Major issue that requires attention

Comments

tillprochaska (Contributor) commented Aug 8, 2023

Is your feature request related to a problem? Please describe.
The Aleph ingest endpoint is not only used by the UI when uploading individual files, but also by tools from the ecosystem such as Memorious. The ingest endpoint allows passing some metadata that will be added to the resulting Document entity created for uploaded files.

Metadata fields include:

  • author
  • authored_at
  • countries
  • crawler
  • date
  • file_name
  • foreign_id
  • mime_type
  • retrieved_at
  • source_url
  • … and a few more

Most of these metadata fields map to FtM properties. However, the mapping is not always 1:1. For example, the metadata accepts only a single author, whereas the FtM author property supports multiple values. Metadata fields also use snake_case, whereas FtM properties use camelCase. Finally, the metadata fields do not cover all Document properties.
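To illustrate the mismatch, here is a minimal sketch of a purely mechanical conversion from metadata-style keys to FtM-style keys. The function names are hypothetical and this is not Aleph's actual code; fields whose names do not follow the snake_case/camelCase pattern would need an explicit mapping, which is not shown.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case name such as 'source_url' to camelCase."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_ftm_style(meta: dict) -> dict:
    """Rename keys to camelCase and wrap every value in a list.

    Hypothetical helper for illustration only; FtM properties are
    multi-valued, so single metadata values become one-element lists.
    """
    props = {}
    for key, value in meta.items():
        if value is None:
            continue  # skip unset metadata fields
        values = value if isinstance(value, (list, tuple)) else [value]
        props[snake_to_camel(key)] = list(values)
    return props


example = to_ftm_style({"author": "Jane Doe", "source_url": "https://example.org/a.pdf"})
```

With this input, `example` has the keys `author` and `sourceUrl`, each holding a one-element list.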

(The current implementation exists for historical reasons: in previous Aleph versions, uploaded files were a concept separate from entities. At some point, it was decided to represent uploaded files as entities, too.)

This leads to a bad UX because the available metadata fields and their names are not intuitive. It also limits the ability to preprocess documents outside of Aleph before ingestion.

Describe the solution you'd like
The ingest endpoint should accept all FtM Document properties. To make sure we do not break existing Memorious crawlers or other integrations, we should keep support for the current metadata fields, and allow passing metadata formatted as FtM properties in addition to that.

This would then allow adjusting the Memorious aleph_emit_document operation to also support any FtM property in a generic way.
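A rough sketch of how such backward-compatible handling could look, assuming legacy metadata fields are translated through a lookup table and then merged with FtM-formatted properties. The table contents, the function names, and the idea of two separate input dicts are all assumptions made for illustration, not the actual Aleph API.

```python
# Hypothetical lookup table: legacy metadata field -> FtM property name.
# The entries are illustrative and deliberately incomplete.
LEGACY_TO_FTM = {
    "author": "author",
    "authored_at": "authoredAt",
    "file_name": "fileName",
    "source_url": "sourceUrl",
}


def as_list(value):
    """Normalise a single value, a list, or None to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def merge_metadata(legacy: dict, properties: dict) -> dict:
    """Combine legacy metadata fields with FtM-style properties.

    Legacy keys are translated via LEGACY_TO_FTM (unknown legacy keys are
    ignored in this sketch); FtM-style keys are passed through unchanged.
    Values are merged per property, keeping order and dropping duplicates.
    """
    merged = {}

    def add(prop, value):
        bucket = merged.setdefault(prop, [])
        for item in as_list(value):
            if item not in bucket:
                bucket.append(item)

    for key, value in legacy.items():
        prop = LEGACY_TO_FTM.get(key)
        if prop is not None:
            add(prop, value)
    for prop, value in properties.items():
        add(prop, value)
    return merged


result = merge_metadata(
    {"author": "Jane Doe", "file_name": "report.pdf"},
    {"author": ["Jane Doe", "John Roe"], "keywords": ["leak"]},
)
```

Existing integrations would keep sending only the legacy fields, while newer clients could additionally pass any Document property, including ones that have no legacy equivalent.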

Describe alternatives you've considered
-/-

Additional context

@tillprochaska tillprochaska added backend Issues related to Aleph’s backend, API, CLI etc. feature-request Requests for new features or enhancements of existing features Major issue that requires attention labels Aug 8, 2023
tillprochaska (Contributor, Author) commented Aug 8, 2023

Related to #3273 (similar use case, but different issue)

brrttwrks commented

What is the status of this? This has come up again as a limitation: I need to be able to add a category to a document so journalists can filter the documents by type. I am fine with putting this in the document role or keyword properties, but I cannot set either due to the current limitations.


2 participants