FEATURE: Ingest API endpoint should accept Document properties
#3272
Labels
backend
Issues related to Aleph’s backend, API, CLI etc.
data-desk
feature-request
Requests for new features or enhancements of existing features
Major
issue that requires attention
Is your feature request related to a problem? Please describe.
The Aleph ingest endpoint is not only used by the UI when uploading individual files, but also by tools from the ecosystem such as Memorious. The ingest endpoint allows passing some metadata that will be added to the resulting Document entity created for uploaded files. Metadata fields include:
author
authored_at
countries
crawler
date
file_name
foreign_id
mime_type
retrieved_at
source_url
Most of these metadata fields map to FtM properties. However, the mapping is not always 1:1. For example, there can only be one author in the metadata field, but FtM supports multiple values for the author property, and metadata fields use snake_case whereas FtM properties are camelCase. Also, the metadata fields do not represent all Document properties.
(The current implementation has historic reasons: in previous Aleph versions, uploaded files were a concept separate from entities. At some point, it was decided to represent uploaded files as entities, too.)
This leads to a bad UX because the available metadata fields and their naming are not intuitive. Additionally, it limits the ability to preprocess documents outside of Aleph before ingestion.
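To make the mismatch concrete, here is a minimal sketch of how the legacy metadata fields might be translated into FtM-style properties. The mapping table and the function name legacy_to_ftm are assumptions made for illustration and are not taken from the Aleph code base; the FtM property names should be checked against the Document schema.

```python
# Hypothetical mapping from legacy snake_case metadata fields to FtM
# property names. This table is an assumption for illustration; verify
# the property names against the FtM Document schema before relying on it.
LEGACY_TO_FTM = {
    "author": "author",
    "authored_at": "authoredAt",
    "countries": "country",
    "crawler": "crawler",
    "date": "date",
    "file_name": "fileName",
    "mime_type": "mimeType",
    "retrieved_at": "retrievedAt",
    "source_url": "sourceUrl",
}


def legacy_to_ftm(meta):
    """Convert legacy metadata into a dict of multi-valued FtM-style properties."""
    properties = {}
    for field, value in meta.items():
        prop = LEGACY_TO_FTM.get(field)
        if prop is None or value is None:
            # Fields without a mapping (e.g. foreign_id) are skipped here.
            continue
        # FtM properties are multi-valued, so single values are wrapped in a list.
        values = value if isinstance(value, (list, tuple)) else [value]
        properties.setdefault(prop, []).extend(str(v) for v in values)
    return properties


example = legacy_to_ftm({
    "author": "Jane Doe",
    "file_name": "report.pdf",
    "countries": ["de", "fr"],
    "foreign_id": "doc-1",
})
```

In this sketch, the single author value becomes a one-element list and file_name is renamed to fileName, which illustrates the two kinds of mismatch described above.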
Describe the solution you'd like
The ingest endpoint should accept all FtM Document properties. To make sure we do not break existing Memorious crawlers or other integrations, we should keep support for the current metadata fields, and allow passing metadata formatted as FtM properties in addition to that.
This would then allow adjusting the Memorious aleph_emit_document operation to also support any FtM property in a generic way.
Describe alternatives you've considered
-/-
Additional context
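As a rough illustration of the backwards-compatible behaviour requested above, the sketch below merges properties derived from the legacy metadata fields with properties passed directly in FtM format. The function name merge_properties and the idea of two separate input dicts are assumptions; the actual request format would be decided as part of the implementation.

```python
def merge_properties(legacy_props, ftm_props):
    """Merge two property dicts into one multi-valued dict.

    Values are de-duplicated while preserving the order in which they
    were first seen. Neither input dict is modified.
    """
    merged = {}
    for source in (legacy_props, ftm_props):
        for prop, values in source.items():
            # Accept a bare string as a single value.
            if isinstance(values, str):
                values = [values]
            bucket = merged.setdefault(prop, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged


# Properties derived from legacy fields (hypothetical example data).
from_legacy = {"author": ["Jane Doe"], "fileName": ["report.pdf"]}
# Properties passed directly in FtM format (hypothetical example data).
from_ftm = {"author": ["Jane Doe", "John Smith"], "title": "Annual report"}

combined = merge_properties(from_legacy, from_ftm)
```

Merging instead of overriding means that existing integrations which only send the legacy fields keep working unchanged, while newer clients can supply multiple values or additional Document properties.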