@tenkus47 @gdetari Let's use this issue to discuss the workflow / API between the backend (this repo) and the frontend (https://github.com/OpenPecha/buddhistAI-cataloger) for getting / updating a volume.
- overall I think the GET we currently have is pretty good; important:
  - it should return the internal id of the document (ex: `W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1` in the `id` field)
- the frontend saves the pagination data (`pages` field), the existing segments, merges the chunks into the base text, and (importantly) keeps the fields `id`, `rep_id`, `vol_id`, `vol_version`, `mw_id` and `wa_id` in its internal representation of the volume
- once the frontend is finished, it should send the following request:
  - `POST api/v1/volumes/{id}` with the internal id of the volume, so for instance `POST api/v1/volumes/W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1`
  - payload should be:
{
"rep_id": "W... or IE...",
"record_status": "...",
"vol_id": "...",
"vol_version": "... (string)",
"base_text": "the base text (not chunked)",
"segments": [
{
"cstart": ... (int),
"cend": ... (int),
"idx": ... (int),
"title_bo": "(mandatory) title, can also be a list. If there's no title in the text, annotators have to make one up and prefix it with '*' ",
"author_name_bo": "(optional) author name as present in the text, can also be a list if there are multiple",
"mw_id": "the id of the segment, unique and persistent. Must start with '{mw_id}_', for instance MW123_456 for mw_id=MW123",
"wa_id": "(mandatory for part_type = text) the id of the work (abstract text)",
"part_type": "text or editorial"
}
]
}

Importantly, `base_text` should be what the frontend actually used to calculate the character coordinates, not a copy of the original merged chunks (even though they are theoretically the same). The backend checks that they are the same. Note also that `status` has been changed to `record_status`.
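The constraints on the payload can be checked before the request is sent. Below is a minimal sketch in Python. It assumes the chunks from the GET response are available as a plain list of strings and that merging them is simple concatenation (the actual chunk structure is not specified in this issue). `validate_payload` and its parameters are hypothetical names, not part of the existing API.

```python
from typing import Any


def validate_payload(payload: dict[str, Any], chunks: list[str], main_mw_id: str) -> list[str]:
    """Return a list of problems found in a volume update payload (empty if none)."""
    errors: list[str] = []
    base_text = payload.get("base_text", "")

    # Assumption: merging the chunks is plain concatenation.
    if "".join(chunks) != base_text:
        errors.append("base_text differs from the merged chunks")

    for seg in payload.get("segments", []):
        label = f"segment idx={seg.get('idx')}"
        cstart, cend = seg.get("cstart"), seg.get("cend")
        # Coordinates may be absent on segments imported from BDRC.
        if cstart is not None and cend is not None:
            # Assumption: offsets are character indices into base_text.
            if not (0 <= cstart <= cend <= len(base_text)):
                errors.append(f"{label}: invalid cstart / cend")
        if not seg.get("title_bo"):
            errors.append(f"{label}: title_bo is mandatory")
        if not str(seg.get("mw_id", "")).startswith(f"{main_mw_id}_"):
            errors.append(f"{label}: mw_id must start with '{main_mw_id}_'")
        if seg.get("part_type") == "text" and not seg.get("wa_id"):
            errors.append(f"{label}: wa_id is mandatory for part_type = text")
    return errors
```

An empty result means the payload satisfies the rules listed above. The backend may of course apply further checks that are not described here.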
Also, the frontend needs to account for cases where segments are present (imported from BDRC) but don't have `cstart` / `cend`. In that case the segments are assumed to be ordered by `idx`.
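One way to handle this case is to sort all segments by `idx` and separate those that still need coordinates. This is only a sketch: `split_by_coordinates` is a hypothetical helper, and it assumes a missing coordinate is either absent from the segment or set to `None`.

```python
from typing import Any

Segment = dict[str, Any]


def split_by_coordinates(segments: list[Segment]) -> tuple[list[Segment], list[Segment]]:
    """Sort segments by idx, then split them into (located, unlocated)."""
    ordered = sorted(segments, key=lambda seg: seg["idx"])
    located: list[Segment] = []
    unlocated: list[Segment] = []
    for seg in ordered:
        # A segment is "located" only if both coordinates are present.
        if seg.get("cstart") is not None and seg.get("cend") is not None:
            located.append(seg)
        else:
            unlocated.append(seg)
    return located, unlocated
```

The unlocated segments keep their `idx` order, so an annotator can place them in the base text one after another.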
When the frontend needs to create a new `mw_id` or a segment, it should do the following:
- look at the `mw_id` of the volume, let's call it `{main_mw_id}`
- look at the `vol_id` of the volume
- generate a 6 character random string, composed of upper-case ASCII letters and digits, called `{rand6}`
- produce `{main_mw_id}_{vol_id}_BC_{rand6}`
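The steps above can be sketched as follows. This is an illustrative Python version, not the frontend's actual code; `new_segment_mw_id` and `ALPHABET` are hypothetical names, and the values in the usage comment are made up.

```python
import secrets
import string

# Upper-case ASCII letters and digits, as required for {rand6}.
ALPHABET = string.ascii_uppercase + string.digits


def new_segment_mw_id(main_mw_id: str, vol_id: str) -> str:
    """Build {main_mw_id}_{vol_id}_BC_{rand6} for a new segment."""
    rand6 = "".join(secrets.choice(ALPHABET) for _ in range(6))
    return f"{main_mw_id}_{vol_id}_BC_{rand6}"


# Example call with made-up values:
# new_segment_mw_id("MW123", "I456") gives something like "MW123_I456_BC_7Q2KZ9"
```

Since each `mw_id` must be unique and persistent, the caller may want to draw again in the unlikely case that the generated id already exists among the volume's segments.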