clarify API workflow for volumes

@tenkus47 @gdetari Let's discuss here the workflow / API between the backend (this repo) and the frontend (https://github.com/OpenPecha/buddhistAI-cataloger) for getting and updating a volume.

- overall, I think the GET we currently have is pretty good. One important point:
   * it should return the internal id of the document (e.g. `W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1` in the `id` field)
- the frontend saves the pagination data (`pages` field) and the existing segments, merges the chunks into the base text, and (importantly) keeps the fields `id`, `rep_id`, `vol_id`, `vol_version`, `mw_id` and `wa_id` in its internal representation of the volume
- once the frontend is finished, it should send the following request:
   * `POST api/v1/volumes/{id}` with the internal id of the volume, for instance `POST api/v1/volumes/W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1`
   * the payload should be:
```json
{
  "rep_id": "W... or IE...",
  "record_status": "...",
  "vol_id": "...",
  "vol_version": "... (string)",
  "base_text": "the base text (not chunked)",
  "segments": [
    {
      "cstart": ... (int),
      "cend": ... (int),
      "idx": ... (int),
      "title_bo": "(mandatory) title, can also be a list. If there is no title in the text, annotators have to make one up and prefix it with '*'",
      "author_name_bo": "(optional) author name as present in the text, can also be a list if there are several",
      "mw_id": "the id of the segment, unique and persistent. Must start with '{mw_id}_', for instance MW123_456 for mw_id=MW123",
      "wa_id": "(mandatory for part_type = text) the id of the work (abstract text)",
      "part_type": "text or editorial"
    }
  ]
}
```
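
The per-segment rules in the payload above could be checked along these lines. This is only a sketch: `validate_segment` and its error messages are hypothetical names, not part of the existing backend, and it only covers the constraints spelled out in the field descriptions.

```python
from typing import List


def validate_segment(segment: dict, main_mw_id: str) -> List[str]:
    """Return a list of problems found in one segment (empty list = no problem found).

    Hypothetical helper: only checks the rules stated in the payload description.
    """
    problems = []

    # title_bo is mandatory; it may be a string or a list of strings.
    if not segment.get("title_bo"):
        problems.append("title_bo is mandatory")

    # The segment mw_id must start with the volume's mw_id followed by '_'.
    mw_id = segment.get("mw_id") or ""
    if not mw_id.startswith(main_mw_id + "_"):
        problems.append("mw_id must start with '%s_'" % main_mw_id)

    # part_type is either 'text' or 'editorial'.
    part_type = segment.get("part_type")
    if part_type not in ("text", "editorial"):
        problems.append("part_type must be 'text' or 'editorial'")

    # wa_id is mandatory only for part_type = text.
    if part_type == "text" and not segment.get("wa_id"):
        problems.append("wa_id is mandatory when part_type is 'text'")

    return problems
```

Returning a list of messages rather than raising on the first error would let the frontend show every problem in a segment at once, but that is a design choice, not a requirement.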

Importantly, `base_text` should be the text the frontend actually used to calculate the character coordinates, not a copy of the original merged chunks (even though the two are theoretically identical). The backend checks that they are the same. Note also that `status` has been renamed to `record_status`.
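
As an illustration of that check, a minimal sketch follows. It assumes the stored chunks are plain strings that are merged by simple concatenation, which may not match the actual storage format; `merge_chunks` and `first_mismatch` are hypothetical names.

```python
from typing import List, Optional


def merge_chunks(chunks: List[str]) -> str:
    """Merge chunks into the base text (assumption: plain concatenation, no separator)."""
    return "".join(chunks)


def first_mismatch(posted_base_text: str, merged: str) -> Optional[int]:
    """Return the offset of the first differing character, or None if the texts are identical."""
    for offset, (a, b) in enumerate(zip(posted_base_text, merged)):
        if a != b:
            return offset
    # One text is a prefix of the other: they differ where the shorter one ends.
    if len(posted_base_text) != len(merged):
        return min(len(posted_base_text), len(merged))
    return None
```

Reporting the offset of the first difference, instead of a bare true/false, would make it easier to debug a frontend that normalizes whitespace or Unicode before computing coordinates.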

Also, the frontend needs to account for cases where segments are present (imported from BDRC) but have no `cstart` / `cend`. In that case the segments are assumed to be ordered by `idx`.
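
One way to handle this case is to separate segments that already have coordinates from those that do not, and order the latter by `idx`. The sketch below is an assumption about how this could be done; `split_segments` is a hypothetical helper, and treating a `null` coordinate the same as a missing one is also an assumption.

```python
from typing import List, Tuple


def split_segments(segments: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split segments into (positioned, unpositioned).

    A segment is 'positioned' when both cstart and cend are present and not None.
    Unpositioned segments are returned sorted by idx, as the workflow assumes.
    """
    positioned = []
    unpositioned = []
    for segment in segments:
        if segment.get("cstart") is not None and segment.get("cend") is not None:
            positioned.append(segment)
        else:
            unpositioned.append(segment)
    unpositioned.sort(key=lambda s: s["idx"])
    return positioned, unpositioned
```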

When the frontend needs to create a new segment (and therefore a new `mw_id`), it should do the following:
- look at the `mw_id` of the volume, let's call it `{main_mw_id}`
- look at the `vol_id` of the volume
- generate a 6-character random string, composed of upper-case ASCII letters and digits, called `{rand6}`
- produce `{main_mw_id}_{vol_id}_BC_{rand6}`
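
The steps above can be sketched as follows. The frontend would implement this in its own language; Python is used here only for illustration, and `new_segment_mw_id` is a hypothetical name.

```python
import secrets
import string

# Upper-case ASCII letters and digits, as required for {rand6}.
ALPHABET = string.ascii_uppercase + string.digits


def new_segment_mw_id(main_mw_id: str, vol_id: str) -> str:
    """Build a segment id of the form {main_mw_id}_{vol_id}_BC_{rand6}."""
    rand6 = "".join(secrets.choice(ALPHABET) for _ in range(6))
    return "%s_%s_BC_%s" % (main_mw_id, vol_id, rand6)
```

Since segment ids must be unique, the caller may also want to regenerate the id if it collides with one already present in the volume; that check is not shown here.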
