
clarify API workflow for volumes #6

@eroux


@tenkus47 @gdetari Let's use this issue to discuss the workflow / API between the backend (this repo) and the frontend (https://github.com/OpenPecha/buddhistAI-cataloger) for getting and updating a volume.

  • Overall, I think the GET we currently have is pretty good. One important point:
    • it should return the internal id of the document in the id field (e.g. W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1)
  • the frontend saves the pagination data (pages field) and the existing segments, merges the chunks into the base text, and (importantly) keeps the fields id, rep_id, vol_id, vol_version, mw_id and wa_id in its internal representation of the volume
  • once editing is finished, the frontend should send the following request:
    • POST api/v1/volumes/{id} with the internal id of the volume, for instance POST api/v1/volumes/W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1
    • the payload should be:
{
  "rep_id": "W... or IE...",
  "record_status": "...",
  "vol_id": "...",
  "vol_version": "... (string)",
  "base_text": "the base text (not chunked)",
  "segments": [
    {
      "cstart": ... (int),
      "cend": ... (int),
      "idx": ... (int),
      "title_bo": "(mandatory) title, can also be a list. If there is no title in the text, annotators have to make one up and prefix it with '*'",
      "author_name_bo": "(optional) author name as present in the text, can also be a list if there are several",
      "mw_id": "the id of the segment, unique and persistent. Must start with '{mw_id}_', for instance MW123_456 for mw_id=MW123",
      "wa_id": "(mandatory for part_type = text) the id of the work (abstract text)",
      "part_type": "text or editorial"
    }
  ]
}
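For illustration, here is a minimal Python sketch of how a client could assemble that request with only the standard library. The function name `build_volume_update_request`, the `base_url` parameter and the shape of the `volume` dict are hypothetical; the sketch only builds the request (it does not send it) and leaves out authentication, which this issue does not specify.

```python
import json
import urllib.parse
import urllib.request


def build_volume_update_request(base_url: str, volume: dict, record_status: str,
                                base_text: str, segments: list) -> urllib.request.Request:
    """Build (but do not send) POST api/v1/volumes/{id}.

    `volume` is a hypothetical stand-in for the frontend's internal
    representation, holding the fields kept from the GET response.
    """
    # Internal id from the GET response, e.g. W00CHZ0103341_I1CZ35_822f2e_ocrv1-ws-ldv1
    volume_id = urllib.parse.quote(volume["id"], safe="")
    payload = {
        "rep_id": volume["rep_id"],
        "record_status": record_status,
        "vol_id": volume["vol_id"],
        "vol_version": str(volume["vol_version"]),  # the API expects a string
        "base_text": base_text,  # exactly the text cstart/cend were computed on
        "segments": segments,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/volumes/{volume_id}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be a matter of calling `urllib.request.urlopen(req)` against the real backend URL, which is not given here.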

Importantly, base_text should be exactly the text the frontend used to calculate the character coordinates, not a copy of the original merged chunks (even though the two are theoretically identical). The backend checks that they are the same. Note also that the status field has been renamed to record_status.
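A rough Python sketch of the checks the backend could run on this payload, assuming `stored_base_text` is the backend's own merge of the volume's chunks and `main_mw_id` comes from the backend's record of the volume (the payload itself does not carry the volume's mw_id). The function name is hypothetical, and treating cend as exclusive (as in Python slicing) is an assumption that this issue does not settle.

```python
def validate_volume_payload(payload: dict, stored_base_text: str, main_mw_id: str) -> list:
    """Return a list of problems found in the payload (an empty list means OK)."""
    errors = []
    base_text = payload.get("base_text", "")
    # The submitted base_text must match what the backend merged from the chunks
    if base_text != stored_base_text:
        errors.append("base_text does not match the merged chunks")
    for seg in payload.get("segments", []):
        label = f"segment idx={seg.get('idx')}"
        # title_bo is mandatory (a string or a non-empty list)
        if not seg.get("title_bo"):
            errors.append(f"{label}: title_bo is mandatory")
        # Segment ids must start with the volume's mw_id followed by '_'
        if not str(seg.get("mw_id", "")).startswith(f"{main_mw_id}_"):
            errors.append(f"{label}: mw_id must start with {main_mw_id}_")
        part_type = seg.get("part_type")
        if part_type not in ("text", "editorial"):
            errors.append(f"{label}: unknown part_type {part_type!r}")
        elif part_type == "text" and not seg.get("wa_id"):
            errors.append(f"{label}: wa_id is mandatory for part_type=text")
        # Coordinates are optional, but if either is present both must be valid
        cstart, cend = seg.get("cstart"), seg.get("cend")
        if cstart is not None or cend is not None:
            if not (isinstance(cstart, int) and isinstance(cend, int)
                    and 0 <= cstart <= cend <= len(base_text)):
                errors.append(f"{label}: invalid coordinates ({cstart}, {cend})")
    return errors
```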

Also, the frontend needs to handle cases where segments are present (imported from BDRC) but have no cstart / cend. In that case the segments are assumed to be ordered by idx.
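One way to get a reading order for the segments, sketched in Python with a hypothetical `order_segments` helper: sort by cstart when every segment has coordinates, otherwise fall back to idx. The issue does not say how to order a mix of segments with and without coordinates; falling back to idx for the whole list in that case is an assumption.

```python
def order_segments(segments: list) -> list:
    """Return segments in reading order without mutating the input list."""
    if segments and all(seg.get("cstart") is not None for seg in segments):
        # Every segment has coordinates: order by position, idx as tie-breaker
        return sorted(segments, key=lambda seg: (seg["cstart"], seg["idx"]))
    # Some segments (e.g. imported from BDRC) lack cstart/cend: rely on idx
    return sorted(segments, key=lambda seg: seg["idx"])


imported = [{"idx": 2}, {"idx": 0}, {"idx": 1}]
print([seg["idx"] for seg in order_segments(imported)])  # [0, 1, 2]
```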

When the frontend needs to create a new mw_id for a segment, it should do the following:

  • look at the mw_id of the volume, let's call it {main_mw_id}
  • look at the vol_id of the volume
  • generate a 6-character random string composed of upper-case ASCII letters and digits, called {rand6}
  • produce {main_mw_id}_{vol_id}_BC_{rand6}
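The recipe above maps directly onto Python's `secrets` and `string` modules. The function name `new_segment_mw_id` and the optional `existing` argument (retrying in the unlikely event of a clash with an id already used in the volume) are illustrative additions, not part of the spec.

```python
import secrets
import string
from typing import Iterable

# Upper-case ASCII letters and digits, as specified for {rand6}
RAND6_ALPHABET = string.ascii_uppercase + string.digits


def new_segment_mw_id(main_mw_id: str, vol_id: str, existing: Iterable[str] = ()) -> str:
    """Return {main_mw_id}_{vol_id}_BC_{rand6}, avoiding ids in `existing`."""
    taken = set(existing)
    while True:
        rand6 = "".join(secrets.choice(RAND6_ALPHABET) for _ in range(6))
        candidate = f"{main_mw_id}_{vol_id}_BC_{rand6}"
        # Hypothetical safeguard: retry if the id is already used in this volume
        if candidate not in taken:
            return candidate


# Example with hypothetical values; prints a random id such as MW123_VOL1_BC_ followed by 6 characters
print(new_segment_mw_id("MW123", "VOL1"))
```

Since the result always starts with `{main_mw_id}_`, it also satisfies the mw_id prefix rule of the payload.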
