Batch processor enhancements through raw data parameter #3702

axsaucedo · 2021-10-26T13:59:17Z

As discussed, this will be explored in a way that addresses #2657, #3409, #3681 and #3408. More specifically, the functionality of the batch processor will be extended to support raw JSON inputs in the form of valid SeldonMessage values, which will also support a limited and specified version of microbatching. This will ensure the float to int issue is no longer a problem. This would also encompass extending the seldon_client to support raw data for predict parameters.

Micro batching will be handled as follows:

Input is

{"names": ["a", "b", "c"], "data": {"ndarray": [[1,2,3]]}, "meta": { "tags": {"internal-id": 1} }}
{"names": ["a", "b", "c"], "data": {"ndarray": [[1,2,3]]}, "meta": { "tags": {"internal-id": 2} }}

If the microbatch value is 1, each request is sent as is. However, if the microbatch value is 2, microbatching is limited to the ndarray and tensor data provided, and the combined request is sent without the per-request meta and with the names of the first request. As with other requests, it is still sent with the unique batch ID:

{"names": ["a", "b", "c"], "data": {"ndarray": [[1,2,3], [1,2,3]]}, "meta": { "tags": {"batch-uid": …} }}

Let's say that the response would contain the following data

{"names": ["a", "b", "c"], "data": {"ndarray": [[9,9,9], [8,8,8]]}, "meta": { "tags": {"extra_id": 0}}}

The response would then be split per instance, merging the previous meta content of each request with the meta batch params (batch uid), giving the following output:

{"names": ["a", "b", "c"], "data": {"ndarray": [[9,9,9]]}, "meta": {"tags": {"batch-uid": …, "internal-id": 1, "extra_id": 0}}}
{"names": ["a", "b", "c"], "data": {"ndarray": [[8,8,8]]}, "meta": {"tags": {"batch-uid": …, "internal-id": 2, "extra_id": 0}}}
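
The combine and split steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the batch processor's actual code: the function names `build_microbatch` and `split_response` are hypothetical, only `ndarray` payloads are handled (tensor data is left out), and each input is assumed to carry exactly one row.

```python
def build_microbatch(requests, batch_uid):
    """Combine single-row SeldonMessage-like dicts into one request.

    Hypothetical helper: per-request meta is dropped, names are taken
    from the first request, and only the batch uid tag is attached.
    """
    rows = []
    for req in requests:
        rows.extend(req["data"]["ndarray"])
    return {
        "names": requests[0]["names"],
        "data": {"ndarray": rows},
        "meta": {"tags": {"batch-uid": batch_uid}},
    }


def split_response(response, requests, batch_uid):
    """Split a micro-batch response into one output per original request.

    Assumes one response row per request, in the same order. Tags are
    merged from the batch uid, the original request, and the response.
    """
    response_tags = response.get("meta", {}).get("tags", {})
    outputs = []
    for req, row in zip(requests, response["data"]["ndarray"]):
        tags = {"batch-uid": batch_uid}
        tags.update(req.get("meta", {}).get("tags", {}))
        tags.update(response_tags)
        outputs.append(
            {
                "names": response["names"],
                "data": {"ndarray": [row]},
                "meta": {"tags": tags},
            }
        )
    return outputs
```

With the two example inputs and the example response above, `split_response` would produce one output per input, each carrying its own `internal-id` alongside the shared `batch-uid` and `extra_id` tags.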

RafalSkolasinski · 2021-10-28T09:58:53Z

To clarify more on meta in the output file: there are three tags inserted by the batch component, e.g. a single row in the output would contain:

{
  ...,
  "meta": {
    "tags": {
      "batch_id": "3d6acd6c-3744-11ec-951d-c3e49d18a2d2",
      "batch_index": 2,
      "batch_instance_id": "3d6b3f88-3744-11ec-951d-c3e49d18a2d2"
    }
  }
}

where:

batch_id - unique for batch job as a whole, same value for each instance (row) in the input/output file
batch_index - index (number of row) at which given instance in output.txt was present in input.txt
batch_instance_id - unique identifier of each instance
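
As a rough illustration of how these three tags relate to each other, the sketch below generates them for a file with a given number of rows. The helper name `batch_tags` is hypothetical, the use of `uuid.uuid1` is only a guess based on the shape of the example identifiers, and zero-based `batch_index` is an assumption.

```python
import uuid


def batch_tags(n_rows):
    """Return one tag dict per input row (hypothetical helper).

    batch_id is shared by all rows, batch_index is the row position
    (assumed zero-based), and batch_instance_id is unique per row.
    """
    batch_id = str(uuid.uuid1())  # assumption: version 1 UUIDs
    return [
        {
            "batch_id": batch_id,
            "batch_index": index,
            "batch_instance_id": str(uuid.uuid1()),
        }
        for index in range(n_rows)
    ]
```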

Now, batch_instance_id with BATCH_SIZE=1 (each request sending only one instance) is equal to Seldon-Puid sent to the model server. This is also used as index when logging into ELK: id = {seldon-puid} as it is.

However, if BATCH_SIZE = N > 1, then each mini-batch contains N instances sent with a single {seldon-puid} = batch_instance_id[0]. These get logged into ELK with id = {seldon-puid}-item[n], where n = 0, ..., N - 1 identifies instances in the mini-batch, effectively being batch_instance_id[0]-item[n].

The problem is that in output.txt the instances from said mini-batch with n = 1, ..., N - 1 currently have a batch_instance_id that differs from the value of the related seldon-puid, and are present in ELK under different indices.

We should probably get these in sync and set batch_instance_id to follow the same pattern, so that for all instances grouped in a single mini-batch it is {seldon-puid}-item[n] with n = 0, ..., N - 1.
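
The proposed pattern could look something like the sketch below. The function name `mini_batch_instance_ids` is hypothetical, and the square brackets in `item[n]` are taken literally from the pattern written above, which is an assumption about the intended format.

```python
def mini_batch_instance_ids(seldon_puid, n_instances):
    """Derive one batch_instance_id per instance in a mini-batch.

    Hypothetical helper: every id shares the seldon-puid sent with the
    mini-batch and is suffixed with the instance position n.
    """
    return [f"{seldon_puid}-item[{n}]" for n in range(n_instances)]
```

For a mini-batch of three instances sent with seldon-puid `abc`, this would give `abc-item[0]`, `abc-item[1]` and `abc-item[2]`, matching the ids used when logging into ELK.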

axsaucedo added the triage Needs to be triaged and prioritised accordingly label Oct 26, 2021

seldondev assigned RafalSkolasinski Oct 26, 2021

seldondev removed the triage Needs to be triaged and prioritised accordingly label Oct 28, 2021 — with Board Genius Sync

RafalSkolasinski mentioned this issue Nov 2, 2021

Batch processor enhancements through raw data parameter #3718

Merged

axsaucedo closed this as completed in #3718 Nov 5, 2021
