Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DAGRUN (or PIPERUN) command #88

Closed
lantiga opened this issue Mar 12, 2019 · 10 comments
Closed

Add DAGRUN (or PIPERUN) command #88

lantiga opened this issue Mar 12, 2019 · 10 comments

Comments

@lantiga
Copy link
Contributor

lantiga commented Mar 12, 2019

A common pattern is enqueuing multiple SCRIPTRUN and MODELRUN commands. Setting input tensors, storing intermediates of *RUN commands and storing outputs all in keyspace means that all written keys will be AOF'd and replicated even if they will be disposed of shortly after.

DAGRUN

A general solution to this problem is to have a DAGRUN (or PIPERUN, we'll see about the name - it is a DAG rather than a pipe, so it should probably be named correctly) command that allows to run a sequence of AI.* commands, e.g.:

AI.DAGRUN TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
          SCRIPTRUN bar baz INPUTS ~c~ OUTPUTS d

Note the key names between ~ ~: these are volatile keys (~a~ is volatile because it has small wings :-) ). A volatile key is not set into keyspace, but it just lives in-memory for the duration of the command, after which is deallocated. In the command above, ~a~, ~b~, ~c~ are volatile, so they don't touch keyspace (and are not replicated). Only the output key d is stored in keyspace, for later retrieval.

Relationship to MODELRUN that returns results

This design supersedes #85, which proposed a MODELRUN variant that returns the output. This can be now obtained by:

AI.DAGRUN MODELRUN foo INPUTS a b OUTPUTS ~c~ => TENSORGET ~c~

or, without touching keyspace at all

AI.DAGRUN TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
          TENSORGET ~c~

Advantages

Apart from the obvious convenience, there are several advantages with this design:

  • once a DAGRUN command is sent, we can apply DAG optimization strategies; for instance, in case of ensambles, we can execute different MODELRUN subcalls in parallel (this was not possible before because each call was blocking on the client) and then join on the results to execute a further SCRIPTRUN that computes the ensembled outputs;
  • when a DAGRUN call executes or if it fails, all volatile keys are deallocated at once; volatile keys are never seen by other clients, they only exist in the context of the call; therefore, we just need a small hash object in the call context that gets naturally deallocated

Additional commands

  • DAGRUNRO: read-only variant that is amenable for execution on replicas; if a non volatile key is attempted to be written, it errors out
  • DAGRUNASYNC: fully async variant, that just returns OK, or, probably better, an id (for eventually querying the status of the run or cancel it, as future commands). The user can then listen to keyspace notifications on the output keys (or check that the key has been written, or in the future query the status of the computation). This is relevant for use in webservices in which handlers shouldn't block.
@hhsecond
Copy link

I have more thoughts on this variant but could be a bit sophisticated and might remove certain benefits that we could get from the proposed DAGRUN. I am not completely sure about the performance implication but if that's negligible, this could be a good feature to have

Proposal Update

If a user could name the DAGRUN

AI.DAGRUN dag_name MODELRUN ...

Advantages

  • User can use that name in the future in DAGRUNASYNC call
  • If user has a requirement where his DAG can't be completed with single command, he could use the name to refer to previous DAGRUN instance to access the volatile keys (it won't be volatile then :(, and it won't be removed untill user closes the DAGRUN instance) and run the same partially completed dag later. A DAGRUN command by default will close right after the execution of the command but if a user supplies a flag (like -k, k for keep), the volatile keys stays in the memory and will be deleted only when explicitly closed the DAGRUN instance.

An Example

my_dag will be available after this run

AI.DAGRUN my_dag -k TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~

my_dag will be available after this run

AI.DAGRUN my_dag -k TENSORSET ~d~ FLOAT 2 VALUES 3 4 => \
          MODELRUN bar INPUTS ~c~ ~d~ OUTPUTS ~e~

my_dag will be deleted and won't be available after this run since -k flag is not present

AI.DAGRUN my_dag MODELRUN baz INPUTS ~e~ OUTPUTS f => \
          TENSORGET f

@lantiga
Copy link
Contributor Author

lantiga commented Mar 13, 2019

Thank you @hhsecond, I'm going to think about this. One essential feature of DAG to me is that it has to be isolated: it has to clearly fail or succeed as a whole, volatile vars should not be visible to other clients or to other calls, the client should block for the duration of the DAG, etc.

This proposal adds some complexity, as we need to have the DAG persist in key-space itself if we want it to survive a failure, etc. So mainly the failure modes and AOF/replication need further consideration.

What we could do is proceed without naming the DAG for now (there's enough to implement) and keep options open using a slightly different syntax in the future

AI.DAGRUN NAME my_dag MODELRUN ...

This way we'll know if the DAG is named or not.

Note

BTW, important design point: regarding fail or succeed, for non-volatile keys we should treat them as volatile and then write them upon unblocking. This way if you run

AI.DAGRUN TENSORSET a FLOAT 2 VALUES 2 3 => MODELRUN foo INPUTS a OUTPUTS b => TENSORGET b

a should be treated as if it were volatile and should be written to keyspace (together with b) right before the DAG returns. This way we can handle failure modes more cleanly, without dangling intermediate keys if something goes wrong within the DAG.

@hhsecond
Copy link

That makes sense, let's do that. Just out of curiosity, why do you think the volatile vars should not be visible to other calls -> because of the complexity you mentioned above or are there other reasons as well?

@lantiga
Copy link
Contributor Author

lantiga commented Mar 14, 2019

Making them persistent introduces a few complications that I'd rather not address right now:

  • when do we clean them?
  • do they need to be visibile to other clients? If not, what happens if a client reconnects, do we need a mechanism to identify that that client is still the same client?
  • what happens to AOF, serialization in general and replication? If the server goes down what happens to volatiles?

Making them live within a command gives us a lot of power for very little added complication, with no impact on Redis as we know it.

@K-Jo K-Jo added this to To do in RedisAI 1.0 GA Apr 11, 2019
@K-Jo K-Jo modified the milestones: 0.2.0, 1.0 Apr 11, 2019
@lantiga lantiga added the Large label May 22, 2019
@lantiga lantiga removed this from the 1.0 milestone May 22, 2019
@lantiga lantiga removed this from To do in RedisAI 1.0 GA May 23, 2019
@K-Jo K-Jo added this to In progress in RedisAI 1.0 GA Dec 23, 2019
@lantiga lantiga moved this from In progress to To do in RedisAI 1.0 GA Dec 26, 2019
@filipecosta90
Copy link
Collaborator

filipecosta90 commented Apr 8, 2020

@lantiga @K-Jo @gkorland @hhsecond following the discussed today, bringing this subject up with the minimum POC required:

DAGRUN (1 level depth ) Design Proposal

Specify a direct acyclic graph of operations ( for now we only allow chaining ) to run within redisai.

AI.DAGRUN [PERSIST] [COMMAND1] |> [PERSIST] [COMMAND2] |> [PERSIST] [COMMANDN]

Some considerations:

By default, the results of each operation will only live during the execution of the DAG.
If the user specifies PERSIST on the start of a specific command, we persist it's outputs to the keyspace. If the first argument of argv for that operation is PERSIST the object savings are done via Key opening and call to RedisModule_ModuleTypeSetValue. When PERSIST is not present, object savings are done locally and kept only during the context of the DAG meaning that no output keys are open.

IF COMMAND1 sets a tensor, it can be referenced by any further command on the chaining. The reference order first to look locally, then search for the key in the keyspace.

Chaining operations:

To chain operations, we will use Pipe-forward operator (|>):
Pipe-forward operator lets you pass an intermediate result onto the next function

Commands in scope of the POC:

  • AI.TENSORSET

    • Receives (RedisModuleCtx *ctx, RedisModuleString *argv, int argc,RedisAI_RunInfo runinfo) and outputs int
    • Stores in RedisModuleCtx *ctx the tensor
    • If first argument of argv is not PERSIST no key opening an no RedisModule_ModuleTypeSetValue
  • AI.MODELRUN

    • Receives (RedisModuleCtx *ctx, RedisModuleString *argv, int argc,RedisAI_RunInfo runinfo) and outputs int
    • Checks first in RedisModuleCtx *ctx for input tensors presence
    • Stores in RedisModuleCtx *ctx the output tensors
    • If first argument of argv is not PERSIST no key opening and no RedisModule_ModuleTypeSetValue
  • AI.TENSORGET

    • Receives (RedisModuleCtx *ctx, RedisModuleString *argv, int argc,RedisAI_RunInfo runinfo) and outputs int
    • Checks first in RedisModuleCtx *ctx for tensor presence
    • PERSIST has no effect here

@lantiga
Copy link
Contributor Author

lantiga commented Apr 8, 2020

Hi @filipecosta90 thank you. My observations:

  • the data pipe (|>) analogy could be misleading even for this first round, because it implies "Pipe-forward operator lets you pass an intermediate result onto the next function"; indeed you might be setting two tensors in two separate commands and then run (so it's not that the output of one goes in the other); the separator could still be ok, but what we want here is a sequence of commands with a local scope;
  • we need a way to assign a key to local scope on individual keys, rather than the whole command; for instance, I may want to decide that an output for MODELRUN is returned without hitting keyspace, and the other is stored (e.g. chatbot model with persistent state). This is the reason why I had the wings sigil around the key name. Of course this might be a silly key convention, so let's look for an alternative, but let's make it key-level.
    An alternative could be to declare the local scope at the beginning of the command, as in:
AI.DAGRUN LOCALS foo bar baz |>
    AI.TENSORSET foo ... |>
    AI.TENSORSET bar ... |>
    AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz |>
    AI.TENSORGET baz

@filipecosta90
Copy link
Collaborator

we need a way to assign a key to local scope on individual keys, rather than the whole command;

Given the following example, if we want to only persist ket and baz we use the PERSIST on the operation part that we the outputs (AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz)

AI.DAGRUN 
    AI.TENSORSET foo ... |>
    AI.TENSORSET bar ... |>
    PERSIST AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz |>
    AI.TENSORGET baz

If we analyze even further the example you gave, modelrun would be using 2 local "volatile" tensors ( foo and bar ) and one tensor present on the keyset ( fiz ).

Regarding the operator to use, anyone that enables us to do a quick separation is good for me :) => is a good one.

@lantiga
Copy link
Contributor Author

lantiga commented Apr 8, 2020

Given the following example, if we want to only persist ket and baz we use the PERSIST on the operation part that we the outputs (AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz)

For more clarity, I want to have the freedom to identify keys as volatile (I just added volatile here in the name for more clarity):

AI.DAGRUN LOCALS volatile_foo volatile_bar volatile_baz |>
    AI.TENSORSET volatile_foo ... |>
    AI.TENSORSET volatile_bar ... |>
    AI.MODELRUN m INPUTS volatile_foo volatile_bar fiz OUTPUTS ket volatile_baz |>
    AI.TENSORGET volatile_baz

I don't like command-level PERSIST or LOCAL because they impose an unnecessary constraint, as they apply to all the output keys.

I'm leaning strongly towards declaring LOCALS as part of the command. It's very explicit and flexible.

@filipecosta90
Copy link
Collaborator

Given the following example, if we want to only persist ket and baz we use the PERSIST on the operation part that we the outputs (AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz)

For more clarity, I want to have the freedom to identify keys as volatile (I just added volatile here in the name for more clarity):

AI.DAGRUN LOCALS volatile_foo volatile_bar volatile_baz |>
    AI.TENSORSET volatile_foo ... |>
    AI.TENSORSET volatile_bar ... |>
    AI.MODELRUN m INPUTS volatile_foo volatile_bar fiz OUTPUTS ket volatile_baz |>
    AI.TENSORGET volatile_baz

I don't like command-level PERSIST or LOCAL because they impose an unnecessary constraint, as they apply to all the output keys.

I'm leaning strongly towards declaring LOCALS as part of the command. It's very explicit and flexible.

You're right :) Agree that it brings more freedom and control from command based granularity to output based 👍

@lantiga
Copy link
Contributor Author

lantiga commented Apr 8, 2020

With reference to:

AI.TENSORSET

Receives (RedisModuleCtx *ctx, RedisModuleString *argv, int argc,RedisAI_RunInfo runinfo) and outputs int
Stores in RedisModuleCtx *ctx the tensor
If first argument of argv is not PERSIST no key opening an no RedisModule_ModuleTypeSetValue
AI.MODELRUN

Receives (RedisModuleCtx *ctx, RedisModuleString *argv, int argc,RedisAI_RunInfo runinfo) and outputs int
Checks first in RedisModuleCtx *ctx for input tensors presence
Stores in RedisModuleCtx *ctx the output tensors
If first argument of argv is not PERSIST no key opening and no RedisModule_ModuleTypeSetValue
AI.TENSORGET

Receives (RedisModuleCtx *ctx, RedisModuleString *argv, int argc,RedisAI_RunInfo runinfo) and outputs int
Checks first in RedisModuleCtx *ctx for tensor presence
PERSIST has no effect here

Apart from PERSIST (see above), it all sounds good.

What needs a bit of extra care is the fact that whatever comes after MODELRUN will have to be called from the Run_Reply callback. Since generalizing this may be brittle, I propose to keep an intermediate representation of the DAG in ctx, indicating the done/in progress/todo state, so that Run_Reply can desume what needs to be done next.

This way generalizing to multi-level DAG might not be that much of a deal.

@lantiga lantiga closed this as completed Apr 18, 2020
@lantiga lantiga moved this from In progress to Done in RedisAI 1.0 GA Apr 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

5 participants