## Tutorial
### Quick start
Create pipeline tasks with the `task` decorator:

In [None]:
import sys; sys.path.append("..")

import dagpipe


@dagpipe.task()
def run_a(param):
    return f"Output of A with {param}"

@dagpipe.task()
def run_b(a):
    return f"Output of B with {a}"

@dagpipe.task()
def run_c(b):
    return f"Output of C with {b}"

Link them:

In [None]:
a = run_a(param="Initial input for A")
b = run_b(a)
c = run_c(b)

Define the pipeline:

In [None]:
simple_pipeline = dagpipe.Pipeline(
  inputs=a,
  outputs=c
)

In [None]:
simple_pipeline.run()

Visualize:

*Note: This step works only if Graphviz is properly installed on your computer.*

In [None]:
dagpipe.visualize(simple_pipeline)

### Functionality overview

#### Task
Two types of tasks are supported:
1. Function tasks, defined with the `task` decorator

In [None]:
@dagpipe.task()
def A(param):
    return f"Output of A with {param}"

@dagpipe.task()
def B(a):
    return f"Output of B with {a}"

@dagpipe.task()
def C(b):
    return f"Output of C with {b}"

@dagpipe.task()
def D(a, c):
    return f"Output of D with {a} and {c}"

2. Method tasks, defined with the `method_task` decorator

In [None]:
class ExampleClass:
    @dagpipe.method_task()
    def E(self, d):
        return f"Output of E with {d}"

    @dagpipe.method_task()
    def F(self, e):
        return f"Output of F with {e}"

    @dagpipe.method_task()
    def G(self, inp):
        return f"Output of G with {inp}"

    @dagpipe.method_task()
    def __call__(self, inp):
        return f"Output from Exampleclass with {inp}"

Task flow is defined intuitively, just like ordinary function calls.  
Note that:
1. You can mix function tasks and method tasks in a single pipeline.
2. You can pass the outputs of several tasks into a single task.

In [None]:
example = ExampleClass()

a = A(param="Initial input for A")
b = B(a)
c = C(b)
d = D(a, c)
e = example.E(d)
f = example.F(e)
g = example.G(c)
ec = example(g)

#### Pipeline
To define a pipeline, pass the input and outputs to `Pipeline`.  
You can define as many outputs as you want, but only one input.


In [None]:
pipeline = dagpipe.Pipeline(inputs=a, outputs=[f, ec])

You can run the same pipeline with different parameters. The result is a list with one value for each defined output.

In [None]:
result = pipeline.run()
print(result)

new_result = pipeline.run(param="## New input for A ##")
print(new_result)

new_result2 = pipeline.run(param="@@ New input for A @@")
print(new_result2)


Be aware that when you run it again without parameters, it uses the most recently supplied parameter:

In [None]:
result = pipeline.run()
print(result)
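
The "last parameter stays in effect" behavior can be pictured with a small stand-in class. This is only a sketch of the idea and not how dagpipe is implemented: `RememberingRunner` and its attributes are hypothetical names made up for this illustration.

```python
class RememberingRunner:
    """Hypothetical stand-in that remembers the keyword arguments of the last run."""

    def __init__(self, func, **initial_kwargs):
        self.func = func
        self.kwargs = dict(initial_kwargs)

    def run(self, **overrides):
        # New values replace the stored ones and stay in effect for later runs.
        self.kwargs.update(overrides)
        return self.func(**self.kwargs)


def describe(param):
    return f"Output of A with {param}"


runner = RememberingRunner(describe, param="Initial input for A")
first = runner.run()
second = runner.run(param="## New input for A ##")
third = runner.run()  # no arguments: reuses the parameter from the previous run
print(first)
print(third == second)
```

If you need a run to be independent of earlier ones, pass every parameter explicitly.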

The `visualize` function creates a `matplotlib` plot that you can customize.  

Alternatively, you can save the visualization to a file instead of plotting it by specifying the `to_file` parameter: `dagpipe.visualize(pipeline, to_file="path/to/file")`

Tasks are named after their functions. Method tasks are named `ClassName.function_name`, with one special case: a `__call__` method is named with the class name only.

In [None]:
import matplotlib.pyplot as plt


dagpipe.visualize(pipeline)
plt.title("Example pipeline")
plt.gcf().set_size_inches(4, 6)

In [None]:
pipeline.tasks[0].name = "AAA"
dagpipe.visualize(pipeline)
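
The naming rule described earlier in this section can be reproduced with Python's `__qualname__` attribute. The helper below is a hypothetical sketch (`default_task_name` is not part of dagpipe, and the library may derive names differently); it only shows that the three cases fall out of the qualified name.

```python
def default_task_name(func):
    """Hypothetical helper that mirrors the naming rule for tasks."""
    qualname = func.__qualname__
    # Functions defined inside another function carry a "<locals>." prefix; drop it.
    qualname = qualname.split("<locals>.")[-1]
    if qualname.endswith(".__call__"):
        # Special case: a __call__ method is named after its class only.
        return qualname[: -len(".__call__")]
    return qualname


def A(param):
    return param


class ExampleClass:
    def E(self, d):
        return d

    def __call__(self, inp):
        return inp


names = [default_task_name(f) for f in (A, ExampleClass.E, ExampleClass.__call__)]
print(names)  # ['A', 'ExampleClass.E', 'ExampleClass']
```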

In [None]:
import inspect


class Foo2:
    def __call__(self, inp):
        return inp

    def x(self, inp):
        return inp


def self_evaluated(self):
    return self


# Both signatures start with "(self", whether the function is defined in a class or not.
print("(self" in str(inspect.signature(Foo2.x)))
print("(self" in str(inspect.signature(self_evaluated)))

#### Output split
You have to specify `outputs_num` in the `task` decorator if you want a task to return more than one output.

In [None]:
from typing import Any

@dagpipe.task()
def do_nothing(inp):
    return inp

@dagpipe.task(outputs_num=2)
def split(inp):
    print("running")
    return inp[0], inp[1]

@dagpipe.task()
def duplicate(inp):
    return inp*2

@dagpipe.task()
def concat(inp1, inp2):
    return inp1 + inp2

x1 = do_nothing(Any)
x2, x3 = split(x1)
x5 = duplicate(x3)
x6 = concat(x2, x5)

p = dagpipe.Pipeline(x1, [x6])

dagpipe.visualize(p)
p.run(["*first element*", "@second element@"])
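
For reference, here is the same data flow written as plain function calls, without any dagpipe objects. It only shows which value travels along each edge of the graph; it assumes the pipeline computes what the undecorated functions would and says nothing about how dagpipe wraps the result.

```python
def split(inp):
    # Two return values, matching outputs_num=2 in the decorated version.
    return inp[0], inp[1]


def duplicate(inp):
    return inp * 2


def concat(inp1, inp2):
    return inp1 + inp2


data = ["*first element*", "@second element@"]
left, right = split(data)                # x2, x3
result = concat(left, duplicate(right))  # x6 = concat(x2, duplicate(x3))
print(result)  # *first element*@second element@@second element@
```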

#### Set name
You can set a task name during flow definition with the `set_name` Task method. If you need to name more than one task, pass a list of names.

In [None]:
x1 = do_nothing(Any).set_name("Input Node")
x2, x3 = split(x1).set_name(["Split - left side", "Split - right side"])
x5 = duplicate(x3)
x6 = concat(x2, x5)

p = dagpipe.Pipeline(x1, [x6])

dagpipe.visualize(p)

#### Sequential pipe
If your flow is a straight chain of tasks, you can define it with `Pipeline.sequential`:

In [None]:
p = dagpipe.Pipeline.sequential([
    do_nothing,
    duplicate,
    duplicate,
])
dagpipe.visualize(p)
p.run("1")
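
Conceptually, a sequential flow feeds each function's output into the next one. A minimal sketch of that idea in plain Python follows; `run_sequence` is a hypothetical helper for illustration and not how `Pipeline.sequential` is implemented.

```python
from functools import reduce


def run_sequence(functions, value):
    """Feed `value` through each function in order (hypothetical helper)."""
    return reduce(lambda acc, func: func(acc), functions, value)


def do_nothing(inp):
    return inp


def duplicate(inp):
    return inp * 2


result = run_sequence([do_nothing, duplicate, duplicate], "1")
print(result)  # 1111
```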

#### conditional_stops
You can stop pipeline execution conditionally at any step by specifying the `conditional_stops` argument in the pipeline definition. It is a dict whose keys are `Task.name` values and whose values are functions that take the task output and return a boolean. If a function returns `True`, pipeline execution stops and the output of that task is returned, with a `StoppingTaskHolder` as the second list element.

In [None]:
def is_string(x):
    return isinstance(x, str)

p = dagpipe.Pipeline.sequential(
    tasks_sequence=[do_nothing, duplicate, duplicate],
    conditional_stops={"do_nothing": is_string}
)

print("normal run", p.run(5))
print("stopped run", p.run("5"))

outputs = p.run("5")
if dagpipe.StoppingTaskHolder.in_(outputs):
    print("wrong input format")

dagpipe.visualize(p)

#### StoppingTaskHolder
You can easily detect runs that stopped early with the `StoppingTaskHolder.in_` method.

In [None]:
outputs = p.run("5")
if dagpipe.StoppingTaskHolder.in_(outputs):
    print("wrong input format")
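
To make the early-stop mechanics concrete, here is a minimal sketch in plain Python. All names in it (`Stopped`, `run_with_stops`, `stopped_early`) are hypothetical stand-ins chosen for this illustration; the sketch follows the behavior described in the text and is not dagpipe's implementation.

```python
class Stopped:
    """Hypothetical marker carrying the name of the step that stopped the run."""

    def __init__(self, step_name):
        self.step_name = step_name


def run_with_stops(steps, value, conditional_stops=None):
    """Run (name, func) steps in order; stop early when a predicate returns True."""
    conditional_stops = conditional_stops or {}
    for name, func in steps:
        value = func(value)
        predicate = conditional_stops.get(name)
        if predicate is not None and predicate(value):
            # Return the stopping step's output with a marker as the second element.
            return [value, Stopped(name)]
    return [value]


def stopped_early(outputs):
    """True if any element of the outputs is a Stopped marker."""
    return any(isinstance(item, Stopped) for item in outputs)


def is_string(x):
    return isinstance(x, str)


steps = [
    ("do_nothing", lambda inp: inp),
    ("duplicate", lambda inp: inp * 2),
    ("duplicate_2", lambda inp: inp * 2),
]
stops = {"do_nothing": is_string}

normal = run_with_stops(steps, 5, stops)     # runs to the end
stopped = run_with_stops(steps, "5", stops)  # stops after the first step
print(normal, stopped_early(normal))
print(stopped[0], stopped_early(stopped))
```

Returning a marker object instead of raising an exception lets the caller treat an early stop as a normal result and check for it afterwards.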


### Dig deeper 
The `dagpipe.task` decorator changes a function's behavior: when the decorated function is called, it stores the base function and its arguments in a `Task` object instead of executing it.

In [None]:
@dagpipe.task()
def do_nothing(x):
  return x

x = 1
print(f"x type before do_nothing processing - {type(x)}")
x = do_nothing(x)
print(f"x type after  do_nothing processing - {type(x)}")

You can check the values stored by Task `x`:

In [None]:
print("Stored function:", x.func)
print("Stored args: \t", x.args)
print("Stored kwargs: \t", x.kwargs)

If you want to run the stored function, you can do so with the `run` method, but normally the `Pipeline` object does it for you.

In [None]:
x.run()
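
If you are curious how a decorator can defer a call like this, here is a minimal sketch. `DeferredCall` and `deferred` are hypothetical names invented for the illustration; the real `Task` class may store more state and behave differently. Note that this sketch resolves upstream calls recursively inside `run`, which is a simplification.

```python
import functools


class DeferredCall:
    """Hypothetical stand-in for a task object: stores a call instead of making it."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def run(self):
        # Resolve arguments that are themselves deferred calls, then call the function.
        args = [a.run() if isinstance(a, DeferredCall) else a for a in self.args]
        kwargs = {
            key: value.run() if isinstance(value, DeferredCall) else value
            for key, value in self.kwargs.items()
        }
        return self.func(*args, **kwargs)


def deferred():
    """Decorator factory, mirroring the `@dagpipe.task()` call style."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Nothing is executed here; the call is only recorded.
            return DeferredCall(func, args, kwargs)

        return wrapper

    return decorator


@deferred()
def add_one(x):
    return x + 1


call = add_one(1)
print(type(call).__name__, call.args, call.kwargs)  # DeferredCall (1,) {}
print(call.run())                                   # 2
print(add_one(add_one(1)).run())                    # 3
```

The recursive `run` re-executes a shared upstream call once per consumer, which is one reason to sort the graph and execute each node once instead.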

A Task stores the input arguments provided to it, so when you pass another Task as an input, it knows which function has to run first.

In [None]:
@dagpipe.task()
def foo2(x):
  return x

inp = 1
x = do_nothing(inp)
x2 = foo2(x)

print(f"input arg for x2 is '{x2.args[0]}' which type is {type(x2.args[0])}")

In essence, each tuple (*task*, *task argument*) is an edge of a directed computation graph. `dagpipe.Pipeline` collects all the edges and sorts them into the right execution order. After that you can run the whole pipeline.

In [None]:
pipeline = dagpipe.Pipeline(inputs=x, outputs=[x2])
pipeline.run()
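
The "collect the edges, then sort them" step is a topological sort. The sketch below shows one way to do it with the standard library's `graphlib` module (Python 3.9+). `Node`, `collect_graph`, and `run_graph` are hypothetical names for this illustration, and dagpipe may order and execute its tasks differently.

```python
from graphlib import TopologicalSorter


class Node:
    """Hypothetical node: a function plus the nodes or plain values it takes as arguments."""

    def __init__(self, name, func, *args):
        self.name = name
        self.func = func
        self.args = args


def collect_graph(outputs):
    """Walk back from the outputs and map each node to the set of nodes it depends on."""
    graph = {}
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in graph:
            continue
        deps = {arg for arg in node.args if isinstance(arg, Node)}
        graph[node] = deps
        stack.extend(deps)
    return graph


def run_graph(outputs):
    """Execute every node once, dependencies first, and return the output values."""
    results = {}
    for node in TopologicalSorter(collect_graph(outputs)).static_order():
        args = [results[arg] if isinstance(arg, Node) else arg for arg in node.args]
        results[node] = node.func(*args)
    return [results[node] for node in outputs]


a = Node("a", lambda x: x + 1, 1)         # 1 + 1 = 2
b = Node("b", lambda x: x * 10, a)        # 2 * 10 = 20
c = Node("c", lambda x, y: x + y, a, b)   # 2 + 20 = 22

order = [node.name for node in TopologicalSorter(collect_graph([c])).static_order()]
values = run_graph([c])
print(order)   # ['a', 'b', 'c']
print(values)  # [22]
```

Because every node is executed exactly once, a node with several consumers (like `a` here) is not recomputed.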