# NumPy Example of Graph Construction and Execution

## Graph Construction

Source nodes for the graph are created with the `source` function, which takes an array of `Payload` objects (one for each node to be created) and a list of dimension names.

In [1]:
import numpy as np 
from cascade.fluent import Action, Payload, source

payloads = np.empty((4, 5), dtype=object)
payloads[:] = Payload(np.random.rand, [2, 3])
start = source(payloads, ["x", "y"])
start

<cascade.fluent.MultiAction at 0x14ccfedf0cd0>
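The slice assignment used to fill the payload array can be illustrated with NumPy alone. The sketch below uses `FakePayload`, a hypothetical stand-in that is not part of `cascade`, to show that assigning one object to a full slice of an object array stores a reference to that same object in every cell, which is why a `(4, 5)` array yields 20 source nodes with identical payloads.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FakePayload:
    """Hypothetical stand-in for cascade's Payload; for illustration only."""

    func: object
    args: list


payloads = np.empty((4, 5), dtype=object)
# Assigning a single non-sequence object to a full slice stores a reference
# to that same object in every cell of the array.
payloads[:] = FakePayload(np.random.rand, [2, 3])

n_cells = payloads.size
same_object = payloads[0, 0] is payloads[3, 4]
print(n_cells, same_object)
```

If each node needed its own payload, the cells would have to be filled individually, for example in a loop over `np.ndindex(payloads.shape)`.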

Plotting the graph at this point shows 20 nodes, one for each element of the `(4, 5)` payload array.

In [2]:
from cascade.graph.pyvis import to_pyvis

net_graph = to_pyvis(start.graph(), notebook=True)
net_graph.show("start.html")



With functions such as `mean`, `std`, `minimum`, and `maximum`, we can reduce the array of nodes along a specified dimension, all the way down to a single node.

In [3]:
single = start.mean("x").minimum("y")
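As a rough NumPy analogue of the chained reductions, the sketch below models the array of nodes as the leading two axes of one dense array and the `(2, 3)` payload result as the trailing two. It assumes that `mean` and `minimum` combine the node arrays elementwise, which the source does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
# Axes: x (4 nodes), y (5 nodes), then the (2, 3) array held by each node.
data = rng.random((4, 5, 2, 3))

after_mean = data.mean(axis=0)      # reduce "x": 5 nodes remain, each (2, 3)
after_min = after_mean.min(axis=0)  # reduce "y": one node remains, (2, 3)
print(after_mean.shape, after_min.shape)
```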

When creating the source nodes, we chose a payload that produces a random array of shape (2, 3) inside each node. We can use `expand` to expose one of these internal dimensions as a dimension of the array of nodes. To do this, we need to specify a name for the new dimension, its size, and the axis of the internal array to take values from. After the operation, each node holds an array of shape (2,).

In [4]:
expanded = single.expand("z", 3, 1, 0)
expanded.nodes
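In plain NumPy terms, expanding along internal axis 1 amounts to taking one slice per index along that axis. This is only a sketch of the effect on the data, not of how `expand` is implemented, and it does not model the final argument of the call above.

```python
import numpy as np

inner = np.arange(6.0).reshape(2, 3)  # array held by the single node
# One new node per index along internal axis 1; each keeps shape (2,).
z_nodes = [np.take(inner, i, axis=1) for i in range(3)]

for i, arr in enumerate(z_nodes):
    print(f"z={i}", arr)
```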

We can broadcast to match the shape of another existing set of nodes, which in this case duplicates the single existing node along the z dimension. Note that this operates purely on the nodes of the graph; no operations are performed on the underlying arrays in each node.

In [5]:
single.broadcast(expanded).nodes
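The idea of duplicating a node without touching its data resembles `np.broadcast_to` on an object array. In the sketch below, a plain `object()` stands in for a graph node; this is an analogy and makes no claim about how `broadcast` is implemented in `cascade`.

```python
import numpy as np

node = object()                      # stand-in for a single graph node
single = np.empty((), dtype=object)  # zero-dimensional array of nodes
single[()] = node

# Broadcasting creates a read-only view; no node is copied.
broadcasted = np.broadcast_to(single, (3,))
all_same = all(item is node for item in broadcasted)
print(broadcasted.shape, all_same)
```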

### Low Level Operations

The low-level functions `map`, `reduce`, and `transform` allow user-defined functions to be applied to the array of nodes. The `map` operation applies a single payload to all nodes or, if an array of payloads with the same shape as the array of nodes is provided, applies a different payload to each node.

In [6]:
# Single payload that scales the array in each node by 2
expanded.map(Payload(lambda x: x * 2)).nodes

In [7]:
# Or we can scale the array in each node by a different value.
# The default argument a=a binds the current multiplier to each lambda.
mapped = expanded.map([Payload(lambda x, a=a: x * a) for a in range(1, 4)])
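When one function per node is built in a comprehension, the `a=a` default argument (as used in the execution example later in this document) matters because Python closures look up loop variables at call time. The following sketch uses plain lambdas, without `Payload`, to show the difference.

```python
# Closures capture the variable, not its value: every lambda sees the last a.
late = [lambda x: x * a for a in range(1, 4)]
# A default argument is evaluated at definition time, so each lambda keeps its own a.
bound = [lambda x, a=a: x * a for a in range(1, 4)]

late_results = [f(10) for f in late]
bound_results = [f(10) for f in bound]
print(late_results, bound_results)
```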

Arbitrary reductions can be applied with the `reduce` operation, which takes a `Payload` and the name of the dimension along which to reduce. If no dimension name is supplied, the reduction is performed along the first axis. The higher-level functions `mean`, `std`, `minimum`, and `maximum` are all `reduce` operations with a predefined payload.
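The behaviour described above, reducing along a named dimension and defaulting to the first axis, can be sketched with NumPy and `functools.reduce`. The helper `reduce_along` is hypothetical and operates on a numeric array rather than on graph nodes.

```python
import functools

import numpy as np


def reduce_along(values, dims, func, dim=None):
    """Fold a binary function over one named dimension (hypothetical helper)."""
    axis = 0 if dim is None else dims.index(dim)
    # Move the reduced axis to the front so iteration walks along it.
    slices = np.moveaxis(values, axis, 0)
    return functools.reduce(func, slices)


values = np.arange(20.0).reshape(4, 5)
dims = ["x", "y"]

max_over_y = reduce_along(values, dims, np.maximum, "y")  # shape (4,)
sum_default = reduce_along(values, dims, np.add)          # first axis, shape (5,)
print(max_over_y, sum_default)
```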

Finally, `transform` allows the shape of the array of nodes in the subsequent action to be changed. The operation takes
- a function of the form `func(action: Action, arg: Any) -> Action`
- a list of values for `arg`
- a name for the new dimension along which `arg` varies

The resulting nodes are the outputs of `func` for the different values of `arg`.

In [8]:
def _example_transform(action: Action, threshold: float) -> Action:
    # Each node holds an array, so threshold elementwise with np.where
    ret = action.map(Payload(lambda x: np.where(x > threshold, x, 0)))
    ret._add_dimension("threshold", threshold)
    return ret

mapped.transform(_example_transform, [0, 1, 10], "threshold").nodes
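The pattern behind `transform`, calling a function once per argument value and collecting the results along a new dimension, can be sketched in NumPy as follows. The helper names are hypothetical, and placing the new dimension last is an assumption made for the illustration.

```python
import numpy as np


def apply_threshold(nodes, threshold):
    """Keep values above the threshold and zero out the rest."""
    return np.where(nodes > threshold, nodes, 0.0)


def transform_sketch(nodes, func, args):
    # One result per argument value, stacked along a new trailing axis.
    # The position of the new axis is an assumption of this sketch.
    return np.stack([func(nodes, arg) for arg in args], axis=-1)


nodes = np.array([0.5, 2.0, 20.0])
result = transform_sketch(nodes, apply_threshold, [0, 1, 10])
print(result.shape)
```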

## Graph Execution

The example graph below combines the operations described in the Graph Construction section.

In [9]:
import numpy as np 
from cascade.fluent import Payload, source

payloads = np.empty((4, 5), dtype=object)
payloads[:] = Payload(np.random.rand, [2, 3])
graph = (
    source(payloads, ["x", "y"])
    .mean("x")
    .minimum("y")
    .expand("z", 3, 1, 0)
    .map([Payload(lambda x, a=a: x * a) for a in range(1, 4)])
    .map(Payload(lambda x: print("RESULT", x)))
    .graph()
)
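For orientation, a NumPy-only computation with the same sequence of steps is sketched below. It assumes the reductions are elementwise across nodes and that the multipliers 1, 2, 3 pair with the z index in order; neither is confirmed by the source. It yields three arrays of shape (2,), one per node along z.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((4, 5, 2, 3))          # x, y, then each node's (2, 3) array
reduced = data.mean(axis=0).min(axis=0)  # mean over x, minimum over y -> (2, 3)

# Expand along internal axis 1, then scale the three nodes by 1, 2 and 3.
results = [reduced[:, i] * a for i, a in enumerate(range(1, 4))]
for r in results:
    print("RESULT", r)
```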

At the moment, graphs can be executed using Dask, but only with Dask's dynamic scheduling.

In [10]:
import os
from cascade.executor import DaskExecutor
from cascade.scheduler import Schedule

os.environ["DASK_LOGGING__DISTRIBUTED"] = "debug"
schedule = Schedule(graph, None, {})
executor = DaskExecutor(schedule)
executor.execute(memory_limit="10GB", n_workers=1, threads_per_worker=1)

2023-12-07 14:40:31,559 - distributed.system - DEBUG - Setting system memory limit based on cgroup value defined in /sys/fs/cgroup/memory/memory.soft_limit_in_bytes


RESULT [1.46478444 0.65478046]
RESULT [0.59523715 2.82901838]
RESULT [0.56622669 0.81399934]


2023-12-07 14:40:32,151 - distributed.worker - DEBUG - Attempted to close worker that is already Status.closed. Reason: worker-close
