# A Crossflow workflow

This notebook illustrates a basic Crossflow workflow, with scatter, parallel processing, and gather steps.

The workflow:

1. Splits an input text file into pieces
2. In parallel, reverses the order of the lines in each piece
3. Stitches the reversed pieces back together
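
The three steps above follow the general scatter, map, gather pattern. As a rough illustration that does not use Crossflow at all, the same pattern can be sketched in plain Python on an in-memory list of lines. Here `ThreadPoolExecutor` from the standard library stands in for the compute cluster, and the helper names `split_lines`, `reverse_lines` and `join_pieces` are made up for this sketch:

```python
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, piece_size):
    """Scatter: cut a list of lines into consecutive pieces."""
    return [lines[i:i + piece_size] for i in range(0, len(lines), piece_size)]


def reverse_lines(piece):
    """Process: reverse the order of the lines within one piece."""
    return piece[::-1]


def join_pieces(pieces):
    """Gather: concatenate the pieces into a single list of lines."""
    return [line for piece in pieces for line in piece]


lines = ['line {}'.format(i) for i in range(6)]
pieces = split_lines(lines, 3)
with ThreadPoolExecutor() as pool:
    # pool.map applies reverse_lines to every piece, possibly concurrently,
    # and yields the results in the same order as the inputs.
    reversed_pieces = list(pool.map(reverse_lines, pieces))
result = join_pieces(reversed_pieces)
print(result)
# ['line 2', 'line 1', 'line 0', 'line 5', 'line 4', 'line 3']
```

The rest of the notebook does the same thing with files and Unix commands instead of lists and Python functions.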

In [None]:
from crossflow import clients, tasks
from pathlib import Path

Start a client backed by a temporary compute cluster launched on the current machine:

In [None]:
client = clients.Client()
client.client

Create a text file of 25 lines:

In [None]:
here = Path('.')
input_file = here / 'input.txt'
with input_file.open('w') as f:
    for i in range(25):
        f.write('line {}\n'.format(i))

Create the three tasks required: one to split up the initial text file, one to reverse the order of the lines, and one to join the pieces back together again.

We use the standard Unix `split`, `tail` and `cat` commands to illustrate how tools normally run from the command line can be turned into Python functions.

**Note**: some flavours of Unix do not support `tail -r`; in such cases `tac` will do the same job.
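
If the notebook has to run on more than one kind of machine, the choice between the two commands could be made at run time. The sketch below is only one possible approach: `pick_reverse_command` is a hypothetical helper, and it assumes that any system without `tac` on its `PATH` has a `tail` that accepts `-r`, which may not hold everywhere.

```python
import shutil


def pick_reverse_command():
    """Return a shell command that reverses the lines of 'input' into 'output'.

    Assumption: if 'tac' is not installed, 'tail -r' is supported.
    """
    if shutil.which('tac') is not None:
        return 'tac input > output'
    return 'tail -r input > output'


reverse_command = pick_reverse_command()
print(reverse_command)
```

The resulting string could then be passed to `tasks.SubprocessTask` in place of the hard-coded command in the next cell.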

In [None]:
# Create a SubprocessTask that will split up the input file:
splitter = tasks.SubprocessTask('split -l 5 input.txt')
splitter.set_inputs(['input.txt'])
splitter.set_outputs(['xaa', 'xab', 'xac', 'xad', 'xae'])

# Create a SubprocessTask to reverse the order of the lines in a file:
reverser = tasks.SubprocessTask('tail -r input > output')
#reverser = tasks.SubprocessTask('tac input > output')
reverser.set_inputs(['input'])
reverser.set_outputs(['output'])

# Create a SubprocessTask that will join input files together:
joiner = tasks.SubprocessTask('cat * > output')
joiner.set_inputs(['*'])
joiner.set_outputs(['output'])
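
The output names given to `splitter` are not arbitrary: by default `split` writes files named with the prefix `x` followed by a two-letter suffix (`aa`, `ab`, ...), and 25 lines in pieces of 5 gives five files. A small sketch of how that list could be derived rather than typed by hand follows; `split_output_names` is a hypothetical helper, and it assumes the default prefix and suffix length are used.

```python
import math
from itertools import islice, product
from string import ascii_lowercase


def split_output_names(n_lines, lines_per_piece, prefix='x'):
    """List the file names 'split -l' is expected to produce by default."""
    n_pieces = math.ceil(n_lines / lines_per_piece)
    # Default suffixes run aa, ab, ..., az, ba, ... (two lowercase letters).
    suffixes = (a + b for a, b in product(ascii_lowercase, repeat=2))
    return [prefix + suffix for suffix in islice(suffixes, n_pieces)]


print(split_output_names(25, 5))
# ['xaa', 'xab', 'xac', 'xad', 'xae']
```

If the number of lines or the piece size changes, the list passed to `set_outputs` has to change with it.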

Here is the workflow, using the client's `.submit()` and `.map()` methods:

In [None]:
# First split the file into pieces:
pieces = client.submit(splitter, input_file)
# 'pieces' is a tuple; convert it to a list and process each piece in parallel:
reversed_pieces = client.map(reverser, list(pieces))
# Stitch the reversed pieces back together again:
output = client.submit(joiner, reversed_pieces)

The client returns its outputs as `Futures`. These can be passed as-is between tasks, but to get at the final data you need to call their `.result()` method:

In [None]:
output_filehandle = output.result()
# Print the contents of the output FileHandle:
print(output_filehandle.read_text())
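
To check the printed result, the expected text can be computed directly in Python. This is a sketch under one assumption: that the joiner concatenates the reversed pieces in their original order (the order `cat *` actually sees depends on how the input files are named inside the task). `expected_output` is a hypothetical helper.

```python
def expected_output(n_lines=25, piece_size=5):
    """Build the text expected if pieces are reversed and joined in order."""
    lines = ['line {}\n'.format(i) for i in range(n_lines)]
    pieces = [lines[i:i + piece_size] for i in range(0, n_lines, piece_size)]
    # Reverse the lines inside each piece, but keep the pieces in order.
    return ''.join(''.join(reversed(piece)) for piece in pieces)


expected = expected_output()
print(expected.splitlines()[:5])
# ['line 4', 'line 3', 'line 2', 'line 1', 'line 0']
```

Comparing `expected` with `output_filehandle.read_text()` would then confirm, or refute, the ordering assumption.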