# Making an image of the Mandelbrot set using `htcondor-dags`

## Making a Mandelbrot set image locally

We'll use `goatbrot` (https://github.com/beejjorgensen/goatbrot) to make the image.
It can be run from the command line, and takes a series of options to specify which part of the Mandelbrot set to draw, as well as the properties of the image itself.

`goatbrot` options:
- `-i 1000` The number of iterations.
- `-c 0,0` The center point of the image region.
- `-w 3` The width of the image region.
- `-s 1000,1000` The pixel dimensions of the image.
- `-o test.ppm` The name of the output file to generate.
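If you would rather drive `goatbrot` from Python than from a shell cell, the options above map directly onto an argument list. The following is a minimal sketch; the helper name `goatbrot_command` is made up for this example, and it assumes the `goatbrot` binary sits in the current directory, as in the cell below.

```python
import shlex

def goatbrot_command(iterations, center, width, size, output):
    """Build the goatbrot argument list from the options listed above.

    Hypothetical helper for illustration; it only assembles the command.
    """
    center_x, center_y = center
    pixels_x, pixels_y = size
    return [
        './goatbrot',
        '-i', str(iterations),
        '-c', f'{center_x},{center_y}',
        '-w', str(width),
        '-s', f'{pixels_x},{pixels_y}',
        '-o', output,
    ]

cmd = goatbrot_command(1000, (0, 0), 3, (1000, 1000), 'test.ppm')
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to actually render the image.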

In [None]:
! ./goatbrot -i 1000 -c 0,0 -w 3 -s 1000,1000 -o test.ppm
! convert test.ppm test.png

## What is the workflow?

We can parallelize this calculation by splitting the full region into sub-regions ("tiles"), drawing each tile as a separate job, and stitching the tiles together into a single image using ImageMagick's `montage`.

In [None]:
from graphviz import Digraph
import itertools

num_tiles_per_side = 2

dot = Digraph()

dot.node('montage')
for x, y in itertools.product(range(num_tiles_per_side), repeat = 2):
    n = f'tile_{x}-{y}'
    dot.node(n)
    dot.edge(n, 'montage')

dot

## Describing `goatbrot` as an HTCondor job

We describe a job using a `Submit` object. 
It corresponds to the submit *file* used by the command line tools.
It mostly behaves like a standard Python dictionary.

In [None]:
import htcondor

tile_description = htcondor.Submit(
    executable = 'goatbrot',
    arguments = '-i 10000 -c $(x),$(y) -w $(w) -s 500,500 -o tile_$(tile_x)-$(tile_y).ppm',
    log = 'mandelbrot.log',
    output = 'goatbrot.out.$(tile_x)_$(tile_y)',
    error = 'goatbrot.err.$(tile_x)_$(tile_y)',
    request_cpus = '1',
    request_memory = '128MB',
    request_disk = '1GB',
)

print(tile_description)

Notice the heavy use of macros, such as `$(x)` and `$(tile_x)`, to specify the tile.
Those aren't built-in submit macros; instead, we will pass their values in through **itemdata**.
In `htcondor-dags`, itemdata and DAGMan's `VARS` are largely synonymous.
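To build intuition for what the macros turn into, here is a rough sketch that mimics the substitution for a single tile. It is an illustration only: HTCondor performs the real expansion itself and supports far more macro syntax than this regular expression does. The `expand_macros` helper is hypothetical.

```python
import re

def expand_macros(template, itemdata):
    """Replace each $(name) in template with str(itemdata[name]).

    Illustration only; this is not how HTCondor implements macro expansion.
    """
    return re.sub(r'\$\((\w+)\)', lambda m: str(itemdata[m.group(1)]), template)

arguments = '-i 10000 -c $(x),$(y) -w $(w) -s 500,500 -o tile_$(tile_x)-$(tile_y).ppm'

# values for the top-left tile of a 2x2 grid covering a region of width 3
item = dict(w=1.5, x=-0.75, y=0.75, tile_x='00000', tile_y='00000')

expanded = expand_macros(arguments, item)
print(expanded)
```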

In [None]:
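def make_tile_vars(num_tiles_per_side, width = 3):
    """Return one dict of macro values (itemdata) per tile, in row-major order."""
    width_per_tile = width / num_tiles_per_side
    
    # center coordinate of each tile along one axis, symmetric around 0
    centers = [
        width_per_tile * (n + 0.5 - (num_tiles_per_side / 2)) 
        for n in range(num_tiles_per_side)
    ]
    
    tile_vars = []
    # the first pair varies slowest, so we loop over rows (y), then columns (x)
    for (tile_y, y), (tile_x, x) in itertools.product(enumerate(centers), repeat = 2):
        var = dict(
            w = width_per_tile,
            x = x,
            y = -y,  # negated so that row 0 has the largest y
            tile_x = str(tile_x).rjust(5, '0'),  # zero-padded index, used in filenames
            tile_y = str(tile_y).rjust(5, '0'),
        )
        
        tile_vars.append(var)
        
    return tile_vars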

In [None]:
tile_vars = make_tile_vars(2)
for var in tile_vars:
    print(var)
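A quick way to convince yourself that the tile centers are right is to check that the tiles, laid side by side along one axis, cover the full region with no gaps or overlaps. The sketch below repeats the center formula from `make_tile_vars` so that it runs on its own; `tile_edges` is a hypothetical helper, not part of the workflow.

```python
import math

def tile_edges(num_tiles_per_side, width=3):
    """Return the (left, right) edges of each tile along one axis."""
    width_per_tile = width / num_tiles_per_side
    centers = [
        width_per_tile * (n + 0.5 - num_tiles_per_side / 2)
        for n in range(num_tiles_per_side)
    ]
    half = width_per_tile / 2
    return [(c - half, c + half) for c in centers]

edges = tile_edges(2)
for left, right in edges:
    print(left, right)

# neighbouring tiles should share an edge
for (_, right), (left, _) in zip(edges, edges[1:]):
    assert math.isclose(right, left, abs_tol=1e-12)
```

For two tiles per side and a width of 3, the tiles span -1.5 to 0 and 0 to 1.5, which together cover the full region centered on 0.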

## Describing montage as an HTCondor job

Now we can write the `montage` job description.
The problem is that the arguments and input files depend on how many tiles we have, which we don't know ahead of time.
We'll take the brute-force approach of writing a function that takes the tile `vars` we made in the previous section and uses them to build the `montage` job description.

This would be a good place for improvements:
- Removing duplicated knowledge between the function here and the tile job description, like the tile output filename pattern.
- Using fancier submit description magic to handle the input files (although `montage` still cares about the specific order of the files).
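One possible way to remove the duplicated filename pattern is to keep it in a single template and derive both the macro form (for the tile job's `arguments`) and the concrete names (for `montage`) from it. This is only a sketch; `TILE_FILENAME`, `tile_filename_macro`, and `tile_filename` are names invented for this example.

```python
# hypothetical single source of truth for the tile filename pattern
TILE_FILENAME = 'tile_{tile_x}-{tile_y}.ppm'

def tile_filename_macro():
    """Filename with submit macros left in place, for the tile job description."""
    return TILE_FILENAME.format(tile_x='$(tile_x)', tile_y='$(tile_y)')

def tile_filename(var):
    """Concrete filename for one tile's itemdata dict, for the montage job."""
    return TILE_FILENAME.format(tile_x=var['tile_x'], tile_y=var['tile_y'])

print(tile_filename_macro())
print(tile_filename({'tile_x': '00001', 'tile_y': '00000'}))
```

The tile description's `arguments` string could then be built with `tile_filename_macro()`, and the montage function could call `tile_filename(d)` for each tile.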

In [None]:
def make_montage_description(tile_vars):
    num_tiles_per_side = int(len(tile_vars) ** .5)
    
    input_files = [f'tile_{d["tile_x"]}-{d["tile_y"]}.ppm' for d in tile_vars]
    
    return htcondor.Submit(
        executable = '/usr/bin/montage',
        arguments = f'{" ".join(input_files)} -mode Concatenate -tile {num_tiles_per_side}x{num_tiles_per_side} mandelbrot.png',
        transfer_input_files = ', '.join(input_files),
        log = 'mandelbrot.log',
        output = 'montage.out',
        error = 'montage.err',
        request_cpus = '1',
        request_memory = '128MB',
        request_disk = '1GB',
    )

In [None]:
montage_description = make_montage_description(make_tile_vars(2))

print(montage_description)
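Note that `int(len(tile_vars) ** .5)` silently truncates if the number of tiles is not a perfect square, which would produce a `-tile` geometry that doesn't match the inputs. A slightly more defensive sketch, assuming Python 3.8 or later for `math.isqrt` (the helper name `tiles_per_side` is hypothetical):

```python
import math

def tiles_per_side(num_tiles):
    """Return the side length of a square grid with num_tiles tiles.

    Raises ValueError instead of silently truncating when num_tiles
    is not a perfect square.
    """
    side = math.isqrt(num_tiles)
    if side * side != num_tiles:
        raise ValueError(f'{num_tiles} tiles do not form a square grid')
    return side

print(tiles_per_side(4))
```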

## Describing the DAG using `htcondor-dags`

Now that we have the job descriptions, all we have to do is use `htcondor-dags` to tell DAGMan about the dependencies between them.

**Important Concept:** the code from `dag = dags.DAG()` onwards only defines the **topology** (or **structure**) of the DAG.
The `tile` layer can be grown or shrunk by adjusting `tile_vars` without changing the topology, and the code keeps those two concerns visibly separate.
The `tile_vars` determine how many nodes the `tile` layer contains.

In [None]:
import htcondor_dags as dags

num_tiles_per_side = 2

tile_vars = make_tile_vars(num_tiles_per_side)

dag = dags.DAG()

tile_layer = dag.layer(
    name = 'tile',
    submit_description = tile_description,
    vars = tile_vars,
)

montage_layer = tile_layer.child(
    name = 'montage',
    submit_description = make_montage_description(tile_vars),
)

In [None]:
print(dag.describe())
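To make the topology point concrete without touching HTCondor, the sketch below tabulates how the layers scale. It assumes, as in the workflow diagram earlier, that the `montage` layer always has a single node; `layer_sizes` is a hypothetical helper.

```python
def layer_sizes(num_tiles_per_side):
    """Number of nodes in each layer for a given grid size."""
    return {'tile': num_tiles_per_side ** 2, 'montage': 1}

# the layer-to-layer structure is the same for every grid size
layer_edges = [('tile', 'montage')]

for n in (2, 4, 8):
    print(n, layer_sizes(n), layer_edges)
```

Only the node count of the `tile` layer changes; the single `tile` to `montage` edge between layers stays fixed.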

## Write the DAG to disk

We still need to write the DAG to disk to get DAGMan to work with it.

In [None]:
from pathlib import Path
import shutil

dag_dir = Path('mandelbrot-dag').absolute()

# blow away any old stuff
shutil.rmtree(dag_dir, ignore_errors = True)

dag_file = dag.write(dag_dir)
shutil.copy2('goatbrot', dag_dir)

print('DAG directory:', dag_dir)
print('DAG file:', dag_file)
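For a rough idea of what a DAGMan input file for this workflow contains, here is a hand-rolled sketch that emits `JOB`, `VARS`, and `PARENT ... CHILD` lines. The node names and the submit file names `tile.sub` and `montage.sub` are assumptions made for this example; the file that `htcondor-dags` actually writes may be named and laid out differently.

```python
def dagman_lines(tile_vars):
    """Build illustrative DAGMan input lines for the tile and montage layers."""
    lines = []
    tile_nodes = []
    for index, var in enumerate(tile_vars):
        node = f'tile_{index}'  # hypothetical node name
        tile_nodes.append(node)
        lines.append(f'JOB {node} tile.sub')  # hypothetical submit file name
        assignments = ' '.join(f'{key}="{value}"' for key, value in var.items())
        lines.append(f'VARS {node} {assignments}')
    lines.append('JOB montage montage.sub')  # hypothetical submit file name
    # montage may only start once every tile node has finished
    lines.append(f'PARENT {" ".join(tile_nodes)} CHILD montage')
    return lines

example_vars = [
    dict(w=1.5, x=-0.75, y=0.75, tile_x='00000', tile_y='00000'),
    dict(w=1.5, x=0.75, y=0.75, tile_x='00001', tile_y='00000'),
]
print('\n'.join(dagman_lines(example_vars)))
```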

## Submit the DAG via the bindings

Since HTCondor `8.9.3`, we can do the equivalent of `condor_submit_dag` from the Python bindings!

In [None]:
dag_submit = htcondor.Submit.from_dag(str(dag_file), {'force': 1})

print(dag_submit)
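On versions older than `8.9.3`, one fallback is to call the `condor_submit_dag` command line tool directly. The sketch below only builds the command; it assumes the DAG file path used here (a stand-in for the `dag_file` returned by `dag.write`) and that `-force` is the command line counterpart of the `force` option above.

```python
from pathlib import Path

def condor_submit_dag_command(dag_file, force=True):
    """Build the condor_submit_dag command line for a DAG file."""
    cmd = ['condor_submit_dag']
    if force:
        cmd.append('-force')  # overwrite files left over from a previous submission
    cmd.append(str(dag_file))
    return cmd

# stand-in path for this example; use the real dag_file in the notebook
example_dag_file = Path('mandelbrot-dag') / 'dagfile.dag'
cmd = condor_submit_dag_command(example_dag_file)
print(cmd)
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would perform the submission on a machine with HTCondor installed.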

In [None]:
import os
os.chdir(dag_dir)

schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    cluster_id = dag_submit.queue(txn)
    
print(cluster_id)

os.chdir('..')

For crude progress tracking, we can just tail the DAGMan log:

In [None]:
! tail -f mandelbrot-dag/dagfile.dag.dagman.out
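Because `tail -f` keeps the cell busy until it is interrupted, a non-blocking alternative can be handy in a notebook. Here is a minimal sketch that prints the last few lines of the log each time the cell is run; `last_lines` is a hypothetical helper, and the log path is the one used above.

```python
from collections import deque
from pathlib import Path

def last_lines(path, count=10):
    """Return the last `count` lines of a text file, or [] if it doesn't exist yet."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open() as f:
        # deque with maxlen keeps only the most recent lines while reading
        return [line.rstrip('\n') for line in deque(f, maxlen=count)]

for line in last_lines('mandelbrot-dag/dagfile.dag.dagman.out'):
    print(line)
```

Re-run the cell whenever you want a fresh look at the log.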