
# Example: Character frequencies in the Loremipsum

There are multiple ways to set up and run this example:

1. [Launch notebook on Binder](https://mybinder.org/v2/gh/riga/law/master?filepath=examples%2Floremipsum%2Findex.ipynb)
2. [Static notebook on GitHub](https://github.com/riga/law/blob/master/examples/loremipsum/index.ipynb)
3. Docker: `docker run -ti riga/law:example loremipsum`
4. Local: `source setup.sh`

### Introduction

This example demonstrates how to create and run a simple law task tree.

The actual payload of the tasks is rather trivial. Six different versions of the [lorem ipsum](https://www.lipsum.com) dummy text are fetched from a website. The character frequencies are measured per version, then merged and visualized at the end.
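The core computation is small enough to sketch in plain Python. This is an illustrative stand-in, not the code from `tasks.py`: the helper names are hypothetical, two short strings replace the fetched texts, and counting only letters case-insensitively is an assumption made here for readability.

```python
from collections import Counter


def count_chars(text):
    """Count letters in one text version (assumption: case-insensitive, letters only)."""
    return Counter(c for c in text.lower() if c.isalpha())


def merge_counts(counters):
    """Sum the per-version counts into a single Counter."""
    merged = Counter()
    for counter in counters:
        merged.update(counter)  # Counter.update adds counts instead of replacing them
    return merged


def relative_frequencies(counter):
    """Turn absolute counts into fractions that sum to 1 (empty input gives {})."""
    total = sum(counter.values())
    return {char: n / total for char, n in counter.items()} if total else {}


# Hypothetical stand-ins for the fetched lorem ipsum versions.
versions = ["Lorem ipsum dolor sit amet", "consectetur adipiscing elit"]

merged = merge_counts(count_chars(text) for text in versions)
freqs = relative_frequencies(merged)
for char, n in merged.most_common(3):
    print(f"{char}: {n} ({freqs[char]:.1%})")
```

Keeping counting and merging in separate functions mirrors the split into per-version and merging tasks described above.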

**You might want to check out the implementation of the tasks in [tasks.py](https://github.com/riga/law/blob/master/examples/loremipsum/tasks.py) while executing the notebook.**

Resources: [luigi](http://luigi.readthedocs.io/en/stable), [law](http://law.readthedocs.io/en/latest)

**Before you proceed**, load the law ipython magics:

- `%law`: runs the passed line in a subprocess
- `%ilaw`: runs the passed line interactively in the current process (for tasks defined in notebooks)

Since we do not define any tasks in this notebook, we are fine with `%law`.

In [1]:
import law
law.contrib.load("ipython")
law.ipython.register_magics(init_cmd="source setup.sh", line_cmd="source setup.sh", log_level="INFO")

INFO: law.contrib.ipython.magic - running initialization command 'source setup.sh'
INFO: law.contrib.ipython.magic - magics successfully registered: %law, %ilaw


The `source setup.sh` command is not specific to law; it sets up the dependencies (luigi and six) in the example directory of this notebook.

This is equivalent to running `source setup.sh` before executing the commands in a terminal.

---

### 1. Let law index your tasks and their parameters (optional)

Note that indexing is only required for auto-completion on the command line and is therefore not essential for this notebook. However, it is a convenient feature for listing your available tasks and completing their parameters when working in a terminal.

In [2]:
%law index --verbose

indexing tasks in 1 module(s)
loading module 'tasks', done

module 'tasks', 4 task(s):
    - ShowFrequencies
    - FetchLoremIpsum
    - CountChars
    - MergeCounts

written 4 task(s) to index file '/law/examples/loremipsum/.law/index'


While *indexing* might sound cumbersome, the law index file is just a human-readable file summarizing your tasks, their Python modules, and their parameters. Have a look at the index file if you are interested. Note that the output of the cell below might be hidden.

In [3]:
%law index --show

tasks:ShowFrequencies:fetch-output log-file print-deps print-output print-status remove-output slow
tasks:FetchLoremIpsum:fetch-output file-index log-file print-deps print-output print-status remove-output slow
tasks:CountChars:fetch-output file-index log-file print-deps print-output print-status remove-output slow
tasks:MergeCounts:fetch-output log-file print-deps print-output print-status remove-output slow
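Judging from the output above, each index line has the form `module:Task:parameters`, with the parameters separated by spaces. The hypothetical helper below parses such lines and separates the parameters shared by all tasks from the task-specific ones. The line format is inferred from this output only and is not a documented law interface.

```python
def parse_index_line(line):
    """Split 'module:Task:param1 param2 ...' into (module, task, [params])."""
    module, task, params = line.strip().split(":", 2)
    return module, task, params.split()


# The index lines printed by `law index --show` above.
index_text = """\
tasks:ShowFrequencies:fetch-output log-file print-deps print-output print-status remove-output slow
tasks:FetchLoremIpsum:fetch-output file-index log-file print-deps print-output print-status remove-output slow
tasks:CountChars:fetch-output file-index log-file print-deps print-output print-status remove-output slow
tasks:MergeCounts:fetch-output log-file print-deps print-output print-status remove-output slow
"""

entries = [parse_index_line(line) for line in index_text.splitlines() if line.strip()]

# Parameters present on every task.
common = set.intersection(*(set(params) for _, _, params in entries))

# Whatever remains is specific to a task, e.g. `file-index`.
specific = {task: sorted(set(params) - common) for _, task, params in entries}
for task, params in specific.items():
    print(f"{task}: {params}")
```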



### 2. Check the status of the ShowFrequencies task

Now we use the `law run` command for the first time. To begin with, we add the parameter `--print-status -1` to the command:

In [4]:
%law run ShowFrequencies --print-status -1

print task status with max_depth -1 and target_depth 0

0 > ShowFrequencies(slow=False)
│
└──1 > MergeCounts(slow=False)
   │     LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_merged.json)
   │       absent
   │
   ├──2 > CountChars(file_index=1, slow=False)
   │  │     LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_1.json)
   │  │       absent
   │  │
   │  └──3 > FetchLoremIpsum(file_index=1, slow=False)
   │           LocalFileTarget(fs=local_fs, path=$DATA_PATH/loremipsum_1.txt)
   │             absent
   │
   ├──2 > CountChars(file_index=2, slow=False)
   │  │     LocalFileTarget(fs=l

You should see that all output targets are absent and no task is complete yet.

Although `law run` was called, no task was actually executed. Some parameters make law only print helpful information and then terminate, such as the status of a certain task (`ShowFrequencies` above) and its **recursive** dependencies. The value given to `--print-status` defines the recursion depth, where `0` is the task given to `law run` itself and `-1`, as used above, prints the full dependency tree.
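The depth semantics can be illustrated with a small recursive traversal. This is a hypothetical sketch, not law's implementation: `Node` and `status_lines` are made-up names, the paths are invented, and a target's status is approximated by whether a local file exists. A negative depth such as `-1` is treated as unlimited, matching the full tree printed above.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a task: a label, its output paths and its dependencies."""
    label: str
    outputs: list = field(default_factory=list)
    deps: list = field(default_factory=list)


def status_lines(node, max_depth, depth=0):
    """Collect status lines down to max_depth; any negative max_depth means no limit."""
    indent = "   " * depth
    lines = [f"{indent}{depth} > {node.label}"]
    for path in node.outputs:
        state = "existent" if os.path.exists(path) else "absent"
        lines.append(f"{indent}      {path}: {state}")
    if max_depth < 0 or depth < max_depth:
        for dep in node.deps:
            lines.extend(status_lines(dep, max_depth, depth + 1))
    return lines


# A one-branch version of the tree above, with made-up relative paths.
fetch = Node("FetchLoremIpsum(file_index=1)", ["data/loremipsum_1.txt"])
count = Node("CountChars(file_index=1)", ["data/chars_1.json"], [fetch])
merge = Node("MergeCounts", ["data/chars_merged.json"], [count])
show = Node("ShowFrequencies", [], [merge])

print("\n".join(status_lines(show, -1)))  # whole tree
print("\n".join(status_lines(show, 0)))   # only the root task
```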

In [5]:
%law run ShowFrequencies --print-status 0

print task status with max_depth 0 and target_depth 0

0 > ShowFrequencies(slow=False)


Other so-called *interactive* parameters are `--print-deps`, `--print-output`, `--fetch-output` and `--remove-output`. Use `--help` to find out more about them. Note that the output of the cell below might be hidden.

In [6]:
%law run ShowFrequencies --help

usage: law run [--local-scheduler [CORE_LOCAL_SCHEDULER]]
               [--module CORE_MODULE] [--help [CORE_HELP]]
               [--help-all [CORE_HELP_ALL]]
               [--ShowFrequencies-log-file SHOWFREQUENCIES_LOG_FILE]
               [--log-file LOG_FILE]
               [--ShowFrequencies-print-deps SHOWFREQUENCIES_PRINT_DEPS]
               [--print-deps PRINT_DEPS]
               [--ShowFrequencies-print-status SHOWFREQUENCIES_PRINT_STATUS]
               [--print-status PRINT_STATUS]
               [--ShowFrequencies-print-output SHOWFREQUENCIES_PRINT_OUTPUT]
               [--print-output PRINT_OUTPUT]
               [--ShowFrequencies-remove-output SHOWFREQUENCIES_REMOVE_OUTPUT]
               [--remove-output REMOVE_OUTPUT]
               [--ShowFrequencies-fetch-output SHOWFREQUENCIES_FETCH_OUTPUT]
               [--fetch-output FETCH_OUTPUT]
               [--ShowFrequencies-slow [SHOWFREQUENCIES_SLOW]] [--slow [SLOW]]
               [Required root ta

### 3. Run the ShowFrequencies task

Now we run the task and all its dependencies with a single command.

In [7]:
%law run ShowFrequencies

INFO: luigi-interface - Informed scheduler that task   ShowFrequencies_False_9ba313c494   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   MergeCounts_False_9ba313c494   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   CountChars_6_False_d07face2c7   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   FetchLoremIpsum_6_False_d07face2c7   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   CountChars_5_False_675cf0b527   has status   PENDING
INFO: luigi-interface - Informed scheduler that task

INFO: luigi-interface - [pid 27097] Worker Worker(salt=905796400, workers=1, host=the_host, username=marcel, pid=27097) done      FetchLoremIpsum(file_index=2, slow=False)
INFO: luigi-interface - Informed scheduler that task   FetchLoremIpsum_2_False_5579299431   has status   DONE
INFO: luigi-interface - [pid 27097] Worker Worker(salt=905796400, workers=1, host=the_host, username=marcel, pid=27097) running   CountChars(file_index=2, slow=False)
INFO: luigi-interface - [pid 27097] Worker Worker(salt=905796400, workers=1, host=the_host, username=marcel, pid=27097) done      CountChars(file_index=2,

The task execution should succeed within a few seconds. You can scroll through the output and read the logs to see how luigi builds up the dependency tree, schedules the tasks, and finally closes with an execution summary.

You might also want to add the `--slow` parameter to slow down the tasks so that their progress logs appear in the output. This is of course not a feature of law; it is only implemented by the tasks in this example 😉.
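luigi's actual scheduler and workers are far more involved than this, but the basic idea of resolving the task tree can be sketched in a single process: a task that is already complete is skipped together with its dependencies, and otherwise all dependencies are built before the task itself runs. In this hypothetical sketch, `Task` and `build` are made-up names, and a task simply becomes complete once it has run, standing in for outputs that exist afterwards.

```python
class Task:
    """Hypothetical in-memory task; 'complete' once it has run, standing in for existing outputs."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)
        self.finished = False

    def complete(self):
        return self.finished

    def run(self, log):
        log.append(self.name)
        self.finished = True


def build(task, log):
    """Skip complete tasks; otherwise build all dependencies first, then run the task."""
    if task.complete():
        return
    for dep in task.deps:
        build(dep, log)
    task.run(log)


# Two versions instead of six keep the example short.
fetches = [Task(f"FetchLoremIpsum({i})") for i in (1, 2)]
counts = [Task(f"CountChars({i})", [fetch]) for i, fetch in zip((1, 2), fetches)]
show = Task("ShowFrequencies", [Task("MergeCounts", counts)])

log = []
build(show, log)
print(log)    # dependencies appear before the tasks that need them

rerun = []
build(show, rerun)
print(rerun)  # everything is complete now, so nothing runs again
```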

### 4. Check the status again

As before, we add `--print-status -1` to see the status of each task, which is determined by the existence of its output targets.

In [8]:
%law run ShowFrequencies --print-status -1

print task status with max_depth -1 and target_depth 0

0 > ShowFrequencies(slow=False)
│
└──1 > MergeCounts(slow=False)
   │     LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_merged.json)
   │       existent
   │
   ├──2 > CountChars(file_index=1, slow=False)
   │  │     LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_1.json)
   │  │       existent
   │  │
   │  └──3 > FetchLoremIpsum(file_index=1, slow=False)
   │           LocalFileTarget(fs=local_fs, path=$DATA_PATH/loremipsum_1.txt)
   │             existent
   │
   ├──2 > CountChars(file_index=2, slow=False)
   │  │     LocalFileTarget(fs

Note that the `ShowFrequencies` task itself has no outputs. Since there is no persistent file whose presence could mark it as complete, it is run (once) every time it is invoked. The other tasks do have outputs, which we are going to delete in the next step.
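The completeness rule behind this can be sketched with plain file paths. By default, luigi considers a task complete when all of its outputs exist, and a task without any outputs is never complete (luigi also emits a warning in that case). The stand-alone function below only mimics this rule on file paths; `is_complete` is a hypothetical name and this is not luigi's actual code.

```python
import os
import tempfile


def is_complete(output_paths):
    """Complete only if there is at least one output and every output path exists."""
    if not output_paths:
        # Nothing can be checked, so a task without outputs never counts as complete
        # and is therefore run again on every invocation.
        return False
    return all(os.path.exists(path) for path in output_paths)


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "chars_merged.json")  # inside a fresh, empty directory
    print(is_complete([]))              # False: no outputs, like ShowFrequencies
    print(is_complete([tmp]))           # True: the only output exists
    print(is_complete([tmp, missing]))  # False: one output is absent
```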

### 5. Remove outputs interactively

As mentioned above, another interactive parameter to pass to `law run` commands is `--remove-output`. The passed value is interpreted as the recursion depth of dependent tasks whose output should be removed as well.

To avoid removing files by mistake, law asks for confirmation before irreversibly removing anything. The prompt looks like this:

```shell
> law run ShowFrequencies --remove-output N

remove task output with max_depth N
removal mode? [i*(interactive), d(dry), a(all)]
```

The default mode (marked with \*) is *interactive* (type 'i'), in which law traverses the task tree and asks for confirmation for every target. The *dry* mode (type 'd') traverses the tree without actually removing anything. The *all* mode (type 'a') should be handled with care: once you select it, the outputs of all tasks down to the requested recursion depth are removed.
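The three modes differ only in what happens at each existing target. The following hypothetical sketch illustrates this on plain files; the function name, the `confirm` callback and the prompt text are made up here and do not reflect law's implementation.

```python
import os
import tempfile


def remove_outputs(paths, mode, confirm=input):
    """Remove existing output files according to a removal mode (hypothetical sketch).

    'i' (interactive): ask `confirm` for every existing file
    'd' (dry): only report what would be removed
    'a' (all): remove every existing file without asking
    Returns the paths that were actually removed.
    """
    if mode not in ("i", "d", "a"):
        raise ValueError(f"unknown removal mode: {mode!r}")
    removed = []
    for path in paths:
        if not os.path.exists(path):
            continue
        if mode == "d":
            print(f"would remove {path}")
            continue
        if mode == "i" and confirm(f"remove {path}? [y/N] ").strip().lower() != "y":
            continue
        os.remove(path)
        removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, f"chars_{i}.json") for i in (1, 2)]
    for path in paths:
        open(path, "w").close()  # create empty placeholder outputs

    print(remove_outputs(paths, "d"))  # reports both files, removes nothing
    # Scripted answers instead of a real prompt: confirm only chars_1.json.
    print(remove_outputs(paths, "i", confirm=lambda prompt: "y" if "chars_1" in prompt else "n"))
    print(remove_outputs(paths, "a"))  # removes the remaining chars_2.json
```

Passing `confirm` as a parameter keeps the interactive mode testable without a real prompt.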

To avoid interactive prompts in this example notebook, you can either pipe the mode into the command (**not** recommended)

```shell
> echo a | law run ShowFrequencies --remove-output N
```

or append the mode to the value of the `--remove-output` parameter, separated by a comma. Here, we only want to remove the outputs down to the `CountChars` tasks, i.e., down to a depth of 2 (see the task tree in the `--print-status` outputs above). This way, the `FetchLoremIpsum` outputs are preserved.
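The `N,mode` value used below can be read as a recursion depth plus an optional mode letter. A minimal, hypothetical parser for this syntax could look like this, assuming only the single-letter modes `i`, `d` and `a` from the prompt above; law's own parsing may differ.

```python
def parse_remove_output(value):
    """Split a value such as '2,a' into (depth, mode); mode is None if not given."""
    depth, _, mode = value.partition(",")
    mode = mode.strip() or None  # None: the mode would be asked for interactively
    if mode is not None and mode not in ("i", "d", "a"):
        raise ValueError(f"unknown removal mode: {mode!r}")
    return int(depth), mode


print(parse_remove_output("2,a"))  # (2, 'a')
print(parse_remove_output("-1"))   # (-1, None)
```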

In [9]:
%law run ShowFrequencies --remove-output 2,a

remove task output with max_depth 2
selected all mode mode

0 > ShowFrequencies(slow=False)
│
└──1 > MergeCounts(slow=False)
   │     LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_merged.json)
   │       removed
   │
   ├──2 > CountChars(file_index=1, slow=False)
   │        LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_1.json)
   │          removed
   │
   ├──2 > CountChars(file_index=2, slow=False)
   │        LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_2.json)
   │          removed
   │
   ├──2 > CountChars(file_index=3, slow=False)
   │        LocalFileTarget(fs=

Verify the removal by printing the status of, say, the first `CountChars` task.

In [10]:
%law run CountChars --file-index 1 --print-status -1

print task status with max_depth -1 and target_depth 0

0 > CountChars(file_index=1, slow=False)
│     LocalFileTarget(fs=local_fs, path=$DATA_PATH/chars_1.json)
│       absent
│
└──1 > FetchLoremIpsum(file_index=1, slow=False)
         LocalFileTarget(fs=local_fs, path=$DATA_PATH/loremipsum_1.txt)
           existent
