# Squaring a Number via HTCondor

This is a testing notebook for thinking about HTCondor job submission from Python.

---

Suppose that you have been given the task of squaring a number, like `2`. You might simply do

In [1]:
2 ** 2

4

This works, of course. However, once the work becomes much more complicated, and once we want to run it somewhere other than our notebook (perhaps because it takes a very long time, or needs more resources, such as memory, than we have locally), we need to think more carefully about what we're doing. We will take this simple example of squaring a number and turn it into a Jupyter Notebook-based workflow that can be run on an HTCondor pool.

HTCondor requires that:
1. Our work is wrapped up in a single **function** that it can run.
1. The **inputs** to that function are provided as data encoded in a **file**.
1. The **outputs** of that function are produced as data encoded in a **file**.

These requirements are not particularly strict. Any function will do, and any kind of data encoding will do. From this list, we see that our specific task is to
1. Write a function that squares a number that it reads from an input file, and writes the result to an output file.
1. Write an input file containing a number.
1. Read an output file containing a number.

Let's begin by rewriting the above computation in a more general way. We will define a Python function that squares numbers.

In [2]:
def square(x):
    return x ** 2

We can use it like this:

In [3]:
x = 2
y = square(x)
print(y)

4


Note that we have explicitly separated the workflow into steps. We define the input (`x = 2`), pass it to the function (`square(x)`), and then retrieve the output from the function (`y = ...`). This separation is critical, because it lets us replace individual steps with other methods, as we'll do below.

Based on the requirements above, what we really need to do is read the input from a file and write the output to a file. The Python standard library's `pathlib` module provides very convenient ways to write and read files:

In [4]:
from pathlib import Path

In [5]:
test_file = Path('test')
test_file.write_text('Hello world!')  # this writes "Hello world!" to the file
test_file.read_text()                 # this reads the text back from the file

'Hello world!'

We can store a number in a file by turning it into a string when we write it, then turning it back into an `int` when we read it out:

In [6]:
number_file = Path('number_test')
number_file.write_text(str(5))
number = int(number_file.read_text())
print(number, type(number))

5 <class 'int'>
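
The same round trip works for other encodings, since any kind of data encoding will do. As a minimal sketch, assuming the data is something the standard library's `json` module can represent, we could store a small dictionary instead of a bare number. The file name `data_test.json` and the dictionary contents are made up for this illustration.

```python
import json
from pathlib import Path

# Hypothetical file name, chosen only for this illustration.
data_file = Path('data_test.json')

# Encode a small dictionary as JSON text and write it to the file.
data_file.write_text(json.dumps({'x': 5, 'label': 'five'}))

# Read the text back and decode it into Python objects again.
data = json.loads(data_file.read_text())
print(data['x'], type(data['x']))
```

Unlike the `str`/`int` round trip, JSON keeps track of the difference between numbers and strings for us, so no explicit conversion is needed after reading.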


Now that we know how to write files, we can write a **wrapper** function around `square` that lets it take input from a file and write output to a file. We will pass in both files as `Path` objects, like we used above.

In [7]:
def square_wrapper(input_file, output_file):
    x = int(input_file.read_text())
    
    y = square(x)
    
    output_file.write_text(str(y))

Let's test that it works:

In [8]:
x = 2

input_file = Path('input')
input_file.write_text(str(x))

output_file = Path('output')

In [9]:
square_wrapper(input_file, output_file)

In [10]:
y = int(output_file.read_text())
print(y)

4
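
The wrapper pattern is not specific to `square`. As a sketch, the helper below (a hypothetical `make_file_wrapper`, not part of any library used in this notebook) builds the same kind of wrapper around any one-argument function, with `decode` and `encode` as assumed names for the conversions between file text and Python values. Whether `Task` accepts a function built this way is not something this notebook establishes, so the example only runs locally.

```python
from pathlib import Path

def make_file_wrapper(func, decode=int, encode=str):
    """Return a function that applies func to a value read from a file."""
    def wrapper(input_file, output_file):
        x = decode(input_file.read_text())   # file text -> Python value
        y = func(x)                          # the actual work
        output_file.write_text(encode(y))    # Python value -> file text
    return wrapper

def cube(x):
    return x ** 3

cube_wrapper = make_file_wrapper(cube)

# Hypothetical file names, used only for this local test.
cube_input = Path('cube_input')
cube_input.write_text(str(3))
cube_output = Path('cube_output')

cube_wrapper(cube_input, cube_output)
print(int(cube_output.read_text()))
```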


Now we have a workflow that satisfies HTCondor's requirements. We will now import the tool that will let us run this workflow on HTCondor.

In [11]:
from htcondor_job import Task, TaskState

The `Task` object represents the work that we want done. To make a `Task`, we need to give it two things: the function to run, and the input file.

In [12]:
task = Task(
    function = square_wrapper,
    input_file = input_file,
)
task

Task [TaskState.Unsubmitted] square_wrapper(input)

Note that the task is in the `Unsubmitted` state. We have not yet told HTCondor to actually run the task. To do so, we `submit` the task. HTCondor will then schedule it for execution.

In [13]:
task.submit()

Task [TaskState.Unsubmitted] square_wrapper(input)

The state of a `Task` is available through the attribute `Task.state`. This attribute is updated in the background for you, so a freshly submitted task may still report `Unsubmitted` for a moment, as the outputs here show.

In [14]:
possible_states = "\n  ".join(str(t) for t in TaskState)
print(f'The possible task states are:\n  {possible_states}\n')
print(f'The current state of task is {task.state}')

The possible task states are:
  TaskState.Unsubmitted
  TaskState.Idle
  TaskState.Running
  TaskState.Submitted
  TaskState.Held
  TaskState.Completed
  TaskState.Removed

The current state of task is TaskState.Unsubmitted


We can wait for the task to complete by polling its state once per second:

In [15]:
import time

while task.state is not TaskState.Completed:
    print(task.state)
    time.sleep(1)
print(task.state)   # print out the final state

TaskState.Unsubmitted
TaskState.Idle
TaskState.Idle
TaskState.Idle
TaskState.Idle
TaskState.Idle
TaskState.Running
TaskState.Completed
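
The loop above only exits when the task reaches `Completed`, so it would spin forever if the task ended up in some other final state. A more defensive polling loop might look like the sketch below. It is self-contained: `FakeState` is a stand-in for `TaskState`, and `wait_for` and its parameters are hypothetical names, not part of `htcondor_job`.

```python
import time
from enum import Enum

class FakeState(Enum):
    # Stand-in for TaskState so this sketch runs without HTCondor.
    Idle = 'Idle'
    Running = 'Running'
    Held = 'Held'
    Completed = 'Completed'

def wait_for(get_state, done, failed, timeout=60.0, interval=1.0):
    """Poll get_state() until it returns a state in `done`.

    Raises RuntimeError if a state in `failed` is seen, and
    TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in done:
            return state
        if state in failed:
            raise RuntimeError(f'task stopped in state {state}')
        if time.monotonic() >= deadline:
            raise TimeoutError(f'task still in state {state} after {timeout} s')
        time.sleep(interval)

# Simulate a task that is idle twice, runs once, then completes.
states = iter([FakeState.Idle, FakeState.Idle,
               FakeState.Running, FakeState.Completed])
final = wait_for(lambda: next(states), done={FakeState.Completed},
                 failed={FakeState.Held}, interval=0.01)
print(final)
```

With the real task, the call might look like `wait_for(lambda: task.state, done={TaskState.Completed}, failed={TaskState.Held, TaskState.Removed})`, assuming that `Held` and `Removed` mean the task will not finish on its own, as they do for HTCondor jobs.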


Once the task has completed, we can read its output file:

In [16]:
y = int(task.output_file.read_text())
print(y)

4


If we put it all together in one cell, we can write our workflow like this:

In [17]:
x = 2

input_file = Path('input')
input_file.write_text(str(x))

task = Task(
    function = square_wrapper,
    input_file = input_file,
)
task.submit()

while task.state is not TaskState.Completed:
    time.sleep(1)
    
y = int(task.output_file.read_text())
print(y)

4


Note how this looks like a "blown-up" version of the original local workflow:

In [18]:
x = 2
y = square(x)
print(y)

4
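
One way to see the correspondence is to fold the file handling back into a single call. The sketch below does this locally, with no HTCondor involved: `square_via_files` is a hypothetical helper that writes the input, runs the wrapper, and reads the output inside a temporary directory. The line that calls `square_wrapper` directly marks the step that the `Task` would take over in the HTCondor version.

```python
import tempfile
from pathlib import Path

def square(x):
    return x ** 2

def square_wrapper(input_file, output_file):
    x = int(input_file.read_text())
    output_file.write_text(str(square(x)))

def square_via_files(x):
    """Square x by round-tripping it through input and output files."""
    with tempfile.TemporaryDirectory() as tmp:
        input_file = Path(tmp) / 'input'
        output_file = Path(tmp) / 'output'
        input_file.write_text(str(x))            # define the input
        square_wrapper(input_file, output_file)  # step a Task would run remotely
        return int(output_file.read_text())      # retrieve the output

y = square_via_files(2)
print(y)
```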
