# Remote manager tutorial
This tutorial introduces the engine that BigDFT employs to run calculations on a remote supercomputer. After completing it, you will be able to launch work on the remote nodes from your local workstation.

The `remotemanager` package must be installed on your machine (or in the Colab session).

In [9]:
!pip install -U remotemanager

Defaulting to user installation because normal site-packages is not writeable
Collecting remotemanager
  Downloading remotemanager-0.3.3-py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 1.2 MB/s
Installing collected packages: remotemanager
  Attempting uninstall: remotemanager
    Found existing installation: remotemanager 0.3.2
    Uninstalling remotemanager-0.3.2:
      Successfully uninstalled remotemanager-0.3.2
Successfully installed remotemanager-0.3.3


Begin with the common imports, `Dataset` and `URL`. As these two are always used, they can be imported directly from the `remotemanager` root.

In [1]:
from remotemanager import Dataset, URL
from remotemanager.serialisation import serialjson

Define a simple function to run:

In [2]:
def basic_function(inp):
    import time
    
    time.sleep(1)
    
    return inp*inp

For the moment, we are just running on the local machine:

In [3]:
url = URL()
# url = URL(host='vega')  # switch to this after the first walkthrough of this notebook

# The basic concept: Dataset

Here we see the core idea behind remotemanager: a function can be run multiple times on a remote machine, each time with different values of its arguments.

This is useful for managing the many calculations that make up more complex submissions.
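To make the "one function, many argument sets" idea concrete, here is a purely local sketch in plain Python. It does not use remotemanager at all; `square` and `argument_sets` are hypothetical names chosen for illustration, and the example only mimics the pattern, not how the package works internally.

```python
def square(inp):
    """Toy function: the single piece of work that is reused for every run."""
    return inp * inp


# Each dictionary is one set of keyword arguments, i.e. one independent run.
argument_sets = [{"inp": 2}, {"inp": 4}, {"inp": 6}]

# Locally, "running the dataset" amounts to calling the function once per set.
local_results = [square(**args) for args in argument_sets]
print(local_results)
```

The remote case follows the same shape, except that the function calls happen on another machine and the results have to be fetched back afterwards.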

Now that the basic setup is done, let's create the Dataset:

In [4]:
ds = Dataset(function=basic_function,
             url=url,
             # script='module load Python',  # loads Python on the remote, which is needed to execute a Python function there
             # serialiser=serialjson(),  # needs updating as yaml isn't available
            )

The Dataset stores the _function_; the Runners store the _arguments_.

Right now all we have is a function, so we need to create the arguments:

In [5]:
values = [1, 3, 7, 50]

for val in values:
    ds.append_run(args={'inp': val})

Now we have all the material required:

- Function
- Connection
- Arguments

Time to run the dataset.

**WARNING**: If you have timeout problems in this part, try to increase the `timeout` value in the URL class attributes.

In [6]:
#ds.url.timeout = 120
#ds.url.max_timeouts = 4
ds.run(force=True)

The cell below is useful for waiting on a run. It has two parts:

### `print(ds.run_cmds)`

This checks that the commands used to launch the runs were okay. If there were any errors, you will see them here.

### `while not ds.all_finished: ...`

This block waits for the dataset to be completed. `Dataset.all_finished` only returns `True` when all the runners are completed.

Note: You can also use `Dataset.is_finished` to see the state on a per-runner basis.

**WARNING (for the exercise)**: If you get an error like 'python not found' here, define a script header in the dataset (see the commented-out `script` argument in the Dataset cell above).

In [7]:
print(ds.run_cmds)

import time
while not ds.all_finished:
    print('dataset not finished yet, sleeping for 1s')
    time.sleep(1)

[]
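The loop above waits forever if a run never finishes. A bounded variant can be sketched with a small generic helper; `wait_until` is a hypothetical function written for this tutorial, not part of remotemanager, and the demo uses a stand-in predicate so that it runs without any remote connection.

```python
import time


def wait_until(is_done, timeout=60.0, interval=1.0):
    """Poll is_done() until it returns True or `timeout` seconds have elapsed.

    Returns True if is_done() became true, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Demo with a stand-in predicate that only becomes true on the third check.
checks = iter([False, False, True])
finished = wait_until(lambda: next(checks), timeout=5.0, interval=0.01)
print(finished)
```

In this notebook the helper could presumably be called as `wait_until(lambda: ds.all_finished, timeout=300)`, where the 300 s limit is an arbitrary choice.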


If we've made it through the wait block then we must have results, let's fetch them:

In [8]:
ds.fetch_results()

Now that they are fetched, we can access them via the `results` property without having to talk to the remote again:

In [9]:
ds.results

[1, 9, 49, 2500]
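As a quick sanity check, the results can be paired with the inputs and compared with a local evaluation. The sketch below copies the values and the output shown above into plain lists, and it assumes that results come back in the order in which the runs were appended, which the output suggests but which is not stated explicitly here.

```python
values = [1, 3, 7, 50]       # inputs appended to the dataset
results = [1, 9, 49, 2500]   # copied from the `ds.results` output above

# Pair each input with its result (assumes results keep the append order).
by_input = dict(zip(values, results))

# Compare against the same computation done locally.
mismatches = [v for v, r in by_input.items() if r != v * v]
print(by_input)
print("mismatches:", mismatches)
```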

In [10]:
ds.runners[0].history

{'2022-11-16 11:44:19/0': 'created',
 '2022-11-16 11:49:34/0': 'submitted',
 '2022-11-16 12:45:25/0': 'submitted',
 '2022-11-16 12:59:01/0': 'submitted',
 '2022-11-16 13:01:23/0': 'submitted',
 '2022-11-16 13:04:30/0': 'submitted',
 '2022-11-16 14:30:01/0': 'submitted',
 '2022-11-16 14:30:39/0': 'resultfile created remotely',
 '2022-11-16 14:32:24/0': 'completed'}
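The history maps timestamps to states, so it can be used to see how often a runner was resubmitted and how long it took overall. The sketch below works on a copy of the dictionary printed above. It assumes the keys always follow the `YYYY-MM-DD HH:MM:SS/n` pattern seen in that output; the meaning of the `/n` suffix is not explained here, so the code simply strips it. `parse_stamp` and `first_time` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime

# Copied from the `ds.runners[0].history` output above.
history = {
    "2022-11-16 11:44:19/0": "created",
    "2022-11-16 11:49:34/0": "submitted",
    "2022-11-16 12:45:25/0": "submitted",
    "2022-11-16 12:59:01/0": "submitted",
    "2022-11-16 13:01:23/0": "submitted",
    "2022-11-16 13:04:30/0": "submitted",
    "2022-11-16 14:30:01/0": "submitted",
    "2022-11-16 14:30:39/0": "resultfile created remotely",
    "2022-11-16 14:32:24/0": "completed",
}


def parse_stamp(key):
    """Drop the '/n' suffix (assumed format) and parse the timestamp."""
    stamp, _, _ = key.partition("/")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def first_time(entries, state):
    """Return the earliest timestamp at which `state` was recorded, or None."""
    times = [parse_stamp(k) for k, v in entries.items() if v == state]
    return min(times) if times else None


counts = Counter(history.values())
elapsed = first_time(history, "completed") - first_time(history, "created")
print("submissions:", counts["submitted"])
print("created -> completed:", elapsed)
```

Here the repeated 'submitted' entries reflect the notebook being rerun with `force=True` several times over the session, which is why the span from creation to completion is far longer than the one-second function itself.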

# Exercise
Run this simple function on vega.
Change the URL to `vega`: it should be enough to pass to `URL` the same host name you use for your `ssh` connection (e.g. the `vega` in `ssh vega`).
Pay attention to the various steps of the procedure.

**Important**: contact us if your setup does not work for the exercise. You can also run the following tutorials on Google Colab if you have difficulty installing a Jupyter notebook environment on your workstation.