# Quickstart

In this tutorial, we will introduce the `remotemanager` library. Previously, we have seen how PyBigDFT can be useful for setting up and running BigDFT calculations. Unfortunately, most calculations we want to run are too computationally demanding for your workstation. This isn't limited to BigDFT runs: you may want to easily access a more powerful machine for pre/post-processing. 

The `remotemanager` library is designed with this in mind. It allows you to run any Python function you define on a remote computer, so you can easily mix the interactive experience of your Jupyter notebook with the power of a supercomputer.

## Installation

Installation can be done via a pip install:

`pip install remotemanager` for the most recent stable version.

We will also get some other goodies for this notebook.

In [None]:
! pip install -U remotemanager
! pip install -q requests
! pip install -q scipy
! pip install -q jsonpickle
! pip install -q dill

Defaulting to user installation because normal site-packages is not writeable
Collecting remotemanager
  Downloading remotemanager-0.6.1-py3-none-any.whl (71 kB)
Installing collected packages: remotemanager
  Attempting uninstall: remotemanager
    Found existing installation: remotemanager 0.5.17
    Uninstalling remotemanager-0.5.17:
      Successfully uninstalled remotemanager-0.5.17
Successfully installed remotemanager-0.6.1


## Computer Creation
First, we need to define the remote machine we want to run on. To do so, we use the `URL` class. In the simplest case, we can define our own workstation as the computer.

In [1]:
# Google Colab seems far too verbose so we turn down the logging...
from remotemanager import Logger
Logger.level = "CRITICAL"

In [2]:
from remotemanager import URL
connection = URL(host='localhost')

This example connection simply points at `localhost`. You can also define a connection to a remote machine by hostname or IP address:

`connection = URL(user='username', host='remote.connection.address')`

`connection = URL(user='username', host='192.168.123.45')`

.. note::
    The only requirement for `URL` to function is that you must be able to ssh into the remote machine without any additional prompts from the remote. For connection difficulties regarding permissions, see the [relevant section](../Introduction.html#Connecting-to-a-Remote-Machine) of the introduction.
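
Before creating a `URL`, it can help to check that the remote really accepts a login without prompting. The following is a minimal sketch using the standard `ssh` client from Python, not part of `remotemanager`. The helper names `build_ssh_check_command` and `can_ssh_without_prompt` are hypothetical; `BatchMode=yes` is the OpenSSH option that makes `ssh` fail rather than ask for a password.

```python
import subprocess


def build_ssh_check_command(host, user=None, timeout=5):
    """Build an ssh command that fails instead of prompting for input."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never ask for a password or passphrase
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on unreachable hosts
        target,
        "true",                             # trivial command to run on the remote
    ]


def can_ssh_without_prompt(host, user=None, timeout=5):
    """Return True if a non-interactive ssh login to the host succeeds."""
    cmd = build_ssh_check_command(host, user=user, timeout=timeout)
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection attempt hung
        return False
    return result.returncode == 0
```

If `can_ssh_without_prompt('remote.connection.address', user='username')` returns `False`, the connection most likely needs an ssh key or agent set up before `URL` can use it.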
    
We can also access some of the predefined computers, which come with preset options and environment settings. You can also [build your own custom computer](https://l_sim.gitlab.io/remotemanager/tutorials/Submitting%20Via%20Scheduler.html).

In [3]:
from remotemanager.connection.computers.base import BaseComputer
archer_connection = BaseComputer.from_repo(name = "archer2")
archer_connection.mpi = 16
archer_connection.omp = 8
archer_connection.time = 100

polling url https://gitlab.com/l_sim/remotemanager-computers/-/raw/main/storage/archer2.yaml
Grabbed file 'archer2.yaml'


Computer classes automatically generate jobscripts which are used to run your python function.

In [4]:
print(archer_connection.script())

128 total mpi requested
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=8
#SBATCH --walltime=00:01:40
#SBATCH --ntasks=16
#SBATCH --job-name=test_job
#SBATCH --qos=standard
#SBATCH --export=none
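
The `--walltime=00:01:40` line in the script above is consistent with `archer_connection.time = 100` being read as a number of seconds. As a worked check, here is a minimal sketch of that conversion. The helper `seconds_to_walltime` is hypothetical and is not taken from `remotemanager`.

```python
def seconds_to_walltime(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(seconds_to_walltime(100))  # 00:01:40
```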



For basic commands, `URL` provides a `cmd` method, which executes any string it is given.

In [6]:
connection.raise_errors=False

In [7]:
connection.cmd('hostname')

bigdft/sdk:oneapi2023

## Datasets
The `remotemanager` library is able to execute user-defined Python functions at the location of your choice. Below is a basic example function which will serve our purposes for this guide.

.. note::
    The function must stand by itself when running, so any imports or necessary functionality should be contained within it.

In [8]:
def multiply(a, b):
    import time
    
    time.sleep(1)
    
    return a * b
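
To see why the imports belong inside the function, we can mimic a fresh interpreter with the standard library. This is only an illustration, under the assumption that the remote side sees nothing but the function's own source; it is not how `remotemanager` transfers functions. The helper `run_in_fresh_namespace` is hypothetical.

```python
SELF_CONTAINED = """
def multiply(a, b):
    import time          # imported inside the function body
    time.sleep(0.01)
    return a * b
"""

RELIES_ON_OUTER_IMPORT = """
def multiply(a, b):
    time.sleep(0.01)     # assumes 'import time' happened somewhere else
    return a * b
"""


def run_in_fresh_namespace(source, name, **kwargs):
    """Define the function in an empty namespace, then call it."""
    namespace = {}
    exec(source, namespace)
    return namespace[name](**kwargs)


print(run_in_fresh_namespace(SELF_CONTAINED, "multiply", a=21, b=2))  # 42

try:
    run_in_fresh_namespace(RELIES_ON_OUTER_IMPORT, "multiply", a=21, b=2)
except NameError as error:
    print("failed:", error)
```

The second version fails with a `NameError` because `time` was never imported in the namespace where the function runs.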

This function would run just fine on any workstation. However, imagine that the function is something significantly more demanding: we would need to connect to more powerful resources to run it.

For function execution, we require a `Dataset`. Think of this dataset as a container for your function, with calculations to be added later on.

Like `URL`, this can be imported directly from `remotemanager`.

To create a dataset, the only requirement is a callable function object, which you must pass to the `Dataset`.

.. note::
    When passing a function to the dataset, do not call it within the assignment. For example, for our multiply function, we should pass `function=multiply`, _not_ `function=multiply()`.

Here we are additionally specifying the `local_dir` and the `remote_dir`, which tell the Dataset where to put all relevant files on the local and remote machines, respectively.

If it suits your workflow, you can additionally specify a `run_dir` when appending a run. This is an additional folder within `remote_dir` where the script will be executed from. Thus, any files created by your function will be placed here.

In [9]:
from remotemanager import Dataset

ds = Dataset(function=multiply,
             url=connection,
             local_dir='temp_local',
             remote_dir='temp_remote')



### Creating runs

As the dataset is simply a container for the function, it is essentially useless in this state. To get some use out of it, we must append some runs.

To do this we use the `Dataset.append_run()` method. This will take the arguments in `dict` format, and store them for later.

You may do this in any way you see fit. The important part is to pass a dictionary which contains all necessary arguments for running your function:

In [10]:
runs = [[21, 2],
        [64, 8],
        [10, 7]]

for run in runs:
    
    a = run[0]
    b = run[1]
    
    arguments = {'a': a, 'b': b}
    
    ds.append_run(arguments=arguments)

runner runner-3 already exists
runner runner-3 already exists
runner runner-3 already exists
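
The argument dictionaries can also be built in one step. The sketch below uses plain Python only and produces the same dictionaries as the loop above; the names `argument_names` and `all_arguments` are illustrative.

```python
runs = [[21, 2],
        [64, 8],
        [10, 7]]

# names must match the parameters of the function, here multiply(a, b)
argument_names = ("a", "b")

# pair each value with its parameter name
all_arguments = [dict(zip(argument_names, run)) for run in runs]

print(all_arguments)
# [{'a': 21, 'b': 2}, {'a': 64, 'b': 8}, {'a': 10, 'b': 7}]
```

Each dictionary can then be passed on exactly as in the loop above, with `ds.append_run(arguments=arguments)`.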


### Running and Retrieving your results

Now that we have created a dataset and appended some runs, we can launch the calculations. This is done via the `Dataset.run()` method.

Once the runs have completed, you can retrieve your results with `ds.fetch_results()`, and then access them via `ds.results`.

.. note::
    Be aware that the `fetch_results` method does not return your results; it simply stores them in the `results` property.

In [11]:
ds.run()

assessing run for runner dataset-62eb4971-runner-0... checks passed, running
assessing run for runner dataset-62eb4971-runner-1... checks passed, running
assessing run for runner dataset-62eb4971-runner-2... checks passed, running


RuntimeError: received the following stderr: 
/bin/bash: /opt/intel/oneapi/intelpython/python3.9/lib/libtinfo.so.6: no version information available (required by /bin/bash)


In [None]:
# fetch the results, this loads them into the ds.results property for later access
import time

while not all(ds.is_finished):
    time.sleep(3)
_ = ds.fetch_results()

In [None]:
# access this property any time after the results have been fetched. 
# This prevents the dataset attempting to poll the remote each time

print(ds.results)
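
A polling loop like the one above waits forever if a run never finishes. Below is a minimal sketch of a bounded wait in plain Python. The helper `wait_until` is hypothetical and not part of `remotemanager`; it accepts any zero-argument callable, so it is demonstrated here with a stand-in predicate instead of a real dataset.

```python
import time


def wait_until(is_done, interval=3.0, timeout=600.0):
    """Poll is_done() until it returns True or timeout seconds have passed.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# stand-in for a finished check: not done twice, then done
checks = iter([False, False, True])
print(wait_until(lambda: next(checks), interval=0.01, timeout=1.0))  # True
```

With a dataset, the predicate could be something like `lambda: all(ds.is_finished)`, followed by `ds.fetch_results()` only when `wait_until` returns `True`.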

## Magic Commands
For some tasks, the use of a dataset can be overkill and make it hard to read your notebook. For simple remote tasks, we have created the `sanzu` magic commands, which automatically wrap Jupyter cells for remote execution.

In [None]:
%load_ext remotemanager

We define an arbitrary cell in a Jupyter notebook, and decorate it so that it runs on the remote machine. The `%%sanzu` magic takes the same list of arguments you would send to a dataset.

In [None]:
%%sanzu url=connection, remote_dir="rmagic"
from time import sleep
sleep(5)

If you run the cell above again, you will find that the result has been cached, and the function returns instantly. Now let's try a more useful function.
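
To illustrate why a repeated, unchanged cell can return instantly, here is a minimal sketch of result caching keyed on the code and its arguments. This is an assumption made for illustration only and does not describe how `remotemanager` implements its cache; `cache_key`, `run_cached` and `fake_execute` are hypothetical names.

```python
import hashlib
import json

_cache = {}


def cache_key(source, arguments):
    """Derive a stable key from the code text and its arguments."""
    payload = json.dumps({"source": source, "arguments": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(source, arguments, execute):
    """Call execute() only if this source/arguments pair has not been seen."""
    key = cache_key(source, arguments)
    if key not in _cache:
        _cache[key] = execute(source, arguments)
    return _cache[key]


calls = []


def fake_execute(source, arguments):
    """Stand-in for the expensive remote execution."""
    calls.append(source)
    return "done"


cell = "from time import sleep\nsleep(5)"
run_cached(cell, {}, fake_execute)
run_cached(cell, {}, fake_execute)  # served from the cache
print(len(calls))  # 1
```

In this sketch, changing either the code or the arguments produces a different key, so the work would be executed again.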

In [None]:
%%sanzu url=connection
%%sargs N = 1000, hermitian=True
%%sargs idx = 0
from numpy.random import rand
from scipy.linalg import eigh

mat = rand(N, N)
if hermitian:
    mat += mat.T

vals, vecs = eigh(mat)
vals[idx]

In this case, we're building a random matrix and computing its lowest eigenvalue. The `%%sargs` lines were used to define the arguments passed to the cell. In Jupyter, the last line of a cell is automatically returned. We can access this result through the magic dataset.

In [None]:
print(magic_dataset.results)

## Serialization and File Passing
`remotemanager` works by serializing whatever objects are passed to it and writing them to file. This can lead to some surprises.

In [None]:
from uuid import uuid1

In [None]:
%%sanzu
%%sargs x = uuid1()
print(x)

This fails because the default serializer writes JSON, and a UUID is not a simple type. To pass this kind of object, we need to switch the serialiser to something more flexible. We recommend `jsonpickle` or `dill`.

In [None]:
from remotemanager.serialisation import serialjsonpickle

In [None]:
%%sanzu serialiser = serialjsonpickle()
%%sargs x = uuid1()
x

In [None]:
print(magic_dataset.results)
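
The JSON limitation can be reproduced with the standard library alone. The sketch below is independent of `remotemanager`: it shows `json` rejecting a `UUID`, and then two ways around it. The standard `pickle` module is used here only as a stand-in for a more flexible serialiser.

```python
import json
import pickle
from uuid import uuid1

x = uuid1()

# JSON only knows simple types (str, int, float, bool, None, list, dict)
try:
    json.dumps({"x": x})
except TypeError as error:
    print("json failed:", error)

# Option 1: convert to a simple type before sending
as_text = json.dumps({"x": str(x)})

# Option 2: use a serialiser that can handle arbitrary objects
restored = pickle.loads(pickle.dumps(x))
print(restored == x)  # True
```

Option 1 avoids extra dependencies on the remote, at the cost of converting back to a `UUID` by hand.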

You need to take care of the remote environment and make sure any type that you send is available there. If the remote computer's Python environment is very limited, sending files back and forth can be a good option.

In [None]:
with open("send_me.txt", "w") as ofile:
    ofile.write("test")

In [None]:
%%sanzu extra_files_send = ["send_me.txt"]
%%sanzu extra_files_recv = ["recv_me.txt"]
with open("send_me.txt") as ifile:
    f = next(ifile)
with open("recv_me.txt", "w") as ofile:
    ofile.write(f + " worked")

In [None]:
with open("temp_runner_local/recv_me.txt") as ifile:
    print(next(ifile))

## Exercise
Install remotemanager on your own machine and access the Saga supercomputer.

In [None]:
def calculate(hgrid):
    from BigDFT.Calculators import SystemCalculator
    from BigDFT.Inputfiles import Inputfile
    from BigDFT.Database.Molecules import get_molecule
    
    sys = get_molecule("N2")
    inp = Inputfile()
    inp.set_hgrid(hgrid)
    calc = SystemCalculator()
    log = calc.run(sys=sys, input=inp, name=str(hgrid))
    
    return log.energy

In [None]:
from remotemanager import URL  # Replace this with the computer of your choice
from remotemanager import Dataset  # Store a set of remote calculations
url = URL()
ds = Dataset(function=calculate, url=url)
for h in [0.3, 0.35, 0.4]:
    ds.append_run({"hgrid": h})
ds.run()

In [None]:
from time import sleep
while not all(ds.is_finished):
    sleep(10)
ds.fetch_results()
print(ds.results)

In [None]:
%%sanzu url=url
%%sargs hgrid = 0.35
from BigDFT.Logfiles import Logfile
log = Logfile("log-" + str(hgrid) + ".yaml")
log.log["Memory Consumption Report"]["Memory occupation"]

In [None]:
print(magic_dataset.results)