Remote jobs #6

Merged
merged 6 commits into from
Dec 1, 2018
63 changes: 63 additions & 0 deletions README.md
@@ -102,5 +102,68 @@ The former is faster because a new klayout instance is not created, but of cours

Usage examples for klayout and non-klayout layout clients are included in this repo in the "examples" folder.


## Long shot wishlist item: remote jobs
Tests run often, and integration tests of entire wafers can take a really long time. Fortunately, they parallelize well, which puts them on the good side of Amdahl's law. This is also relevant to dataprep.

Here is the process:

1. [laptop] initiates klayout IPC configured for incoming connections (firewalled)
1. [laptop (or CI)] issues the command for the test
1. [laptop] sends IPC port info to the remote HPC
1. [laptop] rsyncs ref_layouts and the new code, changes only, to the remote computer
    - you might also have to synchronize lygadgets, or just do it manually
    - can we override rsync's change detection to compare geometry instead of binary content?
        - no need: by default it is based on file size and modification time, so we are good
1. [laptop] sends a command to the remote computer to initiate the test
1. [remote HPC] runs tests on all cores using pytest-xdist and klayout tiling
1. [remote HPC] sends progress reports
    - pipe pytest output by redirecting stdout?
1. [remote HPC] if there are errors, sends the layout pairs to the IPC server on the laptop
1. [remote HPC] sends some kind of completion signal
1. [laptop] rsyncs run_layouts for further examination
    - only the failures?
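
On the rsync question in the list above: rsync's default quick check skips a file when size and modification time both match, and it does not read the contents. The following is a minimal sketch of that decision, not rsync's own code. The names `needs_transfer` and `demo` are hypothetical and exist only for illustration.

```python
import os
import tempfile


def needs_transfer(src, dst):
    """Emulate the default quick check: transfer when the destination is
    missing or differs in size or modification time. Contents are never read."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def demo():
    """Make two identical files, then touch the source and compare decisions."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'ref.gds')
        dst = os.path.join(tmp, 'copy.gds')
        for path in (src, dst):
            with open(path, 'wb') as fx:
                fx.write(b'layout')
        os.utime(src, (1000, 1000))
        os.utime(dst, (1000, 1000))
        unchanged = needs_transfer(src, dst)
        os.utime(src, (2000, 2000))  # newer mtime, same bytes
        touched = needs_transfer(src, dst)
        return unchanged, touched


if __name__ == '__main__':
    print(demo())
```

A consequence worth keeping in mind: regenerating a layout with identical geometry still triggers a transfer if the file is rewritten, because the modification time changes.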

The following are the steps to enable this.

#### Network IPC (Done)
Run a server on one computer. Configure lyipc on the second computer to target that server, then send lyipc commands. At first, do a load with the gds already on the first computer. Next, combine with rsync so that a gds on the local (client) computer can be loaded with `client.load`.
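
The send-and-acknowledge exchange described here can be sketched with the standard library alone. This is an illustration under assumptions: lyipc itself uses `QTcpSocket`, and the `ACK` reply, the function names, and the example path are hypothetical stand-ins rather than lyipc's actual protocol.

```python
import socket
import threading


def serve_once(server_sock, received):
    """Accept one connection, record the message line, and acknowledge it."""
    conn, _ = server_sock.accept()
    with conn:
        with conn.makefile('rb') as stream:
            received.append(stream.readline().decode().strip())
        conn.sendall(b'ACK\r\n')  # assumed reply format


def send(message, host, port, timeout=3.0):
    """Send one line to the server and return its acknowledgement line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((message + '\r\n').encode())
        with sock.makefile('rb') as stream:
            return stream.readline().decode().strip()


def demo():
    """Run a one-shot server on the loopback interface and send it a command."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    received = []
    worker = threading.Thread(target=serve_once, args=(server_sock, received))
    worker.start()
    reply = send('load /path/to/layout.gds', '127.0.0.1', port)
    worker.join()
    server_sock.close()
    return received, reply


if __name__ == '__main__':
    print(demo())
```

Replacing `'127.0.0.1'` with the server's hostname is all the client side needs for the networked case, provided the port is reachable through the firewall.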

#### stdout piping (don't bother)
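The test script will look something like
```python
import io
from contextlib import redirect_stdout
with redirect_stdout(io.StringIO()) as captured:  # redirect_stdout needs a target stream
    print('heyo')
```
This script should be initiated by the laptop but run on the HPC.

I got this working, but it's not live.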
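
One possible way to make the output live is to read the child process's stdout line by line instead of capturing it all at the end. This is a sketch, not what lyipc does. `stream_command` is a hypothetical helper, and the local Python child stands in for a command run over ssh.

```python
import subprocess
import sys


def stream_command(command, on_line=print):
    """Run command and hand each output line to on_line as soon as it arrives.
    Returns (exit_code, list_of_lines)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    lines = []
    for line in proc.stdout:
        line = line.rstrip('\n')
        on_line(line)
        lines.append(line)
    proc.stdout.close()
    return proc.wait(), lines


if __name__ == '__main__':
    # -u makes the child unbuffered so lines are not held back until exit
    child = [sys.executable, '-u', '-c', "print('heyo'); print('done')"]
    print(stream_command(child))
```

The remote side has to flush as it goes (for Python, the `-u` flag), otherwise lines still arrive in one batch when the process exits.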

#### remote build (Done)
1. [laptop user] runs `lyipc-job script.py`
1. [laptop] rsyncs `script.py` to the HPC
1. [HPC] runs `python script.py`
1. [HPC] rsyncs `output.gds` back

Should this use container functions?
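
The four steps above can be written down as a plan of commands without running anything. This is a sketch under assumptions: `plan_remote_build`, the `remote_jobs` directory, and the `output.gds` default are hypothetical choices, and the real `lyipc-job` may work differently.

```python
import os


def plan_remote_build(script, host, remote_dir='remote_jobs', output='output.gds'):
    """Return the commands of a remote build as argument lists.
    Nothing is executed; each list could be passed to subprocess.check_call."""
    script_name = os.path.basename(script)
    return [
        # laptop -> HPC: send the script
        ['rsync', '-avzh', script, host + ':' + remote_dir + '/'],
        # HPC: build inside remote_dir so the output lands next to the script
        ['ssh', '-qt', host, 'cd ' + remote_dir + ' && python ' + script_name],
        # HPC -> laptop: fetch the result into the current directory
        ['rsync', '-avzh', host + ':' + remote_dir + '/' + output, '.'],
    ]


if __name__ == '__main__':
    for command in plan_remote_build('jobs/script.py', 'atait@tait-computer'):
        print(' '.join(command))
```

In this sketch every command is initiated from the laptop, so only laptop-to-HPC ssh access is needed.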

#### file transfer and IPC and lytest (done)
Set some configuration of lytest, which sets some configuration of lyipc. Run `lytest diff file1.gds file2.gds`. These files are shipped to remote. XOR is run there. Error is detected and sent back to the klayout GUI of the first computer. This will involve actual file transfer.

Edit: this did not set anything in lytest. It was a matter of calling lyipc's `set_target_hostname`, with the HPC using `ship_file` to get things back down.

Notes: To send a file for testing, to call commands and get printouts, or to rsync (in either direction), you need a one-way RSA authorization. If you want to run remote tests that pop up in the local GUI, that currently *requires a two-way RSA authorization*. While the HPC is running, its lytest has the ball (kind of): it decides when to send a pair of files to lyipc. Then lyipc notices that it has to ship those files remotely, which requires rsync. Huh, what if the QTcpSocket in lyipc could send a notice back down that said: rsync this thing from me; it is ready.
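
The "notice back down" idea could look roughly like the sketch below. It is hypothetical throughout: the `fetch` message, `make_notice`, and `handle_notice` are invented for illustration and are not part of lyipc. The point is that the laptop initiates the rsync, so only the one-way authorization is needed.

```python
import os


def make_notice(remote_path):
    """HPC side: announce that a file is ready to be pulled."""
    return 'fetch ' + remote_path


def handle_notice(notice, host, local_dir='.'):
    """Laptop side: turn a notice into an rsync pull command.
    Returns (command, local_file); the command is not executed here."""
    verb, remote_path = notice.split(' ', 1)
    if verb != 'fetch':
        raise ValueError('unknown notice: ' + notice)
    local_file = os.path.join(local_dir, os.path.basename(remote_path))
    command = ['rsync', '-avzh', host + ':' + remote_path, local_file]
    return command, local_file


if __name__ == '__main__':
    notice = make_notice('/home/atait/run_layouts/Box.gds')
    print(handle_notice(notice, 'atait@tait-computer', local_dir='run_layouts'))
```

After the pull completes, the laptop could load `local_file` into its own GUI, which is what the two-way setup currently achieves by pushing from the HPC.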


#### script building
Not yet; for now we just need to sync the ref_layouts.
The idea is that lytest sends not only a ref_layout but also a script. The scripted layout is built remotely, and the XOR is done remotely.


#### tiling
To take full advantage, we eventually want to distribute tiles over cores. At first, we will get good results from xdist alone... when it comes to testing, but not dataprep.
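
Distributing tiles over cores can be sketched generically as follows. This does not use klayout's tiling API. `make_tiles` and `tile_area` are hypothetical names, and the area computation is only a placeholder for real per-tile work such as an XOR.

```python
from concurrent.futures import ProcessPoolExecutor


def make_tiles(bbox, tile_size):
    """Split bbox = (x0, y0, x1, y1) into tiles at most tile_size on a side."""
    x0, y0, x1, y1 = bbox
    tiles = []
    y = y0
    while y < y1:
        x = x0
        while x < x1:
            tiles.append((x, y, min(x + tile_size, x1), min(y + tile_size, y1)))
            x += tile_size
        y += tile_size
    return tiles


def tile_area(tile):
    """Placeholder for per-tile work restricted to the tile's rectangle."""
    x0, y0, x1, y1 = tile
    return (x1 - x0) * (y1 - y0)


if __name__ == '__main__':
    tiles = make_tiles((0, 0, 250, 100), tile_size=100)
    with ProcessPoolExecutor() as pool:  # defaults to one worker per processor
        areas = list(pool.map(tile_area, tiles))
    print(len(tiles), sum(areas))
```

Tiles on the right and top edges are clipped to the bounding box, so the tile areas always sum to the area of the box.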

Remember to `pip install pytest-xdist`. The error message you get without it is not helpful.
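
One way to get a clearer message is to check for the plugin before launching pytest. This is a sketch, and `pytest_args` is a hypothetical helper. It relies only on pytest-xdist being importable as `xdist` and on its `-n auto` option.

```python
import importlib.util


def pytest_args(test_dir, parallel=True):
    """Build the pytest argument list, adding xdist's -n option when requested."""
    args = [test_dir]
    if parallel:
        if importlib.util.find_spec('xdist') is None:
            raise RuntimeError('pytest-xdist is not installed. '
                               'Run: pip install pytest-xdist')
        args += ['-n', 'auto']  # one worker per available CPU
    return args
```

The returned list can be handed to `pytest.main(...)` or appended to a `pytest` command line.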

#### Author: Alex Tait, June 2018
#### National Institute of Standards and Technology, Boulder, CO, USA
3 changes: 2 additions & 1 deletion examples/.gitignore
@@ -1 +1,2 @@
*.gds
*.gds
remote_jobs
12 changes: 7 additions & 5 deletions lyipc/client/__init__.py
@@ -8,7 +8,9 @@

import lyipc
from lygadgets import isGSI

import socket
import os
from lyipc.client.remotehost import set_target_hostname, get_target_hostname

# Determine which socket class to use
if not isGSI():
@@ -19,11 +21,11 @@
    elif sys.version_info[0] == 3:
        import PyQt5.QtNetwork
        from PyQt5.QtNetwork import QTcpSocket
        localhost = PyQt5.QtNetwork.QHostAddress.LocalHost
        # localhost = '127.0.0.1'
else:
    from lygadgets import pya
    from pya import QTcpSocket
    localhost = 'localhost'
    # localhost = 'localhost'


def send(message='ping 1234', port=lyipc.PORT):
@@ -35,7 +37,7 @@ def send(message='ping 1234', port=lyipc.PORT):
    psock = QTcpSocket()
    if not isGSI():
        payload = payload.encode()
    psock.connectToHost(localhost, port)
    psock.connectToHost(get_target_hostname(incl_user=False), port)
    if psock.waitForConnected():
        psock.write(payload)
        if psock.waitForReadyRead(3000):
@@ -46,7 +48,7 @@ def send(message='ping 1234', port=lyipc.PORT):
        else:
            raise Exception('Not acknowledged')
    else:
        print('Connection Fail! (tried {}:{})'.format(localhost, port))
        print('Connection Fail! (tried {}:{})'.format(get_target_hostname(incl_user=False), port))
    # psock.close()


4 changes: 4 additions & 0 deletions lyipc/client/general.py
@@ -1,6 +1,7 @@
''' These are functions that are independent of the layout package '''
from __future__ import print_function
from lyipc.client import send
from lyipc.client.remotehost import is_host_remote, ship_file
from functools import lru_cache
import os
import time
@@ -19,7 +20,10 @@ def load(filename, mode=None):
        - 1: making a new view
        - 2: adding the layout to the current view (mode 2)
    '''
    # if it's remote, we have to ship the file over first
    filename = fast_realpath(filename)
    if is_host_remote():
        filename = ship_file(filename)  # filename has changed to reflect what it is on the remote system
    tokens = ['load', filename]
    if mode is not None:
        tokens.append(str(mode))
91 changes: 91 additions & 0 deletions lyipc/client/remotehost.py
@@ -0,0 +1,91 @@
import os
import shutil
import socket
import subprocess

target_host = None
def set_target_hostname(hostalias, persist=False):
    ''' If the target is remote, you must have already set up an RSA key and an alias in your ~/.ssh/config file.
        On that computer, this computer's RSA key needs to be in ~/.ssh/authorized_keys.
        Instructions: https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2

        Example: atait@tait-computer

        persist=True stores the target in the environment variable LYIPCTARGET.
            Note that os.environ changes are seen by this process and its children, not by the parent shell.
        persist=False stores it for this python session only. It doesn't last as long. It takes precedence.
    '''
    global target_host
    if hostalias == 'localhost':
        hostalias = socket.gethostbyname(hostalias)
    if persist:
        # store the requested alias (target_host may still be None at this point)
        os.environ['LYIPCTARGET'] = hostalias
    else:
        target_host = hostalias


def get_target_hostname(incl_user=True):
    if target_host is not None:
        host = target_host
    else:
        try:
            host = os.environ['LYIPCTARGET']
        except KeyError:
            return socket.gethostbyname('localhost')
    if not incl_user:
        host = host.split('@')[-1]
    return host


# set_target_hostname('localhost')


def is_host_remote():
    return get_target_hostname() != socket.gethostbyname('localhost')


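def call_report(command, verbose=True):
    # report prints when verbose is set and is a no-op otherwise
    # (rebinding the name print inside the function would make it an unbound local)
    report = print if verbose else (lambda *args: None)
    report('\n[[' + ' '.join(command) + ']]\n')
    try:
        ret = subprocess.check_output(command).decode()
    except subprocess.CalledProcessError as err:
        report(err.output.decode())
        raise
    else:
        report(ret)
    return ret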


def call_ssh(command):
    # command[0] = '"' + command[0]
    # command[-1] = command[-1] + '"'
    ssh_command = ['ssh', '-qt', get_target_hostname()]  # q silences welcome banner, t retains text coloring
    ssh_command += command
    return call_report(ssh_command)


def host_HOME():
    return call_ssh(['echo', '$HOME']).strip()


def rsync(source, dest, verbose=True):
    rsync_call = ['rsync', '-avzh']
    rsync_call += [source, dest]
    call_report(rsync_call, verbose=verbose)


def ship_file(local_file):
    ''' returns the name of the remote file
        This currently assumes that the host has the same operating system separator as this one (e.g. "/")
    '''
    if not is_host_remote():
        return local_file
    # where are we going to put it
    local_file = os.path.realpath(local_file)
    # rel_filepath = os.sep.join(local_file.split(os.sep)[-3:-1])  # pick off a few directories to avoid name clashes
    rel_filepath = ''
    remote_path = os.path.join('tmp_lypic', rel_filepath)
    remote_file = os.path.join(remote_path, os.path.basename(local_file))
    rsync(local_file, get_target_hostname() + ':' + remote_path)
    return os.path.join(host_HOME(), remote_file)
5 changes: 4 additions & 1 deletion lyipc/interpreter.py
@@ -6,7 +6,7 @@
- execute a macro
'''
from __future__ import print_function
from lygadgets import message, isGSI
from lygadgets import message, message_loud, isGSI
import lyipc.server
import os
import traceback
@@ -55,6 +55,9 @@ def parse_message(msg):
        main = pya.Application.instance().main_window()
        main.cm_reload()

    elif tokens[0] == 'ping':
        message_loud('I heard something')

    elif tokens[0] == 'load':
        filename = os.path.realpath(tokens[1])
        message(filename)