# Run ATS Demo 01 on NERSC (Cori)

## Before running this notebook

You must first **use the nersc_login.ipynb notebook to log in to NERSC**, which saves a "NEWT session id" to a file in your home directory (`~/.newt_sessionid.txt`). This notebook reads that file. The session id typically expires after 12-14 days, at which point you need to log in again.
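Because the session id expires, it can help to check how old the saved file is before running the cells below. The following is a minimal sketch, not part of `reshpc`: the helper names and the 12-day threshold are assumptions made for illustration, and the file's modification time is only a rough proxy for when the session was created.

```python
import os
import time

# Path taken from the text above.
SESSION_FILE = os.path.expanduser('~/.newt_sessionid.txt')
# Assumption: treat anything older than 12 days as possibly expired.
MAX_AGE_DAYS = 12


def session_file_age_days(path=SESSION_FILE, now=None):
    """Return the age of the session file in days, or None if it is missing."""
    if not os.path.exists(path):
        return None
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 86400.0


def session_probably_valid(path=SESSION_FILE, max_age_days=MAX_AGE_DAYS, now=None):
    """Heuristic only: True if the file exists and is younger than max_age_days."""
    age = session_file_age_days(path, now)
    return age is not None and age < max_age_days
```

If `session_probably_valid()` returns `False`, rerunning nersc_login.ipynb is the safe choice. A `True` result does not guarantee that the server still accepts the session.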

In [1]:
import os

# Change working dir so that we can import local module
working_dir = os.path.join(os.getcwd(), os.pardir)
os.chdir(working_dir)
print('Working directory {}'.format(working_dir))
from reshpc import nersc_interface as rn


Working directory /home/john/projects/resonant-hpc/git/resonantrpc/dev/jupyter/notebooks/..


In [2]:
nersc = rn.NerscInterface()
nersc.login()

sending command...


'OK'

In [3]:
# Generate new folder name in the form yymmdd-hhmm
import datetime
now = datetime.datetime.now()
datetime_code = now.strftime('%y%m%d-%H%M')
datetime_code

'201023-1055'

In [4]:
# Generate path on Cori $SCRATCH folder
scratch_folder = nersc.get_scratch_folder()
nersc_folder = '{}/reshpc/demo01/{}'.format(scratch_folder, datetime_code)
nersc_folder

sending command...


'/global/cscratch1/sd/johnt/reshpc/demo01/201023-1055'

In [5]:
nersc.make_folder(nersc_folder)

sending command...


'OK'

In [6]:
# Upload demo 01 baseline file
xml_file = 'demo_01.xml'
local_file = os.path.join(working_dir, 'data', xml_file)
assert os.path.exists(local_file)

nersc.upload_file(local_file, nersc_folder)

sending command...


'OK'

In [7]:
# Specify SLURM script
ats = '/project/projectdirs/m2398/ideas/ats/install/cori/ats-0.88-basic/RelWithDebInfo/PrgEnv-gnu-6.0.5/bin/ats'
timeout_min = 5
nodes = 1
cores_per_node = 1

slurm_commands = [
    '#!/bin/bash',
    '#SBATCH --account=m2398',
    '#SBATCH --chdir={}'.format(nersc_folder),
    '#SBATCH --partition=debug',
    '#SBATCH --time=0:{}:0'.format(timeout_min),
    '#SBATCH --nodes={}'.format(nodes),
    '#SBATCH --tasks-per-node={}'.format(cores_per_node),
    '#SBATCH --constraint=haswell',
    'ulimit -s unlimited',
    'srun {} --xml_file={}'.format(ats, xml_file),
    ''
]
slurm_string = '\n'.join(slurm_commands)
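The script assembly above can also be wrapped in a small function, which makes it easier to reuse for other demos and to handle time limits of an hour or more. This is a sketch under stated assumptions: `build_slurm_script` and its parameter names are hypothetical and not part of `reshpc`, and the `#SBATCH` options are the same ones used in the cell above.

```python
def build_slurm_script(executable, xml_file, workdir, account,
                       minutes=5, nodes=1, tasks_per_node=1,
                       partition='debug', constraint='haswell'):
    """Return a SLURM batch script as a single newline-terminated string."""
    if minutes <= 0:
        raise ValueError('minutes must be positive')
    # Express the time limit as HH:MM:SS so that values >= 60 stay valid.
    hours, mins = divmod(int(minutes), 60)
    lines = [
        '#!/bin/bash',
        '#SBATCH --account={}'.format(account),
        '#SBATCH --chdir={}'.format(workdir),
        '#SBATCH --partition={}'.format(partition),
        '#SBATCH --time={:02d}:{:02d}:00'.format(hours, mins),
        '#SBATCH --nodes={}'.format(nodes),
        '#SBATCH --tasks-per-node={}'.format(tasks_per_node),
        '#SBATCH --constraint={}'.format(constraint),
        'ulimit -s unlimited',
        'srun {} --xml_file={}'.format(executable, xml_file),
    ]
    return '\n'.join(lines) + '\n'
```

With the variables defined in this notebook, a call such as `build_slurm_script(ats, xml_file, nersc_folder, 'm2398', minutes=timeout_min)` should produce a script equivalent to `slurm_string`, apart from the zero-padded time field.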

In [8]:
# Submit job
result = nersc.submit_job(slurm_string, nersc_folder)
print(result)

uploading slurm script...
submitting job...
{'status': 'OK', 'jobid': '35432279', 'error': ''}


## Check job status

If the job was submitted successfully, use the next cell to monitor its status. Rerun that cell until it reports that the job has finished.


If the job runs successfully, the job state typically passes through these values:

| Job State Code | Description|
|--------|------------|
| PD | pending |
| R  | running |
| CG | completing |
| CD | completed |

If problems occur, the most relevant job state codes are:

| Job State Code | Description|
|--------|------------|
| F   | failed |
| NF  | node failed |
| OOM | out of memory |
| TO  | timeout |

All of the SLURM job state codes are listed at https://slurm.schedmd.com/squeue.html#lbAG
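The state codes in the two tables above can be grouped so that a script can tell whether a job is still in progress, succeeded, or failed. The sketch below is illustrative only: the set names and `classify_job_state` are hypothetical, and the failure set covers just the codes discussed here (with `NF` as the squeue short code for NODE_FAIL), not every SLURM state.

```python
# Codes for a job that is still queued, running, or finishing up.
ACTIVE_STATES = {'PD', 'R', 'CG'}
# Code for a job that finished normally.
SUCCESS_STATES = {'CD'}
# The failure codes highlighted above; SLURM defines others as well.
FAILURE_STATES = {'F', 'NF', 'OOM', 'TO'}


def classify_job_state(code):
    """Map a SLURM short state code to 'active', 'success', 'failure' or 'unknown'."""
    if code in ACTIVE_STATES:
        return 'active'
    if code in SUCCESS_STATES:
        return 'success'
    if code in FAILURE_STATES:
        return 'failure'
    # Anything else (for example cancelled or preempted jobs) needs a manual look.
    return 'unknown'
```

Returning `'unknown'` instead of guessing keeps states such as cancelled or preempted jobs from being silently counted as success or failure.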


In [10]:
# Rerun this cell until the job has finished, which generally
# means the job state is NOT one of "PD", "R", or "CG".

job_id = result.get('jobid')
if job_id:
    info = nersc.get_job_info(job_id, verbose=True)
    # print(info)
    state = info.get('status')
    print('Job State: {}'.format(state))
    if state in ['PD', 'R', 'CG']:
        print('NOT done => continue checking')

sending command...
Job State: CD
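Instead of rerunning the cell by hand, the same check can be placed in a polling loop. This is a minimal sketch: `wait_for_job` is a hypothetical helper, not part of `reshpc`, and the default polling interval and limit are arbitrary assumptions. It takes any zero-argument callable that returns the current state code, so it does not depend on a particular API.

```python
import time


def wait_for_job(get_state, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_state() until the job leaves the PD/R/CG states.

    Returns the first state code that is not pending, running, or completing.
    Raises TimeoutError if the job is still active after max_polls checks.
    """
    for _ in range(max_polls):
        state = get_state()
        if state not in ('PD', 'R', 'CG'):
            return state
        sleep(poll_seconds)
    raise TimeoutError('job still active after {} polls'.format(max_polls))
```

In this notebook the callable could be `lambda: nersc.get_job_info(job_id, verbose=True).get('status')`, which mirrors the calls in the cell above. The `sleep` parameter exists so that the wait can be replaced with a no-op when testing.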


In [11]:
# Once the job is done, list the contents of the NERSC folder
# nersc_folder = '/global/cscratch1/sd/johnt/reshpc/201021-114130'
for item in nersc.list_folder(nersc_folder, glob_pattern='*_mesh.*'):
    name = item.get('name')
    size = int(item.get('size', 0))
    row = '{:8}  {}'.format(size, name)
    print(row)

sending command...
     255  visdump_mesh.VisIt.xmf
   20496  visdump_mesh.h5
     596  visdump_mesh.h5.0.xmf


In [13]:
# Download the visdump files to a local folder.
# With the current API, we explicitly generate a tgz file on Cori, then download and untar it.
import shutil

tarfile_name = 'data.tgz'
nersc_tarfile = '{}/reshpc/{}'.format(scratch_folder, tarfile_name)
nersc.make_tgzfile(nersc_folder, glob_pattern='*_mesh.*', tarfile=nersc_tarfile)

# Create local data folder and download tar file
local_folder = os.path.expanduser('~/.reshpc/data/{}'.format(datetime_code))
if os.path.exists(local_folder):
    shutil.rmtree(local_folder)
os.makedirs(local_folder)
nersc.download_file(nersc_tarfile, local_folder)


sending command...
sending command...


'OK'

In [16]:
# Expand tarfile
import tarfile

# tarfile_name = 'data.tgz'
local_tarfile = os.path.join(local_folder, tarfile_name)
# The context manager closes the archive even if extraction fails
with tarfile.open(local_tarfile) as tar:
    tar.extractall(local_folder)

# Optional
# os.remove(local_tarfile)
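`extractall` writes whatever paths the archive contains, so a defensive variant can verify that every member stays inside the destination folder before extracting. The sketch below is an illustration, not part of `reshpc`: `safe_extract` is a hypothetical helper that checks member names only and does not inspect symbolic or hard links inside the archive.

```python
import os
import tarfile


def safe_extract(tar_path, dest_dir):
    """Extract tar_path into dest_dir, refusing members that would land outside it.

    Returns the list of member names in the archive.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as tar:
        members = tar.getmembers()
        for member in members:
            # Resolve where this member would be written.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError('unsafe path in archive: {}'.format(member.name))
        # Only extract once every member has passed the check.
        tar.extractall(dest_root)
    return [member.name for member in members]
```

Here the call would be `safe_extract(local_tarfile, local_folder)`. For an archive you created yourself on Cori the plain `extractall` call is usually adequate, so treat this as an optional precaution.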


In [17]:
# List downloaded files
for name in sorted(os.listdir(local_folder)):
    print(name)


data.tgz
visdump_mesh.VisIt.xmf
visdump_mesh.h5
visdump_mesh.h5.0.xmf
