diff --git a/content/docs/batch-computing/lotus-cluster-specification.md b/content/docs/batch-computing/lotus-cluster-specification.md index 433c2d3bf..ac5f9264f 100644 --- a/content/docs/batch-computing/lotus-cluster-specification.md +++ b/content/docs/batch-computing/lotus-cluster-specification.md @@ -1,6 +1,5 @@ --- aliases: /article/4932-lotus-cluster-specification -date: 2023-03-13 13:50:34 description: LOTUS cluster specification slug: lotus-cluster-specification tags: @@ -9,7 +8,7 @@ tags: title: LOTUS cluster specification --- -## LOTUS nodes +## Current cluster specification LOTUS is a cluster of over 300 nodes/hosts and 19000 CPU cores. A node/host is an individual computer in the cluster with more than 1 processor. Each @@ -17,8 +16,21 @@ node/host belongs to a specific host group. The number of processors (CPUs or cores) per host is listed in Table 1 with the corresponding processor model and the size of the physical memory RAM available per node/host. +**Table 1**. LOTUS cluster specification + +**Current** host groups + +Host group name | Number of nodes/hosts | Processor model | CPUs per host | RAM +---|---|---|---|--- +broadwell256G | 37 | Intel Xeon E5-2640-v4 "Broadwell" | 20 | 256 GB +skylake348G | 151 | Intel Xeon Gold-5118 "Skylake" | 24 | 348 GB +epyctwo1024G | 200 | AMD | 48 | 1024 GB | +{.table .table-striped} + +## Selection of specific processor model + To select a node/host with a specific processor model and memory, add the -following SLURM directive to your job script +following Slurm directive to your job script ```bash #SBATCH --constraint="" @@ -31,27 +43,24 @@ For example ``` {{< alert type="info" >}} -`intel` and `amd` node types are defined in the SLURM configuration as a feature: - - * For any Intel node type use `#SBATCH --constraint="intel"` - * For a specific Intel CPU model use the host group name (see Table 1) - * e.g. 
`#SBATCH --constraint="skylake348G"`
- * For AMD use ` #SBATCH --constraint="amd"`
+Further notes
+
+`intel` and `amd` node types are defined in the Slurm configuration as a feature:
+- For any Intel node type use `#SBATCH --constraint="intel"`
+- For a specific Intel CPU model use the host group name (see Table 1)
+  - e.g. `#SBATCH --constraint="skylake348G"`
+- For AMD use `#SBATCH --constraint="amd"`
+- There are 10 nodes of node type `skylake348G` with SSD disk mounted on /tmp
+- LOTUS nodes of node type `epyctwo1024G` are not available yet on the `par-multi` queue
 {{< /alert >}}
+{{< alert type="danger" >}}
+If you choose to compile code for specific architectures, do not expect it to run elsewhere in the system.
+{{< /alert >}}
 
-**Table 1**. LOTUS cluster specification
-
-**Current** host groups
-
-Host group name | Number of nodes/hosts | Processor model | CPUs per host | RAM
----|---|---|---|---
-broadwell256G | 37 | Intel Xeon E5-2640-v4 "Broadwell" | 20 | 256 GB
-skylake348G | 151 | Intel Xeon Gold-5118 "Skylake" | 24 | 348 GB
-epyctwo1024G | 200 | AMD | 48 | 1024 GB |
-{.table .table-striped}
+## Retired host groups no longer in use
 
-**Retired** host groups: no longer in use
+(For reference only)
 
 Host group name | Number of nodes/hosts | Processor model | CPUs per host | RAM
 ---|---|---|---|---
@@ -59,11 +68,3 @@ Host group name | Number of nodes/hosts | Processor model | CPUs per host |
 ~~ivybridge2000G~~ | ~~3~~ -retired | ~~Intel Xeon E7-4860-v2 "Ivy Bridge"~~ | ~~48~~ | ~~2048 GB~~
 {.table .table-striped}
 
-**Notes**
-
- * There are 10 nodes of node type `skylake348G` with SSD disk mounted on /tmp
- * LOTUS nodes of node type `epyctwo1024` are not available yet on the `par-multi` queue
-
-{{< alert type="danger" >}}
-If you choose to compile code for specific architectures, do not expect it to run elsewhere in the system.
-{{< /alert >}} diff --git a/content/docs/batch-computing/orchid-gpu-cluster.md b/content/docs/batch-computing/orchid-gpu-cluster.md index 09553d954..56320b7c6 100644 --- a/content/docs/batch-computing/orchid-gpu-cluster.md +++ b/content/docs/batch-computing/orchid-gpu-cluster.md @@ -9,23 +9,31 @@ type: docs This article provides details on JASMIN's GPU cluster, named **Orchid**. +## GPU cluster spec + +The JASMIN GPU cluster is composed of 16 GPU nodes: + +- 14 x standard GPU nodes with 4 GPU Nvidia A100 GPU cards each +- 2 x large GPU nodes with 8 Nvidia A100 GPU cards + +{{< image src="img/docs/gpu-cluster-orchid/file-NZmhCFPJx9.png" caption="ORCHID GPU cluster" >}} + ## Request access to Orchid -Access to the GPU cluster is controlled by being a member of the Slurm account -`orchid`. You can request access to this via the link below which will -direct you to the ORCHID service page on the JASMIN accounts portal: +Access to the GPU cluster (and a GPU interactive node) is controlled by being a member of the Slurm account +`orchid`. Please request access via the link below which will +take you to the ORCHID service page on the JASMIN accounts portal: -https://accounts.jasmin.ac.uk/services/additional_services/orchid/ +{{}} **Note:** In the supporting info on the request form, please provide details -on the software and the workflow that you will use/run on the GPU cluster (or -the interactive GPU node) +on the software and the workflow that you will use/run on ORCHID. ## Test a GPU job Testing a job on the JASMIN Orchid GPU cluster can be carried out in an -interactive mode by launching a pseudo-shell terminal SLURM job from a JASMIN -scientific server e.g. sci2: +interactive mode by launching a pseudo-shell terminal Slurm job from a JASMIN +scientific server e.g. 
`sci2`: {{}} srun --gres=gpu:1 --partition=orchid --account=orchid --pty /bin/bash @@ -35,11 +43,10 @@ srun --gres=gpu:1 --partition=orchid --account=orchid --pty /bin/bash {{}} ## you are now on gpuhost16 {{}} - The GPU node gpuhost016 is allocated for this interactive session on LOTUS -Note that for batch mode, a GPU job is submitted using the SLURM command +Note that for batch mode, a GPU job is submitted using the Slurm command 'sbatch': {{}} @@ -53,27 +60,26 @@ or by adding the following preamble in the job script file #SBATCH --gres=gpu:1 ``` -Note 1: `gpuhost015 `and `gpuhost016`are the two largest nodes with 64 CPUs and +Note 1: `gpuhost015` and `gpuhost016` are the two largest nodes with 64 CPUs and 8 GPUs. Note 2: **CUDA Version: 11.6** -Note 3: The SLURM batch queue 'orchid' has a maximum runtime of 24 hours and +Note 3: The Slurm batch partition/queue `orchid` has a maximum runtime of 24 hours and the default runtime is 1 hour. The maximum number of CPU cores per user is limited to 8 cores. If the limit is exceeded then the job is expected to be in -a pending state with the reason being **QOSGrpCpuLimit** +a pending state with the reason being {{}}QOSGrpCpuLimit{{}} ## GPU interactive node -There is also an interactive GPU node `gpuhost001.jc.rl.ac.uk` (same spec as -Orchid) that you can ssh into it from the JASMIN login server to prototype and -test your GPU code prior to using the batch GPU cluster Orchid +There is an interactive GPU node `gpuhost001.jc.rl.ac.uk`, with the same spec as +other Orchid nodes, that you can access via a login server to prototype and +test your GPU code prior to running as a batch job. - -{{}} +{{}} ssh -A gpuhost001.jc.rl.ac.uk {{}} -{{}} +{{}} ## you are now on gpuhost001 {{}} @@ -86,13 +92,3 @@ ssh -A gpuhost001.jc.rl.ac.uk - Singularity 3.7.0 - which supports NVIDIA/GPU containers - SCL Python 3.6 -The SLURM queue is `orchid` with maximum runtime of 24 hours and -default runtime 1 hour. 
-
-## GPU cluster spec
-
-The JASMIN GPU cluster is composed of 16 GPU nodes:
-- 14 x standard GPU nodes with 4 GPU Nvidia A100 GPU cards each
-- 2 x large GPU nodes with 8 Nvidia A100 GPU cards
-
-{{< image src="img/docs/gpu-cluster-orchid/file-NZmhCFPJx9.png" caption="ORCHID GPU cluster" >}}
diff --git a/content/docs/batch-computing/slurm-queues.md b/content/docs/batch-computing/slurm-queues.md
index 5db160016..edde458eb 100644
--- a/content/docs/batch-computing/slurm-queues.md
+++ b/content/docs/batch-computing/slurm-queues.md
@@ -18,15 +18,15 @@ submissions to the LOTUS and ORCHID clusters.
 
 The Slurm queues in the LOTUS cluster are:
 
- * `test`
- * `short-serial`
- * `long-serial`
- * `par-single`
- * `par-multi`
- * `high-mem`
- * `short-serial-4hr` (see Note 3)
-
-Each queue has an attribute of run-length limits (e.g. short, long) and
+- `test`
+- `short-serial`
+- `long-serial`
+- `par-single`
+- `par-multi`
+- `high-mem`
+- `short-serial-4hr` (see Note 3)
+
+Each queue has attributes of run-length limits (e.g. short, long) and
 resources. A full breakdown of each queue and its associated resources is
 shown below in Table 1.
 
@@ -34,18 +34,21 @@ shown below in Table 1.
 
 Queues represent a set of pending jobs, lined up in a defined order, and
 waiting for their opportunity to use resources. The queue is specified in the
-job script file using SLURM scheduler directive `#SBATCH -p ` where `` is the name of the
-queue/partition (Table 1. column 1)
+job script file using a Slurm scheduler directive like this:
+
+```bash
+#SBATCH -p <partition name>
+```
+
+where `<partition name>` is the name of the queue/partition (Table 1, column 1).
 
 Table 1 summarises important specifications for each queue such as run time
-limits and the number of CPU core limits. If the queue is not specified, SLURM
+limits and the number of CPU core limits. If the queue is not specified, Slurm
 will schedule the job to the queue `short-serial` by default.
 
 Table 1. LOTUS/Slurm queues and their specifications
 
-Queue name | Max run time | Default run time | Max CPU cores
-per job | Max CpuPer
-UserLimit | Priority
+Queue name | Max run time | Default run time | Max CPU cores per job | MaxCpuPerUserLimit | Priority
 ---|---|---|---|---|---
 `test` | 4 hrs | 1hr | 8 | 8 | 30
 `short-serial` | 24 hrs | 1hr | 1 | 2000 | 30
@@ -53,25 +56,24 @@ UserLimit | Priority
 `par-multi` | 48 hrs | 1hr | 256 | 300 | 20
 `long-serial` | 168 hrs | 1hr | 1 | 300 | 10
 `high-mem` | 48 hrs | 1hr | 1 | 75 | 30
-`short-serial-4hr` ( **Note 3** ) | 4 hrs | 1hr | 1 | 1000 | 30
+`short-serial-4hr` (**Note 3**) | 4 hrs | 1hr | 1 | 1000 | 30
 {.table .table-striped}
 
-**Note 1** : Resources that the job requests must be within the resource
+**Note 1**: Resources requested by a job must be within the resource
 allocation limits of the selected queue.
 
 **Note 2:** The default value for `--time=[hh:mm:ss]` (predicted maximum wall
-time) is 1 hour for the six SLURM queues. If you do not specify this option
+time) is 1 hour for all queues. If you do not specify this option
 and/or your job exceeds the default maximum run time limit then it will be
-terminated by the SLURM scheduler.
+terminated by the Slurm scheduler.
 
-**Note 3** : A user must specify the SLURM job account `--account=short4hr`
-when submitting a batch job to the provisional SLURM partition `short-
-serial-4hr`
+**Note 3**: A user must specify the Slurm job account `--account=short4hr`
+when submitting a batch job to the `short-serial-4hr` queue.
 
 ## State of queues
 
-The Slurm command `sinfo `reports the state of queues/partitions and nodes
-managed by SLURM. It has a wide variety of filtering, sorting, and formatting
+The Slurm command `sinfo` reports the state of queues and nodes
+managed by Slurm. It has a wide variety of filtering, sorting, and formatting
 options.
 
 {{}}
@@ -97,14 +99,14 @@ as they implement different job scheduling and control policies.
 
 ## 'sinfo' Output field description:
 
-By default, the SLURM command 'sinfo' displays the following information:
+By default, the Slurm command 'sinfo' displays the following information:
 
- * **PARTITION** : Partition name followed by "*" for the default queue/partition
- * **AVAIL** : State/availability of a queue/partition. Partition state: up or down.
- * **TIMELIMIT** : The maximum run time limit per job in each queue/partition is shown in TIMELIMIT in days- hours:minutes :seconds . e.g. 2-00:00:00 is two days maximum runtime limit
- * **NODES** : Count of nodes with this particular configuration e.g. 48 nodes
- * **STATE** : State of the nodes. Possible states include: allocated, down, drained, and idle. For example, the state "idle" means that the node is not allocated to any jobs and is available for use.
- * **NODELIST** List of node names associated with this queue/partition
+- **PARTITION**: Partition name followed by **\*** for the default queue/partition
+- **AVAIL**: State/availability of a queue/partition. Partition state: up or down.
+- **TIMELIMIT**: The maximum run time limit per job in each queue/partition, shown as days-hours:minutes:seconds, e.g. 2-00:00:00 is a two-day maximum runtime limit
+- **NODES**: Count of nodes with this particular configuration, e.g. 48 nodes
+- **STATE**: State of the nodes. Possible states include: allocated, down, drained, and idle. For example, the state "idle" means that the node is not allocated to any jobs and is available for use.
+- **NODELIST**: List of node names associated with this queue/partition
 
 The `sinfo` example below reports more complete information about the
 partition/queue short-serial
 
 {{}}
@@ -116,12 +118,12 @@ sinfo --long --partition=short-serial
 (out)short-serial* up 1-00:00:00 1-infinite no NO all 48 idle host[146-193]
 {{}}
 
-## How to choose a SLURM queue/partition
+## How to choose a Slurm queue/partition
 
 ### Test queue
 
-The test queue `test` can be used to test new workflows and also to help new
-users to familiarise themselves with the SLURM batch system. Both serial and
+The `test` queue can be used to test new workflows and also to help new
+users to familiarise themselves with the Slurm batch system. Both serial and
 parallel code can be tested on the `test` queue. The maximum runtime is 4 hrs
 and the maximum number of jobs per user is 8 job slots. The maximum number of
 cores for a parallel job e.g. MPI, OpenMP, or multi-threads is limited to 8
@@ -171,7 +173,7 @@ submitted to the `par-single` queue . Each thread should be allocated one
 CPU core. Oversubscribing the number of threads to the CPU cores will cause
 the job to run very slowly. The number of CPU cores should be specified via the
 submission command line `sbatch -n <number of cores>` or by adding the
-SLURM directive `#SBATCH -n `in the job script file. An
+Slurm directive `#SBATCH -n <number of cores>` in the job script file. An
 example is shown below:
 
 {{}}
 sbatch --ntasks=4 --partition=par-single < myjobscript
 {{}}
 
 Note: Jobs submitted with a number of CPU cores greater than 16 will be
-terminated (killed) by the SLURM scheduler with the following statement in the
+terminated (killed) by the Slurm scheduler with the following statement in the
 job output file:
 
@@ -187,16 +189,16 @@ job output file:
 
 #### par-multi
 
 Distributed memory jobs with inter-node communication using the MPI library
 should be submitted to the `par-multi` queue . A single MPI process (rank)
 should be allocated a single CPU core. The number of CPU cores should be
-specified via the SLURM submission command flag `sbatch -n ` or by adding the SLURM directive `#SBATCH -n `
+specified via the Slurm submission command flag `sbatch -n <number of cores>` or by adding the Slurm directive `#SBATCH -n <number of cores>`
 to the job script file. An example is shown below:
 
 {{}}
 sbatch --ntasks=4 --partition=par-multi < myjobscript
 {{}}
 
-Note 1: The number of CPU cores gets passed from SLURM submission flag `-n` .
+Note 1: The number of CPU cores is passed via the Slurm submission flag `-n`.
 Do not add the `-np` flag to the `mpirun` command.
 
-Note 2: SLURM will reject a job that requires a number of CPU cores greater
+Note 2: Slurm will reject a job that requires a number of CPU cores greater
 than the limit of 256.
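The partition and core-count directives described above can be combined into a complete job script. A minimal sketch (the partition, task count, and program name `./my_mpi_program` are illustrative placeholders, not recommendations):

```shell
# Write a minimal Slurm job script combining the directives discussed above.
# "./my_mpi_program" is a placeholder for your own executable.
cat > myjobscript.sbatch <<'EOF'
#!/bin/bash
#SBATCH --partition=par-multi
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
mpirun ./my_mpi_program
EOF

# Show the scheduler directives that were written
grep '^#SBATCH' myjobscript.sbatch
```

The script would then be submitted with `sbatch myjobscript.sbatch`; note that on `par-multi` the task count is passed to MPI by Slurm, so no `-np` flag is given to `mpirun`.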
diff --git a/content/docs/batch-computing/slurm-scheduler-overview.md b/content/docs/batch-computing/slurm-scheduler-overview.md
index 5a9f5595f..d9422a4df 100644
--- a/content/docs/batch-computing/slurm-scheduler-overview.md
+++ b/content/docs/batch-computing/slurm-scheduler-overview.md
@@ -5,11 +5,9 @@ title: Slurm scheduler overview
 weight: 20
 ---
 
-This article gives an overview of the Slurm Scheduler.
-
 ## What is a Job Scheduler?
 
-A job scheduler, or "batch" scheduler, is a tool that manages how user jobs
+A job scheduler, or batch scheduler, is a tool that manages how user jobs
 are queued and run on a set of compute resources. In the case of LOTUS the
 compute resources are the set of compute nodes that make up the [LOTUS
 hardware]({{< ref "lotus-cluster-specification" >}}). Each user can submit jobs to the scheduler which then decides which jobs to run and where to
@@ -23,9 +21,9 @@ those resources.
 scheduler deployed on JASMIN. It allows users to submit, monitor, and control
 jobs on the LOTUS cluster.
 
-## General principles for working with SLURM
+## General principles for working with Slurm
 
-Before learning how to use SLURM, it is worthwhile becoming familiar with the
+Before learning how to use Slurm, it is worthwhile becoming familiar with the
 basic principles of scheduler operation in order to get the best use out of
 the LOTUS cluster. Scheduler software exists simply because the amount of jobs
 that users wish to run on a cluster at any given time is usually greatly in
@@ -54,11 +52,11 @@ jobs that are run (bottom row).
 
 ## LOTUS queues
 
-There are five standard SLURM queues for batch job submissions to LOTUS
+There are five standard Slurm queues (also known as "partitions" in Slurm terminology) for batch job submissions to the LOTUS
 cluster: `short-serial`, `long-serial`, `par-single`, `par-multi` and
 `high-mem`. The default queue is `short-serial`. For testing new workflows, the
 additional queue `test` is recommended. The specification of each queue is
-described in detail in this article: [SLURM queues on LOTUS]({{< ref "slurm-queues" >}})
+described in detail in this article: [Slurm queues on LOTUS]({{< ref "slurm-queues" >}})
 
 Queues other than the five standard queues and the test queue should be
 ignored unless you have been specifically instructed to use them.
@@ -86,5 +84,5 @@ The typical workflow for setting up and running LOTUS jobs is as follows:
 
 Occasionally a project has a specific requirement for a collection of compute
 nodes that involve the provision of a project-specific queue. If you are
 working on such a project your project lead will provide guidance on which
-queue to use. Please [contact us](http://www.jasmin.ac.uk/help/contact/) If
+queue to use. Please contact the helpdesk if
 you are interested in setting up a project-specific queue.
diff --git a/content/docs/data-transfer/transfers-from-archer2.md b/content/docs/data-transfer/transfers-from-archer2.md
index 5127109f4..828932bd6 100644
--- a/content/docs/data-transfer/transfers-from-archer2.md
+++ b/content/docs/data-transfer/transfers-from-archer2.md
@@ -9,10 +9,10 @@ title: Transfers from ARCHER2
 
 This article explains how to transfer data between ARCHER2 and JASMIN. It
 covers:
 
- * The choice of available tools / routes
- * Example of how to use the currently-recommended method
+- The choice of available tools / routes
+- Example of how to use the currently-recommended method
 
-## Choice of available Tools/Routes
+## Choice of available Tools/Routes
 
 See [JASMIN external connections]({{< ref "jasmin-external-connections" >}})
 and [Data Transfer Tools]({{< ref "data-transfer-tools" >}}) for general
@@ -26,9 +26,9 @@ conditions.
If you want to try **all** the options available, you will need:
 
- * [hpxfer](https://accounts.jasmin.ac.uk/services/additional_services/hpxfer/) (high-performance data transfer) access role on JASMIN, in addition to the [jasmin-login](https://accounts.jasmin.ac.uk/services/login_services/jasmin-login/) role.
- * a login account at ARCHER2
- * (only for certificate-based GridFTP) to have registered the subject of your JASMIN-issued short-term credential with ARCHER support.
+- the [hpxfer](https://accounts.jasmin.ac.uk/services/additional_services/hpxfer/) (high-performance data transfer) access role on JASMIN, in addition to the [jasmin-login](https://accounts.jasmin.ac.uk/services/login_services/jasmin-login/) role.
+- a login account at ARCHER2
+- (only for certificate-based GridFTP) to have registered the subject of your JASMIN-issued short-term credential with ARCHER2 support.
 
 Check the examples in the linked documentation articles and ensure that you
 use them between the hosts used in the examples. Not all services connect over
 all routes to/from all hosts!
 
 NOTE:
 
- * Enquiries about access to or use of ARCHER2 should be directed to ARCHER2 support ([support@archer2.ac.uk](mailto:support@archer2.ac.uk))
- * Enquiries about access to or use of JASMIN should be directed to JASMIN support (use beacon, below-right or [support@jasmin.ac.uk](mailto:mailto:support@jasmin.ac.uk))
+- Enquiries about access to or use of ARCHER2 should be directed to ARCHER2 support ([support@archer2.ac.uk](mailto:support@archer2.ac.uk))
+- Enquiries about access to or use of JASMIN should be directed to JASMIN support (use the beacon, bottom right, or [support@jasmin.ac.uk](mailto:support@jasmin.ac.uk))
 
-Table 1, below, shows recommended combinations of hosts & tools for transfers
-between RDF and JASMIN.
-Note that (until +## Available transfer methods + +### Basic SSH transfer + +[**scp/rsync/sftp**]({{< ref "rsync-scp-sftp" >}}): Simple transfers using easy method, pushing data to general purpose xfer nodes. Convenient, but limited performance. -[scp/rsync/sftp]({{< ref "rsync-scp-sftp" >}}) -| Simple transfer using easy method to general purpose xfer nodes. -Convenient. ----|--- _source_ | _dest_ | _notes_ -`login.archer2.ac.uk` | `xfer1.jasmin.ac.uk` | over 10G JANET, but to -virtual machine at JASMIN end +--- | --- | --- +`login.archer2.ac.uk` | `xfer1.jasmin.ac.uk` | over 10G JANET, but to virtual machine at JASMIN end `login.archer2.ac.uk` | `xfer2.jasmin.ac.uk` | same -[GridFTP over SSH]({{< ref "gridftp-ssh-auth" >}}) +{.table .table-striped} -2nd choice method +### GridFTP over SSH -| GridFTP performance with convenience of SSH. Requires persistent ssh agent -on local machine where you have your JASMIN key. +[GridFTP over SSH]({{< ref "gridftp-ssh-auth" >}}): GridFTP performance with convenience of SSH. Requires persistent ssh agent +on local machine where you have your JASMIN key. **2nd choice method** + +_source_ | _dest_ | _notes_ +--- | --- | --- `login.archer2.ac.uk` | `hpxfer1.jasmin.ac.uk` | over 10G JANET -`login.archer2.ac.uk` | `hpxfer2.jasmin.ac.uk` | over 10G JANET -hpxfer2 is configured for longer distances but can be useful if hpxfer1 is -busy -[GridFTP using certificate auth]({{< ref "gridftp-cert-based-auth" >}}) -(now working again!) +`login.archer2.ac.uk` | `hpxfer2.jasmin.ac.uk` | over 10G JANET
hpxfer2 is configured for longer distances but can be useful if hpxfer1 is busy +{.table .table-striped} -1st choice method +### GridFTP using certificate auth -| Fully-featured GridFTP. Suitable for person-not-present transfers & long- -running ARCHER2 workflows. -`login.archer2.ac.uk` | `gridftp1.jasmin.ac.uk` | over 10G JANET. -Dedicated GridFTP server. -**No need for persistent SSH agent at ARCHER2 end** - -Table 1: comparison of current methods and routes for transferring data -between RDF and JASMIN. +[GridFTP using certificate auth]({{< ref "gridftp-cert-based-auth" >}}): Fully-featured GridFTP. Suitable for person-not-present transfers & long- +running ARCHER2 workflows. **1st choice method** -## +_source_ | _dest_ | _notes_ +--- | --- | --- +`login.archer2.ac.uk` | `gridftp1.jasmin.ac.uk` | over 10G JANET.
Dedicated GridFTP server.
**No need for persistent SSH agent at ARCHER2 end**
+{.table .table-striped}
 
 ## 1st choice method: example
 
-The now-recommended method for transfers between ARCHER2 and JASMIN is using
+The currently-recommended method for transfers between ARCHER2 and JASMIN is to use
 globus-url-copy with the concurrency option, as described below, but using
 certificate-based authentication rather than SSH. This will work for person-
 not-present transfers, so is suitable for long-running processes on ARCHER2
@@ -92,23 +87,22 @@ ARCHER2.
 
 This method **does not** require you to use your JASMIN SSH key. It
 involves:
 
- * obtaining tools to communicate with JASMIN's short-lived credentials service
- * using the service to obtain a credential (it should last for 30 days, but a new one can be obtained at any time)
- * using the credential to initiate a transfer (this what you would need to repeat for each transfer)
+- obtaining tools to communicate with JASMIN's short-lived credentials service
+- using the service to obtain a credential (it should last for 30 days, but a new one can be obtained at any time)
+- using the credential to initiate a transfer (this is what you would need to repeat for each transfer)
 
 A fuller explanation of the process is given in this document:
 
- * [Data Transfer Tools: GridFTP (certificate-based authentication)]({{< ref "gridftp-cert-based-auth" >}})
+- [Data Transfer Tools: GridFTP (certificate-based authentication)]({{< ref "gridftp-cert-based-auth" >}})
 
 Once you have done these steps, you should be able to obtain a short-term
 credential as follows (do this command at the ARCHER2 end, after having
 downloaded the onlineca script as described in the document mentioned above):
 
-
-
-    $ ./onlineca-get-cert-wget.sh -U https://slcs.jasmin.ac.uk/certificate/ -l USERNAME -o ./cred.jasmin
-    $ chmod 600 cred.jasmin
-
+{{}}
+./onlineca-get-cert-wget.sh -U https://slcs.jasmin.ac.uk/certificate/ -l USERNAME -o ./cred.jasmin
+chmod 600 cred.jasmin
+{{}}
 
 Note that the path
`./` is used for the script `onlineca-get-cert-wget.sh`, but you should use the path to wherever you saved it. Alternatively, if you @@ -116,22 +110,22 @@ make yourself a `bin` directory and add that to your `PATH`, then you don't need to specify the path. 2\. Load the `gct` module (to make the current `globus-url-copy` command -available in your path on ARCHER2) +available in your path on ARCHER2). + +Once loaded, check with `which` to see that you have the `globus-url-copy` command available to you. - - - $ module load gct - $ which globus-url-copy - /work/y07/shared/gct/v6.2.20201212/bin/globus-url-copy - +{{}} +module load gct +which globus-url-copy +(out)/work/y07/shared/gct/v6.2.20201212/bin/globus-url-copy +{{}} 3\. Transfer a single file to your home directory on JASMIN (limited space, but to check things work) - - - $ globus-url-copy -vb -cred cred.jasmin src/data gsiftp://gridftp1.jasmin.ac.uk/~/ - +{{}} +globus-url-copy -vb -cred cred.jasmin SRC/FILE gsiftp://gridftp1.jasmin.ac.uk/DEST/FILE +{{}} Note that we specify the credentials file `cred.jasmin` and use the protocol `gsiftp://` with no need to specify the username in the connection string @@ -148,10 +142,9 @@ transfer protocol) 4\. Recursively transfer a directory of files, using the concurrency option for multiple parallel transfers - - - $ globus-url-copy -vb -cd -r -cc 4 -cred cred.jasmin src/data/ gsiftp://gridftp1.jasmin.ac.uk/path/dest/data/ - +{{}} +globus-url-copy -vb -cd -r -cc 4 -cred cred.jasmin SRC/DATA/ gsiftp://gridftp1.jasmin.ac.uk/DEST/DATA/ +{{}} **NOTE:** The `-cc` option initiates the parallel transfer of several files at a time, which achieves good overall transfer rates for recursive directory @@ -170,18 +163,17 @@ is still under investigation. 
Single-stream transfers (omitting the `-p N Here, the options used are (see `man globus-url-copy` for full details): - - - -vb | -verbose-perf - During the transfer, display the number of bytes transferred - and the transfer rate per second. Show urls being transferred - -concurrency | -cc - Number of concurrent ftp connections to use for multiple transfers. - -cd | -create-dest - Create destination directory if needed - -r | -recurse - Copy files in subdirectories - +```txt +-vb | -verbose-perf + During the transfer, display the number of bytes transferred + and the transfer rate per second. Show urls being transferred +-concurrency | -cc + Number of concurrent ftp connections to use for multiple transfers. +-cd | -create-dest + Create destination directory if needed +-r | -recurse + Copy files in subdirectories +``` Experiment with different concurrency options (4 is a good start, more than 16 would start to "hog" resources so please consider @@ -189,48 +181,44 @@ would start to "hog" resources so please consider 5\. Use the sync option to synchronise 2 directories between source and target file systems: - - - $ globus-url-copy -vb -cd -r -cc 4 -sync -cred cred.jasmin src/data/ gsiftp://gridftp1.jasmin.ac.uk/path/dest/data/ - +{{}} +globus-url-copy -vb -cd -r -cc 4 -sync -cred cred.jasmin SRC/DATA/ gsiftp://gridftp1.jasmin.ac.uk/DEST/DATA/ +{{}} -where `src/data/` and `/path/dest/data/` are source and destination paths, +where `SRC/DATA/` and `/DEST/DATA/` are source and destination paths, respectively (include trailing slash). Options are as before but with: - - - -sync - Only transfer files where the destination does not exist or differs - from the source. -sync-level controls how to determine if files - differ - +```txt +-sync + Only transfer files where the destination does not exist or differs + from the source. 
-sync-level controls how to determine if files + differ +``` Note that the default sync level is 2, see level descriptions below, which only compares time stamps. **If you want to include a file integrity check using checksums, you need to use`-sync-level 3` but there may be a performance cost.** - - - -sync-level - Choose criteria for determining if files differ when performing a - sync transfer. Level 0 will only transfer if the destination does - not exist. Level 1 will transfer if the size of the destination - does not match the size of the source. Level 2 will transfer if - the timestamp of the destination is older than the timestamp of the - source, or the sizes do not match. Level 3 will perform a checksum of - the source and destination and transfer if the checksums do not match, - or the sizes do not match. The default sync level is 2. - +```txt +-sync-level + Choose criteria for determining if files differ when performing a + sync transfer. Level 0 will only transfer if the destination does + not exist. Level 1 will transfer if the size of the destination + does not match the size of the source. Level 2 will transfer if + the timestamp of the destination is older than the timestamp of the + source, or the sizes do not match. Level 3 will perform a checksum of + the source and destination and transfer if the checksums do not match, + or the sizes do not match. The default sync level is 2. +``` So a full sync including comparison of checksums would be: - - - $ globus-url-copy -vb -cd -r -cc 4 -sync -sync-level 3 -cred cred.jasmin src/data/ gsiftp://gridftp1.jasmin.ac.uk/path/dest/data/ - +{{}} +globus-url-copy -vb -cd -r -cc 4 -sync -sync-level 3 -cred cred.jasmin src/data/ gsiftp://gridftp1.jasmin.ac.uk/path/dest/data/ +{{}} ## 2nd choice method: example @@ -242,8 +230,8 @@ log in to ARCHER2. 
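This SSH-based route depends on a running ssh-agent holding the right keys. A quick pre-flight check you can run on your local machine first (a generic sketch, not a JASMIN-specific command):

```shell
# Check that an ssh-agent is reachable and holds at least one key
# before attempting agent-forwarded transfers.
if [ -n "$SSH_AUTH_SOCK" ] && ssh-add -l >/dev/null 2>&1; then
  echo "ssh-agent ready"
else
  echo "ssh-agent not ready: start an agent and ssh-add your keys"
fi
```

`ssh-add -l` exits non-zero both when no agent is reachable and when the agent holds no keys, so the single test covers both failure modes.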
You will need to have loaded into your SSH agent:
 
- * The SSH key associated with your JASMIN account
- * The SSH key associated with your ARCHER2 account, if you have one (it is recommended to use a different one than for JASMIN, if so)
+- The SSH key associated with your JASMIN account
+- The SSH key associated with your ARCHER2 account, if you have one (it is recommended to use a different one than for JASMIN, if so)
 
 You also need to ensure that you connect with the -A option for agent
 forwarding, to enable the JASMIN key to be available for the onward
@@ -254,42 +242,43 @@ It should stay on your local machine. This does mean that you need an ssh-
 agent running on your local machine, so this method may not work for long-
 running continuous processes that need to spawn transfers.
 
-
-
-    $ ssh-add #(path to your JASMIN ssh key file on your local machine)
-    $ ssh-add #(path to your ARCHER2 ssh key if you have one, on on your local machine)
-    $ ssh-add -l # check both keys are loaded (are both key signatures listed in the output?)
-    $ ssh -A @login.archer2.ac.uk
-    (you are prompted for your password by the ARCHER2 system, whether or not you use an SSH key with your ARCHER2 account)
-
+{{}}
+ssh-add #(path to your JASMIN ssh key file on your local machine)
+ssh-add #(path to your ARCHER2 ssh key if you have one, on your local machine)
+ssh-add -l # check both keys are loaded (are both key signatures listed in the output?)
+ssh -A <username>@login.archer2.ac.uk
+##(ARCHER2 now uses multi-factor auth at this stage)
+{{}}
 
 2\. Load the `gct` module (to make the current `globus-url-copy` command
 available in the path)
 
-
-
-    $ module load gct
-    $ which globus-url-copy
-    /work/y07/shared/gct/v6.2.20201212/bin/globus-url-copy
-
+{{}}
+module load gct
+which globus-url-copy
+(out)/work/y07/shared/gct/v6.2.20201212/bin/globus-url-copy
+{{}}
 
 3\.
Transfer a single file to your home directory on JASMIN (limited space, but to check things work) - - - $ globus-url-copy -vb sshftp://@hpxfer1.jasmin.ac.uk/~/ - + +{{}} +globus-url-copy -vb sshftp://@hpxfer1.jasmin.ac.uk/~/ +{{}} Obviously, replace `` with the path to the file you want to transfer. From here on, the commands are the same as described above in the "1st choice method" but simply replace -`-cred cred.jasmin gsiftp://gridftp1.jasmin.ac.uk` +```bash +-cred cred.jasmin gsiftp://gridftp1.jasmin.ac.uk +``` with -`sshftp://@hpxfer1.jasmin.ac.uk` - +```bash +sshftp://@hpxfer1.jasmin.ac.uk +```
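The substitution above is mechanical, so as an illustration it can be applied with `sed` (`USER` is a placeholder for your JASMIN username, and `SRC/DATA/`, `DEST/DATA/` are placeholder paths):

```shell
# Rewrite the 1st-choice (certificate) command into the 2nd-choice (SSH) form:
# drop the credential flag and swap the gsiftp endpoint for the sshftp one.
cmd='globus-url-copy -vb -cd -r -cc 4 -cred cred.jasmin SRC/DATA/ gsiftp://gridftp1.jasmin.ac.uk/DEST/DATA/'
echo "$cmd" | sed \
  -e 's/-cred cred\.jasmin //' \
  -e 's|gsiftp://gridftp1\.jasmin\.ac\.uk|sshftp://USER@hpxfer1.jasmin.ac.uk|'
# → globus-url-copy -vb -cd -r -cc 4 SRC/DATA/ sshftp://USER@hpxfer1.jasmin.ac.uk/DEST/DATA/
```

All the other options (`-cc`, `-sync`, `-sync-level`, etc.) behave the same over either protocol.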