BasicUsage
ChinaSRC is a network of servers pooled together to maximize their computational capabilities for specific purposes, often computationally intensive tasks such as data processing, simulations, and modeling for SKA precursor/pathfinder telescopes.
For novice or first-time ChinaSRC users, the ChinaSRC O&M Team has prepared this basic guide to help you get started before running actual jobs. After following it, users will be able to:
- Log in to their ChinaSRC accounts;
- Perform file and folder transfers to (upload) and from (download) the ChinaSRC;
- Use environment modules;
- Manage their Anaconda environments and packages;
- Create SLURM job scripts; and
- Run and manage their SLURM jobs.
After applying for an account, the user can log in to the ChinaSRC.
🔗 http://chinasrc.shao.ac.cn:8882
To log in to the ChinaSRC, use this command in your local machine's terminal:
$ ssh -p 20002 username@chinasrcyun.shao.ac.cn
After successfully logging in, the ChinaSRC welcome prompt will be displayed as follows:
[username@workstation ~]$
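Optionally, you can store the host, port, and username in your local SSH configuration so that future logins only need a short alias. A minimal sketch, using the host and port from the login command above (the alias chinasrc and the username are placeholders):
$ cat >> ~/.ssh/config <<'EOF'
Host chinasrc
    HostName chinasrcyun.shao.ac.cn
    Port 20002
    User username
EOF
$ ssh chinasrc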
The ChinaSRC is composed of the following nodes (servers):
- Login node
- This is where users log in to the ChinaSRC. DO NOT run jobs or programs here.
- Compute nodes
- X86 nodes x 15. Every node has:
- Intel(R) Xeon(R) Gold 5218 CPU @ 2.3GHz
- 32 logical CPUs
- 768 GB RAM
- X86 nodes x 8. Every node has:
- Intel(R) Xeon(R) Gold 6132 CPU @ 2.6GHz
- 28 logical CPUs
- 1TB RAM
- ARM nodes x 10. Every node has:
- Kunpeng 920 CPU @ 2.6GHz
- 96 logical CPUs
- 1TB RAM
- GPU nodes
- 1 Intel(R) Xeon(R) 2690 @ 2.6GHz with 4 NVIDIA Tesla V100, 256GB RAM, 28 cores
- 1 Intel(R) Xeon(R) 6152 @ 2.3GHz with 4 NVIDIA Tesla V100 (NVLINK), 1TB RAM, 44 cores
- 1 Intel(R) Xeon(R) 6140 @ 2.3GHz with 8 NVIDIA Tesla V100 (NVLINK 32GB) , 512GB RAM, 36 cores
- 1 Intel(R) Xeon(R) 5320 @ 2.2GHz with 4 NVIDIA Ampere A40 (40GB), 512GB RAM, 36 cores
Currently, the ChinaSRC has 5.1 PB of storage, which is being extended by an additional 6 PB this year.
Each user has the following default storage quotas:
- Home (/home/username): 50 GB
- Group folders (/groups/group_name/home/share/): no limit
The ChinaSRC is regularly undergoing maintenance and streamlining operations, so this may change in the future with prior notice to users.
The home folder is intended for long-term data storage, while data in the group folders may only be kept for a limited time; jobs can be run from either the home or group folders.
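To check how much space a folder currently occupies against these quotas, the standard du command can be used, for example:
$ du -sh /home/username/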
Remote file transfers via the terminal can be done using scp or rsync.
We recommend rsync because it detects differences between the source and destination files before the transfer begins and skips files that have not changed. All of the commands listed here should be run on your local computer for both upload and download operations.
On your local computer, upload files with rsync using the following command:
$ rsync --rsh='ssh -p 20002' -avzu local_files username@chinasrc.yun.shao.ac.cn:/home/username/
Just remember to specify the SSH port (20002) for the transfer.
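Downloading works the same way with the source and destination swapped. A minimal sketch, assuming a remote folder named remote_files in your home directory (the name is illustrative):
$ rsync --rsh='ssh -p 20002' -avzu username@chinasrc.yun.shao.ac.cn:/home/username/remote_files ./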
For more information about rsync and its options, refer to its manual pages using man rsync.
Modules allow program installations with different versions to be used without them interfering with each other, thus effectively keeping each version in a sandboxed environment. In other words, modules allow programs to be used in isolation from others which avoids possible incompatibilities and inconsistencies. However, it should be noted that the ChinaSRC Team is gradually doing away with modules in favor of Anaconda environments, but modules are still used for programs that are not available in the Anaconda repository (anaconda.org).
Before using any modules, first make the ChinaSRC module files available with:
$ module use /home/software/modulefiles/
Modules have the format <module_name>/<version>, for example: wsclean/cpu-2.9.
List Available Modules
Without any argument, this command will list all available versions of all installed modules. When one or more module names are provided, the available versions for the modules are listed:
$ module avail [<module1/version> <module2/version> ...]
For example, running module avail
without additional arguments will print the following example list of modules which is not exhaustive as it is constantly being updated:
$ module avail
----------------------------------------------------------- /home/software/modulefiles/ -----------------------------------------------------------
aocommon/arm-3.0 duchamp/cpu-1.6.2 lapack/arm-3.8.0 pgplot/cpu-5.2
aocommon/cpu-3.0 dysco/arm-1.1 lapack/cpu-3.10.0-gcc-4.8.5 pgplot/cpu-5.2-gcc-4.8.5
aoflagger/arm-v2.12.1 dysco/cpu-1.1 lapack/cpu-3.10.0-gcc-7.3.0 pgplot/cpu-5.2-gcc-7.3.0
aoflagger/cpu-3.0.0-gcc-4.8.5 dysco/cpu-1.2-gcc-4.8.5 lapack/cpu-3.8.0 pgplot/gpu-5.2
aoflagger/cpu-3.0.0-gcc-7.3.0 erfa/cpu-1.5.0 lapack/cpu-3.8.0-gcc-4.8.5 prefactor/arm-3.1-gcc-9.3.0
aoflagger/cpu-gui-v2.12.1 erfa/cpu-2.0.0-gcc-4.8.5 lapack/gpu-3.8.8 prefactor/cpu-3.1-gcc-9.3.0
aoflagger/cpu-v2.12.1 erfa/cpu-2.0.0-gcc-7.3.0 libsla/arm-master python/arm-2.7.14
askapsoft/1.12.0 EveryBeam/cpu-master-20210630 libsla/cpu-master python/cpu-2.7.14
boost/arm-1.65.1 EveryBeam/cpu-master-gcc-7.3.0 lua/cpu-5.3.6-gcc-4.8.5 python/cpu-3.8.0-gcc-4.8.5-vast
boost/cpu-1.65.1 factor/arm-1.4-gcc-9.3.0 lua/cpu-5.3.6-gcc-7.3.0 python/cpu-3.8.12-gcc-4.8.5
boost/cpu-1.76.0-gcc-4.8.5 factor/cpu-1.4-gcc-9.3.0 miriad/2007 python/cpu-3.8.12-gcc-7.3.0
boost/cpu-1.76.0-gcc-7.3.0 fftw/arm-3.8.8 miriad/cpu-2007 python/cpu-3.9.2-gcc-9.3.0
casacore/arm-2.4.1 fftw/cpu-3.3.10-gcc-4.8.5 Montage/cpu-6.0 python/gpu-2.7.14
casacore/cpu-2.4.1 fftw/cpu-3.3.10-gcc-7.3.0 mpich/cpu-2-1.5rc3 RTS/cpu-master
casacore/cpu-3.3.0-gcc-4.8.5 fftw/cpu-3.8.8 mpich/cpu-3.2.1 RTS/gpu-master
casacore/cpu-3.3.0-gcc-7.3.0 fftw/gpu-3.8.8 mpich/cpu-3.2.1-gcc-4.8.5 sextractor/cpu-2.25.0
casacore/cpu-3.4.0-gcc-7.3.0 gcc/7.3.0 mwa-reduce/arm-master stilts/arm-3.1-4
cfitsio/arm-3450 gcc/7.3.0-new mwa-reduce/cpu-master stilts/cpu-3.1-4
cfitsio/cpu-3450 gcc/9.3.0 mwa-reduce/cpu-master-2021 swarp/arm-2.38.0
cfitsio/cpu-4.0.0-gcc-4.8.5 hdf5/arm-1.10.4 mwa-reduce/cpu-master-2022 swarp/cpu-2.38.0
cfitsio/cpu-4.0.0-gcc-7.3.0 hdf5/cpu-1.10.4 MWA_Tools/arm-mwa-sci wcslib/arm-6.2
cfitsio/gpu-3450 hdf5/cpu-1.10.4-gcc-7.3.0 MWA_Tools/cpu-mwa-sci wcslib/cpu-6.2
chgcentre/arm-wsclean2.6 hdf5/cpu-1.12.1-gcc-7.3.0 MWA_Tools/cpu-mwa-sci-wsclean-2.9 wcslib/cpu-7.7-gcc-4.8.5
chgcentre/cpu-wsclean2.6 hdf5/cpu-1.13.1-gcc-4.8.5 MWA_Tools/mwa-sci wcslib/cpu-7.7-gcc-7.3.0
cmake/cpu-3.15.2-gcc-7.3.0 hdf5/cpu-1.13.1-gcc-7.3.0 MWA_Tools/mwa-sci.old wcslib/gpu-6.2
cmake/cpu-3.15.2-gcc-7.3.0-new hdf5/gpu-1.10.4 openmpi/cpu-2.0.2 wcstools/cpu-3.9.6
cmake/cpu-3.20.0 Healpix/arm-heapy openmpi/cpu-4.0.1 wsclean/arm-2.6
cmake/cpu-3.20.0-gcc-7.3.0 Healpix/cpu-f90 openmpi/gpu-4.0.1 wsclean/cpu-2.6
cotter/arm-master Healpix/cpu-f90-gcc-4.8.5 pal/cpu-0.9.8 wsclean/cpu-2.9
cotter/cpu-4.6-gcc-4.8.5 Healpix/cpu-heapy pal/cpu-0.9.8-gcc-4.8.5 wsclean/cpu-2.9-gcc-7.3.0
cotter/cpu-4.6-gcc-7.3.0 Healpix/gpu-cxx pal/cpu-0.9.8-gcc-7.3.0 wsclean/cpu-3.0-gcc-7.3.0
cotter/cpu-master Healpix/gpu-f90 pgplot/arm-5.2
-------------------------------------------- /opt/app/spack/share/spack/modules/linux-centos7-haswell ---------------------------------------------
autoconf-2.69-gcc-4.8.5-6k6kik7 isl-0.18-gcc-4.8.5-igs522o ncurses-6.2-gcc-4.8.5-tbpd5z4
autoconf-archive-2019.01.06-gcc-4.8.5-7rxz2yv isl-0.21-gcc-4.8.5-ikicpxe perl-5.30.2-gcc-4.8.5-uay4u7v
automake-1.16.2-gcc-4.8.5-ipyg4ha libiconv-1.16-gcc-4.8.5-qazxaa4 pkgconf-1.6.3-gcc-4.8.5-2qrpgpd
binutils-2.34-gcc-4.8.5-2csi6vr libsigsegv-2.12-gcc-4.8.5-ymriiur pkgconf-1.7.3-gcc-4.8.5-z3r4unw
bzip2-1.0.8-gcc-4.8.5-ersrl36 libtool-2.4.6-gcc-4.8.5-fzl2npj readline-8.0-gcc-4.8.5-3jeiguw
diffutils-3.7-gcc-4.8.5-jknorwe libxml2-2.9.10-gcc-4.8.5-3foymu4 tar-1.32-gcc-4.8.5-v3iynan
gcc-10.1.0-gcc-4.8.5-2new4ox m4-1.4.18-gcc-4.8.5-7x2wh2t texinfo-6.5-gcc-4.8.5-fjg3jyt
gcc-7.5.0-gcc-4.8.5-of6wn6o mpc-1.1.0-gcc-4.8.5-g6zd7ob xz-5.2.5-gcc-4.8.5-rcyjfkv
gdbm-1.18.1-gcc-4.8.5-7xh2soi mpc-1.1.0-gcc-4.8.5-kv3zuys zlib-1.2.11-gcc-4.8.5-pkmj6e7
gettext-0.20.2-gcc-4.8.5-kapb6qj mpfr-3.1.6-gcc-4.8.5-nol4vkt zstd-1.4.5-gcc-4.8.5-3boiaus
gmp-6.1.2-gcc-4.8.5-zn55wh7 mpfr-4.0.2-gcc-4.8.5-kluqbcj
------------------------------------------ /opt/app/spack/share/spack/modules/linux-centos7-cascadelake -------------------------------------------
autoconf-2.69-gcc-10.1.0-2c3fdjr libpciaccess-0.13.5-gcc-10.1.0-lw54lde openmpi-2.1.6-gcc-10.1.0-tvthe74
autoconf-archive-2019.01.06-gcc-10.1.0-z7nw2bb libsigsegv-2.12-gcc-10.1.0-cw7jinv openmpi-3.1.6-gcc-10.1.0-2mcmstt
automake-1.16.2-gcc-10.1.0-nvemodk libtool-2.4.6-gcc-10.1.0-w2dkpic perl-5.30.2-gcc-10.1.0-vncpxeg
environment-modules-4.5.1-gcc-10.1.0-gfocl6n libxml2-2.9.10-gcc-10.1.0-wfhddkv pkgconf-1.7.3-gcc-10.1.0-uignu4o
gcc-10.1.0-gcc-10.1.0-3o3bvj2 m4-1.4.18-gcc-10.1.0-twi7kfh readline-8.0-gcc-10.1.0-y5e4cch
gdbm-1.18.1-gcc-10.1.0-pdpplkc mpc-1.1.0-gcc-10.1.0-gc7nbvo tcl-8.6.8-gcc-10.1.0-bxxjpg6
gmp-6.1.2-gcc-10.1.0-lo2ohmr mpfr-4.0.2-gcc-10.1.0-ckcev3b util-macros-1.19.1-gcc-10.1.0-6ijs7w4
hwloc-1.11.11-gcc-10.1.0-ezt4eyv ncurses-6.2-gcc-10.1.0-tsvqpzn xz-5.2.5-gcc-10.1.0-2zdmlkh
isl-0.21-gcc-10.1.0-g45gmhu numactl-2.0.12-gcc-10.1.0-ckp5im3 zlib-1.2.11-gcc-10.1.0-yor3u7m
libiconv-1.16-gcc-10.1.0-lpidrg4 openmpi-2.0.0-gcc-10.1.0-65nxxvq zstd-1.4.5-gcc-10.1.0-rtzjnll
--------------------------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------------------------
dot module-git module-info modules null use.own
---------------------------------------------------------------- /etc/modulefiles -----------------------------------------------------------------
mpi/mpich-3.0-x86_64 mpi/mpich-3.2-x86_64 mpi/mpich-x86_64
-------------------------------------------------------------- /opt/app/modulefiles ---------------------------------------------------------------
mpich/3.0.4 mpich/3.2 openmpi/4.0.4/gcc openmpi/4.0.4/intel openmpi/4.1.4/gcc singularity/3.8.7
On the other hand, when using the command module avail wsclean, for example, only the available versions of the wsclean module are listed:
$ module avail wsclean
----------------------------------------------------------- /home/software/modulefiles/ -----------------------------------------------------------
wsclean/arm-2.6 wsclean/cpu-2.6 wsclean/cpu-2.9 wsclean/cpu-2.9-gcc-7.3.0 wsclean/cpu-3.0-gcc-7.3.0
Load Modules
To use a module, load it with module load <module_name>/<version>. For example, loading MWA_Tools/cpu-mwa-sci-wsclean-2.9 provides cotter, mwa-reduce, and wsclean:
$ module use /home/software/modulefiles
$ module load MWA_Tools/cpu-mwa-sci-wsclean-2.9
Loading askapsoft/1.12.0 provides mslist, readms, smear, and related tools:
$ module use /home/software/modulefiles
$ module load askapsoft/1.12.0
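To see which modules are currently loaded, or to remove them again, the standard environment-modules commands can be used, for example:
$ module list                      ## show currently loaded modules
$ module unload askapsoft/1.12.0   ## unload a single module
$ module purge                     ## unload all loaded modules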
Anaconda is a package and environment manager written primarily in Python; its package repository is at anaconda.org. To initialize Anaconda on the ChinaSRC and activate the base environment, run:
$ source /opt/app/anaconda3/bin/activate
$ conda activate # activate base env
Caution
Creating environments can consume significant computational resources, which is not allowed on the login node; this operation should be performed on a compute node. Therefore, the commands discussed here should be submitted as SLURM jobs. Refer to the SLURM section below for how to submit a job.
Default Way
To create an Anaconda environment, simply use the following command template:
$ conda create --name magnetism python=3.7
# On ChinaSRC, submit it to a compute node with srun instead:
$ srun -N 1 -p hw-32C768G --comment=group_name conda create --name magnetism python=3.7
To activate an environment, use the following command template:
$ conda activate <env_name|env_path>
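Installing packages into an environment is also resource-intensive and should likewise go through a compute node. A sketch, assuming the same partition, group name, and environment as above (numpy is just an example package):
$ srun -N 1 -p hw-32C768G --comment=group_name conda install --name magnetism -y numpy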
SLURM is the job and resource manager used in the ChinaSRC. Its online documentation is at https://slurm.schedmd.com/documentation.html.
These are the job parameters that are required prior to running any job:
- --comment: (string) group account where job quotas are set;
- --partition: (string) which partition the job will be submitted to;
- --nodes: (integer) number of nodes to request;
- --ntasks: (integer) total number of CPUs to request;
- --output: (string) job log file.
On the other hand, these are some of the optional job parameters:
- --ntasks-per-node: (integer) number of CPUs per node to request (must not contradict --ntasks if also specified);
- --mem: (string) memory per node (e.g., 40GB, 80GB);
- --job-name: (string) name for the job; displayed in job monitoring commands (as discussed later);
- --error: (string) job error file; it is recommended not to set this parameter and to use only --output instead;
- --requeue: (no argument) make the job eligible for requeue.
For other parameters or more information about those listed above, refer to the sbatch manual pages using man sbatch or go to the online manual.
A job script is submitted to allocate resources for a job. The previously discussed job parameters and the commands to be used to run the job are placed here.
Here is a sample job script named job.sbatch, where comments have been included to describe what each block does:
#!/bin/bash
#SBATCH --account=<slurm_group_acct>
#SBATCH --partition=<partition>
#SBATCH --nodes=<num_nodes>
#SBATCH --ntasks=<num_cpus>
#SBATCH --job-name="<jobname>"
#SBATCH --output="%x.%j.out" ## <jobname>.<jobid>.out
##SBATCH --ntasks-per-node=1 ## optional
##SBATCH --mem=24G ## optional: mem per node
##SBATCH --error="%x.%j.err" ## optional; better to use --output only
your_program_here
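As a concrete illustration, here is a minimal filled-in sketch that runs a Python script inside the magnetism environment created earlier; the group name, partition, resource numbers, and script name are placeholders, so adjust them to your own allocation:
#!/bin/bash
#SBATCH --comment=group_name       ## group account (placeholder)
#SBATCH --partition=hw-32C768G     ## partition from the earlier srun example
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --job-name="conda-test"
#SBATCH --output="%x.%j.out"

## initialize Anaconda and activate the environment created earlier
source /opt/app/anaconda3/bin/activate
conda activate magnetism

## run the actual program (placeholder script name)
python my_script.py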
Submit Job Script
It is recommended to submit the job from inside the folder containing the job script. It is also recommended that all input and/or output files be kept in the same folder as the job script; this avoids changing working directories, which may cause confusion and errors when accessing files/folders. For example, if the job folder is /home/username/myjob, where all the necessary input files are stored together with the job script named job.sbatch:
$ cd /home/username/myjob/
$ sbatch job.sbatch
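If the submission is accepted, sbatch replies with the assigned job ID, which is needed for the monitoring commands below (the ID shown is illustrative):
Submitted batch job 123456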
Show Job Queue
If no argument is passed, all jobs in the queue will be displayed.
$ squeue [-u <username> ] [-p <partition>] [-w <nodelist>]
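For example, to show only your own jobs, or only the jobs in a specific partition (partition name taken from the earlier examples):
$ squeue -u username
$ squeue -p hw-32C768G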
Show Job Parameters
$ scontrol show job <job_id>
Check Node and/or Partition Status
$ sinfo [-p <partition> | -n <nodelist>]
Cancel Job(s)
You may only cancel jobs created under your account.
$ scancel <job_id1> [<job_id2> ...]
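For example, to cancel a single job by its ID or all of your own jobs at once (the job ID is illustrative):
$ scancel 123456
$ scancel -u username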