Description
Hi, when I use SSCHA to simulate the H3S example, I run into a problem.
The sscha.Cluster module was designed for a specific workflow:

- Standard workflow: the user runs the main Python script on a local workstation or on the cluster's login node; the script submits and distributes the computational tasks to remote compute nodes via ssh and scp, and manages the files.
- My actual situation: I run the main Python script directly on a compute node (after submitting it with sbatch), and, for security reasons, the compute node itself has the SSH service disabled.

This creates a fundamental conflict: I am executing a module that requires SSH for its operation inside an environment (my HPC compute node) that cannot accept SSH connections (see the sketch below).
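For clarity, here is how I understand the two modes, using only the constructor flag and attributes that appear in my input below. This is a rough sketch; I am not sure whether the `AlreadyInCluster` flag actually bypasses ssh, which is part of my question:

```python
import sscha.Cluster

# Standard workflow: the script runs on a workstation or login node and
# reaches the cluster over ssh/scp, so a reachable hostname is required.
remote_hpc = sscha.Cluster.Cluster(mpi_cmd=r"srun -n 40")
remote_hpc.hostname = "login1"  # if left unset, commands become "ssh None ..."

# My situation: the script is already running on a compute node
# (submitted via sbatch) and that node cannot accept ssh connections.
# The constructor exposes an AlreadyInCluster flag (see my input below);
# whether it fully avoids ssh is exactly what I am asking.
local_hpc = sscha.Cluster.Cluster(mpi_cmd=r"srun -n 40", AlreadyInCluster=True)
local_hpc.workdir = "/public/home/yan/qe/H3S/run"
```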
What should I do?
The output is:
```
(base) [login1 H3S]$ cat 2out.dat
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: cpu11
Local device: mlx5_0
--------------------------------------------------------------------------
ssh: Could not resolve hostname none: Name or service not known
Error with cmd: ssh None 'echo "/public/home/yan/qe/H3S"'
EXITSTATUS: 255; attempt = 1
THREAD 41879 EXECUTE COMMAND: ssh None 'echo "/public/home/yan/qe/H3S"'
Traceback (most recent call last):
File "/public/home/yan/qe/H3S/H3S_relax.py", line 126, in <module>
my_hpc.setup_workdir()
File "/public/home/yan/apps/anaconda3/envs/sscha/lib/python3.10/site-packages/sscha/Cluster.py", line 1400, in setup_workdir
workdir = self.parse_string(self.workdir)
File "/public/home/yan/apps/anaconda3/envs/sscha/lib/python3.10/site-packages/sscha/Cluster.py", line 1453, in parse_string
status, output = self.ExecuteCMD(cmd, return_output = True, raise_error= True)
File "/public/home/yan/apps/anaconda3/envs/sscha/lib/python3.10/site-packages/sscha/Cluster.py", line 402, in ExecuteCMD
raise IOError("Error while communicating with the cluster. More than %d attempts failed." % (i+1))
OSError: Error while communicating with the cluster. More than 1 attempts failed.
```
My input.py (note that `my_hpc.hostname` is commented out, which is why the command becomes `ssh None`):
```python
#my_hpc = sscha.Cluster.Cluster(mpi_cmd=r"srun -n 40",AlreadyInCluster=True)
my_hpc = sscha.Cluster.Cluster(mpi_cmd=r"srun -n 40")
#my_hpc.hostname = "login1"
my_hpc.workdir = "/public/home/yan/qe/H3S/run"
my_hpc.binary = "/public/home/apps/qe/qe-7.3.1/bin/pw.x -npool NPOOL -i PREFIX.pwi > PREFIX.pwo"
#Then we need to specify if some modules must be loaded in the submission script
my_hpc.load_modules = """##!/bin/bash
#SBATCH --job-name=sscha
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=14-00:00:00
source /public/env/intel2021
source /public/env/openmpi-4.1.5_icc
"""
my_hpc.n_cpu = 40 # We will use 40 processors
my_hpc.n_nodes = 1 #In 1 node
my_hpc.n_pool = 4 # This is an espresso specific tool, the parallel CPU are divided in 4 pools
#We can also choose how many batches of jobs to submit simultaneously, and how many configurations go in each job
my_hpc.batch_size = 4
my_hpc.job_number = 8
#In this way we submit 4 jobs, each one with 8 configurations (overall 32 configurations at a time)
my_hpc.set_timeout(300) # We give 300 seconds of timeout
my_hpc.time = "00:20:00" # Time limit for each job
my_hpc.setup_workdir()
```
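For reference, the behavior I would hope for from a local mode is roughly the following: build the submission script on the node and hand it to `sbatch` directly, with no ssh hop in between. This is only an illustrative sketch with a hypothetical script name, not SSCHA code:

```python
import subprocess

# Illustrative only: submit a job script to Slurm from the node itself,
# without any ssh call. "submit_espresso.sh" is a hypothetical file name.
result = subprocess.run(
    ["sbatch", "submit_espresso.sh"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # Slurm prints e.g. "Submitted batch job <id>"
```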