A minor issue when running on cluster with SLURM submit job #11

Closed
ctminh opened this issue Jan 8, 2021 · 3 comments

ctminh commented Jan 8, 2021

Hi all, I tried to run the queue_test example on a Linux cluster, submitted as a SLURM job. Access to the local_queue (i.e., push, get) works, but put/get on the global_queue (the hcl queue) hangs. Do you know what the problem might be, or did I miss something in the configuration?
I ran the test in .../hcl/blob/dev/test/queue_test.cpp
The job was submitted on 4 nodes, with 1 rank per node.

@ChristopherHogan
Collaborator

Hi. There is some documentation for running on a cluster on our wiki. The important part is to correctly populate the hostfile in the test directory.

@ChristopherHogan
Collaborator

Also, you need the hostnames or addresses of the server processes in the server_list file. More details can be found under SERVER_LIST_PATH in the Structure Initialization section of the README.
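For a SLURM job, one way to populate such a file is to expand the job's node list and write the result out. The following is only a rough sketch under stated assumptions: it assumes the server_list file takes one hostname per line (the README is the authority on the actual format), and the function names and the `server_list` output path are hypothetical, not part of HCL.

```python
import os
import subprocess
from pathlib import Path


def hostnames_from_slurm(nodelist=None):
    """Expand a compact SLURM node list (e.g. "node[01-04]") into hostnames."""
    nodelist = nodelist or os.environ["SLURM_JOB_NODELIST"]
    # `scontrol show hostnames` prints one expanded hostname per line.
    out = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def write_server_list(hostnames, path):
    """Write one hostname per line (ASSUMED file format), dropping duplicates."""
    unique = list(dict.fromkeys(hostnames))  # de-duplicate, keep first-seen order
    Path(path).write_text("".join(f"{host}\n" for host in unique))
    return unique


if __name__ == "__main__":
    # "server_list" is a hypothetical path; use whatever SERVER_LIST_PATH points to.
    hosts = write_server_list(hostnames_from_slurm(), "server_list")
    print(f"wrote {len(hosts)} server(s)")
```

Run inside the job allocation (for example from the batch script, before launching the test), this would list every allocated node. If only some nodes host server processes, the list would need to be filtered first.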

ctminh commented Jan 11, 2021

Thanks, Christopher! The problem was indeed in the server_list file. I just fixed it and it works now.
