Closed
Description
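I am trying to use Hydro to build an OJ platform for a parallel computing course, with go-judge as the backend judge server. However, I cannot launch MPI programs with mpirun -np 8 foo.
I tried sending a POST request directly to localhost:5050/run to launch an MPI program and got the following output: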
[
{
"status": "Nonzero Exit Status",
"exitStatus": 1,
"time": 25031000,
"memory": 7532544,
"runTime": 27193668,
"files": {
"stderr": "[executor_server:00026] opal_ifinit: unable to find network interfaces.\n--------------------------------------------------------------------------\nmpirun has detected an attempt to run as root.\n\nRunning as root is *strongly* discouraged as any mistake (e.g., in\ndefining TMPDIR) or bug can result in catastrophic damage to the OS\nfile system, leaving your system in an unusable state.\n\nWe strongly suggest that you run mpirun as a non-root user.\n\nYou can override this protection by adding the --allow-run-as-root option\nto the cmd line or by setting two environment variables in the following way:\nthe variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to override this\nprotection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm the choice and\nadd one more layer of certainty that you want to do so.\nWe reiterate our advice against doing so - please proceed at your own risk.\n--------------------------------------------------------------------------\n",
"stdout": ""
}
}
]
For easier reading, the stderr field is expanded below:
[executor_server:00026] opal_ifinit: unable to find network interfaces.
--------------------------------------------------------------------------
mpirun has detected an attempt to run as root.
Running as root is *strongly* discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.
We strongly suggest that you run mpirun as a non-root user.
You can override this protection by adding the --allow-run-as-root option
to the cmd line or by setting two environment variables in the following way:
the variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to override this
protection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm the choice and
add one more layer of certainty that you want to do so.
We reiterate our advice against doing so - please proceed at your own risk.
--------------------------------------------------------------------------
Following the advice in this error message, I added the --allow-run-as-root option to the command that launches MPI. However, a new problem appeared:
[executor_server:00027] opal_ifinit: unable to find network interfaces.
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
[executor_server:00027] *** Process received signal ***
[executor_server:00027] Signal: Segmentation fault (11)
[executor_server:00027] Signal code: Address not mapped (1)
[executor_server:00027] Failing at address: 0x20
[executor_server:00027] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45320)[0x7be55b445320]
[executor_server:00027] [ 1] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_home_directory+0x28)[0x7be5582e4ac8]
[executor_server:00027] [ 2] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_mca_base_var_cache_files+0x32)[0x7be5582e0d52]
[executor_server:00027] [ 3] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_mca_base_var_init+0x247)[0x7be5582e16a7]
[executor_server:00027] [ 4] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_init_util+0x89)[0x7be5582af609]
[executor_server:00027] [ 5] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x77)[0x7be5582afc27]
[executor_server:00027] [ 6] /lib/x86_64-linux-gnu/libpmix.so.2(PMIx_server_init+0x30e)[0x7be558274d3e]
[executor_server:00027] [ 7] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so(ext3x_server_init+0x251)[0x7be55ad97731]
[executor_server:00027] [ 8] /lib/x86_64-linux-gnu/libopen-rte.so.40(pmix_server_init+0x31f)[0x7be55b81d44f]
[executor_server:00027] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so(+0x3f18)[0x7be55b2c1f18]
[executor_server:00027] [10] /lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x2aa)[0x7be55b86d15a]
[executor_server:00027] [11] /lib/x86_64-linux-gnu/libopen-rte.so.40(orte_submit_init+0x911)[0x7be55b8170e1]
[executor_server:00027] [12] /usr/bin/mpirun(+0x11e8)[0x5c6e2379c1e8]
[executor_server:00027] [13] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7be55b42a1ca]
[executor_server:00027] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7be55b42a28b]
[executor_server:00027] [15] /usr/bin/mpirun(+0x1415)[0x5c6e2379c415]
[executor_server:00027] *** End of error message ***
So my questions are:
- Can I launch MPI programs on go-judge?
- If so, how should I launch them?
- Finally, can I launch MPI programs (or other kinds of programs) on go-judge without running as root?
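One possible reading of the backtrace above, offered as a guess and not a confirmed diagnosis: the crash happens inside pmix_home_directory at address 0x20, and 0x20 is also the offset of the pw_dir field in glibc's struct passwd on 64-bit Linux. That would be consistent with a home-directory lookup (for example via getpwuid) returning NULL inside the sandbox and the code then reading pw_dir from it. The sketch below only checks the offset arithmetic. The Passwd class is a hypothetical mirror of the field order documented in getpwnam(3), not something taken from PMIx.

```python
import ctypes


class Passwd(ctypes.Structure):
    """Mirror of glibc's struct passwd field order (see getpwnam(3)).

    Assumption: 64-bit Linux, where pointers are 8 bytes and
    uid_t/gid_t are 4-byte unsigned integers.
    """
    _fields_ = [
        ("pw_name", ctypes.c_char_p),
        ("pw_passwd", ctypes.c_char_p),
        ("pw_uid", ctypes.c_uint),
        ("pw_gid", ctypes.c_uint),
        ("pw_gecos", ctypes.c_char_p),
        ("pw_dir", ctypes.c_char_p),
        ("pw_shell", ctypes.c_char_p),
    ]


# Offset of pw_dir from the start of the struct. Reading this field
# through a NULL pointer would fault at exactly this address.
print(hex(Passwd.pw_dir.offset))  # 0x20 on a 64-bit build
```

If this reading is right, the segmentation fault is about a missing home directory or passwd entry in the sandbox, not about MPI itself.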
My CPU and operating system are:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu Noble Numbat (development branch)"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
80 Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz
I did not install go-judge into the system directories; I start go-judge as follows:
$ nohup sudo ./go-judge/usr/bin/go-judge &
[1] 1104542
$ nohup: ignoring input and appending output to 'nohup.out'
Finally, I use Python scripts to send POST requests to go-judge. The scripts are:
# compile.py
import requests
import json
url = 'http://localhost:5050/run'
data = {
    'cmd': [
        {
            'args': ['/usr/bin/mpic++', 'test.cpp', '-o', 'test'],
            'env': ['PATH=/usr/bin:/bin'],
            'files': [
                {'content': ''},
                {'name': 'stdout', 'max': 10240},
                {'name': 'stderr', 'max': 10240}
            ],
            'cpuLimit': 10000000000,
            'memoryLimit': 104857600,
            'procLimit': 50,
            'copyIn': {
                'test.cpp': {
                    'content':
'''
#include "mpi.h"
int main(int argc, char ** argv) {
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
}
'''
                }
            },
            'copyOut': ['stdout', 'stderr'],
            'copyOutCached': ['test']
        }
    ]
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36 MicroMessenger/7.0.9.501 NetType/WIFI MiniProgramEnv/Windows WindowsWechat",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Host": "lazytools.feidee.cn"
}
response = requests.post(url, data=json.dumps(data), headers=headers)
j = json.loads(response.text)
print(json.dumps(j, indent=4))

# run.py
import requests
import json
import argparse
parser = argparse.ArgumentParser(description='Delete a file')
parser.add_argument('--id', type=str, help='The file ID to delete')
args = parser.parse_args()
url = 'http://localhost:5050/run'
data = {
    'cmd': [
        {
            'args': ['mpirun', '-np', '8', '--allow-run-as-root', 'test'],
            'env': ['PATH=/usr/bin:/bin'],
            'files': [
                {'content': '1 1'},
                {'name': 'stdout', 'max': 10240},
                {'name': 'stderr', 'max': 10240}
            ],
            'cpuLimit': 10000000000,
            'memoryLimit': 104857600,
            'procLimit': 50,
            'copyIn': {
                'test': {'fileId': f'{args.id}'}
            }
        }
    ]
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36 MicroMessenger/7.0.9.501 NetType/WIFI MiniProgramEnv/Windows WindowsWechat",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Host": "lazytools.feidee.cn"
}
response = requests.post(url, data=json.dumps(data), headers=headers)
j = json.loads(response.text)
print(json.dumps(j, indent=4))

The commands I ran and their output:
$ python3 compile.py
[
{
"status": "Accepted",
"exitStatus": 0,
"time": 822298000,
"memory": 62218240,
"runTime": 824558182,
"files": {
"stderr": "[executor_server:00012] opal_ifinit: unable to find network interfaces.\n",
"stdout": ""
},
"fileIds": {
"test": "7IJSAUUO"
}
}
]
$ python3 run.py --id 7IJSAUUO
[
{
"status": "Signalled",
"exitStatus": 11,
"time": 58446000,
"memory": 7520256,
"runTime": 253555407,
"files": {
"stderr": "[executor_server:00018] opal_ifinit: unable to find network interfaces.\nhwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.\n[executor_server:00018] *** Process received signal ***\n[executor_server:00018] Signal: Segmentation fault (11)\n[executor_server:00018] Signal code: Address not mapped (1)\n[executor_server:00018] Failing at address: 0x20\n[executor_server:00018] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45320)[0x7400e8845320]\n[executor_server:00018] [ 1] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_home_directory+0x28)[0x7400e56e4ac8]\n[executor_server:00018] [ 2] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_mca_base_var_cache_files+0x32)[0x7400e56e0d52]\n[executor_server:00018] [ 3] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_mca_base_var_init+0x247)[0x7400e56e16a7]\n[executor_server:00018] [ 4] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_init_util+0x89)[0x7400e56af609]\n[executor_server:00018] [ 5] /lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x77)[0x7400e56afc27]\n[executor_server:00018] [ 6] /lib/x86_64-linux-gnu/libpmix.so.2(PMIx_server_init+0x30e)[0x7400e5674d3e]\n[executor_server:00018] [ 7] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so(ext3x_server_init+0x251)[0x7400e848e731]\n[executor_server:00018] [ 8] /lib/x86_64-linux-gnu/libopen-rte.so.40(pmix_server_init+0x31f)[0x7400e8d0944f]\n[executor_server:00018] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so(+0x3f18)[0x7400e87aaf18]\n[executor_server:00018] [10] /lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x2aa)[0x7400e8d5915a]\n[executor_server:00018] [11] /lib/x86_64-linux-gnu/libopen-rte.so.40(orte_submit_init+0x911)[0x7400e8d030e1]\n[executor_server:00018] [12] /usr/bin/mpirun(+0x11e8)[0x63b8bae9b1e8]\n[executor_server:00018] [13] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7400e882a1ca]\n[executor_server:00018] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7400e882a28b]\n[executor_server:00018] [15] /usr/bin/mpirun(+0x1415)[0x63b8bae9b415]\n[executor_server:00018] *** End of error message ***\n",
"stdout": ""
}
}
]
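As a sketch of something that could be tried next, the request below sets the two environment variables named in the mpirun message instead of passing --allow-run-as-root, and also sets HOME. The HOME entry is an assumption: it is based on the guess that the crash in pmix_home_directory comes from a failed home-directory lookup, and the value /tmp is only a placeholder for a directory that exists inside the sandbox. The helper name build_mpi_run_cmd is made up for this example. This has not been verified against go-judge, and it does not address the hwloc sysfs warning.

```python
import json


def build_mpi_run_cmd(file_id, np=8, home="/tmp"):
    """Build one command entry for go-judge's /run endpoint.

    The fields mirror run.py from this report; only `args` and `env` differ.
    """
    return {
        "args": ["mpirun", "-np", str(np), "test"],
        "env": [
            "PATH=/usr/bin:/bin",
            # Named in the mpirun error message as the alternative
            # to the --allow-run-as-root option.
            "OMPI_ALLOW_RUN_AS_ROOT=1",
            "OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1",
            # Assumption: giving the process a HOME avoids the lookup
            # that appears to crash in pmix_home_directory.
            f"HOME={home}",
        ],
        "files": [
            {"content": "1 1"},
            {"name": "stdout", "max": 10240},
            {"name": "stderr", "max": 10240},
        ],
        "cpuLimit": 10000000000,
        "memoryLimit": 104857600,
        "procLimit": 50,
        "copyIn": {"test": {"fileId": file_id}},
    }


# Print the payload; send it with requests.post(url, json=payload) as in run.py.
payload = {"cmd": [build_mpi_run_cmd("7IJSAUUO")]}
print(json.dumps(payload, indent=4))
```

The limits are copied unchanged from run.py. Whether 50 processes and 100 MiB are enough for 8 MPI ranks is a separate question that this sketch does not answer.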