Error when running Dynamic_train.py: Exception in thread Thread-2:parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]: #15

Open
Senwei-Huang opened this issue Apr 27, 2022 · 5 comments


Senwei-Huang commented Apr 27, 2022

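Hi, training with QuadrupedalRobots/ETGRL/train.py works without any problem, but when I run Dynamic_train.py I get the three errors below. I checked, and ./model/Dynamic_parallel_model.py does exist relative to the directory containing Dynamic_train.py. What could be causing this?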

Exception in thread Thread-2:
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:

[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'

parl.remote.exceptions.FutureFunctionError: There is an error raised when calling the future function __init__.

Full error output

[04-26 20:29:14 MainThread @Dynamic_train.py:71] args:Namespace(K=20, alg='ga', eval=0, gamma=1, load='', outdir='Dynamic', sigma=0.1, steps=10000, suffix='exp0', thread=2, xparl='192.168.30.145:8037')
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 92, in _run_object_in_backend
    raise e
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 82, in _run_object_in_backend
    self._xparl_remote_wrapper_obj = remote_wrapper(
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_wrapper.py", line 107, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
traceback:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/job.py", line 297, in wait_for_connection
    cls = load_remote_class(message[1])
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_class_serialization.py", line 207, in load_remote_class
    with open(file_name + '.py') as t_file:
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 92, in _run_object_in_backend
    raise e
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 82, in _run_object_in_backend
    self._xparl_remote_wrapper_obj = remote_wrapper(
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_wrapper.py", line 107, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
traceback:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/job.py", line 297, in wait_for_connection
    cls = load_remote_class(message[1])
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_class_serialization.py", line 207, in load_remote_class
    with open(file_name + '.py') as t_file:
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'

Traceback (most recent call last):
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/Dynamic_train.py", line 74, in <module>
    main()
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/Dynamic_train.py", line 72, in main
    model.train(args.steps)
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/model/Dynamic_parallel_model.py", line 159, in train
    mean_re = self.update(epoch)
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/model/Dynamic_parallel_model.py", line 128, in update
    future_objects.append(self.agent_list[i].batch_sample_episodes(param=solutions[i*self.K:(i+1)*self.K,:],K = self.K))
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 144, in __getattr__
    raise self._xparl_remote_object_exception
parl.remote.exceptions.FutureFunctionError: There is an error raised when calling the future function __init__.
You can see the detailed error message above, which is printed by another thread.

Process finished with exit code 1

Environment

Ubuntu 18.04
python 3.8
parl = 1.4.0
torch = 1.7.0
rlschool = 1.0.2


Senwei-Huang changed the title from "Error when running Dynamic_train.py: Exception in thread Thread-2" to "Error when running Dynamic_train.py: Exception in thread Thread-2:parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:" on Apr 27, 2022
Senwei-Huang (Author)

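I found two similar issues under PaddlePaddle/PARL:
The first one (2020-07-13):
The main problem there was that the running code depended on code inside a subfolder, while xparl by default only distributes the .py files in the current folder. This happens because PARL/xparl is designed for multi-machine parallelism: it has to distribute the code in the current working directory to the other machines (and uses the same logic on a single machine). That issue also suggested a workaround for running in parallel on a single machine: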

export PYTHONPATH=./:$PYTHONPATH
xparl start --port XXXX 
python main.py

However, this did not work for me.
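The distribution rule described in the first issue can be modelled in a few lines of plain Python. This is a sketch, not PARL's code: `distribute_top_level_py` and `simulate_remote_job` are hypothetical names, and it assumes that the remote job resolves relative paths against its own directory holding only the shipped files, which is consistent with the `FileNotFoundError` for `./model/Dynamic_parallel_model.py` in the log above.

```python
import os
import shutil
import tempfile
from pathlib import Path


def distribute_top_level_py(src_dir, dst_dir):
    """Copy only the .py files that sit directly in src_dir, skipping subfolders."""
    copied = []
    for path in sorted(Path(src_dir).glob("*.py")):  # non-recursive: model/ is skipped
        shutil.copy(path, dst_dir)
        copied.append(path.name)
    return copied


def simulate_remote_job():
    """Return (files shipped to the job dir, whether ./model/... is visible there)."""
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as job_dir:
        # Hypothetical layout mirroring ETGRL: an entry script plus a model/ package.
        Path(project, "Dynamic_train.py").write_text("# entry script\n")
        Path(project, "model").mkdir()
        Path(project, "model", "__init__.py").write_text("")
        Path(project, "model", "Dynamic_parallel_model.py").write_text("# remote class\n")

        shipped = distribute_top_level_py(project, job_dir)

        # Assumption: the job opens relative paths from its own directory,
        # which contains only the shipped top-level files.
        old_cwd = os.getcwd()
        os.chdir(job_dir)
        try:
            with open("./model/Dynamic_parallel_model.py"):
                found = True
        except FileNotFoundError:
            found = False
        finally:
            os.chdir(old_cwd)  # restore before the temp dirs are deleted
        return shipped, found


if __name__ == "__main__":
    print(simulate_remote_job())  # (['Dynamic_train.py'], False)
```

Under these assumptions the file exists on the local machine but not in the job's directory, which matches the symptom reported at the top of this issue.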

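The second one (2020-08-07):
XPARL automatically distributes all files in the current folder, but not its subfolders. XPARL cannot distribute every subfolder, because users may have complex directory structures, sometimes even containing large files. To distribute the required files explicitly, try the following API:
Note that you must pass files, not folders.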
Method 1:
Change
parl.connect(xparl_addr)
to
parl.connect(xparl_addr, distributed_files=['./model/Dynamic_parallel_model.py','./alg/es.py','./model/__init__.py'])

This did not work either.
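Because `distributed_files` takes individual files rather than folders, a subfolder such as `./model` has to be spelled out file by file, as in Method 1. A small helper can build that list from folder names. This is a sketch only: `expand_to_files` is a hypothetical helper, not part of PARL, and it just produces the kind of list Method 1 passes by hand; it does not claim to fix the error above.

```python
import os


def expand_to_files(paths, suffix=".py"):
    """Replace every folder in `paths` with the `suffix` files found under it.

    Hypothetical helper, not part of PARL. Plain file paths are kept as-is;
    a missing path raises FileNotFoundError locally, before anything is sent.
    """
    files = []
    for p in paths:
        if os.path.isdir(p):
            for root, dirs, names in os.walk(p):
                dirs.sort()  # visit subfolders in a stable order
                for name in sorted(names):
                    if name.endswith(suffix):  # .pyc files do not match ".py"
                        files.append(os.path.join(root, name))
        elif os.path.isfile(p):
            files.append(p)
        else:
            raise FileNotFoundError(p)
    return files
```

With the ETGRL layout, `expand_to_files(['./model', './alg/es.py'])` would yield a list of the same form as the `distributed_files` argument in Method 1, without having to keep it in sync by hand when files are added to `model/`.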

Method 2:

git clone -b xparl_submod https://github.com/PaddlePaddle/PARL/
cd PARL
pip install .

This method could not be run at all.
It failed with:

Cloning into 'PARL'...
fatal: Remote branch xparl_submod not found in upstream origin

Hi, how did you get this to run back then, and why does it fail for me? Could you give me some suggestions for fixing it? Thanks. @xueeinstein @TomorrowIsAnOtherDay

TomorrowIsAnOtherDay (Contributor) commented Apr 27, 2022

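> When I run Dynamic_train.py I get the three errors below
I looked at Dynamic_train.py and it contains no parallel code, so I'm curious where this parallel error comes from. Did you convert it into a parallel version yourself?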

Senwei-Huang (Author)

No, I didn't modify anything. This line in Dynamic_train.py:

model = ES_ParallelModel(mean_dict=MEAN_DICT,gait=GAIT_LIST,K=args.K,thread = args.thread,sigma=args.sigma,
                             dynamic_param=dynamic_param,outdir=outdir,alg=args.alg,xparl_addr = args.xparl)

calls the ES_ParallelModel class in ./model/Dynamic_parallel_model.py, and Dynamic_parallel_model.py does contain parallel code, namely these lines:

@parl.remote_class(wait=False)
class RemoteESAgent(object):
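The `wait=False` decorator seems to account for the shape of the log: each agent's `__init__` fails in a background thread (Thread-4 and Thread-5 above), and the main thread only receives a `FutureFunctionError` later, when `update()` accesses `batch_sample_episodes`. The toy model below sketches that deferred-failure pattern in plain Python under this reading of the traceback. It is not PARL's implementation; `ToyNoWaitProxy`, `ToyFutureError` and `Agent` are hypothetical names, and unlike PARL the toy stores the error silently instead of also printing it from the worker thread.

```python
import threading


class ToyFutureError(Exception):
    """Stand-in for PARL's FutureFunctionError in this toy model."""


class ToyNoWaitProxy:
    """Build cls(*args, **kwargs) in a background thread and report failures lazily."""

    def __init__(self, cls, *args, **kwargs):
        self._obj = None
        self._exc = None
        self._thread = threading.Thread(target=self._build, args=(cls, args, kwargs))
        self._thread.start()

    def _build(self, cls, args, kwargs):
        try:
            self._obj = cls(*args, **kwargs)
        except Exception as exc:  # keep the error for the caller instead of losing it
            self._exc = exc

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private names
        self._thread.join()  # wait for construction to finish
        if self._exc is not None:
            raise ToyFutureError(
                "error raised when calling the future function __init__"
            ) from self._exc
        return getattr(self._obj, name)


class Agent:
    """Hypothetical agent whose constructor opens a file, as load_remote_class does."""

    def __init__(self, path):
        with open(path):
            pass

    def batch_sample_episodes(self):
        return "sampled"


if __name__ == "__main__":
    agent = ToyNoWaitProxy(Agent, "./no_such_dir/Dynamic_parallel_model.py")
    try:
        agent.batch_sample_episodes()  # the failure surfaces only here
    except ToyFutureError as err:
        print(type(err.__cause__).__name__)  # FileNotFoundError
```

In this model the constructor call returns immediately, so the real cause (the missing file) has to be read from the chained exception, much as the PARL message points back to the error "printed by another thread".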


TomorrowIsAnOtherDay (Contributor)

Sorry, I have a lot of meetings today. We will look into this as soon as possible.

Senwei-Huang (Author)

No rush, take your time.
