
[Auto-Schedule][Fix] Fix hang while tune model through rpc #9032

Merged
merged 4 commits into apache:main from echuraev/fix_rpc_hangs on Sep 23, 2021

Conversation

echuraev
Contributor

No description provided.

@echuraev
Contributor Author

@shingjan please take a look at this PR. After #8492 tuning through RPC no longer works properly: it hangs. The problem is in resource management and the global dictionary with arguments: the global arguments don't release the resources they hold, and the RPC session hangs until they are released.
Could you clarify why you added args as an argument to _timed_eval_func and _rpc_run?
With this fix, the hang during tuning is gone.
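
To illustrate the kind of leak I mean, here is a generic Python sketch under my own assumptions (GLOBAL_ARGS and RemoteArray are made-up names, not the actual auto_scheduler code):

# Generic illustration of the reference-holding problem described above.
# GLOBAL_ARGS and RemoteArray are hypothetical names, not TVM code.
GLOBAL_ARGS = {}


class RemoteArray:
    """Stand-in for a device ndarray whose lifetime keeps an RPC endpoint open."""

    def __init__(self, endpoint):
        self.endpoint = endpoint


def run_measurement(endpoint):
    # The array is stored in a module-level dict, so the reference outlives
    # this call; nothing ever drops GLOBAL_ARGS["x"], so `endpoint` (and the
    # RPC session behind it) is never released.
    GLOBAL_ARGS["x"] = RemoteArray(endpoint)
    # ... run the kernel on the remote device ...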

@vinx13 vinx13 self-assigned this Sep 16, 2021
@vinx13
Member

vinx13 commented Sep 16, 2021

args was added because some workloads, such as sparse ones, require predefined inputs that shouldn't be randomly generated. These inputs are looked up from a global dictionary that isn't propagated to the popen workers, so we fetch the predefined arguments in the main process and pass them to the workers. Is the hang related to this global dictionary? With this PR I'm afraid the lookup won't work, because it is called from the worker side.

@echuraev
Contributor Author

echuraev commented Sep 16, 2021

> args was added because some workloads, such as sparse ones, require predefined inputs that shouldn't be randomly generated. These inputs are looked up from a global dictionary that isn't propagated to the popen workers, so we fetch the predefined arguments in the main process and pass them to the workers. Is the hang related to this global dictionary? With this PR I'm afraid the lookup won't work, because it is called from the worker side.

Yes, the hang is related to the global dictionary. In that case I have another workaround for the hang: in _rpc_run and _timed_eval_func we can create a copy of the global arguments and work with that copy. It works this way; I have already tested it. I mean something like this:

            assert len(args) == len(build_res.args)
+            import copy
+            loc_args = copy.deepcopy(args)
            # pylint: disable=consider-using-enumerate
-            for idx in range(len(args)):
-                if args[idx] is None:
+            for idx in range(len(loc_args)):
+                if loc_args[idx] is None:
                    build_res_arg = build_res.args[idx]
                    empty_array = ndarray.empty(
                        get_const_tuple(build_res_arg.shape), build_res_arg.dtype, dev
                    )
                    random_fill(empty_array)
-                    args[idx] = empty_array
+                    loc_args[idx] = empty_array
                else:
-                    args[idx] = ndarray.array(args[idx], dev)
+                    loc_args[idx] = ndarray.array(args[idx], dev)

I don't really like this workaround, though, because it requires a deep copy of the arguments. What do you think about it?

@vinx13
Member

vinx13 commented Sep 16, 2021

Sorry, I'm a bit confused. Is it because args is not being properly managed? It should be a Python list with numpy ndarrays as elements; it is pickled, sent to the workers, and then unpickled. Do you know exactly which resource is not freed?

@echuraev
Contributor Author

> Is it because args is not being properly managed?

I think so. When args comes from the global dictionary, some object holds the rpc_endpoint during tuning and we cannot destroy it. This is why the hang happens.

> Do you know exactly which resource is not freed?

I tried to find the exact object which holds the resources and leads to the hang, but didn't find it. Thinking about it now, I suppose it could be the empty ndarrays that were created, but I have to check. Maybe a good solution would be to regenerate the arguments which were not predefined for each run.

@vinx13
Member

vinx13 commented Sep 16, 2021

> But I have to check. Maybe a good solution would be to regenerate the arguments which were not predefined for each run.

I think this is already the case. Currently, on the main process side, we look up the dictionary via prepare_runner_args, which creates a Python list of arguments containing only the predefined args (converted to numpy arrays to make them picklable). For every argument that is not predefined, None is added to the list as a placeholder. This list is then sent to the workers, and each worker replaces None with a randomly generated tvm.ndarray.
Empty arrays created in timed_rpc_run should be destroyed after running the program on the device.
Also, if you can provide steps to reproduce the hang, I can definitely take a look.
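
For reference, a rough sketch of the main-process side of that flow (simplified pseudocode under assumptions; the name prepare_args_sketch and the task_inputs dict are illustrative, not the actual prepare_runner_args implementation):

# Sketch of the placeholder mechanism described above; names are assumptions.
def prepare_args_sketch(task_inputs, arg_names):
    """Main process: keep predefined inputs, use None as a placeholder."""
    args = []
    for name in arg_names:
        if name in task_inputs:
            # Assumed to already be a picklable numpy array, so it can be
            # sent to the popen worker.
            args.append(task_inputs[name])
        else:
            args.append(None)  # filled in with a random ndarray on the worker side
    return args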

@echuraev
Contributor Author

> Empty arrays created in timed_rpc_run should be destroyed after running the program on the device.

Who guarantees that the arrays will be destroyed? We have a global variable which was passed into the function, and inside the function new arrays were created in that global object. After the program has run on the device, these arrays won't be destroyed, because they are part of the global object. Or am I wrong?

I updated this PR with another fix for the problem: I create a local copy of the arguments in _rpc_run and _timed_eval_func. When we leave the function scope, the local objects are destroyed, which guarantees that the arrays are released.

Another workable fix is to set the arguments to None after the function has executed. For example:

            assert len(args) == len(build_res.args)
+            indices = []
            # pylint: disable=consider-using-enumerate
            for idx in range(len(args)):
                if args[idx] is None:
                    build_res_arg = build_res.args[idx]
                    empty_array = ndarray.empty(
                        get_const_tuple(build_res_arg.shape), build_res_arg.dtype, dev
                    )
                    random_fill(empty_array)
+                    indices.append(idx)
                    args[idx] = empty_array
                else:
                    args[idx] = ndarray.array(args[idx], dev)
            dev.sync()

            # First run for check that the kernel is correct
            func.entry_func(*args)
            dev.sync()

            costs = time_f(*args).results

+            for idx in indices:
+                args[idx] = None

In this case, the arrays will also be destroyed.

As for steps to reproduce the issue (a rough sketch of the commands follows the list):

  1. Run an rpc_tracker.
  2. Connect an Android or iOS device to the tracker using the AndroidRPC or ios_rpc applications.
  3. Run a tuning session on this device; after several trials the tuning will hang.
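
A minimal sketch of steps 1 and 3, assuming a tracker on 127.0.0.1:9190 and a device registered under the key "android" (the matmul workload, device key, log file name, and trial count are my own illustrative choices, not taken from this PR):

# Step 1 (shell): start the tracker.
#   python -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190
# Step 2: register the device through the AndroidRPC / ios_rpc app under the key "android".
# Step 3 (Python): tune a toy workload through the tracker.
import tvm
from tvm import te, auto_scheduler


@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]


target = tvm.target.Target("llvm -mtriple=arm64-linux-android")
task = auto_scheduler.SearchTask(func=matmul, args=(512, 512, 512), target=target)

runner = auto_scheduler.RPCRunner(key="android", host="127.0.0.1", port=9190)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=64,
    runner=runner,
    measure_callbacks=[auto_scheduler.RecordToFile("matmul_android.json")],
)
# Before the fix, this could hang after a few trials.
task.tune(tune_option)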

@vinx13
Member

vinx13 commented Sep 17, 2021

I see. If I understand correctly, the issue is that when the popen worker gets a copy of args here https://github.com/apache/tvm/blob/main/python/tvm/exec/popen_worker.py#L77 and calls timed_rpc_run, it creates new ndarrays as elements of args, and when timed_rpc_run finishes it doesn't free these ndarrays. This looks like a Python GC issue? Note that args is not a global dictionary, it is a local copy in each worker, so it seems you don't need to record the indices of the ndarrays (in the alternative fix you provided). Did you try setting args=None, or setting all elements to None (the predefined arrays are also converted to ndarray from numpy arrays locally in each popen worker, so they should also be freed)? Otherwise the alternative fix looks good to me, as it avoids deep copying.

@echuraev
Contributor Author

> Did you try setting args=None, or setting all elements to None (the predefined arrays are also converted to ndarray from numpy arrays locally in each popen worker, so they should also be freed)? Otherwise the alternative fix looks good to me, as it avoids deep copying.

Yes, I tried it and it works fine. I wrote the code with indices only because I wasn't sure whether I could set the predefined args to None. If you are okay with the alternative fix (setting the arguments to None), I'll update this PR tomorrow. Thank you very much.
The only thing I don't really like about this fix is that we need to set the elements to None in both cases: after the function has executed and also when an exception is raised.
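
For illustration, a hedged sketch of keeping that in a single place with try/finally; args, func, time_f, and dev are the names from the snippets above, and this is not the final patch:

# Sketch only: release the arrays whether the run succeeds or raises.
try:
    # First run to check that the kernel is correct.
    func.entry_func(*args)
    dev.sync()
    costs = time_f(*args).results
finally:
    # Drop the references so the ndarrays can be freed even on error.
    for idx in range(len(args)):
        args[idx] = None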

@vinx13
Member

vinx13 commented Sep 17, 2021

Alternatively, we can try creating a local list without a deep copy. I think the issue is that the arguments sent to the popen workers are not freed immediately (not sure why); if so, creating another list should work:

loc_args = []
for idx, arg in enumerate(args):
    if arg is None:
        # create and randomly fill an empty device array, as in the snippets above
        empty_array = ndarray.empty(
            get_const_tuple(build_res.args[idx].shape), build_res.args[idx].dtype, dev
        )
        random_fill(empty_array)
        loc_args.append(empty_array)
    else:
        loc_args.append(ndarray.array(arg, dev))

@echuraev
Contributor Author

@vinx13 sorry for the delay, I was busy over the weekend. I updated the PR and added the fix with local arguments. Could you please take a look once again?

Contributor

@shingjan shingjan left a comment


LGTM. Probably need to re-trigger CI. BTW, is this an issue that PopenPoolExecutor will always have with parameter/argument passing?

@vinx13
Member

vinx13 commented Sep 20, 2021

@shingjan not sure why args sent to the popen worker are not immediately GC'ed.

@shingjan
Contributor

@echuraev after #9052 merges, you can rebase and the CI issue should be fixed.

@echuraev
Contributor Author

> @echuraev after #9052 merges, you can rebase and the CI issue should be fixed.

Thank you! I'll wait until that PR is merged.

@echuraev echuraev force-pushed the echuraev/fix_rpc_hangs branch 2 times, most recently from cd14bbb to 80df07a on September 22, 2021 11:26
Member

@vinx13 vinx13 left a comment


Found some minor change needed

python/tvm/auto_scheduler/measure.py (review comment on this file; outdated, resolved)
@echuraev
Contributor Author

> Found some minor change needed

Sorry, I only now saw that you added a comment to this PR. Thank you for the change.

@vinx13 vinx13 merged commit ee6fe12 into apache:main Sep 23, 2021
@vinx13
Member

vinx13 commented Sep 23, 2021

Thanks @echuraev @shingjan, this PR is merged!

@echuraev echuraev deleted the echuraev/fix_rpc_hangs branch September 24, 2021 10:36
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
* [Auto-Schedule][Fix] Fix hang while tune model through rpc

* Fix problem with hang by using deep copy

* Fix with local args

* Update python/tvm/auto_scheduler/measure.py

Co-authored-by: Wuwei Lin <vincentl13x@gmail.com>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* [Auto-Schedule][Fix] Fix hang while tune model through rpc

* Fix problem with hang by using deep copy

* Fix with local args

* Update python/tvm/auto_scheduler/measure.py

Co-authored-by: Wuwei Lin <vincentl13x@gmail.com>