[Speed]speed up python executor in fluid #8729

Closed
jacquesqiao opened this issue Mar 5, 2018 · 3 comments
jacquesqiao commented Mar 5, 2018

Background

problem

In our Python executor, every `executor.run` clones the program and then appends feed and fetch ops to the cloned program. The profile below shows that `Program.clone` is very time-consuming. I added a simple cache to avoid the program clone, and the result is very impressive.

solution

Avoid the repeated clone: cache the cloned program (with feed and fetch ops already added) and reuse it across `executor.run` calls.
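The caching idea can be sketched as below. This is a hypothetical stand-in, not the real Fluid API: `Program`, its `ops` list, and `get_cached_program` are illustrative names, and the real executor would key and splice differently.

```python
# Hypothetical sketch of the program cache described above.
# Program and its ops are stand-ins modeled on Fluid, not the real API.

class Program:
    """Minimal stand-in for fluid.framework.Program."""
    def __init__(self, ops=None):
        self.ops = list(ops or [])

    def clone(self):
        # Copies every op -- this is the expensive step the cache avoids.
        return Program(self.ops)

_program_cache = {}

def get_cached_program(program, feed_names, fetch_names):
    """Clone and splice in feed/fetch ops only once per
    (program, feeds, fetches) combination; reuse the clone afterwards."""
    key = (id(program), tuple(feed_names), tuple(fetch_names))
    cached = _program_cache.get(key)
    if cached is None:
        cached = program.clone()
        cached.ops = (["feed:" + n for n in feed_names]
                      + cached.ops
                      + ["fetch:" + n for n in fetch_names])
        _program_cache[key] = cached
    return cached
```

With this, a second `run` with the same feed/fetch targets returns the already-patched clone and skips `Program.clone` entirely, which is where the speedup in the tables below comes from.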

Experiment

profile script

#8674
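The actual profile script is in #8674; the numbers it reports have the shape of the sketch below, where `run_one_batch` is a hypothetical stand-in for a single `exe.run(...)` call.

```python
import time

def profile(run_one_batch, batch_num):
    """Total wall-clock seconds for batch_num executor runs,
    matching the before(s)/after(s) columns in the tables."""
    start = time.time()
    for _ in range(batch_num):
        run_one_batch()
    return time.time() - start
```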

condition

  • one card without parallel_do
| batch_num | before(s) | after(s) | after/before | before/after |
|---|---|---|---|---|
| 10 | 9.91367912292 | 7.47897624969 | 0.7544 | 1.3255395915090542 |
| 20 | 19.6153604984 | 14.6058752537 | 0.744 | 1.3429774085898059 |
| 30 | 28.9721696377 | 21.7348105907 | 0.7501 | 1.332984684490269 |
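The two ratio columns in the table are simply after/before and its inverse, e.g. for batch_num = 10:

```python
# Recomputing the ratio columns from the measured times above.
before, after = 9.91367912292, 7.47897624969
print(round(after / before, 4))  # 0.7544 -- caching cuts ~25% of the time
print(round(before / after, 4))  # 1.3255 -- i.e. ~1.33x faster
```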

timeline

before optimize

(timeline screenshot)

after optimize

(timeline screenshot)

@jacquesqiao jacquesqiao self-assigned this Mar 5, 2018

jacquesqiao commented Mar 5, 2018

Related issue

@dzhwinter dzhwinter changed the title speed up python executor in fluid [framework]speed up python executor in fluid Mar 6, 2018
@dzhwinter dzhwinter changed the title [framework]speed up python executor in fluid [Framework]speed up python executor in fluid Mar 6, 2018
@dzhwinter dzhwinter changed the title [Framework]speed up python executor in fluid [Speed]speed up python executor in fluid Mar 6, 2018

jacquesqiao commented Mar 6, 2018

profile with parallel_do

one card with parallel_do

| batch_num | before(s) | after(s) | parallel_do & no cache | parallel_do & cache |
|---|---|---|---|---|
| 5 | 4.7715549469 | 3.86894655228 | 9.00337505341 | 7.58821201324 |
| 10 | 9.91367912292 | 7.47897624969 | 16.4335706234 | 13.0987007618 |
| 20 | 22.2372214794 | 16.6557092667 | 30.1410288811 | 23.9675991535 |
| 30 | 28.9721696377 | 21.7348105907 | | |

conclusion

parallel_do adds about 40% performance overhead even with the program cache

without parallel_do

(timeline screenshot)

with parallel_do

(timeline screenshot)

@wangkuiyi

The existence of the Feed and Fetch operators is a design mistake:

  1. The Feed operator copies data from Python variables into Fluid variables. This lets users write Python code to load and augment data before sending it to the Fluid program. However, since Fluid was designed as a new programming language, there should be no bridge from Python variables to Fluid variables.

     We are implementing data loading and augmentation operators. Once they are complete, we can remove the Feed operator, so there will no longer be any need to clone the ProgramDesc to add Feed operators every time we call Executor::Run.

  2. The Fetch operator copies variables from Fluid back to Python. By the same argument, we should not have a Fetch operator at all. Let us polish the Print operator so that it prints data in the format required by VisualDL; then we can do performance/accuracy analysis in VisualDL rather than in Python.
