Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize imports #4248

Merged
merged 13 commits into from Aug 19, 2019
Merged

Optimize imports #4248

merged 13 commits into from Aug 19, 2019

Conversation

federicotdn
Copy link
Contributor

@federicotdn federicotdn commented Aug 14, 2019

Proposed changes:
Fixes #3010.

The idea of this PR is to make the rasa module load faster inside Python. This is somewhat related to the time it takes to run rasa on the command line, but I consider that to be a separate, more complex, issue.

If we improve the load time of the rasa module, then people using our module directly will notice a slight speedup on their applications' start-up time, assuming they are not loading all sub-modules explicitly. If they are, the start-up time will not be affected.

To measure import time, first I considered just measuring the time it took to run python -c 'import rasa'. However, this wouldn't have been too precise since that would also measure the start-up time for the Python interpreter and built-in modules. A much better option was to use a new feature added in Python 3.7, importtime. With python -X importtime rasa, one can get a detailed report of how much time it took for the rasa module to be imported, and all of its sub-modules as well. I wrote a short script which runs the profiling command N times, selects the relevant parts of the output, and averages the results. It can be used with any number N and any Python module.

To find which modules it was most convenient to look at, I used tuna. Tuna will take the output of python -X importtime rasa and then display all imported modules and their import times on a flamegraph:

Screenshot 2019-08-14 at 17 03 12

(import times for master branch)

To use tuna, I ran the following commands:

$ cd /to/the/rasa/repo
$ pip install tuna
$ python -X importtime -c 'import rasa' 2> imports.txt
$ tuna imports.txt

The changes I ended up applying to the base master branch were:

  • In nlu/test.py, don't load CRFEntityExtractor early.
  • In constants.py, remove the unused variable FALLBACK_CONFIG_PATH and then remove the pkg_resources import, which represented about 28% of the total loading time for the rasa module.
  • In cli/utils.py, avoid loading the questionary library (unless type checking).
  • In training_data/loading.py, avoid loading the requests library until it's used.

After applying these four changes, the total loading time for the rasa module went from an average of 0.545 seconds to 0.260 seconds (N=15). This represents approximately a 52.3% decrease in loading time. The absolute values in seconds may vary between hardware configurations and operating systems.

This PR also removes the rasa attribute from the rasa module, which was pointing at itself (Which allowed doing rasa.rasa.rasa.rasa.rasa...).

Status (please check what you already did):

  • made PR ready for code review
  • added some tests for the functionality
  • updated the documentation
  • updated the changelog
  • reformat files using black (please check Readme for instructions)

@federicotdn federicotdn marked this pull request as ready for review August 16, 2019 11:49
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great detectives work! Do you also have a screenshot of the new import times which we can add to the issue / pr? And can we create an issue to do the import optimization for the CLI as well?

rasa/cli/utils.py Outdated Show resolved Hide resolved
tests/import_time.py Outdated Show resolved Hide resolved
tests/import_time.py Outdated Show resolved Hide resolved
tests/import_time.py Outdated Show resolved Hide resolved
tests/import_time.py Outdated Show resolved Hide resolved
rasa/__init__.py Show resolved Hide resolved
@federicotdn
Copy link
Contributor Author

Here's the new flamegraph for the updated imports:

Screenshot 2019-08-19 at 12 14 06

@federicotdn federicotdn merged commit 9330ae8 into master Aug 19, 2019
@federicotdn federicotdn deleted the optimize-imports branch August 19, 2019 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Profile imports
2 participants