Replace slow json filename resolver #306

Merged
NathanielRN merged 3 commits into aws:master on Jul 28, 2021

Conversation

NathanielRN
Contributor

@NathanielRN NathanielRN commented Jun 28, 2021

Issue #305

Description of changes:

As an optimization, we found that pkgutil.get_data(package, resource) reads a packaged file noticeably faster than resolving the file's path with resource_filename (from pkg_resources import resource_filename).

Learn more about reading static files in Python.
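As a rough illustration of the pkgutil approach described above, here is a minimal sketch. The package name `demo_sampling_pkg`, the file `sampling_rule.json`, its contents, and the helper `load_packaged_json` are all hypothetical and exist only to make the example runnable; they are not taken from aws-xray-sdk-python.

```python
import importlib
import json
import pkgutil
import sys
import tempfile
from pathlib import Path


def load_packaged_json(package, resource):
    """Read a JSON file that ships inside `package` and parse it."""
    # pkgutil.get_data returns bytes, or None when the package cannot be
    # located or its loader does not support get_data.
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        raise FileNotFoundError(
            "{!r} could not be read from package {!r}".format(resource, package)
        )
    return json.loads(raw.decode("utf-8"))


# Build a throwaway package (hypothetical name and contents) on disk.
_tmp = tempfile.mkdtemp()
_pkg_dir = Path(_tmp) / "demo_sampling_pkg"
_pkg_dir.mkdir()
(_pkg_dir / "__init__.py").write_text("")
(_pkg_dir / "sampling_rule.json").write_text(
    '{"version": 2, "default": {"fixed_target": 1, "rate": 0.05}}'
)
sys.path.insert(0, _tmp)
importlib.invalidate_caches()

rule = load_packaged_json("demo_sampling_pkg", "sampling_rule.json")
print(rule["default"]["rate"])
```

Note that only standard-library modules are imported here, so nothing from pkg_resources has to be loaded at import time.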

How I tested

I used pytest-benchmark to compare the performance of the two:

$ pytest test_local_sampling_benchmark.py
======================================================================================== test session starts =========================================================================================
platform darwin -- Python 3.9.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/enowell/git/aws-xray-sdk-python
plugins: benchmark-3.4.1
collected 2 items

test_local_sampling_benchmark.py ..                                                                                                                                                            [100%]


---------------------------------------------------------------------------------------------- benchmark: 2 tests ---------------------------------------------------------------------------------------------
Name (time in us)                      Min                    Max                Mean              StdDev             Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_pkgutil_static_read           50.6210 (1.0)         828.0920 (1.0)       61.8523 (1.0)       22.4959 (1.0)      54.3770 (1.0)      10.4147 (1.0)       798;992       16.1676 (1.0)       11205           1
test_pkg_resources_static_read     64.9590 (1.28)     18,941.0130 (22.87)    100.6223 (1.63)     558.5882 (24.83)    79.4150 (1.46)     12.1315 (1.16)        2;111        9.9382 (0.61)       1144           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
========================================================================================= 2 passed in 2.38s ==========================================================================================

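which shows that pkgutil is faster: its mean read time is about 1.63x lower (61.85 us vs. 100.62 us), and its standard deviation is roughly 25x smaller.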
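The OPS column and the relative slowdown can be re-derived from the reported means using the legend's formula (OPS = 1 / Mean). The snippet below is only a sanity check on the numbers copied from the table; the helper name `kops_per_second` is made up for this example.

```python
# Mean times in microseconds, copied from the benchmark table above.
MEAN_US = {
    "test_pkgutil_static_read": 61.8523,
    "test_pkg_resources_static_read": 100.6223,
}


def kops_per_second(mean_us):
    """OPS = 1 / Mean (mean converted to seconds), expressed in thousands."""
    return 1.0 / (mean_us * 1e-6) / 1000.0


ops = {name: kops_per_second(mean) for name, mean in MEAN_US.items()}
slowdown = (
    MEAN_US["test_pkg_resources_static_read"]
    / MEAN_US["test_pkgutil_static_read"]
)

for name, value in ops.items():
    print("{}: {:.4f} Kops/s".format(name, value))
print("pkg_resources mean / pkgutil mean = {:.2f}".format(slowdown))
```

Because the table prints rounded means, the recomputed values may differ from the OPS column in the last decimal place.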

Noteworthy points

  • pkgutil is part of the standard library in Python 2 as well, so this change should not affect Python 2 support.
  • I ran an end-to-end test, and traces still make it to AWS X-Ray.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@srprash srprash self-requested a review June 28, 2021 23:43
@NathanielRN NathanielRN reopened this Jun 28, 2021
Contributor

@anuraaga anuraaga left a comment

Nice find

@anuraaga
Contributor

Looks like this repo's CI needs to be migrated to GitHub.

def test_pkgutil_static_read(benchmark):
    def get_sampling_rule():
        json.loads(pkgutil.get_data(__name__, 'mock_sampling_rule.json'))
    benchmark(get_sampling_rule)
Contributor Author

These tests are really testing the performance of standard library functions (pkgutil and pkg_resources) instead of testing aws-xray-sdk-python functions.

For that reason I think we should not include them here? The PR comment can serve as proof that this change is worth it.

Otherwise, we can leave the tests here so the results can be reproduced later.

Contributor

Agree with deleting these benchmarks (first thing that came to mind but keeping them also isn't a big deal)

@NathanielRN NathanielRN merged commit 86248a5 into aws:master Jul 28, 2021
@NathanielRN NathanielRN deleted the remove-slow-filepath-resolve branch July 28, 2021 20:23
archlinux-github pushed a commit to archlinux/svntogit-community that referenced this pull request Aug 20, 2021
Noticed this after reading getmoto/moto#4142

Ref: aws/aws-xray-sdk-python#306

git-svn-id: file:///srv/repos/svn-community/svn@1003952 9fca08f4-af9d-4005-b8df-a31f2cc04f65
@NathanielRN NathanielRN linked an issue Nov 30, 2021 that may be closed by this pull request
mergify bot pushed a commit to aws-powertools/powertools-lambda-python that referenced this pull request Dec 8, 2021
Bumps [aws-xray-sdk](https://github.com/aws/aws-xray-sdk-python) from 2.8.0 to 2.9.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/aws/aws-xray-sdk-python/releases">aws-xray-sdk's releases</a>.</em></p>
<blockquote>
<h2>2.9.0 Release</h2>
<p>See details in <a href="https://github.com/aws/aws-xray-sdk-python/blob/master/CHANGELOG.rst">CHANGELOG</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/aws/aws-xray-sdk-python/blob/master/CHANGELOG.rst">aws-xray-sdk's changelog</a>.</em></p>
<blockquote>
<h1>2.9.0</h1>
<ul>
<li>bugfix: Change logging behavior to avoid overflow. <code>PR302 &lt;https://github.com/aws/aws-xray-sdk-python/pull/302&gt;</code>_.</li>
<li>improvement: Lazy load samplers to speed up cold start in lambda. <code>PR312 &lt;https://github.com/aws/aws-xray-sdk-python/pull/312&gt;</code>_.</li>
<li>improvement: Replace slow json file name resolver. <code>PR 306 &lt;https://github.com/aws/aws-xray-sdk-python/pull/306&gt;</code>_.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/be4dea2f89e44dd52c68b23e1620b8541947e64b"><code>be4dea2</code></a> Release commit for v2.9.0 (<a href="https://github-redirect.dependabot.com/aws/aws-xray-sdk-python/issues/318">#318</a>)</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/9858cab26dd87b26cfbec852449cfcfc9be4d73c"><code>9858cab</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/aws/aws-xray-sdk-python/issues/317">#317</a> from wangzlei/master</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/85d8801d7b00ce3b97ede94e3181db5ebe7ad09d"><code>85d8801</code></a> Remove redundant error log MISSING_SEGMENT_MSG</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/05f5e8f7ab7b17c5f9f80ac01c97712a8737ceed"><code>05f5e8f</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/aws/aws-xray-sdk-python/issues/315">#315</a> from aws/willarmiros-patch-1</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/e1841e668a46b0da7b83e5fdf0bb231cedf2dfb3"><code>e1841e6</code></a> Create CODEOWNERS</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/0e1f935bd2040ee7dbf0625db7f7ad780c66fb37"><code>0e1f935</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/aws/aws-xray-sdk-python/issues/312">#312</a> from maxday/maxday/lazy-load-samplers</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/f4b33f0920a28d87a59979e1262b5511c80a6dfd"><code>f4b33f0</code></a> lazy load samplers</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/86248a528a43a1cab2c5ff50056eb01f3ff8d8a6"><code>86248a5</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/aws/aws-xray-sdk-python/issues/306">#306</a> from NathanielRN/remove-slow-filepath-resolve</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/aa4b2f5827bcfdf6039753cd547fcb34611684c9"><code>aa4b2f5</code></a> Benchmark tests should both consistently return</li>
<li><a href="https://github.com/aws/aws-xray-sdk-python/commit/8ad460eb5a6de440939f5ce22cfc01ac7cca7613"><code>8ad460e</code></a> Add benchmarks for json read</li>
<li>Additional commits viewable in <a href="https://github.com/aws/aws-xray-sdk-python/compare/2.8.0...2.9.0">compare view</a></li>
</ul>
</details>
Successfully merging this pull request may close these issues.

'import pkg_resources' is slow and impacts cold starts