Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Reduce memory use with fetch and slurp #68753

Closed
wants to merge 4 commits into from

Conversation

sivel
Copy link
Member

@sivel sivel commented Apr 7, 2020

SUMMARY

This PR attempts a few things:

  1. Use context managers to open files in a few locations
  2. Where possible, don't keep multiple copies of the file in memory
  3. Don't assume because we use become that a non-become user (scp/sftp/dd) cannot access the file. As such, don't run stat with become, so we know whether slurp is actually required.
  4. Have slurp return the checksum of the data, so we don't have to use the data on the controller to calculate it.
ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
lib/ansible/module_utils/basic.py
lib/ansible/modules/net_tools/basics/slurp.py
lib/ansible/plugins/action/__init__.py
lib/ansible/plugins/action/fetch.py
ADDITIONAL INFORMATION

@ansibot ansibot added WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers. affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. net_tools Net-tools category support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Apr 7, 2020
data = base64.b64encode(source_content)
with io.BytesIO() as f:
checksum = b64encode_and_checksum(source, f)
data = to_native(f.getvalue()).strip()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will still end up with 2x here. I'm interested in investigating whether we can write this to a real file, and not memory, and then have our JSON encoder be able to stream from a file handle. I attempted this before, but need to look again.

@ansibot
Copy link
Contributor

ansibot commented Apr 7, 2020

The test ansible-test sanity --test pylint [explain] failed with 1 error:

lib/ansible/modules/net_tools/basics/slurp.py:91:14: undefined-variable: Undefined variable 'to_bytes'

The test ansible-test sanity --test pep8 [explain] failed with 3 errors:

lib/ansible/modules/net_tools/basics/slurp.py:76:26: E226: missing whitespace around arithmetic operator
lib/ansible/modules/net_tools/basics/slurp.py:76:30: E226: missing whitespace around arithmetic operator
lib/ansible/plugins/action/fetch.py:39:1: E303: too many blank lines (3)

click here for bot help

@sivel
Copy link
Member Author

sivel commented Apr 7, 2020

Ultimately this PR can only easily aim to improve memory on the controller. To reduce memory on the target further would require JSON gymnastics, but I question the benefit there too.

The big change on the controller, is instead of just doing f.write(base64.b64decode(data)) we base64 decode smaller chunks and stream to a file. As well as calculating the checksum via slurp on the target, so we don't have to use memory while hashing the base64 decoded content.

@jborean93
Copy link
Contributor

I know @bcoca was looking to see if it's possible to buffer the data back in chunks and just rebuild the file on the controller only. The trouble is with the split done the Windows slurp.ps1 is now in a collection so any changes to the interface needs to be done in a backwards compatible manner.

@ansibot ansibot removed the needs_triage Needs a first human triage before being processed. label Apr 7, 2020
@sivel
Copy link
Member Author

sivel commented Apr 7, 2020

any changes to the interface needs to be done in a backwards compatible manner

That should already be done. It should work with slurp.ps1 currently without requiring any changes. I'm still just playing though. I might throw this away like I have every time in the past.

Streaming the data back in chunks via "update_json" wouldn't necessarily address it either. I think in the end, we'll have to pay the penalty either on the controller or the target. I don't know which is preferable currently.

@jborean93
Copy link
Contributor

It's mostly around your point of

Have slurp return the checksum of the data, so we don't have to use the data on the controller to calculate it.

I see you've made that an optional check but I just wanted to make sure that you were aware of it, which it seems like you are.

@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Apr 16, 2020
@sivel sivel closed this Jun 3, 2020
@ansible ansible locked and limited conversation to collaborators Jul 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. module This issue/PR relates to a module. net_tools Net-tools category stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. support:core This issue/PR relates to code supported by the Ansible Engineering Team. WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants