Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

creating standalone binary for kapitan #323

Merged
merged 11 commits into from Aug 22, 2019

Conversation

yoshi-1224
Copy link
Contributor

Fixes issue #145

My apologies for not checking in lately. This work is quite exploratory in nature compared to other tasks so couldn't really figure the right time to report.

So far I've been checking out pyinstaller and nuitka as the tool for making the standalone binary for kapitan.

Pyinstaller

  • can create one binary file that is portable.
  • right now the size is about ~24MB
  • have tested it and works as expected (at least for compile in examples folder)

Nuitka

suggestion

I believe it is best to go with PyInstaller since we do releases on a regular basis and we want to pick a solution that gives us least hassle for every update. Fortunately, the resulting binary from PyInstaller is not so large either (projects using things like pyqt and pandas seem to have it much worse)

I'd been trying to reduce the size of the resulting binary, and the following shows the output of ls ordered by size of the binary components (up to 10th largest):

12339728  Jul 23 21:27 _jsonnet.cpython-36m-x86_64-linux-gnu.so
4218320   Jul 23 21:27 __main__ # this is the kapitan __main__
3590928   Jul 23 21:27 libpython3.6m.so.1.0
3452736   Jul 23 21:27 libcrypto.so.1.1
1594864   Jul 23 21:27 libstdc++.so.6
1335488   Jul 23 21:27 _decimal.cpython-36m-x86_64-linux-gnu.so
1006096   Jul 23 21:27 unicodedata.cpython-36m-x86_64-linux-gnu.so
944168    Jul 23 21:27 pyexpat.cpython-36m-x86_64-linux-gnu.so
845576    Jul 23 21:27 _cffi_backend.cpython-36m-x86_64-linux-gnu.so

Judging by this, I don't think there is much room for improvement in terms of the size considering the fact that we will need to manually dig through smaller unused imports (and even if that's possible we can't do much with import statements outside of kapitan). I've tried pyinstaller both with and without virtualenv but the resulting size is roughly the same. I've also used UPX for compression, but unfortunately it doesn't do much either.

TODO

  • write test script that diffs output from standalone binary and output from python and make sure nothing unexpected happens with the binary
  • explore cython + pyinstaller as @ramaro has previously explored so that we can possibly improve the speed of binary.

@uberspot @ramaro
Please let me know if this direction is ok. Thanks for your time!

@yoshi-1224 yoshi-1224 force-pushed the standalone-binary branch 2 times, most recently from 5cf56e4 to cae117c Compare July 24, 2019 10:28
@ramaro
Copy link
Member

ramaro commented Jul 25, 2019

@yoshi-1224 this is exactly it!

@yoshi-1224
Copy link
Contributor Author

@ramaro @uberspot
So the problem now is that compiling python scripts to .so files via cython would make the file size much larger. For example, I tried compiling all the .py files to .so for reclass sub-folder and 300KB (after deleting tests) ended up in about 4MB (with debug info stripped). You can refer to https://github.com/yoshi-1224/cythonize-poc/tree/master/reclass-example to see what I did if that helps.

So I believe the questions are:

  • how much increase in size can we tolerate for the standalone binary? Without any cython involved, it is about 24MB as mentioned above.
  • which part of the code improves its performance the most when cythonized? Can you help me out if you have any educated guesses on this?

Thanks!

@ramaro
Copy link
Member

ramaro commented Jul 30, 2019

@yoshi-1224 I don't think we're aiming for performance boost here, but portability instead. I'd say let's keep cython out of the picture for now (we can revisit later). @uberspot what do you think?

@uberspot
Copy link
Contributor

++ for the performance part. Usually this would run in CI pipelines anyway so this wouldn't be greatly prioritised. We can always revisit in the future if that changes. :)

@yoshi-1224 yoshi-1224 marked this pull request as ready for review July 31, 2019 09:06
@yoshi-1224
Copy link
Contributor Author

@ramaro @uberspot Thanks for the feedback!

I don't think we're aiming for performance boost here, but portability instead

I see. In that case, Pyinstaller actually advises to run it on each of the target platforms, so I guess Cython wouldn't help for improving the portability any further?

So we now need to build a pipeline for automatically building this binary on Debian (as pointed out on the GSoC proposal) for each release. Could you advise me on how I could achieve this? I think as far as building the binary goes, we could use something like Vagrant.

@uberspot
Copy link
Contributor

Travis has the following OSs available https://docs.travis-ci.com/user/reference/overview/
If you need debian specifically (which isn't there on travis), it might just be easier to build a new docker image on travis and test the build of the binary inside the image (?). That way it's isolated from the CI/CD and more flexible.

@yoshi-1224
Copy link
Contributor Author

@uberspot I added Dockerfile for building the binary. Tests are passing on Travis as well.
Do we ship this binary with helm binding as well? In that case I'll add the .so as hidden binary during the pyinstaller build process.

@ramaro
Copy link
Member

ramaro commented Aug 6, 2019

@yoshi-1224 yes good point! Please do add. Thanks

@yoshi-1224
Copy link
Contributor Author

Hi @uberspot @ramaro
All the tests are passing now! Is there anything else that I should work on regarding this PR?

pyinstaller-Dockerfile Outdated Show resolved Hide resolved
scripts/pyinstaller.sh Outdated Show resolved Hide resolved
scripts/pyinstaller.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@uberspot uberspot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go for me! :)
Just needs rebasing before merging:

Rebasing the commits of this branch on top of the base branch cannot be performed automatically due to conflicts encountered while reapplying the individual commits from the head branch.

@yoshi-1224
Copy link
Contributor Author

@uberspot
Thanks for the review! It might be easier if #342 is merged before this so that I can fix the test for the standalone binary as well in response to the change in dependency management.

@uberspot
Copy link
Contributor

@uberspot
Thanks for the review! It might be easier if #342 is merged before this so that I can fix the test for the standalone binary as well in response to the change in dependency management.

Done 👍

@yoshi-1224
Copy link
Contributor Author

@uberspot @ramaro @adrianchifor
I believe this PR is ready now. Could you please take a look?

@uberspot uberspot merged commit e3dc9b9 into kapicorp:master Aug 22, 2019
@uberspot
Copy link
Contributor

Nice! :)

@ramaro
Copy link
Member

ramaro commented Aug 22, 2019

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants