[RFC] Reduce Wasm binary size #4203

Open
lastmjs opened this issue Oct 5, 2022 · 2 comments
Labels
RFC Request for comments

Comments

lastmjs commented Oct 5, 2022

Summary

While building Kybra I have needed to include the entire frozen stdlib and to freeze all of the developer's Python files and dependencies. With the frozen stdlib included, the resulting binary is far too large (~30 MB) to deploy to our environment. I suggest RustPython look into ways to eliminate Wasm binary size bloat in all of its forms, especially when freezing modules.

Detailed Explanation

As I've been building Kybra (a Python environment for a decentralized cloud), I've run into issues with the size of the final Wasm binary when including the stdlib and the frozen stdlib. The Wasm environment we deploy to has a very strict limit on Wasm binary size, currently ~2 MB. That limit will hopefully be lifted over time. Right now, though, RustPython produces unoptimized Wasm binaries of up to ~30 MB, and even after optimizing and compressing with gzip the binaries are far too large to deploy to our environment.

I suggest RustPython implement means to reduce the size of the final binary produced when using the freezing features. This could be done in a number of ways, including module bundling, tree shaking, dead code elimination, and perhaps other ways as well.

Module bundling

For module bundling, the idea is to follow the statically defined Python imports from an entry file and create an output directory that contains only the files and folders those imports reference. RustPython can then freeze this directory. Ideally RustPython would do this kind of module bundling automatically when freezing, following only the imports that are statically defined.

The JS ecosystem builders/bundlers make heavy use of this kind of static analysis to create JS bundles.
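To make the static-analysis step concrete, here is a minimal sketch that uses only Python's standard `ast` module to list the top-level modules a source file imports. The function name `static_imports` and the sample source are made up for illustration; this is not how RustPython or modulegraph does it, and it cannot see dynamic imports such as `importlib.import_module(...)`.

```python
import ast


def static_imports(source: str) -> set:
    """Return the top-level module names imported anywhere in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` -> record the top-level package `a`
            names.update(alias.name.split('.')[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a.b import c` -> record `a`; relative imports are skipped
            names.add(node.module.split('.')[0])
    return names


# Hypothetical entry file contents, used only to exercise the function.
example = '''
import os
import json.decoder
from collections import OrderedDict
from . import sibling

def lazy():
    import shutil
'''

print(sorted(static_imports(example)))  # ['collections', 'json', 'os', 'shutil']
```

A bundler would repeat this for every module it discovers until no new names appear, then copy only those files into the directory to be frozen.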

Some possibly useful projects:

Some rough code that has been working locally to start the bundling process:

import kybra
import modulegraph.modulegraph
import os
import shutil
import subprocess
import sys

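def handle_builtin_module(node) -> bool:
    """Copy a module or package from RustPython/Lib into python_source_path.

    Returns True if a copy was made. Relies on the module-level
    canister_path and python_source_path defined below.
    """
    stdlib_path = f'{canister_path}/RustPython/Lib'
    module_name = node.identifier

    # Package directory, e.g. Lib/<module_name>/
    if os.path.exists(f'{stdlib_path}/{module_name}'):
        shutil.copytree(f'{stdlib_path}/{module_name}', f'{python_source_path}/{module_name}', dirs_exist_ok=True)
        return True

    # Single-file module, e.g. Lib/<module_name>.py
    if os.path.exists(f'{stdlib_path}/{module_name}.py'):
        shutil.copy(f'{stdlib_path}/{module_name}.py', f'{python_source_path}/{module_name}.py')
        return True

    return False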

canister_name = sys.argv[1]
py_entry_file_path = sys.argv[2]
did_path = sys.argv[3]
compiler_path = os.path.dirname(kybra.__file__) + '/compiler'
canister_path = f'.dfx/kybra/{canister_name}'
build_sh_path = compiler_path + '/build.sh'

shutil.copytree(compiler_path, canister_path, dirs_exist_ok=True)

if not os.path.exists(f'{canister_path}/RustPython'):
    subprocess.call(['git', 'clone', '--single-branch', '--branch', 'kybra_initial', 'https://github.com/demergent-labs/RustPython', f'{canister_path}/RustPython'])

path = sys.path[:]
path[0] = os.path.dirname(py_entry_file_path)

graph = modulegraph.modulegraph.ModuleGraph(path, ['test', 'unittest'])
entry_point = graph.run_script(py_entry_file_path)

python_source_path = f'{canister_path}/python_source'

if os.path.exists(python_source_path):
    shutil.rmtree(python_source_path)

os.makedirs(python_source_path)

num_nodes = 0

# Walk every node reachable from the entry script and copy its source
# into python_source_path, preferring the RustPython/Lib version when one exists.
for node in graph.flatten(start=entry_point):
    num_nodes += 1

    print(node)

    # The entry script itself.
    if type(node) == modulegraph.modulegraph.Script:
        shutil.copy(node.filename, f'{python_source_path}/{os.path.basename(node.filename)}')

    # A single-file module.
    if type(node) == modulegraph.modulegraph.SourceModule:
        # Dotted names are submodules; the Package branch below copies
        # whole package directories, so they are skipped here.
        if '.' in node.identifier:
            print(f'skipping {node.identifier}')
            continue

        if handle_builtin_module(node):
            print(f'{node.identifier} copied from RustPython/Lib')
        else:
            shutil.copy(node.filename, f'{python_source_path}/{os.path.basename(node.filename)}')

    # A package directory.
    if type(node) == modulegraph.modulegraph.Package:
        if handle_builtin_module(node):
            print(f'{node.identifier} copied from RustPython/Lib')
        else:
            packagepath = node.packagepath[0]
            destination_path = f'{python_source_path}/{node.identifier}'
            shutil.copytree(packagepath, destination_path, dirs_exist_ok=True)

    # Built into the host interpreter; RustPython/Lib may still ship a Python version.
    if type(node) == modulegraph.modulegraph.BuiltinModule:
        if handle_builtin_module(node):
            print(f'{node.identifier} copied from RustPython/Lib')

    # modulegraph could not resolve this import; report it and move on.
    if type(node) == modulegraph.modulegraph.MissingModule:
        print(f'missing module: {node.identifier}')

print(f'num_nodes: {num_nodes}')

subprocess.call([build_sh_path, canister_name, py_entry_file_path, did_path, compiler_path])

Tree shaking

The idea is to remove unnecessary imports. Some imports are statically declared but never actually used, so the modules they pull in do not need to be frozen.
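As a rough illustration of detecting such imports, the sketch below uses the standard `ast` module to report names that an import statement binds but that are never read in the same file. The function name `unused_imports` and the sample source are hypothetical. The check is deliberately naive: it ignores re-exports through `__all__`, names accessed via `globals()`, and modules imported only for their side effects.

```python
import ast


def unused_imports(source: str) -> list:
    """Return names bound by import statements that are never read in `source`."""
    tree = ast.parse(source)
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == '*':
                    continue  # star imports bind unknown names; leave them alone
                # `import a.b` binds `a`; `import a.b as c` binds `c`
                bound.add(alias.asname or alias.name.split('.')[0])
    # Every identifier that is read somewhere, e.g. the `os` in `os.getcwd()`.
    used = {
        n.id for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }
    return sorted(bound - used)


# Hypothetical module contents, used only to exercise the function.
example = '''
import os
import sys
from json import dumps, loads as parse

print(os.getcwd(), parse("1"))
'''

print(unused_imports(example))  # ['dumps', 'sys']
```

A real tree shaker would then have to decide whether dropping an import is safe, which in Python depends on whether importing the module has side effects.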

Some possibly useful projects:

Dead code elimination

The idea is to remove unnecessary code, meaning code that can never run or is never referenced, before it is frozen into the binary.
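The narrowest form of this can be sketched with the standard `ast` module: drop statements that follow a `return`, `raise`, `continue`, or `break` in the same block, since they can never execute. The names `StripUnreachable` and `strip_unreachable` are made up for this example, and `ast.unparse` assumes Python 3.9 or later. Removing unused functions or classes is a much harder problem in a dynamic language and is not attempted here.

```python
import ast

TERMINATORS = (ast.Return, ast.Raise, ast.Continue, ast.Break)


class StripUnreachable(ast.NodeTransformer):
    """Drop statements that follow return/raise/continue/break in the same block."""

    def generic_visit(self, node):
        # Transform children first, then trim this node's own statement lists.
        super().generic_visit(node)
        for field in ('body', 'orelse', 'finalbody'):
            block = getattr(node, field, None)
            if isinstance(block, list):
                for index, stmt in enumerate(block):
                    if isinstance(stmt, TERMINATORS):
                        del block[index + 1:]  # the terminator itself is kept
                        break
        return node


def strip_unreachable(source: str) -> str:
    """Return `source` with syntactically unreachable statements removed."""
    tree = StripUnreachable().visit(ast.parse(source))
    return ast.unparse(tree)


# Hypothetical function, used only to exercise the transformer.
source = '''
def f(x):
    if x:
        return 1
        print("never runs")
    raise ValueError(x)
    x += 1
'''

print(strip_unreachable(source))
```

Because the pass is purely syntactic, it leaves code behind conditions such as `if False:` untouched; handling those would need constant folding on top.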

Some possibly useful projects:

Drawbacks, Rationale, and Alternatives

This will obviously take some work and may add complexity to RustPython. It also isn't common for the Python community to optimize in this way, as shown by the lack of mature tooling for producing concise bundles like the tooling the JS community has. Nevertheless, my project would benefit greatly from reduced Wasm binary sizes (at least until our environment increases the Wasm binary limit), and I would hope many others would too.

It is possible that the environment we are deploying to, the Internet Computer, will increase the Wasm binary size limit. I do not know when that will happen or how high the limit will be. But even in that case, the compile time when including the frozen stdlib (sometimes even for just one import when using the module bundler that I created) is quite long.

Unresolved Questions

I am not sure how effective my suggestions will be and which path RustPython would like to take, but I hope that some combination of module bundling, tree shaking, dead code elimination, or something else will allow us to produce concise Wasm binaries with frozen Python modules.

lastmjs added the RFC (Request for comments) label on Oct 5, 2022
lastmjs changed the title from "[RFC] Reduce frozen Wasm binary size" to "[RFC] Reduce Wasm binary size" on Oct 5, 2022
youknowone (Member) commented

I agree we should invest in this. The current RustPython binary size is not a good fit for common Wasm usage.

lastmjs commented Nov 10, 2022

I have some data on binary sizes to share from two example projects. The figures below are the Wasm binary sizes before and after adding vm.add_frozen(rustpython_pylib::frozen_stdlib()); inside rustpython_vm::Interpreter::with_init.

Project 1

Wasm binary size before: 8.2 MiB
Wasm binary size after: 19.7 MiB

Project 2

Wasm binary size before: 5.5 MiB
Wasm binary size after: 16.9 MiB

This simple experiment shows that adding the frozen stdlib increases the binary by roughly 11.4 to 11.5 MiB. That is far too large for our environment: the maximum we can currently allow is 10 MiB.
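The arithmetic behind that conclusion can be reproduced with a few lines of Python. The sizes and the 10 MiB limit are taken from this comment; the variable names are only illustrative.

```python
# Sizes in MiB as reported above: (before, after) adding the frozen stdlib.
measurements = {
    'Project 1': (8.2, 19.7),
    'Project 2': (5.5, 16.9),
}
LIMIT_MIB = 10  # maximum stated in this comment

# Growth caused by the frozen stdlib, rounded to one decimal place.
increases = {
    name: round(after - before, 1)
    for name, (before, after) in measurements.items()
}

for name, (before, after) in measurements.items():
    over = round(after - LIMIT_MIB, 1)
    print(f'{name}: +{increases[name]} MiB, {over} MiB over the limit')
# Project 1: +11.5 MiB, 9.7 MiB over the limit
# Project 2: +11.4 MiB, 6.9 MiB over the limit
```

In both projects the frozen stdlib alone adds more than the entire 10 MiB budget.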
